September 12, 2014 Is º an unicode alphabetic character?
What is a Unicode alphabetic character? I misunderstood isAlpha(): I thought it validated letters like a, b, è, é ... z, but isAlpha('º') from the std.uni module returns true. How can I validate only the letters of a particular alphabet in D? Is there a library function for that, or should I write one?
I know I can do:
bool is_id(dchar c)
{
    return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
}
but I'm looking for a native solution, if there is one.
September 12, 2014 Re: Is º an unicode alphabetic character?
Posted in reply to AsmMan | On 09/11/2014 08:04 PM, AsmMan wrote:
> what's an unicode alphabetic character?
Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic, all of which are explained here:
http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values
> I misunderstood isAlpha(), I used to think it's to validate letters like a, b, è, é .. z etc but isAlpha('º') from std.uni module return true.
º happens to be in the "Letter, Lowercase" category so yes, it is isAlpha().
> How can I validate only the letters of an unicode alphabet in D or should I write one?
There are so many alphabets in the world. It is likely that a Unicode character will be a part of one.
> I know I can do:
>
> bool is_id(dchar c)
> {
>     return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
> }
There is a misunderstanding. There are so many Unicode characters that are >= 0xc0 but not a part of the Alphabetic category. For example: ← (U+2190 LEFTWARDS ARROW).
Ali
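To make the difference concrete, here is a minimal sketch comparing std.uni.isAlpha with the hand-written range check from the question. It assumes the second range was meant to end at 'Z', and the sample characters are chosen purely for illustration.

```d
import std.uni : isAlpha;

// The range check from the question, assuming the second range ends at 'Z'.
bool is_id(dchar c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c >= 0xc0;
}

void main()
{
    // std.uni.isAlpha follows the Unicode Alphabetic property.
    assert(isAlpha('º'));   // U+00BA MASCULINE ORDINAL INDICATOR is a letter
    assert(isAlpha('é'));   // U+00E9 is a letter
    assert(!isAlpha('←'));  // U+2190 LEFTWARDS ARROW is a symbol
    assert(!isAlpha('×'));  // U+00D7 MULTIPLICATION SIGN is a symbol

    // The open-ended "c >= 0xc0" test accepts both symbols anyway.
    assert(is_id('←'));
    assert(is_id('×'));
    assert(!is_id('º'));    // U+00BA lies below 0xc0
}
```

So the two checks disagree in both directions: isAlpha accepts º, while the range check accepts ← and ×.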
September 12, 2014 Re: Is º an unicode alphabetic character?
Posted in reply to Ali Çehreli | On Friday, 12 September 2014 at 04:04:22 UTC, Ali Çehreli wrote:
> On 09/11/2014 08:04 PM, AsmMan wrote:
>
> > what's an unicode alphabetic character?
>
> Alphabetic is defined as Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic, all of which are explained here:
>
> http://www.unicode.org/Public/5.1.0/ucd/UCD.html#General_Category_Values
>
> > I misunderstood isAlpha(), I used to think it's to validate letters like a, b, è, é .. z etc but isAlpha('º') from std.uni module return true.
>
> º happens to be in the "Letter, Lowercase" category so yes, it is isAlpha().
>
> > How can I validate only
> > the letters of an unicode alphabet in D or should I write one?
>
> There are so many alphabets in the world. It is likely that a Unicode character will be a part of one.
>
> > I know I can do:
> >
> > bool is_id(dchar c)
> > {
> >     return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0;
> > }
>
> There is a misunderstanding. There are so many Unicode characters that are >= 0xc0 but not a part of the Alphabetic category. For example: ← (U+2190 LEFTWARDS ARROW).
>
> Ali
If I want only the ASCII and Latin alphabets, which range should I use?
That is, how should I rewrite the is_id() function?
September 12, 2014 Re: Is º an unicode alphabetic character?
Posted in reply to AsmMan | On 09/11/2014 11:38 PM, AsmMan wrote:
> If I want ASCII and latin only alphabet which range should I use?
> ie, how should I rewrite is_id() function?
This seems to be it:
import std.stdio;
import std.uni;

void main()
{
    alias latin = unicode.script.latin;
    assert('ç' in latin);
    assert('7' !in latin);

    writeln(latin);
}
Ali
September 12, 2014 Re: Is º an unicode alphabetic character?
Posted in reply to Ali Çehreli | On Friday, 12 September 2014 at 07:57:43 UTC, Ali Çehreli wrote:
> On 09/11/2014 11:38 PM, AsmMan wrote:
>
> > If I want ASCII and latin only alphabet which range should I use?
> > ie, how should I rewrite is_id() function?
>
> This seems to be it:
>
> import std.stdio;
> import std.uni;
>
> void main()
> {
>     alias latin = unicode.script.latin;
>     assert('ç' in latin);
>     assert('7' !in latin);
>
>     writeln(latin);
> }
>
> Ali
Sorry, I shouldn't have asked for Latin but for an alphabet like the French one instead: http://www.importanceoflanguages.com/Images/French/FrenchAlphabet.jpg (including the diacritics, of course).
As you mentioned, º happens to be a letter, so it still passes:
assert('º' in latin);
so it isn't different from isAlpha(). Is the Unicode table organized so that I can use a range (like we do for ASCII: ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z'), or should I put these alpha characters in a table myself and then do a lookup?
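One way to do the table lookup asked about here is sketched below. The helper name isFrenchLetter and the contents of frenchExtras are assumptions made for illustration: neither is part of Phobos, and the list may not match every definition of the French alphabet. The check only handles precomposed characters, so the input is assumed to be in NFC form.

```d
import std.algorithm : canFind;

// Assumed list of accented letters and ligatures used in French;
// extend or trim it to match the alphabet you actually need.
immutable dstring frenchExtras = "àâæçéèêëîïôœùûüÿÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸ"d;

// Hypothetical helper: ASCII letters by range, everything else by lookup.
bool isFrenchLetter(dchar c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || frenchExtras.canFind(c);
}

void main()
{
    assert(isFrenchLetter('e'));
    assert(isFrenchLetter('é'));
    assert(isFrenchLetter('Œ'));
    assert(!isFrenchLetter('º'));  // a letter for isAlpha, but not in the list
    assert(!isFrenchLetter('7'));
}
```

A linear search is fine for a list this short. The accented letters are not contiguous in Unicode (Œ and Ÿ, for example, sit outside the Latin-1 block), so a single range cannot cover them.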
September 12, 2014 Re: Is º an unicode alphabetic character?
Posted in reply to AsmMan | Thanks Ali, I think I'm getting close:
bool is_id(dchar c)
{
    return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= 0xc0 && c <= 0xd6 || c >= 0xd8 && c <= 0xf6 || c >= 0xf8 && c <= 0xff;
}
Unlike c >= 0xc0, this no longer lets through math symbols such as × (0xd7) and ÷ (0xf7).
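As an alternative to chaining comparisons, the same Latin-1 ranges can probably be expressed with std.uni's CodepointSet, whose constructor takes pairs of half-open intervals [a, b). This is only a sketch, and the variable name latin1Letters is hypothetical.

```d
import std.uni : CodepointSet;

void main()
{
    // Intervals are half-open, so each upper bound is one past the last letter.
    auto latin1Letters = CodepointSet(
        'A', 'Z' + 1,
        'a', 'z' + 1,
        0xc0, 0xd7,    // À..Ö
        0xd8, 0xf7,    // Ø..ö
        0xf8, 0x100);  // ø..ÿ

    assert('é' in latin1Letters);
    assert('Z' in latin1Letters);
    assert('×' !in latin1Letters);  // 0xd7 is excluded
    assert('÷' !in latin1Letters);  // 0xf7 is excluded
    assert('º' !in latin1Letters);  // 0xba lies below 0xc0
}
```

Keeping the ranges in one set makes the gaps at 0xd7 and 0xf7 easier to see than in a long boolean expression.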
Copyright © 1999-2021 by the D Language Foundation