OS X bug: universal alpha identifiers
January 25, 2005
Programs with non-ASCII identifiers fail to
assemble (and so never link) under Mac OS X 10.3
using GDC 0.10...


GDC uses the mangled name as a label for
the assembler, which then chokes on the UTF-8:

unialpha.d:
> void anders() {}
> void björklund() {}
> 
> void main()
> {
>   anders(); björklund();
> }

Gives the error:
> /var/tmp//ccW4WWE0.s:44:Invalid mnemonic '?rklundFZv'

Here's the generated assembly:
> 	bl __D8unialpha10björklundFZv
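
For anyone wondering where the "10" in the label comes from: "björklund" is 9 characters, but the ö takes two bytes in UTF-8, so the length prefix counts 10 bytes. Below is a minimal sketch in present-day D (not the 2005 compiler), assuming the prefix is simply the identifier's length in UTF-8 bytes. `mangleComponent` is a made-up helper for illustration, not part of any compiler or library.

```d
import std.conv : to;
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: one length-prefixed component of a mangled name.
// Assumption: the prefix is the identifier's length in UTF-8 bytes.
string mangleComponent(string name)
{
    // .length on a D string counts UTF-8 code units (bytes), not characters.
    return to!string(name.length) ~ name;
}

void main()
{
    string ident = "björklund";
    writeln(ident.walkLength); // 9 code points
    writeln(ident.length);     // 10 bytes, since ö is two bytes in UTF-8

    // Module name, function name, then the "FZv" suffix seen in the post.
    writeln("_D", mangleComponent("unialpha"), mangleComponent(ident), "FZv");
    // prints _D8unialpha10björklundFZv
}
```

The extra leading underscore in the label above is presumably the one the Darwin toolchain puts in front of every symbol.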


Similar errors for variables:

unialpha2.d:
> int anders;
> int björklund;
> 
> void main()
> {
>   anders = 1; björklund = 2;
> }

gdc:
> /var/tmp//cc7ur4wd.s:31:Parameter syntax error (parameter 3)
> /var/tmp//cc7ur4wd.s:31:Invalid mnemonic '?rklundi-L1$pb)'
> /var/tmp//cc7ur4wd.s:32:Parameter error: expression must be absolute (parameter 2)
> /var/tmp//cc7ur4wd.s:32:Invalid mnemonic '?rklundi-L1$pb)(r9)'

asm:
> 	addis r9,r31,ha16(__D9unialpha210björklundi-L1$pb)
> 	la r9,lo16(__D9unialpha210björklundi-L1$pb)(r9)


Not sure how this can be fixed, without
changing the way that D mangles the names...
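
One conceivable workaround is to escape the symbol only at the point where it is written to the assembler file. The sketch below, in present-day D, is purely illustrative: `asciiSafe` and its `_xNN` escape are invented for this example and are not something GDC or DMD is known to do. A real scheme would also have to avoid clashing with identifiers that already contain `_x`, and any demangler would need to know about it.

```d
import std.format : format;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical escaping: keep ASCII bytes, write every other byte as _xNN.
string asciiSafe(string symbol)
{
    string result;
    foreach (ubyte unit; symbol.representation) // raw UTF-8 bytes
    {
        if (unit < 0x80)
            result ~= cast(char) unit;
        else
            result ~= format("_x%02X", unit);
    }
    return result;
}

void main()
{
    // ö is U+00F6, which is the two bytes C3 B6 in UTF-8.
    writeln(asciiSafe("_D8unialpha10björklundFZv"));
    // prints _D8unialpha10bj_xC3_xB6rklundFZv
}
```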

Both programs compile just fine on Linux.

--anders


PS: Assembler is:
> Apple Computer, Inc. version cctools-525.obj~1, GNU assembler version 1.38
http://www.opensource.apple.com/darwinsource/DevToolsAug2004/cctools-525/
January 27, 2005
Added to DStress as
http://dstress.kuehne.cn/run/unicode_03.d
http://dstress.kuehne.cn/run/unicode_04.d
http://dstress.kuehne.cn/run/unicode_05.d
http://dstress.kuehne.cn/run/unicode_06.d
http://dstress.kuehne.cn/run/unicode_07.d

Thomas



January 27, 2005
Thomas Kuehne wrote:

> Added to DStress as
> http://dstress.kuehne.cn/run/unicode_03.d
> http://dstress.kuehne.cn/run/unicode_04.d
> http://dstress.kuehne.cn/run/unicode_05.d
> http://dstress.kuehne.cn/run/unicode_06.d
> http://dstress.kuehne.cn/run/unicode_07.d

If you are feeling like testing or something,
here are the rest of the Universal Alphas :

http://www.algonet.se/~afb/d/universalalphas/


> Identifiers start with a letter, _, or unicode alpha, and are followed
> by any number of letters, _, digits, or universal alphas. Universal
> alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
> C99 Standard.)

http://www.digitalmars.com/d/lex.html#identifier

I think Walter officially acknowledged the phrase
"unicode alpha" as just a typo for universal...
(the meaning is that it can't start with a digit)

--anders
January 27, 2005
Anders F Björklund wrote in news:ctairr$1ngb$1@digitaldaemon.com :
> If you are feeling like testing or something,
> here are the rest of the Universal Alphas :
>
> http://www.algonet.se/~afb/d/universalalphas/

I've only been testing the name mangling, so it shouldn't matter which scripts I check.

> > Identifiers start with a letter, _, or unicode alpha, and are followed by any number of letters, _, digits, or universal alphas. Universal alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the C99 Standard.)
>
> http://www.digitalmars.com/d/lex.html#identifier
>
> I think Walter officially acknowledged the phrase
> "unicode alpha" as just a typo for universal...
> (the meaning is that it can't start with a digit)

This should be clarified. I suppose that "digits" means only "0123456789"; there are loads of other digits in Unicode.

Why is an ancient (1999) version used in the documentation? I've tried codepoints that are assigned in the current standard but weren't in the 1999 one, and as you might have guessed, even currently reserved codepoints weren't caught by the frontend...

Thomas


January 27, 2005
Thomas Kuehne wrote:

>>>Identifiers start with a letter, _, or unicode alpha, and are followed
>>>by any number of letters, _, digits, or universal alphas. Universal
>>>alphas are as defined in ISO/IEC 9899:1999(E) Appendix D. (This is the
>>>C99 Standard.)

> This should be clarified. I suppose that "digits" are only "0123456789"
> - there are loads of other digits in Unicode.

Yes, the quoted C99 standard (which isn't all that "ancient") lists these:
> Digits: 0660-0669, 06F0-06F9, 0966-096F, 09E6-09EF, 0A66-0A6F,
> 0AE6-0AEF, 0B66-0B6F, 0BE7-0BEF, 0C66-0C6F, 0CE6-0CEF, 0D66-0D6F,
> 0E50-0E59, 0ED0-0ED9, 0F20-0F33

But I'm also thinking that a "digit" here meant [0-9]...
I think a "letter" to Walter is just [a-zA-Z], as well ?
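
The two readings of "digit" can be put side by side. Here is a minimal sketch in present-day D, using only the ranges quoted above. The function names are made up for illustration, and this is not how the DMD frontend actually classifies characters.

```d
import std.stdio : writeln;

// Digit ranges exactly as quoted above, stored as (first, last) pairs.
immutable uint[] digitBounds = [
    0x0660, 0x0669, 0x06F0, 0x06F9, 0x0966, 0x096F, 0x09E6, 0x09EF,
    0x0A66, 0x0A6F, 0x0AE6, 0x0AEF, 0x0B66, 0x0B6F, 0x0BE7, 0x0BEF,
    0x0C66, 0x0C6F, 0x0CE6, 0x0CEF, 0x0D66, 0x0D6F, 0x0E50, 0x0E59,
    0x0ED0, 0x0ED9, 0x0F20, 0x0F33,
];

// Reading 1: a "digit" is just [0-9].
bool isAsciiDigit(dchar c)
{
    return c >= '0' && c <= '9';
}

// Reading 2: a "digit" is anything in the quoted C99 digit ranges.
bool isC99UniversalDigit(dchar c)
{
    for (size_t i = 0; i + 1 < digitBounds.length; i += 2)
        if (c >= digitBounds[i] && c <= digitBounds[i + 1])
            return true;
    return false;
}

void main()
{
    writeln(isAsciiDigit('7'));             // true
    writeln(isC99UniversalDigit('7'));      // false: [0-9] is not in the table
    writeln(isC99UniversalDigit('\u0663')); // true: ARABIC-INDIC DIGIT THREE
    writeln(isAsciiDigit('\u0663'));        // false
}
```

Note that the quoted table does not contain [0-9] at all, so the spec's "digits" and the table's digits do not overlap.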

And I agree, it would be a lot easier to just say that.

--anders