C99 Link Compatibility (was Re: D is too english centric) (page 2) - D Programming Language Discussion Forum

May 28, 2003

C99 Link Compatibility (was Re: D is too english centric)

Posted by Mark Evans
in reply to Martin M. Pedersen

I agree that D is too English-centric (even ASCII-centric).

Concern about C99 link compatibility leads me to reflect on C99's boolean type:

http://www.uic.edu/classes/mcs/mcs494/f01/transparencies/sec8.4.pdf

Mark

May 28, 2003

Re: D is too english centric

Posted by Walter
in reply to Martin M. Pedersen

"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:bb2l7u$s2i$1@digitaldaemon.com...
> So do we. Yet there are exceptions. If the customer pays us to develop and deliver source code, it is his requirements that count, not our policy.

Yup. Listen to the customers, not the marketing department <g>.

May 28, 2003

Re: C99 Link Compatibility (was Re: D is too english centric)

Posted by Mark Evans
in reply to Mark Evans

Actually I still think that link compatibility with Digital Mars C++ would be a huge win for D.  C++ also has a bool type.

Mark

May 29, 2003

Re: D is too english centric

Posted by Burton Radons
in reply to Walter

Walter wrote:

> "Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message
> news:bb2hou$oas$1@digitaldaemon.com...
> 
>>"Walter" <walter@digitalmars.com> wrote in message
>>news:bb1c8v$2e2l$1@digitaldaemon.com...
>>
>>>>I have noted that C99 allows *any* unicode character to be used in
>>>>identifiers using \u.
>>>
>>>No, only characters that fall into certain unicode ranges.
>>
>>I haven't found that, but you are the expert, so I believe you. It makes
>>sense too.
> 
> 
> "Each universal character name in an identifier shall designate a character
> whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
> annex D." C99 6.4.2.1-3

This could be more easily done by encoding into UTF-8 and assuming any byte with the eighth bit set is an identifier.  It allows weird obfuscations, yes, but why care about that?  I won't write code that uses one of UNICODE's whitespace characters, and anyone whose code would be worth use by me would also not abuse it.  At worst it'd be one of those features that kids get into abusing before they smarten up.

C99's decision itself looks pretty bad.  I'd use \u escapes for codes which I don't WANT rendered because either they have no rendering (whitespaces), because they would screw up rendering (controls), don't have a rendering in my code-writing font, or have special numeric significance.

Whether this feature is implemented by any compilers and editors is certainly important to Martin's stated requirements.  If his clients can't read the code he's written, he hasn't fulfilled his contract. Much more successful would be to use an encoding like UTF-8 or one of the BOM'd encodings D supports; all programs developed for Finns will surely render that.  If it develops that C gets a link standard for UNICODE identifiers, then that can be emulated when mangling extern (C).  There's no cause for following C99 exactly in the code itself.
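
The suggestion above, treating every byte with the eighth bit set as an identifier character, could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names scan_identifier, is_ident_start and is_ident_char are hypothetical and not taken from any D or C compiler, the input is assumed to be NUL-terminated UTF-8, and the UTF-8 sequences themselves are not validated.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if byte c may start an identifier.
   Any byte with the high bit set (i.e. any byte of a multi-byte UTF-8
   sequence) is accepted without further classification. */
static int is_ident_start(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           c == '_' ||
           c >= 0x80;
}

/* Hypothetical helper: nonzero if byte c may continue an identifier. */
static int is_ident_char(unsigned char c)
{
    return is_ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns the length in bytes of the identifier at the start of s,
   or 0 if s does not begin with an identifier. */
size_t scan_identifier(const char *s)
{
    size_t n = 0;

    if (!is_ident_start((unsigned char)s[0]))
        return 0;
    while (is_ident_char((unsigned char)s[n]))
        n++;
    return n;
}
```

With this sketch, the UTF-8 bytes of "naïve(" scan as a six-byte identifier, since the two bytes encoding ï both have the high bit set. The same rule also accepts the bytes of U+00A0 (no-break space) inside an identifier, which is the kind of obfuscation the post says it is willing to tolerate.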

May 29, 2003

Re: D is too english centric

Posted by Walter
in reply to Burton Radons

"Burton Radons" <loth@users.sourceforge.net> wrote in message news:bb3s9f$29qv$1@digitaldaemon.com...
> Walter wrote:
> > "Each universal character name in an identifier shall designate a character
> > whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
> > annex D." C99 6.4.2.1-3
> This could be more easily done by encoding into UTF-8 and assuming any byte with the eighth bit set is an identifier.  It allows weird obfuscations, yes, but why care about that?  I won't write code that uses one of UNICODE's whitespace characters, and anyone whose code would be worth use by me would also not abuse it.  At worst it'd be one of those features that kids get into abusing before they smarten up.
>
> C99's decision itself looks pretty bad.  I'd use \u escapes for codes which I don't WANT rendered because either they have no rendering (whitespaces), because they would screw up rendering (controls), don't have a rendering in my code-writing font, or have special numeric significance.
>
> Whether this feature is implemented by any compilers and editors is
> certainly important to Martin's stated requirements.  If his clients
> can't read the code he's written, he hasn't fulfilled his contract.
> Much more successful would be to use an encoding like UTF-8 or one of
> the BOM'd encodings D supports; all programs developed for Finns will
> surely render that.  If it develops that C gets a link standard for
> UNICODE identifiers, then that can be emulated when mangling extern (C).
>   There's no cause for following C99 exactly in the code itself.

This is C's third attempt at internationalizing C source code. In 15 years I have yet to see any C source outside of a test suite that used trigraphs or digraphs. I'm skeptical the \u scheme will catch on, either. I think the best way is to simply declare that the source text is UTF-8, UTF-16, or UTF-32. D already recognizes and automatically handles all three. Then, it is simply a matter of deciding which unicode characters to allow as identifiers and whitespace.

The advantage of that is you can edit the source in any text editor that supports unicode if you want to use more than ascii. There is no need for any special editors that recognize trigraphs, digraphs, or on-the-fly \u translation.
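
One way a front end could tell the three encodings apart is by looking for a byte order mark at the start of the file. The following C sketch shows that idea only; it is not DMD's actual detection logic, the names encoding and detect_encoding are hypothetical, and treating a file without a BOM as UTF-8 is an assumption made for the example.

```c
#include <stddef.h>

typedef enum {
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UTF32LE,
    ENC_UTF32BE
} encoding;

/* Inspects the first bytes of buf for a byte order mark (BOM).
   Stores the BOM length in *bom_len so the caller can skip it.
   Assumption: input without a BOM is treated as UTF-8. */
encoding detect_encoding(const unsigned char *buf, size_t len, size_t *bom_len)
{
    /* UTF-32 must be tested before UTF-16, because the UTF-32LE BOM
       (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    *bom_len = 0;
    return ENC_UTF8;
}
```

A known limitation of this approach is that a UTF-16LE file whose first character after the BOM is U+0000 would be misread as UTF-32LE. For source code that case is unlikely enough that the sketch ignores it.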

May 29, 2003

Re: D is too english centric

Posted by Martin M. Pedersen
in reply to Walter

"Walter" <walter@digitalmars.com> wrote in message news:bb1c8v$2e2l$1@digitaldaemon.com...
> > DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
> > I find this unfortunate, and in contrast to one of the main goals of D:
> > Link compatibility with C.
>
> It's a good idea to change it to match C for the reasons you state.

Another way of resolving this would be to give the programmer control of the external identifier. Something like this:

extern (C) {
      extern("foo\u4444") void foo() { bar(); }
      extern("bar\u4444") void bar();
}

That would also allow us to access mangled C++ identifiers, and identifiers containing '$'. It would not be easy, but that is not what I ask for. I only want it to be possible.
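
For comparison, some C compilers already offer this kind of control on the C side. The sketch below uses the "asm label" extension of GCC and Clang, which is not standard C. The function name bar and the symbol name bar_external_name are hypothetical and chosen only for illustration; the extern("...") syntax proposed above is not existing D syntax and is not what this code shows.

```c
/* GCC/Clang extension (an "asm label"), not standard C: the string after
   __asm__ is used as the symbol name the linker sees for this function,
   instead of the name derived from the C identifier. */
int bar(void) __asm__("bar_external_name");   /* hypothetical symbol name */

/* The label has to be attached to a declaration; the definition that
   follows then uses the same external name. */
int bar(void)
{
    return 42;
}
```

C code in the same translation unit still calls the function as bar(); only the name in the object file changes, which is the separation between source identifier and link name that the proposal asks for.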

Regards,
Martin M. Pedersen

May 29, 2003

Re: D is too english centric

Posted by Bill Cox
in reply to Walter

I'll put in a vote for UTF-8 support.  It seems to have the best chance of getting support from Linux IDEs and debuggers.

Bill

May 29, 2003

Re: D is too english centric

Posted by Benji Smith
in reply to Bill Cox

I agree. Source should be UTF-8.

--Benji


May 30, 2003

Re: D is too english centric

Posted by Mark T
in reply to Martin M. Pedersen

>
>I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
>

I don't think there is a full implementation of C99 yet. It was adopted in late 1999.  Maybe some of this stuff will disappear due to lack of use. Did ISO sack the trigraph crap from C89/C90?

May 30, 2003

Re: D is too english centric

Posted by Martin M. Pedersen
in reply to Mark T

"Mark T" <Mark_member@pathlink.com> wrote in message news:bb6710$1v5d$1@digitaldaemon.com...
> I don't think there is a full implementation of C99 yet. It was adopted in late
> 1999.  Maybe some of this stuff will disappear due to lack of use. Did ISO sack
> the trigraph crap from C89/C90?

No, trigraphs are still there.


Regards,
Martin M. Pedersen

Copyright © 1999-2021 by the D Language Foundation