May 28, 2003
I agree that D is too English-centric (even ASCII-centric).

Concern about C99 link compatibility leads me to reflect on C99's boolean type:

http://www.uic.edu/classes/mcs/mcs494/f01/transparencies/sec8.4.pdf
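C99's boolean type is `_Bool`. The header `<stdbool.h>` provides `bool`, `true` and `false` as macros. For link compatibility, the detail that matters is the conversion rule: any nonzero scalar converts to 1, whereas a plain byte just truncates. Below is a minimal C sketch of that difference. The helper names `to_c99_bool` and `to_byte` are hypothetical. The size of `bool` is implementation-defined (commonly 1), so a language matching it across an extern (C) boundary should check it rather than assume it.

```c
#include <stdbool.h>

/* Hypothetical helper: converts an int to C99 bool.
   Any nonzero value becomes true (1); zero becomes false (0). */
bool to_c99_bool(int value)
{
    return value; /* implicit conversion to _Bool yields 0 or 1 */
}

/* Hypothetical contrast: a plain unsigned char keeps only the low 8 bits,
   so 256 stored in it becomes 0, while 256 stored in a bool becomes 1. */
unsigned char to_byte(int value)
{
    return (unsigned char)value;
}
```

For example, `to_c99_bool(256)` is 1 while `to_byte(256)` is 0. A foreign type that is merely "a byte" is therefore not a drop-in match for `_Bool`.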

Mark


May 28, 2003
"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:bb2l7u$s2i$1@digitaldaemon.com...
> So do we. Yet there are exceptions. If the customer pays us to develop and deliver source code, it is his requirements that count, not our policy.

Yup. Listen to the customers, not the marketing department <g>.


May 28, 2003
Actually, I still think that link compatibility with Digital Mars C++ would be a huge win for D. C++ also has a bool type.

Mark


May 29, 2003
Walter wrote:

> "Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message
> news:bb2hou$oas$1@digitaldaemon.com...
> 
>>"Walter" <walter@digitalmars.com> wrote in message
>>news:bb1c8v$2e2l$1@digitaldaemon.com...
>>
>>>>I have noted that C99 allows *any* unicode character to be used in
>>>>identifiers using \u.
>>>
>>>No, only characters that fall into certain unicode ranges.
>>
>>I haven't found that, but you are the expert, so I believe you. It
>>makes sense too.
> 
> 
> "Each universal character name in an identifier shall designate a character
> whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
> annex D." C99 6.4.2.1-3
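The quoted rule can be made concrete with a minimal C sketch. It shows how a lexer might decode a `\uXXXX` universal character name and check it against a table of permitted ranges. The table is an illustrative subset (Latin-1 letters and part of the CJK Unified Ideographs block), not a transcription of Annex D. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative subset of permitted code-point ranges; NOT the Annex D table. */
struct cp_range { unsigned long lo, hi; };

static const struct cp_range allowed[] = {
    { 0x00C0, 0x00D6 }, /* Latin-1 letters, A-grave .. O-diaeresis */
    { 0x00D8, 0x00F6 }, /* skips U+00D7 MULTIPLICATION SIGN */
    { 0x00F8, 0x00FF }, /* skips U+00F7 DIVISION SIGN */
    { 0x4E00, 0x9FA5 }, /* part of CJK Unified Ideographs */
};

/* Parses a "\uXXXX" universal character name (exactly four hex digits)
   starting at s. Returns true and stores the code point on success. */
bool parse_ucn(const char *s, unsigned long *cp)
{
    if (s[0] != '\\' || s[1] != 'u')
        return false;
    unsigned long v = 0;
    for (int i = 2; i < 6; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return false; /* non-hex digit, including the terminating NUL */
        v = v * 16 + (unsigned long)d;
    }
    *cp = v;
    return true;
}

/* True if the code point lies in one of the permitted ranges above. */
bool ucn_allowed_in_identifier(unsigned long cp)
{
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (cp >= allowed[i].lo && cp <= allowed[i].hi)
            return true;
    return false;
}
```

For instance, the sequence `\u00e9` decodes to 0xE9 (é), which this table accepts. U+00D7 (×) decodes fine but is rejected as an identifier character.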

This could be done more easily by encoding the source as UTF-8 and treating any byte with the eighth bit set as an identifier character. It allows weird obfuscations, yes, but why care about that? I won't write code that uses one of Unicode's whitespace characters, and anyone whose code would be worth my using would not abuse it either. At worst it'd be one of those features that kids abuse before they smarten up.
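The rule proposed above fits in a few lines of C, assuming UTF-8 source. This is a sketch of the idea, not DMD's lexer. `is_ident_byte` and `scan_identifier` are hypothetical names. Every byte of a multi-byte UTF-8 sequence has the eighth bit set, so the scanner consumes non-ASCII characters whole without decoding them.

```c
#include <stddef.h>

/* Proposed rule: ASCII letters, digits and '_' are identifier bytes,
   and so is ANY byte with the eighth bit set (c >= 0x80). */
static int is_ident_byte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_' || c >= 0x80;
}

/* Returns the length in bytes of the identifier starting at src,
   or 0 if src does not start with one (identifiers may not start
   with a digit). */
size_t scan_identifier(const char *src)
{
    const unsigned char *p = (const unsigned char *)src;
    if (*p >= '0' && *p <= '9')
        return 0;
    size_t n = 0;
    while (p[n] != '\0' && is_ident_byte(p[n]))
        n++;
    return n;
}
```

`scan_identifier("h\xc3\xa4st = 1")` returns 5, since the four characters of "häst" take five bytes. The same code also accepts U+00A0 NO-BREAK SPACE (bytes C2 A0) inside an identifier. That is exactly the kind of obfuscation the post says it is willing to tolerate.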

C99's decision itself looks pretty bad. I'd use \u escapes for codes I don't WANT rendered: those that have no rendering (whitespace), that would screw up rendering (control characters), that have no glyph in my code-writing font, or that have special numeric significance.

Whether any compilers and editors implement this feature is certainly important to Martin's stated requirements. If his clients can't read the code he has written, he hasn't fulfilled his contract. It would be much more successful to use an encoding like UTF-8 or one of the BOM-marked encodings D supports; all programs developed for Finns will surely render that. If C eventually gets a linkage standard for Unicode identifiers, it can be emulated when mangling extern (C) names. There's no cause for following C99 exactly in the source code itself.

May 29, 2003
"Burton Radons" <loth@users.sourceforge.net> wrote in message news:bb3s9f$29qv$1@digitaldaemon.com...
> Walter wrote:
> > "Each universal character name in an identifier shall designate a character
> > whose encoding in ISO/IEC 10646 falls into one of the ranges specified in
> > annex D." C99 6.4.2.1-3
> This could be more easily done by encoding into UTF-8 and assuming any byte with the eighth bit set is an identifier.  It allows weird obfuscations, yes, but why care about that?  I won't write code that uses one of UNICODE's whitespace characters, and anyone whose code would be worth use by me would also not abuse it.  At worst it'd be one of those features that kids get into abusing before they smarten up.
>
> C99's decision itself looks pretty bad.  I'd use \u escapes for codes which I don't WANT rendered because either they have no rendering (whitespaces), because they would screw up rendering (controls), don't have a rendering in my code-writing font, or have special numeric significance.
>
> Whether this feature is implemented by any compilers and editors is
> certainly important to Martin's stated requirements.  If his clients
> can't read the code he's written, he hasn't fulfilled his contract.
> Much more successful would be to use an encoding like UTF-8 or one of
> the BOM'd encodings D supports; all programs developed for Finns will
> surely render that.  If it develops that C gets a link standard for
> UNICODE identifiers, then that can be emulated when mangling extern (C).
>   There's no cause for following C99 exactly in the code itself.

This is C's third attempt at internationalizing C source code. In 15 years I have yet to see any C source outside of a test suite that used trigraphs or digraphs. I'm skeptical the \u scheme will catch on, either. I think the best way is to simply declare that the source text is UTF-8, UTF-16, or UTF-32. D already recognizes and automatically handles all three. Then it is simply a matter of deciding which Unicode characters to allow as identifiers and whitespace.

The advantage is that you can edit the source in any text editor that supports Unicode if you want to use more than ASCII. There is no need for special editors that recognize trigraphs, digraphs, or on-the-fly \u translation.
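Recognizing the three encodings mostly comes down to looking for a byte-order mark. Below is a minimal C sketch. It assumes that a file without a BOM is UTF-8, and the real D front end may treat BOM-less files differently. The enum and function names are hypothetical.

```c
#include <stddef.h>

enum src_encoding { ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE, ENC_UTF32LE, ENC_UTF32BE };

/* Detects the source encoding from a byte-order mark. *bom_len receives
   the BOM size so the lexer can skip it. Without a BOM, this sketch
   simply assumes UTF-8. */
enum src_encoding detect_encoding(const unsigned char *buf, size_t len,
                                  size_t *bom_len)
{
    /* UTF-32 must be tested before UTF-16: FF FE 00 00 also starts with FF FE. */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    *bom_len = 0;
    return ENC_UTF8; /* no BOM: fall back to UTF-8 */
}
```

The ordering of the tests is the only subtle part. A UTF-16LE check placed first would misreport every UTF-32LE file.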


May 29, 2003
"Walter" <walter@digitalmars.com> wrote in message news:bb1c8v$2e2l$1@digitaldaemon.com...
> > DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
> > I find this unfortunate, and in contrast to one of the main goals of D:
> > link compatibility with C.
>
> It's a good idea to change it to match C for the reasons you state.

Another way of resolving this would be to give the programmer control of the external identifier. Something like this:

extern (C) {
    // Proposed syntax: the string is the symbol name emitted to the linker.
    extern("foo\u4444") void foo() { bar(); }
    extern("bar\u4444") void bar();
}

That would also allow us to access mangled C++ identifiers and identifiers containing '$'. It would not be easy, but that is not what I am asking for. I only want it to be possible.

Regards,
Martin M. Pedersen


May 29, 2003
I'll put in a vote for UTF-8 support.  It seems to have the best chance of getting support from Linux IDEs and debuggers.

Bill


May 29, 2003
I agree. Source should be UTF-8.

--Benji


In article <3ED5FFE7.3040100@viasic.com>, Bill Cox says...
>
>I'll put in a vote for UTF-8 support.  It seems to have the best chance of getting support from Linux IDEs and debuggers.
>
>Bill


May 30, 2003
>
>I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
>

I don't think there is a full implementation of C99 yet. It was adopted in late 1999.  Maybe some of this stuff will disappear due to lack of use. Did ISO sack the trigraph crap from C89/C90?


May 30, 2003
"Mark T" <Mark_member@pathlink.com> wrote in message news:bb6710$1v5d$1@digitaldaemon.com...
> I don't think there is a full implementation of C99 yet. It was adopted in late
> 1999.  Maybe some of this stuff will disappear due to lack of use. Did ISO sack
> the trigraph crap from C89/C90?

No, trigraphs are still there.
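For reference, trigraph replacement is a purely textual pass done before tokenization. Each of the nine sequences `??=`, `??(`, `??/`, `??)`, `??'`, `??<`, `??!`, `??>` and `??-` is replaced by `#`, `[`, `\`, `]`, `^`, `{`, `|`, `}` and `~` respectively. A minimal C sketch of that pass, with hypothetical function names:

```c
#include <stddef.h>

/* Maps the character that follows "??" to its replacement,
   or returns 0 if the sequence is not one of the nine C trigraphs. */
static char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copies src to dst, replacing every trigraph. dst must hold at least
   strlen(src) + 1 bytes; the output is never longer than the input. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src) {
        char r;
        if (src[0] == '?' && src[1] == '?' && (r = trigraph_char(src[2])) != 0) {
            *dst++ = r;   /* three input characters become one */
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

For example, a buffer holding `??=define X 1` comes out as `#define X 1`. When writing such input as a C string literal, spell it `"?\?=define X 1"`. Otherwise a compiler running in a strict ISO mode performs the trigraph replacement itself before the function ever sees the text.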


Regards,
Martin M. Pedersen

