D is too english centric
May 27, 2003
Hi,

I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
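For illustration, the rule described above could be sketched roughly as follows. This is a hypothetical Python rendering of the stated behaviour, not DMD's actual lexer; the name `is_ascii_identifier` is made up for this example, and the rule that an identifier may not begin with a digit is an assumption that the post does not state.

```python
import string

# Letters as the post describes DMD's definition: 'A'..'Z' and 'a'..'z' only.
ASCII_LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_ascii_identifier(name: str) -> bool:
    """Return True if `name` consists only of ASCII letters, digits and '_'.

    Assumption (not from the post): the first character may not be a digit.
    """
    if not name:
        return False
    if name[0] in DIGITS:
        return False
    return all(ch in ASCII_LETTERS or ch in DIGITS or ch == "_" for ch in name)


print(is_ascii_identifier("ejendom_id"))  # plain ASCII name
print(is_ascii_identifier("blåbær"))      # contains the Danish letters å and æ
```

Under this rule the first name is accepted and the second is rejected, which is the limitation being described.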

I find this unfortunate, and in conflict with one of the main goals of D: link compatibility with C.

It has previously been argued that only English should be used for identifiers, in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nationwide real estate network. One of the requirements was that *everything* be in Danish. It involved lots of developers nationwide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish - and that introduced inconsistency in how things were named. But that was only a limitation of C back then, which might not be an issue a few years from now. If D has this limitation, it might be a valid reason to deselect D in favor of other languages. After all, English is the native language of only a minority.

Regards,
Martin M. Pedersen


May 28, 2003
"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:bb0sqs$1t1k$1@digitaldaemon.com...
> I have noted that C99 allows *any* unicode character to be used in identifiers using \u.

No, only characters that fall into certain unicode ranges.

> The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].
> I find this unfortunate, and in conflict with one of the main goals of D: link compatibility with C.

It's a good idea to change it to match C for the reasons you state.


May 28, 2003
In article <bb0sqs$1t1k$1@digitaldaemon.com>, Martin M. Pedersen says...

>It has previously been argued that only English should be used for identifiers, in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nationwide real estate network. One of the requirements was that *everything* be in Danish. It involved lots of developers nationwide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish - and that introduced inconsistency in how things were named. But that was only a limitation of C back then, which might not be an issue a few years from now. If D has this limitation, it might be a valid reason to deselect D in favor of other languages. After all, English is the native language of only a minority.

Hello, I believe there was a flamewar on this topic a few months ago, starting from an old April 1st joke article by Bjarne Stroustrup about adding Unicode identifiers to C++.

I believe that most people on this newsgroup are not native English speakers.
And nonetheless, the idea has found very little support, since:
- for almost any language, a transliteration scheme exists which approximates
the language in terms of the Latin alphabet;
- keywords are English anyway, and in D there is no preprocessor to un-English
them. :) Using any language other than English would lead to inconsistency
anyway.
- I know quite a number of languages, but I have tremendous problems switching
between them. It may take minutes every time. And having seen a single English
keyword, I start thinking in English, and you can be sure all my subsequent
comments will be in English. Then, I also can't read both code and comments
simultaneously. So I have to translate the comments into English to get going. I
even refuse to use any code with comments in my native language. I believe there
are plenty of people experiencing the same problem.

So, if you *really* want to mix your native language into a project, why don't
you write a scanner, which would:
- translate keywords from your language into D;
- transliterate all other identifiers into latin letters.

This would basically be an extended version of a lexer, and lexing D is really simple. Besides, there's a good readymade lexer to borrow. :)
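A minimal sketch of such a scanner, under loud assumptions: it is written in Python rather than built on a real D lexer, the Danish transliteration table and the localized keyword table are hypothetical examples, and it makes no attempt to skip string literals or comments, which a real tool would have to do.

```python
import re

# Hypothetical transliteration table for Danish; other languages need their own.
TRANSLITERATION = {"æ": "ae", "ø": "oe", "å": "aa", "Æ": "Ae", "Ø": "Oe", "Å": "Aa"}

# Hypothetical localized keywords mapped to the D keywords they stand for.
KEYWORDS = {"hvis": "if", "ellers": "else", "mens": "while"}

# An identifier-like token: a letter or '_' followed by letters, digits or '_'.
TOKEN = re.compile(r"[^\W\d]\w*")


def to_plain_d(source: str) -> str:
    """Translate localized keywords and transliterate all other identifiers."""

    def rewrite(match: re.Match) -> str:
        word = match.group(0)
        if word in KEYWORDS:
            return KEYWORDS[word]
        # Characters without a table entry (plain ASCII, digits) pass through.
        return "".join(TRANSLITERATION.get(ch, ch) for ch in word)

    return TOKEN.sub(rewrite, source)


print(to_plain_d("hvis (blåbær > 0) tæller++;"))
```

With these tables the sample line becomes `if (blaabaer > 0) taeller++;`. English keywords and ASCII identifiers are left untouched because neither table has an entry for them.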

>I have noted that C99 allows *any* unicode character to be used in identifiers using \u. The D specification limits characters in identifiers to letters, digits, and '_', but does not even define what a letter is. The DMD implementation defines a letter to be ['A'..'Z', 'a'..'z'].

It is defined in the library. :>

>I find this unfortunate, and in conflict with one of the main goals of D: link compatibility with C.

I have not seen a single piece of code using this silly feature. Is there any programmer's editor which has \u unicode support as of yet? And any IDE?

I would also like to see how many compilers implement that - and in what manner. Even if some do, their implementations would probably be incompatible with those of other compilers. So would you say C violates the requirement of link compatibility with itself as well? :>

-i.


May 28, 2003
"Walter" <walter@digitalmars.com> wrote in message news:bb1c8v$2e2l$1@digitaldaemon.com...
> > I have noted that C99 allows *any* unicode character to be used in identifiers using \u.
> No, only characters that fall into certain unicode ranges.

I haven't found that, but you are the expert, so I believe you. It makes sense too.

> > Link compatibility with C.
> It's a good idea to change it to match C for the reasons you state.

I'm glad we are in line here :-)

Regards,
Martin M. Pedersen


May 28, 2003
In article <bb0sqs$1t1k$1@digitaldaemon.com>, Martin M. Pedersen says...
>
>It has previously been argued that only English should be used for identifiers, in order to better support reuse across language boundaries. But that argument isn't always valid. For example, half a decade ago, I was involved in building the IT infrastructure for a nationwide real estate network. One of the requirements was that *everything* be in Danish. It involved lots of developers nationwide, but no one outside Denmark. Of course, identifiers couldn't be fully Danish - and that introduced inconsistency in how things were named.

Back in the bad old days, before MSDOS, we all used CP/M.
There was this Nationalist project in Finland, with the
goal of translating all operating system commands to
Finnish, or Finnish abbreviations. Ostensibly this would
be easier on people.

Turned out nobody wanted to use or learn the Finnish
version. Their explanation: since these commands are
"new words" to you anyway, the least of your troubles is
the spelling. Compared with trying to grasp the meaning
of these new concepts the spelling is a non-issue. And
if you then have to use a non Finnish version, you're
totally lost.

Sure, D code written in Chinese would be more compact,
maybe even more legible (in an absolute sense), with
its one character variable names and method names.
Maybe even parentheses and plus signs could be in
Chinese equivalents. But I don't believe they'd want it.

Most Finnish companies have a policy where all program
code and comments have to be in English. Even in those
companies where the programmers and staff speak hardly
any English at all.


May 28, 2003
"Ilya Minkov" <Ilya_member@pathlink.com> wrote in message news:bb2cup$in4$1@digitaldaemon.com...
> Hello, I believe there was a flamewar on this topic a few months ago, starting
> from an old April 1st joke article by Bjarne Stroustrup about adding Unicode
> identifiers to C++.

I don't want to get into a flamewar, and I don't want to argue against your preference for using English. My point is simply that sometimes it is not a choice one can make. For example, you may be supplied with libraries that use Unicode identifiers and that you are required to use. If it is necessary to wrap such functions in other C code, D cannot be said to be link compatible with C (C99). Likewise, you might also be required to implement an interface using such identifiers.


> I have not seen a single piece of code using this silly feature.

That is not really an argument. The feature exists, and will gain compiler support as time goes by. Silly or not, compilers cannot be said to be C99 compliant if they do not support it. Any serious compiler vendor will go in that direction. And some will use this feature - there must have been a reason for its introduction.


> Is there any programmer's editor which has \u unicode support as of yet? And any IDE?

They don't have to, as I read the document. They only need to support editing unicode. Translation phase 1 is:

 "Physical source file multibyte characters are mapped to the source
character set (introducing new-line characters for end-of-line indicators)
if necessary. Trigraph sequences are replaced by corresponding
single-character internal representation."

I believe this is also how DMD does things (except the trigraph stuff) - maps unicode chars to \u-sequences, that is.


> I would also like to see how many compilers implement that - and in what manner.

I don't know if the ABI is completely standardized, but the translation limits chapter gives me a clue how it is to be done:

"31 significant initial characters in an external identifier (each universal character name specifying a character short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a character short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)"

The numbers 6 and 10 indicate to me that they will be encoded using "\uXXXX" and "\UXXXXXXXX" or something very similar. But that is only a guess.
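The counting rule in the quoted limit can be sketched as below. This is only an illustration in Python, not anything a compiler is known to do: the function names are made up, and treating every character below 0x80 as an ordinary one-character source character is a simplifying assumption.

```python
def as_ucn(ch: str) -> str:
    """Spell a character as a C99 universal character name (\\uXXXX or \\UXXXXXXXX)."""
    code = ord(ch)
    if code <= 0xFFFF:
        return "\\u%04X" % code   # 6 characters in total
    return "\\U%08X" % code       # 10 characters in total


def significant_length(identifier: str) -> int:
    """Count characters the way the quoted translation limit describes.

    Assumption: characters below 0x80 count as 1; everything else counts
    like its universal character name (6 or 10).
    """
    total = 0
    for ch in identifier:
        total += 1 if ord(ch) < 0x80 else len(as_ucn(ch))
    return total


def within_external_limit(identifier: str, limit: int = 31) -> bool:
    """True if the whole identifier fits within the significant characters."""
    return significant_length(identifier) <= limit


print(as_ucn("å"), significant_length("blåbær"))
```

Here `blåbær` costs 16 of the 31 significant characters, because å and æ count 6 each, so identifiers with many non-ASCII letters reach the limit much sooner than their visible length suggests.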


Regards,
Martin M. Pedersen


May 28, 2003
"Georg Wrede" <Georg_member@pathlink.com> wrote in message
> Most Finnish companies have a policy where all program
> code and comments have to be in English. Even in those
> companies where the programmers and staff speak hardly
> any English at all.

So do we. Yet there are exceptions. If the customer pays us to develop and deliver source code, it is his requirements that count, not our policy.

Regards,
Martin M. Pedersen


May 28, 2003
Hi, Ilya.

> I have not seen a single piece of code using this silly feature. Is there any
> programmer's editor which has \u unicode support as of yet? And any IDE?

The latest version of Vim supports UTF-8. However, it requires a kernel patch that isn't in RedHat 7.3. It is supposed to be in 8.0 and later. It also doesn't work in the last version of Cygwin I installed. Anyone know how UTF support is coming along in emacs?

Bill

May 28, 2003
Err.... I read your post a little more carefully...  I don't know of any programming editors directly supporting the \u and \U features of C.

Bill Cox wrote:
> Hi, Ilya.
> 
>> I have not seen a single piece of code using this silly feature. Is there any
>> programmer's editor which has \u unicode support as of yet? And any IDE?
> 
> 
> The latest version of Vim supports UTF-8. However, it requires a kernel patch that isn't in RedHat 7.3. It is supposed to be in 8.0 and later. It also doesn't work in the last version of Cygwin I installed. Anyone know how UTF support is coming along in emacs?
> 
> Bill
> 

May 28, 2003
"Martin M. Pedersen" <mmp@www.moeller-pedersen.dk> wrote in message news:bb2hou$oas$1@digitaldaemon.com...
> "Walter" <walter@digitalmars.com> wrote in message news:bb1c8v$2e2l$1@digitaldaemon.com...
> > > I have noted that C99 allows *any* unicode character to be used in identifiers using \u.
> > No, only characters that fall into certain unicode ranges.
>
> I haven't found that, but you are the expert, so I believe you. It makes
> sense too.

"Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in annex D." C99 6.4.2.1-3
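As a rough illustration of what such a range check amounts to, here is a Python sketch. The three ranges below are an excerpt recalled from the Latin row of annex D and should be treated as an assumption to verify against the standard, not as the full table; the function name is hypothetical.

```python
# ASSUMPTION: illustrative excerpt only, believed to match part of the Latin
# row of C99 annex D. Check the standard before relying on these values.
ALLOWED_RANGES = [
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x00F8, 0x01F5),
]


def allowed_in_identifier(ch: str) -> bool:
    """Return True if the character falls into one of the listed ranges."""
    code = ord(ch)
    return any(low <= code <= high for low, high in ALLOWED_RANGES)


for ch in "æøå×÷":
    print(ch, hex(ord(ch)), allowed_in_identifier(ch))
```

The gaps between the ranges are the point: the Danish letters æ, ø and å fall inside them, while the multiplication sign (00D7) and division sign (00F7) sit in the gaps and are rejected.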

