A solution to the keyword problem - D Programming Language Discussion Forum


Thread overview

A solution to the keyword problem
Jul 25, 2005 Greg Smith
Jul 26, 2005 Charles Hixson
Jul 26, 2005 AJG
Jul 26, 2005 Greg Smith
Jul 27, 2005 Regan Heath
Jul 29, 2005 Hasan Aljudy
Aug 03, 2005 Greg Smith
Aug 03, 2005 J C Calvarese
Aug 03, 2005 Regan Heath
Aug 04, 2005 J C Calvarese
Aug 04, 2005 Greg Smith
Aug 04, 2005 AJG
Aug 04, 2005 Greg Smith
Aug 04, 2005 AJG
Aug 04, 2005 Derek Parnell
Aug 05, 2005 AJG
Aug 05, 2005 Derek Parnell
Aug 05, 2005 J C Calvarese
Aug 05, 2005 AJG
Aug 05, 2005 Derek Parnell
Aug 05, 2005 AJG
Aug 05, 2005 Greg Smith
Aug 05, 2005 AJG
Aug 05, 2005 Greg Smith
Aug 04, 2005 J C Calvarese

July 25, 2005

A solution to the keyword problem

Posted by Greg Smith


What problem? IMHO, there are too many, and more to the point, there are more than 20 of them which simply don't need to be keywords.
A couple of weeks ago I made the suggestion that all the built-in types
(int, double, wchar etc) should be predefined identifiers, like
'Object'; likewise, 'true' and 'false' do not need to be keywords. There were some responses suggesting that there was no need for such a change.

I just believe at a gut level that it's a bad idea to put things in the keyword table when they can be done as predefined identifiers. I don't know of any other well-thought-out language which does this. But that's not a very strong argument. So I've been looking through the spec, and have found some specific things which support a reduction in the number of keywords. I also have a simple suggestion which will allow a workaround for the issues caused by the keywords.

So, weakest first:

(1) Version identifiers are in a separate namespace. If I write some code with version(remote){...}, and later on 'remote' becomes a keyword, then not only will I have to change all the code, I'll need to change the build scripts, too.

(2) The import statement, and module names, are tied to the file system.
I might create modules called 'supercomm.local' and 'supercomm.remote'; if 'local' or 'remote' are added to the language as keywords, I have
to rename the directories to fix it. If this is 3rd-party code supplied as a library, I'm skunked.

(3) D can interface to C code. Unless the C code uses identifiers which are D keywords. At particular risk of being C function or variable names: align, bit, byte, delegate, export, final, interface, version.

Not all of these problems can be really solved by eliminating the roughly 20% of keywords which stand in for predefined types or values, of course; you just reduce the risk of a collision.

So the suggestion is this: provide a lexical convention for quoting identifiers, to prevent them from being recognized as keywords:

import supercomm.'local';  /* ... assuming 'local' became a keyword */

/*
 * access to a C function called 'final', which is a D keyword
 */
extern (C) int 'final'( foodle *p );


The proposed extension to the D lexer is that an 'identifier' of at least two characters can be enclosed in single quotes; this will not change its meaning as an identifier but will prevent it from being recognized as a keyword. VHDL has this feature; it's very important to VHDL since it's often necessary to give certain names to ports in VHDL (even if those names are VHDL keywords), to interface to external things. You can even have names starting with digits in VHDL, but they must be quoted. The VHDL quoting convention is to use slashes:

    variable /7up/: integer;  -- variable name starts with digit
    variable /if/: integer;   -- variable name same as keyword
...
    if /if/ > 0 then
       /7up/ := /7up/ + 3;
    end if;

The problem with slashes is you need to properly handle things like "a=b/c/d;" I suspect, in VHDL, that the set of tokens which can legally precede an identifier is disjoint from the set which can legally precede the '/' operator, so the lexer can just look at the previous token. However, in C/C++/D, ')', for one, can legally precede either of these.

Another possibility is to put a single letter in front of the string quote, which is consistent with r"foo\bar" and x"0D0A", i.e.

    import supercomm.w"local";

    extern (C) int w"final"( foodle *p );

Personally I prefer the single-quote approach, but I don't think it matters too much, as long as there's a way to do it. Another possibility: <final>, but that presumes that expressions like a<b>c are useless enough that you don't mind encumbering them with the need for spaces.
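
A minimal sketch of how the single-quote rule might look in a hand-written lexer. This is not DMD's lexer: the names Kind, Token, keywords and scan are made up for illustration, the keyword table is a tiny sample, and escape sequences such as '\'' are not handled. It only shows that a quoted run of two or more identifier characters can be told apart from a character literal without looking at the previous token.

```d
enum Kind { identifier, keyword, charLiteral, other }

struct Token
{
    Kind kind;
    string text;
}

// Tiny keyword table for illustration; the real D list is much longer.
immutable string[] keywords = ["final", "if", "import", "int", "while"];

bool isKeyword(string s)
{
    foreach (k; keywords)
        if (k == s)
            return true;
    return false;
}

bool isIdentStart(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

bool isIdentChar(char c)
{
    return isIdentStart(c) || (c >= '0' && c <= '9');
}

// True if s has the shape of an identifier.
bool looksLikeIdentifier(string s)
{
    if (s.length == 0 || !isIdentStart(s[0]))
        return false;
    foreach (char c; s[1 .. $])
        if (!isIdentChar(c))
            return false;
    return true;
}

// Scans one token starting at src[pos] and advances pos past it.
// Assumes pos < src.length and that src[pos] is not whitespace.
Token scan(string src, ref size_t pos)
{
    const size_t start = pos;
    if (src[pos] == '\'')
    {
        size_t end = pos + 1;
        while (end < src.length && src[end] != '\'')
            ++end;
        if (end < src.length)
        {
            pos = end + 1;
            string inner = src[start + 1 .. end];
            // Proposed rule: two or more identifier characters in single
            // quotes form an identifier and are never looked up as a keyword.
            if (inner.length >= 2 && looksLikeIdentifier(inner))
                return Token(Kind.identifier, inner);
            return Token(Kind.charLiteral, src[start .. pos]);
        }
    }
    if (isIdentStart(src[pos]))
    {
        while (pos < src.length && isIdentChar(src[pos]))
            ++pos;
        string word = src[start .. pos];
        return Token(isKeyword(word) ? Kind.keyword : Kind.identifier, word);
    }
    ++pos; // anything else: a single-character token
    return Token(Kind.other, src[start .. pos]);
}

void main()
{
    size_t pos = 0;
    assert(scan("final", pos).kind == Kind.keyword);
    pos = 0;
    assert(scan("'final'", pos) == Token(Kind.identifier, "final"));
    pos = 0;
    assert(scan("'x'", pos).kind == Kind.charLiteral);
}
```

The two-character minimum is what keeps 'x' a character literal; an escape like '\n' is also two characters long but starts with a backslash, so it fails the identifier test.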

Scripts which automatically translate C headers to D could detect the reserved D words and automatically escape them; or even escape any C identifier which is in danger of becoming a D keyword, since there's no harm in escaping something that isn't actually a keyword.
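
The escaping step of such a translation script could be as small as the sketch below. The function name escapeIfKeyword is hypothetical, the table holds only the at-risk names listed in point (3) rather than the full keyword list, and the output uses the single-quote syntax proposed in this post, which no D compiler accepts today.

```d
// At-risk names from point (3); a real tool would carry the full keyword table.
immutable string[] dKeywords =
    ["align", "bit", "byte", "delegate", "export", "final", "interface", "version"];

// Wraps name in the proposed single-quote escape if it collides with a D keyword.
string escapeIfKeyword(string name)
{
    foreach (k; dKeywords)
        if (k == name)
            return "'" ~ name ~ "'";
    return name;
}

void main()
{
    assert(escapeIfKeyword("final") == "'final'");
    assert(escapeIfKeyword("foodle") == "foodle");
}
```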

Another example of a separate namespace which is unnecessarily harmed by defining built-in types as keywords: goto labels. If I have some code where I do 'goto pchar', and pchar later becomes a new type and a new keyword, this code will need to be changed. If 'pchar' is added as a new pre-defined identifier, no harm will be done.

A while ago I was trying to build some old C++ code; a class had been defined with member functions called 'and', 'or' and 'xor'. These have been, fairly recently, added to the C++ language as keywords, presumably for folks who don't have '^', '|' and '&'. (There are some more, to cover all the operators with these characters in them.) What happened was that I got baffling error messages and ended up looking at the preprocessor output to see if there was something screwy in there. I eventually found the problem and created a header file like this:

#define xor ident_xor
#define and ident_and
...


But D doesn't have #define. Also, if the code I was working with had been in the form of a 3rd-party object module, it would have been impossible to link to it without getting the compiler to recognize 'xor' etc. as identifiers.
(It turns out there's a gcc compiler option to suppress recognition of these silly keywords, BTW.)

So to summarize the suggestion:

  - keywords can get you into trouble, especially when you are interfacing with other environments. D interfaces with C, and the file system (module names), in this manner.

  - the more keywords you have, the more likely to get into this trouble. Keywords not necessary for parsing should be removed and implemented as predefined identifiers, like 'Object' is already. If you don't want it to be possible to redefine them in certain contexts, then implement that as needed (no need to prohibit 'wchar' from being used as a goto label, for instance, which is currently impossible). If builtin types were identifiers, I could "import mmedia.real.realaudio;", without redefining 'real' in any harmful way.

  - a lexical convention should be added to allow the construction of identifiers whose patterns are otherwise reserved as keywords, as a workaround when you run into a problem.


One more thing: if the lexical convention is adopted, then it makes a big difference whether or not the built-in types become identifiers. Because, for instance 'Object' and unquoted Object would mean the same thing; likewise 'creal' and unquoted creal would mean the same thing if creal were a predefined identifier; if creal were a keyword, 'creal' and creal would of course be completely independent.

-- Greg

July 26, 2005

Re: A solution to the keyword problem

Posted by Charles Hixson
in reply to Greg Smith


Greg Smith wrote:
> What problem? IMHO, there are too many, and more to the point, there are more than 20 of them which simply don't need to be keywords.
> A couple of weeks ago I made the suggestion that all the built-in types
> (int, double, wchar etc) should be predefined identifiers, like
> 'Object'; likewise, 'true' and 'false' do not need to be keywords. There were some responses suggesting that there was no need for such a change.
...
> One more thing: if the lexical convention is adopted, then it makes a big difference whether or not the built-in types become identifiers. Because, for instance 'Object' and unquoted Object would mean the same thing; likewise 'creal' and unquoted creal would mean the same thing if creal were a predefined identifier; if creal were a keyword, 'creal' and creal would of course be completely independent.
> 
> -- Greg
I could see doing this if it were a part of a more wide reaching change which would turn the built-in types into Objects (well, magic objects, that were literally present rather than pointed to...same internal representation, but different language mapping), resulting in such features as, say, 10.mul(7) being the equivalent of 10 * 7, and allowing one to create classes (NON-magic classes) descending from them.  This would also need to somehow harmonize array so that it could be a normal class, probably by allowing things like optAssign and optSliceAssign (do I remember those names correctly...I always need to look them up) to be redefined and used on classes that AREN'T strictly arrays as we now know them.

The trick would be doing all that without paying an unacceptable penalty in terms of either ambiguity or performance.  (Or development time.)

In other words...I don't see the utility of what you are proposing as an isolated change, and including it where it appears to fit won't happen until AFTER version 1.0 is out. (Probably not then, as I don't think that Walter likes the "Everything is an object" school of language design.  I may like it, but I consider it a lot less important than I do many other features.)

July 26, 2005

Re: A solution to the keyword problem

Posted by AJG
in reply to Charles Hixson


In article <dc40t4$2o2k$1@digitaldaemon.com>, Charles Hixson says...
>
>Greg Smith wrote:
>> What problem? IMHO, there are too many, and more to the point, there are
>> more than 20 of them which simply don't need to be keywords.
>> A couple of weeks ago I made the suggestion that all the built-in types
>> (int, double, wchar etc) should be predefined identifiers, like
>> 'Object'; likewise, 'true' and 'false' do not need to be keywords. There
>> were some responses suggesting that there was no need for such a change.
>...
>> One more thing: if the lexical convention is adopted, then it makes a big difference whether or not the built-in types become identifiers. Because, for instance 'Object' and unquoted Object would mean the same thing; likewise 'creal' and unquoted creal would mean the same thing if creal were a predefined identifier; if creal were a keyword, 'creal' and creal would of course be completely independent.
>> 
>> -- Greg
>I could see doing this if it were a part of a more wide reaching change which would turn the built-in types into Objects (well, magic objects, that were literally present rather than pointed to...same internal representation, but different language mapping), resulting in such features as, say, 10.mul(7) being the equivalent of 10 * 7, and allowing one to create classes (NON-magic classes) descending from them.  This would also need to somehow harmonize array so that it could be a normal class, probably by allowing things like optAssign and optSliceAssign (do I remember those names correctly...I always need to look them up) to be redefined and used on classes that AREN'T strictly arrays as we now know them.
>
>The trick would be doing all that without paying an unacceptable penalty in terms of either ambiguity or performance.  (Or development time.)
>
>In other words...I don't see the utility of what you are proposing as an isolated change, and including it where it appears to fit won't happen until AFTER version 1.0 is out. (Probably not then, as I don't think that Walter likes the "Everything is an object" school of language design.  I may like it, but I consider it a lot less important than I do many other features.)

I second this thought. Change alone: not worth it. Change with added benefit of complete type homogeneity: very much worth it.

--AJG.

July 26, 2005

Re: A solution to the keyword problem

Posted by Greg Smith
in reply to Charles Hixson


Charles Hixson wrote:
> Greg Smith wrote:
> 
>> What problem? IMHO, there are too many, and more to the point, there are more than 20 of them which simply don't need to be keywords.
>> A couple of weeks ago I made the suggestion that all the built-in types
>> (int, double, wchar etc) should be predefined identifiers, like
>> 'Object'; likewise, 'true' and 'false' do not need to be keywords. There were some responses suggesting that there was no need for such a change.
> 
> ...
> 
>> One more thing: if the lexical convention is adopted, then it makes a big difference whether or not the built-in types become identifiers. Because, for instance 'Object' and unquoted Object would mean the same thing; likewise 'creal' and unquoted creal would mean the same thing if creal were a predefined identifier; if creal were a keyword, 'creal' and creal would of course be completely independent.
>>
>> -- Greg
> 
> I could see doing this if it were a part of a more wide reaching change which would turn the built-in types into Objects (well, magic objects, that were literally present rather than pointed to...same internal representation, but different language mapping), resulting in such features as, say, 10.mul(7) being the equivalent of 10 * 7, and allowing one to create classes (NON-magic classes) descending from them.  This would also need to somehow harmonize array so that it could be a normal class, probably by allowing things like optAssign and optSliceAssign (do I remember those names correctly...I always need to look them up) to be redefined and used on classes that AREN'T strictly arrays as we now know them.
> 
> The trick would be doing all that without paying an unacceptable penalty in terms of either ambiguity or performance.  (Or development time.)
> 
> In other words...I don't see the utility of what you are proposing as an isolated change, and including it where it appears to fit won't happen until AFTER version 1.0 is out. (Probably not then, as I don't think that Walter likes the "Everything is an object" school of language design.  I may like it, but I consider it a lot less important than I do many other features.)

I can't speak to what you are suggesting. This is a 'type/class unification', and I just don't know enough yet about D class semantics to comment. A similar change was made to Python in versions 2.2 and 2.3, and it was a great enhancement. It also took a lot of planning and implementing in order to work well without breaking too much existing code.

However, I don't see how this major change is connected at all to the minor change I am suggesting. Currently, the built-in type 'int' is indicated by a keyword 'int'; I am suggesting that the keyword (and all the other 'type' keywords) be removed, and pre-defined identifiers be implemented instead. This would have *zero* effect on existing D code.

This change could be done without changing the semantics of the types. It would also be perfectly possible and reasonable to change the semantics of the types as you suggest, while leaving the 20 keywords in place. This would not, however, address the issues I raised in my post.
I would appreciate it if someone could address these specifically rather than just saying, "I don't see the utility". I could be utterly wrong about the problems with the C interface, for instance, and I'd appreciate knowing why.

I can see how my mention of 'Object' could cause confusion, since 'Object' can be used as a base class and 'int' can't, and 'Object' is a pre-defined identifier and 'int' isn't.

But they're both predefined types; I'm *not* suggesting that int, real, etc. be changed into predefined identifiers so that they can be used as base classes; I'm suggesting that it be done because:

  (a) there's absolutely no reason whatsoever, none, anyone has given, or that I can think of, that they should be keywords as opposed to predefined identifiers, except for "because it's already done that way". If there is a reason it was done that way, I'd be interested in hearing it.

  (b) It would not break any code whatsoever. There is no effect on the semantics of any currently legal D code, since the new identifier 'int' has the same semantics as the old keyword 'int'. The only downside is the effort to change the compiler.

  (c) There are at least a few good reasons why types should *not* be keywords; it would make more identifiers available for unrelated namespaces such as goto labels and module (directory path) names.
It would provide a way to add new built-in types in a uniform manner, without breaking old code. Certain error messages are much clearer with this change. Refer to previous posts.

  (d) "because that's the way it's already done", to my mind, should not apply to 'tabula rasa' designs which are in such a preliminary stage, especially for relatively minor changes to the compiler. I'm really guessing at how big a change it is, it's certainly not trivial; but the fact that the language already has a predefined identifier for the 'Object' type suggests to me that this is mostly a matter of deleting a bunch of code in the parser and lexer, and adding some new code, near where 'Object' is defined, to predefine the new identifiers. Since you can define
   'alias int myint;'
and then use the identifier 'myint' in place of keyword 'int' everywhere, it is clear that the parser can already handle identifiers for ints, so it's largely a matter of deleting the grammar rules which recognize the keyword tokens.
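
To make the alias point concrete, here is a small sketch using the declaration form quoted above. The function twice is made up for the example; the claim being illustrated is only that, once the alias is declared, the identifier myint is resolved to the same type as the keyword int.

```d
alias int myint;    // the declaration quoted above

// 'myint' is used where the keyword 'int' would normally appear.
myint twice(myint x)
{
    return 2 * x;
}

void main()
{
    myint n = 21;
    int m = twice(n);                 // no conversion needed: same type
    assert(m == 42);
    static assert(is(myint == int));  // checked at compile time
}
```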

And again, the suggestion for lexically escaping identifiers is almost completely independent of all of this stuff, and is, I feel, a decent workaround for problems caused by keywords clashing with external names.

-- Greg

July 27, 2005

Re: A solution to the keyword problem

Posted by Regan Heath
in reply to Greg Smith


For what it's worth, I like your idea. I have nothing further to add to the argument in either direction, however. I wonder what Walter thinks; it appears to me that he is the only person in a position to have all the information for a decision on this.

Regan


July 29, 2005

Re: A solution to the keyword problem

Posted by Hasan Aljudy
in reply to Greg Smith


I have nothing against that.

Maybe you would want to change the terminology you are using, because in the other thread you made, I got totally the wrong idea.

keyword: for me (and I'm an idiot, btw) means an identifier that has a special meaning. Therefore it makes perfect sense that things like "int" and "this" be keywords.

keyword: when you speak of it, you actually mean how the compiler deals with this "special identifier".

So when you say remove keywords, you aren't suggesting to make D dynamically typed or anything like that. That's what confused me in the previous thread. Maybe it's just because I don't know a lot about compilers.


Although one concern remains: syntax highlighting!
I'm thinking most editors with D syntax highlighting would color 'real' in:

    import something.real.somethingelse;



Greg Smith wrote:
> What problem? IMHO, there are too many, and more to the point, there are more than 20 of them which simply don't need to be keywords.
> A couple of weeks ago I made the suggestion that all the built-in types
> (int, double, wchar etc) should be predefined identifiers, like
> 'Object'; likewise, 'true' and 'false' do not need to be keywords. There were some responses suggesting that there was no need for such a change.
> 
> I just believe at a gut level that's it a bad idea to put things in the keyword table when they can be done as predefined identifiers. I don't know of any other well-thought-out language which does this. But, that's not a very strong argument. So, I've been looking through the spec, and have found some specific things which support a reduction of the number of keywords.  Also, I have a simple suggestion which will allow a workaround for the issues caused by the keywords.
> 
> So, weakest first:
> 
> (1) Version identifiers are in a separate namespace. If I write some code with version(remote){...}, and later on, 'remote' becomes a keyword, then not only will I have to change all the code, I'll need tp change the build scripts, too.
> 
> (2) the import statement, and module names, are tied to the file system.
> I might create modules called 'supercomm.local' and 'supercomm.remote', if 'local' or 'remote' are added to the language as keywords, I have
> to rename the directories to fix it. If this is 3rd-party code supplied as a library, I'm skunked.
> 
> (3) D can interface to C code. Unless the C code uses identifiers which are D keywords. At particular risk of being C function or variable names: align, bit, byte, delegate, export, final, interface, version.
> 
> Not all of these problems can be really solved by eliminating the roughly 20% of keywords which stand in for predefined types or values, of course; you just reduce the risk of a collision.
> 
> So the suggestion is this: provide a lexical convention for quoting identifiers, to prevent them from being recognized as keywords:
> 
> import supercomm.'local';  /* ... assuming 'local' became a keyword */
> 
>   /*
>    * access to C function called 'final', which is a D keyword
>    */
> extern (C) int 'final'( struct foodle *p );
> 
> 
> The proposed extension to the D lexer is that an 'identifier' of at least two characters can be enclosed in single quotes; this will not change its meaning as an identifier but will prevent it from being recognized as a keyword. VHDL has this feature; it's very important to VHDL since it's often necessary to give certain names to ports in VHDL (even if those names are VHDL keywords), to interface to external things. You can even have names starting with digits in VHDL, but they must be quoted. The VHDL  quouting convention is to use slashes:
> 
>     variable /7up/: integer;  -- variable name starts with digit
>     variable /if/: integer;   -- variable name same as keyword
> ...
>     if /if/ > 0 then
>        /7up/ := /7up/ + 3;
>     end if;
> 
> The problem with slashes is you need to properly handle things like "a=b/c/d;" I suspect, in VHDL, that the set of tokens which can legally precede an identifier is disjoint from the set which can legally precede the '/' operator, so the lexer can just look at the previous token. However, in C/C++/D, ')', for one, can legally precede either of these.
> 
> Another possibility is to put a single letter in front of the string quote, this is consistent with r"foo\bar" and x"0D0A": i.e.
> 
> import supercomm.w"local";
> 
>     extern (C) int w"final"( struct foodle *p );
> 
> Personally I prefer the single-quote approach, but I don't think it matters too much, as long as there's a way to do it. Another possibility: <final>, but that presumes that expressions like a<b>c are useless enough that you don't mind encumbering them with the need for spaces.
> 
> Scripts which automatically translate C headers to D could detect the reserved D words and automatically escape them; or even escape any C identifer which is in danger of becoming a D keyword, since there's no harm in escaping something that isn't actually a keyword.
> 
> Another example of a separate namespace which is unnecessarily harmed by defining built-in types as keywords: goto labels. If I have some code where I do 'goto pchar', and pchar later becomes a new type and a new keyword, this code will need to be changed. If 'pchar' is added as a new pre-defined identifier, no harm will be done.
> 
> A while ago I was trying to build some old C++ code; a class had been defined with member functions called 'and', 'or' and 'xor'. These have been, fairly recently, added to the C++ language as keywords, presumably for folks who don't have '^','|' and '&'. (There are some more to cover all the operators with these characters in them). What happened was, I got these baffling error messages, I ended up looking at the preprocessor output to see if there was something screwy in there; I eventually found the problem and created a header file like this
> 
> #define xor ident_xor
> #define and ident_and
> ...
> 
> 
> But, D doesn't have #define. Also, if the code I was working with had been in the form of a 3rd party object module, it would have been impossible to link to  it without getting the compiler to recognize 'xor' etc as identifiers.
> (turns out there's a gcc compiler option to suppress recognition of these silly keywords, BTW).
> 
> So to summarize the suggestion:
> 
>   - keywords can get you into trouble, especially when you are interfacing with other environments. D interfaces with C, and the file system (module names), in this manner.
> 
>   - the more keywords you have, the more likely to get into this trouble. Keywords not necessary for parsing should be removed and implemented as predefined identifiers, like 'Object' is already. If you don't want it to be possible to redefine them in certain contexts, then implement that as needed (no need to prohibit 'wchar' from being used as a goto label, for instance, which is currently impossible). If builtin types were identifiers, I could "import mmedia.real.realaudio;", without redefining 'real' in any harmful way.
> 
>  - a lexical convention should be added to allow the construction of identifiers whose patterns are otherwise reserved as keywords; as a workaround when you run into a problem.
> 
> 
> One more thing: if the lexical convention is adopted, then it makes a big difference whether or not the built-in types become identifiers. Because, for instance 'Object' and unquoted Object would mean the same thing; likewise 'creal' and unquoted creal would mean the same thing if creal were a predefined identifier; if creal were a keyword, 'creal' and creal would of course be completely independent.
> 
> -- Greg

August 03, 2005

Re: A solution to the keyword problem

Posted by Greg Smith
in reply to Hasan Aljudy

Hasan Aljudy wrote:

> I have nothing against that.
> 
> Maybe you would want to change the terminology you are using, because in the other thread you made, I got totally the wrong idea.
> 
> keyword: for me (and I'm an idiot, btw) means an identifier that has a special meaning. >>> therefore it makes perfect sense that things like "int" and "this" be keywords.
> 
It's a question of what exactly that special meaning is, and where it's implemented. By your broad definition, 'Object' should be a keyword in D. But if you look in the list of keywords (http://www.digitalmars.com/d/lex.html#keyword), you won't find it there.

'Keyword' is pretty standard terminology in the compiler business, and it is tied to the almost universal concept of analyzing source code by a two-stage process of (a) lexical analysis (lexing) and (b) grammatical analysis (parsing).  A 'keyword' is recognized in the lexer, prior to parsing. Predefined identifiers like 'Object' are recognized after parsing, and, unlike keywords, their interpretation may depend on name scoping rules. Since name scopes are generally defined by the structure of the program, which is inferred by the parser, it's very cumbersome to apply scoping rules before parsing.

Because of this distinction, you can have a D local variable called 'Object' but not one called 'while'.
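
To make the two stages concrete, here is a minimal sketch in D. It is only an illustration, not how DMD is written: the names `classify` and `resolve`, the three-entry keyword list, and the string "meanings" are all made up for the example. The keyword test needs nothing but the spelling of the word, while the lookup of 'Object' depends on which scopes are open at that point.

```d
import std.stdio;

// Token kinds a lexer might hand to the parser (toy subset, hypothetical).
enum Kind { keyword, identifier }

// Toy keyword table: anything listed here is decided in the lexer,
// before any scope exists. The real D table is much longer.
immutable string[] keywords = ["while", "int", "return"];

// Lexer-stage decision: depends only on the spelling of the word.
Kind classify(string word)
{
    foreach (k; keywords)
        if (k == word)
            return Kind.keyword;
    return Kind.identifier;
}

// Post-parse decision: search the scope chain, innermost scope last.
string resolve(string name, string[string][] scopes)
{
    foreach_reverse (s; scopes)
        if (auto p = name in s)
            return *p;
    return "undefined";
}

void main()
{
    // 'Object' lives in the outermost scope, not in the keyword table.
    string[string] global = ["Object": "predefined class"];
    string[string] local  = ["Object": "local int variable"];

    assert(classify("while") == Kind.keyword);
    assert(classify("Object") == Kind.identifier);

    writeln(resolve("Object", [global]));        // predefined class
    writeln(resolve("Object", [global, local])); // local int variable
}
```

A local declaration can shadow 'Object' because its meaning is looked up in a scope chain; nothing comparable is possible for 'while', because the lexer has already turned it into a keyword token before any scope is known.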

I suspect that if I used different terminology, the result would be greater confusion overall.

- Greg

August 03, 2005

Re: A solution to the keyword problem

Posted by J C Calvarese
in reply to Greg Smith

In article <dcr2se$1hv2$1@digitaldaemon.com>, Greg Smith says...
>
>Hasan Aljudy wrote:
>
>> I have nothing against that.
>> 
>> Maybe you would want to change the terminology you are using, because in the other thread you made, I got totally the wrong idea.
>> 
>> keyword: for me (and I'm an idiot, btw) means an identifier that has a special meaning. >>> therefore it makes perfect sense that things like "int" and "this" be keywords.
>> 
>It's a question of what exactly that special meaning is, and where it's implemented. By your broad definition, 'Object' should be a keyword in D. But if you look in the list of keywords ( http://www.digitalmars.com/d/lex.html#keyword ) , you won't find it there.
>
>'Keyword' is pretty standard terminology in the compiler business, and it is tied to the almost universal concept of analyzing source code by a two-stage process of (a) lexical analysis (lexing) and (b) grammatical analysis (parsing).  A 'keyword' is recognized in the lexer, prior to parsing. Predefined identifiers like 'Object' are recognized after parsing, and, unlike keywords, their interpretation may depend on name scoping rules. Since name scopes are generally defined by the structure of the program, which is inferred by the parser, it's very cumbersome to apply scoping rules before parsing.

I'm not a compiler writer, so examples help me understand what you're suggesting.

>Because of this distinction, you can have a D local variable called 'Object' but not one called 'while'.

I didn't even realize that we could do this:

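import std.stdio;

int main()
{
    int Object;              // a local variable that shadows the predefined class name
    Object = 1;
    writefln("%s", Object);  // prints 1
    return 0;                // 0 signals success; 'true' would convert to exit code 1
}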

It seems like a clever way of coding that I'd probably end up shooting myself in the foot with. So now, you want to be able to do this:

import std.stdio;

int main()
{
    double int;           // hypothetical: 'int' is a keyword today, so this does not compile
    int = 1;
    writefln("%s", int);
    return 0;
}

Seems like a bad idea to me. I don't see a big problem with the compiler prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.

>I suspect that if I used different terminology, the result would be greater confusion overall.

Probably. It's a confusing topic anyway.

>- Greg

That's my 1.99999 cents.

jcc7

August 03, 2005

Re: A solution to the keyword problem

Posted by Regan Heath
in reply to J C Calvarese

On Wed, 3 Aug 2005 20:01:21 +0000 (UTC), J C Calvarese <technocrat7@gmail.com> wrote:
>>> Maybe you would want to change the terminology you are using, because in
>>> the other thread you made, I got totally the wrong idea.
>>>
>>> keyword: for me (and I'm an idiot, btw) means an identifier that has a
>>> special meaning. >>> therefore it makes perfect sense that things like
>>> "int" and "this" be keywords.
>>>
>> It's a question of what exactly that special meaning is, and where it's
>> implemented. By your broad definition, 'Object' should be a keyword in
>> D. But if you look in the list of keywords (
>> http://www.digitalmars.com/d/lex.html#keyword ) , you won't find it there.
>>
>> 'Keyword' is pretty standard terminology in the compiler business, and
>> it is tied to the almost universal concept of analyzing source code by a
>> two-stage process of (a) lexical analysis (lexing) and (b) grammatical
>> analysis (parsing).  A 'keyword' is recognized in the lexer, prior to
>> parsing. Predefined identifiers like 'Object' are recognized after
>> parsing, and, unlike keywords, their interpretation may depend on name
>> scoping rules. Since name scopes are generally defined by the structure
>> of the program, which is inferred by the parser, it's very cumbersome to
>> apply scoping rules before parsing.
>
> I'm not a compiler writer, so examples help me understand what you're suggesting.

I've written a simple C parser. I recommend it to anyone who wants an interesting and (assuming no prior experience) challenging exercise.

>> Because of this distinction, you can have a D local variable called
>> 'Object' but not one called 'while'.
>
> I didn't even realize that we could do this:
>
> import std.stdio;
> int main()
> {
> int Object;
> Object = 1;
> writefln(Object);
> return true;
> }
>
> It seems like a clever way of coding that I'd probably end up shooting myself in the foot with. So now, you want to be able to do this:
>
> import std.stdio;
>
> int main()
> {
> double int;
> int = 1;
> writefln(int);
> return true;
> }
>
> Seems like a bad idea to me. I don't see a big problem with the compiler
> prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.

One of the points made was that you can still prohibit the use above, if you want, as part of the 'parser' (not the lexer), the advantage being that you have more control over where you allow it and where you prohibit it.

Personally I don't have a problem with:

> double int;
> int = 1;
> writefln(int);

because it's clear when you read any of those lines, together or standalone, that 'int' is not a type but a variable. This is because you, like the parser, can take the context of 'int' into consideration when you read the lines.

About the worst bug I can think of would be if you meant to type "int a = 1;" and accidentally missed the "a", getting "int = 1;". But that would cause an error in all situations except where there was a variable called "int" present. So it's really no different to any other typo, except that it can occur in a variable declaration (which can occur just about anywhere, anyway). Can anyone think of a potential bug which becomes 'more' likely because of this change?

If the keywords were removed then yes, code like that shown above with 'int' could become legal, unless Walter decided to prohibit it in the parser. The advantage would be a change to the D grammar: it would get smaller and simpler.

Another point made was that as a result you get more descriptive error messages for less work (you don't have to code special cases in the lexer). Some examples were given in a previous thread.
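
As a rough illustration of that kind of control, here is a small sketch in D. Everything in it is hypothetical (the `Context` enum, `checkDeclaration`, and the short list of type names are invented for the example and are not part of any compiler): a check made after lexing can know which context a name appears in, so it can reject 'int' as a variable name with a specific message while still letting 'wchar' be a goto label or 'real' be part of a module name.

```d
import std.stdio;

// Hypothetical contexts in which a name can be introduced.
enum Context { variable, gotoLabel, moduleName }

// Toy list of built-in type names, treated here as predefined identifiers.
immutable string[] builtinTypes = ["int", "double", "real", "wchar"];

bool isBuiltinType(string name)
{
    foreach (t; builtinTypes)
        if (t == name)
            return true;
    return false;
}

// Returns an error message, or null if the declaration is accepted.
// Only the 'variable' context is restricted in this sketch.
string checkDeclaration(string name, Context ctx)
{
    if (isBuiltinType(name) && ctx == Context.variable)
        return "cannot declare a variable named '" ~ name
             ~ "': it is a built-in type";
    return null;
}

void main()
{
    assert(checkDeclaration("int", Context.variable) !is null);
    assert(checkDeclaration("wchar", Context.gotoLabel) is null);
    assert(checkDeclaration("real", Context.moduleName) is null);

    writeln(checkDeclaration("int", Context.variable));
}
```

Whether a rule like this belongs in the parser or in a later pass is a design choice; the point is only that it has context available, which a keyword table in the lexer does not.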

Another advantage is that converting code from another language that allows "this", or any other D keyword, as an identifier would become easier.

In short, more control, better errors, other advantages and no disadvantages (except some work for Walter to implement the change) that I can see.

Regan

August 04, 2005

Re: A solution to the keyword problem

Posted by J C Calvarese
in reply to Regan Heath

In article <opsuydw2lk23k2f5@nrage.netwin.co.nz>, Regan Heath says...
>
>On Wed, 3 Aug 2005 20:01:21 +0000 (UTC), J C Calvarese <technocrat7@gmail.com> wrote:
..
>> I'm not a compiler writer, so examples help me understand what you're suggesting.
>
>I've written a simple C parser. I recommend it to anyone who wants an interesting and (assuming no prior experience) challenging exercise.

I'm much too lazy for that. ;)

>
>>> Because of this distinction, you can have a D local variable called 'Object' but not one called 'while'.
>>
>> I didn't even realize that we could do this:
>>
>> import std.stdio;
>> int main()
>> {
>> int Object;
>> Object = 1;
>> writefln(Object);
>> return true;
>> }
>>
>> It seems like a clever way of coding that I'd probably end up shooting myself in the foot with. So now, you want to be able to do this:
>>
>> import std.stdio;
>>
>> int main()
>> {
>> double int;
>> int = 1;
>> writefln(int);
>> return true;
>> }
>>
>> Seems like a bad idea to me. I don't see a big problem with the compiler prohibiting this. Yes, it could be a pain if you're porting a library written in another language that uses "int" as an identifier, but I don't expect that happens too often.
>
>One of the points made was that you can still prohibit the use above, if you want, as part of the 'parser' (not the lexer), the advantage being that you have more control over where you allow it and where you prohibit it.

Well, if we're not prohibiting this kind of crazy, I don't understand why we'd undertake the effort. Are we trying to speed up a compiler that's already blazingly fast?

>Personally I don't have a problem with:
>
>> double int;
>> int = 1;
>> writefln(int);
>
>because it's clear when you read any of those lines, together or stand alone, that 'int' is not a type but a variable. This is because, you, like the parser can take the context of 'int' into consideration when you read the lines.

It's clear to me that whoever wrote that example was a masochist (oops, I wrote that). The context shows that something fishy is going on. :)

>About the worst bug I can think of would be if you meant to type "int a = 1;" and accidentally missed the "a", getting "int = 1;". But that would cause an error in all situations except where there was a variable called "int" present. So it's really no different to any other typo, except that

Right. Well, I'm worried about when a variable called "int" is present.

>it can occur in a variable declaration (which can occur just about anywhere, anyway). Can anyone think of a potential bug which becomes 'more' likely because of this change?

Hey, we could make semi-colons optional, too. :)

>If the keywords were removed then yes, code like that shown above with 'int' could become legal, unless Walter decided to prohibit it in the parser. The advantage would be a change to the D grammar: it would get smaller and simpler.
>
>Another point made was that as a result you get more descriptive error messages for less work (you don't have to code special cases in the lexer). Some examples were given in a previous thread.

There are always cost/benefit issues. Are you sure that the benefit of possibly better error messages outweighs the cost of changing the innards of the compiler? I'm sure I don't know. Even if we do get better error messages, though, it seems like a detour taking us farther away from D 1.0.

>The fact that conversion from another language, that allows "this" or any other D keyword, would become easier is another advantage.

So then we could call a method "this". Could we use the "this" of that method's class? :x I think keywords serve a purpose ("This identifier is off-limits"). It hurts my head to consider all of the possible pitfalls.

>In short, more control, better errors, other advantages and no disadvantages (except some work for Walter to implement the change) that I can see.
>
>Regan

Well, you might be right, but I'm unconvinced. If Walter wants to do it, he can do it. If someone else wants to try it out, that's what GDC is for: http://www.prowiki.org/wiki4d/wiki.cgi?GdcHacking

jcc7


Copyright © 1999-2021 by the D Language Foundation