why are types all keywords? (page 2) - D Programming Language Discussion Forum

July 19, 2005

Re: why are types all keywords?

Posted by Derek Parnell
in reply to Greg Smith

On Tue, 19 Jul 2005 15:45:12 -0400, Greg Smith wrote:

[snip]
> I don't think I've encountered any other well-thought-out language (and D clearly is one) which defines a whole bunch of keywords which are not actually necessary to the parsing process.

Greg, you have written a whole lot of sensible things here. You have my support for what it's worth.

-- 
Derek Parnell
Melbourne, Australia
20/07/2005 8:02:37 AM

July 19, 2005

Re: why are types all keywords?

Posted by Hasan Aljudy
in reply to Greg Smith

Ah well, I guess I did help you make your point.

You're talking about compiler implementation, I have nothing to do with that.

My only concern is, well, I don't wanna be reading some code and keep wondering to myself whether this "int" here is the real "int" or some variable name or class name defined by the user.

If it's merely a matter of compiler implementation then I don't care, as it's clearly not my business.

However, if what you are proposing would allow people to say "int" when they don't really mean the "int" that we currently know, then I have a problem with that.


Greg Smith wrote:
> Hasan Aljudy wrote:
> 
>>
>> Greg Smith wrote:
>>
>>> Hasan Aljudy wrote:
>>
>> [snip]
>>
>>>>
>>>> I just don't get it ...
>>>>
>>>> What's the point of making something like "int" not a keyword?
>>>>
>>>> #int int; //wth?
>>>> #class int
>>>> #{
>>>> # static int max = 1337; //wtf is int here? variable? type? class?
>>>> #}
>>>> #float double = int.max; //go figure
>>>> #double bit = cast(typeof( double )) int;
>>>>
>>> What's the point of *making* it a keyword???
>>>
>>> Yes, this change would allow you to redefine int. it's possible in
>>> other languages, and they haven't self-destructed as a result.
>>
>>
>> Sorry, all the languages I've worked with are from the C family (C, C++, Java, D) with the exception of Pascal.
>>
>> How do other languages implement that?
> 
> 
> Very simple. You go into the symbol table at startup -- the same one
> into which the user names go - and you predefine the names there as types. Pascal does this, and you've probably never noticed. See?
> it doesn't hurt at all.
> 
>>
>>>  If this is a problem, you could make it illegal to redefine built-in names in certain scopes. If they are keywords, then
>>> this level of control is not possible.
>>
>>
>>
>> The ability to use "int" or "float" or "this" for one's own purposes is not really an advantage.
>>
> No, that's not the point. You can still make it illegal to redefine these. What's the difference between making it illegal to redefine them and making them keywords?
>   (1) by making them keywords, you complicate the grammar and gain no advantage by doing so; the grammar must still support type names which are identifiers.
>   (2) by making them keywords, you cause them to be treated differently, in the parser and semantic passes, from user-defined types. Functionality needs to be replicated in the compiler, since 'int' is discovered to be a type in the parser, while 'myint' is seen as an identifier in the parser, and is discovered to be a type in the semantic
> processing. This means more complexity than needed, and leads to inconsistent, and less useful, diagnostics.
>   (3) New built-in types can be added in future to the language as predefined identifiers, with much less likelihood of breaking old code
> than if they are added as new keywords.
>   (4) if they are defined as identifiers, you can make it illegal to
> redefine them in specific contexts. With keywords there is no such control.
> 
> To appeal to the KISS principle:
>   - If the built-in types can be implemented in the same way as the user-defined types, why not do so ?? If you want to make it illegal
> to redefine these, fine - but why chisel them into stone in the parser
> when the grammar doesn't need this, and would be simpler without it?
> 
>>>
>>> My point is, there's no reason to make it a keyword, unless you want
>>> it to always be (effectively) a special punctuation mark, in *all*
>>> possible contexts, and you want to extend that to *all* the built-in
>>> types, despite the fact that user-defined types don't have or need this
>>> special treatment, and you don't mind putting in extra grammar rules to
>>> deal with the fact that type names could be these keywords *or* identifiers.
>>
>>
>>
>> I still don't get your point ....
>> It's a keyword because, well, how do you define a variable to be of a certain type? well, you use a "type name" to specify the type of a variable.
>>
>> type_name variable_name;
>>
>> You can define your own types, but your own types will always be defined in terms of other types.
>>
>> typedef newtype oldtype;
>>
>> struct new_type
>> {
>>     some_known_type field1;
>>     some_other_known_type field2;
>>     //.. etc
>> }
>>
>> every new type is defined in terms of other type(s), there must be in the end a type which isn't defined in terms of anything.
>>
>> int is such a type.
>>
>> if it's not a keyword, then it can be turned on and off.
>> well, how do you turn it "on"? and what would be the point of having turned off?
>>
> Clearly, all types have to start from built-in types. This is immaterial
> to whether the built-in types are defined in the grammar as keywords, or
> in the symbol table as predefined names, as in pascal.
> 
> You are saying this: because there is no point in redefining them, they
> should be cast in stone in the parser. I mildly disagree with the premise, and I utterly disagree with the conclusion.
> 
> Regarding the premise, as I have pointed out, what if you want to add a new built-in type -- if you define it as a new keyword, it might conflict with a local variable name in some existing code.
> 
> If you want it to be illegal to redefine certain names, this is fine, but this does not by any means mean they need to be keywords!!
> IMHO this should be done in the symbol table, not by making keywords that are not required by the grammar. This is much simpler in the long run; it leads to better error messages, e.g. "can't redefine 'int' in this name space " vs. "Syntax error"; and allows control by scope, e.g. you might want to allow some names to be used in struct members.
> 
> 
>>
>> [snip]
>>
>>>>> I remember a long time
>>>>> ago, a buddy was baffled that his C code wouldn't compile in
>>>>> C++, it turned out he had a struct member called 'this' or 'catch'
>>>>> or something (this was before the days of syntax coloring).
> 
> ..
> 
>>>> Why didn't his compiler tell him that "this" is a keyword?
> 
>  ..
> 
>>> Why on earth would it do that? it reported a syntax error,
>>> since a keyword appeared in a position where it was not
>>> allowed by the grammar. A lot of tokens other than 'identifier'
>>> are allowed there - so you wouldn't even get something as helpful as
>>> "error at 'try' - expected 'identifier'"
>>>  Try it with your favourite C++ compiler.
>>
>>
>> I'm just saying the problem here is the error message, not the keyword.
> 
> 
> I fully agree. And the best way to get better error messages is to allow the semantic pass to see these errors, rather than making them syntax errors, which is what happens when keywords are defined.
> 
>>
>>>>> In D as currently implemented,
>>>>>
>>>>>     i = int + 2;
>>>>>
>>>>> .. is a syntax error, whereas
>>>>>
>>>>>         alias int myint;
>>>>>          i = myint + 2;
>>>>>
>>>>>  ... is syntactically legal, but disallowed at the semantic level.
>>>>> Is this difference important or desirable?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I don't see your point .. both are errors.
>>>
>>>
>>>
>>>
>>> Here's the point:
>>>  (1) they are both, essentially, the same error, why should they
>>>      produce completely different error messages?
>>
>>
>>
>> because .. they can be treated differently.
>> for
>> #int + 2
>> there is no way around using something other than int.
>> but for
>> #myint + 2
>> you can redefine myint to be a variable, or you can use something other than myint.
>>
> Please step back and think about what I am trying to say with this example. Of course they can be, and are, treated differently; I know
> why the behavior occurs.
> I'm saying there's no advantage to this and there are disadvantages;
> which are eliminated by eliminating the keywords.
> They are treated differently because the *parser* knows 'int' is a type name, and has no rule allowing it to add a type to something; but the parser has a rule saying an identifier can be added to something. What I'm saying is: if int
> was *not* a keyword, we could eliminate the first rule, simplify the
> grammar, get better diagnostics, shorten the keyword table (and thus
> speed up the lexer) ... the compiler code which rejects 'int + 2' would
> then be the same code which rejects "myint+2".
> Is there any advantage to treating them differently?
> 
>>>  (2) the error message you get for the second one,
>>>    "can't do that to a type", is much more useful than the one you get
>>>    for the first, "syntax error".
>>
>>
>>
>> so? ask the compiler writer to produce a more informative error message!
> 
> 
> You say later that you aren't familiar with compilers, and no offence,
> but that's showing here.
> By far the easiest way to improve the error message is to do away with the unnecessary keywords. It's very hard to produce helpful messages for errors which arise because no grammar rule is applicable. A syntax error is basically the parser saying "huh?". At best, it can tell you where it became irrevocably confused, and tell you what kinds of tokens are legal at that point. It is possible to add additional grammar rules, solely for the purpose of matching specific illegal constructs, so that they can be given more meaningful error messages. This gets rather messy. And in this case, the desirable grammar rules already exist -- with 'identifier' in them, so that they don't apply when types happen to
> be built-in types.
> 
> It is far easier in the semantic phase to provide a guess at what you think the programmer was trying to do, and produce a useful error message. Imagine a language which allows array declarations sized by integer constants, or expressions formed of integer constants. It would be possible to make 'int a[-3]' a syntax error in such a language, by contriving the grammar so that no rule matched it. Far better to make it syntactically legal, so the message is "error: negative array dimension for 'a'", rather than "syntax error". The test would be needed anyhow, since the grammar can't make "int a[7-10]" illegal.
> 
> Actually, we could get this improvement in D by modifying the grammar as such:
>    identifier_or_type::
>              IDENTIFIER  { $$ = lookup_ident($1); }
>           |  INT     { $$ = /* .. type obj for 'int' */ }
>           |  BYTE    { $$ = /* .. type obj for 'byte' */ }
>         ...
> 
> ... and eliminating all other rules referencing the type keywords, which, by D charter, are actually redundant. And, using 'identifier_or_type' in place of most IDENTIFIER references (not the ones where IDENTIFIER is assigned a meaning).
> 
> Thus, 'int + 2' would be caught by the same code as 'myint + 2'.
> 
> This change obtains most of the improvement I'm looking
> for while still preventing the names from being redefined. It's then a relatively small step to eliminate this one weird bit of grammar and provide predefined symbols.
> 
>>
>>
>> Ok, how would that help the language user?
>>
>> I never wrote a compiler, and I have no bit of clue about what you are talking about.
>>
> 
> 
>> But, assuming that you are correct, and that it does indeed make writing the compiler easier .. your point still doesn't stand.
>>
>> The compiler has already been written!
>>
>> I think it would be much easier for the compiler author to use what he had already written than to rewrite the compiler to compensate for your suggestion.
> 
>  >
> This is a valid point in general, but there are times, and precious few of them, when there is an opportunity to get things right even if it means changing something which already works as it is. D is, by charter, in such a situation. All such opportunities should be considered in the long-term view, since there will *never* be an easier time to make such
> a change. The cost of the change will be short-lived, the benefit will
> stay on.
> 
>>
>>>
>>> How about this: the language is in its early development. It is still
>>> possible to make changes like this. It will be much, much harder in the
>>> future. I still don't see one reason why there *should* be so many keywords (other than the fact that's already done that way)  and I've pointed out a few reasons why IMHO it's better, and cleaner, not to.
>>
>>
>>
>> Where are those reasons? I didn't see them.
>> The only reasons were:
>> 1- so you can use "int" as a variable name or something else.
>> 2- easier to implement in a compiler.
> 
> 
>>
>> but #1 is not really a practical reason. and I already answered #2
> 
> Regarding 2, the only reason you gave is the pre-existing code. Look
> at the trouble Bill Gates got us all into with that thinking in the
> early 80's. Do you really think the current D compiler will be the only one ever written?
> 
>  Also, you keep missing, or dismissing,
> 
>   3 - more consistent, useful error checking/error messages, by eliminating replication of semantic checking in the parser.
> 
>>
>>> The fact that C does the same thing does not qualify as a reason, since
>>> it's a stated goal of D to eliminate the very reason C needs to do that.
>>> So, having gone to the trouble to eliminate the need for keywords... why
>>> are they still there???
>>
>>
>>
>> Where does the documentation state that D's goal is to eliminate C's need for keywords?
>>
> Not quite that. The stated goal is to eliminate the need, which exists
> in C, for the parser to know which identifiers are previously defined
> as typedefs (or classes in C++), since C cannot be parsed otherwise.
> 
> 
> This makes D a 'context-free grammar', you don't need to feed
> information back to the parser from the symbol table.
> C defines 'int' etc as keywords for the same purpose, they must
> be distinguished (to the parser) from regular identifiers.
> (also, because C has idioms like 'unsigned char' which do not apply to typedefs, and have likewise been eliminated in D). So, making D's grammar context-free has, as a direct result, eliminated the need
> for type names to be keywords.
> 
> -----------------
> 
> http://www.digitalmars.com/d/index.html
> 
> Major Goals of D
>  ...
>     * Make D substantially easier to implement a compiler for than C++.
>  ...
>     * Have a context-free grammar.
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> -----------------
> 
> When I first encountered D, after reading the article on printf in Jan/05 Dr. Dobbs, I read the 'context-free grammar' part in the goals, and my first thought was "Great!" and my second was "... and so built-in types aren't keywords any more..." but that turned out to be untrue, for reasons no-one has been able to supply.
> 
> I don't think I've encountered any other well-thought-out language (and D clearly is one) which defines a whole bunch of keywords which are not actually necessary to the parsing process.
> 
> 
> Thank you for helping me clarify my argument.
> 
> 
> BTW, I feel like I'm telling someone, "It's summer, you don't need to wear a snowsuit any more , you'll be more comfortable without it", and I keep getting back "you haven't really given me a strong enough reason to not wear it; the grocery store is a bit chilly, for instance; and I'm already wearing it and I know it fits..."
> 
> - greg
> 
>
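
To make the quoted 'identifier_or_type' idea concrete, here is a minimal sketch in Python. It is not D compiler code; the symbol table contents and the function names are made up for illustration. The assumption is that 'int' resolves through the same lookup as 'myint', so a single semantic check rejects both 'int + 2' and 'myint + 2' with the same message:

```python
# Hypothetical symbol table: the built-in type is just another entry.
SYMBOLS = {
    "int": ("type", "int"),      # predefined at startup
    "myint": ("type", "int"),    # user alias, as in: alias int myint;
    "i": ("variable", "int"),
}


def check_operand(name):
    """Return a diagnostic string, or None if the operand is usable as a value."""
    if name.isdigit():
        return None  # integer literal
    if name not in SYMBOLS:
        return f"undefined identifier '{name}'"
    kind, _ = SYMBOLS[name]
    if kind == "type":
        return f"'{name}' is a type and cannot be used as a value"
    return None


def check_add(left, right):
    """Collect diagnostics for the expression `left + right`."""
    return [msg for msg in (check_operand(left), check_operand(right)) if msg]


print(check_add("int", "2"))
print(check_add("myint", "2"))
print(check_add("i", "2"))
```

Both failing cases go through check_operand, so there is only one place where the wording of the diagnostic has to be maintained.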

July 20, 2005

Re: why are types all keywords?

Posted by Charles Hixson
in reply to Greg Smith

Greg Smith wrote:
>...

D doesn't have all the syntax that some languages (I'm thinking of Ada here) have which would allow you to specify how many bits a particular type should have, what value range it should allow, etc.  As a result all of the basic space allocating words need to be keywords.

A type basically means:
1) reserve this space.
2) define these operations over this space

Things get a bit more complex when we start thinking about where the space is allocated, how it interacts with other types, and how we pass it as a parameter, but those are the basics.

D has a simple (relatively simple) syntax.  As a result, it needs a large number of keywords.

July 20, 2005

Re: why are types all keywords?

Posted by Greg Smith
in reply to Charles Hixson

Charles Hixson wrote:
> Greg Smith wrote:
> 
> D doesn't have all the syntax that some languages (I'm thinking of Ada here) have which would allow you to specify how many bits a particular type should have, what value range it should allow, etc.  As a result all of the basic space allocating words need to be keywords.
> 
> A type basically means:
> 1) reserve this space.
> 2) define these operations over this space
> 
> Things get a bit more complex when we start thinking about where the space is allocated, how it interacts with other types, and how we pass it as a parameter, but those are the basics.

These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
> 
> D has a simple (relatively simple) syntax.  As a result, it needs a large number of keywords.
This makes no sense at all; you must be using an unusual definition of 'keyword'. Languages with simpler syntax generally need
fewer keywords (example: python; extreme example: lisp). Furthermore,
the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged). Ada has a lot of keywords too; but the type names are not among them, since (as in Pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*], while specifically shaking off the necessity; this is what puzzles me.

A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed.

Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other
identifiers are assigned meaning after the parsing process. When meaning
is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also
defined as an alias type inside a function, and at the same time be the
name of members of several structs).
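
A rough sketch of that split, in Python rather than D and with a deliberately tiny, made-up keyword table: the scanner tags keywords by spelling alone, before any parsing, while an identifier such as 'mtype' only acquires a meaning later, by a scope-aware lookup.

```python
import re

# Hypothetical, abbreviated keyword table; a real lexer's table is much longer.
KEYWORDS = {"if", "goto", "struct", "alias"}


def tokenize(source):
    """Yield (kind, text) pairs; keywords are tagged before the parser runs."""
    for match in re.finditer(r"[A-Za-z_]\w*|\d+|\S", source):
        text = match.group()
        if text in KEYWORDS:
            yield ("KEYWORD", text)
        elif text[0].isalpha() or text[0] == "_":
            yield ("IDENT", text)
        elif text.isdigit():
            yield ("NUMBER", text)
        else:
            yield ("PUNCT", text)


def resolve(name, scopes):
    """Look a name up from the innermost scope outward (done after parsing)."""
    for scope in reversed(scopes):
        if name in scope:
            return scope[name]
    return None


tokens = list(tokenize("if (mtype) goto done;"))
global_scope = {"mtype": "function"}
function_scope = {"mtype": "alias type"}

print(tokens[0], tokens[2])
print(resolve("mtype", [global_scope]))
print(resolve("mtype", [global_scope, function_scope]))
```

The lexer never consults a scope, which is why a keyword is a keyword everywhere; resolve() can give the same spelling different meanings depending on where it appears.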

[None of this is immutable law of language design, it's just the way
modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: Once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky].

When a word is a keyword, it's a keyword everywhere. So why define
keywords at all? The conventional language-design practice is that you define keywords as needed to make the language parseable. The 'if' keyword tells the parser to expect the structure of an 'if' statement.
An example of the opposite approach is FORTRAN, which was designed well before formal grammars had found their way into computer programming. In
FORTRAN there are no keywords, and spaces have no significance. As a result, FORTRAN is quite difficult to parse, even though the process needs to be done only on one line at a time. Consider:

100 FORMAT(I2,I3)
100 FORMAT(I2,I3)=0
    DO 100 I=1,20
    DO 100 I=1.20

The first is a 'format' statement for output formatting, and the second is an assignment to an element of a 2d array called 'FORMAT'.  The third
is a do loop, and the fourth is an assignment to 'DO100I'. In order to distinguish these, a FORTRAN compiler basically has to dither back and forth over the entire line, trying to figure out what the heck the thing is. The analogous behaviour in a language like D, which is not split into lines, would be to dither over the entire source file, making guesses about what things are and checking if those guesses still work when inner levels are analyzed. Ugh.
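
Here is a simplified Python sketch of that "look at the whole line first" step. It is an illustration only, handles just the four statements above, and ignores real complications such as Hollerith constants; the function name and the returned strings are invented. The statement label is assumed to have been stripped already. Note that with blanks removed the fourth line reads DO100I=1.20.

```python
def classify(statement):
    """Classify a fixed-form FORTRAN statement body (label already removed)."""
    text = statement.replace(" ", "").upper()  # blanks are not significant

    # Find the first '=' and the first ',' that sit outside all parentheses.
    depth = 0
    equals_at = comma_at = None
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif depth == 0 and char == "=" and equals_at is None:
            equals_at = index
        elif depth == 0 and char == "," and comma_at is None:
            comma_at = index

    if equals_at is None:
        return "FORMAT statement" if text.startswith("FORMAT(") else "other"
    if text.startswith("DO") and comma_at is not None and comma_at > equals_at:
        return "DO loop"
    return "assignment to " + text[:equals_at]


for line in ["FORMAT(I2,I3)", "FORMAT(I2,I3)=0", "DO 100 I=1,20", "DO 100 I=1.20"]:
    print(f"{line:18} -> {classify(line)}")
```

Nothing can be decided from the leading characters: the loop has to reach the end of the line before it knows whether there was a top-level '=' or ','.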

By having strategically positioned keywords, you can parse powerful grammars, with complex nested structure, in a more-or-less left-to-right fashion. Whenever you see 'if' sitting there, what follows is either an if statement or invalid input; you don't need to go find the other end of it to see if it might be something else. This design is very clear in Pascal, where every definition of anything starts with a keyword indicating exactly what you are defining: procedure or function, or variable, constant, or type; and that in turn tells the parser what to expect next [Ada too, I think]. In D, the parser needs to work a little harder to figure things out, but you have less clutter.
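
As a toy illustration of the Pascal-style case (Python, with an invented three-keyword declaration syntax that is neither Pascal's nor D's real grammar), the first token alone selects the parsing routine, and nothing after it is ever re-examined:

```python
def expect(tokens, position, wanted):
    """Raise SyntaxError unless tokens[position] is exactly `wanted`."""
    if position >= len(tokens) or tokens[position] != wanted:
        found = tokens[position] if position < len(tokens) else "end of input"
        raise SyntaxError(f"expected {wanted!r}, found {found!r}")


def parse_simple(kind, separator, rest):
    """Parse NAME <separator> VALUE ';' and return a small tuple."""
    expect(rest, 1, separator)
    expect(rest, 3, ";")
    return (kind, rest[0], rest[2])


# Hypothetical toy syntax: var x : integer ;  const n = 10 ;  type t = integer ;
HANDLERS = {
    "var": lambda rest: parse_simple("var", ":", rest),
    "const": lambda rest: parse_simple("const", "=", rest),
    "type": lambda rest: parse_simple("type", "=", rest),
}


def parse_declaration(tokens):
    """Pick the parsing routine from the leading keyword, with no backtracking."""
    if not tokens:
        raise SyntaxError("empty declaration")
    keyword = tokens[0]
    if keyword not in HANDLERS:
        raise SyntaxError(f"expected one of {sorted(HANDLERS)}, found {keyword!r}")
    return HANDLERS[keyword](tokens[1:])


print(parse_declaration("var x : integer ;".split()))
print(parse_declaration("const n = 10 ;".split()))
```

Here the keywords earn their place because the parser dispatches on them. A type name such as 'integer' plays no such role: it sits in a position where any identifier would parse equally well.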

So the question is, why define a bunch of keywords which are not only unnecessary to the parsing process, but actually complicate the grammar? Anywhere in D where I can use 'int', I can also use 'myint', which is an identifier that I have aliased to 'int'. So the parser needs to understand every possible such construct where the type name is an identifier, and it needs additional rules to understand them when they are keywords. As I've mentioned previously, this doesn't just lead to a more complex parser, it also leads to inferior diagnostic messages.

[*] I've been reading the manual a bit more, and I've found that D already has a built-in type implemented as a predefined identifier: Object. So why are all the other ones keywords?
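
For what it's worth, the predefined-identifier alternative can be sketched in a few lines of Python. This is an assumption-laden illustration, not how any D compiler is written: the class, the list of names and the redefinition policy (built-in names may be reused only as struct members) are all invented for the example.

```python
class SemanticError(Exception):
    pass


# Hypothetical subset of built-in type names, predefined instead of lexed as keywords.
BUILTIN_TYPES = ("int", "byte", "float", "double")


class Scope:
    def __init__(self, kind, parent=None):
        self.kind = kind        # e.g. "builtin", "module", "function", "struct"
        self.parent = parent
        self.symbols = {}

    def declare(self, name, meaning):
        # Policy chosen for this sketch: built-in names may be reused only
        # as struct member names; any other redefinition is rejected.
        if name in BUILTIN_TYPES and self.kind != "struct":
            raise SemanticError(f"can't redefine '{name}' in {self.kind} scope")
        self.symbols[name] = meaning

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise SemanticError(f"undefined identifier '{name}'")


def make_root_scope():
    """Seed the outermost scope with the built-in types at startup."""
    root = Scope("builtin")
    for name in BUILTIN_TYPES:
        root.symbols[name] = ("type", name)
    return root


module = Scope("module", make_root_scope())
module.declare("myint", ("alias", "int"))
print(module.lookup("int"), module.lookup("myint"))

try:
    module.declare("int", ("variable", "int"))
except SemanticError as error:
    print("error:", error)
```

The rejection is an ordinary semantic error that names the offending symbol and scope, and relaxing or tightening the policy is a one-line change in declare() rather than a grammar change.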

July 20, 2005

Re: why are types all keywords?

Posted by Ben Hinkle
in reply to Greg Smith

> Ada has a lot of keywords too; but the type names are not among them, since (as in Pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*], while specifically shaking off the necessity; this is what puzzles me.

Java and C# retain the basic types as keywords, though I don't know if they need to or not. It could be that they remain keywords to be more compatible with C/C++ tools - though that is just a guess. For example I'm not sure if the emacs mode and syntax highlighter would color 'int' correctly if it wasn't on the keyword list.

July 20, 2005

Re: why are types all keywords?

Posted by Charles Hixson
in reply to Greg Smith

Greg Smith wrote:
> Charles Hixson wrote:
>> Greg Smith wrote:
>>
>> D doesn't have all the syntax that some languages (I'm thinking of Ada here) have which would allow you to specify how many bits a particular type should have, what value range it should allow, etc.  As a result all of the basic space allocating words need to be keywords.
>>
>> A type basically means:
>> 1) reserve this space.
>> 2) define these operations over this space
>>
>> Things get a bit more complex when we start thinking about where the space is allocated, how it interacts with other types, and how we pass it as a parameter, but those are the basics.
> 
> These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
>>
>> D has a simple (relatively simple) syntax.  As a result, it needs a large number of keywords.
> This makes no sense at all; you must be using an unusual definition of 'keyword'. Languages with simpler syntax generally need
> fewer keywords (example: python; extreme example: lisp). Furthermore,
> the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged). Ada has a lot of keywords too; but the type names are not among them, since (as in Pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*], while specifically shaking off the necessity; this is what puzzles me.
> 
> A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed.
> 
> Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other
> identifiers are assigned meaning after the parsing process. When meaning
> is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also
> defined as an alias type inside a function, and at the same time be the
> name of members of several structs).
> 
> [None of this is immutable law of language design, it's just the way
> modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: Once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky].
> 
> When a word is a keyword, it's a keyword everywhere. So why define
> keywords at all? The conventional language-design practice is that you define keywords as needed to make the language parseable. The 'if' keyword tells the parser to expect the structure of an 'if' statement.
> An example of the opposite approach is FORTRAN, which was designed well before formal grammars had found their way into computer programming. In
> FORTRAN there are no keywords, and spaces have no significance. As a result, FORTRAN is quite difficult to parse, even though the process needs to be done only on one line at a time. Consider:
> 
> 100 FORMAT(I2,I3)
> 100 FORMAT(I2,I3)=0
>     DO 100 I=1,20
>     DO 100 I=1.20
> 
> The first is a 'format' statement for output formatting, and the second is an assignment to an element of a 2d array called 'FORMAT'.  The third
> is a do loop, and the fourth is an assignment to 'DO100I'. In order to distinguish these, a FORTRAN compiler basically has to dither back and forth over the entire line, trying to figure out what the heck the thing is. The analogous behaviour in a language like D, which is not split into lines, would be to dither over the entire source file, making guesses about what things are and checking if those guesses still work when inner levels are analyzed. Ugh.
> 
> By having strategically positioned keywords, you can parse powerful grammars, with complex nested structure, in a more-or-less left-to-right fashion. Whenever you see 'if' sitting there, what follows is either an if statement or invalid input; you don't need to go find the other end of it to see if it might be something else. This design is very clear in Pascal, where every definition of anything starts with a keyword indicating exactly what you are defining: procedure or function, or variable, constant, or type; and that in turn tells the parser what to expect next [Ada too, I think]. In D, the parser needs to work a little harder to figure things out, but you have less clutter.
> 
> So the question is, why define a bunch of keywords which are not only unnecessary to the parsing process, but actually complicate the grammar? Anywhere in D where I can use 'int', I can also use 'myint', which is an identifier that I have aliased to 'int'. So the parser needs to understand every possible such construct where the type name is an identifier, and it needs additional rules to understand them when they are keywords. As I've mentioned previously, this doesn't just lead to a more complex parser, it also leads to inferior diagnostic messages.
> 
> [*] I've been reading the manual a bit more, and I've found that D already has a built-in type implemented as a predefined identifier: Object. So why are all the other ones keywords?

Perhaps I am using an unusual definition.  E.g., I consider all of the words built into Forth to be keywords.  Note that you can, at your own risk, override any of them.  Forth has almost no syntax, it's all subsumed into the definitions of the words.  I consider a keyword to be anything that the compiler (or interpreter) knows the meaning of.  Examples from D include not only things like int and uint, but also import, struct, etc.  With more syntax you need fewer keywords.  Perhaps Snobol is an example here.  (I don't really remember it clearly, but my impression was that it has LOTS of syntax, and few keywords.)

Note that this "Syntax" isn't a unified thing.  Ada has lots of syntax around storage allocation, but relatively few keywords, even though it allows you to specify such things as "This type denotes things that take up 37 bits and are floating point numbers with 3 digits of precision."  Just imagine the amount of work it would take to create such a type in D.  (Well, also imagine just how often it would be needed.)  D has chosen to PREDEFINE several "types" as keywords.  The other types are created by combining the primitive types.  One could argue, perhaps, that complex is a redundant type...but it can be very convenient.

For that matter, I occasionally wish that D had a bit more syntax around building types.  I'd like to be able to define a string class that has string literals.  (Others have uttered similar wishes, with perhaps a different idea of precisely what a string class would look like.)

What did you mean by keyword?

July 21, 2005

Re: why are types all keywords?

Posted by Greg Smith
in reply to Charles Hixson

Charles Hixson wrote:

> Greg Smith wrote:
> 
>> Charles Hixson wrote:
>>
>>> Greg Smith wrote:
>>>
>>> D doesn't have all the syntax that some languages (I'm thinking of Ada here) have which would allow you to specify how many bits a particular type should have, what value range it should allow, etc.  As a result all of the basic space allocating words need to be keywords.
>>>
>>> A type basically means:
>>> 1) reserve this space.
>>> 2) define these operations over this space
>>>
>>> Things get a bit more complex when we start thinking about where the space is allocated, how it interacts with other types, and how we pass it as a parameter, but those are the basics.
>>
>>
>> These are semantic issues which have absolutely nothing to do with whether the type names are keywords.
>>
>>>
>>> D has a simple (relatively simple) syntax.  As a result, it needs a large number of keywords.
>>
>> This makes no sense at all; you must be using an unusual definition of 'keyword'. Languages with simpler syntax generally need
>> fewer keywords (example: python; extreme example: lisp). Furthermore,
>> the grammar of D would be simpler still if the type names became built-in identifiers (I mean the grammar in the compiler; the one known to the user would be effectively unchanged). Ada has a lot of keywords too; but the type names are not among them, since (as in Pascal and D) they don't need to be. C is actually an anomaly in this sense, and C++ inherited the anomaly. D has inherited the practice[*], while specifically shaking off the necessity; this is what puzzles me.
>>
>> A keyword is a specific combination of letters (e.g. 'if', 'goto') which is recognized by the lexical scanner as having a distinct significance no matter where it appears in the token sequence, despite the fact that it follows the general rule defining how an 'identifier' is formed.
>>
>> Keywords are assigned significance before their situation relative to the other tokens is analyzed (i.e. prior to the parser), whereas other
>> identifiers are assigned meaning after the parsing process. When meaning
>> is assigned later, it is possible to apply sophisticated rules to the process (e.g. 'mtype' might be a function name at global scope, but also
>> defined as an alias type inside a function, and at the same time be the
>> name of members of several structs).
>>
>> [None of this is immutable law of language design, it's just the way
>> modern languages are designed and parsed, and the terminology which is used. C and C++, in fact, require bending of these rules, which is generally viewed as a problem: Once a typedef is defined, references to it must be identified as such *prior* to the parser; since this requires scopes to be considered, and scopes are defined by the parsing process, this can be tricky].
>>
>> When a word is a keyword, it's a keyword everywhere. So why define
>> keywords at all? The conventional language-design practice is that you define keywords as needed to make the language parseable. The 'if' keyword tells the parser to expect the structure of an 'if' statement.
>> An example of the opposite approach is FORTRAN, which was designed well before formal grammars had found their way into computer programming. In
>> FORTRAN there are no keywords, and spaces have no significance. As a result, FORTRAN is quite difficult to parse, even though the process needs to be done only on one line at a time. Consider:
>>
>> 100 FORMAT(I2,I3)
>> 100 FORMAT(I2,I3)=0
>>     DO 100 I=1,20
>>     DO 100 I=1.20
>>
>> The first is a 'format' statement for output formatting, and the second is an assignment to an element of a 2d array called 'FORMAT'.  The third
>> is a do loop, and the fourth is an assignment to 'DO100I' (spaces being insignificant, 'DO 100 I=1.20' reads as 'DO100I=1.20'). In order to distinguish these, a fortran compiler basically has to dither back and forth over the entire line, trying to figure out what the heck the thing is. The analogous behaviour in a language like D, which is not split into lines, would be to dither over the entire source file, making guesses about what things are and checking if those guesses still work when inner levels are analyzed. Ugh.
>>
>> By having strategically positioned keywords, you can parse powerful grammars, with complex nested structure, in a more-or-less left-to-right fashion. Whenever you see 'if' sitting there, what follows is either an if statement or invalid input; you don't need to go find the other end of it to see if it might be something else. This design is very clear in pascal, where every definition of anything starts with a keyword indicating exactly what you are defining: procedure, function, variable, constant, or type; and that in turn tells the parser what to expect next [Ada too, I think]. In D, the parser needs to work a little harder to figure things out, but you have less clutter.
>>
>> So the question is, why define a bunch of keywords which are not only unnecessary to the parsing process, but actually complicate the grammar? Anywhere in D where I can use 'int', I can also use 'myint', which is an identifier that I have aliased to 'int'. So the parser needs to understand every possible such construct where the type name is an identifier, and it needs additional rules to understand them when they are keywords. As I've mentioned previously, this doesn't just lead to a more complex parser, it also leads to inferior diagnostic messages.
>>
>> [*] I've been reading the manual a bit more, and I've found that D already has a built-in type implemented as a predefined identifier: Object. So why are all the other ones keywords?
> 
> 
> Perhaps I am using an unusual definition.  E.g., I consider all of the words built into Forth to be keywords.  Note that you can, at your own risk, override any of them.  Forth has almost no syntax; it's all subsumed into the definitions of the words.  I consider a keyword to be anything whose meaning the compiler (or interpreter) already knows.  Examples from D include not only things like int and uint, but also import, struct, etc.  With more syntax you need fewer keywords.  Perhaps Snobol is an example here.  (I don't really remember it clearly, but my impression was that it has LOTS of syntax, and few keywords.)
> 
> Note that this "Syntax" isn't a unified thing.  Ada has lots of syntax around storage allocation, but relatively few keywords, even though it allows you to specify such things as "This type denotes things that take up 37 bits and are floating point numbers with 3 digits of precision."  Just imagine the amount of work it would take to create such a type in D.  (Well, also imagine just how often it would be needed.)  D has chosen to PREDEFINE several "types" as keywords.  The other types are created by combining the primitive types.  One could argue, perhaps, that complex is a redundant type...but it can be very convenient.
> 
I think you are making this much more complicated than it is. I'm proposing changing 'wchar' etc. from keywords to predefined identifiers. They would still be predefined, so this doesn't affect anything you've discussed in the previous paragraph. All existing D code would be unaffected.

> For that matter, I occasionally wish that D had a bit more syntax around building types.  I'd like to be able to define a string class that has string literals.  (Others have uttered similar wishes, with perhaps a different idea of precisely what a string class would look like.)
> 
> What did you mean by keyword?
> 
I think I made that pretty clear in my last post, quoted above, but I'll try again.
Conventional terminology is that a keyword is a sequence of letters taken away from the allowed set of identifiers, or names, and effectively used as a nicely readable punctuation mark.  You cannot redefine a keyword in any context, since it's recognized as a keyword before its context is considered.
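
To make that concrete, here is a minimal sketch in C of the usual arrangement. It is an illustration only, not code from DMD or any real compiler; the names TokenKind, KEYWORDS and classify_word are made up for this post. The point is that the decision uses nothing but the spelling of the word, with no scope or symbol table in sight:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not from any real compiler. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER } TokenKind;

/* A tiny illustrative keyword table, deliberately incomplete. */
static const char *const KEYWORDS[] = { "if", "while", "goto", "int" };

/* Classify a word by spelling alone: no scope, no symbol table.
   This is why a keyword is a keyword everywhere it appears. */
TokenKind classify_word(const char *word)
{
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(word, KEYWORDS[i]) == 0)
            return TOK_KEYWORD;
    }
    return TOK_IDENTIFIER;
}
```

In a scheme like this, classify_word("int") is a keyword no matter where it occurs, while "myint" falls through to the identifier case and gets its meaning later. Dropping "int" from the table and predeclaring it as a type name in the outermost scope would be roughly the change I'm proposing.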

By contrast, you can have identifiers which are reserved in specific contexts without being keywords. For instance, in C++, it would be possible to remove 'this' from the keyword list, so that it could be used in other contexts, such as a parameter in a non-member function,
or a struct member name. In member functions, 'this' would be an implicitly declared parameter.
I'm not suggesting this is a good idea; my point is that the language would be no harder to parse, since 'this' is syntactically allowed only
in places where you can use an identifier; and changing 'this' to a local variable name doesn't change the meaning of any construct to the point where a different parse would be desirable. Such a change would not break any existing code, but it would allow code which is currently illegal C++ (including some legal C code).
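
A small example of the kind of code I mean. The following is legal C, where 'this' is an ordinary identifier, but a C++ compiler rejects it because 'this' is a keyword there. The names node and node_value are just made up for illustration:

```c
/* Legal C; rejected by a C++ compiler because 'this' is a C++ keyword. */
struct node {
    int value;
    struct node *this;   /* 'this' used as a struct member name */
};

/* 'this' used as an ordinary parameter name in a non-member function. */
int node_value(const struct node *this)
{
    return this->value;
}
```

Nothing about these uses would confuse a parser: in every position shown, an identifier is exactly what the grammar expects.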

By this definition, forth (like postscript) has no keywords at all, and (also like postscript) virtually no grammar, thus no need for keywords. [So it's quite possible that the term 'keyword' could take on a different meaning in various discussions of forth...]

Side note: anybody remember 'small c'? This was a sort-of-C compiler for the 8080 which, by negligence rather than intent, let you freely redefine most keywords, since it didn't really have a lexer separate from the parser. The parser had things like this:

    /* expect a statement */
    if (next_token("{")) {                  /* compound statement */
        ...
    } else if (next_word_is("while")) {
        /* it's a while statement */
        expect("(");
        ...

So, you could define 'int while' as a variable, and reference it, etc., but any statement starting with 'while' (e.g. while=0;) would be rejected, because the code above would take it for a while statement.  I discovered this after using 'switch' as a global variable. A very 'interesting' compiler in many ways.
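
For anyone who wants to see the effect in isolation, here is a rough reconstruction in C. To be clear, this is not the Small-C source; it is my own sketch, and StmtKind, statement_kind and this two-argument next_word_is are hypothetical. It only shows what happens when the word is checked at the start of a statement and nowhere else:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical statement kinds for this sketch. */
typedef enum { STMT_WHILE, STMT_EXPRESSION } StmtKind;

/* Nonzero if src begins with `word` and the word ends there
   (the next character is not a letter, digit or underscore). */
static int next_word_is(const char *src, const char *word)
{
    size_t n = strlen(word);
    unsigned char after;

    if (strncmp(src, word, n) != 0)
        return 0;
    after = (unsigned char)src[n];
    return !(isalnum(after) || after == '_');
}

/* Decide what kind of statement starts at src. The word 'while' is
   special only in this one position, not in the rest of the input. */
StmtKind statement_kind(const char *src)
{
    while (isspace((unsigned char)*src))
        src++;
    if (next_word_is(src, "while"))
        return STMT_WHILE;
    return STMT_EXPRESSION;
}
```

With this, statement_kind("while=0;") comes back as a while statement (and a real parser would then fail on the missing parenthesis), whereas statement_kind("x = while + 1;") is an ordinary expression statement in which 'while' is free to be a variable.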

Copyright © 1999-2021 by the D Language Foundation