So, You Want To Write Your Own Programming Language?
Walter Bright
January 22, 2014
http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/
January 22, 2014
On 2014-01-22 05:29, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/

From the article: "Regex is just the wrong tool for lexing and parsing."

I wonder why so many books about implementing compilers spend, usually, quite a large chapter on regular expressions?

-- 
/Jacob Carlborg
January 22, 2014
Walter Bright:

> http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/

Thank you for the simple nice article.


>The poisoning approach. [...] This is the approach we've been using in the D compiler, and are very pleased with the results.<

Yet even in D, most of the error messages after the first few are often not very useful to me. So perhaps I'd like a compiler switch that shows only the first few error messages and then stops the compiler.
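
For readers who haven't seen the poisoning idea, here is a minimal sketch in Python. It is not DMD's actual code: the names `POISON` and `check` and the tuple-based expression format are made up for illustration. An expression that fails gets a special poison type, and anything that touches a poisoned operand stays silent, so only the root cause is reported.

```python
# Minimal sketch of error "poisoning" in a toy type checker.
# All names here (POISON, check, the tuple AST) are hypothetical.

POISON = "<error>"  # special type assigned to any expression that already failed


def check(expr, env, errors):
    """Return the type of expr, appending at most one message per root cause."""
    kind = expr[0]
    if kind == "num":                      # ("num", 3)
        return "int"
    if kind == "str":                      # ("str", "hi")
        return "string"
    if kind == "var":                      # ("var", "x")
        name = expr[1]
        if name not in env:
            errors.append(f"undefined variable '{name}'")
            return POISON                  # poison the node instead of guessing a type
        return env[name]
    if kind == "add":                      # ("add", lhs, rhs)
        lhs = check(expr[1], env, errors)
        rhs = check(expr[2], env, errors)
        if POISON in (lhs, rhs):
            return POISON                  # already reported: propagate silently
        if lhs != rhs:
            errors.append(f"cannot add {lhs} and {rhs}")
            return POISON
        return lhs
    raise ValueError(f"unknown node kind: {kind}")


errors = []
# (y + 1) + "s": y is undefined, so only that one error is reported,
# not a follow-up complaint about adding an unknown type to a string.
tree = ("add", ("add", ("var", "y"), ("num", 1)), ("str", "s"))
result = check(tree, {}, errors)
print(result, errors)
```

A "stop after the first few errors" switch would then just be a cap on `len(errors)`.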


>Automated documentation generator. [...] Before Ddoc, the documentation had only a random correlation with the code, and too often, they had nothing to do with each other. After Ddoc, the two were brought in sync.<

And now the situation is even better: we have documentation unittests, and the function arguments are verified to be in sync with their ddoc comment. There's probably room for further improvement.


>One semantic technique that is obvious in hindsight (but took Andrei Alexandrescu to point out to me) is called "lowering."<

In Haskell, the GHC compiler goes one step further: it translates all the Haskell code into an intermediate language named Core. Core is not the language of a virtual machine; it's still a functional language, but a simpler one, and many of the syntactic differences between language constructs are reduced to a much smaller number of mostly functional constructs.
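
As a concrete picture of what lowering means, here is a minimal sketch in Python. It is not how DMD or GHC do it: the tuple-based AST and the name `lower` are made up for illustration. A `for` loop is rewritten into a `while` loop, so later passes only need to understand `while`. The sketch assumes the loop body contains no `continue`, which a naive rewrite like this would get wrong.

```python
# Minimal sketch of "lowering": rewrite a richer construct into simpler ones
# so that later passes only need to understand the simple forms.
# The tuple-based AST and the name `lower` are hypothetical.


def lower(node):
    """Recursively rewrite ("for", init, cond, step, body) into a while loop."""
    if not isinstance(node, tuple):
        return node                        # leaves: names, numbers, node tags
    node = tuple(lower(child) for child in node)
    if node[0] == "for":
        _, init, cond, step, body = node
        # for (init; cond; step) body  ==>  { init; while (cond) { body; step } }
        # Assumption: body has no `continue`, which would skip `step` here.
        return ("block", init, ("while", cond, ("block", body, step)))
    return node


loop = ("for",
        ("assign", "i", 0),
        ("lt", "i", 10),
        ("assign", "i", ("add", "i", 1)),
        ("call", "print", "i"))
lowered = lower(loop)
print(lowered)
```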


>My general rule is if the explanation for what the function does is more lines than the implementation code, then the function is likely trivia and should be booted out.<

In Haskell there's a standard module named Prelude; it's imported by default and defines a lot of general-purpose functions and other definitions. Most functions in it are only a few lines long (often 2-3 lines, with some up to 10-13 lines).


Bonus: the cute idea of a language for students:
http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c


(On Reddit I seem to see some comments, like structs not allowing constructors?)

Bye,
bearophile
January 22, 2014
> On Wednesday, 22 January 2014 at 10:36:31 UTC, Jacob Carlborg wrote:

> I wonder why so many books about implementing compilers spend, usually, quite a large chapter on regular expressions?

I wonder about that too. For anything halfway useful, regex has too many limitations, which you only find out about in a later chapter, or pretty soon in your parser :D
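
One concrete limitation: constructs that nest, such as D's `/+ ... +/` comments, can't be matched by a classical regular expression, because matching them needs a counter. A hand-written scanner handles it in a few lines. A minimal sketch in Python (the function name `skip_nested_comment` is made up, and this is not taken from any real lexer):

```python
# Minimal sketch: skipping a nesting comment by hand with a depth counter.
# The function name is hypothetical.


def skip_nested_comment(src, pos):
    """With src[pos:] starting at '/+', return the index just past the matching '+/'."""
    if not src.startswith("/+", pos):
        raise ValueError("not at a nested comment")
    depth = 0
    i = pos
    while i < len(src):
        if src.startswith("/+", i):        # another level opens
            depth += 1
            i += 2
        elif src.startswith("+/", i):      # one level closes
            depth -= 1
            i += 2
            if depth == 0:
                return i                   # outermost comment is finished
        else:
            i += 1
    raise SyntaxError("unterminated nested comment")


text = "/+ outer /+ inner +/ still outer +/ int x;"
end = skip_nested_comment(text, 0)
print(repr(text[end:]))
```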
January 22, 2014
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/


"A good syntax needs redundancy in order to diagnose errors and give good error messages."

This is also true of natural languages. The higher the redundancy, the easier it is to guess or reconstruct what a person tried to say (in a noisy environment) or write (if the message gets messed up somehow). Texts in highly inflectional languages (like German) can be "recovered" with higher accuracy than texts in English.

If grammatical relations are no longer expressed by inflectional endings (as is often the case in English), the word order is crucial.

"The dog bit the man."

In Latin and German you can turn the statement around and still know who bit who(m).

Over the centuries, natural languages have reduced redundancy, but there are still loads of redundancies, e.g. "two cats" (it would be enough to say "two cat", which some languages actually do; see also "a 15 _year_ old girl").

Syntax is getting simplified due to the fact that the listener "knows what we mean", e.g. "buy one get one free". I wonder to what extent languages will be simplified one day. But this is a topic for a whole book ...
January 22, 2014
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/

Great article!
January 22, 2014
Chris:

> "A good syntax needs redundancy in order to diagnose errors and give good error messages."

I'd like to measure this statement experimentally: are error messages in Go and Scala any worse because of the optional use of semicolons? My initial supposition is that the answer is negative.

Bye,
bearophile
January 22, 2014
On Wednesday, 22 January 2014 at 04:29:05 UTC, Walter Bright wrote:
> http://www.reddit.com/r/programming/comments/1vtm2l/so_you_want_to_write_your_own_language_dr_dobbs/

Great article. I was surprised that you mentioned lowering positively, though.

I think from DMD we have enough experience to say that although lowering sounds good, it's generally a bad idea. It gives you a mostly-working prototype very quickly, but you pay a heavy price for it. It destroys valuable semantic information. You end up with poor quality error messages, and counter-intuitively, you can end up with _more_ special cases (eg, lowering ref-foreach in DMD means ref local variables can spread everywhere). And it reduces possibilities for the optimizer.

In DMD, lowering has caused *major* problems with AAs, foreach, and builtin functions, and with some of the transformations that the inliner makes. It's also caused problems with postincrement and exponentiation. Probably there are other examples.

It seems to me that what does make sense is to perform lowering as the final step before passing the code to the backend. If you do it too early, you're shooting yourself in the foot.
January 22, 2014
On Wednesday, 22 January 2014 at 11:59:30 UTC, bearophile wrote:
> I'd like to measure this statement experimentally: are error messages in Go and Scala any worse because of the optional use of semicolons? My initial supposition is that the answer is negative.

Error messages in SML are either really neat or catastrophic.
January 22, 2014
On Wednesday, 22 January 2014 at 10:38:40 UTC, bearophile wrote:
>
> In Haskell, the GHC compiler goes one step further: it translates all the Haskell code into an intermediate language named Core. Core is not the language of a virtual machine; it's still a functional language, but a simpler one, and many of the syntactic differences between language constructs are reduced to a much smaller number of mostly functional constructs.
>

Same story with Erlang, as far as I know.