April 10, 2007
Paul Findlay Wrote:

> > Particular generators that spark my interest tend to have to do with x87, SSE extensions, and x86-64.  Most compilers to date don't properly use these functionalities.  Having them integrated into D, Agner Fog-optimally, could replace a lot of C++ gaming, video, rendering and graphics engine code.
> And it's dawning on me that some text processing can take advantage of doing 64-bit/128-bit chunks at a time

Yeah, pretty much any large (>1kb) buffer-copy algorithm has run fastest through SSE2 ever since 128-bit registers arrived.  The raw throughput makes it worth tweaking the loop by hand instead of just using rep movsd.
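For illustration, here's a minimal sketch (in C, since this is about hand-tuned loops rather than D) of the kind of SSE2 copy loop being described. The function name is my own; real implementations like the ones in Agner Fog's manuals also align pointers, prefetch, and switch to non-temporal stores for very large buffers:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical sketch: copy a buffer 16 bytes at a time with SSE2,
 * falling back to byte copies for the tail.  Unaligned loads/stores
 * keep the example short; a tuned version would align first. */
static void sse2_copy(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16; d += 16; n -= 16;
    }
    while (n--)          /* remaining tail bytes */
        *d++ = *s++;
}
```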

I think the asm guys already have the best code we could hope for written up on a few pages: Agner Fog, Paul Hsieh, et al.  If we could borrow their strategies (legally, that is), then I can see a good use for such a thing.  I guess we just ought to wait for AST reflection.
April 10, 2007
Don Clugston wrote:
> The problem I was referring to, is: how to store both values, and functions/operators, inside the tree. It seems to get messy very quickly.

It can, especially with the additional code you need to treat all those templates as specialized data types.

> I meant for this application. There's no doubt they're indispensable in other contexts.

Oh, my bad.  Yeah, it would probably be overkill for BLADE. :)

>> Basically what you see here
>> is a chunk of the as-of-yet-experimental compile-time Enki parser.  This piece parses the Zero-or-more expression part of the EBNF variant that Enki supports.
>>
>> The templates evaluate to CTFE functions that in turn make up the parser when it's executed.  So there are several layers of compile-time evaluation going on here.  Also, what's not obvious is that "char[] tokens" is actually an *array of strings* that is stored in a single char[] array; each string's length data is encoded as a size_t mapped onto the appropriate number of chars.  The Bind!() expressions also map parsed-out data to a key/value set, which is stored in a similar fashion (char[] bindings).
> 
> Seriously cool! Seems like you're generating a tree of nested mixins?

Almost.  generate.ZeroOrMore returns an arbitrary string that may be another "map" or a chunk of runtime code; the value is placed in the "value" string that's passed in.  The *rootmost* generator is what compiles all this stuff into an actual chunk of mixin-able code.  It's analogous to how the current rendition of Enki works.

It seems like overkill, but it's needed so I can do some basic semantic analysis and do things like declare binding vars.  I'm still looking for ways to simplify the process.

> Anyway, I suspect this will really benefit from any compiler improvements, eg CTFE support for AAs or nested functions. And obviously AST macros.

Definitely for AAs, but not so much for AST macros - I get the impression that AST manipulation will only be useful for D code, and not for completely arbitrary grammars like EBNF.  Getting compile-time AA support would cut Enki's code size down by almost a third:

const char[] hackedArray = "\x00\x00\x00\x05hello\x00\x00\x00\x05cruel\x00\x00\x00\x05world";

Under the hood, this is what most of my data looks like.  Dropping the support routines for manipulating such structures would help make things a lot less cumbersome. :(
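To make the layout concrete, here's a rough sketch of the same trick in C: each string in the buffer is preceded by its length encoded into four chars, big-endian, as in the literal above. The helper names are invented for illustration, not Enki's actual routines. (One C-specific wrinkle: unlike D, C's `\x` escape greedily eats hex digits, so a literal like `"\x05cruel"` must be split into adjacent literals.)

```c
#include <stddef.h>

/* Read the 4-char big-endian length prefix starting at p. */
static unsigned unpack_len(const char *p)
{
    return ((unsigned char)p[0] << 24) | ((unsigned char)p[1] << 16) |
           ((unsigned char)p[2] << 8)  |  (unsigned char)p[3];
}

/* Count the strings packed into buf[0..n): each entry is a
 * 4-char length prefix followed by that many chars of payload. */
static int count_strings(const char *buf, size_t n)
{
    int count = 0;
    size_t i = 0;
    while (i + 4 <= n) {
        i += 4 + unpack_len(buf + i);
        count++;
    }
    return count;
}
```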

> I really think the new mixins + CTFE was a breakthrough. Publish some of this stuff, and I think we'll see an exodus of many Boost developers into D.

Agreed.  CTFE frees us from many limitations imposed by templates, and the added flexibility of mixin() gives us much the same power at compile time that JavaScript's eval() has at runtime.  It's a huge deal.

It's possible that we might find a handful of interested devs in their camp, but I think we're still back to the same old problem as with the rest of C++: momentum.  I sincerely doubt we'll see a mass exodus from one camp to this one regardless of how good a job is done here.  Of course, I'll be happy to be wrong about that. :)
-- 
- EricAnderton at yahoo
April 10, 2007
KlausO wrote:
> 
> Hey pragma,
> 
> really cool, I've come up with a similar structure while experimenting
> with a PEG parser in D (see attachment)
> after I've read this article series on
> Codeproject:
> 
> http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
> http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
> http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
> 
> The template system of D does an awesome job in keeping
> templated PEG grammars readable.

Yes it does!  Thanks for posting this - I didn't even know that article was there.

> BTW: If you turn Enki into a PEG style parser I definitely throw
> my attempts into the dustbin :-)
> Greets

Wow, that's one heck of an endorsement.  Thanks, but don't throw anything out yet.  This rendition of Enki is still a ways off though.  FYI I plan on keeping Enki's internals as human-readable as possible by keeping it self-hosting.  So there'll be two ways to utilize the toolkit: EBNF coding and "by hand".

> alias   Action!(
>           And!(
>             PlusRepeat!(EmailChar),
>             Char!('@'),
>             PlusRepeat!(
>               And!(
>                 Or!(
>                   In!(
>                     Range!('a', 'z'),
>                     Range!('A', 'Z'),
>                     Range!('0', '9'),
>                     Char!('_'),
>                     Char!('%'),
>                     Char!('-')
>                   ),
>                   Char!('.')
>                 ),
>                 Not!(EmailSuffix)
>               )
>             ),
>             Char!('.'),
>             EmailSuffix
>           ),
>           delegate void(char[] email) { writefln("<email:", email, ">"); }
>         )
>         Email;

I had an earlier cut that looked a lot like this. :)  But there's a very subtle problem lurking in there.  By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*.  As a result, it failed when I tried to transcribe Enki's EBNF definition.  That's why I wrap each rule as a CTFE-style function: it sidesteps that issue rather nicely, without generating too many warts.
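The shape of the fix, transposed to C for illustration (the CTFE version works analogously): give each grammar rule its own small function, so the grammar never becomes one giant nested instantiation, and forward declarations let rules refer to each other freely. The toy grammar and names here are invented:

```c
/* Toy PEG grammar, one function per rule:
 *   Expr <- 'a' / '(' List ')'
 *   List <- Expr*
 * Each rule returns the number of chars consumed, or -1 on no match. */

static int rule_list(const char *s);  /* forward declaration: Expr
                                         uses List before it's defined */

static int rule_expr(const char *s)
{
    if (*s == 'a')
        return 1;
    if (*s == '(') {
        int n = rule_list(s + 1);
        if (n >= 0 && s[1 + n] == ')')
            return n + 2;
    }
    return -1;
}

static int rule_list(const char *s)
{
    int i = 0, n;
    while ((n = rule_expr(s + i)) >= 0)  /* zero-or-more */
        i += n;
    return i;
}
```

Breaking the grammar into named rules like this is also what keeps identifiers short: only each rule's name, never its whole expansion, shows up as a symbol.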

-- 
- EricAnderton at yahoo
April 10, 2007
Pragma schrieb:
> KlausO wrote:
>>
>> Hey pragma,
>>
>> really cool, I've come up with a similar structure while experimenting
>> with a PEG parser in D (see attachment)
>> after I've read this article series on
>> Codeproject:
>>
>> http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
>> http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
>> http://www.codeproject.com/cpp/crafting_interpreter_p3.asp
>>
>> The template system of D does an awesome job in keeping
>> templated PEG grammars readable.
> 
> Yes it does!  Thanks for posting this - I didn't even know that article was there.
> 
>> BTW: If you turn Enki into a PEG style parser I definitely throw
>> my attempts into the dustbin :-)
>> Greets
> 
> Wow, that's one heck of an endorsement.  Thanks, but don't throw anything out yet.  This rendition of Enki is still a ways off though.  FYI I plan on keeping Enki's internals as human-readable as possible by keeping it self-hosting.  So there'll be two ways to utilize the toolkit: EBNF coding and "by hand".

Nice to hear that it suits your taste :-)

> 
>> alias   Action!(
>>           And!(
>>             PlusRepeat!(EmailChar),
>>             Char!('@'),
>>             PlusRepeat!(
>>               And!(
>>                 Or!(
>>                   In!(
>>                     Range!('a', 'z'),
>>                     Range!('A', 'Z'),
>>                     Range!('0', '9'),
>>                     Char!('_'),
>>                     Char!('%'),
>>                     Char!('-')
>>                   ),
>>                   Char!('.')
>>                 ),
>>                 Not!(EmailSuffix)
>>               )
>>             ),
>>             Char!('.'),
>>             EmailSuffix
>>           ),
>>           delegate void(char[] email) { writefln("<email:", email, ">"); }
>>         )
>>         Email;
> 
> I had an earlier cut that looked a lot like this. :)  But there's a very subtle problem lurking in there.  By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*.  As a result, it failed when I tried to transcribe Enki's EBNF definition.  That's why I wrap each rule as a CTFE-style function: it sidesteps that issue rather nicely, without generating too many warts.
> 

Another issue I ran into is circular template dependencies. You could get nasty error messages like

dpeg.d(282): Error: forward reference to 'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)'

Any idea how they could be resolved?


FYI:

Other good PEG references:
http://pdos.csail.mit.edu/~baford/packrat/

Cat is an open-source interpreter that uses a rule-based parser in C#:
http://www.codeproject.com/csharp/cat.asp
Also very interesting:
http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks

C++ templated parsers
http://www.codeproject.com/cpp/yard-xml-parser.asp
http://www.codeproject.com/cpp/biscuit.asp
April 10, 2007
KlausO wrote:
> Pragma schrieb:
>> KlausO wrote:
>>
>>> alias   Action!(
>>>           And!(
>>>             PlusRepeat!(EmailChar),
>>>             Char!('@'),
>>>             PlusRepeat!(
>>>               And!(
>>>                 Or!(
>>>                   In!(
>>>                     Range!('a', 'z'),
>>>                     Range!('A', 'Z'),
>>>                     Range!('0', '9'),
>>>                     Char!('_'),
>>>                     Char!('%'),
>>>                     Char!('-')
>>>                   ),
>>>                   Char!('.')
>>>                 ),
>>>                 Not!(EmailSuffix)
>>>               )
>>>             ),
>>>             Char!('.'),
>>>             EmailSuffix
>>>           ),
>>>           delegate void(char[] email) { writefln("<email:", email, ">"); }
>>>         )
>>>         Email;
>>
>> I had an earlier cut that looked a lot like this. :)  But there's a very subtle problem lurking in there.  By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*.  As a result, it failed when I tried to transcribe Enki's EBNF definition.  That's why I wrap each rule as a CTFE-style function: it sidesteps that issue rather nicely, without generating too many warts.
>>
> 
> Another issue I ran into is circular template dependencies. You could get nasty error messages like
> 
> dpeg.d(282): Error: forward reference to 'And!(Alternative,OptRepeat!(And!(WS,Char!('/'),WS,Alternative)),WS)'
> 
> Any idea how they could be resolved ?

Yep, that one bit me too; just use the CTFE trick I mentioned.  Wrapping each rule as a function fixes this, since DMD can resolve forward references to functions but not to template instances.

> 
> FYI:
> 
> Other good PEG references:
> http://pdos.csail.mit.edu/~baford/packrat/
> 
> Cat is an open-source interpreter that uses a rule-based parser in C#:
> http://www.codeproject.com/csharp/cat.asp
> Also very interesting:
> http://code.google.com/p/cat-language/wiki/HowTheInterpreterWorks
> 
> C++ templated parsers
> http://www.codeproject.com/cpp/yard-xml-parser.asp
> http://www.codeproject.com/cpp/biscuit.asp

Thanks.  I can always use a little more research to lean on.

-- 
- EricAnderton at yahoo
April 10, 2007
Don Clugston wrote:
> I've begun a draft. A historical question for you --
> 
> On this page,
> http://www.artima.com/cppsource/top_cpp_software.html
> Scott Meyers says that g++ was the first C++ compiler to generate native code. But I thought Zortech was older than g++. Is that correct?

Depends on how you look at it. g++ was first 'released' in December, 1987. But the release notes for it say: "The GNU C++ Compiler is still in test release, and is NOT ready for everyday use", so I don't consider that a real release. Oregon Software released their native C++ compiler (not for the PC) in Jan or Feb 1988; nobody seems to recall exactly which. Zortech's first release was in Jun 1988. Michael Tiemann, author of g++, calls the Sep 1988 release the "first really stable version."
April 10, 2007
Pragma wrote:
> By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*.  As a result it failed when I tried to transcribe Enki's ENBF definition.  That's why I wrap each rule as a CTFE-style function as it side-steps that issue rather nicely, without generating too many warts.

One of the motivations for CTFE was to address that exact problem.
April 10, 2007
> KlausO wrote:
> 
>> Pragma schrieb:
>>>
>>> I had an earlier cut that looked a lot like this. :)  But there's a very subtle problem lurking in there.  By making your entire grammar one monster-sized template instance, you'll run into DMD's identifier-length limit *fast*.  As a result, it failed when I tried to transcribe Enki's EBNF definition.  That's why I wrap each rule as a CTFE-style function: it sidesteps that issue rather nicely, without generating too many warts.
>>>
>>
>> Another issue I ran into is circular template dependencies. You could get nasty error messages like
>>


The way my dparse sidesteps this (both the identifier-length limit and forward references) is to have the parser functions specialized only on the name of the reduction; the grammar is carried in an outer scope (a mixed-in template that is never used as an identifier). This also allows the inner template to specialize on anything that is defined.