DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky (page 2) - D Programming Language Discussion Forum


June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Joakim
in reply to Dicebot

On Tuesday, 10 June 2014 at 17:19:42 UTC, Dicebot wrote:
> On Tuesday, 10 June 2014 at 15:37:11 UTC, Andrei Alexandrescu wrote:
>> Watch, discuss, upvote!
>>
>> https://news.ycombinator.com/newest
>>
>> https://twitter.com/D_Programming/status/476386465166135296
>>
>> https://www.facebook.com/dlang.org/posts/863635576983458
>>
>> http://www.reddit.com/r/programming/comments/27sjxf/dconf_2014_day_1_talk_4_inside_the_regular/
>>
>>
>> Andrei
>
> http://youtu.be/hkaOciiP11c

Great talk, just finished watching the YouTube upload.  I zoned out during the livestream, as it was late over here and I was falling asleep during this fairly technical talk, but now that I'm awake, I enjoyed going through it.

I never knew how regular expression engines are implemented; this was a good introduction to the topic and to how D made your approach easier or harder.  A model talk for DConf, particularly given the great results on the regex-dna benchmark.

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Artur Skawina
in reply to Dmitry Olshansky

On 06/12/14 11:17, Dmitry Olshansky via Digitalmars-d-announce wrote:
> This one thing I'm losing sleep over - what precisely is so good in CTFE code generation in _practical_ context (DSL that is quite stable, not just tiny helpers)?

Language integration; direct access to meta data (such as types, but
also constants).

> By the end of day it's just about having to write a trivial line in your favorite build system (NOT make) vs having to wait for a couple of minutes each build hoping the compiler won't hit your system's memory limits.

If it really was only about an extra makefile rule then CTFE wouldn't make much difference; it would just be an explicitly-requested smarter version of constant folding. But that is not the case.

Simple example: create a function that implements an algorithm
which is derived from some type given to it as input. /Derived/
does not mean that it only contains some conditionally executed
code that depends on some property of that type; it means that
the algorithm itself is determined from the type. With the
external-generator solution you can emit a templated function,
but what you can *not* do is emit code based on meta-data or
CT introspection - because the necessary data simply isn't
available when the external generator runs.
With CTFE you have direct access to all the data and generating
the code becomes almost trivial. It makes a night-and-day type of
difference.
While you could implement a sufficiently-smart-generator that could
handle some subset of the functionality of CTFE, it would be
prohibitively expensive to do so, wouldn't scale and would often be
pointless, if you had to resort to generating code containing mixin
expressions anyway. There's a reason why this isn't done in other
languages that don't have CTFE.
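
A minimal sketch of the kind of type-derived generation described above. The `Point` struct, the `generateDescribe` helper and the generated `describe` function are hypothetical names made up for illustration; only `std.traits.FieldNameTuple`, `std.conv.to` and string `mixin` are real language/library features. The function body is built at compile time from the fields of the type, which is exactly the data an external generator would not see unless it were compiled against the same sources.

```d
import std.traits : FieldNameTuple;

// Hypothetical example type; any plain struct would do.
struct Point
{
    int x;
    int y;
    string label;
}

// Evaluated by CTFE: returns the source text of a function whose
// statements are derived from the field list of T.
string generateDescribe(T)()
{
    string code = "string describe(const " ~ T.stringof ~ " value)\n{\n"
        ~ "    import std.conv : to;\n"
        ~ "    string result = \"" ~ T.stringof ~ "(\";\n";
    foreach (i, name; FieldNameTuple!T)
    {
        // Emit a separator before every field except the first.
        static if (i > 0)
            code ~= "    result ~= \", \";\n";
        code ~= "    result ~= \"" ~ name ~ "=\" ~ value." ~ name ~ ".to!string;\n";
    }
    code ~= "    result ~= \")\";\n    return result;\n}\n";
    return code;
}

// The generated source is compiled in place.
mixin(generateDescribe!Point());

void main()
{
    import std.stdio : writeln;
    writeln(describe(Point(1, 2, "origin")));
}
```

Adding a field to `Point` changes the generated function on the next build with no separate generator step to keep in sync.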

> Unless things improve dramatically CTFE code generation + mixin is just our funny painful toy.

The code snippets posted here are of course just toy programs. This does not mean that CTFE and mixins are merely toys; they enable writing code in ways that just aren't practically possible in other languages. The fact that there isn't much such code publicly available is just a function of D's microscopic user base.

Real Programmers write mixins that write mixins.

artur

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Dicebot
in reply to Dmitry Olshansky

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
> This one thing I'm losing sleep over - what precisely is so good in CTFE code generation in _practical_ context (DSL that is quite stable, not just tiny helpers)?
>
> By the end of day it's just about having to write a trivial line in your favorite build system (NOT make) vs having to wait for a couple of minutes each build hoping the compiler won't hit your system's memory limits.

Oh, this is a very good question :) There are two unrelated concerns here:

1)

Reflection. It is less of an issue for pure DSL solutions because those don't provide any good reflection capabilities anyway, but other code generation approaches have very similar problems.

By doing all code generation in a separate build step you potentially lose many of the guarantees that keep the various parts of your application in sync.

2)

Moving forward. You use the traditional reasoning that a DSL is generally something rare and normally stable. This fits the most common DSL usage, but the tight in-language integration that D makes possible brings new opportunities to use DSLs and code generation casually all over your program.

I totally expect programming culture to evolve to the point where something like 90% of all application code is generated in a typical project. D has a good base for promoting such a paradigm switch, and reducing any unnecessary mental context switches is very important here.

This was pretty much the point I was trying to make with my DConf talk ( and have probably failed :) )

> And these couple of minutes are more like 30 minutes at times. Worse yet unlike proper build system it doesn't keep track of actual changes (same regex patterns get recompiled over and over), at this point seamless integration into the language starts feeling like a joke.
>
> And speaking of seamless integration: just generate a symbol name out of pattern at CTFE to link to later, at least this much can be done relatively fast. And voila even the clunky run-time generation is not half-bad at integration.
>
> Unless things improve dramatically CTFE code generation + mixin is just our funny painful toy.

Unfortunately the current frontend implementation falls far behind the language's capabilities. There are no fundamental reasons why it can't work with a better compiler. In fact, deadalnix has made a very good case for SDC taking over as the next D frontend exactly because of things like CTFE JIT.

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Dicebot
in reply to Colin

On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
> Maybe a change to the compiler to write any mixin'd string out to a temporary file (along with some identifier information and the line of code that generated it) and at the next compilation time try reading it back from that file iff the line of code that generated it hasn't changed?
>
> Then, there'd be no heavy work for the compiler to do, apart from read that file in to a string.

The compiler can cache the return value of a function that gets called from inside a mixin statement (for a given argument set). As CTFE is implicitly pure (no global state at compile time), the generated code can simply be re-used later for the same argument set.

Re-using it between compiler invocations is trickier because it is only legal if the generator function and everything it indirectly uses have not changed either. Ignoring this requirement can result in nasty build issues that are only fixed by a clean build. Too harmful in my opinion.
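
A rough user-side approximation of the within-one-compilation caching described above, not a description of what the compiler does for plain mixins. It relies on the fact that a D template is instantiated once per distinct argument list, so wrapping the generator in a template means the CTFE call runs once per argument set in a given compiler invocation. `generateMatcher` and `cachedMatcher` are hypothetical names for illustration, and nothing here persists between compiler runs.

```d
// Hypothetical, deliberately trivial generator: emits a function that
// matches one literal string. A real generator would be far more costly.
string generateMatcher(string name, string literal)
{
    return "bool " ~ name ~ "(string input)\n{\n"
        ~ "    return input == \"" ~ literal ~ "\";\n}\n";
}

// One instance of this template exists per distinct (name, literal) pair,
// so the enum initializer is evaluated by CTFE once per pair, no matter
// how many places in the program refer to it.
template cachedMatcher(string name, string literal)
{
    enum string cachedMatcher = generateMatcher(name, literal);
}

mixin(cachedMatcher!("isHello", "hello"));

// A second reference with the same arguments reuses the same instance.
static assert(cachedMatcher!("isHello", "hello").length > 0);

void main()
{
    import std.stdio : writeln;
    writeln(isHello("hello"), " ", isHello("world"));
}
```

Because the cache key is only the argument list, this is safe within one invocation; reusing results across invocations would also need the generator's own sources in the key, which is the hard part noted above.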

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Colin
in reply to Dicebot

On Thursday, 12 June 2014 at 12:31:09 UTC, Dicebot wrote:
> On Thursday, 12 June 2014 at 10:40:56 UTC, Colin wrote:
>> Maybe a change to the compiler to write any mixin'd string out to a temporary file (along with some identifier information and the line of code that generated it) and at the next compilation time try reading it back from that file iff the line of code that generated it hasn't changed?
>>
>> Then, there'd be no heavy work for the compiler to do, apart from read that file in to a string.
>
> Compiler can cache return value of function that get called from inside mixin statement (for a given argument set). As CTFE is implicitly pure (no global state at compile-time) later generated code can be simply re-used for same argument set.
>
> Re-using it between compiler invocations is more tricky because it is only legal if generator function and all stuff they indirectly use have not changed too. Ignoring this requirement can result in nasty build issues that are only fixed by clean build. Too harmful in my opinion.

Yeah, it is quite dangerous, I agree. I was only thinking of a solution to the problem above, where a ctRegex is compiled every time whether it has changed or not.

I'm sure there's some way of keeping track of the filenames of all dependent D modules and, if any of them in the chain have changed, recalculating the string mixin.

The only trouble with that is that there'd be a good chunk of checking for every mixin, which would slow the compiler down in normal use cases.

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Timon Gehr
in reply to Dicebot

On 06/12/2014 02:31 PM, Dicebot wrote:
> Compiler can cache return value of function that get called from inside
> mixin statement (for a given argument set). As CTFE is implicitly pure
> (no global state at compile-time) later generated code can be simply
> re-used for same argument set.
>
> Re-using it between compiler invocations is more tricky because it is
> only legal if generator function and all stuff they indirectly use have
> not changed too. Ignoring this requirement can result in nasty build
> issues that are only fixed by clean build. Too harmful in my opinion.

Clearly, nirvana is continuous compilation, where the compiler performs explicit dependency management at the level of nodes in the syntax tree.

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Dicebot
in reply to Timon Gehr

On Thursday, 12 June 2014 at 12:49:23 UTC, Timon Gehr wrote:
> On 06/12/2014 02:31 PM, Dicebot wrote:
>> Compiler can cache return value of function that get called from inside
>> mixin statement (for a given argument set). As CTFE is implicitly pure
>> (no global state at compile-time) later generated code can be simply
>> re-used for same argument set.
>>
>> Re-using it between compiler invocations is more tricky because it is
>> only legal if generator function and all stuff they indirectly use have
>> not changed too. Ignoring this requirement can result in nasty build
>> issues that are only fixed by clean build. Too harmful in my opinion.
>
> Clearly, nirvana is continuous compilation, where the compiler performs explicit dependency management at the level of nodes in the syntax tree.

Yeah, I was wondering if we can merge some of rdmd's functionality into the compiler to speed up rebuilds and do better dependency tracking. But I am not sure it can fit nicely into the current frontend architecture.

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Adam D. Ruppe
in reply to Dmitry Olshansky

On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
> This one thing I'm losing sleep over - what precisely is so good in CTFE code generation in _practical_ context (DSL that is quite stable, not just tiny helpers)?

I've asked this same question before and my answer is mostly the same as Dicebot's: I think reflection is the important bit. Of course, even there it is sometimes useful to break it into two steps (one just prints the data out, kind of like dmd -X, then a regular program reads it and generates the code), but I find it really useful to read D code and generate stuff based on that.

> By the end of day it's just about having to write a trivial line in your favorite build system (NOT make)

it is actually pretty trivial in make too...

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Dmitry Olshansky
in reply to Dicebot

On 12-Jun-2014 16:25, Dicebot wrote:
> On Thursday, 12 June 2014 at 09:17:45 UTC, Dmitry Olshansky wrote:
>> This one thing I'm losing sleep over - what precisely is so good in
>> CTFE code generation in _practical_ context (DSL that is quite stable,
>> not just tiny helpers)?
>>
>> By the end of day it's just about having to write a trivial line in
>> your favorite build system (NOT make) vs having to wait for a couple
>> of minutes each build hoping the compiler won't hit your system's
>> memory limits.
>
> Oh, this is a very good question :) There are two unrelated concerns here:
>

It's always nice to ask something on the D NG: so many good answers that I can hardly choose whom to reply to ;) So this is a kind of broadcast.

Yes, the answer seems spot on - reflection! But allow me to retort.

I'm not talking about a completely stand-alone generator. The generator tool could just as well be written in D using the exact same sources as your D program, including the static introspection and type-awareness. Then the generator itself is a library + "an invocation script" in D.

The question is specifically about CTFE in this scenario, including not only the obvious shortcomings of its design, but the fundamental ones of compilation inside of compilation. Unlike proper compilation, it has nothing persistent to back it up. It feels backwards, a bit like C++ TMP but, of course, much, much better.

> 1)
>
> Reflection. It is less of an issue for pure DSL solutions because those
> don't provide any good reflection capabilities anyway, but other code
> generation approaches have very similar problems.
>
> By doing all code generation in separate build step you potentially lose
> many of guarantees of keeping various parts of your application in sync.
>

Use the same sources for the generator. In essence all is the same, just relying on separate runs and linkage, not mixin. Necessary "hooks" to link to later could indeed be generated with a tiny bit of CTFE.
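
A minimal sketch of such a "hook", under the assumption that a separate generator tool emits one function per pattern and names it with the same scheme. The `regex_` prefix, the FNV-1a hash choice, and the names `symbolNameFor` and `matchEmail` are all hypothetical; `pragma(mangle, ...)` is the real language feature that lets a declaration link against a symbol name computed at compile time. Hash collisions are possible in principle, so a real scheme would need to handle them.

```d
// FNV-1a (64-bit) over the pattern text; cheap enough to run in CTFE.
ulong fnv1a(string s)
{
    ulong h = 0xcbf29ce484222325UL;
    foreach (char c; s)
    {
        h ^= c;
        h *= 0x100000001b3UL;
    }
    return h;
}

// Turns a pattern into a linker-friendly symbol name: "regex_" followed
// by the hash as 16 hex digits. The external generator would have to
// use the same function when naming the code it emits.
string symbolNameFor(string pattern)
{
    enum hexDigits = "0123456789abcdef";
    ulong h = fnv1a(pattern);
    char[16] buf;
    foreach_reverse (i; 0 .. 16)
    {
        buf[i] = hexDigits[cast(size_t)(h & 0xF)];
        h >>= 4;
    }
    return "regex_" ~ buf[].idup;
}

// Only the name is computed at compile time, not the matcher itself.
enum emailSymbol = symbolNameFor(`[a-z]+@[a-z]+\.com`);

// Declaration that would resolve, at link time, to the separately
// generated function. It is never called here, so this sketch links
// without the generated object file.
pragma(mangle, emailSymbol)
extern (C) bool matchEmail(const(char)[] input);

void main()
{
    import std.stdio : writeln;
    writeln(emailSymbol);
}
```

The compile-time work is a single hash per pattern; the expensive generation happens in the separate tool, where a build system can track whether the pattern or the generator changed.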

Yes, deeply embedded stuff might not be that easy. The scope and damage is smaller though.

> 2)
>
> Moving forward. You use traditional reasoning of DSL generally being
> something rare and normally stable. This fits most common DSL usage but
> tight in-language integration D makes possible brings new opportunities
> of using DSL and code generation casually all over your program.
>

Well, I'm biased by heavy-handed ones. Say I have a (no longer) secret plan of doing a next-gen parser generator in D. Needless to say, that means swaths of non-trivial code generation. I'm all for embedding nicely, but I see very few _practical_ gains in CTFE+mixin here EVEN if CTFE didn't suck. See the point above about using the same metadata and types as the user application would.

> I totally expect programming culture to evolve to the point where
> something like 90% of all application code is being generated in typical
> project. D has good base for promoting such paradigm switch and reducing
> any unnecessary mental context switches is very important here.
>
> This was pretty much the point I was trying to make with my DConf talk (
> and have probably failed :) )

I liked the talk, but you know ... by the 4th or 5th talk with CTFE/mixin I think I might have been distracted :)

More specifically, this bright future of 90%+ concise DSL-driven programs is undermined by a simple truth: no amount of improvement in CTFE would make generators run faster than an optimized standalone tool invocation. The tool (a library written in D) may read D metadata just fine.

I heard D's build times are an important part of its adoption, so...

>
>> And these couple of minutes are more like 30 minutes at times. Worse
>> yet unlike proper build system it doesn't keep track of actual changes
>> (same regex patterns get recompiled over and over), at this point
>> seamless integration into the language starts feeling like a joke.
>>
>> And speaking of seamless integration: just generate a symbol name out
>> of pattern at CTFE to link to later, at least this much can be done
>> relatively fast. And voila even the clunky run-time generation is not
>> half-bad at integration.
>>
>> Unless things improve dramatically CTFE code generation + mixin is
>> just our funny painful toy.
>
> Unfortunately current implementation of frontend falls behind language
> capabilities a lot. There are no fundamental reasons why it can't work
> with better compiler.

It might solve most of the _current_ problems, but I foresee fundamental issues with "no global state" in CTFE that, say, 10 years from now would look a lot like `#include` in C++. A major one is that there is no way for the compiler to avoid recompiling generated code, as it has no knowledge of how it might have changed since the previous run.

> In fact, deadalnix has made a very good case for
> SDC taking over as next D frontend exactly because of things like CTFE JIT.

Yeah, we ought to help him!

-- 
Dmitry Olshansky

June 12, 2014

Re: DConf 2014 Day 1 Talk 4: Inside the Regular Expressions in D by Dmitry Olshansky

Posted by Andrei Alexandrescu
in reply to dennis luehring

On 6/12/14, 4:04 AM, dennis luehring wrote:
> you should write a big top post about your CTFE experience/problems - it
> is important enough

yes please
