Request for comments: std.d.lexer (page 4)

> The hint is that your question is a bit faulty: by calling it "the D grammar" do you mean the exact one listed on the website or any equivalent that parses the same language (including the ones obtained by simple transformations)?

The latter. The one I use for Pegged to generate (what is hopefully) a D parser is already modified: it discards constructs like NameList := Name NameList in favor of Name+.

Anyway, let's stop here. Back to lexing proper :)

On Sunday, 27 January 2013 at 19:46:12 UTC, Walter Bright wrote:
> On 1/27/2013 1:51 AM, Brian Schott wrote:
>> I'm interested in ideas on the API design and other high-level issues at the
>> moment. I don't consider this ready for inclusion. (The current module being
>> reviewed for inclusion in Phobos is the new std.uni.)
>
> Just a quick comment: byToken() should not accept a filename. Its input should be via an InputRange, not a file.

The file name is accepted for eventual error reporting purposes. The actual input for the lexer is the parameter called "range".

Regarding the times that I posted, my point was that it's not slower than "dmd -c", nothing more.

On 01/27/2013 10:39 PM, Brian Schott wrote:
> ...
>
> Regarding the times that I posted, my point was that it's not slower
> than "dmd -c", nothing more.

Sure. The point you brought across, however, was that it is not significantly faster yet. :o)

On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
> On 1/27/2013 1:39 PM, Brian Schott wrote:
>> The file name is accepted for eventual error reporting purposes.
>
> Use an OutputRange for that.

What about that delegate-based design? I thought everyone agreed that it was nice?

David

On 1/27/2013 4:48 PM, David Nadlinger wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> What about that delegate-based design? I thought everyone agreed that it was nice?

An OutputRange is a way of doing that. The advantage of OutputRanges is that they are TheWayToDoThings in Phobos, so that components can all interoperate and plug into each other.
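
To illustrate the point that a delegate can serve as an output range, here is a minimal sketch. The helper name reportError and its parameters are hypothetical and are not part of std.d.lexer; only std.range's put and isOutputRange and std.array's appender are real Phobos names.

```d
import std.array : appender;
import std.conv : to;
import std.range.primitives : isOutputRange, put;

// Hypothetical helper: sends one diagnostic to any sink that accepts strings.
void reportError(Sink)(ref Sink sink, string fileName, size_t line, string message)
    if (isOutputRange!(Sink, string))
{
    put(sink, "Error: " ~ message ~ " on line " ~ line.to!string ~ " of " ~ fileName);
}

void main()
{
    // Sink 1: collect the messages in memory.
    auto collected = appender!(string[])();
    reportError(collected, "foo.d", 123, "unterminated string literal beginning");
    assert(collected.data.length == 1);

    // Sink 2: a plain delegate also satisfies isOutputRange for strings.
    size_t count;
    void delegate(string) counter = (string msg) { ++count; };
    reportError(counter, "foo.d", 123, "unterminated string literal beginning");
    assert(count == 1);
}
```

The same generic function accepts both sinks, which is the interoperability argument made above. Whether error reporting should go through such a sink at all is a separate design question.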

January 28, 2013

Re: Request for comments: std.d.lexer

Posted by Brian Schott
in reply to Walter Bright

On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
> On 1/27/2013 1:39 PM, Brian Schott wrote:
>> The file name is accepted for eventual error reporting purposes.
>
> Use an OutputRange for that.

I think you misunderstand. The file name is so that if you pass in "foo.d" the lexer can say "Error: unterminated string literal beginning on line 123 of foo.d". It's not so that error messages will be written to a file of that name.

On the topic of performance, I realized that the numbers posted previously were actually for a debug build. Fail.

For whatever reason, the current version of the lexer code isn't triggering my heisenbug[1] and I was able to build with -release -inline -O.

Here's what avgtime has to say:

$ avgtime -q -h -r 200 dscanner --tokenCount ../phobos/std/datetime.d

------------------------
Total time (ms): 51409.8
Repetitions    : 200
Sample mode    : 250 (169 ocurrences)
Median time    : 255.57
Avg time       : 257.049
Std dev.       : 4.39338
Minimum        : 252.931
Maximum        : 278.658
95% conf.int.  : [248.438, 265.66]  e = 8.61087
99% conf.int.  : [245.733, 268.366]  e = 11.3166
EstimatedAvg95%: [256.44, 257.658]  e = 0.608881
EstimatedAvg99%: [256.249, 257.849]  e = 0.800205
Histogram      :
    msecs: count  normalized bar
      250:   169  ########################################
      260:    22  #####
      270:     9  ##

Which works out to 1,327,784 tokens per second on my Ivy Bridge i7.

I created a small program that demangles the output of valgrind so that tools like KCachegrind can display profiling information more clearly. It's now on the wiki[2].

The bottleneck in std.d.lexer as it stands is the appender instances that assemble Token.value during iteration and front() on the array of char[]. (As I'm sure everyone expected)

[1] http://forum.dlang.org/thread/bug-9353-3@http.d.puremagic.com%2Fissues%2F
[2] http://wiki.dlang.org/Other_Dev_Tools
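
The appender bottleneck mentioned above can be pictured with a small sketch. This is not the std.d.lexer code; both function names are hypothetical, and the slicing variant is simply one common way to avoid the copy, not something proposed in this post. It only works when the input is an array rather than a general input range.

```d
import std.array : appender;
import std.ascii : isAlphaNum;

// Hypothetical: copies identifier characters one by one through an Appender,
// which allocates a new buffer for every token value.
string identifierByCopy(string source, ref size_t pos)
{
    auto buffer = appender!string();
    while (pos < source.length && (isAlphaNum(source[pos]) || source[pos] == '_'))
        buffer.put(source[pos++]);
    return buffer.data;
}

// Hypothetical alternative: returns a slice of the input, with no allocation.
string identifierBySlice(string source, ref size_t pos)
{
    immutable start = pos;
    while (pos < source.length && (isAlphaNum(source[pos]) || source[pos] == '_'))
        ++pos;
    return source[start .. pos];
}

void main()
{
    string src = "foo_bar1 = 2;";
    size_t a, b;
    assert(identifierByCopy(src, a) == "foo_bar1");
    auto s = identifierBySlice(src, b);
    assert(s == "foo_bar1");
    assert(a == 8 && b == 8);
    assert(s.ptr is src.ptr); // the slice shares memory with the input
}
```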

On 1/27/2013 4:53 PM, Brian Schott wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> I think you misunderstand. The file name is so that if you pass in "foo.d" the
> lexer can say "Error: unterminated string literal beginning on line 123 of
> foo.d". It's not so that error messages will be written to a file of that name.

Yes, I did misunderstand. I suggest updating the documentation to clear up the misunderstanding.

On Monday, 28 January 2013 at 00:53:03 UTC, Brian Schott wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> I think you misunderstand. The file name is so that if you pass in "foo.d" the lexer can say "Error: unterminated string literal beginning on line 123 of foo.d". It's not so that error messages will be written to a file of that name.
>

I don't think that is a good idea. For instance, mixins need to be lexed but don't come from a file. The lexer should report the error; what is done on error is up to the user of the lexer.
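
A rough sketch of that separation, assuming a callback-based interface: LexError and scanStrings are hypothetical names, and the scanner is a toy that ignores escape sequences and comments. The lexer only reports position and message; the caller decides whether a file name exists and what to do with the error.

```d
import std.format : format;

// Hypothetical error record: the lexer knows the line, not the file.
struct LexError
{
    size_t line;
    string message;
}

// Toy scanner: reports a string literal that is still open at end of input.
void scanStrings(string source, scope void delegate(LexError) onError)
{
    size_t line = 1;
    size_t openLine = 0; // 0 means "not inside a string literal"
    foreach (char c; source)
    {
        if (c == '\n')
            ++line;
        else if (c == '"')
            openLine = openLine ? 0 : line;
    }
    if (openLine)
        onError(LexError(openLine, "unterminated string literal"));
}

void main()
{
    // The input could come from a file or from a mixin string.
    LexError[] errors;
    scanStrings("auto s = \"abc;\nint x;", (LexError e) { errors ~= e; });
    assert(errors.length == 1 && errors[0].line == 1);

    // A caller that does have a file name adds it when formatting.
    string msg = format("Error: %s beginning on line %s of %s",
                        errors[0].message, errors[0].line, "foo.d");
    assert(msg == "Error: unterminated string literal beginning on line 1 of foo.d");
}
```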

On Monday, 28 January 2013 at 00:51:28 UTC, Walter Bright wrote:
> On 1/27/2013 4:48 PM, David Nadlinger wrote:
>> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>>> The file name is accepted for eventual error reporting purposes.
>>>
>>> Use an OutputRange for that.
>>
>> What about that delegate-based design? I thought everyone agreed that it was nice?
>
> An OutputRange is a way of doing that. The advantage of OutputRanges is that they are TheWayToDoThings in Phobos, so that components can all interoperate and plug into each other.

I was talking about the design you proposed yourself here: http://forum.dlang.org/post/jvp9ke$2m45$1@digitalmars.com

Oh, and you really don't need to give me the basic Phobos/ranges sales pitch, I think I'm quite aware of their advantages. I'm just not sure that e.g. having an "exception thrower" output range would be a wise design decision.

David
