January 27, 2013
> The hint is that your question is a bit faulty: by calling it "the D grammar" do you mean the exact one listed on the website or any equivalent that parses the same language (including the ones obtained by simple transformations)?

The latter. The one I use for Pegged to generate (what is hopefully) a D parser is already modified: it discards constructs like NameList := Name NameList in favor of Name+.

Anyway, let's stop here. Back to lexing proper :)
January 27, 2013
On Sunday, 27 January 2013 at 19:46:12 UTC, Walter Bright wrote:
> On 1/27/2013 1:51 AM, Brian Schott wrote:
>> I'm interested in ideas on the API design and other high-level issues at the
>> moment. I don't consider this ready for inclusion. (The current module being
>> reviewed for inclusion in Phobos is the new std.uni.)
>
> Just a quick comment: byToken() should not accept a filename. Its input should be via an InputRange, not a file.

The file name is accepted for eventual error reporting purposes. The actual input for the lexer is the parameter called "range".

Regarding the times that I posted, my point was that it's not slower than "dmd -c", nothing more.
January 27, 2013
On 01/27/2013 10:39 PM, Brian Schott wrote:
> ...
>
> Regarding the times that I posted, my point was that it's not slower
> than "dmd -c", nothing more.

Sure. The point you brought across, however, was that it is not significantly faster yet. :o)
January 27, 2013
On 1/27/2013 1:39 PM, Brian Schott wrote:
> The file name is accepted for eventual error reporting purposes.

Use an OutputRange for that.

January 28, 2013
On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
> On 1/27/2013 1:39 PM, Brian Schott wrote:
>> The file name is accepted for eventual error reporting purposes.
>
> Use an OutputRange for that.

What about that delegate-based design? I thought everyone agreed that it was nice?

David
January 28, 2013
On 1/27/2013 4:48 PM, David Nadlinger wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> What about that delegate-based design? I thought everyone agreed that it was nice?

An OutputRange is a way of doing that. The advantage of OutputRanges is that they are TheWayToDoThings in Phobos, so that components can all interoperate and plug into each other.
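
As a rough sketch of what error reporting through an OutputRange could look like (this is not the std.d.lexer API; `reportError` and its parameters are hypothetical names made up for illustration), the lexer side would only call `put` on whatever sink the caller hands in:

```d
import std.array : appender;
import std.conv : to;
import std.range : isOutputRange, put;

/// Hypothetical helper, not part of std.d.lexer: formats one error and
/// pushes it into whatever output range of strings the caller supplied.
void reportError(Sink)(ref Sink sink, string fileName, size_t line, string message)
    if (isOutputRange!(Sink, string))
{
    put(sink, "Error: " ~ message ~ " on line " ~ line.to!string ~ " of " ~ fileName);
}

unittest
{
    // Collect errors in memory; any other output range of strings would do.
    auto errors = appender!(string[])();
    reportError(errors, "foo.d", 123, "unterminated string literal beginning");
    assert(errors.data.length == 1);
    assert(errors.data[0] ==
        "Error: unterminated string literal beginning on line 123 of foo.d");
}
```

Something like `dmd -main -unittest -run sketch.d` should exercise it. The caller picks the sink, so the same lexer code could feed an in-memory list, a log, or anything else that accepts strings.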

January 28, 2013
On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
> On 1/27/2013 1:39 PM, Brian Schott wrote:
>> The file name is accepted for eventual error reporting purposes.
>
> Use an OutputRange for that.

I think you misunderstand. The file name is so that if you pass in "foo.d" the lexer can say "Error: unterminated string literal beginning on line 123 of foo.d". It's not so that error messages will be written to a file of that name.

On the topic of performance, I realized that the numbers posted previously were actually for a debug build. Fail.

For whatever reason, the current version of the lexer code isn't triggering my heisenbug[1] and I was able to build with -release -inline -O.

Here's what avgtime has to say:

$ avgtime -q -h -r 200 dscanner --tokenCount ../phobos/std/datetime.d

------------------------
Total time (ms): 51409.8
Repetitions    : 200
Sample mode    : 250 (169 ocurrences)
Median time    : 255.57
Avg time       : 257.049
Std dev.       : 4.39338
Minimum        : 252.931
Maximum        : 278.658
95% conf.int.  : [248.438, 265.66]  e = 8.61087
99% conf.int.  : [245.733, 268.366]  e = 11.3166
EstimatedAvg95%: [256.44, 257.658]  e = 0.608881
EstimatedAvg99%: [256.249, 257.849]  e = 0.800205
Histogram      :
    msecs: count  normalized bar
      250:   169  ########################################
      260:    22  #####
      270:     9  ##

Which works out to 1,327,784 tokens per second on my Ivy Bridge i7.

I created a small program that demangles the output of valgrind so that tools like KCachegrind can display profiling information more clearly. It's now on the wiki[2].

The bottleneck in std.d.lexer as it stands is the appender instances that assemble Token.value during iteration and front() on the array of char[]. (As I'm sure everyone expected.)

[1] http://forum.dlang.org/thread/bug-9353-3@http.d.puremagic.com%2Fissues%2F
[2] http://wiki.dlang.org/Other_Dev_Tools

January 28, 2013
On 1/27/2013 4:53 PM, Brian Schott wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> I think you misunderstand. The file name is so that if you pass in "foo.d" the
> lexer can say "Error: unterminated string literal beginning on line 123 of
> foo.d". It's not so that error messages will be written to a file of that name.

Yes, I did misunderstand. I suggest updating the documentation to clear up the misunderstanding.

January 28, 2013
On Monday, 28 January 2013 at 00:53:03 UTC, Brian Schott wrote:
> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>> The file name is accepted for eventual error reporting purposes.
>>
>> Use an OutputRange for that.
>
> I think you misunderstand. The file name is so that if you pass in "foo.d" the lexer can say "Error: unterminated string literal beginning on line 123 of foo.d". It's not so that error messages will be written to a file of that name.
>

I don't think that is a good idea. For instance, mixins need to be lexed but don't come from a file.

The lexer should report the error; what is done on error is up to the user of the lexer.
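
To make that concrete, here is a minimal sketch, assuming a callback-style interface. `ErrorHandler` and `checkStrings` are hypothetical names, and the "lexer" is a toy that only checks for an unclosed double-quoted string; it is not std.d.lexer. The source label is an arbitrary string, so mixin code works as well as a file:

```d
import std.conv : to;

/// Hypothetical callback type: the lexer passes a location and a message,
/// and the caller decides whether to log, collect, or throw.
alias ErrorHandler = void delegate(string source, size_t line, string message);

/// Toy stand-in for a lexer: it only checks that every double-quoted string
/// is closed, and reports the line on which an unclosed one began.
void checkStrings(string code, string source, ErrorHandler onError)
{
    size_t line = 1;
    size_t openedAt = 0; // 0 means "not inside a string"
    foreach (char c; code)
    {
        if (c == '\n')
            ++line;
        else if (c == '"')
            openedAt = openedAt ? 0 : line;
    }
    if (openedAt)
        onError(source, openedAt, "unterminated string literal");
}

unittest
{
    string[] collected;
    // This caller collects errors; another could print or throw instead.
    checkStrings("auto s = \"abc;\n", "<mixin>",
        (string source, size_t line, string message) {
            collected ~= source ~ "(" ~ line.to!string ~ "): " ~ message;
        });
    assert(collected == ["<mixin>(1): unterminated string literal"]);
}
```

The lexer never decides what an error means for the program; it only hands over the facts.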
January 28, 2013
On Monday, 28 January 2013 at 00:51:28 UTC, Walter Bright wrote:
> On 1/27/2013 4:48 PM, David Nadlinger wrote:
>> On Sunday, 27 January 2013 at 23:49:11 UTC, Walter Bright wrote:
>>> On 1/27/2013 1:39 PM, Brian Schott wrote:
>>>> The file name is accepted for eventual error reporting purposes.
>>>
>>> Use an OutputRange for that.
>>
>> What about that delegate-based design? I thought everyone agreed that it was nice?
>
> An OutputRange is a way of doing that. The advantage of OutputRanges is that they are TheWayToDoThings in Phobos, so that components can all interoperate and plug into each other.

I was talking about the design you proposed yourself here: http://forum.dlang.org/post/jvp9ke$2m45$1@digitalmars.com

Oh, and you really don't need to give me the basic Phobos/ranges sales pitch, I think I'm quite aware of their advantages. I'm just not sure that e.g. having an "exception thrower" output range would be a wise design decision.
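
For reference, this is roughly what I mean by an "exception thrower" output range. It is a hypothetical sketch, `ThrowingSink` is a made-up name, and nothing like it exists in the proposed module as far as I know:

```d
import std.exception : assertThrown;
import std.range : isOutputRange, put;

/// Hypothetical sink: an output range whose put() throws instead of storing.
struct ThrowingSink
{
    void put(string message)
    {
        throw new Exception(message);
    }
}

// It satisfies the output range interface for strings...
static assert(isOutputRange!(ThrowingSink, string));

unittest
{
    // ...but "writing" to it aborts the producer at the first error.
    ThrowingSink sink;
    assertThrown!Exception(put(sink, "unterminated string literal"));
}
```

It compiles and fits the interface, but the control flow is hidden inside what looks like a plain write, which is the part I'm unsure about.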

David