Request for comments: std.d.lexer (page 18)

On 2013-02-09 16:10, Andrei Alexandrescu wrote: > Requiring a random-access range of ubyte with a terminating zero may be > the most general approach to a fast lexer - and that's fine. Requiring a random-access range probably makes it easier. People here seems to try to support ranges with less functionality, like input or forward ranges. -- /Jacob Carlborg

On 2013-02-09 15:37, Andrei Alexandrescu wrote: > That's not a problem. You may require that the input has a terminating > zero. You cannot just append a terminating zero in the lexer if it's a range? -- /Jacob Carlborg

On 2/9/13 10:37 AM, Jacob Carlborg wrote: > On 2013-02-09 16:10, Andrei Alexandrescu wrote: > >> Requiring a random-access range of ubyte with a terminating zero may be >> the most general approach to a fast lexer - and that's fine. > > Requiring a random-access range probably makes it easier. People here > seems to try to support ranges with less functionality, like input or > forward ranges. Yah. The way I see it is, start with a random-access range and then see what the use patterns are. Then possibly refine. Andrei

On 2/9/13 10:38 AM, Jacob Carlborg wrote: > On 2013-02-09 15:37, Andrei Alexandrescu wrote: > >> That's not a problem. You may require that the input has a terminating >> zero. > > You cannot just append a terminating zero in the lexer if it's a range? You require it. It's client's burden. Andrei

On Friday, 8 February 2013 at 16:00:05 UTC, Dmitry Olshansky wrote: > 08-Feb-2013 19:52, Iain Buclaw пишет: >> That's still lies. :o) >> > > g++ ? :) > > >> >> -- >> Iain Buclaw >> >> *(p < e ? p++ : p) = (c & 0x0f) + '0'; I'd think it's built by DMC++, Digital Mars C++ compiler.

09-Feb-2013 19:46, Andrei Alexandrescu пишет: > On 2/9/13 10:37 AM, Jacob Carlborg wrote: >> On 2013-02-09 16:10, Andrei Alexandrescu wrote: >> >>> Requiring a random-access range of ubyte with a terminating zero may be >>> the most general approach to a fast lexer - and that's fine. >> >> Requiring a random-access range probably makes it easier. People here >> seems to try to support ranges with less functionality, like input or >> forward ranges. > > Yah. The way I see it is, start with a random-access range and then see > what the use patterns are. Then possibly refine. > I don't get this. There is no sensible requirement to forbid non-random access. A couple of static ifs and you are done. And I can confidently say that std.d.lexer has quite some room for optimization in both cases and it doesn't have to sacrifice the generic path. I intent to continue hacking on Brain's implementation and to help him refine it. Any real help (as in work and analysis) is appreciated, thanks. -- Dmitry Olshansky

09-Feb-2013 23:20, SomeDude пишет: > On Friday, 8 February 2013 at 16:00:05 UTC, Dmitry Olshansky wrote: >> 08-Feb-2013 19:52, Iain Buclaw пишет: > >>> That's still lies. :o) >>> >> >> g++ ? :) >> >> >>> >>> -- >>> Iain Buclaw >>> >>> *(p < e ? p++ : p) = (c & 0x0f) + '0'; > > I'd think it's built by DMC++, Digital Mars C++ compiler. On linux, sure ;) -- Dmitry Olshansky

February 09, 2013

Sentinel InputRange ?

Posted by Walter Bright
in reply to Andrei Alexandrescu

Permalink

Walter Bright

Posted in reply to Andrei Alexandrescu

Permalink

On 2/9/2013 6:37 AM, Andrei Alexandrescu wrote:
> On 2/9/13 3:07 AM, Jonathan M Davis wrote:
>> On Friday, February 08, 2013 23:48:50 Walter Bright wrote:
>>> On 2/8/2013 12:01 AM, Jonathan M Davis wrote:
>>>> It's not quite a use case where
>>>> ranges shine - especially when efficiency is a top priority.
>>>
>>> A more problematic case is dmd's lexer relies on a 0 byte at the end to be a
>>> "sentinel" for the end of file. Without such a sentinel, every consumption
>>> of a source character requires two operations rather than one.
>>
>> I didn't know that. That's a cute trick. But yeah, without controlling the
>> input, it's not possible, and that won't work with a general implementation in
>> a library. It would be trivial enough to put a wrapper around the input to add
>> the 0 byte at the end, but the wrapper would still have to keep checking for
>> empty, so you wouldn't gain anything.
>
> That's not a problem. You may require that the input has a terminating zero.

Perhaps we can formalize this a bit. Define a SentinelInputRange, which has an additional manifest constant with the name 'sentinel'. empty becomes defined as:

    empty = front == sentinel;

popFront() does not advance if empty. Advancing over a SentinelInputRange can be done the usual:

    for (r = range; !r.empty; r.popFront()) { auto e = r.front; }

or can be done as:

    for (r = range; r.front != sentinel; r.popFront()) { auto e = r.front; }

This should work well with 0 terminated strings. The output of the lexer should also be a SentinelInputRange, with TOKeof as the sentinel.

On 2/9/2013 12:43 PM, Walter Bright wrote:> Perhaps we can formalize this a bit. Define a SentinelInputRange, which has an > additional manifest constant with the name 'sentinel'. empty becomes defined as: > > empty = front == sentinel; > > popFront() does not advance if empty. Advancing over a SentinelInputRange can be > done the usual: > > for (r = range; !r.empty; r.popFront()) { auto e = r.front; } > > or can be done as: > > for (r = range; r.front != sentinel; r.popFront()) { auto e = r.front; } If it's not clear, SentinelInputRange.front can be called *before* checking empty. If it is empty, then the sentinel is returned.

Forums