February 28, 2013
On 2/28/2013 7:41 AM, Dmitry Olshansky wrote:
> The merit of sentinel range IMHO is largely in tapping into C-strings and the
> like more naturally.

The merit is having significantly faster speed.

February 28, 2013
On 2/28/2013 3:23 AM, deadalnix wrote:
> On Thursday, 28 February 2013 at 10:57:29 UTC, Walter Bright wrote:
>> Notice the double read of *p. This is what SentinelInputRange is trying to fix.
>>
>
> But it will be optimized away.

(c == 1 || c == 5) does not optimize to (c == 5).
February 28, 2013
Walter Bright:

> The merit is having significantly faster speed.

Maybe it's time to write two benchmarks to compare the two different speeds (and run the two benchmarks with DMD and LDC).

Bye,
bearophile
February 28, 2013
On 2/28/2013 6:31 AM, Steven Schveighoffer wrote:
> If this doesn't translate to the same code, I don't know why not.

Try it and see with your favorite C compiler.

Then try the lookahead cases I also posted.
February 28, 2013
28-Feb-2013 20:52, Walter Bright пишет:
> On 2/28/2013 3:29 AM, Jonathan M Davis wrote:
>> And you were just claiming that the lexer checked the sentinel type in
>> only
>> one place. If that's indeed the case (and I think that it's quite
>> close to
>> being true if it isn't true), then you _wouldn't_ need to use static
>> ifs like
>> this in many places. So, which is it? If you need to check the
>> sentinel often
>> enough that using static ifs is a problem, then it's probably not
>> buying you
>> much of anything over checking empty anyway.
>
> Please, again, examine lexer.c.
>
> For example, look at what happens after line 719. Try 794 in particular.
>
> Now look at line 996 and what follows. Note that there is not a single
> null check there, yet the code is correct and does not run off the end
> of the data.
>
>
>> But my point is that outside of strings or arrays, you're almost
>> certainly
>> stuck with that.
>
> I've given you two examples (lexer and regexp) where you are certainly
> not stuck with that, and those two cases matter.
>

I'll add that when I wrote current std.regex internal VM I've switched to the sentinel as terminator of the program instead of checking the length. The end result was about 5% of speed up gained back then (measured on DMD alone though, other compilers didn't compile it at the time).

In the lexer it may help a bit more given the lookahead.

-- 
Dmitry Olshansky
February 28, 2013
On 2/28/2013 7:40 AM, Steven Schveighoffer wrote:
> If you look at lexer.c, case 0 is the first test.

No, it is not. It is actually a table lookup - all done in parallel.

    jmp cases[character]

February 28, 2013
On Thursday, 28 February 2013 at 16:36:44 UTC, Walter Bright wrote:
> On 2/28/2013 6:48 AM, Ary Borenszweig wrote:
>> On 2/28/13 1:57 AM, Walter Bright wrote:
>>> On 2/27/2013 8:01 PM, John Colvin wrote:
>>>> Why must sentinel be known at compile time? I don't see what's in the
>>>> way of it
>>>> being a runtime argument.
>>>
>>> Performance!
>>>
>>> It should be usable as a case in a switch statement.
>>
>> Isn't it possible for the optimizer to inline the function call and then combine
>> the next ifs?
>>
>> if (isSentinel(value)) {
>> } else {
>>   switch(value) {
>>   case ...
>>   }
>> }
>
> 1. I don't know of any C compiler that would do that.
>

As mentioned, LLVM is able to do this kind of things. I have to admit I was quite amazed when I discovered it.

> 2. There are other cases, as I pointed out to deadalnix.
>
> 3. You still can't do lookahead without extra checks
>

You need the actual check to do proper generic D code. But the compiler is able to optimize it away via inlining and mecanism described in 1.

> Once again, guys, have a look at lexer.c at the lines I pointed out.

I see nothing that can't be done by an optimizing compiler. Maybe something is, but I can't find it. And the example you pointed me aren't.
February 28, 2013
28-Feb-2013 20:56, Walter Bright пишет:
> On 2/28/2013 7:41 AM, Dmitry Olshansky wrote:
>> The merit of sentinel range IMHO is largely in tapping into C-strings
>> and the
>> like more naturally.
>
> The merit is having significantly faster speed.
>

5% is good but doesn't count as significant. But that was for std.regex quite sometime ago. In the lexer it might get be more.

I can try right away with std.d.lexer by Brain Schott and get rid of the check for the length and place a null byte right after the file buffer. Then I should be able to estimate the gain.

-- 
Dmitry Olshansky
February 28, 2013
On Thu, 28 Feb 2013 11:43:06 -0500, Walter Bright <newshound2@digitalmars.com> wrote:

> On 2/28/2013 6:59 AM, Andrei Alexandrescu wrote:
>> On 2/28/13 2:37 AM, deadalnix wrote:
>>> I don't see how defining a specific sentinel range here helps.
>>
>> On first blush I agree. It may as well be a range that by convention is
>> sentinel-terminated, and there's calls to front and popFront but never to empty.
>
>
> Consider the following code from lexer.c:
>
>     p++;
>     switch (*p)
>
> Written using an InputRange:
>
>     popFront();
>     switch (front)
>
> That code is INVALID. This is why a SentinelInputRange is necessary. You can't just use an InputRange in an invalid manner by convention.

Does switch(*p) include a case for 0?  If so, wouldn't it be equivalent to say if(empty) /* do stuff that case 0 does */ else switch(front)

-Steve
February 28, 2013
On 2/28/13 11:54 AM, Walter Bright wrote:
> On 2/28/2013 7:31 AM, Andrei Alexandrescu wrote:
>> I don't think the sentinel input range is a blocker for redoing the
>> parser (with
>> ranges) in D. This discussion has probably run its course. The right
>> thing to do
>> at this point is port the lexer and figure what primitives are
>> necessary from
>> its input.
>
> I don't agree. It is a blocker for getting a fast lexer.

I don't think what I said was understood.

Andrei