View mode: basic / threaded / horizontal-split · Log in · Help
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/28/2013 7:41 AM, Dmitry Olshansky wrote:
> The merit of sentinel range IMHO is largely in tapping into C-strings and the
> like more naturally.

The merit is having significantly faster speed.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/28/2013 3:23 AM, deadalnix wrote:
> On Thursday, 28 February 2013 at 10:57:29 UTC, Walter Bright wrote:
>> Notice the double read of *p. This is what SentinelInputRange is trying to fix.
>>
>
> But it will be optimized away.

(c == 1 || c == 5) does not optimize to (c == 5).
February 28, 2013
Re: Proposal for SentinelInputRange
Walter Bright:

> The merit is having significantly faster speed.

Maybe it's time to write two benchmarks to compare the two 
different speeds (and run the two benchmarks with DMD and LDC).

Bye,
bearophile
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/28/2013 6:31 AM, Steven Schveighoffer wrote:
> If this doesn't translate to the same code, I don't know why not.

Try it and see with your favorite C compiler.

Then try the lookahead cases I also posted.
February 28, 2013
Re: Proposal for SentinelInputRange
28-Feb-2013 20:52, Walter Bright пишет:
> On 2/28/2013 3:29 AM, Jonathan M Davis wrote:
>> And you were just claiming that the lexer checked the sentinel type in
>> only
>> one place. If that's indeed the case (and I think that it's quite
>> close to
>> being true if it isn't true), then you _wouldn't_ need to use static
>> ifs like
>> this in many places. So, which is it? If you need to check the
>> sentinel often
>> enough that using static ifs is a problem, then it's probably not
>> buying you
>> much of anything over checking empty anyway.
>
> Please, again, examine lexer.c.
>
> For example, look at what happens after line 719. Try 794 in particular.
>
> Now look at line 996 and what follows. Note that there is not a single
> null check there, yet the code is correct and does not run off the end
> of the data.
>
>
>> But my point is that outside of strings or arrays, you're almost
>> certainly
>> stuck with that.
>
> I've given you two examples (lexer and regexp) where you are certainly
> not stuck with that, and those two cases matter.
>

I'll add that when I wrote current std.regex internal VM I've switched 
to the sentinel as terminator of the program instead of checking the 
length. The end result was about 5% of speed up gained back then 
(measured on DMD alone though, other compilers didn't compile it at the 
time).

In the lexer it may help a bit more given the lookahead.

-- 
Dmitry Olshansky
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/28/2013 7:40 AM, Steven Schveighoffer wrote:
> If you look at lexer.c, case 0 is the first test.

No, it is not. It is actually a table lookup - all done in parallel.

    jmp cases[character]
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 16:36:44 UTC, Walter Bright 
wrote:
> On 2/28/2013 6:48 AM, Ary Borenszweig wrote:
>> On 2/28/13 1:57 AM, Walter Bright wrote:
>>> On 2/27/2013 8:01 PM, John Colvin wrote:
>>>> Why must sentinel be known at compile time? I don't see 
>>>> what's in the
>>>> way of it
>>>> being a runtime argument.
>>>
>>> Performance!
>>>
>>> It should be usable as a case in a switch statement.
>>
>> Isn't it possible for the optimizer to inline the function 
>> call and then combine
>> the next ifs?
>>
>> if (isSentinel(value)) {
>> } else {
>>   switch(value) {
>>   case ...
>>   }
>> }
>
> 1. I don't know of any C compiler that would do that.
>

As mentioned, LLVM is able to do this kind of things. I have to 
admit I was quite amazed when I discovered it.

> 2. There are other cases, as I pointed out to deadalnix.
>
> 3. You still can't do lookahead without extra checks
>

You need the actual check to do proper generic D code. But the 
compiler is able to optimize it away via inlining and mecanism 
described in 1.

> Once again, guys, have a look at lexer.c at the lines I pointed 
> out.

I see nothing that can't be done by an optimizing compiler. Maybe 
something is, but I can't find it. And the example you pointed me 
aren't.
February 28, 2013
Re: Proposal for SentinelInputRange
28-Feb-2013 20:56, Walter Bright пишет:
> On 2/28/2013 7:41 AM, Dmitry Olshansky wrote:
>> The merit of sentinel range IMHO is largely in tapping into C-strings
>> and the
>> like more naturally.
>
> The merit is having significantly faster speed.
>

5% is good but doesn't count as significant. But that was for std.regex 
quite sometime ago. In the lexer it might get be more.

I can try right away with std.d.lexer by Brain Schott and get rid of the 
check for the length and place a null byte right after the file buffer. 
Then I should be able to estimate the gain.

-- 
Dmitry Olshansky
February 28, 2013
Re: Proposal for SentinelInputRange
On Thu, 28 Feb 2013 11:43:06 -0500, Walter Bright  
<newshound2@digitalmars.com> wrote:

> On 2/28/2013 6:59 AM, Andrei Alexandrescu wrote:
>> On 2/28/13 2:37 AM, deadalnix wrote:
>>> I don't see how defining a specific sentinel range here helps.
>>
>> On first blush I agree. It may as well be a range that by convention is
>> sentinel-terminated, and there's calls to front and popFront but never  
>> to empty.
>
>
> Consider the following code from lexer.c:
>
>     p++;
>     switch (*p)
>
> Written using an InputRange:
>
>     popFront();
>     switch (front)
>
> That code is INVALID. This is why a SentinelInputRange is necessary. You  
> can't just use an InputRange in an invalid manner by convention.

Does switch(*p) include a case for 0?  If so, wouldn't it be equivalent to  
say if(empty) /* do stuff that case 0 does */ else switch(front)

-Steve
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/28/13 11:54 AM, Walter Bright wrote:
> On 2/28/2013 7:31 AM, Andrei Alexandrescu wrote:
>> I don't think the sentinel input range is a blocker for redoing the
>> parser (with
>> ranges) in D. This discussion has probably run its course. The right
>> thing to do
>> at this point is port the lexer and figure what primitives are
>> necessary from
>> its input.
>
> I don't agree. It is a blocker for getting a fast lexer.

I don't think what I said was understood.

Andrei
5 6 7 8 9 10 11 12 13
Top | Discussion index | About this forum | D home