View mode: basic / threaded / horizontal-split · Log in · Help
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 04:58:23 UTC, Walter Bright 
wrote:
> On 2/27/2013 8:01 PM, John Colvin wrote:
>> Why must sentinel be known at compile time? I don't see what's 
>> in the way of it
>> being a runtime argument.
>
> Performance!
>
> It should be usable as a case in a switch statement.

Little did you know about this bug: 
http://d.puremagic.com/issues/show_bug.cgi?id=5862

:)

But I guess this defeats the purpose of the switch statement 
(i.e. optimizations).

Btw can we get a clear opinion on whether 5862 is an actual bug 
that needs to be fixed?
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, February 28, 2013 05:53:13 Zach the Mystic wrote:
> My understanding of the logic of Sentinel Ranges so far is that
> switch statements and other control flow can proceed eagerly,
> because "go" values can be checked before the sentinel "stop"
> value, and "!empty" is known implicitly. I don't know exactly
> where the speed benefits of having a single "stop" value known at
> compile time come from.

Because the case value needs to be known at compile time, but even if it 
didn't have to be, it would be less efficient to have to call a function for it 
than it would be to have a value known at compile time.

> Is this design focused more on your knowledge of how the compiler
> optimizes machine code, or on something which can be grasped at a
> higher level?

I think that it's simply a matter of looking at how often empty has to be 
called in the lexer if you don't have a sentinel value. Most every time that 
you pop front, you'll have to check empty, whereas with a sentinel value, 
you'll almost never have to check it. That saves at least one operation for 
every character that's lexed.

Now, given that you're going to have to wrap any range in order for to be 
sentinel range (unless it was designed as one in the first place), in most 
cases, you're going to be stuck checking empty all the time anyway. For 
strings wrapped in a sentinel range, you can avoid it, because you can just 
make its last element 0 (though that may mean reallocating the entire string 
depending on whether you can append 0 to it without reallocating). But you're 
still stuck wrapping it.

I suspect that it would just be better to special case strings rather than to 
create this sentinel range idea, because I really don't see any benefit for it 
outside of strings, and it's not like it's hard to just use a couple of static 
ifs to avoid the check for empty and add the sentinel case when you're 
operating on strings.

- Jonathan M Davis
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 8:47 PM, Jonathan M Davis wrote:
> Now, the only real benefit that I see for this allowing you to make a string
> zero-terminated (which in the case of a lexer would probably mean copying the
> entire file into a new string which has zero on the end).

Nawp, there is no extra copy. Take a look at the compiler source. You have to 
read the file into memory anyway - so make the buffer one byte longer, and put a 
0 at the end.

> In general, you'll be
> forced to wrap a range in a sentinel range to get this behavior, which means
> that you're _still_ checking empty all the time, because it has to keep
> checking whether it's supposed to make front 0 now. And that probably means
> that it'll be slightly _more_ expensive to do this for anything other than a
> string. That being the case, it might be better to just special case strings
> rather than come up with this whole new range idea.

Nope, not necessary to wrap it.


> I'm also not at all covinced that this is generally useful. It may be that
> it's a great idea for lexers, but what other use cases are there?

Anything that walks a C string. Lots of cases for that. Sentinels are often used 
where high speed processing of data is desired. Google sentinel-terminated data 
for more examples.


> Also, I'd point out that even for strings, doing something like this means
> wrapping them, because their empty isn't defined in a manner which works with
> isSentinelRange.

For D strings, yes, for C strings, no need to wrap them.

> So, I'm inclined to believe that we'd be better off just special casing strings
> in any algorithms that can take advantage of this sort of thing than we would
> be creating this sentinel range idea.

0 terminated C strings are a classic case of this. Another case is a token 
stream ending with an EOF token.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 9:03 PM, Andrej Mitrovic wrote:
> But I guess this defeats the purpose of the switch statement (i.e. optimizations).

It certainly does defeat the performance of the switch statement, and hence the 
point of the sentinel.

> Btw can we get a clear opinion on whether 5862 is an actual bug that needs to be
> fixed?

I'd have to investigate.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 9:06 PM, Jonathan M Davis wrote:
> Now, given that you're going to have to wrap any range in order for to be
> sentinel range

No. Look at how lexer.c is fed data.
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 04:57:08 UTC, Walter Bright 
wrote:
> On 2/27/2013 8:53 PM, Zach the Mystic wrote:
>> My understanding of the logic of Sentinel Ranges so far is 
>> that switch
>> statements and other control flow can proceed eagerly, because 
>> "go" values can
>> be checked before the sentinel "stop" value, and "!empty" is 
>> known implicitly. I
>> don't know exactly where the speed benefits of having a single 
>> "stop" value
>> known at compile time come from.
>>
>> Is this design focused more on your knowledge of how the 
>> compiler optimizes
>> machine code, or on something which can be grasped at a higher 
>> level?
>
> Take a look at lexer.c.
>
> With an InputRange, reading a character from a 0 terminated 
> string requires two read operations. A SentinalInputRange 
> requires only one.

If the range define empty with something like front == sentinel, 
the inliner should kick in a reduce the whole stuff to only one 
read, no ?
February 28, 2013
Re: Proposal for SentinelInputRange
On Wednesday, February 27, 2013 21:10:15 Walter Bright wrote:
> > Also, I'd point out that even for strings, doing something like this means
> > wrapping them, because their empty isn't defined in a manner which works
> > with isSentinelRange.
> 
> For D strings, yes, for C strings, no need to wrap them.

But you have to deal with D strings, not C strings if you're dealing with 
ranges. char* isn't a range. So, unless you're talking about wrapping a char* 
in a range, char* isn't going to work. And simply appending 0 to the end of a 
D string isn't enough, because isSentinelnputRange would fail, because 
std.array.empty doesn't match it. So, you need a wrapper even if it's only to 
pass the template constraint. That being the case, regardless of whether 
you're dealing with char* or string, you need a wrapper.

And what ranges other than strings can take advantage of anything like this? 
There's no way to append a value to range other than chain, which means that 
you're forced to either construct ranges specifically with the idea that 
they'll be used as sentinel ranges (in which case, they're a wrapper around 
whatever they're using to hold the data internally - probably an array - and 
if you're doing that, why not just use an array in the first place?), or you're 
forced to wrap an existing range (which would then require it to check for 
empty on each popFront so that it can make its front be 0 when it's empty).

What are we gaining here that can't be gained by simply using a few static ifs 
to special case strings? And I don't see how this idea is gaining us much of 
anything outside of strings and maybe arrays - both of which have to be 
wrapped in order for their empty to act appropriately (even if they can avoid 
the extra empty checks internally by appending the sentinel value to their 
end). So, why not just special case strings or arrays in the few situations 
where something like this is needed, especially when it would be so easy to 
do?

- Jonathan M Davis
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 9:20 PM, deadalnix wrote:
> If the range define empty with something like front == sentinel, the inliner
> should kick in a reduce the whole stuff to only one read, no ?

    auto c = front;
    if (c == sentinel || c == XX)

is two reads. This may not seem important, but when you want high speed, it can 
halve it.
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 05:28:47 UTC, Walter Bright 
wrote:
> On 2/27/2013 9:20 PM, deadalnix wrote:
>> If the range define empty with something like front == 
>> sentinel, the inliner
>> should kick in a reduce the whole stuff to only one read, no ?
>
>     auto c = front;
>     if (c == sentinel || c == XX)
>
> is two reads. This may not seem important, but when you want 
> high speed, it can halve it.

Don't you have to check for both all the time ? You have to check 
for the sentinel anyway.
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 05:28:20 UTC, Jonathan M Davis 
wrote:
> What are we gaining here that can't be gained by simply using a 
> few static ifs to special case strings?

That's the question that is swaying me to your side at the 
moment. It's possible that the special performance which Sentinel 
Ranges offer is actually confined to a very narrow band of 
problems which can be dealt with transparently in the existing 
library. In which case the right thing to do is make an 
enhancement request asking for just such performance enhancing 
drugs to be administered. :-)
1 2 3 4 5 6 7
Top | Discussion index | About this forum | D home