February 28, 2013
On Thursday, 28 February 2013 at 04:58:23 UTC, Walter Bright wrote:
> On 2/27/2013 8:01 PM, John Colvin wrote:
>> Why must sentinel be known at compile time? I don't see what's in the way of it
>> being a runtime argument.
>
> Performance!
>
> It should be usable as a case in a switch statement.

Little did you know about this bug: http://d.puremagic.com/issues/show_bug.cgi?id=5862

:)

But I guess this defeats the purpose of the switch statement (i.e. optimizations).

Btw can we get a clear opinion on whether 5862 is an actual bug that needs to be fixed?
February 28, 2013
On Thursday, February 28, 2013 05:53:13 Zach the Mystic wrote:
> My understanding of the logic of Sentinel Ranges so far is that switch statements and other control flow can proceed eagerly, because "go" values can be checked before the sentinel "stop" value, and "!empty" is known implicitly. I don't know exactly where the speed benefits of having a single "stop" value known at compile time come from.

Because the case value needs to be known at compile time, but even if it didn't have to be, it would be less efficient to have to call a function for it than it would be to have a value known at compile time.

> Is this design focused more on your knowledge of how the compiler optimizes machine code, or on something which can be grasped at a higher level?

I think that it's simply a matter of looking at how often empty has to be called in the lexer if you don't have a sentinel value. Most every time that you pop front, you'll have to check empty, whereas with a sentinel value, you'll almost never have to check it. That saves at least one operation for every character that's lexed.

Now, given that you're going to have to wrap any range in order for to be sentinel range (unless it was designed as one in the first place), in most cases, you're going to be stuck checking empty all the time anyway. For strings wrapped in a sentinel range, you can avoid it, because you can just make its last element 0 (though that may mean reallocating the entire string depending on whether you can append 0 to it without reallocating). But you're still stuck wrapping it.

I suspect that it would just be better to special case strings rather than to create this sentinel range idea, because I really don't see any benefit for it outside of strings, and it's not like it's hard to just use a couple of static ifs to avoid the check for empty and add the sentinel case when you're operating on strings.

- Jonathan M Davis
February 28, 2013
On 2/27/2013 8:47 PM, Jonathan M Davis wrote:
> Now, the only real benefit that I see for this allowing you to make a string
> zero-terminated (which in the case of a lexer would probably mean copying the
> entire file into a new string which has zero on the end).

Nawp, there is no extra copy. Take a look at the compiler source. You have to read the file into memory anyway - so make the buffer one byte longer, and put a 0 at the end.

> In general, you'll be
> forced to wrap a range in a sentinel range to get this behavior, which means
> that you're _still_ checking empty all the time, because it has to keep
> checking whether it's supposed to make front 0 now. And that probably means
> that it'll be slightly _more_ expensive to do this for anything other than a
> string. That being the case, it might be better to just special case strings
> rather than come up with this whole new range idea.

Nope, not necessary to wrap it.


> I'm also not at all covinced that this is generally useful. It may be that
> it's a great idea for lexers, but what other use cases are there?

Anything that walks a C string. Lots of cases for that. Sentinels are often used where high speed processing of data is desired. Google sentinel-terminated data for more examples.


> Also, I'd point out that even for strings, doing something like this means
> wrapping them, because their empty isn't defined in a manner which works with
> isSentinelRange.

For D strings, yes, for C strings, no need to wrap them.

> So, I'm inclined to believe that we'd be better off just special casing strings
> in any algorithms that can take advantage of this sort of thing than we would
> be creating this sentinel range idea.

0 terminated C strings are a classic case of this. Another case is a token stream ending with an EOF token.

February 28, 2013
On 2/27/2013 9:03 PM, Andrej Mitrovic wrote:
> But I guess this defeats the purpose of the switch statement (i.e. optimizations).

It certainly does defeat the performance of the switch statement, and hence the point of the sentinel.

> Btw can we get a clear opinion on whether 5862 is an actual bug that needs to be
> fixed?

I'd have to investigate.
February 28, 2013
On 2/27/2013 9:06 PM, Jonathan M Davis wrote:
> Now, given that you're going to have to wrap any range in order for to be
> sentinel range

No. Look at how lexer.c is fed data.
February 28, 2013
On Thursday, 28 February 2013 at 04:57:08 UTC, Walter Bright wrote:
> On 2/27/2013 8:53 PM, Zach the Mystic wrote:
>> My understanding of the logic of Sentinel Ranges so far is that switch
>> statements and other control flow can proceed eagerly, because "go" values can
>> be checked before the sentinel "stop" value, and "!empty" is known implicitly. I
>> don't know exactly where the speed benefits of having a single "stop" value
>> known at compile time come from.
>>
>> Is this design focused more on your knowledge of how the compiler optimizes
>> machine code, or on something which can be grasped at a higher level?
>
> Take a look at lexer.c.
>
> With an InputRange, reading a character from a 0 terminated string requires two read operations. A SentinalInputRange requires only one.

If the range define empty with something like front == sentinel, the inliner should kick in a reduce the whole stuff to only one read, no ?
February 28, 2013
On Wednesday, February 27, 2013 21:10:15 Walter Bright wrote:
> > Also, I'd point out that even for strings, doing something like this means wrapping them, because their empty isn't defined in a manner which works with isSentinelRange.
> 
> For D strings, yes, for C strings, no need to wrap them.

But you have to deal with D strings, not C strings if you're dealing with ranges. char* isn't a range. So, unless you're talking about wrapping a char* in a range, char* isn't going to work. And simply appending 0 to the end of a D string isn't enough, because isSentinelnputRange would fail, because std.array.empty doesn't match it. So, you need a wrapper even if it's only to pass the template constraint. That being the case, regardless of whether you're dealing with char* or string, you need a wrapper.

And what ranges other than strings can take advantage of anything like this? There's no way to append a value to range other than chain, which means that you're forced to either construct ranges specifically with the idea that they'll be used as sentinel ranges (in which case, they're a wrapper around whatever they're using to hold the data internally - probably an array - and if you're doing that, why not just use an array in the first place?), or you're forced to wrap an existing range (which would then require it to check for empty on each popFront so that it can make its front be 0 when it's empty).

What are we gaining here that can't be gained by simply using a few static ifs to special case strings? And I don't see how this idea is gaining us much of anything outside of strings and maybe arrays - both of which have to be wrapped in order for their empty to act appropriately (even if they can avoid the extra empty checks internally by appending the sentinel value to their end). So, why not just special case strings or arrays in the few situations where something like this is needed, especially when it would be so easy to do?

- Jonathan M Davis
February 28, 2013
On 2/27/2013 9:20 PM, deadalnix wrote:
> If the range define empty with something like front == sentinel, the inliner
> should kick in a reduce the whole stuff to only one read, no ?

    auto c = front;
    if (c == sentinel || c == XX)

is two reads. This may not seem important, but when you want high speed, it can halve it.
February 28, 2013
On Thursday, 28 February 2013 at 05:28:47 UTC, Walter Bright wrote:
> On 2/27/2013 9:20 PM, deadalnix wrote:
>> If the range define empty with something like front == sentinel, the inliner
>> should kick in a reduce the whole stuff to only one read, no ?
>
>     auto c = front;
>     if (c == sentinel || c == XX)
>
> is two reads. This may not seem important, but when you want high speed, it can halve it.

Don't you have to check for both all the time ? You have to check for the sentinel anyway.
February 28, 2013
On Thursday, 28 February 2013 at 05:28:20 UTC, Jonathan M Davis wrote:
> What are we gaining here that can't be gained by simply using a few static ifs to special case strings?

That's the question that is swaying me to your side at the moment. It's possible that the special performance which Sentinel Ranges offer is actually confined to a very narrow band of problems which can be dealt with transparently in the existing library. In which case the right thing to do is make an enhancement request asking for just such performance enhancing drugs to be administered. :-)