View mode: basic / threaded / horizontal-split · Log in · Help
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 6:01 PM, Zach the Mystic wrote:
> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright wrote:
>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>> What if more than one
>>> value can end the range, EOF, '\0'?
>>
>> I have never seen a need for that.
>
> Do you mean that you've never seen software that uses that, or that you've never
> been convinced that such software couldn't be made simpler?

Pretty much both. For example, often a foreach loop will exit either when it 
runs out of data or some other condition is met. I don't see a need to work both 
of those conditions into the definition of a range.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 6:05 PM, timotheecour wrote:
> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright wrote:
>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>> What if more than one
>>> value can end the range, EOF, '\0'?
>>
>> I have never seen a need for that.
>
> how about a predicate isSentinel instead of a fixed sentinel value?
> That'd allow more flexibility such as more than one sentinel value.

A fixed sentinel value makes it clear that it should be a compile time value for 
performance reasons. If it was a function, then there's really not much 
advantage to having a SentinelInputRange.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 6:10 PM, timotheecour wrote:
>>> how about a predicate isSentinel instead of a fixed sentinel value?
>>> That'd allow more flexibility such as more than one sentinel value.
>>
>> Not usable in a switch/case statement.
>
> why is that needed?

Look at the lexer.c source code.

tl,dr: PERFORMANCE!
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright 
wrote:
> A SentinelInputRange is an InputRange with the following 
> additions:
>
> 1. a compile time property 'sentinel' that is the terminating 
> value of the range
> 2. empty is defined as: empty = (front == sentinel)
> 3. it is not necessary for empty to be called before front
>
> A C style 0-terminated string is an example of a 
> SentinelInputRange.
>
> The additions to std.range would be:
>
> 1. isSentinelInputRange(T) which returns true if T is a 
> SentinelInputRange
> 2. a unittest
> 3. documentation of this
>
> An addition to std.string would be a function that takes a 
> char* and returns a SentinelInputRange.
>
> Motivation:
> 1. easy conversion of C strings to ranges
> 2. necessary for a fast implementation of a lexer
>
> Any takers?

Why must sentinel be known at compile time? I don't see what's in 
the way of it being a runtime argument.
February 28, 2013
Re: Proposal for SentinelInputRange
Am Thu, 28 Feb 2013 05:01:39 +0100
schrieb "John Colvin" <john.loughran.colvin@gmail.com>:

> On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright 
> wrote:
> > A SentinelInputRange is an InputRange with the following 
> > additions:
> >
> > 1. a compile time property 'sentinel' that is the terminating 
> > value of the range
> > 2. empty is defined as: empty = (front == sentinel)
> > 3. it is not necessary for empty to be called before front
> >
> > A C style 0-terminated string is an example of a 
> > SentinelInputRange.
> >
> > The additions to std.range would be:
> >
> > 1. isSentinelInputRange(T) which returns true if T is a 
> > SentinelInputRange
> > 2. a unittest
> > 3. documentation of this
> >
> > An addition to std.string would be a function that takes a 
> > char* and returns a SentinelInputRange.
> >
> > Motivation:
> > 1. easy conversion of C strings to ranges
> > 2. necessary for a fast implementation of a lexer
> >
> > Any takers?
> 
> Why must sentinel be known at compile time? I don't see what's in 
> the way of it being a runtime argument.

Possibly because a 0 check can be reduced to a simple bit
test on the CPU (and you need to do that often and possibly in
several source code locations when parsing long strings). A
runtime value means a memory read and comparison every time!
(For simple loops the compiler will keep the value in a
register, but that's not generally the case.)

-- 
Marco
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright 
wrote:
> A SentinelInputRange is an InputRange with the following 
> additions:
>
> 1. a compile time property 'sentinel' that is the terminating 
> value of the range
> 2. empty is defined as: empty = (front == sentinel)
> 3. it is not necessary for empty to be called before front
>
> A C style 0-terminated string is an example of a 
> SentinelInputRange.
>
> The additions to std.range would be:
>
> 1. isSentinelInputRange(T) which returns true if T is a 
> SentinelInputRange
> 2. a unittest
> 3. documentation of this
>
> An addition to std.string would be a function that takes a 
> char* and returns a SentinelInputRange.
>
> Motivation:
> 1. easy conversion of C strings to ranges
> 2. necessary for a fast implementation of a lexer
>
> Any takers?

Can you explain what it does buy us ? Is it faster to check for 
the sentinel value that to check for emptiness ?

Is it really handy in generic code ? For instance, if I use a 
switch, the actual sentinel value may be conflicting with other 
element in the switch. How does this kind of problem are solved ? 
Do they are problems in the first place ?
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, February 28, 2013 05:24:57 deadalnix wrote:
> Can you explain what it does buy us ? Is it faster to check for
> the sentinel value that to check for emptiness ?
> 
> Is it really handy in generic code ? For instance, if I use a
> switch, the actual sentinel value may be conflicting with other
> element in the switch. How does this kind of problem are solved ?
> Do they are problems in the first place ?

In the case of a lexer, it allows you to skip checking for empty most of the 
time. You have something like

switch(front)
{
   case '+': ...
   case '>': ...
    ...
   case 0: //do whatever you do at the end
}

So, as far as the switch statement goes, you don't have to worry about whether 
the range is empty or not. The case for 0 takes care of it and will only get 
checked if it actually happens, whereas without the sentinel value, you have 
to check empty every time you pop front.

Now, the only real benefit that I see for this allowing you to make a string 
zero-terminated (which in the case of a lexer would probably mean copying the 
entire file into a new string which has zero on the end). In general, you'll be 
forced to wrap a range in a sentinel range to get this behavior, which means 
that you're _still_ checking empty all the time, because it has to keep 
checking whether it's supposed to make front 0 now. And that probably means 
that it'll be slightly _more_ expensive to do this for anything other than a 
string. That being the case, it might be better to just special case strings 
rather than come up with this whole new range idea.

I'm also not at all covinced that this is generally useful. It may be that 
it's a great idea for lexers, but what other use cases are there? I don't know 
that there isn't one, but I can't think of any. And the _only_ time that it's 
of any real benefit is definitely going to be when ranges are built with this 
idea in the first place rather than wrapping them, which almost guarantees that 
you're just dealing with arrays, which you can always just special case if you 
need this level of efficiency.

Also, I'd point out that even for strings, doing something like this means 
wrapping them, because their empty isn't defined in a manner which works with 
isSentinelRange. Unlike most other cases, the wrapper wouldn't have to keep 
checking for empty (it could just guarantee that the string it was given ended 
with 0 and then say that it's length was one less than the underlying string), 
but you're still dealing with a wrapper and relying on the compiler to 
optimize that cost away. Also, that totally screws with passing the range to 
any functions which special case strings. Those functions would now also have 
to special case the wrapper type if you wanted the extra efficiency that you get 
from the function understanding that it's a string rather than a range of code 
units or code points (depending on which the sentinel range is claiming it is 
- probably code units in the case of the lexer).

So, I'm inclined to believe that we'd be better off just special casing strings 
in any algorithms that can take advantage of this sort of thing than we would 
be creating this sentinel range idea.

- Jonathan M Davis
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 03:37:25 UTC, Walter Bright 
wrote:
> On 2/27/2013 6:01 PM, Zach the Mystic wrote:
>> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright 
>> wrote:
>>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>>> What if more than one
>>>> value can end the range, EOF, '\0'?
>>>
>>> I have never seen a need for that.
>>
>> Do you mean that you've never seen software that uses that, or 
>> that you've never
>> been convinced that such software couldn't be made simpler?
>
> Pretty much both. For example, often a foreach loop will exit 
> either when it runs out of data or some other condition is met. 
> I don't see a need to work both of those conditions into the 
> definition of a range.

My understanding of the logic of Sentinel Ranges so far is that 
switch statements and other control flow can proceed eagerly, 
because "go" values can be checked before the sentinel "stop" 
value, and "!empty" is known implicitly. I don't know exactly 
where the speed benefits of having a single "stop" value known at 
compile time come from.

Is this design focused more on your knowledge of how the compiler 
optimizes machine code, or on something which can be grasped at a 
higher level?
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 8:53 PM, Zach the Mystic wrote:
> My understanding of the logic of Sentinel Ranges so far is that switch
> statements and other control flow can proceed eagerly, because "go" values can
> be checked before the sentinel "stop" value, and "!empty" is known implicitly. I
> don't know exactly where the speed benefits of having a single "stop" value
> known at compile time come from.
>
> Is this design focused more on your knowledge of how the compiler optimizes
> machine code, or on something which can be grasped at a higher level?

Take a look at lexer.c.

With an InputRange, reading a character from a 0 terminated string requires two 
read operations. A SentinalInputRange requires only one.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 8:01 PM, John Colvin wrote:
> Why must sentinel be known at compile time? I don't see what's in the way of it
> being a runtime argument.

Performance!

It should be usable as a case in a switch statement.
1 2 3 4 5 6
Top | Discussion index | About this forum | D home