Proposal for SentinelInputRange (page 2) - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » General » Proposal for SentinelInputRange (page 2)

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Walter Bright
in reply to Zach the Mystic

Walter Bright

Posted in reply to Zach the Mystic

On 2/27/2013 6:01 PM, Zach the Mystic wrote:
> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright wrote:
>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>> What if more than one
>>> value can end the range, EOF, '\0'?
>>
>> I have never seen a need for that.
>
> Do you mean that you've never seen software that uses that, or that you've never
> been convinced that such software couldn't be made simpler?

Pretty much both. For example, often a foreach loop will exit either when it runs out of data or some other condition is met. I don't see a need to work both of those conditions into the definition of a range.

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Walter Bright
in reply to timotheecour

Walter Bright

Posted in reply to timotheecour

On 2/27/2013 6:05 PM, timotheecour wrote:
> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright wrote:
>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>> What if more than one
>>> value can end the range, EOF, '\0'?
>>
>> I have never seen a need for that.
>
> how about a predicate isSentinel instead of a fixed sentinel value?
> That'd allow more flexibility such as more than one sentinel value.

A fixed sentinel value makes it clear that it should be a compile time value for performance reasons. If it was a function, then there's really not much advantage to having a SentinelInputRange.

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Walter Bright
in reply to timotheecour

Walter Bright

Posted in reply to timotheecour

On 2/27/2013 6:10 PM, timotheecour wrote:
>>> how about a predicate isSentinel instead of a fixed sentinel value?
>>> That'd allow more flexibility such as more than one sentinel value.
>>
>> Not usable in a switch/case statement.
>
> why is that needed?

Look at the lexer.c source code.

tl,dr: PERFORMANCE!

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by John Colvin
in reply to Walter Bright

John Colvin

Posted in reply to Walter Bright

On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright wrote:
> A SentinelInputRange is an InputRange with the following additions:
>
> 1. a compile time property 'sentinel' that is the terminating value of the range
> 2. empty is defined as: empty = (front == sentinel)
> 3. it is not necessary for empty to be called before front
>
> A C style 0-terminated string is an example of a SentinelInputRange.
>
> The additions to std.range would be:
>
> 1. isSentinelInputRange(T) which returns true if T is a SentinelInputRange
> 2. a unittest
> 3. documentation of this
>
> An addition to std.string would be a function that takes a char* and returns a SentinelInputRange.
>
> Motivation:
> 1. easy conversion of C strings to ranges
> 2. necessary for a fast implementation of a lexer
>
> Any takers?

Why must sentinel be known at compile time? I don't see what's in the way of it being a runtime argument.

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Marco Leise
in reply to John Colvin

Marco Leise

Posted in reply to John Colvin

Am Thu, 28 Feb 2013 05:01:39 +0100
schrieb "John Colvin" <john.loughran.colvin@gmail.com>:

> On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright wrote:
> > A SentinelInputRange is an InputRange with the following additions:
> >
> > 1. a compile time property 'sentinel' that is the terminating
> > value of the range
> > 2. empty is defined as: empty = (front == sentinel)
> > 3. it is not necessary for empty to be called before front
> >
> > A C style 0-terminated string is an example of a SentinelInputRange.
> >
> > The additions to std.range would be:
> >
> > 1. isSentinelInputRange(T) which returns true if T is a
> > SentinelInputRange
> > 2. a unittest
> > 3. documentation of this
> >
> > An addition to std.string would be a function that takes a char* and returns a SentinelInputRange.
> >
> > Motivation:
> > 1. easy conversion of C strings to ranges
> > 2. necessary for a fast implementation of a lexer
> >
> > Any takers?
> 
> Why must sentinel be known at compile time? I don't see what's in the way of it being a runtime argument.

Possibly because a 0 check can be reduced to a simple bit test on the CPU (and you need to do that often and possibly in several source code locations when parsing long strings). A runtime value means a memory read and comparison every time! (For simple loops the compiler will keep the value in a register, but that's not generally the case.)

-- 
Marco

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by deadalnix
in reply to Walter Bright

deadalnix

Posted in reply to Walter Bright

On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright wrote:
> A SentinelInputRange is an InputRange with the following additions:
>
> 1. a compile time property 'sentinel' that is the terminating value of the range
> 2. empty is defined as: empty = (front == sentinel)
> 3. it is not necessary for empty to be called before front
>
> A C style 0-terminated string is an example of a SentinelInputRange.
>
> The additions to std.range would be:
>
> 1. isSentinelInputRange(T) which returns true if T is a SentinelInputRange
> 2. a unittest
> 3. documentation of this
>
> An addition to std.string would be a function that takes a char* and returns a SentinelInputRange.
>
> Motivation:
> 1. easy conversion of C strings to ranges
> 2. necessary for a fast implementation of a lexer
>
> Any takers?

Can you explain what it does buy us ? Is it faster to check for the sentinel value that to check for emptiness ?

Is it really handy in generic code ? For instance, if I use a switch, the actual sentinel value may be conflicting with other element in the switch. How does this kind of problem are solved ? Do they are problems in the first place ?

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Jonathan M Davis
in reply to deadalnix

Jonathan M Davis

Posted in reply to deadalnix

On Thursday, February 28, 2013 05:24:57 deadalnix wrote:
> Can you explain what it does buy us ? Is it faster to check for the sentinel value that to check for emptiness ?
> 
> Is it really handy in generic code ? For instance, if I use a switch, the actual sentinel value may be conflicting with other element in the switch. How does this kind of problem are solved ? Do they are problems in the first place ?

In the case of a lexer, it allows you to skip checking for empty most of the time. You have something like

switch(front)
{
    case '+': ...
    case '>': ...
     ...
    case 0: //do whatever you do at the end
}

So, as far as the switch statement goes, you don't have to worry about whether the range is empty or not. The case for 0 takes care of it and will only get checked if it actually happens, whereas without the sentinel value, you have to check empty every time you pop front.

Now, the only real benefit that I see for this allowing you to make a string zero-terminated (which in the case of a lexer would probably mean copying the entire file into a new string which has zero on the end). In general, you'll be forced to wrap a range in a sentinel range to get this behavior, which means that you're _still_ checking empty all the time, because it has to keep checking whether it's supposed to make front 0 now. And that probably means that it'll be slightly _more_ expensive to do this for anything other than a string. That being the case, it might be better to just special case strings rather than come up with this whole new range idea.

I'm also not at all covinced that this is generally useful. It may be that it's a great idea for lexers, but what other use cases are there? I don't know that there isn't one, but I can't think of any. And the _only_ time that it's of any real benefit is definitely going to be when ranges are built with this idea in the first place rather than wrapping them, which almost guarantees that you're just dealing with arrays, which you can always just special case if you need this level of efficiency.

Also, I'd point out that even for strings, doing something like this means wrapping them, because their empty isn't defined in a manner which works with isSentinelRange. Unlike most other cases, the wrapper wouldn't have to keep checking for empty (it could just guarantee that the string it was given ended with 0 and then say that it's length was one less than the underlying string), but you're still dealing with a wrapper and relying on the compiler to optimize that cost away. Also, that totally screws with passing the range to any functions which special case strings. Those functions would now also have to special case the wrapper type if you wanted the extra efficiency that you get from the function understanding that it's a string rather than a range of code units or code points (depending on which the sentinel range is claiming it is - probably code units in the case of the lexer).

So, I'm inclined to believe that we'd be better off just special casing strings in any algorithms that can take advantage of this sort of thing than we would be creating this sentinel range idea.

- Jonathan M Davis

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Zach the Mystic
in reply to Walter Bright

Zach the Mystic

Posted in reply to Walter Bright

On Thursday, 28 February 2013 at 03:37:25 UTC, Walter Bright wrote:
> On 2/27/2013 6:01 PM, Zach the Mystic wrote:
>> On Thursday, 28 February 2013 at 01:56:33 UTC, Walter Bright wrote:
>>> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>>>> What if more than one
>>>> value can end the range, EOF, '\0'?
>>>
>>> I have never seen a need for that.
>>
>> Do you mean that you've never seen software that uses that, or that you've never
>> been convinced that such software couldn't be made simpler?
>
> Pretty much both. For example, often a foreach loop will exit either when it runs out of data or some other condition is met. I don't see a need to work both of those conditions into the definition of a range.

My understanding of the logic of Sentinel Ranges so far is that switch statements and other control flow can proceed eagerly, because "go" values can be checked before the sentinel "stop" value, and "!empty" is known implicitly. I don't know exactly where the speed benefits of having a single "stop" value known at compile time come from.

Is this design focused more on your knowledge of how the compiler optimizes machine code, or on something which can be grasped at a higher level?

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Walter Bright
in reply to Zach the Mystic

Walter Bright

Posted in reply to Zach the Mystic

On 2/27/2013 8:53 PM, Zach the Mystic wrote:
> My understanding of the logic of Sentinel Ranges so far is that switch
> statements and other control flow can proceed eagerly, because "go" values can
> be checked before the sentinel "stop" value, and "!empty" is known implicitly. I
> don't know exactly where the speed benefits of having a single "stop" value
> known at compile time come from.
>
> Is this design focused more on your knowledge of how the compiler optimizes
> machine code, or on something which can be grasped at a higher level?

Take a look at lexer.c.

With an InputRange, reading a character from a 0 terminated string requires two read operations. A SentinalInputRange requires only one.

February 28, 2013

Re: Proposal for SentinelInputRange

Posted by Walter Bright
in reply to John Colvin

Walter Bright

Posted in reply to John Colvin

On 2/27/2013 8:01 PM, John Colvin wrote:
> Why must sentinel be known at compile time? I don't see what's in the way of it
> being a runtime argument.

Performance!

It should be usable as a case in a switch statement.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation