View mode: basic / threaded / horizontal-split · Log in · Help
February 28, 2013
Re: Proposal for SentinelInputRange
28-Feb-2013 05:56, Walter Bright пишет:
> On 2/27/2013 5:53 PM, Zach the Mystic wrote:
>> What if more than one
>> value can end the range, EOF, '\0'?
>
> I have never seen a need for that.

Inside of the D lexer:

EOF is 0 or 0x1a :)

-- 
Dmitry Olshansky
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 10:02 PM, deadalnix wrote:
> On Thursday, 28 February 2013 at 05:28:47 UTC, Walter Bright wrote:
>> On 2/27/2013 9:20 PM, deadalnix wrote:
>>> If the range define empty with something like front == sentinel, the inliner
>>> should kick in a reduce the whole stuff to only one read, no ?
>>
>>     auto c = front;
>>     if (c == sentinel || c == XX)
>>
>> is two reads. This may not seem important, but when you want high speed, it
>> can halve it.
>
> Don't you have to check for both all the time ? You have to check for the
> sentinel anyway.

I suggest again taking a look at the dmd source lexer.c for how to do it. There 
is no extra check.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 11:02 PM, Dmitry Olshansky wrote:
> Inside of the D lexer:
>
> EOF is 0 or 0x1a :)

There is always a terminating 0, even if the file ends in 0x1a.

(The 0x1A comes from some old editors that end a file with a control Z.)
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 9:28 PM, Jonathan M Davis wrote:
> But you have to deal with D strings, not C strings if you're dealing with
> ranges. char* isn't a range. So, unless you're talking about wrapping a char*
> in a range, char* isn't going to work. And simply appending 0 to the end of a
> D string isn't enough, because isSentinelnputRange would fail, because
> std.array.empty doesn't match it. So, you need a wrapper even if it's only to
> pass the template constraint. That being the case, regardless of whether
> you're dealing with char* or string, you need a wrapper.

Again, please see how lexer.c works. I assure you, there is no double copying 
going on, nor is there a double test for the terminating 0.


> And what ranges other than strings can take advantage of anything like this?

I've mentioned a couple already in this thread, and I'll add another - an 
interpreter bytecode, you can see an example in the (former) std.regexp. Search 
for "REend".


> What are we gaining here

High speed processing.


>So, why not just special case strings or arrays in the few situations
> where something like this is needed, especially when it would be so easy to
> do?

Sentinels structure the code differently.
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 07:22:53 UTC, Walter Bright 
wrote:
> On 2/27/2013 10:02 PM, deadalnix wrote:
>> On Thursday, 28 February 2013 at 05:28:47 UTC, Walter Bright 
>> wrote:
>>> On 2/27/2013 9:20 PM, deadalnix wrote:
>>>> If the range define empty with something like front == 
>>>> sentinel, the inliner
>>>> should kick in a reduce the whole stuff to only one read, no 
>>>> ?
>>>
>>>    auto c = front;
>>>    if (c == sentinel || c == XX)
>>>
>>> is two reads. This may not seem important, but when you want 
>>> high speed, it
>>> can halve it.
>>
>> Don't you have to check for both all the time ? You have to 
>> check for the
>> sentinel anyway.
>
> I suggest again taking a look at the dmd source lexer.c for how 
> to do it. There is no extra check.

I did, in fact I know it for quite a while.

It does check for sentinel in many places (and it has to). How is 
it better than a empty property defined as front == sentinel ?

For instance,

if(!r.empty) {
   switch(r.front) {
       // cases
   }

   // somecode
} else {
    // error code
}

Will be optimized by adding a case for the sentinel in the switch 
(if dmd don't, then it has to be fixed, LLVM is definitively able 
to do stuff like that, and I didn't checked, but I'd bet that GCC 
can do it as well) in the given way :

switch(r.front) {
    case sentinel :
       // error code
       goto Label;

    // cases
}

// somecode

Label:

I don't see how defining a specific sentinel range here helps.
February 28, 2013
Re: Proposal for SentinelInputRange
On Wednesday, February 27, 2013 23:33:09 Walter Bright wrote:
> On 2/27/2013 9:28 PM, Jonathan M Davis wrote:
> > But you have to deal with D strings, not C strings if you're dealing with
> > ranges. char* isn't a range. So, unless you're talking about wrapping a
> > char* in a range, char* isn't going to work. And simply appending 0 to
> > the end of a D string isn't enough, because isSentinelnputRange would
> > fail, because std.array.empty doesn't match it. So, you need a wrapper
> > even if it's only to pass the template constraint. That being the case,
> > regardless of whether you're dealing with char* or string, you need a
> > wrapper.
> 
> Again, please see how lexer.c works. I assure you, there is no double
> copying going on, nor is there a double test for the terminating 0.

I know what the lexer does, and remember that it _doesn't_ operate on ranges, 
and there are subtle differences between being able to just use char* and 
trying to handle generic ranges.

And no, the lexer doesn't have a double test. The place you're going to be 
stuck with a double test is most any range which isn't a string, because such 
ranges won't have sentinel values, and there will be no way to add them (as 
you really can't append to ranges), and so they'll end up being wrapped in a 
SentinelRange which will have to check on each popFront whether the wrapped 
range is now empty making it so that front needs to be the sentinel value. And 
most any range which _was_ designed to have a sentinel value would have to be 
managing its own contents (because otherwise, it would just be back to 
wrapping a range and having to check empty), which likely means that it'll 
just be a thin wrapper around a string or array anyway.

Strings will still need to be wrapped, because they won't pass isSentinelRange 
otherwise, but they won't get any extra checks, because the wrapper can just 
check for 0 on the end and append it if it's not there.

> >So, why not just special case strings or arrays in the few situations
> >
> > where something like this is needed, especially when it would be so easy
> > to
> > do?
> 
> Sentinels structure the code differently.

Given how a lexer works (and I have been working on a lexer off and on 
recently), the only real difference is that you'd just use a couple of static 
ifs like

static if(!isSomeString!R)
{
   if(range.empty)
       break; //or whatever you do at the end
}

static if(isSomeString!R)
{
   case 0:
       break; //or whatever you do at the end
}

So, in the case of a lexer, I don't see sentinel ranges as buying us much. You 
end up having to wrap most any range that you pass to the lexer or whatever 
(including strings so that they'll pass isSentinelRange), you lose out on any 
optimizations of any functions that you call which special-case strings 
(though there probably wouldn't be many of those in a lexer), and all you 
avoid is a couple of static ifs.

The idea of sentinels certainly isn't useless, but anything caring about that 
sort of speed is likely to just use strings or arrays, and those can trivially 
be special cased to avoid unnecessary empty checks and to add the check for 
the sentinel, making the whole sentinel range idea an unnecessary complication 
IMHO.

- Jonathan M Davis
February 28, 2013
Re: Proposal for SentinelInputRange
On 2013-02-28 06:13, Walter Bright wrote:

> No. Look at how lexer.c is fed data.

C strings have a sentinel from the beginning, '\0'.

-- 
/Jacob Carlborg
February 28, 2013
Re: Proposal for SentinelInputRange
On 2013-02-28 08:23, Walter Bright wrote:

> There is always a terminating 0, even if the file ends in 0x1a.
>
> (The 0x1A comes from some old editors that end a file with a control Z.)

http://dlang.org/lex.html, "End of File" shows this:

EndOfFile:
    physical end of the file
    \u0000
    \u001A

Doesn't that mean that the input is empty if the file is empty, \u0000 
or \u001A is encountered?

-- 
/Jacob Carlborg
February 28, 2013
Re: Proposal for SentinelInputRange
On Thursday, 28 February 2013 at 01:12:19 UTC, Walter Bright 
wrote:
> Motivation:
> 1. easy conversion of C strings to ranges

This is already easy.

struct CString
{
    @property char front() { return *p; }
    void popFront() { ++p; }
    @property bool empty() const { return !*p; }
    char* p;
}

(Yep, rubbish implementation, missing save, constructors etc., 
but you get the idea).

> 2. necessary for a fast implementation of a lexer

This is very domain specific. Is a whole new range concept in the 
standard library necessary?

My worry here is that we are setting a precedent for adding new 
range concepts. If the only justification needed is that is saves 
a single operation in some niche area of computing then we are 
opening the door to a LOT of different range concepts.

Are there any other REAL uses for this other than in one line of 
a lexer implementation? I think inclusions into the standard 
library should require at least several distinct and realistic 
use cases.
February 28, 2013
Re: Proposal for SentinelInputRange
On 2/27/2013 11:37 PM, deadalnix wrote:
> It does check for sentinel in many places (and it has to).

No, it doesn't. I suggest examining line 481 of lexer.c. Then examine line 558 
and 559.

https://github.com/D-Programming-Language/dmd/blob/master/src/lexer.c
1 2 3 4 5 6 7 8
Top | Discussion index | About this forum | D home