August 07, 2012
Re: std.d.lexer requirements
On 8/7/2012 1:14 AM, Jonathan M Davis wrote:
> But you can also configure the lexer to return an error token instead of using
> the delegate if that's what you prefer. But Walter is right in that if you
> have to check every token for whether it's an error, that will incur overhead.
> So, depending on your use case, that could be unacceptable.

It's not just overhead - it's just plain ugly to constantly check for error 
tokens. It's also tedious and error prone to insert those checks.

I don't see any advantage to it.
August 07, 2012
Re: std.d.lexer requirements
On Tuesday, August 07, 2012 02:54:42 Walter Bright wrote:
> On 8/7/2012 1:14 AM, Jonathan M Davis wrote:
> > But you can also configure the lexer to return an error token instead of
> > using the delegate if that's what you prefer. But Walter is right in that
> > if you have to check every token for whether it's an error, that will
> > incur overhead. So, depending on your use case, that could be
> > unacceptable.
> 
> It's not just overhead - it's just plain ugly to constantly check for error
> tokens. It's also tedious and error prone to insert those checks.
> 
> I don't see any advantage to it.

It's easier to see where in the range of tokens the errors occur. A delegate 
is disconnected from the point where the range is being consumed, whereas if 
tokens are used for errors, then the function consuming the range can see 
exactly where in the range of tokens the error is (and potentially handle it 
differently based on that information).

Regardless, I was asked to keep that option in there by at least one person 
(Philippe Sigaud IIRC), which is why I didn't just switch over to the delegate 
entirely.

- Jonathan M Davis
August 07, 2012
Re: std.d.lexer requirements
Walter Bright wrote in message (digitalmars.D:174393):
> If the delegate returns, then the lexer recovers.

That's an option, if there is only one way to recover (which is a 
reasonable assumption).

You wanted the delegate to "decide what to do with the errors (ignore, 
throw exception, or quit)".

Throwing is handled, but not ignore/quit. Jonathan's solution (delegate 
returning a bool) is good. It could also be a delegate returning an int, 
0 meaning continue, and any other value being an error code that can be 
retrieved later. It could also be a number of characters to skip (0 
meaning break).
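
To make the last variant concrete, here is a minimal sketch of a handler that returns the number of code units to skip, with 0 meaning "stop". The `ErrorHandler` alias, the `countWords` toy lexer and its rules (ASCII letters and spaces only) are assumptions made up for illustration, not part of any proposed std.d.lexer API.

```d
// Hypothetical handler signature: receives a message and the offset of the
// error, returns how many code units to skip (0 means stop lexing).
alias ErrorHandler = size_t delegate(string msg, size_t offset);

// Toy "lexer": counts words made of ASCII letters separated by spaces.
// Any other character is reported through onError.
size_t countWords(string input, ErrorHandler onError)
{
    size_t words = 0;
    bool inWord = false;
    size_t i = 0;
    while (i < input.length)
    {
        immutable c = input[i];
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        {
            if (!inWord) { ++words; inWord = true; }
            ++i;
        }
        else if (c == ' ')
        {
            inWord = false;
            ++i;
        }
        else
        {
            immutable skip = onError("unexpected character", i);
            if (skip == 0)
                break;      // handler asked to quit
            inWord = false;
            i += skip;      // handler asked to recover by skipping
        }
    }
    return words;
}

void main()
{
    size_t errors = 0;

    // "Ignore" policy: count the error, skip one code unit, keep going.
    auto n = countWords("ab #cd ef",
        delegate size_t(string msg, size_t off) { ++errors; return 1; });
    assert(n == 3 && errors == 1);

    // "Quit" policy: stop at the first error.
    auto m = countWords("ab #cd ef",
        delegate size_t(string msg, size_t off) { return 0; });
    assert(m == 1);
}
```

A handler that wants the third behaviour simply throws instead of returning.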
August 07, 2012
Re: std.d.lexer requirements
Walter Bright wrote in message (digitalmars.D:174394):
> On 8/7/2012 1:14 AM, Jonathan M Davis wrote:
>> But you can also configure the lexer to return an error token instead of using
>> the delegate if that's what you prefer. But Walter is right in that if you
>> have to check every token for whether it's an error, that will incur overhead.
>> So, depending on your use case, that could be unacceptable.
> 
> It's not just overhead - it's just plain ugly to constantly check for error 
> tokens. It's also tedious and error prone to insert those checks.

It's not necessarily ugly, because of the powerful range design. You can 
chain the lexer to a range adapter that just ignores error tokens, or 
throws when it meets an error token.

For example, just use:
auto tokens = data.lexer.throwOnErrorToken;

I don't think this is more ugly than:
auto tokens = data.lexer!(complex signature) { throw LexException; };

But yes, there is overhead, so I understand returning error tokens is 
not satisfactory for everyone.
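
As a rough sketch of what such adapters might look like on top of std.algorithm: the `Token`, `TokenKind`, `LexException`, `throwOnErrorToken` and `skipErrorTokens` names below are hypothetical and only stand in for whatever the real lexer would provide.

```d
import std.algorithm : filter, map;
import std.array : array;
import std.exception : assertThrown;

enum TokenKind { identifier, number, error }

struct Token
{
    TokenKind kind;
    string text;
}

class LexException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Adapter that passes tokens through and throws on the first error token.
auto throwOnErrorToken(R)(R tokens)
{
    return tokens.map!((Token t) {
        if (t.kind == TokenKind.error)
            throw new LexException("bad token: " ~ t.text);
        return t;
    });
}

// Adapter that silently drops error tokens.
auto skipErrorTokens(R)(R tokens)
{
    return tokens.filter!(t => t.kind != TokenKind.error);
}

void main()
{
    // Stand-in for the output of a lexer that emits error tokens.
    auto toks = [Token(TokenKind.identifier, "x"),
                 Token(TokenKind.error, "#"),
                 Token(TokenKind.number, "42")];

    assert(toks.skipErrorTokens.array.length == 2);
    assertThrown!LexException(toks.throwOnErrorToken.array);
}
```

Both adapters are lazy, so the check happens as the consumer pulls tokens, which is exactly where the per-token overhead mentioned above comes from.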

> I don't see any advantage to it.

Storing the error somewhere can be of use.
For example, you may want to lex a whole file into an array of tokens, 
and then deal with your errors as you analyse the array of tokens. 
Of course, you can always make a delegate to store the error somewhere, 
but it is easier if this somewhere is in your token pile.

What I don't see any advantage in is a delegate that can only return 
or throw. A policy does the job:
auto tokens = data.lexer!ExceptionPolicy.throwException;
That's clean too.

If you want the delegate to be of any use, then it must have 
data to process. That's why I said we have to worry about the 
signature of the delegate.

-- 
Christophe
August 07, 2012
Re: std.d.lexer requirements
On Tue, Aug 7, 2012 at 12:06 PM, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> Regardless, I was asked to keep that option in there by at least one person
> (Philippe Sigaud IIRC), which is why I didn't just switch over to the delegate
> entirely.

IIRC, I was not the only one, as people here interested in coding an
IDE asked for it too. A lexer is useful for more than 'just' parsing D
afterwards: an IDE could easily color tokens according to their type
and an error token is just what is needed to highlight errors.

Also, what I proposed was a *static* decision: with SkipErrors { no,
yes }. With a static if inside its guts, the lexer could change its
behavior accordingly. Make SkipErrors.yes the default and Walter gets
his speed. It's just that an IDE or another parser could use

auto lex = std.lexer.Lexer!(SkipErrors.no)(input);
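
A minimal sketch of how that compile-time switch could work, assuming a toy lexer over space-separated chunks where only all-lowercase chunks are valid. Everything here (`Lexer`, `Token`, `isWord`, the chunk rule) is made up for illustration; only the SkipErrors { no, yes } idea comes from the proposal.

```d
import std.algorithm : all;
import std.array : array, split;

enum SkipErrors { no, yes }
enum TokenKind { word, error }

struct Token
{
    TokenKind kind;
    string text;
}

// Toy validity rule: a chunk is a word if it is non-empty, lowercase ASCII.
bool isWord(string s)
{
    return s.length > 0 && s.all!(c => c >= 'a' && c <= 'z');
}

struct Lexer(SkipErrors skip = SkipErrors.yes)
{
    private string[] chunks;

    this(string input)
    {
        chunks = input.split(" ");
        static if (skip == SkipErrors.yes)
            dropErrors();
    }

    static if (skip == SkipErrors.yes)
    {
        // Only compiled into the skipping variant.
        private void dropErrors()
        {
            while (chunks.length && !isWord(chunks[0]))
                chunks = chunks[1 .. $];
        }
    }

    @property bool empty() { return chunks.length == 0; }

    @property Token front()
    {
        static if (skip == SkipErrors.yes)
            return Token(TokenKind.word, chunks[0]);   // errors already gone
        else
            return Token(isWord(chunks[0]) ? TokenKind.word : TokenKind.error,
                         chunks[0]);
    }

    void popFront()
    {
        chunks = chunks[1 .. $];
        static if (skip == SkipErrors.yes)
            dropErrors();
    }
}

void main()
{
    // Default variant: errors never reach the consumer.
    auto fast = Lexer!(SkipErrors.yes)("ab X1 cd").array;
    assert(fast.length == 2);

    // IDE-style variant: errors show up as tokens in the stream.
    auto full = Lexer!(SkipErrors.no)("ab X1 cd").array;
    assert(full.length == 3 && full[1].kind == TokenKind.error);
}
```

The two instantiations are distinct types, so a consumer of the skipping variant never has to test for TokenKind.error at run time.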


Walter, with all due respect, you sometimes give the impression of
forgetting that we are talking about D and going back to deeply entrenched C-isms.
Compile-time decisions can be used to avoid any overhead as long as
you have a clear idea of what the two code paths should look like.

And, as Christophe said, ranges are a powerful API. In another thread
Simen and I did some comparisons between C-like code and code using
only ranges upon ranges upon ranges. A (limited!) difference in speed
appeared only for very long calculations.
August 07, 2012
Re: std.d.lexer requirements
On 2012-08-07 12:06, Jonathan M Davis wrote:

> It's easier to see where in the range of tokens the errors occur. A delegate
> is disconnected from the point where the range is being consumed, whereas if
> tokens are used for errors, then the function consuming the range can see
> exactly where in the range of tokens the error is (and potentially handle it
> differently based on that information).

Just pass the same token to the delegate that you would have returned 
otherwise?

-- 
/Jacob Carlborg
August 07, 2012
Re: std.d.lexer requirements
On 8/7/2012 3:06 AM, Jonathan M Davis wrote:
> It's easier to see where in the range of tokens the errors occur. A delegate
> is disconnected from the point where the range is being consumed, whereas if
> tokens are used for errors, then the function consuming the range can see
> exactly where in the range of tokens the error is (and potentially handle it
> differently based on that information).

The delegate has a context pointer giving it a reference to whatever context the 
code calling the Lexer needs.
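
A small sketch of that point, under made-up names: the consumer keeps its own position counter, and because the error handler is a delegate bound to the consumer, it can read that counter when the lexer reports an error. `nextToken`, `Consumer` and the lowercase-only rule are assumptions for illustration.

```d
import std.algorithm : all;

bool isWord(string s)
{
    return s.length > 0 && s.all!(c => c >= 'a' && c <= 'z');
}

// Toy lexer step: returns the next valid chunk, or null when the input is
// exhausted. Invalid chunks are reported through onError and then skipped.
string nextToken(ref string[] input, void delegate(string msg) onError)
{
    while (input.length)
    {
        auto chunk = input[0];
        input = input[1 .. $];
        if (isWord(chunk))
            return chunk;
        onError("invalid token: " ~ chunk);
    }
    return null;
}

struct Consumer
{
    size_t tokensSeen;          // how far the consumer has got
    size_t[] errorPositions;    // token index at which each error occurred

    // Taking &c.onError yields a delegate whose context pointer is `c`,
    // so the handler sees the consumer's current state.
    void onError(string msg)
    {
        errorPositions ~= tokensSeen;
    }
}

void main()
{
    auto input = ["ab", "X1", "cd", "??"];
    Consumer c;
    while (nextToken(input, &c.onError) !is null)
        ++c.tokensSeen;

    assert(c.tokensSeen == 2);
    assert(c.errorPositions.length == 2);
    assert(c.errorPositions[0] == 1 && c.errorPositions[1] == 2);
}
```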
August 07, 2012
Re: std.d.lexer requirements
On 8/7/2012 7:15 AM, Philippe Sigaud wrote:
> Also, what I proposed was a *static* decision: with SkipErrors { no,
> yes }. With a static if inside its guts, the lexer could change its
> behavior accordingly.

Yes, I understand about static if decisions :-) hell I invented them!


> Walter, with all due respect, you sometimes give the impression of
> forgetting that we are talking about D and going back to deeply entrenched C-isms.

Delegates are not C-isms.


> Compile-time decisions can be used to avoid any overhead as long as
> you have a clear idea of what the two code paths should look like.

Yes, I understand that. There's also a point about adding too much complexity to 
the interface. The delegate callback reduces complexity in the interface.

> And, as Christophe said, ranges are a powerful API. In another thread
> Simen and I did some comparisons between C-like code and code using
> only ranges upon ranges upon ranges. A (limited!) difference in speed
> appeared only for very long calculations.

That's good, and you really don't need to sell me on ranges - I'm already sold.
August 07, 2012
Re: std.d.lexer requirements
On Tue, Aug 7, 2012 at 9:38 PM, Walter Bright
<newshound2@digitalmars.com> wrote:

> Yes, I understand about static if decisions :-) hell I invented them!

And what a wonderful decision that was!

> Yes, I understand that. There's also a point about adding too much
> complexity to the interface. The delegate callback reduces complexity in the
> interface.

OK, then let's let Jonathan work, and we will see how it goes.


>> And, as Christophe said, ranges are a powerful API. In another thread
>> Simen and I did some comparisons between C-like code and code using
>> only ranges upon ranges upon ranges. A (limited!) difference in speed
>> appeared only for very long calculations.
>
>
> That's good, and you really don't need to sell me on ranges - I'm already
> sold.

Well, you gave the impression a bit upstream in this thread that
having to filter a token range to eliminate errors was an atrocity
(millions of tokens!).

As far as I'm concerned, the recent good news was to (re?)discover
that complex calls of ranges upon ranges could still be calculated by
CTFE. That's really neat.
August 07, 2012
Re: std.d.lexer requirements
On Tuesday, August 07, 2012 12:38:26 Walter Bright wrote:
> Yes, I understand that. There's also a point about adding too much
> complexity to the interface. The delegate callback reduces complexity in
> the interface.

Allowing a choice between returning a token and using a delegate doesn't 
really complicate things much, especially if ignoring errors is treated as a 
separate option rather than simply using a delegate that skips them (which may 
or may not be beneficial - it's faster without the delegate, but it's actually 
kind of hard to get lexing errors).

What worries me more is stuff like providing a way to have the range calculate 
the current position itself (as Christophe suggested IIRC) or having it 
provide an efficient way to determine the number of code units between two 
ranges so that you can slice the range lexed to put in the Token. Determining 
the number of code units is easily done with ptr for strings, but for 
everything else, you generally have to count as code units are consumed, which 
isn't really an issue for small tokens (especially those like symbols where 
the length is known without counting) but does add up for arbitrarily long 
ones such as comments or string literals. So, providing a way to calculate it 
more efficiently where possible might be desirable, but it's yet another layer 
of complication, and I don't know that it's actually possible to provide such 
a function in enough situations for it to be worth providing that 
functionality.
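
For the string case, the ptr trick can be sketched as follows. `consumed` is a hypothetical helper, and the hand-rolled loop only stands in for whatever the lexer does while scanning a string literal.

```d
// Number of code units between the start of `before` and the start of
// `after`, assuming `after` is a tail slice of `before`. For arrays this is
// plain pointer arithmetic, so nothing has to be counted while lexing.
size_t consumed(const(char)[] before, const(char)[] after)
{
    assert(after.ptr >= before.ptr
        && after.ptr <= before.ptr + before.length);
    return cast(size_t)(after.ptr - before.ptr);
}

void main()
{
    string src = `"hello" rest`;

    auto r = src[1 .. $];               // skip the opening quote
    while (r.length && r[0] != '"')
        r = r[1 .. $];                  // consume without counting
    r = r[1 .. $];                      // skip the closing quote

    // Slice the original input to get the token text.
    auto token = src[0 .. consumed(src, r)];
    assert(token == `"hello"`);
}
```

A range that is not an array has no ptr to subtract, so the lexer would have to keep a running count in its consume loop instead, which is the cost described above for long comments and string literals.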

I expect that the configuration stuff is going to have to be adjusted after I'm 
done, since I'm not sure that it's entirely clear what's worth configuring or 
not.

- Jonathan M Davis