August 02, 2012
On 8/1/2012 10:35 AM, Walter Bright wrote:
> I suggest proposing the D lexer as an addition to Phobos. But if that is done,
> its interface would need to accept a range as input, and its output should be a
> range of tokens.


See the thread over in digitalmars.D about a proposed std.d.lexer.


August 02, 2012
On 8/1/2012 3:44 PM, Bernard Helyer wrote:
> I would be concerned with potential performance ramifications,
> though.

As well you should be. A poorly constructed range can have terrible performance.

But one thing to take careful note of: you *can* define a range that is nothing more than a pointer. Of course, you must be careful using such, because it won't be safe, but if performance overrides everything else, that option is available.

And best of all, one could still supply a safe range to the same algorithm code, without changing any of it.
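A minimal sketch of what Walter is describing (the struct name and the zero-terminated convention are illustrative choices, not anything in Phobos): a forward range whose only state is a raw pointer. It is fast precisely because nothing checks bounds, which is also why it is not @safe.

```d
import std.algorithm.comparison : equal;

// Illustrative only: a forward range that is nothing more than a pointer
// over a zero-terminated buffer. No bounds are checked, so it is not @safe.
struct PtrRange
{
    immutable(char)* p;

    @property bool empty() const { return *p == '\0'; }
    @property char front() const { return *p; }
    void popFront() { ++p; }
    @property PtrRange save() const { return this; }
}

void main()
{
    // D string literals are zero-terminated, so .ptr gives a valid buffer.
    auto r = PtrRange("abc".ptr);
    assert(equal(r, "abc"));
}
```

Any range-based algorithm that accepts a forward range will also accept this, which is the point made below: the same algorithm code can later be handed a safe range instead, unchanged.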

August 02, 2012
On 2012-08-01 22:20, Jonathan M Davis wrote:

> If you want really good performance out of a range-based solution operating on
> ranges of dchar, then you need to special case for the built-in string types
> all over the place, and if you have to wrap them in other range types
> (generally because of calling another range-based function), then there's a
> good chance that you will indeed get a performance hit. D's range-based
> approach is really nice from the perspective of usability, but you have to
> work at it a bit if you want it to be efficient when operating on strings. It
> _can_ be done though.

Is it really worth it, though? Won't most use cases just be with regular strings?

-- 
/Jacob Carlborg
August 02, 2012
On 2012-08-02 00:23, David wrote:
>> I think the best way here is to define a BufferedRange that takes any
>> other range and supplies a buffer for it (with the appropriate
>> primitives) in a native array.
>>
>> Andrei
>
> Don't you think, this range stuff is overdone? Define some fancy Range
> stuff, if an array just works perfectly?
>
> Ranges > Iterators, yes, but I think they are overdone.
>
>

I think so. Some parts of the community seem to be obsessed about ranges.

-- 
/Jacob Carlborg
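For what it's worth, the BufferedRange Andrei describes in the quote above might look something like this sketch (the name `Buffered` and the `window`/`consume` primitives are guesses at the shape, not his actual proposal):

```d
import std.range.primitives;

/// Illustrative sketch: wrap any input range and expose its elements
/// through a native array window, so hot loops can index a plain slice
/// instead of calling front/popFront once per element.
struct Buffered(R) if (isInputRange!R)
{
    private R source;
    private ElementType!R[] buf;
    private size_t capacity;

    this(R src, size_t cap)
    {
        source = src;
        capacity = cap;
        refill();
    }

    /// The current window as a plain array.
    @property ElementType!R[] window() { return buf; }

    /// Drop n consumed elements and pull more from the source.
    void consume(size_t n)
    {
        buf = buf[n .. $];
        refill();
    }

    private void refill()
    {
        while (buf.length < capacity && !source.empty)
        {
            buf ~= source.front;
            source.popFront();
        }
    }
}

auto buffered(R)(R r, size_t cap = 4096) { return Buffered!R(r, cap); }

void main()
{
    import std.range : iota;
    auto b = buffered(iota(0, 10), 4);
    assert(b.window == [0, 1, 2, 3]);
    b.consume(2);
    assert(b.window == [2, 3, 4, 5]);
}
```

This is the sense in which "an array just works perfectly" and ranges are not in conflict: the generic range feeds the buffer, and the performance-critical code sees a native array.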
August 02, 2012
On Thursday, August 02, 2012 08:18:39 Jacob Carlborg wrote:
> On 2012-08-01 22:20, Jonathan M Davis wrote:
> > If you want really good performance out of a range-based solution operating on ranges of dchar, then you need to special case for the built-in string types all over the place, and if you have to wrap them in other range types (generally because of calling another range-based function), then there's a good chance that you will indeed get a performance hit. D's range-based approach is really nice from the perspective of usability, but you have to work at it a bit if you want it to be efficient when operating on strings. It _can_ be done though.
> 
> Is it really worth it, though? Won't most use cases just be with regular strings?

It's really not all that hard to special case for strings, especially when you're operating primarily on code units. And I think that the lexer should be flexible enough to be usable with ranges other than strings. We're trying to make most stuff in Phobos range-based, not string-based or array-based.

- Jonathan M Davis
August 02, 2012
On 2012-08-02 08:26, Jonathan M Davis wrote:

> It's really not all that hard to special case for strings, especially when
> you're operating primarily on code units. And I think that the lexer should be
> flexible enough to be usable with ranges other than strings. We're trying to
> make most stuff in Phobos range-based, not string-based or array-based.

Ok. I just don't think it's worth giving up performance or making the design overly complicated just to provide a range interface. But if ranges don't cause these problems, I'm happy.

-- 
/Jacob Carlborg
August 02, 2012
On Thursday, August 02, 2012 08:51:26 Jacob Carlborg wrote:
> On 2012-08-02 08:26, Jonathan M Davis wrote:
> > It's really not all that hard to special case for strings, especially when you're operating primarily on code units. And I think that the lexer should be flexible enough to be usable with ranges other than strings. We're trying to make most stuff in Phobos range-based, not string-based or array-based.
> Ok. I just don't think it's worth giving up performance or making the design overly complicated just to provide a range interface. But if ranges don't cause these problems, I'm happy.

A range-based function operating on strings without special-casing them often _will_ harm performance. But if you special-case them for strings, then you can avoid that performance penalty - especially if you can avoid having to decode any characters.

The result is that using range-based functions on strings is generally correct without the function writer (or the caller) having to worry about encodings and the like, but if they want to eke out all of the performance that they can, they need to go to the extra effort of special-casing the function for strings. Like much of D, it favors correctness/safety but allows you to get full performance if you work at it a bit harder.

In the case of the lexer, it's really not all that bad - especially since string mixins allow me to give the operation that I need (e.g. get the first code unit) in the correct way for that particular range type without worrying about the details.

For instance, I have this function which I use to generate a mixin any time that I want to get the first code unit:

string declareFirst(R)()
    if(isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if(isNarrowString!R)
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        return "dchar first = range.front;";
}

So, every line using it becomes

mixin(declareFirst!R());

which really isn't any worse than

char c = str[0];

except that it works with more than just strings. Yes, it's more effort to get the lexer working with all ranges of dchar, but I don't think that it's all that much worse, and the result is much more flexible.

- Jonathan M Davis
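To make the pattern concrete, here is a sketch of a caller of that mixin (`skipWhite` is an illustrative name, not a function from the proposed lexer): it skips leading ASCII whitespace, working on raw code units for narrow strings and on decoded dchars for everything else.

```d
import std.range.primitives;
import std.traits : Unqual, isNarrowString;

// Jonathan's generator, as posted above.
string declareFirst(R)()
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    static if (isNarrowString!R)
        return "Unqual!(ElementEncodingType!R) first = range[0];";
    else
        return "dchar first = range.front;";
}

// Illustrative caller: for narrow strings `first` is a code unit and we
// advance by slicing; for other ranges it is a decoded dchar and we popFront.
R skipWhite(R)(R range)
    if (isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
{
    while (!range.empty)
    {
        mixin(declareFirst!R());  // declares `first` appropriately for R
        if (first != ' ' && first != '\t')
            break;
        static if (isNarrowString!R)
            range = range[1 .. $];
        else
            range.popFront();
    }
    return range;
}

void main()
{
    assert(skipWhite("  \thello") == "hello");
    assert(skipWhite("x") == "x");
}
```

Because the test against ' ' and '\t' only involves ASCII, the narrow-string path never decodes a single UTF-8 or UTF-16 sequence, which is where the special-casing pays off.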
August 02, 2012
On 2012-08-02 09:43, Jonathan M Davis wrote:

> For instance, I have this function which I use to generate a mixin any time
> that I want to get the first code unit:
>
> string declareFirst(R)()
>      if(isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
> {
>      static if(isNarrowString!R)
>          return "Unqual!(ElementEncodingType!R) first = range[0];";
>      else
>          return "dchar first = range.front;";
> }
>
> So, every line using it becomes
>
> mixin(declareFirst!R());
>
> which really isn't any worse than
>
> char c = str[0];
>
> except that it works with more than just strings. Yes, it's more effort to get
> the lexer working with all ranges of dchar, but I don't think that it's all
> that much worse, and the result is much more flexible.

That doesn't look too bad. I'm quite happy :)

-- 
/Jacob Carlborg
August 02, 2012
On 8/2/12 3:43 AM, Jonathan M Davis wrote:
> A range-based function operating on strings without special-casing them often
> _will_ harm performance. But if you special-case them for strings, then you
> can avoid that performance penalty - especially if you can avoid having to
> decode any characters.
>
> The result is that using range-based functions on strings is generally correct
> without the function writer (or the caller) having to worry about encodings
> and the like, but if they want to eke out all of the performance that they
> can, they need to go to the extra effort of special-casing the function for
> strings. Like much of D, it favors correctness/saftey but allows you to get
> full performance if you work at it a bit harder.
>
> In the case of the lexer, it's really not all that bad - especially since
> string mixins allow me to give the operation that I need (e.g. get the first
> code unit) in the correct way for that particular range type without worrying
> about the details.
>
> For instance, I have this function which I use to generate a mixin any time
> that I want to get the first code unit:
>
> string declareFirst(R)()
>      if(isForwardRange!R && is(Unqual!(ElementType!R) == dchar))
> {
>      static if(isNarrowString!R)
>          return "Unqual!(ElementEncodingType!R) first = range[0];";
>      else
>          return "dchar first = range.front;";
> }
>
> So, every line using it becomes
>
> mixin(declareFirst!R());
>
> which really isn't any worse than
>
> char c = str[0];
>
> except that it works with more than just strings. Yes, it's more effort to get
> the lexer working with all ranges of dchar, but I don't think that it's all
> that much worse, and the result is much more flexible.

I just posted in the .D forum a simple solution that is fast, uses general ranges, and avoids the whole hecatomb of code above.

Andrei

