February 08, 2013
On 8 February 2013 16:41, Brian Schott <briancschott@gmail.com> wrote:

> On Friday, 8 February 2013 at 16:38:00 UTC, Iain Buclaw wrote:
>
>> I see we could be doing this all day. :þ
>>
>
> We could.
>
>
>  I'll lay down the hint, dmd compiles the source, not gcc. And although gcc
>> may be invoked during a certain special stage of compilation, it's actually just a driver to call a certain toolchain program that is outside of gcc. :-)
>>
>
> What we're saying is that dmd, The Digital Mars D Compiler, is written in C++ and is thus built by GCC/G++. We can tell by looking at the Makefile that DMD doesn't build itself.
>

That still has nothing to do with how dmd links D programs. ;)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


February 08, 2013
On Friday, February 08, 2013 14:29:20 Dmitry Olshansky wrote:
> If the target is a random-access range, just use an offset throughout. It basically becomes base + offset vs base + pointer, i.e. a non-issue

> If not, then the pointer argument no longer applies and you can just as well use a separate counter bumped on each popFront. It'd not be that costly in this case, and flexibility trumps other concerns with forward ranges in any case.

I don't know exactly how costly it ends up being (and certainly, poor design choices in other areas could dwarf that cost), but it does incur extra overhead throughout the lexing process. In most code, it wouldn't be a big deal, but the lexer is trying to be lightning fast, so every extra bit like that is going to add up and slow it down. But you really don't have any choice if you don't even have random access. Regardless, my point is that accepting generic ranges rather than pointers complicates things somewhat and incurs at least a slight performance penalty.
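
To make the difference concrete, here is a rough sketch of the two cases (the names are made up, not taken from the actual lexer code): with random access the offset falls out of the index for free, whereas a plain forward range forces a separate counter to be bumped on every popFront.

import std.range;

// Random-access case: the current offset is just the index, so position
// information is recovered "for free". (For a narrow string you would
// typically lex the raw code units, e.g. ubyte[] or via .representation.)
size_t skipWhitespace(R)(R input, size_t i)
    if (isRandomAccessRange!R && hasLength!R)
{
    while (i < input.length && (input[i] == ' ' || input[i] == '\t'))
        ++i;
    return i;
}

// Pure forward-range case: a separate counter has to be bumped on every
// popFront if the lexer wants to know how much it has consumed.
void skipWhitespace(R)(ref R input, ref size_t consumed)
    if (isForwardRange!R && !isRandomAccessRange!R)
{
    while (!input.empty && (input.front == ' ' || input.front == '\t'))
    {
        input.popFront();
        ++consumed;   // the extra per-popFront bookkeeping
    }
}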

- Jonathan M Davis
February 08, 2013
On 08-Feb-2013 23:25, Jonathan M Davis wrote:
> On Friday, February 08, 2013 14:29:20 Dmitry Olshansky wrote:
>> If the target is a random-access range, just use an offset throughout. It
>> basically becomes base + offset vs base + pointer, i.e. a non-issue
>
>> If not, then the pointer argument no longer applies and you can just as
>> well use a separate counter bumped on each popFront. It'd not be that costly
>> in this case, and flexibility trumps other concerns with forward ranges in any case.
>
> I don't know exactly how costly it ends up being (and certainly, poor design
> choices in other areas could dwarf that cost), but it does incur extra
> overhead throughout the lexing process.
> In most code, it wouldn't be a big
> deal, but the lexer is trying to be lightning fast, so every extra bit like
> that is going to add up and slow it down.

I suspect it *might* require an extra register for the base, depending on the smartness of the compiler; an extra register in turn could raise the cost.

The key to speed here, however, is not in using raw pointers, assembly, and/or SIMD. FWIW, if there is only a single base pointer + offsets, the compiler can assume no pointer aliasing and optimize things better.

The discussion becomes purely rhetorical unless some hard data is actually presented, and not data based on DMD's optimizer, please.

> But you really don't have any choice
> if you don't even have random access.

Agreed.

> Regardless, my point is that accepting
> generic ranges rather than pointers complicates things somewhat and does incur
> at least a slight performance penalty.
>

Complication - yes; the slight performance cost is what I doubt in the RA case. Seems like a compiler/optimizer issue.

For one, I used to write loops with naked pointers in C to get better performance out of crappy compilers. I won't ever do it today, as it would get worse performance in addition to being less readable (I checked this at least on VC++ about a year ago; the compiler rewrites index-based loops better, YMMV).
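
For illustration only (not a benchmark, and the function names are made up), here are the two styles side by side in D; a decent optimizer tends to turn both into essentially the same loop:

// Pointer-walking style, the old workaround for weak optimizers.
size_t countSpacesPtr(const(char)[] buf)
{
    size_t n;
    const(char)* p = buf.ptr;
    const(char)* end = buf.ptr + buf.length;
    for (; p < end; ++p)
        if (*p == ' ')
            ++n;
    return n;
}

// Index-based style: a single base pointer plus an offset.
size_t countSpacesIdx(const(char)[] buf)
{
    size_t n;
    foreach (i; 0 .. buf.length)
        if (buf[i] == ' ')
            ++n;
    return n;
}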


-- 
Dmitry Olshansky
February 08, 2013
On 08-Feb-2013 20:51, Iain Buclaw wrote:
> On 8 February 2013 16:41, Brian Schott <briancschott@gmail.com
> <mailto:briancschott@gmail.com>> wrote:
>
>     On Friday, 8 February 2013 at 16:38:00 UTC, Iain Buclaw wrote:
>
>         I see we could be doing this all day. :þ
>
>
>     We could.
>
>
>         I'll lay down the hint, dmd compiles the source, not gcc. And
>         although gcc may be invoked during a certain special stage of
>         compilation, it's actually just a driver to call a certain
>         toolchain program that is outside of gcc. :-)
>
>
>     What we're saying is that dmd, The Digital Mars D Compiler, is
>     written in C++ and is thus built by GCC/G++. We can tell by looking
>     at the Makefile that DMD doesn't build itself.
>
>
> That still has nothing to do with how dmd links D programs. ;)

We've been discussing DMD's lexer, written in C++, which is obviously compiled using the default compiler on the target platform. That's what is being measured: the good ol' C++ lexer vs the D lexer. I don't see how the way DMD does linking is related to what tool is used to actually build it.


-- 
Dmitry Olshansky
February 08, 2013
On 2013-02-08 17:35, Brian Schott wrote:
> GDC decided to randomly not fail to build on my system, so it's time for MOAR
> CHARTS.

Interesting. Seems that LDC is faster than GDC but has a bigger startup overhead.

OT: Why is the chart's legend in reverse order? :P
February 08, 2013
On Friday, February 08, 2013 23:50:13 Dmitry Olshansky wrote:
> Complication - yes; the slight performance cost is what I doubt in the RA case. Seems like a compiler/optimizer issue.

The main problem isn't RA ranges. It's quite probable that the optimizer takes care of any minor additional overhead that's incurred there. The issue is really pure forward ranges, because you're forced to count the code units as they're processed (or even save the range as you go along, given how expensive it would be to go find the index even if you have it). But we really have no way of knowing how bad it is without hard data. It would certainly be trivial to end up doing other stuff in the implementation that costs far more.
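
Roughly, the forward-range path ends up looking something like this (a sketch with made-up names, assuming std.range.takeExactly; not the actual implementation):

import std.range;

auto lexIdentifier(R)(ref R input)
    if (isForwardRange!R)
{
    auto start = input.save;   // remember where the token began
    size_t len;                // elements consumed so far
    while (!input.empty &&
           (input.front == '_' ||
            (input.front >= 'a' && input.front <= 'z')))
    {
        input.popFront();
        ++len;
    }
    // Rebuild the token lazily from the saved range and the count.
    return takeExactly(start, len);
}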

- Jonathan M Davis
February 08, 2013
On Friday, 8 February 2013 at 20:41:52 UTC, FG wrote:
> On 2013-02-08 17:35, Brian Schott wrote:
>> GDC decided to randomly not fail to build on my system, so it's time for MOAR
>> CHARTS.
>
> Interesting. Seems that LDC is faster than GDC but has a bigger startup overhead.
>
> OT: Why is the chart's legend in reverse order? :P

It seems like that, but the differences are small enough that you can't make reliable statements without having a closer look at the statistics.

David,
who dreams of a world in which every graph has (at least) error bars
February 09, 2013
On 08.02.2013 17:35, Brian Schott wrote:
> On Friday, 8 February 2013 at 15:35:55 UTC, Dmitry Olshansky
> wrote:
>> Would be interesting to try gdc, as dmd on linux uses gcc.
>
> GDC decided to randomly not fail to build on my system, so it's
> time for MOAR CHARTS.
>
> dmd -inline -noboundscheck -O -w -wi -m64 -property
> ldc2 -O2 -release -vectorize -m64
> gdc -O3 -fno-bounds-check -frelease -m64
> http://hackerpilot.github.com/experimental/std_lexer/images/times3-all.png
>

I still don't get what you're comparing here.

You're benchmarking something like a mix of code generation, startup times, and filesystem behavior + lexing time (+ lexing output style).

And this all only makes sense if your lexer produces the very same output as the dmd lexer (so it can be used out of the box as a replacement), to be comparable.

Or are you not yet in the phase of detailed benchmarking/profiling?
February 09, 2013
On Wednesday, 6 February 2013 at 03:51:33 UTC, Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte as input, and carry its own decoding. In the first approximation it may even require a random-access range of ubyte.
>

Playing around with that, I discovered a bug in std.utf: slices and other ranges are not treated the same way by decodeFront, which is rather problematic. Jonathan also hit that bug: http://d.puremagic.com/issues/show_bug.cgi?id=9456

That makes the lexer hard to write with consistent behavior for InputRanges.

The bug probably exists in everything that relies on decodeFront at some point.
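
A minimal way to see whether the two paths agree (just a test sketch: it assumes decodeFront's two-argument overload with an out parameter for the code unit count, and the CharRange wrapper is made up purely to have a non-slice range to compare against):

import std.range, std.stdio, std.utf;

// A trivial non-slice forward range over char, only for comparison.
struct CharRange
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property CharRange save() { return this; }
}

void main()
{
    auto s = "é tail";               // 'é' is two UTF-8 code units
    auto r = CharRange("é tail");

    size_t nSlice, nRange;
    dchar cSlice = decodeFront(s, nSlice);
    dchar cRange = decodeFront(r, nRange);

    // If slices and other ranges were treated the same way, these would match.
    writefln("slice: '%s', %s code units, %s left", cSlice, nSlice, s.length);
    writefln("range: '%s', %s code units, %s left", cRange, nRange, r.data.length);
}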
February 09, 2013
On Friday, 8 February 2013 at 20:54:32 UTC, Jonathan M Davis wrote:
> On Friday, February 08, 2013 23:50:13 Dmitry Olshansky wrote:
>> Complication - yes; the slight performance cost is what I doubt in the RA
>> case. Seems like a compiler/optimizer issue.
>
> The main problem isn't RA ranges. It's quite probable that the optimizer takes
> care of any minor additional overhead that's incurred there. The issue is
> really pure forward ranges, because you're forced to count the code units as
> they're processed (or even save the range as you go along, given how expensive
> it would be to go find the index even if you have it). But we really have no
> way of knowing how bad it is without hard data. It would certainly be trivial
> to end up doing other stuff in the implementation that costs far more.
>

For pure InputRanges, that is pretty bad, as the decoding of UTF chars basically has to be done twice and each code point popped individually, or the lexer has to embed its own homebrew version of std.utf.
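
Something along the lines of this sketch, for instance (CachingDecoder is a made-up name, and it still leans on std.utf.decodeFront rather than truly homebrew decoding): cache each decoded code point so front can be read repeatedly while the decode itself happens only once per code point.

import std.range, std.traits, std.utf;

// Wraps an input range of characters and caches each decoded code point,
// so front can be read any number of times while the decode itself is
// done only once per code point.
struct CachingDecoder(R)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    private R src;
    private dchar cached;
    private bool primed;

    this(R src) { this.src = src; advance(); }

    @property bool empty() const { return !primed; }
    @property dchar front() const { return cached; }
    void popFront() { advance(); }

    private void advance()
    {
        if (src.empty) { primed = false; return; }
        cached = decodeFront(src);   // decode exactly once
        primed = true;
    }
}

auto cachingDecoder(R)(R r) { return CachingDecoder!R(r); }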