February 08, 2013
On 8 February 2013 16:41, Brian Schott <briancschott@gmail.com> wrote:

> On Friday, 8 February 2013 at 16:38:00 UTC, Iain Buclaw wrote:
>
>> I see we could be doing this all day. :þ
>>
>
> We could.
>
>
>  I'll lay down the hint, dmd compiles the source, not gcc. And although gcc
>> may be invoked during a certain special stage of compilation, it's actually just a driver to call a certain toolchain program that is outside of gcc. :-)
>>
>
> What we're saying is that dmd, The Digital Mars D Compiler, is written in C++ and is thus built by GCC/G++. We can tell by looking at the Makefile that DMD doesn't build itself.
>

That still has nothing to do with how dmd links D programs. ;)


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


February 08, 2013
On Friday, February 08, 2013 14:29:20 Dmitry Olshansky wrote:
> If the target is a random-access range, just use an offset throughout. It basically becomes base + offset vs base + pointer, i.e. a non-issue

> If not, then the pointer argument no longer applies and you can just as well use a separate counter bumped on each popFront. It'd not be that costly in this case, and flexibility trumps other concerns with forward ranges in any case.

I don't know exactly how costly it ends up being (and certainly, poor design choices in other areas could dwarf that cost), but it does incur extra overhead throughout the lexing process. In most code, it wouldn't be a big deal, but the lexer is trying to be lightning fast, so every extra bit like that is going to add up and slow it down. But you really don't have any choice if you don't even have random access. Regardless, my point is that accepting generic ranges rather than pointers complicates things somewhat and incurs at least a slight performance penalty.
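
To make the difference concrete, here is a rough sketch of the two cases (the names are made up, not taken from the actual lexer code): with random access the offset falls out of the index for free, whereas a plain forward range forces a separate counter to be bumped on every popFront.

import std.range;

// Random-access case: the current offset is just the index, so position
// information is recovered "for free". (For a narrow string you would
// typically lex the raw code units, e.g. ubyte[] or via .representation.)
size_t skipWhitespace(R)(R input, size_t i)
    if (isRandomAccessRange!R && hasLength!R)
{
    while (i < input.length && (input[i] == ' ' || input[i] == '\t'))
        ++i;
    return i;
}

// Pure forward-range case: a separate counter has to be bumped on every
// popFront if the lexer wants to know how much it has consumed.
void skipWhitespace(R)(ref R input, ref size_t consumed)
    if (isForwardRange!R && !isRandomAccessRange!R)
{
    while (!input.empty && (input.front == ' ' || input.front == '\t'))
    {
        input.popFront();
        ++consumed;   // the extra per-popFront bookkeeping
    }
}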

- Jonathan M Davis
February 08, 2013
On 08-Feb-2013 23:25, Jonathan M Davis wrote:
> On Friday, February 08, 2013 14:29:20 Dmitry Olshansky wrote:
>> If the target is a random-access range, just use an offset throughout. It
>> basically becomes base + offset vs base + pointer, i.e. a non-issue
>
>> If not, then the pointer argument no longer applies and you can just as
>> well use a separate counter bumped on each popFront. It'd not be that costly
>> in this case, and flexibility trumps other concerns with forward ranges in any case.
>
> I don't know exactly how costly it ends up being (and certainly, poor design
> choices in other areas could dwarf that cost), but it does incur extra
> overhead throughout the lexing process.
> In most code, it wouldn't be a big
> deal, but the lexer is trying to be lightning fast, so every extra bit like
> that is going to add up and slow it down.

I suspect it *might* require an extra register for the base, depending on the smartness of the compiler; an extra register in turn could raise the cost.

The key to speed here, however, is not in using raw pointers, assembly, and/or SIMD. FWIW, if there is only a single base pointer + offsets, the compiler can assume no pointer aliasing and optimize things better.

The discussion becomes purely rhetorical unless some hard data is actually presented, and not data based on DMD's optimizer, please.

> But you really don't have any choice
> if you don't even have random access.

Agreed.

> Regardless, my point is that accepting
> generic ranges rather than pointers complicates things somewhat and does incur
> at least a slight performance penalty.
>

Complication - yes; the slight performance cost is what I doubt in the RA case. Seems like a compiler/optimizer issue.

For one, I used to write loops with naked pointers in C to get better performance out of crappy compilers. I won't ever do it today, as it would get worse performance in addition to being less readable (I checked this at least on VC++ about a year ago; the compiler rewrites index-based loops better, YMMV).
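
For illustration only (not a benchmark, and the function names are made up), here are the two styles side by side in D; a decent optimizer tends to turn both into essentially the same loop:

// Pointer-walking style, the old workaround for weak optimizers.
size_t countSpacesPtr(const(char)[] buf)
{
    size_t n;
    const(char)* p = buf.ptr;
    const(char)* end = buf.ptr + buf.length;
    for (; p < end; ++p)
        if (*p == ' ')
            ++n;
    return n;
}

// Index-based style: a single base pointer plus an offset.
size_t countSpacesIdx(const(char)[] buf)
{
    size_t n;
    foreach (i; 0 .. buf.length)
        if (buf[i] == ' ')
            ++n;
    return n;
}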


-- 
Dmitry Olshansky
February 08, 2013
On 08-Feb-2013 20:51, Iain Buclaw wrote:
> On 8 February 2013 16:41, Brian Schott <briancschott@gmail.com
> <mailto:briancschott@gmail.com>> wrote:
>
>     On Friday, 8 February 2013 at 16:38:00 UTC, Iain Buclaw wrote:
>
>         I see we could be doing this all day. :þ
>
>
>     We could.
>
>
>         I'll lay down the hint, dmd compiles the source, not gcc. And
>         although gcc may be invoked during a certain special stage of
>         compilation, it's actually just a driver to call a certain
>         toolchain program that is outside of gcc. :-)
>
>
>     What we're saying is that dmd, The Digital Mars D Compiler, is
>     written in C++ and is thus built by GCC/G++. We can tell by looking
>     at the Makefile that DMD doesn't build itself.
>
>
> That still has nothing to do with how dmd links D programs. ;)

We've been discussing DMD's lexer, written in C++, which is obviously compiled using the default compiler on the target platform. That's what is being measured: the good ol' C++ lexer vs the D lexer. I don't see how the way DMD does linking is related to what tool is used to actually build it.


-- 
Dmitry Olshansky
February 08, 2013
On 2013-02-08 17:35, Brian Schott wrote:
> GDC decided to randomly not fail to build on my system, so it's time for MOAR
> CHARTS.

Interesting. Seems that LDC is faster than GDC but has a bigger startup overhead.

OT: Why is the chart's legend in reverse order? :P
February 08, 2013
On Friday, February 08, 2013 23:50:13 Dmitry Olshansky wrote:
> Complication - yes; the slight performance cost is what I doubt in the RA case. Seems like a compiler/optimizer issue.

The main problem isn't RA ranges. It's quite probable that the optimizer takes care of any minor additional overhead that's incurred there. The issue is really pure forward ranges, because you're forced to count the code units as they're processed (or even save the range as you go along, given how expensive it would be to go find the index even if you have it). But we really have no way of knowing how bad it is without hard data. It would certainly be trivial to end up doing other stuff in the implementation that costs far more.
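
Roughly, the forward-range path ends up looking something like this (a sketch with made-up names, assuming std.range.takeExactly; not the actual implementation):

import std.range;

auto lexIdentifier(R)(ref R input)
    if (isForwardRange!R)
{
    auto start = input.save;   // remember where the token began
    size_t len;                // elements consumed so far
    while (!input.empty &&
           (input.front == '_' ||
            (input.front >= 'a' && input.front <= 'z')))
    {
        input.popFront();
        ++len;
    }
    // Rebuild the token lazily from the saved range and the count.
    return takeExactly(start, len);
}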

- Jonathan M Davis
February 08, 2013
On Friday, 8 February 2013 at 20:41:52 UTC, FG wrote:
> On 2013-02-08 17:35, Brian Schott wrote:
>> GDC decided to randomly not fail to build on my system, so it's time for MOAR
>> CHARTS.
>
> Interesting. Seems that LDC is faster than GDC but has a bigger startup overhead.
>
> OT: Why is the chart's legend in reverse order? :P

It seems like that, but the differences are small enough that you can't make reliable statements without having a closer look at the statistics.

David,
who dreams of a world in which every graph has (at least) error bars
February 09, 2013
On 08.02.2013 17:35, Brian Schott wrote:
> On Friday, 8 February 2013 at 15:35:55 UTC, Dmitry Olshansky
> wrote:
>> Would be interesting to try gdc, as dmd on linux uses gcc.
>
> GDC decided to randomly not fail to build on my system, so it's
> time for MOAR CHARTS.
>
> dmd -inline -noboundscheck -O -w -wi -m64 -property
> ldc2 -O2 -release -vectorize -m64
> gdc -O3 -fno-bounds-check -frelease -m64
> http://hackerpilot.github.com/experimental/std_lexer/images/times3-all.png
>

I still don't get what you're comparing here.

You're benchmarking something like a mix of code generation, startup times, and filesystem behavior + lexing time (+ lexing output style).

And this all only makes sense if your lexer produces the very same output as the dmd lexer (so it can be used out of the box as a replacement), to be comparable.

Or are you not yet in the phase of detailed benchmarking/profiling?
February 09, 2013
On Wednesday, 6 February 2013 at 03:51:33 UTC, Andrei Alexandrescu wrote:
> I think it would be reasonable for a lexer to require a range of ubyte as input, and carry its own decoding. In the first approximation it may even require a random-access range of ubyte.
>

Playing around with that, I discovered a bug in std.utf: slices and other ranges are not treated the same way by decodeFront, which is rather problematic. Jonathan also hit that bug: http://d.puremagic.com/issues/show_bug.cgi?id=9456

That makes the lexer hard to write with consistent behavior for InputRanges.

The bug probably exists in everything that relies on decodeFront at some point.
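
A minimal way to see whether the two paths agree (just a test sketch: it assumes decodeFront's two-argument overload with an out parameter for the code unit count, and the CharRange wrapper is made up purely to have a non-slice range to compare against):

import std.range, std.stdio, std.utf;

// A trivial non-slice forward range over char, only for comparison.
struct CharRange
{
    string data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
    @property CharRange save() { return this; }
}

void main()
{
    auto s = "é tail";               // 'é' is two UTF-8 code units
    auto r = CharRange("é tail");

    size_t nSlice, nRange;
    dchar cSlice = decodeFront(s, nSlice);
    dchar cRange = decodeFront(r, nRange);

    // If slices and other ranges were treated the same way, these would match.
    writefln("slice: '%s', %s code units, %s left", cSlice, nSlice, s.length);
    writefln("range: '%s', %s code units, %s left", cRange, nRange, r.data.length);
}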
February 09, 2013
On Friday, 8 February 2013 at 20:54:32 UTC, Jonathan M Davis wrote:
> On Friday, February 08, 2013 23:50:13 Dmitry Olshansky wrote:
>> Complication - yes; the slight performance cost is what I doubt in the RA
>> case. Seems like a compiler/optimizer issue.
>
> The main problem isn't RA ranges. It's quite probable that the optimizer takes
> care of any minor additional overhead that's incurred there. The issue is
> really pure forward ranges, because you're forced to count the code units as
> they're processed (or even save the range as you go along, given how expensive
> it would be to go find the index even if you have it). But we really have no
> way of knowing how bad it is without hard data. It would certainly be trivial
> to end up doing other stuff in the implementation that costs far more.
>

For pure InputRanges, that is pretty bad, as the decoding of UTF chars basically has to be done twice and each code point popped individually, or the lexer has to embed its own homebrew version of std.utf.
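
Something along the lines of this sketch, for instance (CachingDecoder is a made-up name, and it still leans on std.utf.decodeFront rather than truly homebrew decoding): cache each decoded code point so front can be read repeatedly while the decode itself happens only once per code point.

import std.range, std.traits, std.utf;

// Wraps an input range of characters and caches each decoded code point,
// so front can be read any number of times while the decode itself is
// done only once per code point.
struct CachingDecoder(R)
    if (isInputRange!R && isSomeChar!(ElementType!R))
{
    private R src;
    private dchar cached;
    private bool primed;

    this(R src) { this.src = src; advance(); }

    @property bool empty() const { return !primed; }
    @property dchar front() const { return cached; }
    void popFront() { advance(); }

    private void advance()
    {
        if (src.empty) { primed = false; return; }
        cached = decodeFront(src);   // decode exactly once
        primed = true;
    }
}

auto cachingDecoder(R)(R r) { return CachingDecoder!R(r); }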