April 09, 2015
On 08.04.2015 at 20:55, Iain Buclaw via Digitalmars-d wrote:
> On 8 April 2015 at 20:32, tcha via Digitalmars-d
> <digitalmars-d@puremagic.com> wrote:
>> (...)
>> Also tried to dustmite the minimal failing version and here is a result:
>> http://pastebin.com/YjdvT3G4
>>
>> It's my first use of it, so I hope it can help solve this problem. It
>> outputs fewer errors, but it also compiles fine in debug and fails to
>> link in release.

I've filed two tickets for linker errors (and added a workaround for the first one):
https://issues.dlang.org/show_bug.cgi?id=14425
https://issues.dlang.org/show_bug.cgi?id=14429

I'll also try to reduce the pastebin sample further, as it looks like it has yet another root cause.

>
> Frankly, if we are not as fast (or elegant) as Python's json library,
> it should be thrown out back to the drawing board.
>
> Iain.
>

As far as the profiler results can be trusted, a good chunk of the time is spent reading individual bytes from memory, but there must be something else low-level going on that makes things this bad. However, there is nothing fundamental in the structure/design that would cause this, so I think spending more time with the profiler is the only logical next step. Unfortunately, my VTune license has expired, and perf on Linux makes the task quite a bit more involved.

If we want to be really fast, though, we need to add optimized SIMD paths, but this is currently outside my time budget.
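For reference, the core of such a SIMD path could look roughly like this; a hedged C sketch (names and structure hypothetical, not taken from std_data_json), using SSE2 to test 16 bytes of whitespace at a time:

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical SIMD fast path for skipping JSON whitespace: compare 16
   bytes at once against ' ', '\t', '\r', '\n' and fall back to a scalar
   loop at the first non-whitespace byte (or for the remaining tail). */
static size_t skip_ws_simd(const char *p, size_t len)
{
    size_t i = 0;
    const __m128i sp = _mm_set1_epi8(' ');
    const __m128i tb = _mm_set1_epi8('\t');
    const __m128i cr = _mm_set1_epi8('\r');
    const __m128i nl = _mm_set1_epi8('\n');
    while (i + 16 <= len) {
        __m128i c  = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i ws = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(c, sp), _mm_cmpeq_epi8(c, tb)),
            _mm_or_si128(_mm_cmpeq_epi8(c, cr), _mm_cmpeq_epi8(c, nl)));
        int mask = _mm_movemask_epi8(ws); /* 1 bit per whitespace byte */
        if (mask != 0xFFFF) {
            /* index of the first non-whitespace byte in this block */
            unsigned nonws = (unsigned)~mask & 0xFFFFu;
            return i + (size_t)__builtin_ctz(nonws);
        }
        i += 16;
    }
    while (i < len && (p[i] == ' ' || p[i] == '\t' ||
                       p[i] == '\r' || p[i] == '\n'))
        i++;
    return i;
}
```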
April 09, 2015
On 09.04.2015 at 10:59, Sönke Ludwig wrote:
> On 08.04.2015 at 20:55, Iain Buclaw via Digitalmars-d wrote:
>> On 8 April 2015 at 20:32, tcha via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>> (...)
>>> Also tried to dustmite the minimal failing version and here is a result:
>>> http://pastebin.com/YjdvT3G4
>>>
>>> It's my first use of it so I hope it can help to solve this problem. It
>>> outputs less errors, but also compiles fine in debug and fails to
>>> link in
>>> release.
>
> I've filed two tickets for linker errors (and added a workaround for the
> first one):
> https://issues.dlang.org/show_bug.cgi?id=14425
> https://issues.dlang.org/show_bug.cgi?id=14429
>
> I'll try to reduce the pastebin reduced sample further, too, as it looks
> like it has yet another root cause.
>

Actually, it seems to be the first issue (14425). I've already added a workaround for that on Git master.
April 09, 2015
On 04/08/2015 08:32 PM, tcha wrote:

Now with release numbers.

> D new - debug - 14.98s, 1782.0Mb
8.53s, 1786.8Mb
> D new Gdc - debug - 29.08s, 1663.9Mb
GDC still lacks @nogc support.
> D new Ldc - 16.99s, 1663.0Mb
18.76s, 1664.1Mb
> D new lazy - debug - 11.50s, 213.2Mb
4.57s, 206Mb
> D new lazy Gdc - 13.66s, 206.1Mb
Can't compile stdx.data.json with gdc-4.9.0, which doesn't yet support @nogc.
> D new lazy Ldc - 3.59s, 205.4Mb
4.0s, 205.4Mb

LDC doesn't yet have the GC improvements, therefore it is much slower for the DOM parsing benchmarks.
April 09, 2015
On Thursday, 9 April 2015 at 11:49:00 UTC, Martin Nowak wrote:
> On 04/08/2015 08:32 PM, tcha wrote:
>
> Now with release numbers.
>
>> D new - debug - 14.98s, 1782.0Mb
> 8.53s, 1786.8Mb
>> D new Gdc - debug - 29.08s, 1663.9Mb
> GDC still misses @nogc support.
>> D new Ldc - 16.99s, 1663.0Mb
> 18.76s, 1664.1Mb
>> D new lazy - debug - 11.50s, 213.2Mb
> 4.57s, 206Mb
>> D new lazy Gdc - 13.66s, 206.1Mb
> Can't compile stdx.data.json with gdc-4.9.0 which doesn't yet support @nogc.
>> D new lazy Ldc - 3.59s, 205.4Mb
> 4.0s, 205.4Mb
>
> LDC doesn't yet have the GC improvements, therefor is much slower for
> the DOM parsing benchmarks.

Still getting trounced across the board by rapidjson.
April 09, 2015
On 04/09/2015 02:10 PM, John Colvin wrote:
> 
> Still getting trounced across the board by rapidjson.

Yep, does anyone know why? They don't even use a lazy parser.
April 09, 2015
On 04/08/2015 03:56 PM, Sönke Ludwig wrote:
>>
> 
> The problem is that even the pull parser alone is relatively slow. Also, for some reason the linker reports unresolved symbols as soon as I build without the -debug flag...

The review hasn't yet started and I'm already against the "stream"
parser, because it hardly deserves the name parser; it's more like a lexer.

Because tcha's benchmark code was a very specific hack for the data structure used, I tried to write a proper stream parser to get a fair comparison. This is where I stopped (it doesn't work).

http://dpaste.dzfl.pl/8282d70a1254

The biggest problem with that interface is that you have to count matching start/end markers for objects and arrays in order to skip an entry; that's not much fun, and it definitely needs a dedicated skip-value function.

There are two very nice alternative approaches in the benchmark repo.

https://github.com/kostya/benchmarks/blob/master/json/test_pull.cr
https://github.com/kostya/benchmarks/blob/master/json/test_schema.cr
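For what it's worth, the marker-counting logic such a dedicated skip-value function would need can be sketched in a few lines of C (a hypothetical helper, not part of the proposed API), taking care not to count brackets inside strings:

```c
#include <stddef.h>

/* Sketch: starting at position i of a JSON value, advance past it by
   counting matching object/array markers. Brackets inside strings and
   escaped quotes must be ignored; a scalar ends at a comma or at the
   closing marker of the enclosing container. */
static size_t skip_value(const char *json, size_t len, size_t i)
{
    int depth = 0;
    int in_string = 0;
    for (; i < len; i++) {
        char c = json[i];
        if (in_string) {
            if (c == '\\')
                i++;               /* skip the escaped character */
            else if (c == '"')
                in_string = 0;
        } else if (c == '"') {
            in_string = 1;
        } else if (c == '{' || c == '[') {
            depth++;
        } else if (c == '}' || c == ']') {
            if (depth == 0)
                return i;          /* end of the enclosing container */
            if (--depth == 0)
                return i + 1;      /* just closed the skipped value */
        } else if (c == ',' && depth == 0) {
            return i;              /* end of a scalar value */
        }
    }
    return i;
}
```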
April 09, 2015
On 9 April 2015 at 13:48, Martin Nowak via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On 04/08/2015 08:32 PM, tcha wrote:
>
> Now with release numbers.
>
>> D new - debug - 14.98s, 1782.0Mb
> 8.53s, 1786.8Mb
>> D new Gdc - debug - 29.08s, 1663.9Mb
> GDC still misses @nogc support.


Wasn't @nogc introduced in 2.066?

https://github.com/D-Programming-GDC/GDC/blob/master/gcc/d/VERSION
April 09, 2015
On Thursday, 9 April 2015 at 12:16:43 UTC, Martin Nowak wrote:
> On 04/09/2015 02:10 PM, John Colvin wrote:
>> 
>> Still getting trounced across the board by rapidjson.
>
> Yep, anyone knows why? They don't even use a lazy parser.

SIMD-optimized scanning and format-optimized inlined conversion.
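As a rough illustration of the second point: a format-optimized conversion accumulates digits directly in the scanning loop instead of slicing out the token and handing it to a generic routine like strtod, so the conversion inlines into the lexer's hot path. A minimal C sketch (hypothetical name, error and overflow handling omitted for brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* Accumulate decimal digits in place while scanning; *pos is advanced
   past the consumed digits. This is the integer part only; a real JSON
   lexer would extend the same loop for sign, fraction and exponent. */
static uint64_t parse_uint_inline(const char *p, size_t len, size_t *pos)
{
    uint64_t v = 0;
    size_t i = *pos;
    while (i < len && p[i] >= '0' && p[i] <= '9')
        v = v * 10 + (uint64_t)(p[i++] - '0');
    *pos = i;
    return v;
}
```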
April 09, 2015
On 09.04.2015 at 14:25, Martin Nowak wrote:
> On 04/08/2015 03:56 PM, Sönke Ludwig wrote:
>>>
>>
>> The problem is that even the pull parser alone is relatively slow. Also,
>> for some reason the linker reports unresolved symbols as soon as I build
>> without the -debug flag...
>
> The review hasn't yet started and I'm already against the "stream"
> parser, because it hardly deserves the names parser, it's more like a lexer.
>
> Because the benchmark code by tcha was a very specific hack for the used
> data structure, I tried to write a proper stream parser to have a fair
> comparison. This is where I stopped (it doesn't work).
>
> http://dpaste.dzfl.pl/8282d70a1254
>
> The biggest problem with that interface is, that you have to count
> matching start/end markers for objects and arrays in order to skip an
> entry, not much fun and definitely needs a dedicated skip value function.
>
> There are 2 very nice alternative approaches in the benchmark repo.
>
> https://github.com/kostya/benchmarks/blob/master/json/test_pull.cr
> https://github.com/kostya/benchmarks/blob/master/json/test_schema.cr
>

That would be a nice intermediate-level parser. However, the range-based parser as it is now is also useful for things like automated deserialization (which I don't want to include at this point), where you don't know in which order fields come in and where you have to use that style of filtering through the data anyway.

But if inlining works properly, it should be no problem to implement other APIs on top of it.
April 09, 2015
On 04/09/2015 10:59 AM, Sönke Ludwig wrote:
> As far as the profiler results can be trusted, a good chunk of the time gets spent for reading individual bytes from memory, but there must be something else low-level going on that make things this bad. However, there is nothing fundamental in the structure/design that would cause this, so I think spending more time with the profiler is the only logical step now. Unfortunately my VTune license has expired and perf on Linux makes the task quite a bit more involved.

I didn't find too many issues.

Most of the time is spent in parseJSONValue (it looks like there are some
expensive struct copies):
https://github.com/s-ludwig/std_data_json/blob/1da3f828ae6c4fd7cac7f7e13ae9e51ec93e6f02/source/stdx/data/json/parser.d#L148

and in skipWhitespace. This function could really use some optimization, e.g. avoiding UTF decoding.

https://github.com/s-ludwig/std_data_json/blob/1da3f828ae6c4fd7cac7f7e13ae9e51ec93e6f02/source/stdx/data/json/lexer.d#L345
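The suggested optimization works because JSON whitespace is pure ASCII (' ', '\t', '\r', '\n') and UTF-8 continuation bytes are always >= 0x80, so a byte-wise test can never misfire inside a multi-byte sequence. A minimal C sketch of the idea (not the actual lexer code):

```c
#include <stddef.h>

/* Skip JSON whitespace by testing raw bytes directly. No code-point
   decoding is needed: the four whitespace bytes are all < 0x80, and no
   byte of a multi-byte UTF-8 sequence can equal any of them. */
static size_t skip_whitespace_bytes(const unsigned char *p, size_t len)
{
    size_t i = 0;
    while (i < len &&
           (p[i] == ' ' || p[i] == '\t' || p[i] == '\r' || p[i] == '\n'))
        i++;
    return i;
}
```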

Libdparse has some optimized ASM functions that might be useful:
https://github.com/Hackerpilot/libdparse/blob/51b7d9d321aac0fcc4a9be99bbbed5db3158326c/src/std/d/lexer.d#L2233