October 13, 2014
On 13.10.2014 at 16:36, Daniel Murphy wrote:
> "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1@digitalmars.com...
>
>> Oh, I've read "both line and column into a single uint", because of
>> "four words per token" - considering that "word == 16bit", but Andrei
>> obviously meant "word == (void*).sizeof". If simply using uint instead
>> of size_t is meant, then that's of course a different thing.
>
> I suppose a 4GB single-line json file is still possible.

If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at >4GB && human tries to look at that place using an editor) should be rare enough that we can make the compromise in favor of a smaller struct size.
October 13, 2014
On Monday, 13 October 2014 at 17:21:44 UTC, Sönke Ludwig wrote:
> On 13.10.2014 at 16:36, Daniel Murphy wrote:
>> "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1@digitalmars.com...
>>
>>> Oh, I've read "both line and column into a single uint", because of
>>> "four words per token" - considering that "word == 16bit", but Andrei
>>> obviously meant "word == (void*).sizeof". If simply using uint instead
>>> of size_t is meant, then that's of course a different thing.
>>
>> I suppose a 4GB single-line json file is still possible.
>
> If we make that assumption, we'd have to change it from size_t to ulong, but my feeling is that this case (format error at
> >4GB && human tries to look at that place using an editor)
> should be rare enough that we can make the compromise in favor of a smaller struct size.

What are you using the location structs for?

In D:YAML they're only used for error information, so I use ushorts, with ushort.max meaning "65535 or more".
October 13, 2014
On 10/13/14, 10:21 AM, Sönke Ludwig wrote:
> On 13.10.2014 at 16:36, Daniel Murphy wrote:
>> "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1@digitalmars.com...
>>
>>> Oh, I've read "both line and column into a single uint", because of
>>> "four words per token" - considering that "word == 16bit", but Andrei
>>> obviously meant "word == (void*).sizeof". If simply using uint instead
>>> of size_t is meant, then that's of course a different thing.
>>
>> I suppose a 4GB single-line json file is still possible.
>
> If we make that assumption, we'd have to change it from size_t to ulong,
> but my feeling is that this case (format error at >4GB && human tries to
> look at that place using an editor) should be rare enough that we can
> make the compromise in favor of a smaller struct size.

Agreed. -- Andrei
October 13, 2014
On 13.10.2014 at 19:40, Kiith-Sa wrote:
> On Monday, 13 October 2014 at 17:21:44 UTC, Sönke Ludwig wrote:
>> On 13.10.2014 at 16:36, Daniel Murphy wrote:
>>> "Sönke Ludwig"  wrote in message news:m1ge08$10ub$1@digitalmars.com...
>>>
>>>> Oh, I've read "both line and column into a single uint", because of
>>>> "four words per token" - considering that "word == 16bit", but Andrei
>>>> obviously meant "word == (void*).sizeof". If simply using uint instead
>>>> of size_t is meant, then that's of course a different thing.
>>>
>>> I suppose a 4GB single-line json file is still possible.
>>
>> If we make that assumption, we'd have to change it from size_t to
>> ulong, but my feeling is that this case (format error at
>> >4GB && human tries to look at that place using an editor)
>> should be rare enough that we can make the compromise in favor of a
>> smaller struct size.
>
> What are you using the location structs for?
>
> In D:YAML they're only used for info about errors, so I use ushorts and
> ushort.max means "65535 or more".

Within the package itself they are likewise only used for error information. But they are generally available with each token/node/value, so people could do very different things with them.
October 17, 2014
On 8/21/14, 7:35 PM, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up
> the work (a lot earlier than anticipated) and finished a first version
> of a loose blend of said std.jgrandson, vibe.data.json and some changes
> that I had planned for vibe.data.json for a while. I'm quite pleased by
> the results so far, although without a serialization framework it still
> misses a very important building block.
>
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> DUB: http://code.dlang.org/packages/std_data_json
>
> Destroy away! ;)
>
> [1]: http://forum.dlang.org/thread/lrknjl$co7$1@digitalmars.com

Once it's done you can compare its performance against other languages with this benchmark:

https://github.com/kostya/benchmarks/tree/master/json
October 18, 2014
On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
>
> Once it's done you can compare its performance against other languages with this benchmark:
>
> https://github.com/kostya/benchmarks/tree/master/json

Wow, the C++ Rapid parser is really impressive.  I threw together a test with my own parser for comparison, and Rapid still beat it.  It's the first parser I've encountered that's faster.


Ruby
0.4995479721139979
0.49977992077421846
0.49981146157805545
7.53s, 2330.9Mb

Python
0.499547972114
0.499779920774
0.499811461578
12.01s, 1355.1Mb

C++ Rapid
0.499548
0.49978
0.499811
1.75s, 1009.0Mb

JEP (mine)
0.49954797
0.49977992
0.49981146
2.38s, 203.4Mb
October 19, 2014
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:
> On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
>>
>> Once it's done you can compare its performance against other languages with this benchmark:
>>
>> https://github.com/kostya/benchmarks/tree/master/json
>
> Wow, the C++ Rapid parser is really impressive.  I threw together a test with my own parser for comparison, and Rapid still beat it.  It's the first parser I've encountered that's faster.
>
> C++ Rapid
> 0.499548
> 0.49978
> 0.499811
> 1.75s, 1009.0Mb
>
> JEP (mine)
> 0.49954797
> 0.49977992
> 0.49981146
> 2.38s, 203.4Mb

I just commented out the sscanf() call that was parsing the float and re-ran the test to see what the difference would be.  Here's the new timing:

JEP (mine)
0.00000000
0.00000000
0.00000000
1.23s, 203.1Mb

So nearly half of the total execution time was spent simply parsing floats.  For this reason, I'm starting to think that this isn't the best benchmark of JSON parser performance.

The other issue with my parser is that it's written in C, so all of the user-defined bits are called via a bank of function pointers.  If it were converted to C++ or D, where this could be done via templates, it would be much faster.  Just as a test, I nulled out the function pointers I'd set to see what the cost of indirection was, and here's the result:

JEP (mine)
nan
nan
nan
0.57s, 109.4Mb

The memory difference is interesting, and I can't entirely explain it other than to say that it's probably an artifact of my mapping in the file as virtual memory rather than reading it into an allocated buffer.  Either way, roughly 0.60s can be attributed to indirect function calls and the bit of logic on the other side, which seems like a good candidate for optimization.
October 19, 2014
On 10/18/14, 4:53 PM, Sean Kelly wrote:
> On Friday, 17 October 2014 at 18:27:34 UTC, Ary Borenszweig wrote:
>>
>> Once it's done you can compare its performance against other languages
>> with this benchmark:
>>
>> https://github.com/kostya/benchmarks/tree/master/json
>
> Wow, the C++ Rapid parser is really impressive.  I threw together a test
> with my own parser for comparison, and Rapid still beat it.  It's the
> first parser I've encountered that's faster.
>
>
> Ruby
> 0.4995479721139979
> 0.49977992077421846
> 0.49981146157805545
> 7.53s, 2330.9Mb
>
> Python
> 0.499547972114
> 0.499779920774
> 0.499811461578
> 12.01s, 1355.1Mb
>
> C++ Rapid
> 0.499548
> 0.49978
> 0.499811
> 1.75s, 1009.0Mb
>
> JEP (mine)
> 0.49954797
> 0.49977992
> 0.49981146
> 2.38s, 203.4Mb

Yes, C++ Rapid seems to be really, really fast. It has some SSE2/SSE4-specific optimizations and I guess a lot more. I have to investigate more in order to do something similar :-)
October 20, 2014
On Saturday, 18 October 2014 at 19:53:23 UTC, Sean Kelly wrote:

> Python
> 0.499547972114
> 0.499779920774
> 0.499811461578
> 12.01s, 1355.1Mb

I assume this is the standard json module? I am wondering how ujson performs; it's considered the fastest Python JSON module.
February 05, 2015
On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
> ...

Added to the review queue as a work in progress with relevant links:

    http://wiki.dlang.org/Review_Queue