October 17, 2015
On Saturday, 17 October 2015 at 16:14:01 UTC, Andrei Alexandrescu wrote:
> On 10/17/15 6:43 PM, Sean Kelly wrote:
>> If this is the benchmark I'm remembering, the bulk of the time is spent
>> parsing the floating point numbers. So it isn't a test of JSON parsing
>> in general so much as the speed of scanf.
>
> In many cases the use of scanf can be replaced with drastically faster methods, as I discuss in my talks on optimization (including Brasov recently). I hope they'll release the videos soon. -- Andrei

Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats. I'm on my phone though so perhaps this is a different benchmark--I can't easily check. The one I recall came up a year or so ago and was discussed on D.general.

October 17, 2015
On Sat, 17 Oct 2015 16:27:06 +0000,
Sean Kelly <sean@invisibleduck.org> wrote:

> On Saturday, 17 October 2015 at 16:14:01 UTC, Andrei Alexandrescu wrote:
> > On 10/17/15 6:43 PM, Sean Kelly wrote:
> >> If this is the benchmark I'm remembering, the bulk of the time
> >> is spent
> >> parsing the floating point numbers. So it isn't a test of JSON
> >> parsing
> >> in general so much as the speed of scanf.
> >
> > In many cases the use of scanf can be replaced with drastically faster methods, as I discuss in my talks on optimization (including Brasov recently). I hope they'll release the videos soon. -- Andrei
> 
> Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats. I'm on my phone though so perhaps this is a different benchmark--I can't easily check. The one I recall came up a year or so ago and was discussed on D.general.

1/4 to 1/3 of the time is spent parsing numbers in highly
optimized code. In a profiler the number parsing shows up
on top, but the benchmark also exercises the structural
parsing a lot. It is not a very broad benchmark though,
lacking serialization, UTF-8 decoding, validation of
results, etc. I don't believe the author realized how,
over time, it became the go-to performance test. The
author of RapidJSON
has a very in-depth benchmark suite, but it would be a bit of
work to get something non-C++ integrated:
https://github.com/miloyip/nativejson-benchmark
It includes conformance tests as well.

-- 
Marco

October 18, 2015
On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:
> JSON parsing in D has come a long way, especially when you look at it from the efficiency angle, as a popular benchmark does that has been forked by well-known D contributors like Martin Nowak or Sönke Ludwig.
>
> [...]

Slightly OT:
You have a std.simd file in your repo; was this written by you, or is there a current std.simd proposal that I'm unaware of?
October 18, 2015
On Saturday, 17 October 2015 at 09:35:47 UTC, Sönke Ludwig wrote:
> On 17.10.2015 at 13:16, Marco Leise wrote:
>> On Sat, 17 Oct 2015 09:27:46 +0200,
>> Sönke Ludwig <sludwig@rejectedsoftware.com> wrote:
>>> Okay, I obviously misread that as a once familiar issue. Maybe it indeed
>>> makes sense to add a "JavaScript" quirks mode that behaves exactly like
>>> a JavaScript interpreter would.
>>
>> Ok, but remember: https://www.youtube.com/watch?v=20BySC_6HyY
>> And then think again. :D
>>
>
> What about just naming it SerializationMode.WAT?

At the very least that needs to be an undocumented alias easter egg. :)
October 18, 2015
On Sun, 18 Oct 2015 03:40:52 +0000,
rsw0x <anonymous@anonymous.com> wrote:

> On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:
> > JSON parsing in D has come a long way, especially when you look at it from the efficiency angle, as a popular benchmark does that has been forked by well-known D contributors like Martin Nowak or Sönke Ludwig.
> >
> > [...]
> 
> Slightly OT:
> You have a std.simd file in your repo; was this written by you,
> or is there a current std.simd proposal that I'm unaware of?

Manu wrote that back in the day with the idea that it
would help with writing portable SIMD code on many
architectures:
https://github.com/TurkeyMan/simd
Working in the 3D visualization business and having given
at least one talk about SIMD, he was naturally interested
in better vector math support. Inclusion into Phobos was
planned. DMD's somewhat ad hoc SIMD intrinsic
implementation needs some upgrading first though:
https://issues.dlang.org/buglist.cgi?keywords=SIMD&resolution=---
Many instructions cannot be expressed outside of inline
assembly, which doesn't inline.

-- 
Marco

October 19, 2015
On Saturday, 17 October 2015 at 16:27:08 UTC, Sean Kelly wrote:
> Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats.

Well, most such language-comparison benchmarks are just for fun/marketing. In the real world, big JSON files would be compressed and most likely retrieved over a network connection (like a blob from a database). Pull-parsing mmap'ed memory is a rather unusual scenario for JSON.

October 19, 2015
On 16.10.2015 at 18:04, Marco Leise wrote:
> Every value that is read (as opposed to skipped) is validated
> according to RFC 7159. That includes UTF-8 validation. Full
> validation (i.e. readJSONFile!validateAll(…);) may add up to
> 14% overhead here.
>

Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json mostly uses just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.
October 21, 2015
On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:

> The test is pretty simple: Parse a JSON object, containing an array of 1_000_000 3D coordinates in the range [0..1) and average them.
>
> The performance of std.json in parsing those was horrible still in the DMD 2.066 days*:
>
> DMD     : 41.44s,  934.9Mb
> Gdc     : 29.64s,  929.7Mb
> Python  : 12.30s, 1410.2Mb
> Ruby    : 13.80s, 2101.2Mb
>
> Then with 2.067 std.json got a major 3x speed improvement and rivaled the popular dynamic languages Ruby and Python:
>
> DMD     : 13.02s, 1324.2Mb
>
> In the mean time several other D JSON libraries appeared with varying focus on performance or API:
>
> Medea         : 56.75s, 1753.6Mb  (GDC)
> libdjson      : 24.47s, 1060.7Mb  (GDC)
> stdx.data.json:  2.76s,  207.1Mb  (LDC)
>
> Yep, that's right. stdx.data.json's pull parser finally beats the dynamic languages with native efficiency. (I used the default options here that provide you with an Exception and line number on errors.)
>
> A few days ago I decided to get some practical use out of my pet project 'fast' by implementing a JSON parser myself, that could rival even the by then fastest JSON parser, RapidJSON. The result can be seen in the benchmark results right now:
>
> https://github.com/kostya/benchmarks#json
>
> fast:	   0.34s, 226.7Mb (GDC)
> RapidJSON: 0.79s, 687.1Mb (GCC)
>
> (* Timings from my computer, Haswell CPU, Linux)

Very impressive.

Is this not quite interesting? Such a basic web back-end operation, and yet it's a very different picture from those who say that one is I/O- or network-bound. I already have JSON files of a couple of gigabytes, and they're only going to get bigger over time, and this is a more generally interesting question.

Seems like you can now get 2.1 gigabytes/sec sequential read from a cheap consumer SSD today...
October 21, 2015
On Monday, 19 October 2015 at 07:48:16 UTC, Sönke Ludwig wrote:
> On 16.10.2015 at 18:04, Marco Leise wrote:
>> Every value that is read (as opposed to skipped) is validated
>> according to RFC 7159. That includes UTF-8 validation. Full
>> validation (i.e. readJSONFile!validateAll(…);) may add up to
>> 14% overhead here.
>>
>
> Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json mostly uses just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.

Is there any chance that the new JSON parser can be included in future versions of vibe.d? And what would be needed to include it in Phobos?
October 21, 2015
On Wednesday, 21 October 2015 at 04:17:19 UTC, Laeeth Isharc wrote:
>
> Seems like you now get 2.1 gigbytes/sec sequential read from a cheap consumer SSD today...

Not many consumer drives give more than 500-600 MB/s (the SATA3 limit) yet. There are only a couple that I know of that reach 2000 MB/s, like Samsung's SM951, and they're generally a fair bit more expensive than what most consumers tend to buy (but at about $1/GB, certainly still affordable for businesses).