October 17, 2015
On Saturday, 17 October 2015 at 16:14:01 UTC, Andrei Alexandrescu wrote:
> On 10/17/15 6:43 PM, Sean Kelly wrote:
>> If this is the benchmark I'm remembering, the bulk of the time is spent
>> parsing the floating point numbers. So it isn't a test of JSON parsing
>> in general so much as the speed of scanf.
>
> In many cases the use of scanf can be replaced with drastically faster methods, as I discuss in my talks on optimization (including Brasov recently). I hope they'll release the videos soon. -- Andrei

Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats. I'm on my phone though so perhaps this is a different benchmark--I can't easily check. The one I recall came up a year or so ago and was discussed on D.general.

October 17, 2015
On Sat, 17 Oct 2015 16:27:06 +0000,
Sean Kelly <sean@invisibleduck.org> wrote:

> On Saturday, 17 October 2015 at 16:14:01 UTC, Andrei Alexandrescu wrote:
> > On 10/17/15 6:43 PM, Sean Kelly wrote:
> >> If this is the benchmark I'm remembering, the bulk of the time
> >> is spent
> >> parsing the floating point numbers. So it isn't a test of JSON
> >> parsing
> >> in general so much as the speed of scanf.
> >
> > In many cases the use of scanf can be replaced with drastically faster methods, as I discuss in my talks on optimization (including Brasov recently). I hope they'll release the videos soon. -- Andrei
> 
> Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats. I'm on my phone though so perhaps this is a different benchmark--I can't easily check. The one I recall came up a year or so ago and was discussed on D.general.

1/4 to 1/3 of the time is spent parsing numbers in highly
optimized code. In a profiler the number parsing shows up
on top, but the benchmark also exercises the structural
parsing a lot. It is not a very broad benchmark though,
lacking serialization, UTF-8 decoding, validation of
results, etc. I don't believe the author realized how,
over time, it became the go-to performance test. The
author of RapidJSON
has a very in-depth benchmark suite, but it would be a bit of
work to get something non-C++ integrated:
https://github.com/miloyip/nativejson-benchmark
It includes conformance tests as well.

-- 
Marco

October 18, 2015
On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:
> JSON parsing in D has come a long way, especially when you look at it from the efficiency angle, as a popular benchmark does that has been forked by well-known D contributors like Martin Nowak or Sönke Ludwig.
>
> [...]

Slightly OT:
You have a std.simd file in your repo; was this written by you, or is there a current std.simd proposal that I'm unaware of?
October 18, 2015
On Saturday, 17 October 2015 at 09:35:47 UTC, Sönke Ludwig wrote:
> On 17.10.2015 at 13:16, Marco Leise wrote:
>> On Sat, 17 Oct 2015 09:27:46 +0200,
>> Sönke Ludwig <sludwig@rejectedsoftware.com> wrote:
>>> Okay, I obviously misread that as a once familiar issue. Maybe it indeed
>>> makes sense to add a "JavaScript" quirks mode that behaves exactly like
>>> a JavaScript interpreter would.
>>
>> Ok, but remember: https://www.youtube.com/watch?v=20BySC_6HyY
>> And then think again. :D
>>
>
> What about just naming it SerializationMode.WAT?

At the very least that needs to be an undocumented alias easter egg. :)
October 18, 2015
On Sun, 18 Oct 2015 03:40:52 +0000,
rsw0x <anonymous@anonymous.com> wrote:

> On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:
> > JSON parsing in D has come a long way, especially when you look at it from the efficiency angle, as a popular benchmark does that has been forked by well-known D contributors like Martin Nowak or Sönke Ludwig.
> >
> > [...]
> 
> Slightly OT:
> You have a std.simd file in your repo; was this written by you,
> or is there a current std.simd proposal that I'm unaware of?

Manu wrote that back in the day with the idea that it
would help with writing portable SIMD code on many
architectures:
https://github.com/TurkeyMan/simd
Working in the 3D visualization business and having given
at least one talk about SIMD, he was naturally interested
in better vector math support. Inclusion into Phobos was
planned. DMD's somewhat ad hoc SIMD intrinsic
implementation needs some upgrading first though:
https://issues.dlang.org/buglist.cgi?keywords=SIMD&resolution=---
Many instructions cannot be expressed outside of inline
assembly, which doesn't inline.

-- 
Marco

October 19, 2015
On Saturday, 17 October 2015 at 16:27:08 UTC, Sean Kelly wrote:
> Oh absolutely. My issue with the benchmark is just that it claims to be a JSON parser benchmark but the bulk of CPU time is actually spent parsing floats.

Well, most such language-comparison benchmarks are just for fun/marketing. In the real world, big JSON files would be compressed and most likely retrieved over a network connection (like a blob from a database). Pull-parsing mmap'ed memory is a rather unusual scenario for JSON.

October 19, 2015
On 16.10.2015 at 18:04, Marco Leise wrote:
> Every value that is read (as opposed to skipped) is validated
> according to RFC 7159. That includes UTF-8 validation. Full
> validation (i.e. readJSONFile!validateAll(…);) may add up to
> 14% overhead here.
>

Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json mostly uses just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.
October 21, 2015
On Wednesday, 14 October 2015 at 07:01:49 UTC, Marco Leise wrote:

> The test is pretty simple: Parse a JSON object, containing an array of 1_000_000 3D coordinates in the range [0..1) and average them.
>
> The performance of std.json in parsing those was horrible still in the DMD 2.066 days*:
>
> DMD     : 41.44s,  934.9Mb
> Gdc     : 29.64s,  929.7Mb
> Python  : 12.30s, 1410.2Mb
> Ruby    : 13.80s, 2101.2Mb
>
> Then with 2.067 std.json got a major 3x speed improvement and rivaled the popular dynamic languages Ruby and Python:
>
> DMD     : 13.02s, 1324.2Mb
>
> In the mean time several other D JSON libraries appeared with varying focus on performance or API:
>
> Medea         : 56.75s, 1753.6Mb  (GDC)
> libdjson      : 24.47s, 1060.7Mb  (GDC)
> stdx.data.json:  2.76s,  207.1Mb  (LDC)
>
> Yep, that's right. stdx.data.json's pull parser finally beats the dynamic languages with native efficiency. (I used the default options here that provide you with an Exception and line number on errors.)
>
> A few days ago I decided to get some practical use out of my pet project 'fast' by implementing a JSON parser myself, that could rival even the by then fastest JSON parser, RapidJSON. The result can be seen in the benchmark results right now:
>
> https://github.com/kostya/benchmarks#json
>
> fast:	   0.34s, 226.7Mb (GDC)
> RapidJSON: 0.79s, 687.1Mb (GCC)
>
> (* Timings from my computer, Haswell CPU, Linux)

Very impressive.

Is this not quite interesting? Such a basic web back-end operation, and yet it's a very different picture from those who say that one is I/O- or network-bound. I already have JSON files of a couple of gigabytes, and they're only going to get bigger over time, and this is a more generally interesting question.

Seems like you can now get 2.1 gigabytes/sec sequential read from a cheap consumer SSD today...
October 21, 2015
On Monday, 19 October 2015 at 07:48:16 UTC, Sönke Ludwig wrote:
> On 16.10.2015 at 18:04, Marco Leise wrote:
>> Every value that is read (as opposed to skipped) is validated
>> according to RFC 7159. That includes UTF-8 validation. Full
>> validation (i.e. readJSONFile!validateAll(…);) may add up to
>> 14% overhead here.
>>
>
> Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json mostly uses just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.

Is there any chance that the new JSON parser can be included in future versions of vibe.d? And what would be needed to include it in Phobos?
October 21, 2015
On Wednesday, 21 October 2015 at 04:17:19 UTC, Laeeth Isharc wrote:
>
> Seems like you now get 2.1 gigbytes/sec sequential read from a cheap consumer SSD today...

Not many consumer drives give more than 500-600 MB/s (the SATA3 limit) yet. There are only a couple that I know of that reach 2000 MB/s, like Samsung's SM951, and they're generally a fair bit more expensive than what most consumers tend to buy (but at about $1/GB, certainly still affordable for businesses).