October 21, 2015
On Wednesday, October 21, 2015 06:36:31 Suliman via Digitalmars-d-announce wrote:
> On Monday, 19 October 2015 at 07:48:16 UTC, Sönke Ludwig wrote:
> > Am 16.10.2015 um 18:04 schrieb Marco Leise:
> >> Every value that is read (as opposed to skipped) is validated
> >> according to RFC 7159. That includes UTF-8 validation. Full
> >> validation (i.e. readJSONFile!validateAll(…);) may add up to
> >> 14% overhead here.
> >>
> >
> > Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json is mostly just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.
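(For anyone wondering what the "bitmasking trickery" refers to: for whitespace skipping it typically boils down to something like the sketch below. This is purely illustrative D, not the actual code from fast or stdx.data.json. All four JSON whitespace characters have code points below 64, so a single 64-bit constant can act as the lookup table.)

enum ulong jsonWhitespaceMask =
    (1UL << ' ') | (1UL << '\t') | (1UL << '\n') | (1UL << '\r');

// True for ' ', '\t', '\n', '\r': a range check plus one bit test
// instead of four separate comparisons per byte.
bool isJsonWhitespace(char c) pure nothrow @nogc @safe
{
    return c < 64 && ((jsonWhitespaceMask >> c) & 1) != 0;
}

// Advance past any run of whitespace starting at index i.
size_t skipWhitespace(const(char)[] s, size_t i) pure nothrow @nogc @safe
{
    while (i < s.length && isJsonWhitespace(s[i]))
        ++i;
    return i;
}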
>
> Is there any chance that the new JSON parser can be included in the next versions of vibe.d? And what is needed to include it in Phobos?

It's already available on code.dlang.org:

http://code.dlang.org/packages/std_data_json

For it to get into Phobos, it has to go through the review process and be voted in. It was put up for formal review two or three months ago, but that review never got to the point of a vote (I assume more work needed to be done on it first; I haven't really read through that thread, though, since I was too busy when the review started to get involved). So whatever needs to be done to make it ready for a formal vote still needs to be done, and then it can be voted in - but all of that takes time. If you want to use it soon, you might as well just grab it from code.dlang.org. That also puts you in a better position to give feedback on it, so that it will be that much better if/when it makes it into Phobos.

- Jonathan M Davis


October 21, 2015
>> > Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json is mostly just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.
>>
>> Is there any chance that the new JSON parser can be included in the next versions of vibe.d? And what is needed to include it in Phobos?
>
> It's already available on code.dlang.org:
> http://code.dlang.org/packages/std_data_json


Jonathan, I meant https://github.com/mleise/fast :)
October 21, 2015
On Wednesday, 21 October 2015 at 09:59:09 UTC, Kapps wrote:
> On Wednesday, 21 October 2015 at 04:17:19 UTC, Laeeth Isharc wrote:
>>
>> Seems like you now get 2.1 gigabytes/sec sequential read from a cheap consumer SSD today...
>
> Not many consumer drives give more than 500-600 MB/s (the SATA3 limit) yet. There are only a couple that I know of that reach 2000 MB/s, like Samsung's SM951, and they're generally a fair bit more expensive than what most consumers tend to buy (though at about $1/GB, they're certainly still affordable for businesses).

Yes - that's the one I had in mind.  It's not dirt cheap, but at GBP 280, if you have some money and want speed, the price is hardly an important factor.  I should have said consumer-grade rather than consumer, but anyway you get my point.

That's today, in 2015.  Maybe one can do even better than that by striping data across drives, although it sounds like that's not so easy.  But still: "The future is here already; just unevenly distributed".


Seems like if you're processing JSON, which is not the most difficult task one might reasonably want to be doing, then CPU+memory is the bottleneck rather than the SSD.  I don't know what the outlook is for drive speeds (except that they probably won't go down), but data sets are certainly not shrinking.  So I am intrigued by the gap between what people say is typical and what actually seems to be the case, at least for what I want to do.
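A quick way to sanity-check that on your own data is to time the raw read and the parse separately. Here is a rough sketch using Phobos' std.json as a stand-in parser (the default file name is made up; pass your own):

import core.time : MonoTime;
import std.file : read;
import std.json : parseJSON;
import std.stdio : writefln;

void main(string[] args)
{
    // Hypothetical default; pass your own JSON file on the command line.
    auto path = args.length > 1 ? args[1] : "big.json";

    auto t0 = MonoTime.currTime;
    auto text = cast(string) read(path);   // raw I/O (or page cache)
    auto t1 = MonoTime.currTime;
    auto doc = parseJSON(text);            // pure CPU + memory work
    auto t2 = MonoTime.currTime;
    // result deliberately unused; we only care about the timing

    auto mb = text.length / (1024.0 * 1024.0);
    writefln("read : %.1f MB in %s", mb, t1 - t0);
    writefln("parse: %.1f MB in %s", mb, t2 - t1);
}

If the parse line dwarfs the read line, the SSD isn't the bottleneck. (Run it twice to see the page-cache case, too.)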

October 21, 2015
Could anybody post this benchmark on Reddit?
October 21, 2015
On Wednesday, 21 October 2015 at 19:03:56 UTC, Suliman wrote:
> Could anybody post this benchmark on Reddit?

done
https://www.reddit.com/r/programming/comments/3pojrz/the_fastest_json_parser_in_the_world/

October 21, 2015
Am Wed, 21 Oct 2015 17:00:39 +0000
schrieb Suliman <evermind@live.ru>:

> >> > Nice! I see you are using bitmasking trickery in multiple places. stdx.data.json is mostly just the plain lexing algorithm, with the exception of whitespace skipping. It was already very encouraging to get those benchmark numbers that way. Good to see that it pays off to go further.
> >>
> >> Is there any chance that the new JSON parser can be included in the next versions of vibe.d? And what is needed to include it in Phobos?
> >
> > It's already available on code.dlang.org: http://code.dlang.org/packages/std_data_json
> 
> 
> Jonathan, I meant https://github.com/mleise/fast :)

That's nice, but it's under a different license, and I don't think the Phobos devs would be happy with all the inline assembly I used, the duplicated functionality (number parsing, UTF-8 validation), or the missing range support.

-- 
Marco

October 21, 2015
Am Wed, 21 Oct 2015 04:17:16 +0000
schrieb Laeeth Isharc <Laeeth.nospam@nospam-laeeth.com>:

> Very impressive.
> 
> Is this not quite interesting?  Such a basic web back-end operation, and yet it paints a very different picture from the claim that one is I/O- or network-bound.  I already have JSON files of a couple of gig, and they're only going to get bigger over time, so this is a more generally interesting question.
> 
> Seems like you now get 2.1 gigabytes/sec sequential read from a cheap consumer SSD today...

You have this huge amount of Reddit API JSON, right?
I wonder if your processing could benefit from the fast
skipping routines or even reading it as "trusted JSON".
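(To illustrate the idea, not the actual code or API in fast: skipping means scanning to the end of a value without decoding or validating anything along the way, and "trusted JSON" additionally assumes the input is well-formed UTF-8/JSON, so those checks are dropped entirely. A simplified sketch:)

// Simplified illustration only. Skipping an object or array means scanning
// to its matching close bracket without building any values; it assumes
// well-formed ("trusted") input and does no validation.
size_t skipComposite(const(char)[] s, size_t i)
{
    assert(s[i] == '{' || s[i] == '[');
    int depth = 0;
    bool inString = false;
    for (;; ++i)
    {
        char c = s[i];
        if (inString)
        {
            if (c == '\\') ++i;               // jump over the escaped character
            else if (c == '"') inString = false;
        }
        else if (c == '"') inString = true;
        else if (c == '{' || c == '[') ++depth;
        else if (c == '}' || c == ']')
        {
            if (--depth == 0) return i + 1;   // index just past the closing bracket
        }
    }
}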

-- 
Marco

October 21, 2015
On Wednesday, 21 October 2015 at 22:24:30 UTC, Marco Leise wrote:
> Am Wed, 21 Oct 2015 04:17:16 +0000
> schrieb Laeeth Isharc <Laeeth.nospam@nospam-laeeth.com>:
>
>> Very impressive.
>> 
>> Is this not quite interesting?  Such a basic web back-end operation, and yet it paints a very different picture from the claim that one is I/O- or network-bound.  I already have JSON files of a couple of gig, and they're only going to get bigger over time, so this is a more generally interesting question.
>> 
>> Seems like you now get 2.1 gigabytes/sec sequential read from a cheap consumer SSD today...
>
> You have this huge amount of Reddit API JSON, right?
> I wonder if your processing could benefit from the fast
> skipping routines or even reading it as "trusted JSON".

The couple of gig were just Quandl metadata for one provider, but you're right, I have that Reddit data too.  And that's just the beginning.  What some have been doing for a while, I'm beginning to do now, and many others will be doing in the next few years - just as soon as they have finished having meetings about what to do...  I don't suppose they'll be using Python, at least not for long.

I am sure it could benefit - I kind of need to get some other parts going first.  (For once it truly is a case of Knuth's 97%.)  But I'll be coming back to look at the best way to do this, for JSON, and for text files more generally.

Have you thought about writing up your experience with writing a fast JSON parser?  A bit like Walter's Dr. Dobb's article on wielding a profiler to speed up dmd.

And actually if you have time, would you mind dropping me an email?  laeeth at
....
kaledicassociates.com

Thanks.


Laeeth.
October 21, 2015
On 10/21/2015 04:38 PM, Laeeth Isharc wrote:
> On Wednesday, 21 October 2015 at 19:03:56 UTC, Suliman wrote:
>> Could anybody post this benchmark on Reddit?
>
> done
> https://www.reddit.com/r/programming/comments/3pojrz/the_fastest_json_parser_in_the_world/

Getting good press. Congratulations! -- Andrei

October 22, 2015
On 10/21/2015 1:38 PM, Laeeth Isharc wrote:
> On Wednesday, 21 October 2015 at 19:03:56 UTC, Suliman wrote:
>> Could anybody post this benchmark on Reddit?
>
> done
> https://www.reddit.com/r/programming/comments/3pojrz/the_fastest_json_parser_in_the_world/
>
>

It's item 9 on the front page of https://news.ycombinator.com/ too!

Link to actual article (don't click on this link, or your upvote will not be counted):

    https://news.ycombinator.com/item?id=10430951