May 07, 2013 Re: I wrote a JSON library
Posted in reply to Sean Kelly
On Tuesday, May 07, 2013 20:36:19 Sean Kelly wrote:
> Now obviously, in many cases convenience is preferable to raw speed, but I think code in Phobos should be an option for both types of uses whenever possible. What I'd really like to see is the variant-type front-end layered on top of an event-based parser so the user could just use parseJSON as-is to generate a tree of JSON objects or call the event-driven parser directly when performance is desired. I don't think the parser needs to be resumable either, since in most cases JSON is transported in an HTTP message, so a plain old recursive descent parser is fine.
Yeah. For both JSON and XML, it should be quite possible to implement a low-level API that gives you raw speed and then build more convenient APIs on top of it, thereby giving users the choice. And given how slices work, parsers like this should be able to beat the pants off of most parsers in other languages, especially with the low-level API.
- Jonathan M Davis
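A minimal sketch in D of the low-level layer Sean and Jonathan describe. It is a plain recursive descent parser that emits events carrying slices of the input, and it leaves conversion and unescaping to the caller. The `Event` enum, the `Handler` alias and `parseEvents` are hypothetical names, not an existing Phobos or jep API. Numbers are only scanned loosely here rather than fully validated. A tree-building front end in the style of std.json could be written as one more handler on top.

```d
import std.ascii : isDigit;
import std.exception : enforce;

// Hypothetical event API, not jep's, Phobos's or vibe.d's.
enum Event { objectStart, objectEnd, arrayStart, arrayEnd, key, str, number, literal }

// Every event carries a slice of the input (null for start/end events);
// nothing is copied, decoded or converted.
alias Handler = void delegate(Event ev, const(char)[] raw);

/// Recursive descent over `text`; throws on malformed input.
void parseEvents(const(char)[] text, Handler emit)
{
    size_t i;

    bool isWs(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

    char peek() // skip whitespace, return the next significant character
    {
        while (i < text.length && isWs(text[i])) ++i;
        enforce(i < text.length, "unexpected end of input");
        return text[i];
    }

    void expect(char c) { enforce(peek() == c, "unexpected character"); ++i; }

    const(char)[] scanString() // raw text between the quotes, escapes kept
    {
        expect('"');
        immutable start = i;
        for (;;)
        {
            enforce(i < text.length, "unterminated string");
            if (text[i] == '"') break;
            i += text[i] == '\\' ? 2 : 1; // step over an escaped character
        }
        auto raw = text[start .. i];
        ++i; // closing quote
        return raw;
    }

    void value() // one call per JSON value
    {
        immutable c = peek();
        if (c == '{' || c == '[')
        {
            immutable isObject = c == '{';
            ++i;
            emit(isObject ? Event.objectStart : Event.arrayStart, null);
            if (peek() != (isObject ? '}' : ']'))
            {
                for (;;)
                {
                    if (isObject) { emit(Event.key, scanString()); expect(':'); }
                    value();
                    if (peek() != ',') break;
                    ++i;
                }
            }
            expect(isObject ? '}' : ']');
            emit(isObject ? Event.objectEnd : Event.arrayEnd, null);
        }
        else if (c == '"')
            emit(Event.str, scanString());
        else if (c == '-' || isDigit(c))
        {
            // Loose scan of number characters; the number grammar is not checked.
            immutable start = i++;
            while (i < text.length && (isDigit(text[i]) || text[i] == '.'
                    || text[i] == 'e' || text[i] == 'E' || text[i] == '+' || text[i] == '-'))
                ++i;
            emit(Event.number, text[start .. i]);
        }
        else
        {
            static immutable string[3] literals = ["true", "false", "null"];
            foreach (lit; literals)
                if (text.length - i >= lit.length && text[i .. i + lit.length] == lit)
                {
                    emit(Event.literal, text[i .. i + lit.length]);
                    i += lit.length;
                    return;
                }
            throw new Exception("unexpected character");
        }
    }

    value();
    while (i < text.length && isWs(text[i])) ++i;
    enforce(i == text.length, "trailing characters");
}
```

Since the handler only ever sees slices, a caller who wants an int or an unescaped string does that work itself, which is the trade-off Sean describes for jep.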

May 07, 2013 Re: I wrote a JSON library
Posted in reply to Sean Kelly
On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:
> $ main
> n = 1
> Milliseconds to call stdJson() n times: 73054
> Milliseconds to call newJson() n times: 44022
> Milliseconds to call jepJson() n times: 839
> newJson() is faster than stdJson() 1.66x times
> jepJson() is faster than stdJson() 87.1x times
This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similarly to SAX, so you can save quite a bit simply by not having to allocate.
Before I read this, I went about creating my own benchmark. Here is a .zip containing the source and some nice-looking bar charts comparing std.json, vibe.d's JSON library, and my own against various arrays of objects held in memory as a string:
http://www.mediafire.com/download.php?gabsvk8ta711q4u
For those less interested in downloading and looking at the .ods file, here are the results for the largest input size (an array of 100,000 small objects):
std.json - 2689375370 ms
vibe.data.json - 2835431576 ms
dson - 3705095251 ms
Here 'dson' is my library. I have done my duty and made my own library look the worst in benchmarks. Overall, I think these are all linear-time algorithms that do very similar things, and the speed differences are very minor. As always with benchmarks, mileage may vary.
Per request for examples of my library, I have produced this little snippet:
http://pastebin.com/sU8heFXZ
It's hard to enumerate all of the features I put in there at once, but that's a pretty good start. I also listed a few examples in a doc comment at the top of the json.d source.
The idea presented in this thread of building a nice tagged-union reader (like std.json, vibe.d, and my own) on top of a recursive event (SAX-like) parser seems pretty attractive to me now. I can envision rewriting my own library to work on top of such a parser.

May 07, 2013 Re: I wrote a JSON library
Posted in reply to w0rp
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
> On Tuesday, 7 May 2013 at 18:36:20 UTC, Sean Kelly wrote:
>
>> $ main
>> n = 1
>> Milliseconds to call stdJson() n times: 73054
>> Milliseconds to call newJson() n times: 44022
>> Milliseconds to call jepJson() n times: 839
>> newJson() is faster than stdJson() 1.66x times
>> jepJson() is faster than stdJson() 87.1x times
>
> This is very interesting. This jepJson library seems to be pretty fast. I imagine this library works very similarly to SAX, so you can save quite a bit simply by not having to allocate.
Yes, the jep parser does no allocation at all: all callbacks simply receive a slice of the value. It does full validation according to the spec, but there's no interpretation of the values beyond that, so if you want the integer string you were passed converted to an int, for example, you do the conversion yourself. The same goes for unescaping string data; in practice I often end up unescaping the strings in place, since I typically never need to re-parse the input buffer.
In practice, it's kind of a pain to use the jep parser for arbitrary processing, so I have some functions layered on top of it that iterate across array values and object keys:
int foreachArrayElem(char[] buf, scope int delegate(char[] value));
int foreachObjectField(char[] buf, scope int delegate(char[] name, char[] value));
These work basically the same as opApply: if the delegate returns a nonzero value, parsing aborts and that value is returned from the foreach routine. The parser is fast enough that I generally just nest calls to these foreach routines to parse complex types, even though this results in multiple passes across the same data.
The only other thing I was careful about was designing the library so that each parser callback could call a corresponding writer routine that simply passes the input through to an output buffer. This makes auto-reformatting a breeze, because you just set a "format output" flag on the writer and implement a few one-line functions.
> Before I read this, I went about creating my own benchmark. Here is a .zip containing the source and some nice-looking bar charts comparing std.json, vibe.d's JSON library, and my own against various arrays of objects held in memory as a string:
>
> http://www.mediafire.com/download.php?gabsvk8ta711q4u
>
> For those less interested in downloading and looking at the .ods file, here are the results for the largest input size (an array of 100,000 small objects):
>
> std.json - 2689375370 ms
> vibe.data.json - 2835431576 ms
> dson - 3705095251 ms
These results don't seem correct. Is this really milliseconds?
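jep's source isn't shown in the thread, so the following D code only imitates the contract Sean gives for `foreachArrayElem`. It is a hypothetical re-creation, not jep's code. It assumes `buf` already holds one validated JSON array. Each top-level element reaches the delegate as a slice of `buf`, so nothing is copied. A nonzero return from the delegate stops the walk and is passed back, as with opApply.

```d
import std.string : strip;

/// Hypothetical re-creation of the helper described above, not jep's code.
/// Assumes `buf` holds one already-validated JSON array, e.g. `[1, {"a": 2}]`.
int foreachArrayElem(char[] buf, scope int delegate(char[] value) dg)
{
    auto inner = strip(buf);
    inner = inner[1 .. $ - 1]; // drop the enclosing '[' and ']'
    size_t depth = 0, start = 0;
    bool inString = false;

    for (size_t i = 0; i < inner.length; ++i)
    {
        immutable c = inner[i];
        if (inString)
        {
            if (c == '\\') ++i;            // skip the escaped character
            else if (c == '"') inString = false;
            continue;
        }
        switch (c)
        {
            case '"': inString = true; break;
            case '[', '{': ++depth; break;
            case ']', '}': --depth; break;
            case ',':
                if (depth == 0) // separator between top-level elements
                {
                    if (auto r = dg(strip(inner[start .. i]))) return r;
                    start = i + 1;
                }
                break;
            default: break;
        }
    }
    auto last = strip(inner[start .. $]);
    if (last.length)
        if (auto r = dg(last)) return r;
    return 0;
}
```

Because each element is itself a slice, nesting works the way Sean describes: call `foreachArrayElem` again on an element that is an array, at the cost of another pass over those bytes. A `foreachObjectField` counterpart could split each top-level member at its first top-level `:` in the same way.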

May 07, 2013 Re: I wrote a JSON library
Posted in reply to w0rp
I completely left something out there, namely my reasons for not liking the existing implementations enough. Overall, the other libraries are all very similar, so I don't have major complaints, just little ones.
For vibe.d, it's actually pretty close to what I wanted. My big objection is that I don't like the 'Undefined' types; I would rather get runtime errors in those cases. I also pretty much have to depend on vibe.d to use it, rather than on just a JSON library. Aside from that, it's not far off from what I'm after.
For Libdjson, it uses classes to represent JSON types. That just seems very awkward to use, and it shouts "unnecessary garbage creation" to me.
The standard library (std.json) seems to nail the parsing of JSON, but it lacks the ability to write a JSON string to an output range and doesn't really offer any conveniences for working with the JSON data structure itself.
std.json, vibe.d, and my own representation of JSON are all very similar. They are tagged unions implemented with union {} and an enum. What makes vibe.d and my own library nice is all of the operator overloads, properties, and convenience functions. Another issue with std.json is its lack of pretty-printing, which both vibe.d and my own library address. (Mine has toJSON!4, which produces a string indented by 4 characters, and writeJSON!8, which writes to an output range with an indent of 8 characters.)
So that's essentially my rationale. Overall, writing the library was mostly done because I found it to be a rather entertaining challenge for myself.
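To make the shared design concrete, here is a minimal D sketch of a tagged union built from an anonymous `union` and an `enum` tag. Following dson's toJSON!4 and writeJSON!8, it takes the indent width as a template parameter. The type and member names are hypothetical and do not reproduce std.json, vibe.d or dson. All numbers are stored as `double`, and string escaping only covers quotes, backslashes and control characters.

```d
import std.algorithm.sorting : sort;
import std.array : appender;
import std.format : formattedWrite;
import std.range.primitives : put;

// Hypothetical tagged union (anonymous union + enum tag), not the actual
// std.json, vibe.d or dson type. All numbers are stored as double here.
struct JSON
{
    enum Type { null_, boolean, number, string_, array, object }
    private Type type_;
    private union
    {
        bool b;
        double n;
        string s;
        JSON[] a;
        JSON[string] o;
    }
    this(typeof(null)) { type_ = Type.null_; }
    // Constrained template so that JSON(1) picks the double overload, not bool.
    this(T)(T v) if (is(T == bool)) { type_ = Type.boolean; b = v; }
    this(double v) { type_ = Type.number; n = v; }
    this(string v) { type_ = Type.string_; s = v; }
    this(JSON[] v) { type_ = Type.array; a = v; }
    this(JSON[string] v) { type_ = Type.object; o = v; }

    /// String output, `indent` spaces per nesting level (0 = compact).
    string toJSON(int indent = 0)() const
    {
        auto app = appender!string();
        writeJSON!indent(app);
        return app.data;
    }

    /// Same output, written to any output range of characters.
    void writeJSON(int indent, Out)(ref Out sink, int level = 0) const
    {
        void newline(int lvl)
        {
            static if (indent > 0)
            {
                put(sink, '\n');
                foreach (_; 0 .. lvl * indent) put(sink, ' ');
            }
        }
        final switch (type_)
        {
            case Type.null_: put(sink, "null"); break;
            case Type.boolean: put(sink, b ? "true" : "false"); break;
            case Type.number: sink.formattedWrite("%s", n); break;
            case Type.string_: putString(sink, s); break;
            case Type.array:
                put(sink, '[');
                foreach (i, v; a)
                {
                    if (i) put(sink, ',');
                    newline(level + 1);
                    v.writeJSON!indent(sink, level + 1);
                }
                if (a.length) newline(level);
                put(sink, ']');
                break;
            case Type.object:
            {
                string[] keys;
                foreach (k, v; o) keys ~= k;
                keys.sort(); // AA order is unspecified; sort for stable output
                put(sink, '{');
                foreach (i, k; keys)
                {
                    if (i) put(sink, ',');
                    newline(level + 1);
                    putString(sink, k);
                    put(sink, indent > 0 ? ": " : ":");
                    o[k].writeJSON!indent(sink, level + 1);
                }
                if (keys.length) newline(level);
                put(sink, '}');
                break;
            }
        }
    }
}

// Quotes a string, escaping '"', '\' and control characters only.
private void putString(Out)(ref Out sink, const(char)[] str)
{
    put(sink, '"');
    foreach (char c; str)
    {
        if (c == '"' || c == '\\') { put(sink, '\\'); put(sink, c); }
        else if (c < 0x20) sink.formattedWrite("\\u%04x", cast(uint) c);
        else put(sink, c);
    }
    put(sink, '"');
}
```

Making `writeJSON` the primitive and `toJSON` a thin wrapper around an `Appender` is one way to get both the string-returning and output-range forms w0rp mentions from a single serializer.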

May 07, 2013 Re: I wrote a JSON library
Posted in reply to Sean Kelly
>> std.json - 2689375370 ms
>> vibe.data.json - 2835431576 ms
>> dson - 3705095251 ms
>
> These results don't seem correct. Is this really milliseconds?
Well, this is embarrassing. I do apologise. I appear to have printed the TickDuration object value itself instead of the milliseconds. I think I spent too much time writing the benchmark and too little looking at the actual results. I quickly ran it again with the error corrected (.msecs) and got much more reasonable-looking results for an input size of 1,000:
std.json : 7370 ms
vibe.data.json : 6878 ms
json : 9150 ms
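The bug w0rp describes is printing the raw duration value instead of converting it to milliseconds. Current Phobos also provides `benchmark` in std.datetime.stopwatch, which returns `core.time.Duration` values that are converted explicitly with `total!"msecs"`. Below is a minimal sketch of a std.json-only benchmark along those lines. `makeInput` and `stdJsonMsecs` are hypothetical helpers, and the input shape (small objects with three fields) is an assumption, not w0rp's actual test data.

```d
import std.array : appender;
import std.datetime.stopwatch : benchmark;
import std.format : formattedWrite;
import std.json : parseJSON;

/// Builds an array of `count` small objects (hypothetical test data).
string makeInput(size_t count)
{
    auto app = appender!string();
    app.put('[');
    foreach (i; 0 .. count)
    {
        if (i) app.put(',');
        app.formattedWrite(`{"id":%d,"name":"item%d","ok":true}`, i, i);
    }
    app.put(']');
    return app.data;
}

/// Total milliseconds for `runs` calls to std.json's parseJSON.
long stdJsonMsecs(string input, uint runs)
{
    // benchmark returns one Duration per function; convert it explicitly
    // rather than printing the raw duration value.
    auto times = benchmark!(() { cast(void) parseJSON(input); })(runs);
    return times[0].total!"msecs";
}
```

For example, `stdJsonMsecs(makeInput(1_000), 10)` gives the total time for ten parses of a 1,000-element array. Absolute numbers depend on the machine and compiler flags, so only runs made under the same conditions are comparable.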

May 08, 2013 Re: I wrote a JSON library
Posted in reply to w0rp
On Tue, 07 May 2013 23:09:35 +0200
"w0rp" <devw0rp@gmail.com> wrote:
>
> So that's essentially my rationale. Overall, writing the library was mostly done because I found it to be a rather entertaining challenge for myself.
Parsing a simple grammar can indeed be very fun! I did that recently, too (not JSON, though), partly to try my hand at LL for a change, and had a blast. Designing and implementing a good API can actually be the hard/tedious part (well, and the unittests can be pretty tedious).

May 08, 2013 Re: I wrote a JSON library
Posted in reply to w0rp
On Tuesday, 7 May 2013 at 20:14:20 UTC, w0rp wrote:
> Per request for examples of my library, I have produced this little snippet. http://pastebin.com/sU8heFXZ It's hard to enumerate all of the features I put in there at once, but that's a pretty good start. I also listed a few examples in a doc comment at the top of the json.d source.
>
The API looks really nice! I'd love to see something similar in Phobos, API-wise.
But I don't like the shortcut choices. arr => array saves you only 2 characters! That is nothing, and certainly not worth the confusion. Same for obj => object. With this kind of practice, everybody comes up with their own set of shortcuts, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.

May 08, 2013 Re: I wrote a JSON library
Posted in reply to deadalnix
> The API looks really nice! I'd love to see something similar in Phobos, API-wise.
>
> But I don't like the shortcut choices. arr => array saves you only 2 characters! That is nothing, and certainly not worth the confusion. Same for obj => object. With this kind of practice, everybody comes up with their own set of shortcuts, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.
I think that's a good point. I'll change them immediately and push to github.

May 09, 2013 Re: I wrote a JSON library
Posted in reply to w0rp
On Wednesday, 8 May 2013 at 21:05:55 UTC, w0rp wrote:
>> The API looks really nice! I'd love to see something similar in Phobos, API-wise.
>>
>> But I don't like the shortcut choices. arr => array saves you only 2 characters! That is nothing, and certainly not worth the confusion. Same for obj => object. With this kind of practice, everybody comes up with their own set of shortcuts, and you have to remember all of them for each library! What seems like a speedup at first ends up being a slowdown.
>
> I think that's a good point. I'll change them immediately and push to github.
Awesome. Another nice thing you can do is to use alias this on a @property to allow for implicit conversion to int.
Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.
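Here is deadalnix's suggestion sketched in D: `alias this` on a `@property` makes the struct implicitly convertible to whatever the property returns. This is a hypothetical stripped-down type, not dson's, and it converts to `long` rather than `int`. The property checks the tag, so using a non-number as a number becomes a runtime error, in line with the preference w0rp states for runtime errors over 'Undefined' values.

```d
import std.exception : enforce;

// Hypothetical stripped-down value type, not dson's actual JSON struct.
struct JSON
{
    enum Type { null_, number }

    Type type;
    long store;

    // Checked accessor: a wrong type is a runtime error, not a silent default.
    @property long integer() const
    {
        enforce(type == Type.number, "JSON value is not a number");
        return store;
    }

    // Implicit conversions (and arithmetic) fall back to `integer`.
    alias integer this;
}

long twice(long x) { return 2 * x; }
```

With this in place, a `JSON` holding a number can be passed straight to `twice` or assigned to a `long`. A null value throws at the point of conversion.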

May 09, 2013 Re: I wrote a JSON library
Posted in reply to deadalnix
On Thursday, 9 May 2013 at 01:42:41 UTC, deadalnix wrote:
> Awesome. Another nice thing you can do is to use alias this on a @property to allow for implicit conversion to int.
>
> Overall, the API is super nice! If performance doesn't matter, I definitely recommend using the lib.
I'll have to experiment with the alias this idea.
There are still a few things I need to work out. I'm missing an overload for opCmp (plus the whole host of math operators), and the append behaviour is perhaps strange. I had to choose between ~ adding a JSON array to the LHS as a single element ([] ~ [1, 2] == [[1, 2]]) or concatenating the arrays as normal D arrays do ([] ~ [1, 2] == [1, 2]). I went with the former for now, but I might have made the wrong choice. It all came about because of this:
auto arr = jsonArray();
arr ~= 1; // [1]
arr ~= "foo"; // [1, "foo"]
arr ~= jsonArray(); // Currently: [1, "foo", []]
auto another = jsonArray();
another ~= 3;
arr.array ~= another.array; // Always: [1, "foo", [], 3]
I swear that I wrote a concat(JSON, JSON) function for this, but it's not there. That would have accomplished this:
arr.concat(another);
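Here is a D sketch of the two behaviours w0rp is weighing. `~=` adds its right-hand side as a single element, so appending an array nests it. A separate `concat` splices the elements in, the way `~` does for built-in D arrays. The struct, `jsonArray()` and `concat` are hypothetical stand-ins modelled on the snippet above, not dson's implementation. Scalars are stored as their JSON text, and strings are not escaped.

```d
import std.conv : to;
import std.exception : enforce;

// Hypothetical stand-in for dson's JSON type that only models arrays and
// scalars; scalars are kept as their JSON text to keep the sketch short.
struct JSON
{
    private bool isArray;
    private string text;   // JSON text of a scalar (strings are not escaped)
    private JSON[] items;  // elements when isArray is true

    // `~=` appends its operand as ONE element, so appending an array nests it.
    void opOpAssign(string op, T)(T rhs) if (op == "~")
    {
        enforce(isArray, "~= needs a JSON array on the left");
        static if (is(T == JSON))
            items ~= rhs;
        else static if (is(T : string))
            items ~= JSON(false, `"` ~ rhs ~ `"`);
        else
            items ~= JSON(false, to!string(rhs));
    }

    // Hypothetical concat(): splices rhs's elements in, like `~` on D arrays.
    void concat(JSON rhs)
    {
        enforce(isArray && rhs.isArray, "concat needs two JSON arrays");
        items ~= rhs.items;
    }

    string toString() const
    {
        if (!isArray) return text;
        string result = "[";
        foreach (i, ref e; items)
        {
            if (i) result ~= ", ";
            result ~= e.toString();
        }
        return result ~ "]";
    }
}

JSON jsonArray() { return JSON(true); }
```

Keeping `~=` as element append and giving splicing its own name means the two cases can never be confused. The cost is that the operator no longer behaves like `~=` on ordinary D arrays.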
Copyright © 1999-2021 by the D Language Foundation