October 12, 2012
Save JSONValue binary in file?
Hello!

I got this 109 MB JSON file that I read, and it takes over 32 seconds for parseJSON() to finish. So I was wondering if there is a way to save it as binary or something like that so I can read it back super fast?

Thanks for all suggestions :)
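The "save it as binary" idea can be sketched without any particular library: parse the JSON once, copy the fields you actually need into plain value-type structs, and dump those structs as raw bytes. Everything here is a hypothetical sketch (the Record type and its fields are made up), and the raw dump is only safe for structs that contain no pointers, slices, or strings:

```d
import std.file : read, write;

// Hypothetical record type; replace with the fields you actually need.
struct Record
{
    long id;
    double score;
}

void saveCache(string path, const Record[] recs)
{
    // Raw byte dump; only valid for plain value-type structs.
    write(path, recs);
}

Record[] loadCache(string path)
{
    // read() returns the whole file as void[]; reinterpret it as Record[].
    return cast(Record[]) read(path);
}

void main()
{
    auto data = [Record(1, 0.5), Record(2, 1.5)];
    saveCache("cache.bin", data);
    assert(loadCache("cache.bin") == data); // round-trips without reparsing JSON
}
```

Loading the cache is then a single file read plus a cast, with no parsing at all; the trade-off is that the format is compiler- and platform-specific.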
October 12, 2012
Re: Save JSONValue binary in file?
Posted in reply to Chopin

Chopin wrote:
> Hello!
>
> I got this 109 MB json file that I read... and it takes over 32
> seconds for parseJSON() to finish it. So I was wondering if it
> was a way to save it as binary or something like that so I can
> read it super fast?
>
> Thanks for all suggestions :)

Try this implementation: https://github.com/pszturmaj/json-streaming-parser. You can parse everything into memory or do streaming-style parsing.
October 12, 2012
Re: Save JSONValue binary in file?
Posted in reply to Piotr Szturmaj

Thanks! I tried using it:

    auto document = parseJSON(content).array; // this works with std.json :)

Using json.d from the link:

    auto j = JSONReader!string(content);
    auto document = j.value.whole.array; // this doesn't... "Error: undefined identifier 'array'"
October 12, 2012
Re: Save JSONValue binary in file?
Posted in reply to Chopin

Chopin wrote:
> Thanks! I tried using it:
>
> auto document = parseJSON(content).array; // this works with std.json :)
>
> Using json.d from the link:
>
> auto j = JSONReader!string(content);
> auto document = j.value.whole.array; // this doesn't.... "Error:
> undefined identifier 'array'"
If you're sure that the content is an array:

    auto j = JSONReader!string(content);
    auto jv = j.value.whole;
    assert(jv.type == JSONType.array);
    auto jsonArray = jv.as!(JSONValue[]);

Alternatively, you can replace the last line with:

    alias JSONValue[] JSONArray;
    auto jsonArray = jv.as!JSONArray;
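For comparison, the same check-then-extract pattern in plain std.json (which Chopin's original call used) looks like the sketch below. The API names are the current ones; older releases spelled the enum JSON_TYPE.ARRAY, so adjust for your compiler version:

```d
import std.json;

void main()
{
    auto jv = parseJSON(`[{"id": 1}, {"id": 2}]`);

    // Check the value's runtime type before extracting.
    assert(jv.type == JSONType.array);

    // .array returns JSONValue[]; it throws JSONException if jv is not an array.
    JSONValue[] arr = jv.array;
    assert(arr.length == 2);
    assert(arr[0]["id"].integer == 1);
}
```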
October 12, 2012
Re: Save JSONValue binary in file?
Posted in reply to Chopin

On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue@gmail.com> wrote:
>
> I got this 109 MB json file that I read... and it takes over 32 seconds for parseJSON() to finish it. So I was wondering if it was a way to save it as binary or something like that so I can read it super fast?

The performance problem is because std.json works like a DOM parser for XML: it allocates a node per value in the JSON stream. What we really need is something that works more like a SAX parser, with the DOM version as an optional layer built on top.

Just for kicks, I grabbed the fourth (largest) JSON blob from here: http://www.json.org/example.html then wrapped it in array brackets and duplicated the object until I had a ~350 MB input file, i.e.

    [ paste, paste, paste, … ]

Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:

    import core.stdc.stdlib;
    import core.sys.posix.unistd;
    import core.sys.posix.sys.stat;
    import core.sys.posix.fcntl;
    import std.json;

    void main()
    {
        auto filename = "input.txt\0".dup;
        stat_t st;
        stat(filename.ptr, &st);
        auto sz = st.st_size;
        auto buf = cast(char*) malloc(sz);
        auto fh = open(filename.ptr, O_RDONLY);
        read(fh, buf, sz);
        auto json = parseJSON(buf[0 .. sz]);
    }

Here are my results:

    $ dmd -release -inline -O dtest
    $ ll input.txt
    -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
    $ time dtest

    real  1m36.462s
    user  1m32.468s
    sys   0m1.102s

Then I ran my SAX-style parser example on the same input file:

    $ make example
    cc example.c -o example lib/release/myparser.a
    $ time example

    real  0m2.191s
    user  0m1.944s
    sys   0m0.241s

So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream. Note that the D app used gigabytes of memory to process this file (I believe the total VM footprint was around 3.5 GB), while my app used a fixed amount roughly equal to the size of the input file.
In short, DOM-style parsers are great for small data and terrible for large data.
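The fixed-memory, single-pass behaviour Sean describes can be illustrated without any particular library. This hypothetical scanner counts the elements of a top-level JSON array (the shape of Sean's test file) in one pass over the buffer, allocating nothing and building no tree:

```d
// Count the elements of a top-level JSON array in a single pass,
// with fixed memory: no nodes are allocated, no tree is built.
size_t countTopLevelElements(const(char)[] json)
{
    size_t depth = 0, commas = 0;
    bool inString = false, escaped = false, sawElement = false;

    foreach (c; json)
    {
        if (inString)
        {
            // Inside a string literal: only track escapes and the closing quote,
            // so commas and brackets inside strings are ignored.
            if (escaped)        escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"')  inString = false;
            continue;
        }
        switch (c)
        {
            case '"':           inString = true; if (depth == 1) sawElement = true; break;
            case '[': case '{': if (depth == 1) sawElement = true; ++depth; break;
            case ']': case '}': --depth; break;
            case ',':           if (depth == 1) ++commas; break; // separator between elements
            case ' ': case '\t': case '\n': case '\r': break;
            default:            if (depth == 1) sawElement = true; break;
        }
    }
    // n elements are separated by n-1 commas at depth 1.
    return sawElement ? commas + 1 : 0;
}

void main()
{
    assert(countTopLevelElements(`[]`) == 0);
    assert(countTopLevelElements(`[{"a":1},{"a":2},{"a":3}]`) == 3);
    assert(countTopLevelElements(`[1, "x,y", [2,3]]`) == 3); // nested commas don't count
}
```

A real SAX-style parser generalizes this idea: instead of just counting, it fires callbacks for each value as it scans, so memory use stays flat no matter how large the input is.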
October 13, 2012
Re: Save JSONValue binary in file?
Posted in reply to Sean Kelly

On 2012-10-13 01:26, Sean Kelly wrote:
> Here are my results:
>
> $ dmd -release -inline -O dtest
> $ ll input.txt
> -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
> $ time dtest
>
> real  1m36.462s
> user  1m32.468s
> sys   0m1.102s
>
> Then I ran my SAX style parser example on the same input file:
>
> $ make example
> cc example.c -o example lib/release/myparser.a
> $ time example
>
> real  0m2.191s
> user  0m1.944s
> sys   0m0.241s
>
> So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream.

I tried the JSON parser in Tango, using D2. These are the results I got for a file just below 360 MB:

    real  1m2.848s
    user  0m58.321s
    sys   0m1.423s

Since the XML parser in Tango is so fast, I expected more from the JSON parser as well. But I have no idea what kind of parser the JSON parser uses.

-- 
/Jacob Carlborg
Copyright © 1999-2021 by the D Language Foundation