Thread overview: Save JSONValue binary in file?
Chopin (Oct 12, 2012)
Piotr Szturmaj (Oct 12, 2012)
Chopin (Oct 12, 2012)
Piotr Szturmaj (Oct 12, 2012)
Sean Kelly (Oct 12, 2012)
Jacob Carlborg (Oct 13, 2012)
October 12, 2012 (Chopin)
Hello!

I've got this 109 MB JSON file that I read, and it takes over
32 seconds for parseJSON() to finish. So I was wondering if
there is a way to save it as binary or something like that so
I can read it back super fast?

Thanks for all suggestions :)
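A minimal sketch of the binary-cache idea in D, assuming the parsed data boils down to flat, fixed-size records (Record, saveCache, and loadCache are hypothetical names; anything containing strings or pointers would need real serialization):

import std.file : read, write;

// Hypothetical record type; the real layout depends on what the
// 109 MB file actually contains.
struct Record
{
    long   id;
    double price;
}

// Dump the flattened records to disk as raw bytes in one call...
void saveCache(string path, Record[] records)
{
    write(path, records);
}

// ...and load them back with a single read and a cast.
Record[] loadCache(string path)
{
    return cast(Record[]) read(path);
}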
October 12, 2012 (Piotr Szturmaj)
Chopin wrote:
> Hello!
>
> I've got this 109 MB JSON file that I read, and it takes over
> 32 seconds for parseJSON() to finish. So I was wondering if
> there is a way to save it as binary or something like that so
> I can read it back super fast?
>
> Thanks for all suggestions :)

Try this implementation: https://github.com/pszturmaj/json-streaming-parser. You can parse everything into memory or do streaming-style parsing.
October 12, 2012 (Chopin)
Thanks! I tried using it:

auto document = parseJSON(content).array; // this works with std.json :)

Using json.d from the link:

auto j = JSONReader!string(content);
auto document = j.value.whole.array; // this doesn't.... "Error: undefined identifier 'array'"
October 12, 2012 (Piotr Szturmaj)
Chopin wrote:
> Thanks! I tried using it:
>
> auto document = parseJSON(content).array; // this works with std.json :)
>
> Using json.d from the link:
>
> auto j = JSONReader!string(content);
> auto document = j.value.whole.array; // this doesn't.... "Error:
> undefined identifier 'array'"

If you're sure that content is an array:

auto j = JSONReader!string(content);
auto jv = j.value.whole;
assert(jv.type == JSONType.array);
auto jsonArray = jv.as!(JSONValue[]);

Alternatively, you can replace the last line with:

alias JSONValue[] JSONArray;
auto jsonArray = jv.as!JSONArray;
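To then walk the extracted array, something along these lines should work (a sketch; JSONType.string and as!string are assumptions by analogy with the JSONType.array and as!(JSONValue[]) used above):

foreach (elem; jsonArray)
{
    // Each element is itself a JSONValue; dispatch on its type.
    if (elem.type == JSONType.string)
    {
        auto s = elem.as!string;
        // ... use the string ...
    }
}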
October 12, 2012 (Sean Kelly)
On Oct 12, 2012, at 9:40 AM, Chopin <robert.bue@gmail.com> wrote:
> 
> I've got this 109 MB JSON file that I read, and it takes over 32 seconds for parseJSON() to finish. So I was wondering if there is a way to save it as binary or something like that so I can read it back super fast?

The performance problem is that std.json works like a DOM parser for XML: it allocates a node per value in the JSON stream.  What we really need is something that works more like a SAX parser, with the DOM version as an optional layer built on top.  Just for kicks, I grabbed the fourth (largest) JSON blob from here:

http://www.json.org/example.html

then wrapped it in array brackets and duplicated the object until I had a ~350 MB input file, i.e.

[ paste, paste, paste, … ]

Then I parsed it via this test app, based on an example in a SAX-style JSON parser I wrote in C:


import core.stdc.stdlib;
import core.sys.posix.unistd;
import core.sys.posix.sys.stat;
import core.sys.posix.fcntl;
import std.json;

void main()
{
    // Null-terminated, mutable copy of the file name for the C calls.
    auto filename = "input.txt\0".dup;

    // Get the file size, then slurp the whole file into a malloc'd
    // buffer with raw POSIX I/O (no error checking; just a benchmark).
    stat_t st;
    stat(filename.ptr, &st);
    auto sz = st.st_size;
    auto buf = cast(char*) malloc(sz);
    auto fh = open(filename.ptr, O_RDONLY);
    read(fh, buf, sz);
    close(fh);

    // Nearly all of the run time is spent here, building the
    // JSONValue tree.
    auto json = parseJSON(buf[0 .. sz]);
}


Here are my results:


$ dmd -release -inline -O dtest
$ ll input.txt
-rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
$ time dtest

real  1m36.462s
user 1m32.468s
sys   0m1.102s


Then I ran my SAX-style parser example on the same input file:


$ make example
cc example.c -o example lib/release/myparser.a
$ time example

real  0m2.191s
user 0m1.944s
sys   0m0.241s


So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream.  Note that the D app used gigabytes of memory to process this file (I believe the total VM footprint was around 3.5 GB), while my app used a fixed amount roughly equal to the size of the input file.  In short, DOM-style parsers are great for small data and terrible for large data.
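For comparison, the event interface of a SAX-style JSON parser can be as small as the sketch below. This is only the general shape such an API could take in D, not the actual interface of the C parser used above:

// A sketch of a SAX-style (event/callback) JSON interface in D.
interface JsonHandler
{
    void objectStart();
    void objectEnd();
    void arrayStart();
    void arrayEnd();
    void key(const(char)[] name);
    void str(const(char)[] value);
    void number(double value);
    void boolean(bool value);
    void nil();
}

// Example handler: counts leaf values in constant memory, since
// no object tree is ever built.
class ValueCounter : JsonHandler
{
    size_t count;
    void objectStart() {}
    void objectEnd() {}
    void arrayStart() {}
    void arrayEnd() {}
    void key(const(char)[] name) {}
    void str(const(char)[] value) { ++count; }
    void number(double value) { ++count; }
    void boolean(bool value) { ++count; }
    void nil() { ++count; }
}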


October 13, 2012 (Jacob Carlborg)
On 2012-10-13 01:26, Sean Kelly wrote:

> Here are my results:
>
>
> $ dmd -release -inline -O dtest
> $ ll input.txt
> -rw-r--r--  1 sean  staff  365105313 Oct 12 15:50 input.txt
> $ time dtest
>
> real  1m36.462s
> user 1m32.468s
> sys   0m1.102s
>
>
> Then I ran my SAX-style parser example on the same input file:
>
>
> $ make example
> cc example.c -o example lib/release/myparser.a
> $ time example
>
> real  0m2.191s
> user 0m1.944s
> sys   0m0.241s
>
>
> So clearly the problem isn't parsing JSON in general but rather generating an object tree for a large input stream.  Note that the D app used gigabytes of memory to process this file (I believe the total VM footprint was around 3.5 GB), while my app used a fixed amount roughly equal to the size of the input file.  In short, DOM-style parsers are great for small data and terrible for large data.

I tried the JSON parser in Tango, using D2. These are the results I got for a file just below 360 MB:

real	1m2.848s
user	0m58.321s
sys	0m1.423s

Since the XML parser in Tango is so fast, I expected more from the JSON parser as well. But I have no idea what parsing strategy the JSON module uses.

-- 
/Jacob Carlborg
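For reference, the Tango parse in question would look roughly like this (recalled from the Tango documentation for tango.text.json.Json; the exact signatures are an assumption and may differ in the D2 port):

import tango.text.json.Json;

void main()
{
    // Parse a document into Tango's value tree (a tiny inline
    // document here instead of the 360 MB file).
    auto parser = new Json!(char);
    auto value = parser.parse(`{"x": [1, 2, 3]}`);
}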