Thread overview
Getting underlying struct for parseJSON
Feb 28, 2017
Alexey H
Feb 28, 2017
Alexey H
Feb 28, 2017
Seb
Feb 28, 2017
Adam D. Ruppe
Mar 02, 2017
Alexey H
February 28, 2017
Hello, guys!

I'm working on a project that involves parsing huge JSON datasets in real time.
An example of what I'm dealing with is here:

https://gist.githubusercontent.com/gdmka/125014058bb7d7f01b867fac56300a61/raw/f0c6b5be5fb01b16dd83f07c577b72f76f72c855/data.json

I can't think of any tools other than D or Go to solve this problem.

My attempt to solve the problem with Go led me to Stack Overflow, and the community there seemed too reluctant to help, so I assumed that the language cannot handle such a set of operations on complex data structures.

With D it worked like a charm. Where I had ~100 lines of Go code, I did the same with 12 in D. Nevertheless, I did some profiling (unfortunately on OS X) and saw much heavier CPU usage with D than with Go, probably because the Go solution was unpacking all the data strictly into a struct.

So, my real question is: can I actually, by any chance, get the description of the underlying struct that the call to parseJSON generates?

Go users have this tool, https://mholt.github.io/json-to-go/, to generate structs from JSON automatically.

Since D easily parses JSON by type inference, I assume it builds a JSONValue struct that holds all the fields and the data.

If it is possible, then I could build a similar JSON-to-D tool just to save people time and patience.





February 28, 2017
On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
> [...]

If you really care about performance, have a look at this: http://forum.dlang.org/post/20151014090114.60780ad6@marco-toshiba

std.json is not tuned for performance, so don't expect good results from it.
February 28, 2017
On Tuesday, 28 February 2017 at 20:48:33 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
>> [...]
>
> If you really care about performance, have a look this: http://forum.dlang.org/post/20151014090114.60780ad6@marco-toshiba
>
> std.json is not tuned for performance, so don't expect good results from it.

I am not expecting good results from the standard library's JSON module. For now I just need a concise way to take 1.2-1.5 MB of JSON and dump it into a struct to perform numeric computations. Since it's not the only data source I will be parsing, I need a straightforward way to generate D structs directly from the predefined JSON schema, so I am willing to sacrifice some speed for convenience at this point.

Fastjson might be good when dealing with trusted input, but this is not my case.

February 28, 2017
On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
> So, my real question is: can I actually, by any chance, get the description of the underlying struct that the call to parseJSON generates?

It doesn't actually generate one, it just returns a tagged union (a kind of dynamic type).
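To make the tagged-union point concrete, here is a minimal std.json sketch. It assumes a recent D compiler in which `JSONValue.type` returns `JSONType`; releases from around the time of this thread spelled that enum `JSON_TYPE`. The input is a small hand-written sample shaped like data.json, not the real file. Every value is a `JSONValue` whose kind is only known at run time, so reading it through the wrong accessor throws rather than failing to compile.

```d
import std.json;
import std.stdio : writeln;

void main()
{
    // parseJSON returns a single JSONValue: a tagged union whose `type`
    // says which kind of JSON value it currently holds.
    JSONValue root = parseJSON(
        `{"siteVersion": 1, "sports": [{"id": 1, "name": "Football"}]}`);

    writeln(root.type);                      // object
    writeln(root["siteVersion"].type);       // integer
    writeln(root["sports"].type);            // array
    writeln(root["sports"][0]["name"].str);  // Football

    // The wrong accessor is a run-time error, not a compile-time one:
    // `.str` on an integer value throws JSONException.
    try
    {
        string s = root["siteVersion"].str;
    }
    catch (JSONException e)
    {
        writeln("siteVersion is not a string");
    }
}
```

There is no generated struct to inspect; the "schema" only exists as the tags on each nested `JSONValue`.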

But, my json inspector program (never finished btw) has something that can kinda generate structs from json: https://github.com/adamdruppe/inspector
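The general idea behind such a generator can be sketched with std.json alone: walk the parsed `JSONValue`, map each scalar kind to a D type, and emit a nested `struct <key>_t` for every object, mirroring the `<key>_t` naming that inspector produces. This is not inspector's actual code; `typeFor`, `emitStruct` and `generateStruct` are hypothetical names, and it assumes `JSONType` as in recent compilers. Known limitations of this sketch: the first array element is assumed to be representative, a `null` sample falls back to `JSONValue`, and keys that are D keywords (such as `version`) are emitted as-is, which is not valid D.

```d
import std.algorithm.sorting : sort;
import std.array : Appender, appender;
import std.json;
import std.stdio : write;

// Returns the D type name to use for a JSON value. For objects it first
// emits a nested `struct <key>_t { ... }` declaration into `output`.
string typeFor(string key, JSONValue v, ref Appender!string output, string indent)
{
    final switch (v.type)
    {
        case JSONType.null_:    return "JSONValue"; // a null sample carries no type info
        case JSONType.string:   return "string";
        case JSONType.integer:  return "long";
        case JSONType.uinteger: return "ulong";
        case JSONType.float_:   return "double";
        case JSONType.true_:
        case JSONType.false_:   return "bool";
        case JSONType.array:
            // Assumption: the first element is representative of the whole array.
            if (v.array.length == 0)
                return "JSONValue[]";
            return typeFor(key, v.array[0], output, indent) ~ "[]";
        case JSONType.object:
        {
            string name = key ~ "_t";
            emitStruct(name, v, output, indent);
            return name;
        }
    }
}

// Emits `struct <name> { ... }` with one field per key of `obj`.
void emitStruct(string name, JSONValue obj, ref Appender!string output, string indent)
{
    output.put(indent ~ "struct " ~ name ~ " {\n");
    // Associative-array order is unspecified, so sort keys for stable output.
    auto keys = obj.object.keys;
    keys.sort();
    foreach (key; keys)
    {
        string type = typeFor(key, obj.object[key], output, indent ~ "\t");
        output.put(indent ~ "\t" ~ type ~ " " ~ key ~ ";\n");
    }
    output.put(indent ~ "}\n");
}

// Parses `jsonText` and returns D source for a struct named `rootName`.
string generateStruct(string rootName, string jsonText)
{
    auto output = appender!string();
    emitStruct(rootName, parseJSON(jsonText), output, "");
    return output.data;
}

void main()
{
    write(generateStruct("Json_t",
        `{"siteVersion": 1, "sports": [{"id": 1, "name": "Football"}]}`));
}
```

For that sample input the program prints a `Json_t` containing `long siteVersion;`, a nested `struct sports_t` with `long id;` and `string name;`, and a `sports_t[] sports;` field. Sampling more than one array element, or merging types across elements, would make the output more robust against fields that are sometimes integers and sometimes floats.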

I ran your data.json through my program, and it spat this out (well, not directly, I did a few minor tweaks by hand since it doesn't output 100% valid D, but it is a good start):

struct Json_t {
	struct sports_t {
		long regionId;
		string name;
		long id;
		string sortOrder;
		long parentId;
		string kind;
	}
	sports_t[] sports;

	long siteVersion;

	struct eventBlocks_t {
		long[] factors;
		long eventId;
		string state;
	}
	eventBlocks_t[] eventBlocks;

	struct customFactors_t {
		long lo;
		long e;
		bool isLive;
		double v;
		string pt;
		long f;
		long p;
		long hi;
	}
	customFactors_t[] customFactors;

	struct announcements_t {
		long segmentId;
		string place;
		bool liveHalf;
		long regionId;
		string name;
		long[] tv;
		string segmentName;
		string segmentSortOrder;
		long id;
		long num;
		string namePrefix;
		string team1;
		long startTime;
		long sportId;
		string team2;
	}
	announcements_t[] announcements;

	struct events_t {
		long level;
		string sortOrder;
		string place;
		string name;
		long rootKind;
		long parentId;
		long id;
		long num;
		string namePrefix;
		string team1;
		long kind;
		struct state_t {
			bool inHotList;
			bool willBeLive;
			bool liveHalf;
		}
		state_t state;
		long startTime;
		long sportId;
		long priority;
		string team2;
	}
	events_t[] events;
	long packetVersion;
	struct eventMiscs_t {
		long liveDelay;
		long timerUpdateTimestamp;
		long[] tv;
		string comment;
		long score2;
		long servingTeam;
		long id;
		long timerSeconds;
		long timerDirection;
		long score1;
	}
	eventMiscs_t[] eventMiscs;
	long fromVersion;
	long factorsVersion;
}


And I *think* my jsvar.d has magic methods to load up one of those.... but since my jsvar.d just uses std.json and then builds up junk on top of it, it will necessarily be even slower than what you already have :S so meh.
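Loading a parsed `JSONValue` into such a struct can be sketched with compile-time reflection over `tupleof`. This is a swapped-in technique, not how jsvar.d works; `fromJSON`, `CustomFactor` and `Feed` are hypothetical names, with the structs trimmed down from the generated `Json_t`. It assumes a recent compiler (`JSONType`), and because it sits on top of std.json it buys convenience, not speed.

```d
import std.json;
import std.range.primitives : ElementType;
import std.stdio : writeln;
import std.traits : isDynamicArray, isFloatingPoint, isIntegral;

// Recursively converts a JSONValue into a value of type T. Struct fields are
// matched to JSON keys by name; keys missing from the JSON leave the field at
// its default (.init) value, and extra JSON keys are ignored.
T fromJSON(T)(JSONValue v)
{
    static if (is(T == bool))
        return v.type == JSONType.true_;
    else static if (isIntegral!T)
        return cast(T) v.integer;
    else static if (isFloatingPoint!T)
        // JSON has one number type, so a whole number like 2 arrives as an integer.
        return v.type == JSONType.float_ ? cast(T) v.floating : cast(T) v.integer;
    else static if (is(T == string))
        return v.str;
    else static if (isDynamicArray!T)
    {
        T result;
        foreach (element; v.array)
            result ~= fromJSON!(ElementType!T)(element);
        return result;
    }
    else static if (is(T == struct))
    {
        T result;
        // Unrolled at compile time: one iteration per struct field.
        foreach (i, ref field; result.tupleof)
        {
            enum name = __traits(identifier, T.tupleof[i]);
            if (auto p = name in v.object)
                field = fromJSON!(typeof(field))(*p);
        }
        return result;
    }
    else
        static assert(0, "fromJSON: unsupported type " ~ T.stringof);
}

// Hypothetical structs, trimmed from the generated Json_t.
struct CustomFactor { long e; long f; double v; bool isLive; }
struct Feed { long siteVersion; CustomFactor[] customFactors; }

void main()
{
    auto json = parseJSON(`{"siteVersion": 42, "customFactors":
        [{"e": 7, "f": 921, "v": 1.85, "isLive": true}], "unused": null}`);
    Feed feed = fromJSON!Feed(json);
    writeln(feed.siteVersion);              // 42
    writeln(feed.customFactors[0].isLive);  // true
}
```

One D-specific trap: a `double` field missing from the input stays at `double.init`, which is NaN, so check it with `std.math.isNaN` if fields can be absent.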

February 28, 2017
On Tuesday, 28 February 2017 at 20:48:33 UTC, Petar Kirov [ZombineDev] wrote:
> On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
>> [...]
>
> If you really care about performance, have a look this: http://forum.dlang.org/post/20151014090114.60780ad6@marco-toshiba
>
> std.json is not tuned for performance, so don't expect good results from it.

It's a bit OT, but asdf is even faster and has a simple API:

https://github.com/tamediadigital/asdf

In terms of performance:

> Reading JSON line separated values and parsing them to ASDF - 300+ MB per second (SSD).
> Writing ASDF range to JSON line separated values - 300+ MB per second (SSD).

Another good library is std.data.json (the JSON parser extracted from vibe.d):

https://github.com/s-ludwig/std_data_json
March 02, 2017
On Tuesday, 28 February 2017 at 21:21:30 UTC, Adam D. Ruppe wrote:
> On Tuesday, 28 February 2017 at 20:27:25 UTC, Alexey H wrote:
>> [...]
>
> It doesn't actually generate one, it just returns a tagged union (a kind of dynamic type).
>
> [...]

Superb, Adam, thank you! I need to check out inspector.

std.json will be used solely to generate proper structs.
I expect to do all the heavy lifting via http://code.dlang.org/packages/jsonserialized;
since it uses vibe.d's JSON implementation, I expect it to be faster.