August 12, 2015
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
> However, my goal when implementing this has never been to make the DOM representation as efficient as possible. The simple reason is that a DOM representation is inherently inefficient when compared to operating on the structure using either the pull parser or using a deserializer that directly converts into a static D type. IMO these should be advertised instead of trying to milk a dead cow (in terms of performance).

Maybe it is better to just focus on having a top-of-the-line parser and then let competing DOM implementations build on top of it.

I'm personally only interested in structured JSON; I think most webapps use structured JSON informally.


August 12, 2015
> Anyway, I've just started to work on a generic variant of an enum based
> algebraic type that exploits as much static type information as
> possible. If that works out (compiler bugs?), it would be a great thing
> to have in Phobos, so maybe it's worth delaying the JSON module for that
> if necessary.
>

First proof of concept:
https://gist.github.com/s-ludwig/7a8a60150f510239f071#file-taggedalgebraic-d-L148

It probably still has issues with const/immutable and ref in some places, but the basics seem to work as expected.
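To illustrate the general idea (not the gist's actual API — the type and member names below are made up for illustration), an enum-based tagged algebraic type pairs a kind enum with a union and uses the static kind information for typed access:

```d
import std.stdio;

// Heavily simplified sketch of an enum-based tagged union in D.
// TaggedValue, Kind, and get() are illustrative names only.
struct TaggedValue
{
    enum Kind { integer, text }

    Kind kind;
    union
    {
        long integerValue;
        string textValue;
    }

    this(long v) { kind = Kind.integer; integerValue = v; }
    this(string v) { kind = Kind.text; textValue = v; }

    // Statically typed access; accessing the wrong kind is a runtime error.
    T get(T)() const
    {
        static if (is(T == long))
        {
            assert(kind == Kind.integer, "not an integer");
            return integerValue;
        }
        else static if (is(T == string))
        {
            assert(kind == Kind.text, "not a string");
            return textValue;
        }
        else static assert(false, "unsupported type");
    }
}

void main()
{
    auto v = TaggedValue(42);
    writeln(v.get!long); // prints 42
}
```

The real TaggedAlgebraic generalizes this pattern and forwards operators as well, which is where most of the const/immutable and ref corner cases come from.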

August 12, 2015
On Wednesday, 12 August 2015 at 07:19:05 UTC, Sönke Ludwig wrote:
> We also discussed an alternative approach similar to opt(n).foo.bar[1].baz, where n is a JSONValue and opt() creates a wrapper that enables safe navigation within the DOM, propagating any missing/mismatched fields to the final result instead of throwing. This could also be combined with a final type query: opt!string(n).foo.bar

In relation to that, you may find this thread interesting:

http://forum.dlang.org/post/lnsc0c$1sip$1@digitalmars.com
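For readers unfamiliar with the opt() idea, here is a toy sketch of how such a wrapper could work via opDispatch — the Node type, the Opt wrapper, and the get() fallback are all hypothetical, not the proposed module's API:

```d
// Toy node type standing in for a DOM value.
struct Node
{
    Node[string] fields;
    string value;
}

// Safe-navigation wrapper: a missing field anywhere along the path
// is propagated to the final result instead of throwing.
struct Opt
{
    const(Node)* node; // null means some field along the path was missing

    Opt opDispatch(string name)() const
    {
        if (node is null) return Opt(null);
        if (auto p = name in node.fields) return Opt(p);
        return Opt(null); // propagate the miss silently
    }

    string get(string fallback) const
    {
        return node is null ? fallback : node.value;
    }
}

Opt opt(ref const Node n) { return Opt(&n); }

void main()
{
    auto root = Node(["foo": Node(["bar": Node(null, "hi")], "")], "");
    assert(opt(root).foo.bar.get("?") == "hi");
    assert(opt(root).nope.deeper.get("?") == "?"); // no exception thrown
}
```

A final type query like opt!string(n).foo.bar would additionally bake the requested result type into the wrapper.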
August 12, 2015
On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
> Just to state explicitly what I mean: This strategy has the most efficient in-memory storage format and profits from all the static type checking niceties of the compiler. It also means that there is a documented schema in the code that can be used for reference by the developers and that will automatically be verified by the serializer, resulting in less and better checked code. So where applicable I claim that this is the best strategy to work with such data.
>
> For maximum efficiency, it can also be transparently combined with the pull parser. The pull parser can for example be used to jump between array entries and the serializer then reads each single array entry.

Thing is, the schema is not always known perfectly. A typical case is JSON used for configuration, with diverse versions of the software adding new configuration capabilities, or ignoring old ones.

August 12, 2015
Am 12.08.2015 um 19:10 schrieb deadalnix:
> On Wednesday, 12 August 2015 at 08:21:41 UTC, Sönke Ludwig wrote:
>> Just to state explicitly what I mean: This strategy has the most
>> efficient in-memory storage format and profits from all the static
>> type checking niceties of the compiler. It also means that there is a
>> documented schema in the code that can be used for reference by the
>> developers and that will automatically be verified by the serializer,
>> resulting in less and better checked code. So where applicable I claim
>> that this is the best strategy to work with such data.
>>
>> For maximum efficiency, it can also be transparently combined with the
>> pull parser. The pull parser can for example be used to jump between
>> array entries and the serializer then reads each single array entry.
>
> Thing is, the schema is not always known perfectly. A typical case is
> JSON used for configuration, with diverse versions of the software
> adding new configuration capabilities, or ignoring old ones.
>

For example, in the serialization framework of vibe.d you can have @optional or Nullable fields, you can choose to ignore or error out on unknown fields, and you can have fields of type "Json" or associative arrays to match arbitrary structures. This usually gives enough flexibility, assuming that the program is only interested in fields that it knows about.
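As a minimal, self-contained sketch of that tolerant behaviour (using only std.json rather than vibe.d — the Config struct and fromJson helper are made up for illustration): unknown fields are ignored, and fields missing from the input keep their defaults, much like @optional.

```d
import std.json;

// Hypothetical config type; port has a default like an @optional field.
struct Config
{
    string name;
    int port = 8080;
}

// Tolerant deserialization: pick out known fields, ignore the rest.
Config fromJson(JSONValue j)
{
    Config c;
    if (auto p = "name" in j.object) c.name = p.str;
    if (auto p = "port" in j.object) c.port = cast(int) p.integer;
    // any other fields in j are simply ignored
    return c;
}

void main()
{
    auto c = fromJson(parseJSON(`{"name": "app", "unknown": true}`));
    assert(c.name == "app");
    assert(c.port == 8080); // default kept, unknown field ignored
}
```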

Of course there are situations where you really just want to access the raw JSON structure, possibly because you are only interested in a small subset of the data. Both the DOM-based and the pull-parser-based approaches fit there, depending on convenience vs. performance considerations. But things like storing data as JSON in a database or implementing a JSON-based protocol usually fit the schema-based approach perfectly.
August 13, 2015
On 8/12/2015 10:10 AM, deadalnix wrote:
> Thing is, the schema is not always known perfectly. A typical case is JSON
> used for configuration, with diverse versions of the software adding new
> configuration capabilities, or ignoring old ones.


Hah, I'd like to replace dmd.conf with a .json file.
August 13, 2015
Am 11.08.2015 um 19:08 schrieb Atila Neves:
> On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
>> Start of the two week process, folks.
>>
>> Code: https://github.com/s-ludwig/std_data_json
>> Docs: http://s-ludwig.github.io/std_data_json/
>>
>> Atila
>
> I forgot to give warnings that the two week period was about to be up,
> and was unsure from comments if this would be ready for voting, so let's
> give it another two days unless there are objections.
>
> Atila

I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are four options for each:

1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored). But provides an out-of-the-box experience for a broad set of applications.

2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications.

3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API.

4. Use a string representation instead of BigInt: This has its own set of issues, but would also enable some special use cases [1] [2] ([2] is also solved by BigInt/Decimal support, though).
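Option 3 could look roughly like the following sketch: big-number support as a compile-time policy of the value type. All names here are hypothetical; the reviewed module may choose an entirely different mechanism.

```d
import std.bigint;

// Hedged sketch of a CT-configurable number representation.
// With withBigInt == false, BigInt never appears in the type at all,
// so neither compile time nor struct size pays for it.
struct JSONNumber(bool withBigInt)
{
    long value;
    static if (withBigInt)
        BigInt* big; // stored via pointer, so sizeof stays small
}

alias DefaultNumber = JSONNumber!true;  // out-of-the-box experience
alias SlimNumber    = JSONNumber!false; // slim, fast-to-compile variant

static assert( __traits(hasMember, DefaultNumber, "big"));
static assert(!__traits(hasMember, SlimNumber, "big"));

void main() {}
```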

I'd also like to postpone the main vote, if there are no objections, until the question of using a general enum-based alternative to Algebraic is answered. I've published an initial candidate for this now [3].

These were, AFAICS, the only major open issues (a decision for an opt() variant would be nice, but fortunately that's not a fundamental decision in any way). There is also the topic of avoiding any redundancy in symbol names, which I don't agree with, but I would of course change it if the inclusion depends on that.

[1]: https://github.com/rejectedsoftware/vibe.d/issues/431
[2]: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/10098/
[3]: http://code.dlang.org/packages/taggedalgebraic
August 13, 2015
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
> On 8/12/2015 10:10 AM, deadalnix wrote:
>> Thing is, the schema is not always known perfectly? Typical case is JSON used
>> for configuration, and diverse version of the software adding new configurations
>> capabilities, or ignoring old ones.
>
>
> Hah, I'd like to replace dmd.conf with a .json file.

Not .json!

No configuration file should be in a format that doesn't support comments.
August 14, 2015
On 8/13/2015 5:22 AM, CraigDillabaugh wrote:
> No configuration file should be in a format that doesn't support comments.

{ "comment" : "and you thought it couldn't have comments!" }
August 14, 2015
On Thursday, 13 August 2015 at 03:44:14 UTC, Walter Bright wrote:
> Hah, I'd like to replace dmd.conf with a .json file.

There's an awful lot of people out there replacing JSON with more INI-like files...