July 29, 2015
On 29.07.2015 at 07:43, Walter Bright wrote:
> On 7/28/2015 3:55 PM, Walter Bright wrote:
>>> OTOH, some people might want the option of parser-driven data processing
>>> instead (e.g. the JSON data is very large and we don't want to store the
>>> whole thing in memory at once).
>>
>> That is a good point.
>
> So it appears that JSON can be in one of 3 useful states:
>
> 1. a range of characters (rc)
> 2. a range of nodes (rn)
> 3. a container of JSON values (values)
>
> What's necessary is simply the ability to convert between these states:
>
> (names are just for illustration)
>
>     rn = rc.toNodes();
>     values = rn.toValues();
>     rn = values.toNodes();
>     rc = rn.toChars();
>
> So, if I wanted to simply pretty print a JSON string s:
>
>     s.toNodes.toChars();
>
> I.e. it's all composable.

There are actually even four levels:
1. Range of characters
2. Range of tokens
3. Range of nodes
4. DOM value

Having a special case for a range of DOM values may or may not be worthwhile as an optimization for handling big JSON arrays of values. But the pull parser is always available for that kind of data processing.

Currently most, but not all, conversions between the levels are implemented, and sometimes a level is skipped for efficiency. The question is whether it would be worth the effort and the API complexity to implement all of them.

lexJSON:         character range -> token range
parseJSONStream: character range -> node range
parseJSONStream: token range -> node range
parseJSONValue:  character range -> DOM value
parseJSONValue:  token range -> DOM value (same for toJSONValue)
writeJSON:       token range -> character range (output range)
writeJSON:       node range -> character range (output range)
writeJSON:       DOM value -> character range (output range)
(same for toJSON with string output)
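
For illustration, composing these could look like this (a sketch only; exact signatures may differ, and it assumes the package's stdx.data.json module is available):

    import stdx.data.json;

    void example()
    {
        auto src = `{"a": [1, 2, 3]}`;

        auto tokens = lexJSON(src);         // level 1 -> 2
        auto nodes  = parseJSONStream(src); // level 1 -> 3
        auto value  = toJSONValue(src);     // level 1 -> 4
        auto str    = value.toJSON();       // level 4 -> 1 (string output)
    }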

Adding an InputStream-based version of writeJSON would be an option, but the questions are how performant that would be and how to go about implementing the number -> InputRange functionality.

July 29, 2015
On 29.07.2015 at 09:46, Andrea Fontana wrote:
> On Tuesday, 28 July 2015 at 14:07:19 UTC, Atila Neves wrote:
>> Start of the two week process, folks.
>>
>> Code: https://github.com/s-ludwig/std_data_json
>> Docs: http://s-ludwig.github.io/std_data_json/
>>
>> Atila
>
> Why not do a shortcut like:
>
> jv.opt("/this/is/a/path") ?
>
> I use it in my json/bson binding.

That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).

> Anyway, opt(...).isNull returns true if that sub-object doesn't exist.
> How can I check instead whether that sub-object is actually null?
>
> Something like:  { "a" : { "b" : null} } ?

opt(...) == null
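
That is, assuming opt returns a Nullable wrapper, the two checks are distinct; a sketch of the intended semantics:

    auto v = opt(jv).a.b;
    v.isNull;   // true if the path does not exist at all
    v == null;  // true if the path exists and holds a JSON null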

>
> It would be nice to have a way to get a default if it doesn't exist.
> In my library, which behaves in a different way, I write:
>
> Object is :  { address : { number: 15 } }
>
> // as!xxx tries to get a value of that type; if it can't, it tries to
> // convert it using .to!xxx; if that fails too, it returns the default
>
> // Converted as string
> assert(obj["/address/number"].as!string == "15");
>
> // This doesn't exist
> assert(obj["/address/asdasd"].as!int == int.init);
>
> // A default value is specified
> assert(obj["/address/asdasd"].as!int(50) == 50);
>
> // A default value is specified (but value exists)
> assert(obj["/address/number"].as!int(50) == 15);
>
> // This doesn't exist
> assert(!obj["address"]["number"]["this"].exists);
>
> My library also has a get!xxx (which throws an exception if the value
> is not xxx) and a to!xxx (which throws an exception if the value can't
> be converted to xxx).

I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce.

The other possible approach, which would be more convenient to use, would be to add a "default value" overload to opt, for example: jv.opt("defval").foo.bar
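
As a minimal sketch of the first idea, under the assumption that opt returns a Nullable!Algebraic (getOr is a hypothetical helper name, not an existing API):

    import std.typecons : Nullable;
    import std.variant : Algebraic;

    // hypothetical helper: unwrap with a fallback, converting via
    // the existing Algebraic.coerce
    T getOr(T, N)(N n, T def)
    {
        return n.isNull ? def : n.get.coerce!T;
    }

    void main()
    {
        alias V = Algebraic!(long, string);
        Nullable!V missing;                 // path not found
        auto present = Nullable!V(V(15L));  // path found, holds 15
        assert(missing.getOr(50) == 50);
        assert(present.getOr(50) == 15);
    }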

>
> Other feature:
> // This field doesn't exist; return the default value
> auto tmpField = obj["/address/asdasd"].as!int(50);
> assert(tmpField.error == true);   // Value is defaulted ...
> assert(tmpField.exists == false); // ... because it doesn't exist
> assert(tmpField == 50);
>
> // This field exists, but can't be converted to int. Return default value.
> tmpField = obj["/tags/0"].as!int(50);
> assert(tmpField.error == true);   // Value is defaulted ...
> assert(tmpField.exists == true);  // ... but a field is actually here
> assert(tmpField == 50);

July 29, 2015
On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
> That would be another possibility. What do you think about the opt(jv).foo.bar[12].baz alternative? One advantage is that it could work without parsing a string and the implications thereof (error handling?).

I implemented it too, but I removed it.
Field names are often also function names or similar, and that breaks the code.
My implementation also creates a lot of temporary objects (one for each sub-object); using the string path instead, I just create the last one.

Assignments are also not easy to support with that syntax. Something like:

obj.with.a.new.field = 3;

It's difficult to implement. It's much easier to implement:

obj["/field/doesnt/exists"] = 3

It's also much easier to write formatted-string paths.
And it allows a future implementation of something like an xpath/jquery style.

If your JSON contains keys with "/" inside, you can still use the plain old syntax...

String parsing is quite easy (at compile time, too), of course. If a part of the path doesn't exist, it works just as if a part of opt("a", "b", "c") didn't. It's just syntactic sugar. :)
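
For illustration, splitting such a path into keys is trivial in D, at compile time as well (a sketch; the real lookup would then walk the keys just like opt("address", "number")):

    import std.algorithm : splitter;
    import std.array : array;

    void main()
    {
        // drop the leading '/' and split on the remaining separators
        auto keys = "/address/number"[1 .. $].splitter('/').array;
        assert(keys == ["address", "number"]);
    }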

>> Anyway, opt(...).isNull returns true if that sub-object doesn't exist.
>> How can I check instead whether that sub-object is actually null?
>>
>> Something like:  { "a" : { "b" : null} } ?
>
> opt(...) == null

Does that work? Anyway, it seems ambiguous:
opt(...) == null   => false
opt(...).isNull    => true

>>
>> It would be nice to have a way to get a default if it doesn't exist.
>> In my library, which behaves in a different way, I write:
>>
>> Object is :  { address : { number: 15 } }
>>
>> // as!xxx tries to get a value of that type; if it can't, it tries to
>> // convert it using .to!xxx; if that fails too, it returns the default
>>
>> // Converted as string
>> assert(obj["/address/number"].as!string == "15");
>>
>> // This doesn't exist
>> assert(obj["/address/asdasd"].as!int == int.init);
>>
>> // A default value is specified
>> assert(obj["/address/asdasd"].as!int(50) == 50);
>>
>> // A default value is specified (but value exists)
>> assert(obj["/address/number"].as!int(50) == 15);
>>
>> // This doesn't exist
>> assert(!obj["address"]["number"]["this"].exists);
>>
>> My library also has a get!xxx (which throws an exception if the value
>> is not xxx) and a to!xxx (which throws an exception if the value can't
>> be converted to xxx).
>
> I try to build this from existing building blocks in Phobos, so opt basically returns a Nullable!Algebraic. I guess some of it could simply be implemented in Algebraic, for example by adding an overload of .get that takes a default value. Instead of .to, you already have .coerce.
>
> The other possible approach, which would be more convenient to use, would be to add a "default value" overload to opt, for example: jv.opt("defval").foo.bar

Wouldn't jv.opt("defval") look up the value at the key "defval" rather than set a default value?

July 29, 2015
On 2015-07-29 06:57, Walter Bright wrote:

> A JSON value is a tagged union of the various types.

But in most cases I think there will be a single root node, of type object. In that case it would be a range with only one element? How does that help?

-- 
/Jacob Carlborg
July 29, 2015
On 29.07.2015 at 12:10, Jacob Carlborg wrote:
> On 2015-07-29 06:57, Walter Bright wrote:
>
>> A JSON value is a tagged union of the various types.
>
> But in most cases I think there will be a single root node, of type object.
> In that case it would be a range with only one element? How does that help?
>

I think a better approach than adding such a special case is to add a readValue function that takes a range of parser nodes and reads them into a single JSONValue. That way one can use the pull parser to jump between array or object entries and then extract individual values, or maybe even use nodes.map!readValue to get a range of values...
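
A toy sketch of that shape; the real readValue would handle nested objects and arrays and return a JSONValue, this just shows the consume-one-value-from-the-front pattern:

    import std.range : empty, front, popFront;

    // toy node type; real parser nodes also carry object/array
    // begin/end markers, strings, etc.
    struct Node { int number; }

    // consume exactly one complete value from the front of the range
    int readValue(R)(ref R nodes)
    {
        auto n = nodes.front;
        nodes.popFront();
        return n.number;
    }

    void main()
    {
        auto nodes = [Node(1), Node(2), Node(3)];
        int[] values;
        while (!nodes.empty)
            values ~= readValue(nodes);
        assert(values == [1, 2, 3]);
    }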
July 29, 2015
On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
> On 7/28/2015 7:07 AM, Atila Neves wrote:
>> Start of the two week process, folks.
>
> Thank you very much, Sönke, for taking this on. Thank you, Atila, for taking on the thankless job of being review manager.
>
> Just looking at the documentation only, some general notes:
>
> 1. Not sure that 'JSON' needs to be embedded in the public names. 'parseJSONStream' should just be 'parseStream', etc. Name disambiguation, if needed, should be ably taken care of by a number of D features for that purpose. Additionally, I presume that the stdx.data package implies a number of different formats. These formats should all use the same names with as similar as possible APIs - this won't work too well if JSON is embedded in the APIs.
>
> 2. JSON is a trivial format, http://json.org/. But I count 6 files and 30 names in the public API.
>
> 3. Stepping back a bit, when I think of parsing JSON data, I think:
>
>     auto ast = inputrange.toJSON();
>
> where toJSON() accepts an input range and produces a container, the ast. The ast is just a JSON value. Then, I can just query the ast to see what kind of value it is (using overloading), and walk it as necessary. To create output:
>
>     auto r = ast.toChars();  // r is an InputRange of characters
>     writeln(r);
>
> So, we'll need:
>     toJSON
>     toChars
>     JSONException
>
> The possible JSON values are:
>     string
>     number
>     object (associative arrays)
>     array
>     true
>     false
>     null
>
> Since these are D builtin types, they can actually be a simple union of D builtin types.

Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast Fourier transforms and all kinds of odd stuff; that's really bizarre to pull in if you're just parsing a trivial little JSON config file).

Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in.

And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case.

It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.


July 29, 2015
On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
> On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
[...]
> >The possible JSON values are:
> >    string
> >    number
> >    object (associative arrays)
> >    array
> >    true
> >    false
> >    null
> >
> >Since these are D builtin types, they can actually be a simple union of D builtin types.
> 
> Related to this: it should not be importing std.bigint. Note that if std.bigint were fully implemented, it would be very heavyweight (optimal multiplication of enormous integers involves fast fourier transforms and all kinds of odd stuff, that's really bizarre to pull in if you're just parsing a trivial little JSON config file).
> 
> Although it is possible for JSON to contain numbers which are larger than can fit into long or ulong, it's an abnormal case. Many apps (probably, almost all) will want to reject such numbers immediately. BigInt should be opt-in.
> 
> And, it is also possible to have floating point numbers that are not representable in double or real. BigInt doesn't solve that case.
> 
> It might be adequate to simply present it as a raw number (an unconverted string) if it isn't a built-in type. Parse it for validity, but don't actually convert it.
[...]

Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to?  The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.)
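
A sketch of that conversion step, with the target type as a template parameter (numberAs is a hypothetical name, not a proposed API):

    import std.bigint : BigInt;  // only needed because main below uses BigInt
    import std.conv : to;

    // convert a raw digit string on demand; in a real library, only
    // callers that ask for BigInt would need the std.bigint import
    T numberAs(T)(string digits)
    {
        static if (is(T == struct))
            return T(digits);    // e.g. BigInt's ctor accepts a digit string
        else
            return digits.to!T;  // int, size_t, double, ...
    }

    void main()
    {
        assert(numberAs!int("42") == 42);
        assert(numberAs!double("3.5") == 3.5);
        assert(numberAs!BigInt("123456789012345678901234567890")
               == BigInt("123456789012345678901234567890"));
    }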


T

-- 
Be in denial for long enough, and one day you'll deny yourself of things you wish you hadn't.
July 29, 2015
> Here's a thought: what about always storing JSON numbers as strings (albeit tagged with the "number" type, to differentiate them from actual strings in the input), and the user specifies what type to convert it to?  The default type can be something handy, like int, but the user has the option to ask for size_t, or double, or even BigInt if they want (IIRC, the BigInt ctor can initialize an instance from a digit string, so if we adopt the convention that non-built-in number-like types can be initialized from digit strings, then std.json can simply take a template parameter for the output type, and hand it the digit string. This way, we can get rid of the std.bigint dependency, except where the user actually wants to use BigInt.)

Some JSON files can be quite large...

For example, I have 175 gigs of compressed Reddit comments (one file per month) that I would like to work with using D, and time + memory demands = money.

Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?
July 29, 2015
On Wednesday, 29 July 2015 at 17:04:33 UTC, Laeeth Isharc wrote:
>> [...]
>
> Some JSON files can be quite large...
>
> For example, I have a compressed 175 Gig of Reddit comments (one file per month) I would like to work with using D, and time + memory demands = money.
>
> Wouldn't it be a pain not to store numbers directly when parsing in those cases (if I understood you correctly)?

I think in your case it wouldn't matter. Comments are mostly text; there are probably just one or two fields with "number" type.
July 29, 2015
On 29.07.2015 at 17:22, Don wrote:
>
> Related to this: it should not be importing std.bigint. Note that if
> std.bigint were fully implemented, it would be very heavyweight (optimal
> multiplication of enormous integers involves fast fourier transforms and
> all kinds of odd stuff, that's really bizarre to pull in if you're just
> parsing a trivial little JSON config file).
>
> Although it is possible for JSON to contain numbers which are larger
> than can fit into long or ulong, it's an abnormal case. Many apps
> (probably, almost all) will want to reject such numbers immediately.
> BigInt should be opt-in.

BigInt is opt-in, at least as far as the lexer goes. But why would such a number be rejected? Any of the usual floating point parsers would simply parse the number and just lose precision if it can't be represented exactly. And after all, it's still valid JSON.

But note that I only added this due to multiple requests; it doesn't seem to be that uncommon. We *could* in theory make the JSONNumber type a template and make the bigint fields optional. That would be the only thing missing to make the import optional, too.
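
A sketch of what that could look like (withBigInt is a hypothetical parameter name):

    // opt-in BigInt storage: std.bigint is only pulled in when the
    // template is instantiated with withBigInt = true
    struct JSONNumber(bool withBigInt = false)
    {
        double value;
        static if (withBigInt)
        {
            import std.bigint : BigInt;
            BigInt bigValue;
        }
    }

    void main()
    {
        JSONNumber!false small;  // no std.bigint dependency
        JSONNumber!true  big;    // opts into the BigInt field
        small.value = 1.5;
        big.bigValue = 10;
    }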

>
> And, it is also possible to have floating point numbers that are not
> representable in double or real. BigInt doesn't solve that case.
>
> It might be adequate to simply present it as a raw number (an
> unconverted string) if it isn't a built-in type. Parse it for validity,
> but don't actually convert it.

If we had a Decimal type in Phobos, I would have integrated that, too. The string representation may be an alternative, but since the weight of the import is the main argument, I'd rather choose the more comfortable/logical option - or rather try to keep std.bigint from being such a heavy import (e.g. via local imports to defer secondary imports).
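
For reference, the deferred-import idiom could look like this (toBigInt and rawNumber are hypothetical names; making the method a zero-parameter template is what delays the import until first use):

    struct JSONValue
    {
        string rawNumber;  // hypothetical: the unconverted digit string

        // a zero-parameter template: std.bigint is only imported once
        // toBigInt is actually instantiated somewhere
        auto toBigInt()() const
        {
            import std.bigint : BigInt;
            return BigInt(rawNumber);
        }
    }

    void main()
    {
        auto v = JSONValue("123456789012345678901234567890");
        auto b = v.toBigInt();  // first use; only now does std.bigint matter
        assert(b > 0);
    }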