July 29, 2015
On 29.07.2015 at 18:47, H. S. Teoh via Digitalmars-d wrote:
> On Wed, Jul 29, 2015 at 03:22:05PM +0000, Don via Digitalmars-d wrote:
>> On Tuesday, 28 July 2015 at 22:29:01 UTC, Walter Bright wrote:
> [...]
>>> The possible JSON values are:
>>>     string
>>>     number
>>>     object (associative arrays)
>>>     array
>>>     true
>>>     false
>>>     null
>>>
>>> Since these are D builtin types, they can actually be a simple union
>>> of D builtin types.
>>
>> Related to this: it should not be importing std.bigint. Note that if
>> std.bigint were fully implemented, it would be very heavyweight
>> (optimal multiplication of enormous integers involves fast fourier
>> transforms and all kinds of odd stuff, that's really bizarre to pull
>> in if you're just parsing a trivial little JSON config file).
>>
>> Although it is possible for JSON to contain numbers which are larger
>> than can fit into long or ulong, it's an abnormal case. Many apps
>> (probably, almost all) will want to reject such numbers immediately.
>> BigInt should be opt-in.
>>
>> And, it is also possible to have floating point numbers that are not
>> representable in double or real. BigInt doesn't solve that case.
>>
>> It might be adequate to simply present it as a raw number (an
>> unconverted string) if it isn't a built-in type. Parse it for
>> validity, but don't actually convert it.
> [...]
>
> Here's a thought: what about always storing JSON numbers as strings
> (albeit tagged with the "number" type, to differentiate them from actual
> strings in the input), and the user specifies what type to convert it
> to?  The default type can be something handy, like int, but the user has
> the option to ask for size_t, or double, or even BigInt if they want
> (IIRC, the BigInt ctor can initialize an instance from a digit string,
> so if we adopt the convention that non-built-in number-like types can be
> initialized from digit strings, then std.json can simply take a template
> parameter for the output type, and hand it the digit string. This way,
> we can get rid of the std.bigint dependency, except where the user
> actually wants to use BigInt.)
>
>
> T

That means a performance hit, because the string has to be parsed twice - once for validation and once for conversion. It also means that for non-string inputs the lexer has to allocate for each number, and since it doesn't know the length of the number in advance, it can't even preallocate efficiently.
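For reference, the raw-string scheme under discussion could be sketched like this in D (`RawNumber` and `get` are hypothetical names, not part of any actual API): the lexer stores the validated digit string and the user picks the conversion target, including BigInt, only on demand.

```d
import std.bigint : BigInt;
import std.conv : to;

/// Hypothetical wrapper: the lexer keeps the validated digit
/// string; conversion to a concrete type happens only on demand.
struct RawNumber
{
    string repr; // assumed already validated by the lexer

    T get(T)() const
    {
        static if (is(T == BigInt))
            return BigInt(repr); // BigInt's ctor accepts a digit string
        else
            return repr.to!T;    // int, long, double, ...
    }
}

unittest
{
    assert(RawNumber("42").get!int == 42);
    // A value too large for long still round-trips via BigInt:
    auto n = RawNumber("12345678901234567890");
    assert(n.get!BigInt == BigInt("12345678901234567890"));
}
```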

July 29, 2015
Hi Sonke,

Great to see your module moving towards Phobos inclusion (sadly, I have not been following the latest progress of D :()! Just a small remark about the documentation example.

Maybe it would be better to replace:

    value.toJSONString!true()

by

    value.toJSONString!prettify()

using a well-named enum instead of a boolean, which could seem obscure. I know the Eigen C++ library uses a similar approach for static vs. dynamic matrices.
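Phobos already has an idiom for self-documenting booleans, std.typecons.Flag, which gives a named true/false without declaring a new enum; a sketch with a hypothetical serializer signature:

```d
import std.conv : to;
import std.typecons : Flag, No, Yes;

// Hypothetical signature: the named Flag replaces a bare `bool`
// template parameter, so call sites read `Yes.prettify`.
string toJSONString(Flag!"prettify" prettify = No.prettify)(int value)
{
    return prettify ? value.to!string ~ "\n" : value.to!string;
}

unittest
{
    assert(toJSONString(42) == "42");
    assert(toJSONString!(Yes.prettify)(42) == "42\n");
}
```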

Thanks for the read. Regards,

matovitch
July 29, 2015
On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
> On 2015-07-29 06:57, Walter Bright wrote:
>
>> A JSON value is a tagged union of the various types.
>
> But in most cases I think there will be one root node, of type object.

An object is a collection of other Values.
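Such a tagged union could be sketched today with std.sumtype (in the Phobos of 2015 this would have been std.variant.Algebraic); the This placeholder handles the recursion for arrays and objects:

```d
import std.sumtype : SumType, This, match;

struct Null {}

/// A JSON value is one of the seven JSON kinds; arrays and objects
/// recursively contain other values via the This placeholder.
alias JSONValue = SumType!(Null, bool, double, string,
                           This[], This[string]);

unittest
{
    auto v = JSONValue(3.14);
    // The specific handler runs for double; the catch-all covers the rest.
    assert(v.match!((double d) => d == 3.14, _ => false));
}
```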


> In that case it would be range with only one element? How does that help?

I don't understand the question.
July 29, 2015
On 7/28/2015 10:49 PM, H. S. Teoh via Digitalmars-d wrote:
> How does a linear range of nodes convey a nested structure?

You'd need to add a special node type, 'end'. So an array [1,true] would look like:

    array number true end
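A rough sketch of such a flat node stream, with hypothetical node kinds; the nesting structure can be reconstructed purely from the object/array start markers and 'end':

```d
/// Hypothetical node kinds for a flat stream representation.
enum NodeType { object, array, string_, number, true_, false_, null_, end }

/// Reconstructs the maximum nesting depth from the flat stream.
size_t maxDepth(const NodeType[] stream)
{
    size_t depth, deepest;
    foreach (n; stream)
    {
        if (n == NodeType.object || n == NodeType.array)
        {
            ++depth;
            if (depth > deepest) deepest = depth;
        }
        else if (n == NodeType.end)
            --depth;
    }
    return deepest;
}

unittest
{
    // [1, true]  ->  array number true end
    assert(maxDepth([NodeType.array, NodeType.number,
                     NodeType.true_, NodeType.end]) == 1);
}
```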


July 29, 2015
On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
> There are actually even four levels:
> 1. Range of characters
> 2. Range of tokens
> 3. Range of nodes
> 4. DOM value

What's the need for users to see a token stream? I don't know what the DOM value is - is that just JSON as an ast?


> Having a special case for range of DOM values may or may not be a worthwhile
> thing to optimize for handling big JSON arrays of values.

I see no point for that.


> Currently not all, but most, conversions between the levels are implemented, and
> sometimes a level is skipped for efficiency. The question is if it would be
> worth the effort and the API complexity to implement all of them.
>
> lexJSON: character range -> token range
> parseJSONStream: character range -> node range
> parseJSONStream: token range -> node range
> parseJSONValue: character range -> DOM value
> parseJSONValue: token range -> DOM value (same for toJSONValue)
> writeJSON: token range -> character range (output range)
> writeJSON: node range -> character range (output range)
> writeJSON: DOM value -> character range (output range)
> writeJSON: to -> character range (output range)
> (same for toJSON with string output)

I don't see why there are more than the 3 I mentioned.

July 29, 2015
On 2015-07-29 20:33, Walter Bright wrote:

> On 7/29/2015 3:10 AM, Jacob Carlborg wrote:
>> But in most cases I think there will be one root node, of type object.
>
> An object is a collection of other Values.
>
>
>  > In that case it would be range with only one element? How does that
> help?
>
> I don't understand the question.

I guess I'm finding it difficult to picture a JSON structure as a range. How would the following JSON be returned as a range?

{
  "a": 1,
  "b": [2, 3],
  "c": { "d": 4 }
}

-- 
/Jacob Carlborg
July 29, 2015
On 29.07.2015 at 20:21, matovitch wrote:
> Hi Sonke,
>
> Great to see your module moving towards phobos inclusion (I have not
> been following the latest progress of D sadly :() ! Just a small remark
> from the documentation example.
>
> Maybe it would be better to replace :
>
>      value.toJSONString!true()
>
> by
>
>      value.toJSONString!prettify()
>
> using a well-named enum instead of a boolean, which could seem obscure.
> I know the Eigen C++ library uses a similar approach for static vs. dynamic matrices.
>
> Thanks for the read. Regards,
>
> matovitch

Hm, that example is outdated, I'll fix it ASAP. Currently it uses toJSON and a separate toPrettyJSON function. An obvious alternative would be to add an entry GeneratorOptions.prettify, because toJSON already takes that as a template argument: toJSON!(GeneratorOptions.prettify)
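Such an options enum would presumably be a set of bit flags, so several options can be combined in a single template argument; a sketch with hypothetical members and serializer:

```d
/// Hypothetical set of combinable generator flags.
enum GeneratorOptions
{
    none          = 0,
    prettify      = 1 << 0,
    escapeUnicode = 1 << 1,
}

/// Sketch of a serializer selecting its behavior at compile time.
string toJSON(GeneratorOptions options = GeneratorOptions.none)(bool value)
{
    static if (options & GeneratorOptions.prettify)
        return value ? "true\n" : "false\n"; // stand-in for real indentation
    else
        return value ? "true" : "false";
}

unittest
{
    assert(toJSON(true) == "true");
    assert(toJSON!(GeneratorOptions.prettify)(true) == "true\n");
}
```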
July 29, 2015
On 29.07.2015 at 20:44, Walter Bright wrote:
> On 7/29/2015 1:37 AM, Sönke Ludwig wrote:
>> There are actually even four levels:
>> 1. Range of characters
>> 2. Range of tokens
>> 3. Range of nodes
>> 4. DOM value
>
> What's the need for users to see a token stream? I don't know what the
> DOM value is - is that just JSON as an ast?

Yes.

>> Having a special case for range of DOM values may or may not be a
>> worthwhile
>> thing to optimize for handling big JSON arrays of values.
>
> I see no point for that.

Hm, I misread "container of JSON values" as "range of JSON values". I guess you just meant JSONValue, so my comment doesn't apply.

>> Currently not all, but most, conversions between the levels are
>> implemented, and
>> sometimes a level is skipped for efficiency. The question is if it
>> would be
>> worth the effort and the API complexity to implement all of them.
>>
>> lexJSON: character range -> token range
>> parseJSONStream: character range -> node range
>> parseJSONStream: token range -> node range
>> parseJSONValue: character range -> DOM value
>> parseJSONValue: token range -> DOM value (same for toJSONValue)
>> writeJSON: token range -> character range (output range)
>> writeJSON: node range -> character range (output range)
>> writeJSON: DOM value -> character range (output range)
>> writeJSON: to -> character range (output range)
>> (same for toJSON with string output)
>
> I don't see why there are more than the 3 I mentioned.

The token level is useful for reasoning about the text representation. It could be used for example to implement syntax highlighting, or for using the location information to mark errors in the source code.
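A token carrying location information might look roughly like this (the names are illustrative, not the module's actual API):

```d
/// Hypothetical JSON token kinds.
enum TokenKind { objectStart, objectEnd, arrayStart, arrayEnd,
                 colon, comma, string_, number, true_, false_, null_ }

struct Location { size_t line, column; }

/// A token remembers its exact place in the source text, which is
/// what syntax highlighting and precise error reporting need.
struct Token
{
    TokenKind kind;
    string text;  // slice of the original input
    Location loc; // where the token starts
}

unittest
{
    auto t = Token(TokenKind.number, "42", Location(3, 7));
    assert(t.kind == TokenKind.number && t.loc.line == 3);
}
```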

July 29, 2015
On 7/29/2015 11:51 AM, Jacob Carlborg wrote:
> I guess I'm finding it difficult to picture a JSON structure as a range. How
> would the following JSON be returned as a range?
>
> {
>    "a": 1,
>    "b": [2, 3],
>    "c": { "d": 4 }
> }


If it was returned as a range of nodes, it would be:

   object, string, number, string, array, number, number, end, string, object, string, number, end, end

If it was returned as a Value, then you could ask the value to return a range of nodes.

A container is not a range, although it may offer a way to get a range that iterates over its contents.
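D's built-in associative arrays already work this way: the container itself is not a range, but .byKey/.byValue/.byKeyValue hand out lazy ranges over it:

```d
unittest
{
    import std.algorithm.iteration : sum;

    int[string] aa = ["a": 1, "b": 2];
    // aa itself is not an input range; aa.byValue is.
    assert(aa.byValue.sum == 3);
}
```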

July 30, 2015
On 2015-07-29 at 00:37, H. S. Teoh via Digitalmars-d wrote:
> On Tue, Jul 28, 2015 at 03:29:02PM -0700, Walter Bright via Digitalmars-d wrote:
> [...]
>> 3. Stepping back a bit, when I think of parsing JSON data, I think:
>>
>>      auto ast = inputrange.toJSON();
>>
>> where toJSON() accepts an input range and produces a container, the
>> ast. The ast is just a JSON value. Then, I can just query the ast to
>> see what kind of value it is (using overloading), and walk it as
>> necessary.
>
> +1. The API should be as simple as possible.
>
> Ideally, I'd say hook it up to std.conv.to for maximum flexibility. Then
> you can just use to() to convert between a JSON container and the value
> that it represents (assuming the types are compatible).
>
> OTOH, some people might want the option of parser-driven data processing
> instead (e.g. the JSON data is very large and we don't want to store the
> whole thing in memory at once). I'm not sure what a good API for that
> would be, though.

Here's my range-based parser; it can parse a 1 TB JSON file without a single allocation. It needs heavy polishing, but I didn't have the time/need to do it. Basically a WIP, but maybe someone will find it useful.

https://github.com/pszturmaj/json-streaming-parser