August 22, 2014
Some thoughts about the API:

1) Instead of `parseJSONValue` and `lexJSON`, how about static methods `JSON.parse` and `JSON.lex`, or even a module level functions `std.data.json.parse` etc.? The "JSON" part of the name is redundant.

2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to have different names. They can be distinguished by their parameter types.

3) `toJSONString` shouldn't just take a boolean as flag for pretty-printing. It should either use something like `Pretty.YES`, or the function should be called `toPrettyJSONString` (I believe I have seen this latter convention elsewhere).
We should also think about whether we can just call the functions `toString` and `toPrettyString`. Alternatively, `toJSON` and `toPrettyJSON` should be considered.
August 22, 2014
Am 22.08.2014 16:53, schrieb Ary Borenszweig:
> On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
>> Without a serialization framework it would in theory work like this:
>>
>>      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>>      auto p = new Person(v["name"].get!string, v["age"].get!int);
>>
>> unfortunately the operator overloading doesn't work like this currently,
>> so this is needed:
>>
>>      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>>      auto p = new Person(
>>          v.get!(Json[string])["name"].get!string,
>>          v.get!(Json[string])["age"].get!int);
>
> But does this parse the whole json into JSONValue? I want to create a
> Person without creating an intermediate JSONValue for the whole json.
> Can this be done?

That would be done by the serialization framework. Instead of using parseJSON(), it could use parseJSONStream() to populate the Person instance on the fly, without putting the whole JSON into memory. But I'd like to leave that for a later addition, because we'd otherwise end up with duplicate functionality once std.serialization gets finalized.

Manually it would work similar to this:

auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
with (JSONParserNode.Kind) {
	enforce(nodes.front == objectStart);
	nodes.popFront();
	while (nodes.front != objectEnd) {
		auto key = nodes.front.key;
		nodes.popFront();
		if (key == "name")
			person.name = nodes.front.literal.string;
		else if (key == "age")
			person.age = nodes.front.literal.number;
	}
}
August 22, 2014
It would be nice to have integers treated separately to doubles. I know it makes the number parsing simpler to just treat everything as double, but still, it could be annoying when you expect an integer type.

I'd also like to see some benchmarks, particularly against some of the high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even some of the "not bad" performance parsers with better APIs, i.e. QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable interface to this proposal?).
August 22, 2014
Am 22.08.2014 18:15, schrieb "Marc Schütz" <schuetzm@gmx.net>":
> Some thoughts about the API:
>
> 1) Instead of `parseJSONValue` and `lexJSON`, how about static methods
> `JSON.parse` and `JSON.lex`, or even a module level functions
> `std.data.json.parse` etc.? The "JSON" part of the name is redundant.

For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so that that typing that additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex" it's more to type than just "lexJSON".

But for "[JSON]Value" it gets ugly really quick, because "Value"s are such a common thing and quickly occur in multiple kinds in the same source file.

>
> 2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to
> have different names. They can be distinguished by their parameter types.

Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".

>
> 3) `toJSONString` shouldn't just take a boolean as flag for
> pretty-printing. It should either use something like `Pretty.YES`, or
> the function should be called `toPrettyJSONString` (I believe I have
> seen this latter convention elsewhere).
> We should also think about whether we can just call the functions
> `toString` and `toPrettyString`. Alternatively, `toJSON` and
> `toPrettyJSON` should be considered.

Agreed, a boolean isn't good for a public interface, renaming the current writeAsString to private writeAsStringImpl and then adding "(writeAs/to)[Pretty]String" sounds reasonable. Actually I've done it that way for vibe.data.json.
August 22, 2014
Am 22.08.2014 18:31, schrieb Christian Manning:
> It would be nice to have integers treated separately to doubles. I know
> it makes the number parsing simpler to just treat everything as double,
> but still, it could be annoying when you expect an integer type.

That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?

>
> I'd also like to see some benchmarks, particularly against some of the
> high performance C++ parsers, i.e. rapidjson, gason, sajson. Or even
> some of the "not bad" performance parsers with better APIs, i.e.
> QJsonDocument, jsoncpp and jsoncons (slow but perhaps comparable
> interface to this proposal?).

That would indeed be nice to have, but I'm not sure if I can manage to squeeze that in besides finishing the module itself. My time frame for working on this is quite limited.
August 22, 2014
On 8/22/14, 1:24 PM, Sönke Ludwig wrote:
> Am 22.08.2014 16:53, schrieb Ary Borenszweig:
>> On 8/22/14, 3:33 AM, Sönke Ludwig wrote:
>>> Without a serialization framework it would in theory work like this:
>>>
>>>      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>>>      auto p = new Person(v["name"].get!string, v["age"].get!int);
>>>
>>> unfortunately the operator overloading doesn't work like this currently,
>>> so this is needed:
>>>
>>>      JSONValue v = parseJSON(`{"age": 10, "name": "John"}`);
>>>      auto p = new Person(
>>>          v.get!(Json[string])["name"].get!string,
>>>          v.get!(Json[string])["age"].get!int);
>>
>> But does this parse the whole json into JSONValue? I want to create a
>> Person without creating an intermediate JSONValue for the whole json.
>> Can this be done?
>
> That would be done by the serialization framework. Instead of using
> parseJSON(), it could use parseJSONStream() to populate the Person
> instance on the fly, without putting the whole JSON into memory. But I'd
> like to leave that for a later addition, because we'd otherwise end up
> with duplicate functionality once std.serialization gets finalized.
>
> Manually it would work similar to this:
>
> auto nodes = parseJSONStream(`{"age": 10, "name": "John"}`);
> with (JSONParserNode.Kind) {
>      enforce(nodes.front == objectStart);
>      nodes.popFront();
>      while (nodes.front != objectEnd) {
>          auto key = nodes.front.key;
>          nodes.popFront();
>          if (key == "name")
>              person.name = nodes.front.literal.string;
>          else if (key == "age")
>              person.age = nodes.front.literal.number;
>      }
> }

Cool, that looks good :-)
August 22, 2014
On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:
> Am 22.08.2014 18:15, schrieb "Marc Schütz" <schuetzm@gmx.net>":
>> Some thoughts about the API:
>>
>> 1) Instead of `parseJSONValue` and `lexJSON`, how about static methods
>> `JSON.parse` and `JSON.lex`, or even a module level functions
>> `std.data.json.parse` etc.? The "JSON" part of the name is redundant.
>
> For those functions it may be acceptable, although I really dislike that style, because it makes the code harder to read (what exactly does this parse?) and the functions are rarely used, so that that typing that additional "JSON" should be no issue at all. On the other hand, if you always type "JSON.lex" it's more to type than just "lexJSON".

I'm not really concerned about the amount of typing, it just seemed a bit odd to have the redundant JSON in there, as we have module names for namespacing. Your argument about readability is true nevertheless. But...

>
> But for "[JSON]Value" it gets ugly really quick, because "Value"s are such a common thing and quickly occur in multiple kinds in the same source file.
>
>>
>> 2) Also, `parseJSONValue` and `parseJSONStream` probably don't need to
>> have different names. They can be distinguished by their parameter types.
>
> Actually they take exactly the same parameters and just differ in their return value. It would be more descriptive to name them parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or parseJSONToValue? The current naming is somewhat modeled after std.conv's "to!T" and "parse!T".

... why not use exactly the same convention then? => `parse!JSONValue`

Would be nice to have a "pluggable" API where you just need to specify the type in a factory method to choose the input format. Then there could be `parse!BSON`, `parse!YAML`, with the same style as `parse!(int[])`.

I know this sound a bit like bike-shedding, but the API shouldn't stand by itself, but fit into the "big picture", especially as there will probably be other parsers (you already named the module std._data_.json).
August 22, 2014
On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
> Am 22.08.2014 18:31, schrieb Christian Manning:
>> It would be nice to have integers treated separately to doubles. I know
>> it makes the number parsing simpler to just treat everything as double,
>> but still, it could be annoying when you expect an integer type.
>
> That's how I've done it for vibe.data.json, too. For the new implementation, I've just used the number parsing routine from Andrei's std.jgrandson module. Does anybody have reservations about representing integers as "long" instead?

It should automatically fall back to double on overflow. Maybe even use BigInt if applicable?
August 22, 2014
Am 22.08.2014 19:24, schrieb "Marc Schütz" <schuetzm@gmx.net>":
> On Friday, 22 August 2014 at 16:48:44 UTC, Sönke Ludwig wrote:
>>
>> Actually they take exactly the same parameters and just differ in
>> their return value. It would be more descriptive to name them
>> parseAsJSONValue and parseAsJSONStream - or maybe parseJSONAsValue or
>> parseJSONToValue? The current naming is somewhat modeled after
>> std.conv's "to!T" and "parse!T".
>
> ... why not use exactly the same convention then? => `parse!JSONValue`
>
> Would be nice to have a "pluggable" API where you just need to specify
> the type in a factory method to choose the input format. Then there
> could be `parse!BSON`, `parse!YAML`, with the same style as
> `parse!(int[])`.
>
> I know this sound a bit like bike-shedding, but the API shouldn't stand
> by itself, but fit into the "big picture", especially as there will
> probably be other parsers (you already named the module std._data_.json).

That would be nice, but then it should also work together with std.conv, which basically is exactly this pluggable API. Just like this it would result in an ambiguity error if both std.data.json and std.conv are imported at the same time.

Is there a way to make std.conv work properly with JSONValue? I guess the only theoretical way would be to put something in JSONValue, but that would result in a slightly ugly cyclic dependency between parser.d and value.d.
August 22, 2014
Am 22.08.2014 19:27, schrieb "Marc Schütz" <schuetzm@gmx.net>":
> On Friday, 22 August 2014 at 16:56:26 UTC, Sönke Ludwig wrote:
>> Am 22.08.2014 18:31, schrieb Christian Manning:
>>> It would be nice to have integers treated separately to doubles. I know
>>> it makes the number parsing simpler to just treat everything as double,
>>> but still, it could be annoying when you expect an integer type.
>>
>> That's how I've done it for vibe.data.json, too. For the new
>> implementation, I've just used the number parsing routine from
>> Andrei's std.jgrandson module. Does anybody have reservations about
>> representing integers as "long" instead?
>
> It should automatically fall back to double on overflow. Maybe even use
> BigInt if applicable?

I guess BigInt + exponent would be the only lossless way to represent any JSON number. That could then be converted to any desired smaller type as required.

But checking for overflow during number parsing would definitely have an impact on parsing speed, as well as using a BigInt of course, so the question is how we want set up the trade off here (or if there is another way that is overhead-free).