July 30, 2015
On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
> The token level is useful for reasoning about the text representation. It could
> be used for example to implement syntax highlighting, or for using the location
> information to mark errors in the source code.

Ok, I see your point. The question then becomes: does the node stream really add enough value to justify its existence, given how much it overlaps the token stream?

July 30, 2015
On 7/29/2015 1:18 AM, Sönke Ludwig wrote:
> On 29.07.2015 at 00:29, Walter Bright wrote:
>> 1. Not sure that 'JSON' needs to be embedded in the public names.
>> 'parseJSONStream' should just be 'parseStream', etc. Name
>> disambiguation, if needed, should be ably taken care of by a number of D
>> features for that purpose. Additionally, I presume that the stdx.data
>> package implies a number of different formats. These formats should all
>> use the same names with as similar as possible APIs - this won't work
>> too well if JSON is embedded in the APIs.
>
> This is actually one of my pet peeves. Having a *readable* API that tells the
> reader immediately what happens is IMO one of the most important aspects (far
> more important than an API that allows quick typing). A number of times I've
> seen D code that omits part of what it actually does in its name and the result
> was that it was constantly necessary to scroll up to see where a particular name
> might come from. So I have a strong preference to keep "JSON", because it's an
> integral part of the semantics.

I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.



>> 2. JSON is a trivial format, http://json.org/. But I count 6 files and
>> 30 names in the public API.
>
> The whole thing provides a stream parser with high level helpers to make it
> convenient to use, a DOM module, a separate lexer and a generator module that
> operates in various modes (maybe two additional modes still to come!).
> Every single function provides real and frequently useful benefits. So if
> anything, there are still some little things missing.

I understand there is a purpose to each of those things, but there's also considerable value in a simpler API.


> All in all, even if JSON may be a simple format, the source code is already
> almost 5k LOC (includes unit tests of course).

I don't count unit tests as LOC :-)

> But apart from maintainability
> they have mainly been separated to minimize the amount of code that needs to be
> dragged in for a particular functionality (not only other JSON modules, but also
> from different parts of Phobos).

They are so strongly related I don't see this as a big issue. Also, if they are templates, they don't get compiled in if not used.


>> 3. Stepping back a bit, when I think of parsing JSON data, I think:
>>
>>      auto ast = inputrange.toJSON();
>>
>> where toJSON() accepts an input range and produces a container, the ast.
>> The ast is just a JSON value. Then, I can just query the ast to see what
>> kind of value it is (using overloading), and walk it as necessary.
>
> We can drop the "Value" part of the name of course, if we expect that function
> to be used a lot, but there is still the parseJSONStream function which is
> arguably not less important. BTW, you just mentioned the DOM part so far, but
> for any code where performance is a priority, the stream-based pull parser
> is basically the way to go. This would also be the natural entry point for any
> serialization library.

Agreed elsewhere. But still, I am not seeing a range interface on the functions. The lexer, for example, does not accept an input range of characters.

Having a range interface is absolutely critical, and is the thing I am the most adamant about with all new Phobos additions. Any function that accepts arbitrarily long data should accept an input range instead, any function that generates arbitrary data should present that as an input range.

Any function that builds a container should accept an input range to fill that container with. Any function that builds a container should also be an output range.
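For illustration, a minimal sketch of what such a range interface could look like for the lexer (the names and the Token type are hypothetical, real lexing elided):

    import std.range.primitives;

    struct Token { string text; } // placeholder token type

    // Accepts any input range of characters and is itself an input
    // range of tokens, so it composes with other range algorithms.
    auto lexJSON(R)(R input)
        if (isInputRange!R && is(ElementType!R : dchar))
    {
        static struct Lexer
        {
            R input;
            @property bool empty() { return input.empty; }
            @property Token front() { return Token("..."); } // lexing elided
            void popFront() { input.popFront(); }            // placeholder
        }
        return Lexer(input);
    }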


> And my prediction is, if we do it right, that working with JSON will in most
> cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct
> that gets populated with the deserialized JSON data.

json_input must be an input range of characters.
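For illustration, the predicted call shape could boil down to something like the following sketch, built here on the existing std.json DOM purely to keep it self-contained (the proposed module would presumably use the pull parser instead; deserializeJSON is an assumed name):

    import std.json;

    struct S { string name; long count; }

    // Hypothetical deserializer: walk the struct's members at compile
    // time and pull the matching values out of the parsed DOM.
    Target deserializeJSON(Target)(string json)
    {
        auto jv = parseJSON(json);
        Target ret;
        foreach (mem; __traits(allMembers, Target))
        {
            alias M = typeof(__traits(getMember, ret, mem));
            static if (is(M == string))
                __traits(getMember, ret, mem) = jv[mem].str;
            else static if (is(M : long))
                __traits(getMember, ret, mem) = jv[mem].integer;
        }
        return ret;
    }

    unittest
    {
        auto s = deserializeJSON!S(`{"name":"x","count":3}`);
        assert(s.name == "x" && s.count == 3);
    }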


> Where that doesn't fit,
> performance oriented code would use the pull parser.

I am not sure what you mean by 'pull parser'. Do you mean the parser presents an input range as its output, and incrementally parses only as the next value is requested?
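For reference, the usual shape of such a parser is exactly that: the parser is itself an input range of parse events, and it consumes its input only when the next event is requested. A minimal sketch of the event type (all names hypothetical):

    enum Kind
    {
        objectStart, objectEnd, arrayStart, arrayEnd,
        key, stringValue, numberValue, boolValue, nullValue
    }

    struct Event { Kind kind; string text; }

    // The caller drives parsing by pulling events one at a time:
    //     foreach (e; parseJSONStream(input)) { ... }
    // Nothing beyond the current event has been parsed yet, so no DOM
    // allocation is needed.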


> So the DOM part of the
> system, which is the only thing the current JSON module has, will only be left
> as a niche functionality.

That's ok. Is it normal practice to call the JSON data structure a Document Object Model?


>> To create output:
>>
>>      auto r = ast.toChars();  // r is an InputRange of characters
>>      writeln(r);
>
> Do we have an InputRange version of the various number-to-string conversions?

We do now, at least for integers. I plan to do ones for floating point.
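(For reference, the integer version later shipped as std.conv.toChars; a usage sketch:)

    import std.conv : toChars;
    import std.stdio : writeln;

    void main()
    {
        auto r = 12345.toChars(); // lazy range of chars, no allocation
        writeln(r);               // prints 12345 without building a string
    }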


> It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON
> package.

Right. It's been reinvented multiple times in Phobos, which is absurd. If you're reinventing them in std.data.json, then we're doing something wrong again.


> Of course, using to!string internally would be an option, but it would
> obviously destroy all @nogc opportunities and performance benefits.

That's exactly why the range versions were done.


>> So, we'll need:
>>      toJSON
>>      toChars
>>      JSONException
>>
>> The possible JSON values are:
>>      string
>>      number
>>      object (associative arrays)
>>      array
>>      true
>>      false
>>      null
>>
>> Since these are D builtin types, they can actually be a simple union of
>> D builtin types.
>
> The idea is to have JSONValue be a simple alias to Algebraic!(...), just that
> there are currently still some workarounds for DMD < 2.067.0 on top, which means
> that JSONValue is a struct that "alias this" inherits from Algebraic for the
> time being. Those workarounds will be removed when the code is actually put into
> Phobos.
>
> But a simple union would obviously not be enough, it still needs a type tag of
> some form and needs to provide a @safe interface on top of it.

Agreed.
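For illustration, a minimal sketch of such a tagged union (names hypothetical; a real implementation would add a checked, @safe accessor per member):

    struct JSONValue
    {
        enum Kind { null_, boolean, number, text, array, object }

        Kind kind;
        private union
        {
            bool boolean_;
            double number_;
            string text_;
            JSONValue[] array_;
            JSONValue[string] object_;
        }

        // The @safe interface checks the tag before every access:
        @property double number() const @safe
        {
            assert(kind == Kind.number, "not a number");
            return number_;
        }
    }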


> Algebraic is the only thing that comes close right now,
> but I'd really prefer to have a fully
> statically typed version of Algebraic that uses an enum as the type tag instead
> of working with delegates/typeinfo.

If Algebraic is not good enough for this, it is a failure and must be fixed.


>> There is a decision needed about whether toJSON() allocates data or
>> returns slices into its inputrange. This can be 'static if' tested by:
>> if inputrange can return immutable slices.
>
> The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but
> slicing is done in those cases and the non-DOM parser interface is even @nogc as
> long as exceptions are disabled.

With a range interface, you can test for 1) hasSlicing and 2) if ElementEncodingType is immutable.
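A sketch of that test; note that hasSlicing is deliberately false for narrow strings, so they need their own branch even though slicing a string is precisely the desired case:

    import std.range.primitives : hasSlicing, ElementEncodingType;

    enum canReturnSlices(R) = (hasSlicing!R || is(R == string))
        && is(ElementEncodingType!R == immutable);

    static assert( canReturnSlices!string);               // slice the input
    static assert( canReturnSlices!(immutable(ubyte)[]));
    static assert(!canReturnSlices!(char[]));             // must copy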

Why is ubyte being accepted? The ECMA-404 spec sez: "Conforming JSON text is a sequence of Unicode code points".


>> toChars() can take a compile
>> time argument to determine if it is 'pretty' or not.
>
> As long as JSON DOM values are stored in a generic Algebraic (which is a huge
> win in terms of interoperability!), toChars won't suffice as a name.

Why not?

> It would have to be toJSON(Chars) (as it basically is now). I gave the "pretty"
> version a separate name simply because it's more convenient to use and pretty
> printing will probably be by far the most frequently used option when converting
> to a string.

So make pretty printing the default. In fact, I'm skeptical that a non-pretty printed version is worthwhile. Note that an adapter algorithm can strip redundant whitespace.
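Such an adapter needs slightly more than a plain character filter, since whitespace inside string literals must survive. A sketch, assuming valid JSON input:

    import std.algorithm.iteration : filter;

    // Lazily drops insignificant whitespace from JSON text. The state
    // tracks whether we are inside a string literal and whether the
    // previous character was a backslash escape.
    auto stripJSONWhitespace(R)(R json)
    {
        bool inString, escaped;
        return json.filter!((dchar c) {
            if (inString)
            {
                if (escaped)        escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"')  inString = false;
                return true; // keep everything inside strings
            }
            if (c == '"') { inString = true; return true; }
            return c != ' ' && c != '\t' && c != '\n' && c != '\r';
        });
    }

The mutable state makes this a strictly one-pass, input-range-only adapter.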

July 30, 2015
If this implementation gets merged into Phobos, will vibe.d migrate to it, or will there be two similar libs?
July 30, 2015
On 30.07.2015 at 05:25, Walter Bright wrote:
> On 7/29/2015 1:41 PM, Sönke Ludwig wrote:
>> The token level is useful for reasoning about the text representation.
>> It could
>> be used for example to implement syntax highlighting, or for using the
>> location
>> information to mark errors in the source code.
>
> Ok, I see your point. The question then becomes: does the node stream
> really add enough value to justify its existence, given how much it
> overlaps the token stream?
>

I agree that in the case of JSON the difference can be a bit subtle. Basically, the node stream adds knowledge about the nesting of elements, as well as semantic meaning for special token sequences that library users would otherwise have to interpret themselves. Finally, it guarantees a valid JSON structure, while a token range could contain tokens in any order.

The knowledge about nesting in particular is also a requirement for the high-level pull parser functions (skipToKey, readArray, readString etc.) that make working with that kind of pull parser interface actually bearable outside of mechanical code like a generic serialization framework.
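For illustration, the depth tracking that such helpers would encapsulate, written against a hypothetical event range whose elements expose isStart/isEnd:

    // Skip exactly one complete value (scalar, object or array) in a
    // flat event stream by tracking nesting depth manually; the node
    // stream's helpers would hide exactly this bookkeeping.
    void skipValue(Events)(ref Events events)
    {
        size_t depth = 0;
        do
        {
            auto e = events.front;
            events.popFront();
            if (e.isStart)    ++depth; // objectStart / arrayStart
            else if (e.isEnd) --depth; // objectEnd / arrayEnd
        }
        while (depth > 0);
    }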
July 30, 2015
On 30.07.2015 at 09:27, Suliman wrote:
> If this implementation gets merged into Phobos, will vibe.d migrate to
> it, or will there be two similar libs?

I'll then make the vibe.d JSON module compatible using "alias this" implicit conversions and then deprecate it over a longer period of time before it gets removed. And of course the serialization framework will be adjusted to work with the new JSON module.
July 30, 2015
On 29.07.2015 at 11:58, Andrea Fontana wrote:
> On Wednesday, 29 July 2015 at 08:55:20 UTC, Sönke Ludwig wrote:
>> That would be another possibility. What do you think about the
>> opt(jv).foo.bar[12].baz alternative? One advantage is that it could
>> work without parsing a string and the implications thereof (error
>> handling?).
>
> I implemented it too, but I removed it.
> Many times field names are function names or similar, and it breaks
> the code.

In this case, since it would be a separate type, there are no static members apart from the automatically generated ones and maybe something like opIndex/opAssign. It can of course also overload opIndex with a string argument, so that there is a generic alternative in case of conflicts or runtime key names.
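For illustration, a sketch of such a type built on std.json (current names) purely to keep it self-contained; opDispatch turns member syntax into key lookups, and opIndex covers conflicting or runtime key names:

    import std.json;

    struct Opt
    {
        private const(JSONValue)* value; // null marks a missing path

        Opt opDispatch(string key)() const { return this[key]; }

        Opt opIndex(string key) const
        {
            if (value !is null && value.type == JSONType.object)
                if (auto p = key in value.object)
                    return Opt(p);
            return Opt(null);
        }

        bool exists() const { return value !is null; }
    }

    Opt opt(return ref const JSONValue v) { return Opt(&v); }

    unittest
    {
        auto jv = parseJSON(`{"foo": {"bar": 1}}`);
        assert(opt(jv).foo.bar.exists);
        assert(!opt(jv).foo.baz.exists);
    }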

> My implementation created a lot of temporary objects (one for each
> sub-object); using the string instead, I just create the last one.

If the temporary objects are cheap, I don't see an issue there. Without keeping track of the path, a simple pointer to a JSONValue should be sufficient (the temporary objects have to be made non-copyable).

>
> It's not easy to support assignments with that syntax. Something like:
>
> obj.with.a.new.field = 3;
>
> It's difficult to implement. It's much easier to implement:
>
> obj["/field/doesnt/exists"] = 3

Maybe more difficult, but certainly possible. If the complexity doesn't explode, I'd say that shouldn't be a primary concern, since this is all still pretty simple.

> It's much easier to write formatted-string paths.
> It allows future implementation of something like xpath/jquery-style
> queries.

Advanced path queries could indeed be interesting, possibly even more interesting if applied to the pull parser.

> If your json contains keys with "/" inside, you can still use old plain
> syntax...

A possible alternative would be to support some kind of escape syntax.

> String parsing is quite easy (at compile time too), of course. If a
> part of the path doesn't exist, it works like when a part of
> opt("a", "b", "c") doesn't. It's just syntax sugar. :)

Granted, it's not really much in this case, but you do get less static checking, which means that some things will only be caught at run time. Also, you get an ambiguity if you want to support array indices, too. Finally, it may even be security relevant, because an attacker might try to sneak in a key that contains slash characters to access or overwrite fields that would normally not be reachable. So any user input that may end up in a path query now has to be validated first.

> Does it work? Anyway, it seems ambiguous:
> opt(...) == null   => false
> opt(...).isNull    => true

The former gets forwarded to Algebraic, while the latter is a method of the enclosing Nullable. I've tested it and it works. I agree it isn't particularly pretty in this case, but that's what we have in D as basic building blocks (or do we have an Optional type somewhere yet?).

>> The other possible approach, which would be more convenient to use,
>> would be add a "default value" overload to "opt", for example:
>> jv.opt("defval").foo.bar
>
> Isn't jv.opt("defval") taking the value of ("defval") rather than
> setting a default value?
>

It would be an opt with different semantics, just a theoretical alternative. This behavior would be mutually exclusive to the current opt.
July 30, 2015
On Thursday, 30 July 2015 at 04:41:51 UTC, Walter Bright wrote:
> I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.

Is there any reason why D doesn't allow json.parseStream() in this case? I remember the requirement of typing the full module path being my first head scratcher while learning D. The first example in TDPL had some source code that called split() (if memory serves), and Phobos had changed since the book was written, so you needed to disambiguate. I found it very odd that you had to type the whole thing when just the next level up would have sufficed to disambiguate it.

The trend seems to be toward more deeply nested modules in Phobos so having to type the full path will increasingly be a wart of D's.

If we can't have the minimal necessary module paths then I'm completely in favor of parseJSONStream over the more general parseStream. I want that "json" in there one way or another (preferably by the method which makes it optional while maintaining brevity).
July 30, 2015
On 2015-07-30 01:34, Walter Bright wrote:

> It if was returned as a range of nodes, it would be:
>
>     Object, string, number, string, array, number, number, end, string,
> object, string, number, end, end

Ah, that makes sense. Never thought of an "end" mark like that, pretty clever.

-- 
/Jacob Carlborg
July 30, 2015
On 2015-07-30 06:41, Walter Bright wrote:

> I agree with your goal of readability. And if someone wants to write
> code that emphasizes it's JSON, they can write it as
> std.data.json.parseStream. (It's not about saving typing, it's about
> avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-)
> ) This is not a huge deal for me, but I'm not in favor of establishing a
> new convention that repeats the module name. It eschews one of the
> advantages of having module name spaces in the first place, and evokes
> the old C style naming conventions.

I kind of agree with that, but at the same time, if one always needs to use the fully qualified name (or an alias) because there's a conflict, that's quite annoying.

A perfect example of that is the Path module in Tango. It has functions such as "split" and "join". Every time I use it I alias the import:

import Path = tango.io.Path;

Because otherwise it will conflict with the string manipulation functions of the same names. In Phobos, the names in the path module are different from the string functions.

For example, I think "Value" and "parse" are too generic to not include "JSON" in their name.

-- 
/Jacob Carlborg
July 30, 2015
On 7/30/2015 9:58 AM, Brad Anderson wrote:
> If we can't have the minimal necessary module paths then I'm completely in favor
> of parseJSONStream over the more general parseStream. I want that "json" in
> there one way or another (preferably by the method which makes it optional while
> maintaining brevity).

I would think it unlikely to be parsing two different formats in one file. But in any case, you can always do this:

   import std.data.json : parseJSON = parse;

Or put the import in a scope:

   void doNaughtyThingsWithJson()
   {
       import std.data.json;
       ...
       x.parse();
   }

The latter seems to be becoming the preferred D style.