August 03, 2014
On 8/3/14, 11:03 AM, Sönke Ludwig wrote:
> Am 03.08.2014 17:14, schrieb Andrei Alexandrescu:
[snip]
> Ah okay, *phew* ;) But in that case I'd actually think about leaving off
> the backslash decoding in the low level parser, so that slices could be
> used for immutable inputs in all cases - maybe with a name of
> "rawString" for the stored data and an additional "string" property that
> decodes on the fly. This may come in handy when the first comparative
> benchmarks together with rapidjson and the like are done.

Yah, that's awesome.
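Roughly, the token could keep the raw slice and decode only on demand. A minimal sketch (the property is called str here because "string" would shadow the alias; escape handling is cut down to the essentials and assumes input already validated by the tokenizer):

struct Token
{
    // Slice of the original input; backslash sequences are left untouched.
    const(char)[] rawString;

    // Decodes escapes only when the caller actually asks for the value.
    string str() const
    {
        import std.algorithm.searching : canFind;
        import std.array : appender;

        if (!rawString.canFind('\\'))
            return rawString.idup;          // fast path: nothing to decode

        auto result = appender!string();
        for (size_t i = 0; i < rawString.length; ++i)
        {
            if (rawString[i] != '\\') { result.put(rawString[i]); continue; }
            switch (rawString[++i])         // sketch: \uXXXX handling omitted
            {
                case 'n': result.put('\n'); break;
                case 't': result.put('\t'); break;
                default:  result.put(rawString[i]); break;  // \", \\, \/ etc.
            }
        }
        return result.data;
    }
}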

>> There's a public opCast(Payload) that gives the end user access to the
>> Payload inside a Value. I forgot to add documentation to it.
>
> I see. Suppose that opDispatch would be dropped, would anything speak
> against "alias this"ing _payload to avoid the need for the manually
> defined operators?

Correct. In fact the conversion was there but I removed it for the sake of opDispatch.

>> What advantages are to a tagged union? (FWIW: to me Algebraic and
>> Variant are also tagged unions, just that the tags are not 0, 1, ..., n.
>> That can be easily fixed for Algebraic by defining operations to access
>> the index of the currently-stored type.)
>
> The two major points are probably that it's possible to use "final
> switch" on the type tag if it's an enum,

So I just tried this: http://dpaste.dzfl.pl/eeadac68fac0. Sadly, the cast doesn't take. Without the cast the enum does compile, but not the switch. I submitted https://issues.dlang.org/show_bug.cgi?id=13247.
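For context, this is the kind of usage a type tag enum would enable (Kind is illustrative, not part of the current Algebraic API):

enum Kind { null_, boolean, number, text, array, object }

string describe(Kind k)
{
    final switch (k)    // the compiler verifies every member is handled
    {
        case Kind.null_:   return "null";
        case Kind.boolean: return "boolean";
        case Kind.number:  return "number";
        case Kind.text:    return "string";
        case Kind.array:   return "array";
        case Kind.object:  return "object";
    }
}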

> and the type id can be easily stored in both integer and string form
> (which is not as conveniently possible with a TypeInfo).

I think here pointers to functions "win" because getting a string (or anything else for that matter) is an indirect call away.

std.variant was among the first artifacts I wrote for D. It's a topic I've been dabbling in for a long time in a C++ context (http://goo.gl/zqUwFx), always with almost-satisfactory results. I told myself that if I could implement things properly in D, this language has good potential. Replacing the integral tag I'd always used with a pointer to function is, I think, net progress. Things turned out fine, save for the switch matter.
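To make the comparison concrete, here is a stripped-down sketch of a function-pointer tag - not the actual std.variant implementation, just the shape of the idea:

// Each stored type gets one handler; the pointer's identity doubles as the
// type tag, and the string form is one indirect call away.
alias Handler = string function();

string intName()    { return "int"; }
string stringName() { return "string"; }

struct Tagged
{
    Handler tag;
    union { int i; string s; }

    string typeName() const { return tag(); }
}

unittest
{
    Tagged t;
    t.tag = &intName;
    t.i = 42;
    assert(t.typeName == "int");
}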

> An enum based tagged union design also currently has the unfortunate
> property that the order of enum values and that of the accepted types
> must be defined consistently, or bad things will happen. Supporting UDAs
> on enum values would be a possible direction to fix this:
>
>      enum JsonType {
>          @variantType!string string,
>          @variantType!(JsonValue[]) array,
>          @variantType!(JsonValue[string]) object
>      }
>      alias JsonValue = TaggedUnion!JsonType;
>
> But then there are obviously still issues with cyclic type references.
> So, anyway, this is something that still requires some thought. It could
> also be designed in a way that is backwards compatible with a pure
> "Algebraic", so it shouldn't be a blocker for the current design.

I think something can be designed along these lines if necessary.


Andrei
August 03, 2014
On Sunday, 3 August 2014 at 17:40:48 UTC, Andrei Alexandrescu wrote:
> On 8/3/14, 10:19 AM, Sean Kelly wrote:
>> I don't want to pay for anything I don't use.  No allocations should
>> occur within the parser and it should simply slice up the input.
>
> What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?

This is tricky with a range. With an event-based parser I'd have events for object and array begin / end, but with a range you end up having an element that's a token, which is pretty weird. For encoded characters (and you need to make sure you handle surrogate pairs in your decoder) I'd still provide some means of decoding on demand. If nothing else, decode lazily when the user asks for the string value.  That way the user isn't paying to decode strings he isn't interested in.
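For the surrogate-pair part, the decode step is small but easy to get wrong. A self-contained sketch (decodeUnicodeEscape is a hypothetical helper, not tied to any particular parser; it assumes the escape has already been validated):

import std.conv : to;
import std.utf : encode;

// Turns one \uXXXX escape (possibly a surrogate pair) into UTF-8.
// `input` starts right after the "\u" of the first escape.
string decodeUnicodeEscape(const(char)[] input)
{
    uint unit(const(char)[] hex) { return hex.to!uint(16); }

    uint cp = unit(input[0 .. 4]);
    if (cp >= 0xD800 && cp <= 0xDBFF)           // high surrogate
    {
        assert(input[4 .. 6] == `\u`);          // low surrogate must follow
        uint lo = unit(input[6 .. 10]);
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
    }
    char[4] buf;
    auto len = encode(buf, cast(dchar) cp);
    return buf[0 .. len].idup;
}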


> No allocation works for tokenization, but parsing is a whole different matter.
>
>> So the
>> lowest layer should allow me to iterate across symbols in some way.
>
> Yah, that would be the tokenizer.

But that will halt on comma and colon and such, correct?  That's a tad lower than I'd want, though I guess it would be easy enough to build a parser on top of it.


>> When I've done this in the past it was SAX-style (ie. a callback per
>> type) but with the range interface that shouldn't be necessary.
>>
>> The parser shouldn't decode or convert anything unless I ask it to.
>> Most of the time I only care about specific values, and paying for
>> conversions on everything is wasted process time.
>
> That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more, the added cost is negligible. In contrast, scanning once for finding termination and then again for decoding purposes will definitely be a lot more expensive.

I think I'm getting a bit confused. For the JSON parser I wrote, the parser performs full validation but leaves the content as-is, then provides a routine to decode values from their string representation if the user wishes to. I'm not sure where scanning figures in here.
> Andrei

August 03, 2014
On Sunday, 3 August 2014 at 19:36:43 UTC, Sönke Ludwig wrote:
> Do you have a specific case in mind where the data format doesn't fit the process used by vibe.data.serialization? The data format iteration part *is* abstracted away there in basically a kind of traits structure (the "Serializer"). When serializing, the data always gets written in the order defined by the input value, while during deserialization the serializer defines how aggregates are iterated. This seems to fit all of the data formats that I had in mind.

For example, we use a special binary serialization format for structs where the serialized content is actually a valid D struct: after updating the internal array pointers, one can simply do `cast(S*) buffer.ptr` and work with it normally. Doing this efficiently requires a breadth-first traversal that keeps track of one upper level to update the pointers, which does not fit very well with the classical depth-first recursive traversal usually required by JSON-like structured formats.
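To make that concrete, a minimal sketch of the load side - just the shape of the idea, not our actual implementation (the offset convention is assumed):

struct S
{
    int id;
    int[] values;   // serialized as an offset into the buffer, patched on load
}

// The blob starts with the struct itself, followed by the array data.
// On load, the stored offset is rebased into a real pointer in place,
// after which the buffer can be used directly as an S.
S* load(ubyte[] buffer)
{
    auto s = cast(S*) buffer.ptr;
    auto offset = cast(size_t) s.values.ptr;     // still an offset, not a pointer
    s.values = (cast(int*)(buffer.ptr + offset))[0 .. s.values.length];
    return s;
}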
August 04, 2014
On Sunday, 3 August 2014 at 17:19:04 UTC, Sean Kelly wrote:

> Is there support for output?  I see the makeArray and makeObject routines...  Ideally, there should be a way to serialize JSON against an OutputRange with optional formatting.

I think it should only provide very primitive functions to serialize basic data types. Then Phobos should provide a separate module/package for generic serialization where JSON is an archive type using this module as its backend.
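Something along these lines is what I have in mind (the names are hypothetical and the string escaping is cut down to the bare minimum):

import std.format : formattedWrite;
import std.range.primitives : put;

void writeJson(R)(ref R sink, bool value)
{
    put(sink, value ? "true" : "false");
}

void writeJson(R)(ref R sink, double value)
{
    sink.formattedWrite("%.17g", value);
}

void writeJson(R)(ref R sink, const(char)[] value)
{
    put(sink, '"');
    foreach (c; value)
    {
        // Sketch: only quotes and backslashes are escaped; control
        // characters and \uXXXX output are left out for brevity.
        if (c == '"' || c == '\\') put(sink, '\\');
        put(sink, c);
    }
    put(sink, '"');
}

unittest
{
    import std.array : appender;
    auto app = appender!string();
    app.writeJson("a \"quoted\" word");
    assert(app.data == `"a \"quoted\" word"`);
}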

--
/Jacob Carlborg
August 04, 2014
On Sunday, 3 August 2014 at 18:44:37 UTC, Dicebot wrote:

> Before going this route one needs to have a good vision how it may interact with imaginary std.serialization to avoid later deprecation.

I suggest only providing functions for serializing primitive types. A separate serialization module/package with a JSON archive type would then use this module as its backend.

> At the same time I have recently started to think that a dedicated serialization module that decouples aggregate iteration from the data storage format is in most cases impractical for performance reasons - different serialization methods imply very different efficient iteration strategies. Probably it is better to define serialization compile-time traits instead and require each `std.data.*` provider to implement those on its own in the most effective fashion.

I'm not sure I agree with that. In my work on std.serialization I have not seen this to be a problem. What problems have you found?

--
/Jacob Carlborg
August 04, 2014
On Sunday, 3 August 2014 at 20:40:47 UTC, Sean Kelly wrote:

> This is tricky with a range. With an event-based parser I'd have events for object and array begin / end, but with a range you end up having an element that's a token, which is pretty weird.

Have a look at Token.Kind at the top of the module [1]. The enum has objectStart, objectEnd, arrayStart and arrayEnd. Just by looking at that, it seems it already works very similarly to an event-based parser, but with a range API. This is exactly like the XML pull parser in Tango.

[1] http://erdani.com/d/jgrandson.d
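Consumed pull-parser style, that might look roughly like this (the Token struct below is a cut-down stand-in for the one in the module, just enough to show the shape):

import std.range.primitives : empty, front, popFront;

struct Token
{
    enum Kind { objectStart, objectEnd, arrayStart, arrayEnd, str, number }
    Kind kind;
}

// Skips one complete value (object, array or scalar) by balancing start/end
// tokens - the same thing an event-based handler does by counting callbacks.
void skipValue(R)(ref R tokens)
{
    int depth = 0;
    do
    {
        switch (tokens.front.kind)
        {
            case Token.Kind.objectStart, Token.Kind.arrayStart: ++depth; break;
            case Token.Kind.objectEnd,   Token.Kind.arrayEnd:   --depth; break;
            default: break;
        }
        tokens.popFront();
    } while (depth > 0 && !tokens.empty);
}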

--
/Jacob Carlborg
August 04, 2014
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu wrote:
> We need a better json library at Facebook. I'd discussed with Sönke the possibility of taking vibe.d's json to std but he said it needs some more work. So I took std.jgrandson to proof of concept state and hence ready for destruction:
>
> http://erdani.com/d/jgrandson.d
> http://erdani.com/d/phobos-prerelease/std_jgrandson.html
>
> Here are a few differences compared to vibe.d's library. I think these are desirable to have in that library as well:
>
> * Parsing strings is decoupled into tokenization (which is lazy and only needs an input range) and parsing proper. Tokenization is lazy, which allows users to create their own advanced (e.g. partial/lazy) parsing if needed. The parser itself is eager.
>
> * There's no decoding of strings.
>
> * The representation is built on Algebraic, with the advantages that it benefits from all of its primitives. Implementation is also very compact because Algebraic obviates a bunch of boilerplate. Subsequent improvements to Algebraic will also reflect themselves into improvements to std.jgrandson.
>
> * The JSON value (called std.jgrandson.Value) has no named member variables or methods except for __payload. This is so there's no clash between dynamic properties exposed via opDispatch.
>
> Well that's about it. What would it take for this to become a Phobos proposal? Destroy.
>
>
> Andrei

In my bson library I found it very useful to have some methods to check whether a field exists, and to get a "defaulted" value. Something like:

auto assume(T)(Value v, T defaultValue = T.init);

Another good method could be something like xpath to get a deep value:

Value v = value["/path/to/sub/object"];

Moreover in my library I actually have three different methods to read a value:

T get(T)() // Exception if the value is not a T, isn't valid, or doesn't exist
T to(T)()  // Tries to convert the value to T via its string representation (std.conv.to). Exception if it doesn't exist or isn't valid

BsonField!T as(T)(lazy T defaultValue = T.init)  // Always returns a value

BsonField!T is an "alias this"-ed struct with two members: T value and bool error(). value is the aliased field, and error() tells you whether value was defaulted (because of an error: the field doesn't exist or can't be converted to T).
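Reconstructed from that description, the wrapper is roughly (a sketch, not the actual library code):

struct BsonField(T)
{
    T value;                 // the aliased field
    private bool _error;

    alias value this;        // the wrapper behaves like a plain T

    bool error() const { return _error; }   // true when value is the fallback
}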

So I can write something like this:

int myvalue = json["/that/deep/property"].as!int;

or

auto myvalue = json["/that/deep/property"].as!int(10);

if (myvalue.error) writeln("Property doesn't exist, I'm using the default value");

writeln("Property value: ", myvalue);

I hope this can be useful...



August 04, 2014
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu wrote:
> We need a better json library at Facebook. I'd discussed with Sönke the possibility of taking vibe.d's json to std but he said it needs some more work. So I took std.jgrandson to proof of concept state and hence ready for destruction:
>
> http://erdani.com/d/jgrandson.d
> http://erdani.com/d/phobos-prerelease/std_jgrandson.html

* Could you please put it on GitHub to get syntax highlighting and all the other advantages?
* It doesn't completely follow the Phobos naming conventions
* The indentation is off in some places
* The unit tests are a bit lacking for the separate parsing functions
* There are methods for getting strings and numbers; what about booleans?
* Shouldn't it be called TokenRange?
* Shouldn't this be built using the lexer generator you so strongly have been pushing for?

* The unit tests for TokenStream are very dense. I would prefer blank lines to group the "assert" calls with the "popFront" calls that belong together

--
/Jacob Carlborg
August 04, 2014
"Jacob Carlborg"  wrote in message news:bjecckhwlmkwkeqegwqa@forum.dlang.org...

> I suggest only provide functions for serializing primitive types.

This is exactly what I need in most projects.  Basic types, arrays, AAs, and structs are usually enough. 

August 04, 2014
On Sunday, 3 August 2014 at 19:54:12 UTC, Sönke Ludwig wrote:
> Am 03.08.2014 20:57, schrieb w0rp:
>> My issue with it is that if you ask for a key in an object which doesn't
>> exist, you get an 'undefined' value back, just like JavaScript. I'd
>> rather that be propagated as a RangeError, which is more consistent with
>> associative arrays in the language and probably more correct.
>
> Yes, this is what I meant with the JavaScript part of API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.

There is a parallel discussion about the concept of associative ranges:
http://forum.dlang.org/thread/jheurakujksdlrjaoncs@forum.dlang.org

Maybe you could also have a look there, because JSON seems to be a good candidate for an associative range.
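For reference, these are the built-in associative-array semantics that an opIndex/get/"in" trio on the JSON value would mirror (plain AA code, only to show the behavior under discussion):

unittest
{
    import core.exception : RangeError;
    import std.exception : assertThrown;

    int[string] obj = ["name": 1];

    assert(obj["name"] == 1);                 // opIndex throws on a missing key
    assertThrown!RangeError(obj["missing"]);
    assert(obj.get("missing", 42) == 42);     // get falls back to the default
    assert("name" in obj);                    // "in" yields a pointer / null
}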