August 03, 2014
Am 03.08.2014 17:34, schrieb Andrei Alexandrescu:
> On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
> [snip]
>
> We need to address the matter of std.jgrandson competing with
> vibe.data.json. Clearly at a point only one proposal will have to be
> accepted so the other would be wasted work.
>
> Following our email exchange I decided to work on this because (a) you
> mentioned more work is needed and your schedule was unclear, (b) we need
> this at FB sooner rather than later, (c) there were a few things I
> thought can be improved in vibe.data.json. I hope that taking
> std.jgrandson to proof spurs things into action.
>
> Would you want to merge some of std.jgrandson's deltas into a new
> proposal std.data.json based on vibe.data.json? Here's a few things that
> I consider necessary:
>
> 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
> design that may or may not come someday.

This may be the crux w.r.t. the vibe.data.json implementation. My schedule will be very crowded this month, so I could only really start to work on it at the beginning of September. But apart from the mentioned points, I think your implementation is already the closest thing to what I have in mind, so I'm all for going the clean slate route (I'll have to do a lot in terms of deprecation work in vibe.d anyway).

>
> 2. Avoid UTF decoding.
>
> 3. Offer a lazy token stream as a basis for a non-lazy parser. A lazy
> general parser would be considerably more difficult to write and would
> only serve a small niche. On the other hand, a lazy tokenizer is easy to
> write and make efficient, and serve as a basis for user-defined
> specialized lazy parsers if the user wants so.
>
> 4. Avoid string allocation. String allocation can be replaced with
> slices of the input when these two conditions are true: (a) input type
> is string, immutable(byte)[], or immutable(ubyte)[]; (b) there are no
> backslash-encoded sequences in the string, i.e. the input string and the
> actual string are the same.
>
> 5. Build on std.variant through and through. Again, anything that
> doesn't work is a usability bug in std.variant, which was designed for
> exactly this kind of stuff. Exposing the representation such that user
> code benefits of the Algebraic's primitives may be desirable.
>
> 6. Address w0rp's issue with undefined. In fact std.Algebraic does have
> an uninitialized state :o).
>
> Sönke, what do you think?

My requirements would be the same, except for 6.

The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).
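
To make the last point concrete, here is a minimal sketch of treating the uninitialized state as null when converting back to a JSON string. It assumes a cut-down value type built on std.variant's Algebraic; the names Value and toJson are hypothetical and are not taken from vibe.data.json or std.jgrandson, and string escaping is left out.

```d
import std.conv : to;
import std.variant : Algebraic;

// Hypothetical, cut-down JSON value: just enough types for the sketch.
alias Value = Algebraic!(bool, double, string);

// Converts a value back to JSON text. An uninitialized Value
// (Value.init, i.e. hasValue == false) is emitted as null.
string toJson(Value v)
{
    if (!v.hasValue)
        return "null";
    if (auto b = v.peek!bool)
        return *b ? "true" : "false";
    if (auto d = v.peek!double)
        return to!string(*d);
    if (auto s = v.peek!string)
        return `"` ~ *s ~ `"`; // assumption: no characters that need escaping
    assert(0, "unreachable for the three types listed above");
}

void main()
{
    Value uninitialized;               // never assigned
    assert(!uninitialized.hasValue);
    assert(toJson(uninitialized) == "null");
    assert(toJson(Value(true)) == "true");
    assert(toJson(Value("hi")) == `"hi"`);
}
```

Skipping such values instead (for example when they appear as object members) would be the other option mentioned above; the check on hasValue is the same either way.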

August 03, 2014
On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
> API looks great but I'd like to see some simple serialize/deserialize
> functions as in vibed:
> http://vibed.org/api/vibe.data.json/deserializeJson
> http://vibed.org/api/vibe.data.json/serializeToJson

Before going this route one needs to have a good vision of how it may interact with a hypothetical std.serialization, to avoid later deprecation.

At the same time I have recently started to think that a dedicated serialization module that decouples aggregate iteration from the data storage format is in most cases impractical for performance reasons - different serialization methods imply very different efficient iteration strategies. It is probably better to define compile-time serialization traits instead and require each `std.data.*` provider to implement those on its own in the most effective fashion.
August 03, 2014
On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
> Am 03.08.2014 17:34, schrieb Andrei Alexandrescu:
>> 6. Address w0rp's issue with undefined. In fact std.Algebraic does have
>> an uninitialized state :o).
>
> My requirements would be the same, except for 6.
>
> The "undefined" state in the vibe.d version was necessary due to early API decisions and it's more or less a prominent part of it (specifically because the API was designed to behave similar to JavaScript). In hindsight, I'd definitely avoid that. However, I don't think its existence (also in the form of Algebraic.init) is an issue per se, as long as such values are properly handled when converting the runtime value back to a JSON string (i.e. skipped or treated as null values).

My issue with it is that if you ask for a key in an object which doesn't exist, you get an 'undefined' value back, just like JavaScript. I'd rather that be propagated as a RangeError, which is more consistent with associative arrays in the language and probably more correct. A minor issue is being able to create a Json object which isn't a valid Json object by itself. I'd rather the initial value was just 'null', which would match how pointers and class instances behave in the language.
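
For reference, this is the built-in associative array behaviour being compared against, as a small sketch. The variable names are made up for illustration, and the RangeError check assumes a default build with bounds checking enabled.

```d
import core.exception : RangeError;
import std.exception : assertThrown;

void main()
{
    string[string] obj = ["name": "w0rp"];

    // The "in" operator tests for a key without throwing.
    assert("name" in obj);
    assert("age" !in obj);

    // get() returns a caller-supplied default for a missing key.
    assert(obj.get("age", "unknown") == "unknown");

    // Indexing a missing key raises RangeError rather than
    // handing back a JavaScript-style 'undefined' value.
    assertThrown!RangeError(obj["age"]);
}
```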
August 03, 2014
On 8/3/14, 11:08 AM, Johannes Pfau wrote:
> Am Sun, 03 Aug 2014 09:17:57 -0700
> schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:
>
>> On 8/3/14, 8:51 AM, Johannes Pfau wrote:
>>>
>>> Variant uses TypeInfo internally, right?
>>
>> No.
>>
>
> https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L210

That's a query for the TypeInfo.

> https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L371

That could be translated to a comparison of pointers to functions.

> https://github.com/D-Programming-Language/phobos/blob/master/std/variant.d#L696

That, too, could be translated to a comparison of pointers to functions.

There's some confusion here, so let me clarify. What Variant does is use pointers to functions instead of integers. The space overhead (one word) is generally the same due to alignment issues.

> Also the handler function concept will always have more overhead than a
> simple tagged union. It is certainly useful if you want to store any
> type, but if you only want a limited set of types there are more
> efficient implementations.

I'm not sure at all actually. The way I see it a pointer to a function offers most everything an integer does, plus universal functionality by actually calling the function. What it doesn't offer is ordering of small integers, but that can be easily arranged at a small cost.
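
A rough sketch of the two layouts under discussion follows. It is an illustration only and not how std.variant is actually implemented; IntTagged, FnTagged, showLong and showDouble are hypothetical names. The function pointer serves both as the type tag (compared by identity) and as a way to operate on the payload.

```d
import std.conv : to;
import std.stdio : writeln;

// Classic tagged union: a small integer identifies the active member.
struct IntTagged
{
    int tag;
    union { long i; double d; }
}

// Alternative: a function pointer identifies the active member
// and can also be called to act on the payload.
struct FnTagged
{
    string function(const(FnTagged)*) handler;
    union { long i; double d; }
}

string showLong(const(FnTagged)* v)   { return to!string(v.i); }
string showDouble(const(FnTagged)* v) { return to!string(v.d); }

void main()
{
    FnTagged a;
    a.handler = &showLong;
    a.i = 42;

    FnTagged b;
    b.handler = &showDouble;
    b.d = 1.5;

    // "Which type is stored?" becomes a pointer comparison.
    assert(a.handler is &showLong);
    assert(a.handler !is b.handler);

    // The same word also gives access to type-specific behaviour.
    assert(a.handler(&a) == "42");

    // Alignment padding usually makes the two layouts the same size.
    writeln(IntTagged.sizeof, " ", FnTagged.sizeof);
}
```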


Andrei

August 03, 2014
Am 03.08.2014 20:44, schrieb Dicebot:
> On Sunday, 3 August 2014 at 08:04:40 UTC, Johannes Pfau wrote:
>> API looks great but I'd like to see some simple serialize/deserialize
>> functions as in vibed:
>> http://vibed.org/api/vibe.data.json/deserializeJson
>> http://vibed.org/api/vibe.data.json/serializeToJson
>
> Before going this route one needs to have a good vision how it may
> interact with imaginary std.serialization to avoid later deprecation.
>
> At the same time I have recently started to think that dedicated
> serialization module that decouples aggregate iteration from data
> storage format is in most cases impractical for performance reasons -
> different serialization methods imply very different efficient iteration
> strategies. Probably it is better to define serialization compile-time
> traits instead and require each `std.data.*` provider to implement those
> on its own in the most effective fashion.

Do you have a specific case in mind where the data format doesn't fit the process used by vibe.data.serialization? The data format iteration part *is* abstracted away there in basically a kind of traits structure (the "Serializer"). When serializing, the data always gets written in the order defined by the input value, while during deserialization the serializer defines how aggregates are iterated. This seems to fit all of the data formats that I had in mind.
August 03, 2014
On 8/3/14, 11:37 AM, Sönke Ludwig wrote:
> Am 03.08.2014 17:34, schrieb Andrei Alexandrescu:
>> On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
>> [snip]
>>
>> We need to address the matter of std.jgrandson competing with
>> vibe.data.json. Clearly at a point only one proposal will have to be
>> accepted so the other would be wasted work.
>>
>> Following our email exchange I decided to work on this because (a) you
>> mentioned more work is needed and your schedule was unclear, (b) we need
>> this at FB sooner rather than later, (c) there were a few things I
>> thought can be improved in vibe.data.json. I hope that taking
>> std.jgrandson to proof spurs things into action.
>>
>> Would you want to merge some of std.jgrandson's deltas into a new
>> proposal std.data.json based on vibe.data.json? Here's a few things that
>> I consider necessary:
>>
>> 1. Commit to a schedule. I can't abandon stuff in wait for the perfect
>> design that may or may not come someday.
>
> This may be the crux w.r.t. the vibe.data.json implementation. My
> schedule will be very crowded this month, so I could only really start
> to work on it beginning of September. But apart from the mentioned
> points, I think your implementation is already the closest thing to what
> I have in mind, so I'm all for going the clean slate route (I'll have to
> do a lot in terms of deprecation work in vibe.d anyway).

What would be your estimated time of finishing?

Would anyone want to take vibe.data.json and std.jgrandson, put them in a crucible, and have std.data.json emerge from it in a timely manner? My understanding is that everyone involved would be cool with that.


Andrei

August 03, 2014
Am 03.08.2014 20:57, schrieb w0rp:
> On Sunday, 3 August 2014 at 18:37:48 UTC, Sönke Ludwig wrote:
>>
>> The "undefined" state in the vibe.d version was necessary due to early
>> API decisions and it's more or less a prominent part of it
>> (specifically because the API was designed to behave similar to
>> JavaScript). In hindsight, I'd definitely avoid that. However, I don't
>> think its existence (also in the form of Algebraic.init) is an issue
>> per se, as long as such values are properly handled when converting
>> the runtime value back to a JSON string (i.e. skipped or treated as
>> null values).
>
> My issue with it is that if you ask for a key in an object which doesn't
> exist, you get an 'undefined' value back, just like JavaScript. I'd
> rather that be propagated as a RangeError, which is more consistent with
> associative arrays in the language and probably more correct.

Yes, this is what I meant with the JavaScript part of the API. In addition to opIndex(), there should of course also be a .get(key, default_value) style accessor and the "in" operator.

> A minor
> issue is being able to create a Json object which isn't a valid Json
> object by itself. I'd rather the initial value was just 'null', which
> would match how pointers and class instances behave in the language.

This is what I meant with not being an issue by itself. But having such a special value of course has its pros and cons, and I could personally definitely also live with JSON values being initialized to JSON "null", if somebody hacks Algebraic to support that kind of use case.
August 03, 2014
03-Aug-2014 21:40, Andrei Alexandrescu пишет:
> On 8/3/14, 10:19 AM, Sean Kelly wrote:
>> I don't want to pay for anything I don't use.  No allocations should
>> occur within the parser and it should simply slice up the input.
>
> What to do about arrays and objects, which would naturally allocate
> arrays and associative arrays respectively? What about strings with
> backslash-encoded characters?
>

SAX-style would imply that an array is "parsed" by calling 6 user-defined callbacks inside of a parser:
startArray, endArray, startObject, endObject, id and value.

A simplified pseudo-code of JSON-parser inner loop is then:

if(cur == '[')
    startArray();
else if(cur == '{'){
    startObject();
else if(cur == '}')
    endObject();
else if(cur == ']')
    endArray();
else{
    if(expectObjectKey){
        id(parseAsIdentifier());
    }
    else
        value(parseAsValue());
}

This is as barebones as it can get and is very fast in practice, especially in the context of searching/extracting/matching specific sub-trees of JSON documents.
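
A runnable sketch of that inner loop in D could look like the following. Everything beyond the six callback names is an assumption made for illustration: the Handlers struct, the parseSax function, detecting a key by looking ahead for ':' instead of tracking expectObjectKey, and passing strings and scalars as undecoded slices of the input. It does no validation and assumes well-formed input.

```d
import std.ascii : isWhite;

// The six user-defined callbacks (struct name is hypothetical).
struct Handlers
{
    void delegate() startArray, endArray, startObject, endObject;
    void delegate(const(char)[]) id, value;
}

// Walks the input once and reports events; never allocates,
// strings and scalars are handed out as slices of the input.
void parseSax(const(char)[] input, Handlers h)
{
    size_t i = 0;
    while (i < input.length)
    {
        immutable c = input[i];
        if (c == '[')      { h.startArray();  ++i; }
        else if (c == ']') { h.endArray();    ++i; }
        else if (c == '{') { h.startObject(); ++i; }
        else if (c == '}') { h.endObject();   ++i; }
        else if (c == ',' || c == ':' || isWhite(c)) { ++i; }
        else if (c == '"')
        {
            // Find the closing quote, stepping over backslash escapes.
            size_t j = i + 1;
            while (j < input.length && input[j] != '"')
                j += (input[j] == '\\') ? 2 : 1;
            if (j > input.length) j = input.length;
            auto text = input[i + 1 .. j]; // escapes are not decoded
            i = j + 1;
            // A string followed by ':' is an object key.
            size_t k = i;
            while (k < input.length && isWhite(input[k])) ++k;
            if (k < input.length && input[k] == ':') h.id(text);
            else h.value(text);
        }
        else
        {
            // Number, true, false or null: report the raw slice.
            size_t j = i;
            while (j < input.length && !isWhite(input[j])
                   && input[j] != ',' && input[j] != ']' && input[j] != '}')
                ++j;
            h.value(input[i .. j]);
            i = j;
        }
    }
}

void main()
{
    string[] events;
    Handlers h;
    h.startArray  = () { events ~= "["; };
    h.endArray    = () { events ~= "]"; };
    h.startObject = () { events ~= "{"; };
    h.endObject   = () { events ~= "}"; };
    h.id    = (const(char)[] s) { events ~= "id:" ~ s.idup; };
    h.value = (const(char)[] s) { events ~= "val:" ~ s.idup; };

    parseSax(`{"a": [1, "x"]}`, h);
    assert(events == ["{", "id:a", "[", "val:1", "val:x", "]", "}"]);
}
```

The only allocations here happen in the example callbacks, which copy slices to record them; a callback that merely inspects or compares the slice keeps the whole pass allocation-free.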

-- 
Dmitry Olshansky
August 03, 2014
03-Aug-2014 23:54, Dmitry Olshansky пишет:
> 03-Aug-2014 21:40, Andrei Alexandrescu пишет:
> A simplified pseudo-code of JSON-parser inner loop is then:
>
> if(cur == '[')
>         startArray();
> else if(cur == '{'){

Aw. Stray brace..


-- 
Dmitry Olshansky
August 03, 2014
On 8/3/2014 2:16 AM, Andrei Alexandrescu wrote:
> We need a better json library at Facebook. I'd discussed with Sönke the
> possibility of taking vibe.d's json to std but he said it needs some
> more work. So I took std.jgrandson to proof of concept state and hence
> ready for destruction:
>
> http://erdani.com/d/jgrandson.d
> http://erdani.com/d/phobos-prerelease/std_jgrandson.html
>
> Here are a few differences compared to vibe.d's library. I think these
> are desirable to have in that library as well:
>
> * Parsing strings is decoupled into tokenization (which is lazy and only
> needs an input range) and parsing proper. Tokenization is lazy, which
> allows users to create their own advanced (e.g. partial/lazy) parsing if
> needed. The parser itself is eager.
>
> * There's no decoding of strings.
>
> * The representation is built on Algebraic, with the advantages that it
> benefits from all of its primitives. Implementation is also very compact
> because Algebraic obviates a bunch of boilerplate. Subsequent
> improvements to Algebraic will also reflect themselves into improvements
> to std.jgrandson.
>
> * The JSON value (called std.jgrandson.Value) has no named member
> variables or methods except for __payload. This is so there's no clash
> between dynamic properties exposed via opDispatch.
>
> Well that's about it. What would it take for this to become a Phobos
> proposal? Destroy.
>
>
> Andrei

If you're looking for serialization from statically known type layouts, then I believe my JSON (de)serialization code (https://github.com/Orvid/JSONSerialization) might actually be of interest to you, as it uses no intermediate representation, nor does it allocate when it converts an object to JSON. As far as I know, even when only compiled with DMD, it's among the fastest JSON (de)serialization libraries.

The one exception to the no-allocation claim is converting a floating point number to a string. A local buffer could certainly be used for that, but at the moment it just converts the number to a normal string that gets written to the output range. It also supports (de)serializing what I called at the time dynamic types. std.variant would be one such type, but it isn't actually supported: that code is only there because I needed it for something else, and I wasn't using std.variant at the time.