Jump to page: 1 28  
Page
Thread overview
std.jgrandson
Aug 03, 2014
Johannes Pfau
Aug 03, 2014
ponce
Aug 03, 2014
Sönke Ludwig
Aug 03, 2014
Dicebot
Aug 03, 2014
Sönke Ludwig
Aug 03, 2014
Dicebot
Aug 04, 2014
Jacob Carlborg
Aug 04, 2014
Daniel Murphy
Aug 04, 2014
Jacob Carlborg
Aug 05, 2014
Daniel Murphy
Aug 05, 2014
Andrea Fontana
Aug 05, 2014
Daniel Murphy
Aug 05, 2014
Dicebot
Aug 05, 2014
Sean Kelly
Aug 05, 2014
Dicebot
Aug 05, 2014
Marc Schütz
Aug 05, 2014
H. S. Teoh
Aug 06, 2014
Andrea Fontana
Aug 05, 2014
Jacob Carlborg
Aug 05, 2014
Daniel Murphy
Aug 06, 2014
Jacob Carlborg
Aug 06, 2014
Daniel Murphy
Aug 06, 2014
Jacob Carlborg
Aug 06, 2014
Daniel Murphy
Aug 06, 2014
Sean Kelly
Aug 04, 2014
Dicebot
Aug 04, 2014
Jacob Carlborg
Aug 04, 2014
Dicebot
Aug 04, 2014
Jacob Carlborg
Aug 05, 2014
Sönke Ludwig
Aug 05, 2014
Dicebot
Aug 05, 2014
Jacob Carlborg
Aug 03, 2014
bearophile
Aug 03, 2014
Sönke Ludwig
Aug 03, 2014
Dicebot
Aug 03, 2014
Sönke Ludwig
Aug 04, 2014
Wyatt
Aug 03, 2014
Johannes Pfau
Aug 03, 2014
Johannes Pfau
Aug 03, 2014
Sönke Ludwig
Aug 03, 2014
w0rp
Aug 03, 2014
Sönke Ludwig
Aug 04, 2014
Marc Schütz
Aug 05, 2014
Sönke Ludwig
Aug 03, 2014
w0rp
Aug 03, 2014
Daniel Gibson
Aug 03, 2014
Sean Kelly
Aug 03, 2014
Dmitry Olshansky
Aug 03, 2014
Dmitry Olshansky
Aug 03, 2014
Sean Kelly
Aug 04, 2014
Jacob Carlborg
Aug 04, 2014
Jacob Carlborg
Aug 03, 2014
Orvid King
Aug 04, 2014
Andrea Fontana
Aug 05, 2014
Andrea Fontana
Aug 04, 2014
Jacob Carlborg
Aug 04, 2014
Jacob Carlborg
August 03, 2014
We need a better json library at Facebook. I'd discussed with Sönke the possibility of taking vibe.d's json to std but he said it needs some more work. So I took std.jgrandson to proof of concept state and hence ready for destruction:

http://erdani.com/d/jgrandson.d
http://erdani.com/d/phobos-prerelease/std_jgrandson.html

Here are a few differences compared to vibe.d's library. I think these are desirable to have in that library as well:

* Parsing strings is decoupled into tokenization (which is lazy and only needs an input range) and parsing proper. Tokenization is lazy, which allows users to create their own advanced (e.g. partial/lazy) parsing if needed. The parser itself is eager.

* There's no decoding of strings.

* The representation is built on Algebraic, with the advantages that it benefits from all of its primitives. Implementation is also very compact because Algebraic obviates a bunch of boilerplate. Subsequent improvements to Algebraic will also reflect themselves into improvements to std.jgrandson.

* The JSON value (called std.jgrandson.Value) has no named member variables or methods except for __payload. This is so there's no clash between dynamic properties exposed via opDispatch.

Well that's about it. What would it take for this to become a Phobos proposal? Destroy.


Andrei
August 03, 2014
Am Sun, 03 Aug 2014 00:16:04 -0700
schrieb Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org>:

> We need a better json library at Facebook. I'd discussed with Sönke the possibility of taking vibe.d's json to std but he said it needs some more work. So I took std.jgrandson to proof of concept state and hence ready for destruction:
> 
> http://erdani.com/d/jgrandson.d http://erdani.com/d/phobos-prerelease/std_jgrandson.html
> 
> Here are a few differences compared to vibe.d's library. I think these are desirable to have in that library as well:
> 
> * Parsing strings is decoupled into tokenization (which is lazy and only needs an input range) and parsing proper. Tokenization is lazy, which allows users to create their own advanced (e.g. partial/lazy) parsing if needed. The parser itself is eager.
> 
> * There's no decoding of strings.
> 
> * The representation is built on Algebraic, with the advantages that it benefits from all of its primitives. Implementation is also very compact because Algebraic obviates a bunch of boilerplate. Subsequent improvements to Algebraic will also reflect themselves into improvements to std.jgrandson.
> 
> * The JSON value (called std.jgrandson.Value) has no named member variables or methods except for __payload. This is so there's no clash between dynamic properties exposed via opDispatch.
> 
> Well that's about it. What would it take for this to become a Phobos proposal? Destroy.
> 
> 
> Andrei

API looks great but I'd like to see some simple serialize/deserialize functions as in vibed: http://vibed.org/api/vibe.data.json/deserializeJson http://vibed.org/api/vibe.data.json/serializeToJson

vibe uses UDAs to customize the serialization output. That's actually not json specific and therefore shouldn't be part of this module. But a simple deserializeJson which simply fills in all fields of a struct given a TokenStream is very useful and can be done without allocations (so it's much faster than going through the DOM).

Nitpicks:

* I'd make Token only store strings, then convert to double/number only
  when requested. If a user is simply skipping some tokens these
  conversions are unnecessary overhead.
* parseString really shouldn't use appender. Make it somehow possible
  to supply a buffer to TokenStream and use that. (This way there's no
  memory allocation. If a user want to keep the string he has to .dup
  it). A BufferedRange concept might even be better, because you can
  read in blocks and reuse buffers.

August 03, 2014
Andrei Alexandrescu:

> * The representation is built on Algebraic,

Good.

But here I'd like a little more readable type:

alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double, string, Value[], Value[string]).VariantN;

Like:

alias Payload = std.variant.Algebraic!(typeof(null), bool, double, string, Value[], Value[string]);

Bye,
bearophile
August 03, 2014
>
> API looks great but I'd like to see some simple serialize/deserialize
> functions as in vibed:
> http://vibed.org/api/vibe.data.json/deserializeJson
> http://vibed.org/api/vibe.data.json/serializeToJson
>
> vibe uses UDAs to customize the serialization output. That's actually
> not json specific and therefore shouldn't be part of this module. But a
> simple deserializeJson which simply fills in all fields of a struct
> given a TokenStream is very useful and can be done without allocations
> (so it's much faster than going through the DOM).

That's what https://github.com/Orvid/JSONSerialization does.
Also msgpack-d https://github.com/msgpack/msgpack-d whose defaults need no UDAs

That makes the typical use case very fast to write.
August 03, 2014
A few thoughts based on my experience with vibe.data.json:

1. No decoding of strings appears to mean that "Value" also always contains encoded strings. This seems the be a leaky and also error prone leaky abstraction. For the token stream, performance should be top priority, so it's okay to not decode there, but "Value" is a high level abstraction of a JSON value, so it should really hide all implementation details of the storage format.

2. Algebraic is a good choice for its generic handling of operations on the contained types (which isn't exposed here, though). However, a tagged union type in my experience has quite some advantages for usability. Since adding a type tag possibly affects the interface in a non-backwards compatible way, this should be evaluated early on.

2.b) I'm currently working on a generic tagged union type that also enables operations between values in a natural generic way. This has the big advantage of not having to manually define operators like in "Value", which is error prone and often limited (I've had to make many fixes and additions in this part of the code over time).

3. Use of "opDispatch" for an open set of members has been criticized for vibe.data.json before and I agree with that criticism. The only advantage is saving a few keystrokes (json.key instead of json["key"]), but I came to the conclusion that the right approach to work with JSON values in D is to always directly deserialize when/if possible anyway, which mostly makes this is a moot point.

This approach has a lot of advantages, e.g. reduction of allocations, performance of field access and avoiding typos when accessing fields. Especially the last point is interesting, because opDispatch based field access gives the false impression that a static field is accessed.

The decision to minimize the number of static fields within "Value" reduces the chance of accidentally accessing a static field instead of hitting opDispatch, but there are still *some* static fields/methods and any later addition of a symbol must now be considered a breaking change.

3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and "assume" certainly look like they could be used with UFCS, but opDispatch destroys that possibility.

4. I know the stance on this is often "The D module system has enough facilities to disambiguate" (which is not really a valid argument, but rather just the lack of a counter argument, IMO), but I highly dislike the choice to leave off any mention of "JSON" or "Json" in the global symbol names. Using the module either requires to always use a renamed import or a manual alias, or the resulting source code will always leave the reader wondering what kind of data is actually handled. Handling multiple "value" types in a single piece of code, which is not uncommon (e.g. JSON + BSON/ini value/...) would always require explicit disambiguation. I'd certainly include the "JSON" or "Json" part in the names.

5. Whatever happens, *please* let's aim for a module name of std.data.json (similar to std.digest.*), so that any data formats added later are nicely organized. All existing data format support (XML + CSV) doesn't follow contemporary Phobos style, so they will need to be deprecated at some point anyway, freeing the way for a clean an non-breaking transition to a more organized module hierarchy.

6. (Possibly compile time optional) support for keeping track of line/column numbers is often important for better error messages, so that would be good to have included as part of the parser and in the "Token" type.

Sönke
August 03, 2014
Am 03.08.2014 10:25, schrieb ponce:
>>
>> API looks great but I'd like to see some simple serialize/deserialize
>> functions as in vibed:
>> http://vibed.org/api/vibe.data.json/deserializeJson
>> http://vibed.org/api/vibe.data.json/serializeToJson
>>
>> vibe uses UDAs to customize the serialization output. That's actually
>> not json specific and therefore shouldn't be part of this module. But a
>> simple deserializeJson which simply fills in all fields of a struct
>> given a TokenStream is very useful and can be done without allocations
>> (so it's much faster than going through the DOM).
>
> That's what https://github.com/Orvid/JSONSerialization does.
> Also msgpack-d https://github.com/msgpack/msgpack-d whose defaults need
> no UDAs
>
> That makes the typical use case very fast to write.

The default mode for vibe.data.serialization also doesn't need any UDAs, but it's still often useful to be able to make customizations.
August 03, 2014
On 8/3/14, 1:19 AM, bearophile wrote:
> Andrei Alexandrescu:
>
>> * The representation is built on Algebraic,
>
> Good.
>
> But here I'd like a little more readable type:
>
> alias Payload = std.variant.VariantN!(16LU, typeof(null), bool, double,
> string, Value[], Value[string]).VariantN;
>
> Like:
>
> alias Payload = std.variant.Algebraic!(typeof(null), bool, double,
> string, Value[], Value[string]);

Yah, the latter is in the code. It's a ddoc problem. -- Andrei


August 03, 2014
On 8/3/14, 1:02 AM, Johannes Pfau wrote:
> API looks great but I'd like to see some simple serialize/deserialize
> functions as in vibed:
> http://vibed.org/api/vibe.data.json/deserializeJson
> http://vibed.org/api/vibe.data.json/serializeToJson

Agreed.

> vibe uses UDAs to customize the serialization output. That's actually
> not json specific and therefore shouldn't be part of this module. But a
> simple deserializeJson which simply fills in all fields of a struct
> given a TokenStream is very useful and can be done without allocations
> (so it's much faster than going through the DOM).

Nice.

> Nitpicks:
>
> * I'd make Token only store strings, then convert to double/number only
>    when requested. If a user is simply skipping some tokens these
>    conversions are unnecessary overhead.

Well... this is tricky. If the input has immutable characters, they can be stored because it can be assumed they'll live forever. If they're mutable or const, that assumption doesn't hold so every number must allocate. At that point it's probably cheaper to just convert to double.

One thing is I didn't treat integers specially, but I did notice some json parsers do make that distinction.

> * parseString really shouldn't use appender. Make it somehow possible
>    to supply a buffer to TokenStream and use that. (This way there's no
>    memory allocation. If a user want to keep the string he has to .dup
>    it). A BufferedRange concept might even be better, because you can
>    read in blocks and reuse buffers.

Good suggestion, thanks.


Andrei


August 03, 2014
On Sunday, 3 August 2014 at 07:16:05 UTC, Andrei Alexandrescu wrote:
> We need a better json library at Facebook. I'd discussed with Sönke the possibility of taking vibe.d's json to std but he said it needs some more work. So I took std.jgrandson to proof of concept state and hence ready for destruction:
>
> http://erdani.com/d/jgrandson.d
> http://erdani.com/d/phobos-prerelease/std_jgrandson.html
>
> Here are a few differences compared to vibe.d's library. I think these are desirable to have in that library as well:
>
> * Parsing strings is decoupled into tokenization (which is lazy and only needs an input range) and parsing proper. Tokenization is lazy, which allows users to create their own advanced (e.g. partial/lazy) parsing if needed. The parser itself is eager.
>
> * There's no decoding of strings.
>
> * The representation is built on Algebraic, with the advantages that it benefits from all of its primitives. Implementation is also very compact because Algebraic obviates a bunch of boilerplate. Subsequent improvements to Algebraic will also reflect themselves into improvements to std.jgrandson.
>
> * The JSON value (called std.jgrandson.Value) has no named member variables or methods except for __payload. This is so there's no clash between dynamic properties exposed via opDispatch.
>
> Well that's about it. What would it take for this to become a Phobos proposal? Destroy.
>
>
> Andrei

I like it. Here's what I think about it.

* When I wrote my JSON library, the thing I wanted most was
constructors and opAssign functions for creating JSON values easily.
JSON x = "some string"; You have this, so it's great.
* You didn't include an 'undefined' value like vibe.d, which is a very
minor detail, but something I dislike. This is good.
* I'd just name Value either 'JSON' or 'JSONValue.' So you can just import the module without using aliases.
* opDispatch is kind of "meh" for JSON objects. It works until you hit a name clash with a UFCS function. I don't mind typing the extra three characters.

That's all I could think of really.
August 03, 2014
On 8/3/14, 2:38 AM, Sönke Ludwig wrote:
> A few thoughts based on my experience with vibe.data.json:
>
> 1. No decoding of strings appears to mean that "Value" also always
> contains encoded strings. This seems the be a leaky and also error prone
> leaky abstraction. For the token stream, performance should be top
> priority, so it's okay to not decode there, but "Value" is a high level
> abstraction of a JSON value, so it should really hide all implementation
> details of the storage format.

Nonono. I think there's a confusion. The input strings are not UTF decoded for the simple need there's no need (all tokenization decisions are taken on the basis of ASCII characters/code units). The backslash-prefixed characters are indeed decoded.

An optimization I didn't implement yet is to use slices of the input wherever possible (when the input is string, immutable(byte)[], or immutable(ubyte)[]). That will reduce allocations considerably.

> 2. Algebraic is a good choice for its generic handling of operations on
> the contained types (which isn't exposed here, though). However, a
> tagged union type in my experience has quite some advantages for
> usability. Since adding a type tag possibly affects the interface in a
> non-backwards compatible way, this should be evaluated early on.

There's a public opCast(Payload) that gives the end user access to the Payload inside a Value. I forgot to add documentation to it.

What advantages are to a tagged union? (FWIW: to me Algebraic and Variant are also tagged unions, just that the tags are not 0, 1, ..., n. That can be easily fixed for Algebraic by defining operations to access the index of the currently-stored type.)

> 2.b) I'm currently working on a generic tagged union type that also
> enables operations between values in a natural generic way. This has the
> big advantage of not having to manually define operators like in
> "Value", which is error prone and often limited (I've had to make many
> fixes and additions in this part of the code over time).

I did notice that vibe.json has quite a repetitive implementation, so reducing it would be great.

The way I see it, good work on tagged unions must be either integrated within std.variant (either by modifying Variant/Algebraic or by adding new types to it). I am very strongly opposed to adding a tagged union type only for JSON purposes, which I'd consider essentially a usability bug in std.variant, the opposite of dogfooding, etc.

> 3. Use of "opDispatch" for an open set of members has been criticized
> for vibe.data.json before and I agree with that criticism. The only
> advantage is saving a few keystrokes (json.key instead of json["key"]),
> but I came to the conclusion that the right approach to work with JSON
> values in D is to always directly deserialize when/if possible anyway,
> which mostly makes this is a moot point.

Interesting. Well if experience with opDispatch is negative then it should probably not be used here, or only offered on an opt-in basis.

> This approach has a lot of advantages, e.g. reduction of allocations,
> performance of field access and avoiding typos when accessing fields.
> Especially the last point is interesting, because opDispatch based field
> access gives the false impression that a static field is accessed.

Good point.

> The decision to minimize the number of static fields within "Value"
> reduces the chance of accidentally accessing a static field instead of
> hitting opDispatch, but there are still *some* static fields/methods and
> any later addition of a symbol must now be considered a breaking change.

Right now the idea is that the only named member is __payload. Well then there's opXxxx as well. The idea is/was to add all other functionality as free functions.

> 3.b) Bad interaction of UFCS and opDispatch: Functions like "remove" and
> "assume" certainly look like they could be used with UFCS, but
> opDispatch destroys that possibility.

Yah, agreed.

The bummer is people coming from Python won't be able to continue using the same style without opDispatch.

> 4. I know the stance on this is often "The D module system has enough
> facilities to disambiguate" (which is not really a valid argument, but
> rather just the lack of a counter argument, IMO), but I highly dislike
> the choice to leave off any mention of "JSON" or "Json" in the global
> symbol names. Using the module either requires to always use a renamed
> import or a manual alias, or the resulting source code will always leave
> the reader wondering what kind of data is actually handled. Handling
> multiple "value" types in a single piece of code, which is not uncommon
> (e.g. JSON + BSON/ini value/...) would always require explicit
> disambiguation. I'd certainly include the "JSON" or "Json" part in the
> names.

Good point, I agree.

> 5. Whatever happens, *please* let's aim for a module name of
> std.data.json (similar to std.digest.*), so that any data formats added
> later are nicely organized. All existing data format support (XML + CSV)
> doesn't follow contemporary Phobos style, so they will need to be
> deprecated at some point anyway, freeing the way for a clean an
> non-breaking transition to a more organized module hierarchy.

I agree.

> 6. (Possibly compile time optional) support for keeping track of
> line/column numbers is often important for better error messages, so
> that would be good to have included as part of the parser and in the
> "Token" type.

Yah, saw that in vibe.d but forgot about it.


Thanks,

Andrei


« First   ‹ Prev
1 2 3 4 5 6 7 8