Thread overview
[phobos] For review: Improvements to std.variant and std.json
Nov 20, 2010
Robert Jacques
Feb 13, 2011
Lutger Blijdestijn
Feb 18, 2011
Robert Jacques
Feb 22, 2011
Masahiro Nakagawa
Feb 25, 2011
Robert Jacques
Apr 28, 2011
Sean Kelly
Apr 28, 2011
Jonathan M Davis
Apr 28, 2011
Robert Jacques
November 20, 2010
I have been working on a re-write of std.json. The purpose was to fix implementation bugs, better conform to the spec, provide a lightweight tokenizer (Sean) and to use an Algebraic type (Andrei) for JSON values. In the process of doing this, I made my parser 2x faster and updated/fixed a bunch of issues with VariantN in order to fully support Algebraic types. Both of these libraries are at a solid beta level, so I'd like to get some feedback, and provide a patch for those being held back by the problems with Algebraic. The code and docs are available at: https://jshare.johnshopkins.edu/rjacque2/public_html/. These files were written against DMD 2.050 and both depend on some patches currently in bugzilla (see the top of each file or below).

Summary of Variant changes:
* Depends on Issue 5155's patch
* VariantN now properly supports types defined using "This".
* Additional template constraints and acceptance of implicit converters in
opAssign and ctor. i.e. if an Algebraic type supports reals, you can now
assign an int to it.
* Updated to use opBinary/opBinaryRight/opOpAssign. This adds right-hand
support to several functions and is now generated via compile-time
reflection + mixins: i.e. Algebraic types of user-defined types should
work, etc.
* Added opIn support, though it currently only works for AAs.
* Added opIndexOpAssign support.
* Added opDispatch as an alternative indexing method. This allows Variants
of type Variant[string] to behave like prototype structs: i.e. var.x = 5;
instead of var["x"] = 5;

Notes:
* There's a bugzilla issue requesting opCall support in Variant. While I
can see the usefulness, syntactically this clashes with the ctor. Should
this issue be closed or should a method be used as an opCall surrogate?
* Could someone explain to me the meaning/intention of "Future additions
to Algebraic will allow compile-time checking that all possible types are
handled by user code, eliminating a large class of errors." Is this
something akin to final switch support?

Summary of JSON changes:
* Depends on the Variant improvements.
* Depends on Issue 5233's patch
* Depends on Issue 5236's patch
* Issue 5232's patch is also recommended
* The integer type was removed: JSON doesn't differentiate between
floating and integral numbers. Internally, reals are used and on systems
with 80-bit support, this encompasses all integral types.
* UTF escape characters are now correctly supported.
* All routines/types were encapsulated in a JSON struct for name space
reasons.
* An Algebraic type is used for JSON values, with
serialization/de-serialization routines as free methods.
* Serialization/de-serialization centers around a set of input/output
range to/from token range routines, with separate parser/writer routines.
* Values can be written in either a concise (default) or pretty printed
format. (i.e. indented with line returns)
* Convenience toString and toStringHR routines exist.
* Simple Type to/from json routines exist, but are marked as to be
re-evaluated pending std.serialization.
* I've implemented a binary format customized for JSON. Besides preserving
numeric precision (i.e. 80-bit reals), I found it gave slightly smaller
file size (~20%) and 2-3x parsing performance for a large numeric dataset
of mine. I'm not sure if it's worthwhile on the whole, so I'd appreciate
feedback.
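
Three of the format-level points in the list above (numbers stored as reals, \uXXXX escapes, and concise versus pretty-printed output) can be illustrated with a short sketch. This is not the proposed std.json; it uses Python's standard json module purely to show the JSON behaviour itself, and the helper fits_exactly and the constants are hypothetical names introduced for illustration. It assumes the usual significand widths: 53 bits for a 64-bit double and 64 bits for an x87 80-bit real.

```python
import json


def fits_exactly(n: int, significand_bits: int) -> bool:
    """Return True if integer n fits a binary float significand of this width.

    Exponent range is ignored, which is safe for 64-bit integers.
    """
    n = abs(n)
    if n == 0:
        return True
    while n % 2 == 0:  # trailing zero bits are absorbed by the exponent
        n //= 2
    return n.bit_length() <= significand_bits


DOUBLE_BITS = 53    # IEEE 754 binary64 significand, including the implicit bit
EXTENDED_BITS = 64  # x87 80-bit extended significand

big = 2**53 + 1
print(fits_exactly(big, DOUBLE_BITS))          # False: a double cannot hold it
print(fits_exactly(big, EXTENDED_BITS))        # True
print(fits_exactly(2**64 - 1, EXTENDED_BITS))  # True: every 64-bit integer fits
print(int(float(big)) == big)                  # False: the double round trip changes it

# A code point outside the BMP is written in JSON as a surrogate pair of
# \uXXXX escapes; a conforming parser must join the pair into one code point.
smiley = json.loads('"\\ud83d\\ude00"')
print(len(smiley), hex(ord(smiley)))           # 1 0x1f600

# Concise output drops all optional whitespace; pretty output indents.
value = {"a": [1, 2]}
concise = json.dumps(value, separators=(",", ":"))
pretty = json.dumps(value, indent=2)
print(concise)
print(pretty)
```

The first part shows why dropping the integer type loses nothing only where 80-bit reals are available: with a 64-bit double, integers above 2**53 start to collide.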

Notes:
* Does anyone have a suggestion of a good way to attach methods to an
Algebraic type? And if we can, should we?
February 13, 2011
Hi, I just wanted to comment on the dson file format.

While perhaps less efficient, the MongoDB NoSQL database also uses a binary JSON (BSON) format to store data. It implements some extra types though. I suggest that if you want to implement binary JSON, you seriously consider adopting BSON instead:

http://bsonspec.org/

February 18, 2011
On Sun, 13 Feb 2011 13:50:19 -0500, Lutger Blijdestijn <lutger.blijdestijn at gmail.com> wrote:
> Hi, I just wanted to comment on the dson file format.
>
> While perhaps less efficient, the mongodb nosql database also uses a binary json (bson) format to store data. It implements some extra types though. I suggest that if you want to implement binary json, to seriously consider adopting bson instead:
>
> http://bsonspec.org/

Hi, I'm aware of the BSON format and have given it a decent review/reading (along with a few other serialization formats). As was noted in the JSON documentation, BSON doesn't support all JSON types and has serious size and performance issues. First and foremost, BSON doesn't support true arrays; "arrays" are actually maps with integer keys. And it doesn't support 80-bit reals (which will probably be a deficiency of all non-D-specific formats). Add in all the legacy/proprietary stuff and the in-memory/C design optimization, and it's really not attractive as a standalone serial format. Also, MongoDB has a 4 MB document limit, which IIRC is assumed by many of the BSON implementations to be the BSON size limit as well. My main purpose behind a binary JSON format was to store/read a data table, which was originally quite small and now is quite large, in a more efficient manner. Long term, I'm thinking of dropping this functionality in favor of a customized D binary serial format.
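
To make the point about arrays concrete, here is a minimal hand-rolled sketch of how the BSON layout published at bsonspec.org encodes {"v": ["a", "b"]}. It is not a BSON library and covers only strings and arrays of strings; the function names are hypothetical and chosen for illustration. The array is written as an embedded document whose keys are the decimal strings "0", "1", and so on, which is where the size overhead relative to compact JSON comes from.

```python
import json
import struct


def bson_cstring(text: str) -> bytes:
    """NUL-terminated UTF-8 string, used for element names (keys)."""
    return text.encode("utf-8") + b"\x00"


def bson_string(text: str) -> bytes:
    """BSON string value: int32 byte length (including the NUL), bytes, NUL."""
    data = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(data)) + data


def bson_document(elements: bytes) -> bytes:
    """Wrap an element list: int32 total size, elements, trailing NUL."""
    return struct.pack("<i", 4 + len(elements) + 1) + elements + b"\x00"


def bson_string_array(items: list) -> bytes:
    """An array is a document keyed by "0", "1", ...; 0x02 tags a string."""
    body = b"".join(
        b"\x02" + bson_cstring(str(index)) + bson_string(item)
        for index, item in enumerate(items)
    )
    return bson_document(body)


# Outer document {"v": [...]}; 0x04 tags an array element.
array_bytes = bson_string_array(["a", "b"])
bson_bytes = bson_document(b"\x04" + bson_cstring("v") + array_bytes)
json_bytes = json.dumps({"v": ["a", "b"]}, separators=(",", ":")).encode("utf-8")

print(array_bytes)
print(len(bson_bytes), len(json_bytes))  # 31 15
```

For this tiny input the BSON form is roughly twice the size of the compact JSON text, since every array element carries a type byte, a key string, and a length prefix.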
February 22, 2011
On Sat, 19 Feb 2011 06:57:00 +0900, Robert Jacques <sandford at jhu.edu> wrote:

> On Sun, 13 Feb 2011 13:50:19 -0500, Lutger Blijdestijn <lutger.blijdestijn at gmail.com> wrote:
>> Hi, I just wanted to comment on the dson file format.
>>
>> While perhaps less efficient, the mongodb nosql database also uses a binary json (bson) format to store data. It implements some extra types though. I suggest that if you want to implement binary json, to seriously consider adopting bson instead:
>>
>> http://bsonspec.org/
>
> Hi, I'm aware of the BSON format and have given it a decent review/reading (along with a few other serialization formats). As was noted in the JSON documentation BSON doesn't support all JSON types and has serious size and performance issues. First and foremost, BSON doesn't support true arrays; "arrays" are actually maps with integer keys. And it doesn't support 80-bit reals, (which will probably be a deficiency of all non-D specific formats.) Add in all the legacy/proprietary stuff and the in-memory/C design optimization, and it's really not attractive as a stand alone serial format. Also, Mongodb has a 4mb Document limit, which IIRC is assumed by many of the BSON implementations to be the BSON size limit as well.

You are right. BSON is not JSON. BSON is designed to be a traversable and
editable format, so it is less efficient than other binary formats.
If you are trying to implement a MongoDB-like database, BSON is the better choice.

>> 4mb

Currently, 16 MB.

> My main purpose behind a binary json format was to store/read a data table, which was originally quite small and now is quite large, in a more efficient manner. Long term, I'm thinking of dropping this functionality in favor of a customized D binary serial format.

I don't think an original binary format is necessary.
When users choose JSON, object size and performance are not their priority.
Adding a half-finished implementation increases maintenance cost, which is a drawback.


Masahiro
February 25, 2011
On Tue, 22 Feb 2011 02:24:10 -0500, Masahiro Nakagawa <repeatedly at gmail.com> wrote:
> On Sat, 19 Feb 2011 06:57:00 +0900, Robert Jacques <sandford at jhu.edu> wrote:
[snip]
>> My main purpose behind a binary json format was to store/read a data table, which was originally quite small and now is quite large, in a more efficient manner. Long term, I'm thinking of dropping this functionality in favor of a customized D binary serial format.
>
> I think original binary format is not necessary.
> When user uses JSON, object size and performance are not important.
> Adding halfway implementation increases maintenance cost. It's a demerit.
>
>
> Masahiro

I agree with you that an original binary format is not necessary. But a binary format is, and I've had trouble finding an open binary specification with reference support. If you have a suggestion and specification link, it would be much appreciated.
April 28, 2011
On Nov 20, 2010, at 12:14 AM, Robert Jacques wrote:

> I have been working on a re-write of std.json.

I've lost track.  Is the new std.json in place?  I just submitted a bug report on std.json, but don't know whether I should have filed a report against that module.
April 28, 2011
> On Nov 20, 2010, at 12:14 AM, Robert Jacques wrote:
> > I have been working on a re-write of std.json.
> 
> I've lost track.  Is the new std.json in place?  I just submitted a bug report on std.json, but don't know whether I should have filed a report against that module.

It certainly hasn't been up for review on the newsgroup. I'm pretty sure that std.json is still the same std.json that we've always had.

- Jonathan M Davis
April 28, 2011
On Thu, 28 Apr 2011 15:28:58 -0400, Sean Kelly <sean at invisibleduck.org> wrote:
> On Nov 20, 2010, at 12:14 AM, Robert Jacques wrote:
>
>> I have been working on a re-write of std.json.
>
> I've lost track.  Is the new std.json in place?  I just submitted a bug report on std.json, but don't know whether I should have filed a report against that module.

The new json isn't in place yet, though both it and the new variant it relies on are ready / almost ready for review, respectively. There have just been other things in the queue. I've found the bug report (http://d.puremagic.com/issues/show_bug.cgi?id=5904) and it applies to my code as well. Fixing it now...