August 15, 2015
On 8/14/2015 9:58 PM, suliman wrote:
> On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
>> Config files will work fine with json format.
> Walter, what should I do to comment out a string in the config for testing purposes?
> How can that be done with JSON?

{ "comment" : "this is a comment" }
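
For example (the "dflags" entry name is made up for illustration), a setting can be "commented out" for testing by moving its text into such a key:

    {
        "dflags"  : [ "-w" ],
        "comment" : "disabled for testing: \"dflags\" : [ \"-g\", \"-w\" ]"
    }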


> I really think that dmd should use the same format as dub

json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)

August 15, 2015
On Saturday, 15 August 2015 at 05:03:52 UTC, Walter Bright wrote:
> On 8/14/2015 9:58 PM, suliman wrote:
>> On Friday, 14 August 2015 at 20:44:59 UTC, Walter Bright wrote:
>>> Config files will work fine with json format.
>> Walter, what should I do to comment out a string in the config for testing purposes?
>> How can that be done with JSON?
>
> { "comment" : "this is a comment" }
>
>
>> I really think that dmd should use the same format as dub
>
> json is a format that everybody understands, and dmd has json code already in it (as dmd generates json files)

And you end up with each D tool having its own config format… :-(

http://www.json2yaml.com/

August 15, 2015
Am 14.08.2015 um 10:17 schrieb Walter Bright:
> On 8/13/2015 11:52 PM, Sönke Ludwig wrote:
>> Am 14.08.2015 um 02:26 schrieb Walter Bright:
>>> On 8/13/2015 3:51 AM, Sönke Ludwig wrote:
>>>> These were, AFAICS, the only major open issues (a decision for an
>>>> opt() variant
>>>> would be nice, but fortunately that's not a fundamental decision in
>>>> any way).
>>>
>>> 1. What about the issue of having the API be a composable range
>>> interface?
>>>
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/lexer/lexJSON.html
>>>
>>>
>>> I.e. the input range should be the FIRST argument, not the last.
>>
>> Hm, it *is* the first function argument, just the last template argument.
>
> Ok, my mistake. I didn't look at the others.
>
> I don't know what 'isStringInputRange' is. Whatever it is, it should be
> a 'range of char'.

I'll rename it to isCharInputRange. We don't have something like that in Phobos, right?

>>> 2. Why are integers acceptable as lexer input? The spec specifies
>>> Unicode.
>> In this case, the lexer will perform on-the-fly UTF validation of the
>> input. It
>> can do so more efficiently than first validating the input using a
>> wrapper
>> range, because it has to check the value of most incoming code units
>> anyway.
>
> There is no reason to validate UTF-8 input. The only place where
> non-ASCII code units can even legally appear is inside strings, and
> there they can just be copied verbatim while looking for the end of the
> string.

The idea is to assume that any char based input is already valid UTF (as D defines it), while integer based input comes from an unverified source, so that it still has to be validated before being cast/copied into a 'string'. I think this is a sensible approach, both semantically and performance-wise.
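
A sketch of that split (the function name and bodies are illustrative only, not the actual lexer):

    import std.range.primitives : isInputRange, ElementEncodingType;
    import std.traits : Unqual;

    void lexJSON(R)(R input)
        if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))
    {
        // char input is valid UTF-8 by D's definition - nothing to check
    }

    void lexJSON(R)(R input)
        if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == ubyte))
    {
        import std.exception : enforce;
        foreach (u; input)
        {
            // minimal illustration: these byte values can never occur in
            // well-formed UTF-8; a real lexer would validate whole
            // sequences before copying them into a 'string'
            enforce(u != 0xC0 && u != 0xC1 && u < 0xF5, "invalid UTF-8");
        }
    }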

>
>
>>> 3. Why are there 4 functions that do the same thing?
>>>
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/generator.html
>>>
>>> After all, there already is a
>>> http://s-ludwig.github.io/std_data_json/stdx/data/json/generator/GeneratorOptions.html
>>>
>> There are two classes of functions that are not covered by
>> GeneratorOptions:
>> writing to a stream or returning a string.
>
> Why do both? Always return an input range. If the user wants a string,
> he can pipe the input range to a string generator, such as .array

Convenience, for one. The lack of number-to-input-range conversion functions is another concern. I'm not really keen to implement an input-range-style floating-point-to-string conversion routine just for this module.

Finally, I'm a little worried about performance. The output range based approach can keep a lot of state implicitly using the program counter register. But an input range would explicitly have to keep track of the current JSON element, as well as the current character/state within that element (and possibly one level deeper, for example for escape sequences). This means that it will require either multiple branches or indirection for each popFront().
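
As a toy illustration of that bookkeeping (everything here is invented and heavily simplified):

    // Toy sketch: lazily emits the JSON text of an int slice, e.g.
    // "[1,2,3]". Note the explicit state machine and the per-popFront
    // branching that an output-range-based writer would get "for free"
    // from the program counter.
    struct JsonIntArrayText
    {
        private int[] data;
        private enum State { open, number, comma, close, done }
        private State state = State.open;
        private char[16] buf;   // text of the current number
        private size_t len, pos;

        this(int[] d) { data = d; }

        @property bool empty() const { return state == State.done; }

        @property char front() const
        {
            final switch (state)
            {
                case State.open:   return '[';
                case State.number: return buf[pos];
                case State.comma:  return ',';
                case State.close:  return ']';
                case State.done:   assert(0);
            }
        }

        void popFront()
        {
            final switch (state)
            {
                case State.open:
                    if (data.length) nextNumber();
                    else state = State.close;
                    break;
                case State.number:
                    if (++pos < len) break;
                    data = data[1 .. $];
                    state = data.length ? State.comma : State.close;
                    break;
                case State.comma:
                    nextNumber();
                    break;
                case State.close:
                    state = State.done;
                    break;
                case State.done:
                    assert(0);
            }
        }

        private void nextNumber()
        {
            import core.stdc.stdio : snprintf;
            len = cast(size_t) snprintf(buf.ptr, buf.length, "%d", data[0]);
            pos = 0;
            state = State.number;
        }
    }

    unittest
    {
        import std.array : array;
        assert(JsonIntArrayText([1, 2, 3]).array == "[1,2,3]");
    }

The equivalent output-range-based writer is a couple of nested loops, with all of this state implicit in the program counter.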

August 15, 2015
I talked with a few people and they said they prefer vibe.d's current JSON implementation. What's wrong with it? Why not stay with the old one? It looks much easier than the new one...

IMHO the API of the new one is much harder.
August 15, 2015
On Saturday, 15 August 2015 at 17:07:36 UTC, Suliman wrote:
> I talked with a few people and they said they prefer vibe.d's current JSON implementation. What's wrong with it? Why not stay with the old one? It looks much easier than the new one...
>
> IMHO the API of the new one is much harder.

The new stream parser is fast! (See the prior thread for benchmarks.)
August 16, 2015
On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
>> I don't know what 'isStringInputRange' is. Whatever it is, it should be
>> a 'range of char'.
>
> I'll rename it to isCharInputRange. We don't have something like that in Phobos,
> right?

That's right, there isn't one. But I use:

    if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))

I'm not a fan of more names for trivia; the deluge of names has its own costs.
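
For illustration, here is what that constraint admits and rejects (the named alias below is only to keep the asserts readable):

    import std.range.primitives : isInputRange, ElementEncodingType;
    import std.traits : Unqual;

    enum isCharRange(R) = isInputRange!R
        && is(Unqual!(ElementEncodingType!R) == char);

    static assert( isCharRange!string);      // ranges of char: yes
    static assert( isCharRange!(char[]));
    static assert(!isCharRange!(ubyte[]));   // raw bytes: no
    static assert(!isCharRange!wstring);     // UTF-16: no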


>> There is no reason to validate UTF-8 input. The only place where
>> non-ASCII code units can even legally appear is inside strings, and
>> there they can just be copied verbatim while looking for the end of the
>> string.
> The idea is to assume that any char based input is already valid UTF (as D
> defines it), while integer based input comes from an unverified source, so that
> it still has to be validated before being cast/copied into a 'string'. I think
> this is a sensible approach, both semantically and performance-wise.

The json parser will work fine without doing any validation at all. I've been implementing string handling code in Phobos with the idea of doing validation only if the algorithm requires it, and only for those parts that require it.

There are many validation algorithms in Phobos one can tack on. Having two implementations of every algorithm, one with an embedded, reinvented validation and one without, is too much.

The general idea with algorithms is that they do not combine things, but they enable composition.
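
For instance (the call sites are hypothetical), validation stays a separate step that composes with a parser that assumes valid input:

    import std.utf : validate;

    void parseTrusted(string json)
    {
        // ... assumes valid UTF-8, does no checking of its own ...
    }

    void parseUntrusted(string json)
    {
        validate(json);      // throws UTFException on malformed input
        parseTrusted(json);  // the parser itself stays validation-free
    }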


>> Why do both? Always return an input range. If the user wants a string,
>> he can pipe the input range to a string generator, such as .array
> Convenience, for one.

Back to the previous point: does that mean every algorithm in Phobos should have two versions, one that returns a range and the other a string? All these variations will result in a combinatorial explosion.

The other problem, of course, is that returning a string means the algorithm has to decide how to allocate that string. As much as possible, algorithms should not be making allocation decisions.
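
A toy illustration of the principle (names invented): the algorithm stays lazy and allocation-free, and the caller decides if and how a string materializes.

    import std.algorithm : map;
    import std.array : array;
    import std.ascii : toUpper;

    // lazily uppercases ASCII text; makes no allocation decision itself
    auto shouting(string s)
    {
        return s.map!(c => cast(char) toUpper(c));
    }

    unittest
    {
        auto r = shouting("hi");  // still lazy, nothing allocated
        char[] t = r.array;       // the *caller* opts into allocation
        assert(t == "HI");
    }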


> The lack of number-to-input-range conversion functions is another concern.
> I'm not really keen to implement an input-range-style floating-point-to-string
> conversion routine just for this module.

Not sure what you mean. Phobos needs such routines anyway, and you still have to do something about floating point.
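
For instance, std.format can already write a floating-point value straight into an output range, with no intermediate string:

    import std.array : appender;
    import std.format : formattedWrite;

    unittest
    {
        auto sink = appender!string();
        // digits go directly into the output range; no temporary
        // string is allocated for the number itself
        sink.formattedWrite("%g", 3.14);
        assert(sink.data == "3.14");
    }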


> Finally, I'm a little worried about performance. The output range based approach
> can keep a lot of state implicitly using the program counter register. But an
> input range would explicitly have to keep track of the current JSON element, as
> well as the current character/state within that element (and possibly one level
> deeper, for example for escape sequences). This means that it will require
> either multiple branches or indirection for each popFront().

Often this is made up for by not needing to allocate storage. Also, that state is in the cached "hot zone" on top of the stack, which is much faster to access than a cold uninitialized array.

I share your concern with performance, and I had very good results with Warp by keeping all the state on the stack in this manner.
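
A minimal sketch of that style (the value type and names are invented): the writer's position within the document lives entirely in the call stack.

    import std.format : formattedWrite;
    import std.range.primitives : put;

    // hypothetical, stripped-down value type, for illustration only
    struct Val
    {
        bool isArray;
        double num;
        Val[] elems;
    }

    // All "where am I in the document?" state is implicit in the
    // program counter and the recursion depth - no explicit state
    // machine, no heap allocation.
    void writeJSON(Out)(ref Out sink, in Val v)
    {
        if (v.isArray)
        {
            put(sink, '[');
            foreach (i, ref e; v.elems)
            {
                if (i) put(sink, ',');
                writeJSON(sink, e);
            }
            put(sink, ']');
        }
        else
            sink.formattedWrite("%g", v.num);
    }

    unittest
    {
        import std.array : appender;
        auto sink = appender!string();
        writeJSON(sink, Val(true, 0, [Val(false, 1), Val(false, 2)]));
        assert(sink.data == "[1,2]");
    }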

August 16, 2015
On Thursday, 13 August 2015 at 10:51:47 UTC, Sönke Ludwig wrote:
> I think we really need to have an informal pre-vote about the BigInt and DOM efficiency vs. functionality issues. Basically there are three options for each:
>
> 1. Keep them: May have an impact on compile time for big DOMs (run time/memory consumption wouldn't be affected if a pointer to BigInt is stored). But provides an out-of-the-box experience for a broad set of applications.
>
> 2. Remove them: Results in a slim and clean API that is fast (to run/compile), but also one that will be less useful for certain applications.
>
> 3. Make them CT configurable: Best of both worlds in terms of speed, at the cost of a more complex API.
>

I like option #3. If I understand it correctly, this would provide a template parameter to extend the supported data types, correct?
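
If so, a rough sketch of what I mean (all names invented):

    import std.bigint : BigInt;
    import std.variant : Algebraic;

    // option 3: a compile-time flag decides whether the DOM value
    // type includes BigInt at all
    template JSONValue(bool withBigInt)
    {
        static if (withBigInt)
            alias JSONValue = Algebraic!(typeof(null), bool, long,
                                         BigInt, double, string);
        else
            alias JSONValue = Algebraic!(typeof(null), bool, long,
                                         double, string);
    }

    alias SlimValue = JSONValue!false; // lean API, fast to compile
    alias FullValue = JSONValue!true;  // BigInt works out of the box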

However, I also think that you shouldn't try to make the basic storage format handle everything that might be more appropriately handled by a meta-model.

Are the range operations compatible with the std.parallelism library?
August 16, 2015
On 16-Aug-2015 03:50, Walter Bright wrote:
> On 8/15/2015 3:18 AM, Sönke Ludwig wrote:
>>> There is no reason to validate UTF-8 input. The only place where
>>> non-ASCII code units can even legally appear is inside strings, and
>>> there they can just be copied verbatim while looking for the end of the
>>> string.
>> The idea is to assume that any char based input is already valid UTF
>> (as D
>> defines it), while integer based input comes from an unverified
>> source, so that
>> it still has to be validated before being cast/copied into a 'string'.
>> I think
>> this is a sensible approach, both semantically and performance-wise.
>
> The json parser will work fine without doing any validation at all. I've
> been implementing string handling code in Phobos with the idea of doing
> validation only if the algorithm requires it, and only for those parts
> that require it.
>

Aye.

> There are many validation algorithms in Phobos one can tack on. Having
> two implementations of every algorithm, one with an embedded, reinvented
> validation and one without, is too much.

Actually, there are next to none. `validate`, which throws on failed validation, is a misnomer.

> The general idea with algorithms is that they do not combine things, but
> they enable composition.
>

At the lower level, such as in tokenizers, combining a couple of simple steps makes sense because it makes things run faster. It usually eliminates the need for a temporary result that must be digestible by the next range.

For instance, by "combining" decoding and character classification, one may side-step generating the code point value itself (because the decoder no longer has to produce it for the top-level algorithm).
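
In the JSON case the pay-off is even simpler (sketch): the lexer can classify raw code units directly, with no decode step at all.

    // classification straight on UTF-8 code units - no decoding; any
    // unit >= 0x80 can only occur inside a string literal and is
    // simply copied through
    bool isJsonStructural(ubyte u) pure nothrow @nogc
    {
        return u == '{' || u == '}' || u == '[' || u == ']' ||
               u == ':' || u == ',';
    }

    bool isJsonWhitespace(ubyte u) pure nothrow @nogc
    {
        return u == ' ' || u == '\t' || u == '\n' || u == '\r';
    }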


-- 
Dmitry Olshansky
August 16, 2015
On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
> For instance, by "combining" decoding and character classification, one may
> side-step generating the code point value itself (because the decoder no
> longer has to produce it for the top-level algorithm).

Perhaps, but I wouldn't be convinced without benchmarks to prove it on a case-by-case basis.

But it's moot, as json lexing never needs to decode.

August 16, 2015
On 16-Aug-2015 11:30, Walter Bright wrote:
> On 8/15/2015 11:52 PM, Dmitry Olshansky wrote:
>> For instance, by "combining" decoding and character classification, one may
>> side-step generating the code point value itself (because the decoder no
>> longer has to produce it for the top-level algorithm).
>
> Perhaps, but I wouldn't be convinced without benchmarks to prove it on a
> case-by-case basis.

About 2x faster than decode + check-if-alphabetic on my stuff:

https://github.com/DmitryOlshansky/gsoc-bench-2012

I haven't updated it in a while. There are nice bar graphs by David comparing the decoding versions across DMD vs LDC vs GDC:

Page 15 at http://dconf.org/2013/talks/nadlinger.pdf

>
> But it's moot, as json lexing never needs to decode.

Agreed.

-- 
Dmitry Olshansky