August 22, 2015
On 17.08.2015 at 00:03, Walter Bright wrote:
> On 8/16/2015 5:34 AM, Sönke Ludwig wrote:
>> On 16.08.2015 at 02:50, Walter Bright wrote:
>>>      if (isInputRange!R && is(Unqual!(ElementEncodingType!R) == char))
>>>
>>> I'm not a fan of more names for trivia, the deluge of names has its own
>>> costs.
>>
>> Good, I'll use `if (isInputRange!R &&
>> (isSomeChar!(ElementEncodingType!R) ||
>> isIntegral!(ElementEncodingType!R))`. It's just used in number of
>> places and
>> quite a bit more verbose (twice as long) and I guess a large number of
>> algorithms in Phobos accept char ranges, so that may actually warrant
>> a name in
>> this case.
>
> Except that there is no reason to support wchar, dchar, int, ubyte, or
> anything other than char. The idea is not to support something just
> because you can, but there should be an identifiable, real use case for
> it first. Has anyone ever seen Json data as ulongs? I haven't either.

You have, however, seen ubyte[] when reading something from a file or from a network stream. But since Andrei now also wants to remove it, so be it. I'll answer some of the other points anyway:

>>> The json parser will work fine without doing any validation at all. I've
>>> been implementing string handling code in Phobos with the idea of doing
>>> validation only if the algorithm requires it, and only for those parts
>>> that require it.
>>
>> Yes, and it won't do that if a char range is passed in. If the
>> integral range
>> path gets removed there are basically two possibilities left, perform the
>> validation up-front (slower), or risk UTF exceptions in unrelated
>> parts of the
>> code base. I don't see why we shouldn't take the opportunity for a
>> full and fast
>> validation here. But I'll relay this to Andrei, it was his idea
>> originally.
>
> That argument could be used to justify validation in every single
> algorithm that deals with strings.

Not really for all of them, but there are indeed more where this could apply in theory. However, JSON is frequently used in situations where parsing speed, or performance in general, is crucial (e.g. web services), which makes it stand out for practical reasons. Others, such as an XML parser, would qualify, too, but probably none of the generic string manipulation functions.

>>>>> Why do both? Always return an input range. If the user wants a string,
>>>>> he can pipe the input range to a string generator, such as .array
>>>> Convenience for one.
>>>
>>> Back to the previous point, that means that every algorithm in Phobos
>>> should have two versions, one that returns a range and the other a
>>> string? All these variations will result in a combinatorical explosion.
>>
>> This may be a factor of two, but not a combinatorial explosion.
>
> We're already up to validate or not, to string or not, i.e. 4 combinations.

Validation is part of the lexer, not the generator, so there is no combinatorial relation between the two. Validation is also just a template parameter, so it doesn't double the implementation either - there is simply a "static if" somewhere that decides whether validate() gets called.
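
To illustrate, here is a minimal sketch (the type and flag names are invented, this is not the proposed lexer code) of how such a compile-time switch collapses the two "variants" into a single implementation:

    import std.utf : validate;

    // Validation as a compile-time option of the lexer: a single
    // "static if" instead of a second code path.
    struct JSONLexer(bool validateUTF = true)
    {
        string input;

        private void checkString(string literal)
        {
            static if (validateUTF)
                validate(literal); // throws UTFException on invalid UTF
            // otherwise no validation code ends up in this instantiation
        }
    }

    unittest
    {
        JSONLexer!true strict;
        JSONLexer!false relaxed;
        strict.checkString("häuser");  // validated
        relaxed.checkString("häuser"); // the check is compiled out
    }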

>>> The other problem, of course, is that returning a string means the
>>> algorithm has to decide how to allocate that string. As much as
>>> possible, algorithms should not be making allocation decisions.
>>
>> Granted, the fact that format() and to!() support input ranges (I
>> didn't notice
>> that until now) makes the issue less important. But without those, it
>> would
>> basically mean that almost all places that generate JSON strings would
>> have to
>> import std.array and append .array. Nothing particularly bad if viewed in
>> isolation, but makes the language appear a lot less clean/more verbose
>> if it
>> occurs often. It's also a stepping stone for language newcomers.
>
> This has been argued before, and the problem is it applies to EVERY
> algorithm in Phobos, and winds up with a doubling of the number of
> functions to deal with it. I do not view this as clean.
>
> D is going to be built around ranges as a fundamental way of coding.
> Users will need to learn something about them. Appending .array is not a
> big hill to climb.

It isn't if you get taught about it. But it surely is if you don't know about it yet and try to get something working based only on the JSON API (a language newcomer who wants to work with JSON). It's also still an additional thing to remember, type and read, making it an additional piece of cognitive load, even for developers who are fluent with this. Have many such pieces and they add up to the point where productivity is brought to its knees.

I personally already find it quite annoying to constantly have to import std.range, std.array and std.algorithm just to use some small piece of functionality in std.algorithm. It's also often not clear in which of the three modules/packages a certain function lives. We need to find a better balance here if D is to keep its appeal as a language where you stay in "the zone" (a.k.a. flow), which has always been a big thing for me.
August 22, 2015
On 21.08.2015 at 19:30, Andrei Alexandrescu wrote:
> On 8/18/15 12:54 PM, Sönke Ludwig wrote:
>> On 18.08.2015 at 00:21, Andrei Alexandrescu wrote:
>>> * On the face of it, dedicating 6 modules to such a small specification
>>> as JSON seems excessive. I'm thinking one module here. (As a simple
>>> point: who would ever want to import only foundation, which in turn has
>>> one exception type and one location type in it?) I think it shouldn't be
>>> up for debate that we must aim for simple and clean APIs.
>>
>> That would mean a single module that is >5k lines long. Spreading out
>> certain things, such as JSONValue into an own module also makes sense to
>> avoid unnecessarily large imports where other parts of the functionality
>> isn't needed. Maybe we could move some private things to "std.internal"
>> or similar and merge some of the modules?
>
> That would help. My point is it's good design to make the response
> proportional to the problem. 5K lines is not a lot, but reducing those
> 5K in the first place would be a noble pursuit. And btw saving parsing
> time is so C++ :o).

Most lines are needed for tests and documentation. Surely dropping some functionality would make the module smaller, too. But there is not a lot to take away without making severe compromises in terms of actual functionality or usability.

>> But I also think that grouping symbols by topic is a good thing and
>> makes figuring out the API easier. There is also always package.d if you
>> really want to import everything.
>
> Figuring out the API easily is a good goal. The best way to achieve that
> is making the API no larger than necessary.

So, what's your suggestion - remove all read*/skip* functions, for example? Make them member functions of JSONParserRange instead of UFCS functions? We could of course also use the pseudo-modules that std.algorithm for example had, where we'd create a table in the documentation for each category of functions.

>> Another thing I'd like to add is an output range that takes parser nodes
>> and writes to a string output range. This would be the kind of interface
>> that would be most useful for a serialization framework.
>
> Couldn't that be achieved trivially by e.g. using map!(t => t.toString)
> or similar?
>
> This is the nice thing about rangifying everything - suddenly you have a
> host of tools at your disposal.

No, the idea is to have an output range like so:

    Appender!string dst;
    auto r = JSONNodeOutputRange(&dst);
    r.put(beginArray);
    r.put(1);
    r.put(2);
    r.put(endArray);

This would provide a forward interface for code that has to directly iterate over its input, which is the case for a serializer - it can't provide an input range interface in a sane way. The alternative would be to either let the serializer re-implement all of JSON, or to just provide some primitives (writeJSON() that takes bool, number or string) and to let the serializer implement the rest of JSON (arrays/objects), which includes certain options, such as pretty-printing.
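
To make the shape of that interface a bit more concrete, here is a rough sketch (the marker types and the separator-less, number-only output are purely illustrative; the real thing would also handle strings, objects, commas and pretty-printing):

    import std.array : Appender;
    import std.conv : to;

    struct BeginArray {}
    struct EndArray {}
    enum beginArray = BeginArray();
    enum endArray = EndArray();

    // Accepts parser nodes/values and writes JSON text to an
    // underlying character output range.
    struct JSONNodeOutputRange(Output)
    {
        Output* dst;

        void put(BeginArray) { dst.put('['); }
        void put(EndArray)   { dst.put(']'); }

        void put(T)(T value)
            if (is(T : double))
        {
            dst.put(value.to!string);
        }
    }

    unittest
    {
        Appender!string dst;
        auto r = JSONNodeOutputRange!(Appender!string)(&dst);
        r.put(beginArray);
        r.put(1);
        r.put(2);
        r.put(endArray);
        assert(dst.data == "[12]"); // separators omitted in this sketch
    }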

>>> - Also, at token level strings should be stored with escapes unresolved.
>>> If the user wants a string with the escapes resolved, a lazy range
>>> does it.
>>
>> To make things efficient, it currently stores escaped strings if slices
>> of the input are used, but stores unescaped strings if allocations are
>> necessary anyway.
>
> That seems a good balance, and probably could be applied to numbers as
> well.

With the difference that numbers stored as numbers never need to allocate, so for non-sliceable inputs the compromise is not the same.

What about just offering basically three (CT selectable) modes (sketched below):
- Always parse as double (parse lazily if slicing can be used) (default)
- Parse double or long (again, lazily if slicing can be used)
- Always store the string representation
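
As a rough illustration of how the chosen mode could steer the stored type at compile time (the enum and template names are made up for this sketch):

    enum NumberMode { asDouble, doubleOrLong, rawString }

    template NumberStorage(NumberMode mode)
    {
        static if (mode == NumberMode.asDouble)
            alias NumberStorage = double;
        else static if (mode == NumberMode.doubleOrLong)
            alias NumberStorage = long; // or double, decided per token
        else
            alias NumberStorage = string; // keep the raw representation
    }

    unittest
    {
        static assert(is(NumberStorage!(NumberMode.asDouble) == double));
        static assert(is(NumberStorage!(NumberMode.rawString) == string));
    }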

The question that remains is how to handle this in JSONValue - support just double there? Or something like JSONNumber that abstracts away the differences, but makes writing generic code against JSONValue difficult? Or make it parameterized on what it can store as well?

>>> - Validating UTF is tricky; I've seen some discussion in this thread
>>> about it. On the face of it JSON only accepts valid UTF characters. As
>>> such, a modularity-based argument is to pipe UTF validation before
>>> tokenization. (We need a lazy UTF validator and sanitizer stat!) An
>>> efficiency-based argument is to do validation during tokenization. I'm
>>> inclining in favor of modularization, which allows us to focus on one
>>> thing at a time and do it well, instead of duplicationg validation
>>> everywhere. Note that it's easy to write routines that do JSON
>>> tokenization and leave UTF validation for later, so there's a lot of
>>> flexibility in composing validation with JSONization.
>>
>> It's unfortunate to see this change of mind in face of the work that
>> already went into the implementation. I also still think that this is a
>> good optimization opportunity that doesn't really affect the
>> implementation complexity. Validation isn't duplicated, but reused from
>> std.utf.
>
> Well if the validation is reused from std.utf, it can't have been very
> much work. I maintain that separating concerns seems like a good
> strategy here.

There is more to it than the actual call to validate(): writing tests, making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work.

I also still think that this hasn't been a bad idea at all, because it speeds up the most important use case: parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.
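
As a minimal illustration of that last point (not part of the proposed API), the validation would sit directly in front of the reinterpreting cast, so that invalid UTF surfaces right there instead of deep inside unrelated code:

    import std.utf : validate;

    string validatedText(immutable(ubyte)[] raw)
    {
        auto text = cast(string) raw; // reinterpret the bytes as chars
        validate(text);               // throws UTFException on invalid UTF-8
        return text;
    }

    unittest
    {
        auto bytes = cast(immutable(ubyte)[]) `{"key": "value"}`;
        assert(validatedText(bytes) == `{"key": "value"}`);
    }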

>>> - Litmus test: if the input type is a forward range AND if the string
>>> type chosen for tokens is the same as input type, successful
>>> tokenization should allocate exactly zero memory. I think this is a
>>> simple way to make sure that the tokenization API works well.
>>
>> Supporting arbitrary forward ranges doesn't seem to be enough, it would
>> at least have to be combined with something like take(), but then the
>> type doesn't equal the string type anymore. I'd suggest to keep it to
>> "if is sliceable and input type equals string type", at least for the
>> initial version.
>
> I had "take" in mind. Don't forget that "take" automatically uses slices
> wherever applicable. So if you just use typeof(take(...)), you get the
> best of all worlds.
>
> The more restrictive version seems reasonable for the first release.

Okay.

>>> - The JSON value does its own internal allocation (for e.g. arrays and
>>> hashtables), which should be fine as long as it's encapsulated and we
>>> can tweak it later (e.g. make it use reference counting inside).
>>
>> Since it's based on (Tagged)Algebraic, the internal types are part of
>> the interface. Changing them later is bound to break some code. So AFICS
>> this would either require to make the types used parameterized (string,
>> array and AA types). Or to abstract them away completely, i.e. only
>> forward operations but deny direct access to the type.
>>
>> ... thinking about it, TaggedAlgebraic could do that, while Algebraic
>> can't.
>
> Well if you figure the general Algebraic type is better replaced by a
> type specialized for JSON, fine.
>
> What we shouldn't endorse is two nearly identical library types
> (Algebraic and TaggedAlgebraic) that are only different in subtle
> matters related to performance in certain use patterns.
>
> If integral tags are better for closed type universes, specialize
> Algebraic to use integral tags where applicable.

TaggedAlgebraic would not be a type specialized for JSON! It's useful for all kinds of applications and just happens to have some advantages here, too.

An (imperfect) idea for merging this with the existing Algebraic name:

template Algebraic(T)
    if (is(T == struct) || is(T == union))
{
    // ... implementation of TaggedAlgebraic ...
}

To avoid the ambiguity with an Algebraic over a single type, a UDA could be required on T to get the actual TaggedAlgebraic behavior.

Everything else would be problematic, because TaggedAlgebraic needs to be supplied with names for the different types, so the Algebraic(T...) way of specifying allowed types doesn't really work. And, more importantly, because exploiting static type information in the generated interface means breaking code that currently is built around a Variant return value.

>>> - Why both parseJSONStream and parseJSONValue? I'm thinking
>>> parseJSONValue would be enough because then you trivially parse a stream
>>> with repeated calls to parseJSONValue.
>>
>> parseJSONStream is the pull parser (StAX style) interface. It returns
>> the contents of a JSON document as individual nodes instead of storing
>> them in a DOM. This part is vital for high-performance parsing,
>> especially of large documents.
>
> So perhaps this is just a naming issue. The names don't suggest
> everything you said. What I see is "parse a JSON stream" and "parse a
> JSON value". So I naturally assumed we're looking at consuming a full
> stream vs. consuming only one value off a stream and stopping. How about
> better names?

parseToJSONValue/parseToJSONStream? parseAsX?

>>> - readArray suddenly introduces a distinct kind of interacting -
>>> callbacks. Why? Should be a lazy range lazy range lazy range. An adapter
>>> using callbacks is then a two-liner.
>>
>> It just has a more complicated implementation, but is already on the
>> TODO list.
>
> Great. Let me say again that with ranges you get to instantly tap into a
> wealth of tools. I say get rid of the callbacks and let a "tee" take
> care of it for whomever needs it.

The callbacks would surely be dropped once ranges become available. foreach() should usually be all that is needed.

>>> - Why is readBool even needed? Just readJSONValue and then enforce it as
>>> a bool. Same reasoning applies to readDouble and readString.
>>
>> This is for lower level access, using parseJSONValue would certainly be
>> possible, but it would have quite some unneeded overhead and would also
>> be non-@nogc.
>
> Meh, fine. But all of this is adding weight to the API in the wrong places.

Frankly, I don't even think that this is the wrong place. The pull parser interface is the single most important part of the API when we talk about allocation-less and high-performance operation. It also really carries little weight, as it's just a small function that joins the other read* functions quite naturally and doesn't create any additional cognitive load.

>
>>> - readObject is with callbacks again - it would be nice if it were a
>>> lazy range.
>>
>> Okay, is also already on the list.
>
> Awes!

It could return a Tuple!(string, JSONNodeRange). But probably there should also be an opApply for the object field range, so that foreach (key, value; ...) becomes possible.
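
A minimal sketch of what such an opApply could look like (the type name and the array-backed storage are just for illustration; "Value" stands in for the actual node/value type):

    import std.typecons : Tuple, tuple;

    struct ObjectFieldRange(Value)
    {
        Tuple!(string, Value)[] fields; // backing storage, sketch only

        int opApply(scope int delegate(string key, ref Value value) dg)
        {
            foreach (ref f; fields)
                if (auto result = dg(f[0], f[1]))
                    return result;
            return 0;
        }
    }

    unittest
    {
        auto fields = ObjectFieldRange!int([tuple("a", 1), tuple("b", 2)]);
        foreach (key, value; fields)
            assert((key == "a" && value == 1) || (key == "b" && value == 2));
    }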

>> But apart from that, algebraic is unfortunately currently quite unsuited
>> for this kind of abstraction, even if that can be solved in theory (with
>> a lot of work). It requires to write things like
>> obj.get!(JSONValue[string])["foo"].get!JSONValue instead of just
>> obj["foo"], because it simply returns Variant from all of its forwarded
>> operators.
>
> Algebraic does not expose opIndex. We could add it to Algebraic such
> that obj["foo"] returns the same type a "this".

https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1088
https://github.com/D-Programming-Language/phobos/blob/6df5d551fd8a21feef061483c226e7d9b26d6cd4/std/variant.d#L1348

> It's easy for anyone to say that what's there is unfit for a particular
> purpose. It's also easy for many to define a ever-so-slightly-different
> new artifact that fits a particular purpose. Where you come as a
> talented hacker is to operate with the understanding of the importance
> of making things work, and make it work.

The problem is that making Algebraic exploit static type information means nothing short of a complete reimplementation, which TaggedAlgebraic is. It also means breaking existing code, if, for example, alg[0] suddenly returns a string instead of just a Variant with a string stored inside.

>>> - JSONValue should be more opaque and not expose representation as much
>>> as it does now. In particular, offering a built-in hashtable is bound to
>>> be problematic because those are expensive to construct, create garbage,
>>> and are not customizable. Instead, the necessary lookup and set APIs
>>> should be provided by JSONValue whilst keeping the implementation
>>> hidden. The same goes about array - a JSONValue shall not be exposed;
>>> instead, indexed access primitives should be exposed. Separate types
>>> might be offered (e.g. JSONArray, JSONDictionary) if deemed necessary.
>>> The string type should be a type parameter of JSONValue.
>>
>> This would unfortunately at the same time destroy almost all benefits
>> that using (Tagged)Algebraic has, namely that it would opens up the
>> possibility to have interoperability between different data formats (for
>> example, passing a JSONValue to a BSON generator without letting the
>> BSON generator know about JSON). This is unfortunately an area that I've
>> also not yet properly explored, but I think it's important as we go
>> forward with other data formats.
>
> I think we need to do it. Otherwise we're stuck with "D's JSON API
> cannot be used without the GC". We want to escape that gravitational
> pull. I know it's hard. But it's worth it.

I can't shake the feeling that what Phobos currently has in terms of allocators, containers and reference counting is simply not mature enough to make a good decision here. Restricting JSONValue as much as possible would at least keep the possibility to extend it later, but I think that we can and should do better in the long term.
August 22, 2015
On 21.08.2015 at 18:56, Andrei Alexandrescu wrote:
> On 8/18/15 1:21 PM, Sönke Ludwig wrote:
>> On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
>>> On 8/17/15 2:56 PM, Sönke Ludwig wrote:
>>>> - The enum is useful to be able to identify the types outside of the D
>>>> code itself. For example when serializing the data to disk, or when
>>>> communicating with C code.
>>>
>>> OK.
>>>
>>>> - It enables the use of pattern matching (final switch), which is often
>>>> very convenient, faster, and safer than an if-else cascade.
>>>
>>> Sounds tenuous.
>>
>> It's more convenient/readable in cases where a complex type is used
>> (typeID == Type.object vs. has!(JSONValue[string]). This is especially
>> true if the type is ever changed (or parametric) and all has!()/get!()
>> code needs to be adjusted accordingly.
>>
>> It's faster, even if there is no indirect call involved in the pointer
>> case, because the compiler can emit efficient jump tables instead of
>> generating a series of conditional jumps (if-else-cascade).
>>
>> It's safer because of the possibility to use final switch in addition to
>> a normal switch.
>>
>> I wouldn't call that tenuous.
>
> Well I guess I would, but no matter. It's something where reasonable
> people may disagree.

It depends on the perspective/use case, so it's surely not unreasonable to disagree here. But I'm especially not happy with the "final switch" argument getting dismissed so easily. By the same logic, we could also question the existence of "final switch", or even "switch", as a feature in the first place.

Performance benefits are certainly nice, too, but that's really just an implementation detail. The important trait is that the types get a name and that they form an enumerable set. This is quite similar to comparing a struct with named members to an anonymous Tuple!(T...).
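
For illustration, this is the kind of code such an enum enables (the member names here are invented); adding a member to the enum later turns every final switch over it into a compile error until the new case is handled, which an if-else cascade over has!()/get!() cannot offer:

    enum JSONType { null_, boolean, number, string_, array, object }

    string describe(JSONType t)
    {
        final switch (t) // compiler enforces that every member is handled
        {
            case JSONType.null_:   return "null";
            case JSONType.boolean: return "bool";
            case JSONType.number:  return "number";
            case JSONType.string_: return "string";
            case JSONType.array:   return "array";
            case JSONType.object:  return "object";
        }
    }

    unittest
    {
        assert(describe(JSONType.number) == "number");
    }
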
August 22, 2015
On 08/21/2015 12:29 PM, David Nadlinger wrote:
> On Friday, 21 August 2015 at 15:58:22 UTC, Nick Sabalausky wrote:
>> It also fucks up UFCS, and I'm a huge fan of UFCS.
>
> Are you saying that "import json : parseJSON = parse;
> foo.parseJSON.bar;" does not work?
>

Ok, fair point, although I was referring more to fully-qualified name lookups, as in the snippet I quoted from Jacob. I.e., this doesn't work:

someJsonCode.std.json.parse();

I do think though, generally speaking, if there is much need to do a renamed import, the symbol in question probably didn't have the best name in the first place.

Renamed importing is a great feature to have, but when you see it used it raises the question "*Why* is this being renamed? Why not just use its real name?" For the most part, I see two main reasons:

1. "Just because. I like this bikeshed color better." But this is merely a code smell, not a legitimate reason to even bother.

or

2. The symbol has a questionable name in the first place.

If there's reason to even bring up renamed imports as a solution, then it's probably falling into the "questionably named" category.

Just because we CAN use D's module system and renamed imports and such to clear up ambiguities doesn't mean we should let ourselves take things TOO far to the opposite extreme when avoiding C/C++'s "big long ugly names as a substitute for modules".

Like Walter, I do very much dislike C/C++'s super-long, super-unambiguous names. But IMO, preferring parseStream over parseJSONStream isn't a genuine case of avoiding C/C++-style naming, it's just being overrun by fear of C/C++-style naming and thus taking things too far to the opposite extreme. We can strike a better balance than choosing between "brief and unclear-at-a-glance" and "C++-level verbosity".

Yea, we CAN do "import std.json : parseJSONStream = parseStream;", but if there's even any motivation to do so in the first place, we may as well just use the better name right from the start. Besides, those who prefer ultra-brevity are free to paint their bikesheds with renamed imports, too ;)

August 24, 2015
On 2015-08-21 18:25, Nick Sabalausky wrote:

> Module boundaries should be determined by organizational grouping, not
> by size.

Well, but it depends on how you decide what should be in a group. Size is usually a part of that decision, although it might not be conscious. You wouldn't put the whole D compiler in one module ;)

-- 
/Jacob Carlborg
August 24, 2015
On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
> On 17.08.2015 at 00:03, Walter Bright wrote:
>> D is going to be built around ranges as a fundamental way of coding.
>> Users will need to learn something about them. Appending .array is not a
>> big hill to climb.
>
> It isn't if you get taught about it. But it surely is if you don't know about it
> yet and try to get something working based only on the JSON API (language
> newcomer that wants to work with JSON).

Not if the illuminating example in the Json API description does it that way. Newbies will tend to copy/pasta the examples as a starting point.

> It's also still an additional thing to
> remember, type and read, making it an additional piece of cognitive load, even
> for developers that are fluent with this. Have many of such pieces and they add
> up to a point where productivity goes to its knees.

Having composable components behaving in predictable ways is not an additional piece of cognitive load, it is less of one.


> I already personally find it quite annoying constantly having to import
> std.range, std.array and std.algorithm to just use some small piece of
> functionality in std.algorithm. It's also often not clear in which of the three
> modules/packages a certain function is. We need to find a better balance here if
> D is to keep its appeal as a language where you stay in "the zone"  (a.k.a
> flow), which always has been a big thing for me.

If I buy a toy car, I get a toy car. If I get a lego set, I can build any toy with it. I believe the composable component approach will make Phobos smaller and much more flexible and useful, as opposed to monolithic APIs.
August 25, 2015
On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
> There is more than the actual call to validate(), such as writing tests and making sure the surroundings work, adjusting the interface and writing documentation. It's not *that* much work, but nonetheless wasted work.
>
> I also still think that this hasn't been a bad idea at all. Because it speeds up the most important use case, parsing JSON from a non-memory source that has not yet been validated. I also very much like the idea of making it a programming error to have invalid UTF stored in a string, i.e. forcing the validation to happen before the cast from bytes to chars.

Also see "utf/unicode should only be validated once"
https://issues.dlang.org/show_bug.cgi?id=14919

If combining lexing and validation is faster (why?), then a ubyte-consuming interface should be available - though why couldn't it be done by adding a lazy ubyte->char validator range to std.utf?
In any case, during lexing we should avoid autodecoding of narrow strings for redundant validation.
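
Such a validator range doesn't exist in std.utf today; a rough sketch of the idea could look like the following (structural checks only - overlong sequences, surrogates and truncation at the end of input are not handled):

    import std.range.primitives;
    import std.utf : UTFException;

    // Lazily turns a ubyte input range into a char range while checking
    // the basic UTF-8 sequence structure.
    struct ValidatingUTF8Range(R)
        if (isInputRange!R && is(ElementType!R : ubyte))
    {
        private R src;
        private uint pending; // continuation bytes still expected

        this(R input)
        {
            src = input;
            if (!src.empty) check(src.front);
        }

        @property bool empty() { return src.empty; }
        @property char front() { return cast(char) src.front; }

        void popFront()
        {
            src.popFront();
            if (!src.empty) check(src.front);
        }

        private void check(ubyte b)
        {
            if (pending)
            {
                if ((b & 0xC0) != 0x80)
                    throw new UTFException("invalid UTF-8 continuation byte");
                --pending;
            }
            else if (b >= 0x80)
            {
                if ((b & 0xE0) == 0xC0)      pending = 1;
                else if ((b & 0xF0) == 0xE0) pending = 2;
                else if ((b & 0xF8) == 0xF0) pending = 3;
                else throw new UTFException("invalid UTF-8 start byte");
            }
        }
    }

    unittest
    {
        import std.range : walkLength;
        auto bytes = cast(immutable(ubyte)[]) "häuser";
        auto r = ValidatingUTF8Range!(immutable(ubyte)[])(bytes);
        assert(r.walkLength == 7); // 6 characters, 'ä' is two code units
    }
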
August 25, 2015
On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
> * All new stuff should go in std.experimental. I assume "stdx" would change to that, should this work be merged.

Though stdx (or better std.x) would have been a prettier and more exciting name for std.experimental to begin with.
August 25, 2015
On 24.08.2015 at 22:25, Walter Bright wrote:
> On 8/22/2015 5:21 AM, Sönke Ludwig wrote:
>> On 17.08.2015 at 00:03, Walter Bright wrote:
>>> D is going to be built around ranges as a fundamental way of coding.
>>> Users will need to learn something about them. Appending .array is not a
>>> big hill to climb.
>>
>> It isn't if you get taught about it. But it surely is if you don't
>> know about it
>> yet and try to get something working based only on the JSON API (language
>> newcomer that wants to work with JSON).
>
> Not if the illuminating example in the Json API description does it that
> way. Newbies will tend to copy/pasta the examples as a starting point.

That's true, but then they will possibly have to understand the inner workings soon after, for example when something goes wrong and they get cryptic error messages. It makes the learning curve steeper, even if some of that can be mitigated with good documentation/tutorials.

>> It's also still an additional thing to
>> remember, type and read, making it an additional piece of cognitive
>> load, even
>> for developers that are fluent with this. Have many of such pieces and
>> they add
>> up to a point where productivity goes to its knees.
>
> Having composable components behaving in predictable ways is not an
> additional piece of cognitive load, it is less of one.

Having to write additional things that are not part of the problem (".array", "import std.array : array;") is cognitive load and having to read such things is cognitive and visual load. Also, having to remember where those additional components reside is cognitive load, at least if they are not used really frequently. This has of course nothing to do with predictable behavior of the components, but with the API/language boundary between ranges and arrays.

>> I already personally find it quite annoying constantly having to import
>> std.range, std.array and std.algorithm to just use some small piece of
>> functionality in std.algorithm. It's also often not clear in which of
>> the three
>> modules/packages a certain function is. We need to find a better
>> balance here if
>> D is to keep its appeal as a language where you stay in "the zone"
>> (a.k.a
>> flow), which always has been a big thing for me.
>
> If I buy a toy car, I get a toy car. If I get a lego set, I can build
> any toy with it. I believe the composable component approach will make
> Phobos smaller and much more flexible and useful, as opposed to
> monolithic APIs.

I'm not arguing against a range based approach! It's just that such an approach ideally shouldn't come at the expense of simplicity and relevance.

If I have a string variable and I want to store the upper case version of another string, the direct mental translation is "dst = toUpper(src);" - and not "dst = toUpper(src).array;". It reminds me of the unwrap() calls in Rust code. They can produce a huge amount of visual noise for dealing with errors, whereas an exception based approach lets you focus on the actual problem. Of course exceptions have their own issues, but that's a different topic.

Keeping toString in addition to toChars would be enough to avoid the issue here. A possible alternative would be to let the proposed JSON text input range have an "alias this" to "std.array.array(this)". Then it wouldn't even require a rename of toString to toChars to get both worlds.
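
A small sketch of that "alias this" idea (the type name and the string-backed implementation are just for illustration, not the proposed range):

    // A char-producing range that also converts implicitly to string,
    // so both range composition and plain string assignment work.
    struct JSONText
    {
        string data; // the sketch simply wraps an already generated string
        size_t pos;

        @property bool empty() const { return pos >= data.length; }
        @property char front() const { return data[pos]; }
        void popFront() { ++pos; }

        // materializes the remaining text; alias this makes it implicit
        @property string toString() const { return data[pos .. $]; }
        alias toString this;
    }

    unittest
    {
        auto t = JSONText(`{"a":1}`);
        string s = t; // no .array needed thanks to alias this
        assert(s == `{"a":1}`);
    }
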
August 25, 2015
On 25.08.2015 at 07:55, Martin Nowak wrote:
> On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:
>> There is more than the actual call to validate(), such as writing
>> tests and making sure the surroundings work, adjusting the interface
>> and writing documentation. It's not *that* much work, but nonetheless
>> wasted work.
>>
>> I also still think that this hasn't been a bad idea at all. Because it
>> speeds up the most important use case, parsing JSON from a non-memory
>> source that has not yet been validated. I also very much like the idea
>> of making it a programming error to have invalid UTF stored in a
>> string, i.e. forcing the validation to happen before the cast from
>> bytes to chars.
>
> Also see "utf/unicode should only be validated once"
> https://issues.dlang.org/show_bug.cgi?id=14919
>
> If combining lexing and validation is faster (why?) then a ubyte
> consuming interface should be available, though why couldn't it be done
> by adding a lazy ubyte->char validator range to std.utf.
> In any case during lexing we should avoid autodecoding of narrow strings
> for redundant validation.

The performance benefit comes from the fact that almost all of JSON is a subset of ASCII, so that lexing the input will implicitly validate it as correct UTF. The only place where actual UTF sequences can occur is in string literals, outside of escape sequences. Depending on the type of document, that can result in far fewer conditionals compared to a full validation of the input.

Autodecoding during lexing is avoided; everything happens at the code unit level.