August 28, 2014
Or to be more explicit:

If you have an SNAN, there is no point in trying to recompute the expression using a different algorithm.

If you have a QNAN, you might want to recompute the expression using a different algorithm (e.g. complex numbers or analytically).

?
August 28, 2014
On Thursday, 28 August 2014 at 12:10:58 UTC, Ola Fosheim Grøstad wrote:
> Or to be more explicit:
>
> If you have an SNAN, there is no point in trying to recompute the expression using a different algorithm.
>
> If you have a QNAN, you might want to recompute the expression using a different algorithm (e.g. complex numbers or analytically).
>
> ?

No. Once you load an SNAN, it isn't an SNAN any more! It is a QNAN.
You cannot have an SNAN in a floating-point register (unless you do a nasty hack to pass it in). It gets converted during loading.

const float x = snan;
x = x;

// x is now a qnan.

August 28, 2014
"Don"  wrote in message news:fvxmsrbicgpqkkiufdyv@forum.dlang.org...

> If float.init exists, it cannot be an snan, since you are allowed to use float.init.

So should we get rid of them from the language completely?  Using them as template parameters doesn't even respect the sign of the NaN last time I checked, let alone the s/q bit or the payload.  If we change float.init to be a qnan then it won't be possible to make an snan at compile time.

August 28, 2014
On Thursday, 28 August 2014 at 14:43:30 UTC, Don wrote:
> No. Once you load an SNAN, it isn't an SNAN any more! It is a QNAN.

By which definition?  It is only if you consume the SNAN with an fp-exception-free arithmetic op that it should be turned into a QNAN. If you compute with an op that throws then it should throw an exception.

MOV should not be viewed as a computation…

It also makes sense to save SNAN to file when converting corrupted data-files. SNAN could then mean "corrupted" and QNAN could mean "absent". You should not get an exception for loading a file. You should get an exception if you start computing on the SNAN in the file.

> You cannot have an SNAN in a floating-point register (unless you do a nasty hack to pass it in). It gets converted during loading.

I don't understand this position. If you cannot load SNAN then why does SSE handle SNAN in arithmetic ops and compares?

> const float x = snan;
> x = x;
>
> // x is now a qnan.

I disagree (and why const?)

Assignment does nothing, it should not consume the SNAN. Assignment is just "naming". It is not "computing".

August 28, 2014
Let me try again:

SNAN => unfortunately absent

QNAN => deliberately absent

So you can have:

compute(SNAN) => handle(exception) {
   if(can turn unfortunate situation into deliberate)
   then compute(QNAN)
   else throw
}
August 28, 2014
Kahan states this in a 1997 paper:

«[…]An SNaN may be moved ( copied ) without incident, but any other arithmetic operation upon an SNaN is an INVALID operation ( and so is loading one onto the ix87's stack ) that must trap or else produce a new nonsignaling NaN. ( Another way to turn an SNaN into a NaN is to turn 0xxx...xxx into 1xxx...xxx with a logical OR.) Intended for, among other things, data missing from statistical collections, and for uninitialized variables[…]»

( http://www.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF)

x87 is legacy; it predates IEEE 754 by five years and should be forgotten.

Note also that the string representation for a signalling nan is "NANS", so it is reasonable to save it to a file if you need to represent missing data. "NAN" represents 0/0, sqrt(-1), not missing data.

I'm not really sure how it can be interpreted differently?

Ola.
September 08, 2014
Been using it for a bit now. I think the only thing I have to say is that having to insert all of those `JSONValue`s everywhere is tiresome, and I never know when I have to do it.

Atila

On Thursday, 21 August 2014 at 22:35:18 UTC, Sönke Ludwig wrote:
> Following up on the recent "std.jgrandson" thread [1], I've picked up the work (a lot earlier than anticipated) and finished a first version of a loose blend of said std.jgrandson, vibe.data.json and some changes that I had planned for vibe.data.json for a while. I'm quite pleased by the results so far, although without a serialization framework it still misses a very important building block.
>
> Code: https://github.com/s-ludwig/std_data_json
> Docs: http://s-ludwig.github.io/std_data_json/
> DUB: http://code.dlang.org/packages/std_data_json
>
> The new code contains:
>  - Lazy lexer in the form of a token input range (using slices of the
>    input if possible)
>  - Lazy streaming parser (StAX style) in the form of a node input range
>  - Eager DOM style parser returning a JSONValue
>  - Range based JSON string generator taking either a token range, a
>    node range, or a JSONValue
>  - Opt-out location tracking (line/column) for tokens, nodes and values
>  - No opDispatch() for JSONValue - this has been shown to do more harm
>    than good in vibe.data.json
>
> The DOM style JSONValue type is based on std.variant.Algebraic. This currently has a few usability issues that can be solved by upgrading/fixing Algebraic:
>
>  - Operator overloading only works sporadically
>  - No "tag" enum is supported, so that switch()ing on the type of a
>    value doesn't work and an if-else cascade is required
>  - Operations and conversions between different Algebraic types are not
>    conveniently supported, which becomes important when other similar
>    formats get supported (e.g. BSON)
>
> Assuming that those points are solved, I'd like to get some early feedback before going for an official review. One open issue is how to handle unescaping of string literals. Currently it always unescapes immediately, which is more efficient for general input ranges when the unescaped result is needed, but less efficient for string inputs when the unescaped result is not needed. Maybe a flag could be used to conditionally switch behavior depending on the input range type.
>
> Destroy away! ;)
>
> [1]: http://forum.dlang.org/thread/lrknjl$co7$1@digitalmars.com

October 12, 2014
Here's my destruction of std.data.json.

* lexer.d:

** Beautifully done. From what I understand, if the input is string or immutable(ubyte)[] then the strings are carved out as slices of the input, as opposed to newly allocated. Awesome.

** The string after lexing is correctly scanned and stored in raw format (escapes are not rewritten) and decoded on demand. Problem with decoding is that it may allocate memory, and it would be great (and not difficult) to make the lexer 100% lazy/non-allocating. To achieve that, lexer.d should define TWO "Kind"s of strings at the lexer level: regular string and undecoded string. The former is lexer.d's way of saying "I got lucky" in the sense that it didn't detect any '\\' so the raw and decoded strings are identical. No need for anyone to do any further processing in the majority of cases => win. The latter means the lexer lexed the string, saw at least one '\\', and leaves it to the caller to do the actual decoding.

** After moving the decoding business out of lexer.d, a way to take this further would be to qualify lexer methods as @nogc if the input is string/immutable(ubyte)[]. I wonder how to implement a conditional attribute. We'll probably need a language enhancement for that.

** The implementation uses manually-defined tagged unions to do its work. Could we use Algebraic instead - dogfooding and all that? I recall there was a comment in Sönke's original work that Algebraic has a specific issue (was it false pointers?) - so the question arises: should we fix Algebraic and use it, thus helping other uses as well?

** I see the "boolean" kind, should we instead have the "true_" and "false_" kinds?

** Long story short I couldn't find any major issue with this module, and I looked! I do think the decoding logic should be moved outside of lexer.d or at least the JSONLexerRange.

* generator.d: looking good, no special comments. Like the consistent use of structs filled with options as template parameters.

* foundation.d:

** At four words per token, Location seems pretty bulky. How about reducing line and column to uint?

** Could JSONException create the message string in toString (i.e. when/if used) as opposed to in the constructor?

* parser.d:

** How about using .init instead of .defaults for options?

** I'm a bit surprised by JSONParserNode.Kind. E.g. the objectStart/End markers shouldn't appear as nodes. There should be an "object" node only. I guess that's needed for laziness.

** It's unclear where memory is being allocated in the parser. @nogc annotations wherever appropriate would be great.

* value.d:

** Looks like this is/may be the only place where memory is being managed, at least if the input is string/immutable(ubyte)[]. Right?

** Algebraic ftw.

============================

Overall: This is very close to everything I hoped! A bit more care to @nogc would be awesome, especially with the upcoming focus on memory management.

After one more pass it would be great to move forward for review.


Andrei
October 12, 2014
On Sunday, 12 October 2014 at 18:17:29 UTC, Andrei Alexandrescu wrote:
>
> ** The string after lexing is correctly scanned and stored in raw format (escapes are not rewritten) and decoded on demand. Problem with decoding is that it may allocate memory, and it would be great (and not difficult) to make the lexer 100% lazy/non-allocating. To achieve that, lexer.d should define TWO "Kind"s of strings at the lexer level: regular string and undecoded string. The former is lexer.d's way of saying "I got lucky" in the sense that it didn't detect any '\\' so the raw and decoded strings are identical. No need for anyone to do any further processing in the majority of cases => win. The latter means the lexer lexed the string, saw at least one '\\', and leaves it to the caller to do the actual decoding.

I'd like to see unescapeStringLiteral() made public.  Then I can unescape multiple strings to the same preallocated destination, or even unescape in place (guaranteed to work since the result will always be smaller than the input).
October 12, 2014
Oh, it looks like you aren't checking for 0x7F (DEL) as a control character.