std.data.json formal review (page 20)

On 18-Aug-2015 19:35, Andrei Alexandrescu wrote:
> On 8/18/15 12:31 PM, Dmitry Olshansky wrote:
>> On 18-Aug-2015 16:19, Andrei Alexandrescu wrote:
>>> On 8/18/15 2:55 AM, Dmitry Olshansky wrote:
>>>> On 18-Aug-2015 01:33, Andrei Alexandrescu wrote:
>>>>> On 8/17/15 2:47 PM, Dmitry Olshansky wrote:
>>>>>>
>>>>>> Actually one can combine the two:
>>>>>> - use integer type tag for everything built-in
>>>>>> - use pointer tag for what is not
>>>>>
>>>>> But a pointer tag can do everything that an integer tag does. --
>>>>> Andrei
>>>>
>>>> albeit quite a deal slooower.
>>>
>>> I think there's a misunderstanding. Pointers _are_ 64-bit integers and
>>> may be compared as such. You can use a pointer as an integer. -- Andrei
>>>
>>
>> Integer in a small range is faster to switch on. Plus comparing to zero
>> is faster, so if the common type has tag == 0 it's a net gain.
>
> Agreed. These are small gains though unless tight loops are concerned.
>
>> Strictly speaking pointer with vtbl is about as fast as switch but when
>> we have to switch on 2 types the vtbl dispatch needs to be based on 2
>> types instead of one. So ideally we need vtbl per pair of type to
>> support e.g. fast binary operators on TaggedAlgebraic.
>
> But I'm talking about using pointers for indirect calls IN ADDITION to
> using pointers for simple integral comparison. So the comparison is not
> appropriate. It's better to have both options instead of just one.
>

If the common-type fast path with 0 is not relevant, then the only gain of an integer is being able to fit it in a couple of bytes or even reuse some vacant bits.

Another thing is that function addresses are rather sparse, so a switch statement would have to do some special preprocessing to make them more dense:
- subtract the start of the code segment (maybe, but this won't work with DLLs though)
- shift right by 2 (4?) as functions are usually aligned

--
Dmitry Olshansky
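The two preprocessing steps listed above (subtract a base, shift out the alignment bits) can be sketched as plain arithmetic. This is only an illustration: `denseIndex` is a hypothetical helper, and the addresses, base, and 16-byte alignment are made-up values, not anything measured from a real binary.

```d
import std.stdio : writefln;

// Hypothetical helper: map a sparse, aligned address to a small index.
// `base` and `alignShift` are assumptions supplied by the caller.
size_t denseIndex(size_t addr, size_t base, uint alignShift)
{
    return (addr - base) >> alignShift;
}

void main()
{
    // Made-up addresses standing in for 16-byte-aligned functions.
    enum size_t base = 0x401000;
    immutable size_t[] addrs = [0x401000, 0x401010, 0x401040];

    foreach (a; addrs)
        writefln("0x%x -> %s", a, denseIndex(a, base, 4));
}
```

Even after this step the indices are not guaranteed to be consecutive (here they come out as 0, 1, 4), since functions differ in size, so a jump table built on them may still contain holes.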

On Tuesday, 18 August 2015 at 14:58:08 UTC, Andrei Alexandrescu wrote:
> That's a language issue - switch does not work with any pointers. I just submitted https://issues.dlang.org/show_bug.cgi?id=14931. -- Andrei

No it is not. If the set of values is not compact, there is no jump table.

On Tuesday, 18 August 2015 at 16:22:20 UTC, Andrei Alexandrescu wrote:
> On 8/18/15 11:39 AM, Johannes Pfau wrote:
>> No, this won't improve the ASM much: Enum values start at 0 and are
>> consecutive. With a final switch they're also bounded. All these points
>> do not apply to pointers. They don't start at 0, are not guaranteed to
>> be consecutive and likely can't be used with final switch. Because of
>> that a switch on pointers can never use jump tables.
>
> I agree there's a margin here in favor of integers, but it's getting thin. Meanwhile, pointers maintain large advantages of principle. I suggest we pursue better use of pointers as tags instead of adding integral-tagged unions to phobos. -- Andrei

No, enums can also be crammed inline in the code for cheap, they can be inserted into an existing structure for cheap using bit manipulation most of the time, and the compiler can check that all cases are handled in an exhaustive manner. It is not getting thinner.
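A minimal sketch of the two points made in this reply, assuming a hypothetical four-member `Kind` enum: the tag is packed into the two low bits of a word next to a payload, and `final switch` makes the compiler reject the code if a member is left unhandled. None of these names come from std.data.json or taggedalgebraic.

```d
import std.stdio : writeln;

// Hypothetical tag type; four members fit in two bits.
enum Kind : ubyte { null_, boolean, number, text }

// Store the tag in the two low bits and the payload in the rest.
uint pack(Kind k, uint payload)
{
    return (payload << 2) | k;
}

// Recover the tag from the low bits.
Kind kindOf(uint word)
{
    return cast(Kind)(word & 0b11);
}

string describe(Kind k)
{
    // final switch: removing any case below is a compile-time error.
    final switch (k)
    {
        case Kind.null_:   return "null";
        case Kind.boolean: return "bool";
        case Kind.number:  return "number";
        case Kind.text:    return "string";
    }
}

void main()
{
    immutable word = pack(Kind.number, 42);
    writeln(describe(kindOf(word)), " ", word >> 2);
}
```

Because the enum values are small and consecutive, a compiler is free to lower the switch to a jump table; whether it actually does so depends on the compiler and optimization settings.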

August 18, 2015

Re: std.data.json formal review

Posted by Sönke Ludwig
in reply to Andrei Alexandrescu

On 18.08.2015 at 00:37, Andrei Alexandrescu wrote:
> On 8/17/15 2:56 PM, Sönke Ludwig wrote:
>> - The enum is useful to be able to identify the types outside of the D
>> code itself. For example when serializing the data to disk, or when
>> communicating with C code.
>
> OK.
>
>> - It enables the use of pattern matching (final switch), which is often
>> very convenient, faster, and safer than an if-else cascade.
>
> Sounds tenuous.

It's more convenient/readable in cases where a complex type is used (typeID == Type.object vs. has!(JSONValue[string])). This is especially true if the type is ever changed (or parametric) and all has!()/get!() code needs to be adjusted accordingly.

It's faster, even if there is no indirect call involved in the pointer case, because the compiler can emit efficient jump tables instead of generating a series of conditional jumps (if-else-cascade).

It's safer because of the possibility to use final switch in addition to a normal switch.

I wouldn't call that tenuous.
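For contrast with the `final switch` argument, here is a hedged sketch of the if-else cascade style using Phobos' existing `std.variant.Algebraic` and its `peek` method. The `Value` alias and `describe` function are illustrative only. The `bool` branch is deliberately left out to show that nothing forces the cascade to be exhaustive.

```d
import std.stdio : writeln;
import std.variant : Algebraic;

// Illustrative value type, not the proposed JSONValue.
alias Value = Algebraic!(long, string, bool);

string describe(Value v)
{
    // peek!T returns a pointer to the payload, or null if the type differs.
    if (v.peek!long !is null)
        return "integer";
    else if (v.peek!string !is null)
        return "string";
    // The bool branch is missing, yet this still compiles.
    else
        return "unknown";
}

void main()
{
    writeln(describe(Value(42L)));
    writeln(describe(Value("hi")));
    writeln(describe(Value(true)));
}
```

The third call falls through to "unknown" at run time, which is the kind of omission an enum tag with `final switch` would report at compile time.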

>
>> - A hypothesis is that it is faster, because there is no function call
>> indirection involved.
>
> Again: pointers do all that integrals do. To compare:
>
> if (myptr == ThePtrOf!int) { ... this is an int ... }
>
> I want to make clear that this is understood.

Got that.
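One way the quoted `myptr == ThePtrOf!int` comparison could look in practice, as a sketch only: `thePtrOf` below is a hypothetical helper that hands out one distinct global address per type, and `Value` is a toy two-type union. This is not how `std.variant` is implemented.

```d
import std.stdio : writeln;

// Hypothetical helper: each instantiation owns its own global byte,
// so its address can serve as a unique tag for T.
const(void)* thePtrOf(T)()
{
    __gshared ubyte marker;
    return &marker;
}

// Toy tagged union using the address as its tag.
struct Value
{
    const(void)* tag;
    union
    {
        int i;
        double d;
    }

    this(int v)    { tag = thePtrOf!int();    i = v; }
    this(double v) { tag = thePtrOf!double(); d = v; }
}

string describe(Value v)
{
    // Plain pointer comparisons, no indirect call involved.
    if (v.tag == thePtrOf!int())
        return "int";
    if (v.tag == thePtrOf!double())
        return "double";
    return "unknown";
}

void main()
{
    writeln(describe(Value(1)));
    writeln(describe(Value(1.5)));
}
```

The comparison itself is as cheap as an integer compare, but the tags are arbitrary addresses, so dispatch over many types stays a chain of comparisons.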

>
>> - It naturally enables fully statically typed operator forwarding as far
>> as possible (have a look at the examples of the current version). A
>> pointer based version could do this, too, but only by jumping through
>> hoops.
>
> I'm unclear on that. Could you please point me to the actual file and
> lines?

See the operator implementation code [1] that is completely statically typed until the final "switch" happens [2]. You can of course do the same for the pointer based Algebraic, but that would just duplicate/override the code that is already implemented by the pointer method.

>> - The same type can be used multiple times with a different enum name.
>> This can alternatively be solved using a Typedef!T, but I had several
>> occasions where that proved useful.
>
> Unclear on this.

I'd say this is just a little perk of the representation but not a hard argument since it can be achieved in a different way relatively easily.

[1]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L145
[2]: https://github.com/s-ludwig/taggedalgebraic/blob/591b45ca8f99dbab1da966192c67f45354c1e34e/source/taggedalgebraic.d#L551

On 2015-08-18 17:18, Andrei Alexandrescu wrote:
> Me neither if internal. I do see a problem if it's public. -- Andrei

If it's public and those 20 lines are useful on its own, I don't see a problem with that either.

--
/Jacob Carlborg

On 8/18/15 1:24 PM, Jacob Carlborg wrote:
> On 2015-08-18 17:18, Andrei Alexandrescu wrote:
>
>> Me neither if internal. I do see a problem if it's public. -- Andrei
>
> If it's public and those 20 lines are useful on its own, I don't see a
> problem with that either.

In this case at least they aren't. There is no need to import the JSON exception and the JSON location without importing anything else JSON. -- Andrei

On 19-Aug-2015 04:58, Andrei Alexandrescu wrote:
> On 8/18/15 1:24 PM, Jacob Carlborg wrote:
>> On 2015-08-18 17:18, Andrei Alexandrescu wrote:
>>
>>> Me neither if internal. I do see a problem if it's public. -- Andrei
>>
>> If it's public and those 20 lines are useful on its own, I don't see a
>> problem with that either.
>
> In this case at least they aren't. There is no need to import the JSON
> exception and the JSON location without importing anything else JSON. --
> Andrei
>

To catch it?

Generally I agree - just merge things sensibly. There could be a traits.d/primitives.d module, should it define isXYZ constraints and other lightweight interface-only entities.

--
Dmitry Olshansky

On 19.08.2015 at 03:58, Andrei Alexandrescu wrote:
> On 8/18/15 1:24 PM, Jacob Carlborg wrote:
>> On 2015-08-18 17:18, Andrei Alexandrescu wrote:
>>
>>> Me neither if internal. I do see a problem if it's public. -- Andrei
>>
>> If it's public and those 20 lines are useful on its own, I don't see a
>> problem with that either.
>
> In this case at least they aren't. There is no need to import the JSON
> exception and the JSON location without importing anything else JSON. --
> Andrei
>

The only other module where it would fit would be lexer.d, but that means that importing JSONValue also has to import the parser and lexer modules, which is usually only needed in a few places.

On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
>
> - JSONValue should offer a byToken range, which offers the contents of
> the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
> token followed by three numeric tokens with the respective values
> followed by the ']' token.

What about the comma tokens?

On 8/19/15 8:42 AM, Timon Gehr wrote:
> On 08/18/2015 12:21 AM, Andrei Alexandrescu wrote:
>>
>> - JSONValue should offer a byToken range, which offers the contents of
>> the value one token at a time. For example, "[ 1, 2, 3 ]" offers the '['
>> token followed by three numeric tokens with the respective values
>> followed by the ']' token.
>
> What about the comma tokens?

Forgot about those. The invariant is that byToken should return a sequence of tokens that, when parsed, produces the originating object. -- Andrei
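The round-trip invariant stated here can be illustrated with a small sketch. `Token`, `TokenKind`, and `render` are hypothetical stand-ins, not the proposed byToken API; the check uses the existing `std.json.parseJSON` only to count the elements of the reparsed array.

```d
import std.json : parseJSON;
import std.stdio : writeln;

// Hypothetical token model, just enough for "[ 1, 2, 3 ]".
enum TokenKind { arrayStart, arrayEnd, comma, number }

struct Token
{
    TokenKind kind;
    string text;
}

// Concatenate token texts back into JSON source, optionally dropping commas.
string render(const Token[] tokens, bool keepCommas = true)
{
    string result;
    foreach (t; tokens)
    {
        if (!keepCommas && t.kind == TokenKind.comma)
            continue;
        result ~= t.text;
    }
    return result;
}

void main()
{
    auto tokens = [
        Token(TokenKind.arrayStart, "["),
        Token(TokenKind.number, "1"), Token(TokenKind.comma, ","),
        Token(TokenKind.number, "2"), Token(TokenKind.comma, ","),
        Token(TokenKind.number, "3"),
        Token(TokenKind.arrayEnd, "]"),
    ];

    // With commas the text reparses to a three-element array.
    writeln(parseJSON(render(tokens)).array.length);
    // Without them the text becomes "[123]", a one-element array.
    writeln(parseJSON(render(tokens, false)).array.length);
}
```

Dropping the comma tokens changes what the sequence parses to, which is why the invariant implies they have to be part of the token stream.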
