std.uuid: Update 4 (page 5)

On 17-06-2012 10:22, Johannes Pfau wrote: > Am Sun, 17 Jun 2012 12:15:00 +0400 > schrieb Dmitry Olshansky<dmitry.olsh@gmail.com>: > >> On 17.06.2012 12:04, Johannes Pfau wrote: >>> Am Sat, 16 Jun 2012 21:11:51 +0400 >>> schrieb Dmitry Olshansky<dmitry.olsh@gmail.com>: >>>> >>>> Ah and another way to go about it is: >>>> union { >>>> ubyte[16] uuid; >>>> size_t[16/size_t.sizeof] by_word; >>>> } >>>> >>> >>> Isn't that an optimization which should really be done by the >>> compiler? It already knows that it's supposed to compare two >>> ubyte[16]... >> >> It knows that you compare two ubyte[16] that it. >> It easily might miss the fact that one of them is always 0 in all 16 >> cells. >> >>> >>> Also how could the union solution be used without having to copy the >>> data? >> >> There is no copy it union, aka overlapped storage. In other words as >> these 16 bytes represented as (on 32bit) 4 size_t. They share memory >> location. It doesn't play nice with CTFE though, as it thinks unions >> to be plain struct last time I checked. >> > > Yes, I thought about using the union nested in the empty function, so a > copy would have been needed: > > bool empty() > { > union > { > ubyte[16] uuid; > .... > } > uuid = data; //copy > } > > I didn't know that it's possible to make members of a union private > though, so I could use this in the UUID struct: > union > { > ubyte[16] data; > private size_t[16/size_t.sizeof] by_word; > } > > which indeed wouldn't require copying. However, with this code added > std.uuid fails some unrelated ctfe UUID comparison unittests... I wouldn't expect unions to be CTFE-safe at all. They allow reinterpreting anything as anything, which could screw everything up in a managed environment like the CTFE interpreter. Anyway, isn't this optimizing something that really doesn't need optimization? I'd be very surprised if this shows up in any profiling results at all. -- Alex Rønne Petersen alex@lycus.org http://lycus.org

June 17, 2012

Re: Review: std.uuid

Posted by Johannes Pfau
in reply to Jonathan M Davis

Permalink

Johannes Pfau

Posted in reply to Jonathan M Davis

Permalink

Am Sun, 17 Jun 2012 01:10:16 -0700
schrieb Jonathan M Davis <jmdavisProg@gmx.com>:

> I seriously question that it will _ever_ be anything worse then Mt19997, but if you're that worried about it, maybe you should add a static assert that Random is Mt19997? If you want to leave it as-is, then you should probably at least put a comment in there as to why you're using Mt19997 instead of just using rndGen, otherwise, someone may come along and change it to use rndGen later.

OK, I added the static assert and use rndGen now.
I'll have to update
the pull request so that the new seeding method is used in rndGen
though.

> > > So, I don't see what you're doing that's any different from just doing
> > > 
> > > return randomUUID(rndGen());
> > > 
> > > aside from the fact that you seem to have an impossible line of code for seeding your random number generator. And trying to compile that line on my computer actually fails (as I'd expect), so I don't know how it's compiling on yours.
> > 
> > Yes, that code requires a new overload for seed, see
> > https://github.com/D-Programming-Language/phobos/pull/627
> > And yes, it's generating a infinite range of unpredictableSeed
> > (Mt19937 uses 624 values from that range)
> > 
> > I don't know anything about RNGs, but Andrew Talbot suggested in the thread called "Mersenne Twister Seeding and UUIDs" that we need this new, better seeding method.
> 
> Ah, okay. Apparently I missed that pull request. I'll have to look it over, particularly since I seem to be pretty much the only one merging anything in of late (particularly since Andrei has been too busy to do much with D lately).
> 
> > > > I don't think that's a good idea. If the randomGen.front the is e.g. ubyte, casting it to uint will produce 3 bytes which are not set to a random value. That's not acceptable.
> > > > 
> > > > ------------
> > > > @trusted UUID randomUUID(RNG)(ref RNG randomGen)
> > > > if(isUniformRNG!(RNG) &&
> > > > 
> > > >     isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof
> > > > <=
> > > > 
> > > > 16) {
> > > > 
> > > >     enum elemSize = typeof(RNG.front).sizeof;
> > > >     UUID u;
> > > >     foreach(size_t i; iota(0, 16, elemSize))
> > > >     {
> > > > 
> > > >         randomGen.popFront();
> > > >         immutable randomValue = randomGen.front;
> > > >         u.data[i .. i + elemSize] =
> > > > 
> > > > *cast(ubyte[elemSize]*)&randomValue; }
> > > > ------------
> > > > ?
> > > 
> > > Looks good except for the fact that the typeof(RNG.front).sizeof <= 16 is completely unnecessary. isIntgeral guarantees that the type is a built-in integral type, the largest of which is 8 bytes (and even if we added cent and ucent, it would still be only 16, not greater than 16).
> > 
> > I'll make it a static assert. It might not be necessary, but it at least documents that elemSize can't be > 16 and it's another safety measure.
> 
> But it's _completely_ unnecessary. You've already guaranteed it with isIntegral. I guess that you can leave it there if you want to, but as far as I can see, it's pointless.
> 
> > > > > * For your functions which can take range types, test with filter!"true" so that you get a range which isn't an array or string.
> > > > 
> > > > Does take!(string, int) return a slice? Should have thought
> > > > about that....
> > > 
> > > No. It doesn't, because hasSlicing!string is false.
> > > 
> > > > I can't use filter (and map) though:
> > > > std/algorithm.d(1141): Error: nested structs with constructors
> > > > are not yet supported in CTFE (Bug 6419) std/uuid.d(1273):
> > > > called from here:
> > > > filter("00000000-0000-0000-0000-000000000000"d)
> > > 
> > > filter!"true"("00000000-0000-0000-0000-000000000000") should compile just fine. You then pass that to your range-based uuid functions which accept ranges or dchar. What does CTFE have to do with it? That would only be an issue if you were trying to use filter in CTFE, and even if you have CTFE tests where you'd like to do that and can't due to that bug, there are plenty of tests that you could run which weren't CTFE. I think that one or both of us is misunderstanding something here.
> > 
> > I _do_ have CTFE tests. parseUUID and UUID.this(string) are guaranteed to work in CTFE as well. And if I have to use a custom Range for the CTFE tests anyway, why not use it for all tests? (This also allows me to explicitly test Forward and Input Ranges)
> 
> If you have to create other ranges for your tests anyway, then that's fine. But you seemed to be saying that filter didn't work at all, and it should.
> 
> > > > Thanks for your feedback!
> > > 
> > > Also, I just noticed that it looks like you don't have any tests which verify that empty is false when it's supposed to be. You should probably add a few.
> > 
> > There's one such test, but I can add some more.
> 
> You probably don't need a lot, but it does need to be tested, and I didn't see any tests for empty which were on a UUID which wasn't supposed to be empty.
> 
> > > And by the way, before this actually gets merged into Phobos (as I'm guessing that it will be, since no way seems to particularly dislike the module thus far), you should fix the ddoc to use LREF and XREF when referencing symbols elsewhere in the module and elsewhere in std respectively.
> > 
> > OK, I replaced some $(D ) with XREF/LREF but MYREF1,2,3 probably
> > have to stay. Those are needed for the table (at least it's done
> > that way in std.algorithm)
> 
> Stuff for the table is fine. It's primarily an issue of the ddoc directly on functions which you've been using $(D ) with, since that doesn't create links to anything.
> 
> - Jonathan M Davis

On 17.06.2012 12:36, Alex Rønne Petersen wrote: > On 17-06-2012 10:22, Johannes Pfau wrote: >> Am Sun, 17 Jun 2012 12:15:00 +0400 >> schrieb Dmitry Olshansky<dmitry.olsh@gmail.com>: >> >>> On 17.06.2012 12:04, Johannes Pfau wrote: >>>> Am Sat, 16 Jun 2012 21:11:51 +0400 >>>> schrieb Dmitry Olshansky<dmitry.olsh@gmail.com>: >>>>> >>>>> Ah and another way to go about it is: >>>>> union { >>>>> ubyte[16] uuid; >>>>> size_t[16/size_t.sizeof] by_word; >>>>> } >>>>> >>>> >>>> Isn't that an optimization which should really be done by the >>>> compiler? It already knows that it's supposed to compare two >>>> ubyte[16]... >>> >>> It knows that you compare two ubyte[16] that it. >>> It easily might miss the fact that one of them is always 0 in all 16 >>> cells. >>> >>>> >>>> Also how could the union solution be used without having to copy the >>>> data? >>> >>> There is no copy it union, aka overlapped storage. In other words as >>> these 16 bytes represented as (on 32bit) 4 size_t. They share memory >>> location. It doesn't play nice with CTFE though, as it thinks unions >>> to be plain struct last time I checked. >>> >> >> Yes, I thought about using the union nested in the empty function, so a >> copy would have been needed: >> >> bool empty() >> { >> union >> { >> ubyte[16] uuid; >> .... >> } >> uuid = data; //copy >> } >> >> I didn't know that it's possible to make members of a union private >> though, so I could use this in the UUID struct: >> union >> { >> ubyte[16] data; >> private size_t[16/size_t.sizeof] by_word; >> } >> >> which indeed wouldn't require copying. However, with this code added >> std.uuid fails some unrelated ctfe UUID comparison unittests... > Then I suggest to just go with local performance hack: bool empty() nothrow @safe ... { if(__ctfe) return find!"a!=0"(data).empty; //simple size_t* p = cast(size_t*)data.ptr; static if(size_t.szieof == 4) return p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 0; else static if(size_t.sizeof == 8) return p[0] == 0 && [1] == 0; else static assert(false, "nonsense, it's not 32 or 64 bit"); } > I wouldn't expect unions to be CTFE-safe at all. They allow > reinterpreting anything as anything, which could screw everything up in > a managed environment like the CTFE interpreter. > > Anyway, isn't this optimizing something that really doesn't need > optimization? I'd be very surprised if this shows up in any profiling > results at all. > The "pleasure" of writing standard library is that you'll never know what is bottleneck. All depends on users. And the more you have them the greater impact of things that have "one in a million" probability. -- Dmitry Olshansky

Forums