Jump to page: 1 2
Thread overview
Reference semantic ranges and algorithms (and std.random)
Sep 18, 2012
monarch_dodra
Sep 18, 2012
Jonathan M Davis
Sep 20, 2012
monarch_dodra
Sep 20, 2012
Jonathan M Davis
Sep 20, 2012
monarch_dodra
Sep 20, 2012
Johannes Pfau
Sep 20, 2012
monarch_dodra
Sep 20, 2012
Johannes Pfau
Sep 20, 2012
monarch_dodra
Sep 20, 2012
Jonathan M Davis
Sep 21, 2012
monarch_dodra
Sep 21, 2012
monarch_dodra
Sep 21, 2012
Jonathan M Davis
Sep 21, 2012
monarch_dodra
Sep 22, 2012
jerro
Sep 21, 2012
Jesse Phillips
Sep 25, 2012
monarch_dodra
Oct 26, 2012
monarch_dodra
September 18, 2012
I've been developing a PRNG recently, and studying how it behaves with phobos' other stuff, in particular, range and algorithm.

I quickly came to the conclusion that if a PRNG does not have reference semantics, it is just as good as useless.

Why? Because everything in phobos is passed "by value". For example:

--------
import std.random, std.stdio, std.range, std.algorithm;

void main()
{
    auto rng = MinstdRand(); rng.seed();
    auto r1 = rng.take(5).array();
    auto r2 = rng.take(5).array();
    writeln(r1);
    writeln(r2);
    copy(rng.take(5), r1);
    copy(rng.take(5), r2);
    writeln(r1);
    writeln(r2);
}
--------
[48271, 182605794, 1291394886, 1914720637, 2078669041]
[48271, 182605794, 1291394886, 1914720637, 2078669041]
[48271, 182605794, 1291394886, 1914720637, 2078669041]
[48271, 182605794, 1291394886, 1914720637, 2078669041]
--------
As you can see from the ouput, that is not very random. That's just the "tip of the iceberg". *Anything* in phobos that iterates on a range, such a fill, filter, or whatever, will not advance the PRNG, arguably breaking it.

At best, a "global" PRNG will work, provided it is never ever passed as an argument to a method.

This is issue #1: I'd propose that all objects in std.random be migrated to classes (or be made reference structs), sooner than later. This might break some code, so I do not know how this is usually done, but I think it is necessary. I do not, however, propose that they should all derive from a base class.

The second issue I've run into (issue #2) is that even with reference semantics, there are still some issues regarding phobo's behavior with references ranges: Completly un-documented and inconsistent.

For example, if one were to call copy(source, target), would "source" be advanced? would "target" be advanced? Depends really. You'd be surprised too: For example, fill, with an infinite range, would save the filler before starting.

I was making a pull for optimizing fill/copy, and the thing kind of exploded in my face. I decided to take this opportunity to try to formalize and document this behavior.

I was hoping for some info/feedback/suggestions...
The pull request is here:
https://github.com/D-Programming-Language/phobos/pull/802
September 18, 2012
On Tuesday, September 18, 2012 17:05:26 monarch_dodra wrote:
> This is issue #1: I'd propose that all objects in std.random be migrated to classes (or be made reference structs), sooner than later. This might break some code, so I do not know how this is usually done, but I think it is necessary. I do not, however, propose that they should all derive from a base class.

Moving to classes would definitely break code, but it should be possible to make them reference types simply by making it so that their internal state is in a separate object held by a pointer.

> The second issue I've run into (issue #2) is that even with reference semantics, there are still some issues regarding phobo's behavior with references ranges: Completly un-documented and inconsistent.

Reference-type ranges are not properly tested by Phobos in general, so they just plain won't work in a number of cases. I've done some work in fixing that, but there's plenty more to do. I really need to fiinish the unit test helper stuff that I've been doing for ranges and get that into Phobos. It's almost done, but I was running into weird build errors and haven't sorted them out yet. Once that's done, it should be much easier to test a larger range of range types (including reference types). Regardless, std.range and std.algorithm in particular need to be properly tested for reference range types and the related bugs fixed.

- Jonathan M Davis
September 20, 2012
On Tuesday, 18 September 2012 at 17:59:04 UTC, Jonathan M Davis wrote:
> On Tuesday, September 18, 2012 17:05:26 monarch_dodra wrote:
>> This is issue #1: I'd propose that all objects in std.random be
>> migrated to classes (or be made reference structs), sooner than
>> later. This might break some code, so I do not know how this is
>> usually done, but I think it is necessary. I do not, however,
>> propose that they should all derive from a base class.
>
> Moving to classes would definitely break code, but it should be possible to
> make them reference types simply by making it so that their internal state is
> in a separate object held by a pointer.

I was thinking of doing that. The problem with this (as I've run into and stated in another thread), is a problem of initialization: The simpler PRNGs are init'ed seeded, and are ready for use immediately. Changing to this approach would break the initialization, as shown in this post:

http://forum.dlang.org/thread/bvuquzwfykiytdwsqkky@forum.dlang.org#post-yvtsivozyhqzscgddbrl:40forum.dlang.org

A "used to be valid" PRNG has now become an un-initialized PRNG". This is extremely insidious, as the code still compiles, but will crash.

I'm hoping my thread goes somewhere, but since I doubt it, I've thought up of two other solutions for a "clean" migration:

#1
Keep as struct (as you suggest), but use the opCall hack. I HATE this solution.

#2
Change to class, but leave behind some "opCall"s for each old constructor, plus an extra one for default:
class A
{
    static A opCall(){return new A();}
    static A opCall(int i){return new A(i);}

    this(){}
    this(int){}

    /// ...
}

void main()
{
    A a1 = A();  //Works AND constructs
    A a2 = A(5); //Still Works
    A a3 = new A();  //Works
    A a4 = new A(5); //Works
}

In this case, I'm fine with having some opCalls, because technically, they aren't mixed with the constructors and used as hacks: The constructors are all there, and the opCalls, merelly forward to them.

That said, they *would* be destined for deprecation.

Use of "A a;" would create a problem though, but nothing's perfect...

Is this second solution something you think I should look into?
September 20, 2012
On Thursday, September 20, 2012 08:51:28 monarch_dodra wrote:
> On Tuesday, 18 September 2012 at 17:59:04 UTC, Jonathan M Davis
> 
> wrote:
> > On Tuesday, September 18, 2012 17:05:26 monarch_dodra wrote:
> >> This is issue #1: I'd propose that all objects in std.random be migrated to classes (or be made reference structs), sooner than later. This might break some code, so I do not know how this is usually done, but I think it is necessary. I do not, however, propose that they should all derive from a base class.
> > 
> > Moving to classes would definitely break code, but it should be
> > possible to
> > make them reference types simply by making it so that their
> > internal state is
> > in a separate object held by a pointer.
> 
> I was thinking of doing that. The problem with this (as I've run into and stated in another thread), is a problem of initialization: The simpler PRNGs are init'ed seeded, and are ready for use immediately. Changing to this approach would break the initialization, as shown in this post:
> 
> http://forum.dlang.org/thread/bvuquzwfykiytdwsqkky@forum.dlang.org#post-yvts ivozyhqzscgddbrl:40forum.dlang.org
> 
> A "used to be valid" PRNG has now become an un-initialized PRNG". This is extremely insidious, as the code still compiles, but will crash.

There's always the check that the internals have been initialized on every call and initialize it if it hasn't been solution. It's not pretty, but it won't break code. It's actually a use case that makes me wish that we had something like the invariant which ran before every public function call except that it was always there (even in -release) and let you do anything you want.

In any case, while it's a bit ugly, I believe that simply adding checks for initialization in every function call is the cleanest solution from the standpoint of backwards compatibility, and the ugliness is all self-contained. As far as performance goes, it's only an issue if you're iterating over it in a tight loop, but the actual random number generation is so much more expensive than a check for a null pointer that it probably doesn't matter.

> #2
> Change to class, but leave behind some "opCall"s for each old
> constructor, plus an extra one for default:

> Is this second solution something you think I should look into?

Since

A a;

will just blow up in your face if you switch it to a class, it's not a non- breaking change even as a migration path, so I don't see that as really being viable. Even if you've found a way to minimize the immediate code breakage, you didn't eliminate it. If you're going to break code immediately, you might as well just break it all at once and get people to fix their stuff rather than mostly fix it but not quite, especially when you're asking them to change their code later anyway as part of a migration path.

Regardless, when this came up previously, I believe that the conclusion was that if we were going to switch to classes, we needed to do something like create std.random2 and schedule std.random for deprecation rather than changing the current structs to classes (either that or rename _every_ type in there and schedule them for deprecation individually, but then you have to come up for new names for everything, and it's more of a pain to migrate, since all the names changed rather than just the import). So, I believe that the idea of switching to classes was pretty much rejected previously unless entirely new types were used so that no code would be broken.

I think that we have two options at this point:

1. Switch the internals so that they're in a separate struct pointed to by the outer struct and check for initialization on every function call to avoid the problem where init was used.

2. Create a new module to replace std.random and make them final classes in there, scheduling the old module for deprecation.

Honestly, I'd just go with #1 at this point, because it avoids breaking code, and there's increasing resistance to breaking code. Even Andrei, who was fairly willing to break code for improvements before, is almost paranoid about it now, and Walter was _always_ against it. So, if we have a viable solution that avoids breaking code (especially if any ugliness that comes with it is internal to the implementation), we should probably go with that.

- Jonathan M Davis
September 20, 2012
On Thursday, 20 September 2012 at 07:26:01 UTC, Jonathan M Davis wrote:
> On Thursday, September 20, 2012 08:51:28 monarch_dodra wrote:
>> On Tuesday, 18 September 2012 at 17:59:04 UTC, Jonathan M Davis
>> 
>> wrote:
>> > On Tuesday, September 18, 2012 17:05:26 monarch_dodra wrote:
>> >> This is issue #1: I'd propose that all objects in std.random be
>> >> migrated to classes (or be made reference structs), sooner than
>> >> later. This might break some code, so I do not know how this is
>> >> usually done, but I think it is necessary. I do not, however,
>> >> propose that they should all derive from a base class.
>> > 
>> > Moving to classes would definitely break code, but it should be
>> > possible to
>> > make them reference types simply by making it so that their
>> > internal state is
>> > in a separate object held by a pointer.
>> 
>> I was thinking of doing that. The problem with this (as I've run
>> into and stated in another thread), is a problem of
>> initialization: The simpler PRNGs are init'ed seeded, and are
>> ready for use immediately. Changing to this approach would break
>> the initialization, as shown in this post:
>> 
>> http://forum.dlang.org/thread/bvuquzwfykiytdwsqkky@forum.dlang.org#post-yvts
>> ivozyhqzscgddbrl:40forum.dlang.org
>> 
>> A "used to be valid" PRNG has now become an un-initialized PRNG".
>> This is extremely insidious, as the code still compiles, but will
>> crash.
>
> There's always the check that the internals have been initialized on every
> call and initialize it if it hasn't been solution. It's not pretty, but it
> won't break code. It's actually a use case that makes me wish that we had
> something like the invariant which ran before every public function call
> except that it was always there (even in -release) and let you do anything you
> want.
>
> In any case, while it's a bit ugly, I believe that simply adding checks for
> initialization in every function call is the cleanest solution from the
> standpoint of backwards compatibility, and the ugliness is all self-contained.
> As far as performance goes, it's only an issue if you're iterating over it in
> a tight loop, but the actual random number generation is so much more
> expensive than a check for a null pointer that it probably doesn't matter.
>
>> #2
>> Change to class, but leave behind some "opCall"s for each old
>> constructor, plus an extra one for default:
>
>> Is this second solution something you think I should look into?
>
> Since
>
> A a;
>
> will just blow up in your face if you switch it to a class, it's not a non-
> breaking change even as a migration path, so I don't see that as really being
> viable. Even if you've found a way to minimize the immediate code breakage,
> you didn't eliminate it. If you're going to break code immediately, you might
> as well just break it all at once and get people to fix their stuff rather than
> mostly fix it but not quite, especially when you're asking them to change their
> code later anyway as part of a migration path.
>
> Regardless, when this came up previously, I believe that the conclusion was
> that if we were going to switch to classes, we needed to do something like
> create std.random2 and schedule std.random for deprecation rather than
> changing the current structs to classes (either that or rename _every_ type in
> there and schedule them for deprecation individually, but then you have to
> come up for new names for everything, and it's more of a pain to migrate,
> since all the names changed rather than just the import). So, I believe that
> the idea of switching to classes was pretty much rejected previously unless
> entirely new types were used so that no code would be broken.
>
> I think that we have two options at this point:
>
> 1. Switch the internals so that they're in a separate struct pointed to by the
> outer struct and check for initialization on every function call to avoid the
> problem where init was used.
>
> 2. Create a new module to replace std.random and make them final classes in
> there, scheduling the old module for deprecation.
>
> Honestly, I'd just go with #1 at this point, because it avoids breaking code,
> and there's increasing resistance to breaking code. Even Andrei, who was
> fairly willing to break code for improvements before, is almost paranoid about
> it now, and Walter was _always_ against it. So, if we have a viable solution
> that avoids breaking code (especially if any ugliness that comes with it is
> internal to the implementation), we should probably go with that.
>
> - Jonathan M Davis

TY for your insight. Good points. I'll try to do your "#1": It is simple and non breaking. *IF* we ever do decide to break, and rather it be done after a very thourough discussion of requirements, specifications, migration path, etc...
September 20, 2012
Am Thu, 20 Sep 2012 08:51:28 +0200
schrieb "monarch_dodra" <monarchdodra@gmail.com>:

> > Moving to classes would definitely break code, but it should be
> > possible to
> > make them reference types simply by making it so that their
> > internal state is
> > in a separate object held by a pointer.
> 
> I was thinking of doing that. The problem with this (as I've run into and stated in another thread), is a problem of initialization: The simpler PRNGs are init'ed seeded, and are ready for use immediately. Changing to this approach would break the initialization, as shown in this post:

How would the internal state be allocated? malloc + refcounting or GC?

I'm not really happy about that as I'd like to avoid allocation (especially GC) whenever possible. Using a PRNG only locally in a function seems like a valid use case to me and it currently doesn't require allocation. As the similar problem with std.digest/std.algorithm.copy has showed value type ranges seem not very well supported in phobos.

I wonder why we don't pass all struct/value type ranges by ref? Is there a reason this wouldn't work?

Or we could leave the PRNGs as is and provide reference wrappers. This would allow to place the PRNG on the stack and still make it work with all range functions. But it's also cumbersome and error prone.

Best solution is probably to have wrappers which allocate by default, but also allow to pass a reference to a stack allocated value. Then make the wrappers default, so the default's safe and easy to use.

Pseudo-code:

struct RNG_Impl
{
    uint front();
    void popFront();
    bool empty();
}

struct RNG
{
    RNG_Impl* impl;

    this(ref RNG_Impl impl)
        impl = cast(RNG_Impl*) impl;

    void initialize()
    {
        assert(!impl); //Either initialize or constructor
        impl = new RNG_Impl();
    }
}

RNG_Impl impl;
RNG(impl).take(5); //No allocation (but must not leak references...)

Regarding the initialization check: I'd avoid the check in release mode. Not initializing a struct is a developer mistake and should be found in debug mode. I think it's unlikely that error handling code can handle such a situation anyway. But you could check how std.typecons.RefCounted handles this, as it also need explicit initialization.
September 20, 2012
On Thursday, 20 September 2012 at 09:58:41 UTC, Johannes Pfau wrote:
> Am Thu, 20 Sep 2012 08:51:28 +0200
> schrieb "monarch_dodra" <monarchdodra@gmail.com>:
>
>> > Moving to classes would definitely break code, but it should be possible to
>> > make them reference types simply by making it so that their internal state is
>> > in a separate object held by a pointer.
>> 
>> I was thinking of doing that. The problem with this (as I've run into and stated in another thread), is a problem of initialization: The simpler PRNGs are init'ed seeded, and are ready for use immediately. Changing to this approach would break the initialization, as shown in this post:
>
> How would the internal state be allocated? malloc + refcounting or GC?
>
> I'm not really happy about that as I'd like to avoid allocation
> (especially GC) whenever possible. Using a PRNG only locally in a
> function seems like a valid use case to me and it currently doesn't
> require allocation. As the similar problem with
> std.digest/std.algorithm.copy has showed value type ranges seem not
> very well supported in phobos.
>
> I wonder why we don't pass all struct/value type ranges by ref? Is
> there a reason this wouldn't work?
>
> Or we could leave the PRNGs as is and provide reference wrappers. This
> would allow to place the PRNG on the stack and still make it work with
> all range functions. But it's also cumbersome and error prone.
>
> Best solution is probably to have wrappers which allocate by default,
> but also allow to pass a reference to a stack allocated value. Then
> make the wrappers default, so the default's safe and easy to use.
>
> Pseudo-code:
>
> struct RNG_Impl
> {
>     uint front();
>     void popFront();
>     bool empty();
> }
>
> struct RNG
> {
>     RNG_Impl* impl;
>
>     this(ref RNG_Impl impl)
>         impl = cast(RNG_Impl*) impl;
>
>     void initialize()
>     {
>         assert(!impl); //Either initialize or constructor
>         impl = new RNG_Impl();
>     }
> }
>
> RNG_Impl impl;
> RNG(impl).take(5); //No allocation (but must not leak references...)
>
> Regarding the initialization check: I'd avoid the check in release
> mode. Not initializing a struct is a developer mistake and should be
> found in debug mode. I think it's unlikely that error handling code can
> handle such a situation anyway. But you could check how
> std.typecons.RefCounted handles this, as it also need explicit
> initialization.

That is a good point, I'd also make the "default" heap allocated, but give a way to access a stack allocated payload.

Regarding the "developer mistake", the problem is that currently:
"Misnstdrand a;"
Will create a valid and seeded PRNG that is ready for use, so we can't break that.

Arguably though, the argument holds for Mersenne twister, which needs a function call to be (default) seeded.

HOWEVER I find that:
auto a = MersenneTwister();  //Not seeded and invalid
auto b = MersenneTwister(5); //Seeded and valid
Is a confusing, especially since MersenneTwister provides "seed()" to default seed.

I'd rather have:
auto a = MersenneTwister();  //Not *yet* seeded: It will be done on the fly...
auto b = MersenneTwister(5); //Seeded and valid

But AGAIN, on the other hand, if you change back again to your proposed stack allocated MersenneTwisterImpl:
auto a = MersenneTwister();  //Not Seeded, but not assertable
auto b = MersenneTwister(5); //Seeded and valid

It really feels like there is no perfect solution.

********
The truth is that I would have rather ALL prngs not have ANY constructors, and that they ALL required an explicit seed: This would be uniform, and not have any surprises.

OR

That creating a prng would seed it with the default seed if nothing is specified, meaning that a prng is ALWAYS valid.

It feels like the current behavior is a bastardly hybrid...
September 20, 2012
Am Thu, 20 Sep 2012 12:23:12 +0200
schrieb "monarch_dodra" <monarchdodra@gmail.com>:
> 
> That is a good point, I'd also make the "default" heap allocated, but give a way to access a stack allocated payload.
> 
> Regarding the "developer mistake", the problem is that currently:
> "Misnstdrand a;"
> Will create a valid and seeded PRNG that is ready for use, so we
> can't break that.

Well for by reference types we'd need intialization, so there's probably no way not to break it. We'd probably have to do the initialization check in release mode for some phobos releases. But after some time it should be changed to a debug-only warning.

> 
> Arguably though, the argument holds for Mersenne twister, which needs a function call to be (default) seeded.
> 
> HOWEVER I find that:
> auto a = MersenneTwister();  //Not seeded and invalid
> auto b = MersenneTwister(5); //Seeded and valid
> Is a confusing, especially since MersenneTwister provides
> "seed()" to default seed.
> 
> I'd rather have:
> auto a = MersenneTwister();  //Not *yet* seeded: It will be done
> on the fly...
> auto b = MersenneTwister(5); //Seeded and valid
> 
> But AGAIN, on the other hand, if you change back again to your
> proposed stack allocated MersenneTwisterImpl:
> auto a = MersenneTwister();  //Not Seeded, but not assertable
> auto b = MersenneTwister(5); //Seeded and valid
> 
> It really feels like there is no perfect solution.
> 
> ********
> The truth is that I would have rather ALL prngs not have ANY
> constructors, and that they ALL required an explicit seed: This
> would be uniform, and not have any surprises.
> 
> OR
> 
> That creating a prng would seed it with the default seed if nothing is specified, meaning that a prng is ALWAYS valid.
> 
> It feels like the current behavior is a bastardly hybrid...

That's what I did in std.digest. All Digests have a start() method, even if it's not necessary for that specific Digest. It's probably a good solution if you want a uniform interface for types which need initialization and types which don't.

But there's also another, nice solution which should work: Introduce a new makeRNG template function. This func checks if the RNG has a seed function and calls it if available. Then you can do this for all RNGs:

auto rng = makeRNG!MersenneTwister();
auto rng = makeRNG!Misnstdrand();
September 20, 2012
On Thursday, 20 September 2012 at 10:46:22 UTC, Johannes Pfau wrote:
>
> That's what I did in std.digest. All Digests have a start() method,
> even if it's not necessary for that specific Digest. It's probably a
> good solution if you want a uniform interface for types which need
> initialization and types which don't.
>
> But there's also another, nice solution which should work: Introduce a
> new makeRNG template function. This func checks if the RNG has a seed
> function and calls it if available. Then you can do this for all RNGs:
>
> auto rng = makeRNG!MersenneTwister();
> auto rng = makeRNG!Misnstdrand();

These are good suggestions, but they are also breaking changes :/ I *AM* writing them down though, should we ever go down Jonathan M Davis's suggestion is changing the module.

BTW: std.container also has MakeContainter, but, AFAIK, I've never seen ANYONE use it :/
September 20, 2012
On Thursday, September 20, 2012 12:57:03 monarch_dodra wrote:
> BTW: std.container also has MakeContainter, but, AFAIK, I've never seen ANYONE use it :/

What std.container has is make, which is supposed to construct a type where it goes by default (classes on the heap with new, and structs on the stack). The idea is definitely useful, and I have a pull request to improve upon it:

https://github.com/D-Programming-Language/phobos/pull/756

But I don't think that I've ever seen anyone use std.container.make either. The new version will be more useful though (particularly, since it'll work on more than just structs and classes).

- Jonathan M Davis
« First   ‹ Prev
1 2