Interesting bug with std.random.uniform and dchar

June 08, 2014

Posted by Joseph Rushton Wakeling

Hello all,

Here's an interesting little bug that arises when std.random.uniform is called using dchar as the variable type:

/****************************************************************************/
import std.conv, std.random, std.stdio, std.string, std.typetuple;

void main()
{
    foreach (C; TypeTuple!(char, wchar, dchar))
    {
        writefln("Testing with %s: [%s, %s]", C.stringof, to!ulong(C.min), to!ulong(C.max));
        foreach (immutable _; 0 .. 100)
        {
            auto u = uniform!"[]"(C.min, C.max);

            assert(C.min <= u, format("%s.min = %s, u = %s", C.stringof, to!ulong(C.min), to!ulong(u)));
            assert(u <= C.max, format("%s.max = %s, u = %s", C.stringof, to!ulong(C.max), to!ulong(u)));
        }
    }
}
/****************************************************************************/

When closed boundaries "[]" are used with uniform, and the min and max of the distribution are equal to T.min and T.max (where T is the variable type), the integral/char-type uniform() makes use of an optimization, and returns

    std.random.uniform!ResultType(rng);

That is, it uses a specialization of uniform() for the case where one wants a random number drawn from all the possible bits of a given integral type.

With char and wchar (8- and 16-bit) this works fine.  However, dchar (32-bit) has a .max value smaller than the largest value its bits can represent: dchar.max is 1114111 (0x10FFFF), while 32 bits can hold values of up to 4294967295.

The first consequence is that uniform!"[]"(dchar.min, dchar.max) can return values greater than dchar.max, which is what trips the second assert in the test above.

A second consequence is that uniform!dchar (the all-the-bits specialization) will return invalid code points.
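To make the size of the mismatch concrete, here is a small sketch that computes what fraction of all 32-bit patterns fall inside dchar's declared range. The helper name dcharShare is made up for illustration and is not part of Phobos. Since dchar.max + 1 equals 17 * 65536, the fraction is 17/65536, or roughly 0.026%, so an all-the-bits draw would land above dchar.max almost every time.

```d
import std.stdio : writefln;

// Hypothetical helper: share of 32-bit patterns that are <= dchar.max.
double dcharShare()
{
    immutable double valid = cast(double) dchar.max + 1; // 0x110000 values
    immutable double all = cast(double) uint.max + 1;    // 2^32 patterns
    return valid / all;
}

void main()
{
    static assert(dchar.sizeof == 4);     // stored in 32 bits
    static assert(dchar.max == 0x10FFFF); // 1114111
    writefln("share of 32-bit patterns within dchar range: %.6f", dcharShare());
}
```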

I take it this is a consequence of dchar being something of an oddity as a data type -- while stored as an "integral-like" value, it doesn't actually make use of the full range of values available in its 32 bits (unlike char and wchar which make full use of their 8-bit and 16-bit range).

I think it should suffice to forbid uniform!T from accepting dchar parameters and to tweak the integral-type uniform()'s internal check to avoid calling that specialization with dchar.

Thoughts ... ?

Thanks & best wishes,

    -- Joe

Posted by monarch_dodra

On Sunday, 8 June 2014 at 08:54:30 UTC, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
> I think it should suffice to forbid uniform!T from accepting dchar parameters and to tweak the integral-type uniform()'s internal check to avoid calling that specialization with dchar.
>
> Thoughts ... ?
>
> Thanks & best wishes,
>
>     -- Joe

Why would we ban uniform!T from accepting dchar? I see no reason for that.

Let's just fix the bug by tweaking the internal check.

Posted by Joseph Rushton Wakeling

On 08/06/14 11:02, monarch_dodra via Digitalmars-d-learn wrote:
> Why would we ban uniform!T from accepting dchar? I see no reason for that.
>
> Let's just fix the bug by tweaking the internal check.

Yea, I came to the same conclusion while working on it. :-)

The solution I have is (i) in uniform!"[]" check that !is(ResultType == dchar) before checking the condition for calling uniform!ResultType, and (ii) inside uniform!T, place

    static if (is(T == dchar))
    {
        return uniform!"[]"(T.min, T.max, rng);
    }
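The two-part fix described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual Phobos source: allValues and closedRange are hypothetical stand-ins for uniform!T and uniform!"[]", and they delegate to std.random.uniform for the real sampling.

```d
import std.random : Random, uniform;

// Hypothetical stand-in for the all-values uniform!T specialization.
T allValues(T, Rng)(ref Rng rng)
{
    static if (is(T == dchar))
    {
        // (ii) dchar does not use every 32-bit pattern, so draw from
        // its declared range instead of from all bits.
        return closedRange!T(T.min, T.max, rng);
    }
    else
    {
        return uniform!T(rng);
    }
}

// Hypothetical stand-in for uniform!"[]"(a, b, rng) on integral/char types.
T closedRange(T, Rng)(T a, T b, ref Rng rng)
{
    // (i) skip the all-bits shortcut entirely when T is dchar.
    static if (!is(T == dchar))
    {
        if (a == T.min && b == T.max)
            return allValues!T(rng);
    }
    return uniform!"[]"(a, b, rng);
}

void main()
{
    auto rng = Random(42); // fixed seed, chosen arbitrarily
    foreach (immutable _; 0 .. 1000)
    {
        // Both entry points should now stay within dchar's range.
        assert(allValues!dchar(rng) <= dchar.max);
        assert(closedRange!dchar(dchar.min, dchar.max, rng) <= dchar.max);
    }
}
```

The static if in closedRange removes the shortcut at compile time for dchar, so the other character and integral types keep the optimization unchanged.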

Posted by H. S. Teoh

On Sun, Jun 08, 2014 at 11:17:41AM +0200, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
> On 08/06/14 11:02, monarch_dodra via Digitalmars-d-learn wrote:
> >Why would we ban uniform!T from accepting dchar? I see no reason for that.
> >
> >Let's just fix the bug by tweaking the internal check.
>
> Yea, I came to the same conclusion while working on it. :-)
>
> The solution I have is (i) in uniform!"[]" check that !is(ResultType == dchar) before checking the condition for calling uniform!ResultType, and (ii) inside uniform!T, place
>
>     static if (is(T == dchar))
>     {
>         return uniform!"[]"(T.min, T.max, rng);
>     }

Doesn't wchar need to have a similar specialization too? Aren't some values of wchar invalid as well?


T

-- 
MS Windows: 64-bit rehash of 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand 1-bit of competition.

Posted by monarch_dodra

On Sunday, 8 June 2014 at 13:55:48 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> On Sun, Jun 08, 2014 at 11:17:41AM +0200, Joseph Rushton Wakeling via Digitalmars-d-learn wrote:
>> On 08/06/14 11:02, monarch_dodra via Digitalmars-d-learn wrote:
>> >Why would we ban uniform!T from accepting dchar? I see no reason for that.
>> >
>> >Let's just fix the bug by tweaking the internal check.
>>
>> Yea, I came to the same conclusion while working on it. :-)
>>
>> The solution I have is (i) in uniform!"[]" check that !is(ResultType == dchar) before checking the condition for calling uniform!ResultType, and (ii) inside uniform!T, place
>>
>>     static if (is(T == dchar))
>>     {
>>         return uniform!"[]"(T.min, T.max, rng);
>>     }
>
> Doesn't wchar need to have a similar specialization too? Aren't some values of wchar invalid as well?
>
>
> T

Arguably, the issue is the difference between "invalid" and downright "illegal" values. The thing about dchar is that while it *can* have values higher than dchar.max, it's (AFAIK) illegal to have them, and the compiler (if it can) will flag you for it:

    dchar c1 = 0x0000_D800; //Invalid, but fine.
    dchar c2 = 0xFFFF_0000; //Illegal, nope.
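The invalid/illegal distinction above can be probed with a short sketch. It assumes std.utf.isValidDchar as the notion of "valid" (the post does not name a specific check), and it only prints what the compiler says about the out-of-range initializer rather than asserting it, since that part is described as "AFAIK" above.

```d
import std.stdio : writeln;
import std.utf : isValidDchar;

void main()
{
    dchar surrogate = 0xD800;                   // within dchar's range, but a surrogate
    dchar outOfRange = cast(dchar) 0xFFFF_0000; // forced in with an explicit cast

    // "Invalid": inside [dchar.min, dchar.max], yet not a valid code point.
    assert(surrogate <= dchar.max);
    assert(!isValidDchar(surrogate));

    // "Illegal": outside the declared range of the type altogether.
    assert(outOfRange > dchar.max);
    assert(!isValidDchar(outOfRange));

    // Reports whether the compiler accepts the initializer without a cast.
    writeln(__traits(compiles, { dchar c = 0xFFFF_0000; }));
}
```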

Posted by Joseph Rushton Wakeling

On 08/06/14 16:25, monarch_dodra via Digitalmars-d-learn wrote:
> Arguably, the issue is the difference between "invalid" and downright "illegal" values. The thing about dchar is that while it *can* have values higher than dchar max, it's (AFAIK) illegal to have them, and the compiler (if it can) will flag you for it:
>
>     dchar c1 = 0x0000_D800; //Invalid, but fine.
>     dchar c2 = 0xFFFF_0000; //Illegal, nope.

Yup.  If you use an invalid wchar (say, via writeln), you'll get a nonsense symbol on your screen, but it will work.  Try to writeln a dchar whose value is greater than dchar.max and you'll get an exception/error thrown.
