January 25, 2008
Walter Bright wrote:

>     /* Shuffle the files[] array
>      */

To be nitpicky, your algorithm will not give a perfect shuffling. Consider shuffling the sequence 012. The algorithm has 3^3 = 27 different ways it can shuffle the sequence, and there exists 3! = 6 different permutations of the sequence. Since 27 isn't divisible by 6, the algorithm will give a slight bias towards some end sequences. (021,102 and 120 in this case). A quick fix is to change the first line of the loop body:

>     for (size_t i = 0; i < files.length; i++)
>     {
>     auto j = std.random.rand() % files.length;

      auto j = std.random.rand() % (files.length - i) + i;

>     // swap [i] and [j]
>     auto tmp = files[i];
>     files[i] = files[j];
>     files[j] = tmp;
>     }

(the last iteration will be a nop, so the loop condition could be changed into i < files.length-1)

Regards,

-- 
Oskar
January 25, 2008
Oskar Linde wrote:
> Walter Bright wrote:
> 
>>     /* Shuffle the files[] array
>>      */
> 
> To be nitpicky, your algorithm will not give a perfect shuffling. Consider shuffling the sequence 012. The algorithm has 3^3 = 27 different ways it can shuffle the sequence, and there exists 3! = 6 different permutations of the sequence. Since 27 isn't divisible by 6, the algorithm will give a slight bias towards some end sequences. (021,102 and 120 in this case). A quick fix is to change the first line of the loop body:
> 
>>     for (size_t i = 0; i < files.length; i++)
>>     {
>>     auto j = std.random.rand() % files.length;
> 
>       auto j = std.random.rand() % (files.length - i) + i;
> 
>>     // swap [i] and [j]
>>     auto tmp = files[i];
>>     files[i] = files[j];
>>     files[j] = tmp;
>>     }
> 
> (the last iteration will be a nop, so the loop condition could be changed into i < files.length-1)

Yep, this is a classic problem from homeworks sets for algorithms classes.  How to make a random permutation of N elements in O(N) time and O(1) additional space such that every element is equally likely to end up in any slot.  And the answer is what Oskar said.    Walter, your MechE background is showing. :-)

--bb
January 25, 2008
Oskar Linde:
> To be nitpicky, your algorithm will not give a perfect shuffling. Consider shuffling the sequence 012. The algorithm has 3^3 = 27 different ways it can shuffle the sequence, and there exists 3! = 6 different permutations of the sequence. Since 27 isn't divisible by 6, the algorithm will give a slight bias towards some end sequences.

Thank you Oskar for showing why a shuffle() function is necessary in std.random, among other things. You can find one in my d libs, based on Knuth simple algorithm ;-)

Bye,
bearophile
January 25, 2008
bearophile wrote:

> Oskar Linde:
>> To be nitpicky, your algorithm will not give a perfect shuffling. Consider shuffling the sequence 012. The algorithm has 3^3 = 27 different ways it can shuffle the sequence, and there exists 3! = 6 different permutations of the sequence. Since 27 isn't divisible by 6, the algorithm will give a slight bias towards some end sequences.
> 
> Thank you Oskar for showing why a shuffle() function is necessary in std.random, among other things. You can find one in my d libs, based on Knuth simple algorithm ;-)
> 
> Bye,
> bearophile

FWIW, tango.core.Array gained shuffle last night :)

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
January 25, 2008
Yes please :)

> What I'd like to do is collect a bunch of simple "cookbook" utilities like this that fit nicely into a single file, which can be put on the website to help people get started with D. I have several floating around my hard disk.


January 25, 2008
Walter Bright wrote:
>     /* Shuffle the files[] array
>      */
>     for (size_t i = 0; i < files.length; i++)
>     {
>     auto j = std.random.rand() % files.length;
>     // swap [i] and [j]
>     auto tmp = files[i];
>     files[i] = files[j];
>     files[j] = tmp;
>     }
> 
>     /* Sequentially fill the target until done or it quits with an
>      * exception when the device is full.
>      */
>     foreach (fromfile; files)
>     {
>     auto tofile = std.path.join(todir, basename(fromfile));
>     writefln("%s => %s", fromfile, tofile);
>     std.file.copy(fromfile, tofile);
>     }
> 
>     writefln("Done");
>     return 0;
> }

As mentioned by Oskar, this won't generate uniformly random lists.
However, both your algorithm and his version have an additional flaw: both shuffle the entire list when you only need a random prefix (i.e. you only need to select as many random files as will fit onto your second disk).

Try this small modification to the original algorithm instead:
-----

    /* The loop will normally quit via an exception when the target
     * device is full. But if there are not enough files in the source
     * to fill up the target, the loop ensures it will still eventually
     * quit.
     */
    while (files.length > 0)
    {
        auto j = std.random.rand() % files.length;
        auto fromfile = files[j];
        auto tofile = std.path.join(todir, basename(fromfile));
        writefln("%s => %s", fromfile, tofile);
        std.file.copy(fromfile, tofile);

        // Remove copied file from the list
        files[j] = files[$-1];
        files = files[0 .. $-1];
    }
-----

Though technically, this will still prefer the files at the front of the list during each iteration because (rand() % files.length) is more likely to be a number <= (RAND_MAX/files.length) than a larger number. (In fact, for RAND_MAX < files.length the higher-ups will *never* be chosen during that iteration)
However, this shouldn't be a significant problem as long as your number of files is much less than RAND_MAX; and RAND_MAX appears to be uint.max for std.random, so for any reasonable music collection it should be close enough.

To fix that, ignore the result of rand() if it's > ((RAND_MAX / files.length) * files.length), for example by replacing the first line of the loop with:
---
    uint r = std.random.max;
    // assumes RAND_MAX == uint.max
    auto max = (uint.max / files.length) * files.length;
    while (r > max)
        r = std.random.rand();
    auto j = r % files.length;
---
(also, add "assert(files.length <= RAND_MAX);" before the loop if RAND_MAX < uint.max)
January 25, 2008
* John Reimer <terminal.node@gmail.com> [08-01-25 06:03]:
>Walter Bright wrote:
>>For fun, I ordered a new car stereo that would play music from an SD card or USB stick(rather than from CD), and installed it over the weekend. The problem is loading songs onto the SD card from my home music server, which I like to hear played at random.
>>The solution is a simple D program, shuffle, which will randomly copy music files to an SD card until it fills up. Have some fun with it!
[..]
>And what would be the Tango equivalent, I wonder?  ;)

I don't know about Tango, but this is the solution with GNU coreutils -
lame, I know ;)

 find -type f -a \( -name '*.mp3' -o -name '*.wma' \)  -print0 | \
 shuf -z | \
 xargs -0 -i cp {} SDCARD_PATH
January 25, 2008
Walter Bright wrote:
> Roberto Mariottini wrote:
>> I've found players that implement the shuffle function this way: the
>> pseudo random generator always "prefers" a certain subset of the songs
>> and will play them frequently, while other songs are rarely selected
>> (if ever).
>> A real "shuffle" function should generate the same list of songs that
>> the source contains, changing only the order.
> 
> I agree, a much better algorithm would be to shuffle the files[] array, then sequentially write the shuffled result.

Strangely, I added a shuffle routine to tango.core.Array just yesterday.


Sean
January 25, 2008
Oskar Linde wrote:
> Walter Bright wrote:
> 
>>     /* Shuffle the files[] array
>>      */
> 
> 
> To be nitpicky, your algorithm will not give a perfect shuffling. Consider shuffling the sequence 012. The algorithm has 3^3 = 27 different ways it can shuffle the sequence, and there exists 3! = 6 different permutations of the sequence. Since 27 isn't divisible by 6, the algorithm will give a slight bias towards some end sequences. (021,102 and 120 in this case). A quick fix is to change the first line of the loop body:
> 

What would you get if you qsorted the list with rand used as the compare function?
January 25, 2008
Bill Baxter wrote:
> Yep, this is a classic problem from homeworks sets for algorithms classes.  How to make a random permutation of N elements in O(N) time and O(1) additional space such that every element is equally likely to end up in any slot.  And the answer is what Oskar said.    Walter, your MechE background is showing. :-)

I remember reading a criticism of the shuffle/swap algorithm and the fix a while back, but don't remember where I read it or what the fix was. I'm glad Oskar has set it straight here.

Although this is just a dumb program, because it's going into a 'cookbook' that others may use for more serious applications, the algorithms should be correct.

Kudos to Oskar.