Jump to page: 1 2
Thread overview
counting words
Jun 28, 2013
Benjamin Thaut
Jun 28, 2013
Brad Anderson
Jun 28, 2013
Brad Anderson
Jun 28, 2013
Benjamin Thaut
Jun 28, 2013
Brad Anderson
Jun 28, 2013
Benjamin Thaut
Jun 28, 2013
monarch_dodra
Jun 28, 2013
bearophile
Jun 28, 2013
monarch_dodra
Jun 28, 2013
Brad Anderson
Jun 28, 2013
monarch_dodra
Jun 28, 2013
Brad Anderson
June 28, 2013
I'm currently making a few tests with std.algorithm, std.range, etc

I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?

-- 
Kind Regards
Benjamin Thaut
June 28, 2013
On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
> I'm currently making a few tests with std.algorithm, std.range, etc
>
> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?

If you don't mind sorting twice:

words.sort()
     .group()
     .array()
     .sort!((a, b)=> a[1] > b[1])
     .map!(a => a[0])
     .copy(words);

You could also do it with a hashmap to keep the count.
June 28, 2013
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>> I'm currently making a few tests with std.algorithm, std.range, etc
>>
>> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?
>
> If you don't mind sorting twice:
>
> words.sort()
>      .group()
>      .array()
>      .sort!((a, b)=> a[1] > b[1])
>      .map!(a => a[0])
>      .copy(words);
>
> You could also do it with a hashmap to keep the count.

Like so:

    size_t[string] dic;
    words.map!((w) { ++dic[w.idup]; return w; })
         .array // eager (so dic is filled first), sortable
         .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; })
         .uniq
         .copy(words);

It's a bit ugly and abuses side effects with the hash map.  The order will differ from the other program when words have identical counts.
June 28, 2013
Am 28.06.2013 18:42, schrieb Brad Anderson:
> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>
>>> I have a arry of words. Is it possible to count how often each word
>>> is contained in the array and then sort the array by the count of the
>>> individual words by chaining ranges? (e.g. without using a foreach
>>> loop + hashmap)?
>>
>> If you don't mind sorting twice:
>>
>> words.sort()
>>      .group()
>>      .array()
>>      .sort!((a, b)=> a[1] > b[1])
>>      .map!(a => a[0])
>>      .copy(words);
>>
>> You could also do it with a hashmap to keep the count.
>
> Like so:
>
>      size_t[string] dic;
>      words.map!((w) { ++dic[w.idup]; return w; })
>           .array // eager (so dic is filled first), sortable
>           .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
> less && a < b; })
>           .uniq
>           .copy(words);
>
> It's a bit ugly and abuses side effects with the hash map.  The order
> will differ from the other program when words have identical counts.

I figured something like this by now too. Thank you.
But I don't quite understand what the copy is for at the end?

-- 
Kind Regards
Benjamin Thaut
June 28, 2013
On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>> I'm currently making a few tests with std.algorithm, std.range, etc
>>
>> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?
>
> If you don't mind sorting twice:
>
> words.sort()
>      .group()
>      .array()
>      .sort!((a, b)=> a[1] > b[1])
>      .map!(a => a[0])
>      .copy(words);
>
> You could also do it with a hashmap to keep the count.

It's just missing the construction scheme for "words". I had this:

    text
        .splitter();
        .reduce!((words, w)=>++words[w], words)(int[string]);

But alas... reduce's signature is not ctfe-able :(

Well, I've had a PR open for this for about 8 months now...
June 28, 2013
On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
> Am 28.06.2013 18:42, schrieb Brad Anderson:
>> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>>
>>>> I have a arry of words. Is it possible to count how often each word
>>>> is contained in the array and then sort the array by the count of the
>>>> individual words by chaining ranges? (e.g. without using a foreach
>>>> loop + hashmap)?
>>>
>>> If you don't mind sorting twice:
>>>
>>> words.sort()
>>>     .group()
>>>     .array()
>>>     .sort!((a, b)=> a[1] > b[1])
>>>     .map!(a => a[0])
>>>     .copy(words);
>>>
>>> You could also do it with a hashmap to keep the count.
>>
>> Like so:
>>
>>     size_t[string] dic;
>>     words.map!((w) { ++dic[w.idup]; return w; })
>>          .array // eager (so dic is filled first), sortable
>>          .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
>> less && a < b; })
>>          .uniq
>>          .copy(words);
>>
>> It's a bit ugly and abuses side effects with the hash map.  The order
>> will differ from the other program when words have identical counts.
>
> I figured something like this by now too. Thank you.
> But I don't quite understand what the copy is for at the end?

Just replacing your original word list with the sorted list (which I just realized is wrong because it will leave a bunch of words on the end, oops).  You could .array it instead to get a new array or just store the range with auto and consume that where needed with no extra array allocation.
June 28, 2013
Am 28.06.2013 19:01, schrieb Brad Anderson:
> On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
>> Am 28.06.2013 18:42, schrieb Brad Anderson:
>>> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>>>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>>>
>>>>> I have a arry of words. Is it possible to count how often each word
>>>>> is contained in the array and then sort the array by the count of the
>>>>> individual words by chaining ranges? (e.g. without using a foreach
>>>>> loop + hashmap)?
>>>>
>>>> If you don't mind sorting twice:
>>>>
>>>> words.sort()
>>>>     .group()
>>>>     .array()
>>>>     .sort!((a, b)=> a[1] > b[1])
>>>>     .map!(a => a[0])
>>>>     .copy(words);
>>>>
>>>> You could also do it with a hashmap to keep the count.
>>>
>>> Like so:
>>>
>>>     size_t[string] dic;
>>>     words.map!((w) { ++dic[w.idup]; return w; })
>>>          .array // eager (so dic is filled first), sortable
>>>          .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
>>> less && a < b; })
>>>          .uniq
>>>          .copy(words);
>>>
>>> It's a bit ugly and abuses side effects with the hash map. The order
>>> will differ from the other program when words have identical counts.
>>
>> I figured something like this by now too. Thank you.
>> But I don't quite understand what the copy is for at the end?
>
> Just replacing your original word list with the sorted list (which I
> just realized is wrong because it will leave a bunch of words on the
> end, oops).  You could .array it instead to get a new array or just
> store the range with auto and consume that where needed with no extra
> array allocation.

Ok, thank you very much

-- 
Kind Regards
Benjamin Thaut
June 28, 2013
monarch_dodra:

> Well, I've had a PR open for this for about 8 months now...

Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)

Bye,
bearophile
June 28, 2013
On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
> monarch_dodra:
>
>> Well, I've had a PR open for this for about 8 months now...
>
> Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
> Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)
>
> Bye,
> bearophile

The answer is negative though :(
June 28, 2013
On Friday, 28 June 2013 at 18:53:44 UTC, monarch_dodra wrote:
> On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
>> monarch_dodra:
>>
>>> Well, I've had a PR open for this for about 8 months now...
>>
>> Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
>> Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)
>>
>> Bye,
>> bearophile
>
> The answer is negative though :(

95 merged phobos pull requests (109 total).  Long overdue, I'd say.  I don't know if there is really a formal process to getting write privileges but if any of the people who decide are reading this I nominate monarch_dodra.
« First   ‹ Prev
1 2