counting words - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » counting words

Thread overview

counting words
Jun 28, 2013 Benjamin Thaut
Jun 28, 2013 Brad Anderson
Jun 28, 2013 Brad Anderson
Jun 28, 2013 Benjamin Thaut
Jun 28, 2013 Brad Anderson
Jun 28, 2013 Benjamin Thaut
Jun 28, 2013 monarch_dodra
Jun 28, 2013 bearophile
Jun 28, 2013 monarch_dodra
Jun 28, 2013 Brad Anderson
Jun 28, 2013 monarch_dodra
Jun 28, 2013 Brad Anderson

June 28, 2013

Posted by Benjamin Thaut

Benjamin Thaut

I'm currently making a few tests with std.algorithm, std.range, etc

I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?

-- 
Kind Regards
Benjamin Thaut

June 28, 2013

Re: counting words

Posted by Brad Anderson
in reply to Benjamin Thaut

Brad Anderson

Posted in reply to Benjamin Thaut

On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
> I'm currently making a few tests with std.algorithm, std.range, etc
>
> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?

If you don't mind sorting twice:

words.sort()
     .group()
     .array()
     .sort!((a, b)=> a[1] > b[1])
     .map!(a => a[0])
     .copy(words);

You could also do it with a hashmap to keep the count.

June 28, 2013

Re: counting words

Posted by Brad Anderson
in reply to Brad Anderson

Brad Anderson

Posted in reply to Brad Anderson

On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>> I'm currently making a few tests with std.algorithm, std.range, etc
>>
>> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?
>
> If you don't mind sorting twice:
>
> words.sort()
>      .group()
>      .array()
>      .sort!((a, b)=> a[1] > b[1])
>      .map!(a => a[0])
>      .copy(words);
>
> You could also do it with a hashmap to keep the count.

Like so:

    size_t[string] dic;
    words.map!((w) { ++dic[w.idup]; return w; })
         .array // eager (so dic is filled first), sortable
         .sort!((a, b) { bool less = dic[a] > dic[b]; return less || less && a < b; })
         .uniq
         .copy(words);

It's a bit ugly and abuses side effects with the hash map.  The order will differ from the other program when words have identical counts.

June 28, 2013

Re: counting words

Posted by Benjamin Thaut
in reply to Brad Anderson

Benjamin Thaut

Posted in reply to Brad Anderson

Am 28.06.2013 18:42, schrieb Brad Anderson:
> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>
>>> I have a arry of words. Is it possible to count how often each word
>>> is contained in the array and then sort the array by the count of the
>>> individual words by chaining ranges? (e.g. without using a foreach
>>> loop + hashmap)?
>>
>> If you don't mind sorting twice:
>>
>> words.sort()
>>      .group()
>>      .array()
>>      .sort!((a, b)=> a[1] > b[1])
>>      .map!(a => a[0])
>>      .copy(words);
>>
>> You could also do it with a hashmap to keep the count.
>
> Like so:
>
>      size_t[string] dic;
>      words.map!((w) { ++dic[w.idup]; return w; })
>           .array // eager (so dic is filled first), sortable
>           .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
> less && a < b; })
>           .uniq
>           .copy(words);
>
> It's a bit ugly and abuses side effects with the hash map.  The order
> will differ from the other program when words have identical counts.

I figured something like this by now too. Thank you.
But I don't quite understand what the copy is for at the end?

-- 
Kind Regards
Benjamin Thaut

June 28, 2013

Re: counting words

Posted by monarch_dodra
in reply to Brad Anderson

monarch_dodra

Posted in reply to Brad Anderson

On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>> I'm currently making a few tests with std.algorithm, std.range, etc
>>
>> I have a arry of words. Is it possible to count how often each word is contained in the array and then sort the array by the count of the individual words by chaining ranges? (e.g. without using a foreach loop + hashmap)?
>
> If you don't mind sorting twice:
>
> words.sort()
>      .group()
>      .array()
>      .sort!((a, b)=> a[1] > b[1])
>      .map!(a => a[0])
>      .copy(words);
>
> You could also do it with a hashmap to keep the count.

It's just missing the construction scheme for "words". I had this:

    text
        .splitter();
        .reduce!((words, w)=>++words[w], words)(int[string]);

But alas... reduce's signature is not ctfe-able :(

Well, I've had a PR open for this for about 8 months now...

June 28, 2013

Re: counting words

Posted by Brad Anderson
in reply to Benjamin Thaut

Brad Anderson

Posted in reply to Benjamin Thaut

On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
> Am 28.06.2013 18:42, schrieb Brad Anderson:
>> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>>
>>>> I have a arry of words. Is it possible to count how often each word
>>>> is contained in the array and then sort the array by the count of the
>>>> individual words by chaining ranges? (e.g. without using a foreach
>>>> loop + hashmap)?
>>>
>>> If you don't mind sorting twice:
>>>
>>> words.sort()
>>>     .group()
>>>     .array()
>>>     .sort!((a, b)=> a[1] > b[1])
>>>     .map!(a => a[0])
>>>     .copy(words);
>>>
>>> You could also do it with a hashmap to keep the count.
>>
>> Like so:
>>
>>     size_t[string] dic;
>>     words.map!((w) { ++dic[w.idup]; return w; })
>>          .array // eager (so dic is filled first), sortable
>>          .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
>> less && a < b; })
>>          .uniq
>>          .copy(words);
>>
>> It's a bit ugly and abuses side effects with the hash map.  The order
>> will differ from the other program when words have identical counts.
>
> I figured something like this by now too. Thank you.
> But I don't quite understand what the copy is for at the end?

Just replacing your original word list with the sorted list (which I just realized is wrong because it will leave a bunch of words on the end, oops).  You could .array it instead to get a new array or just store the range with auto and consume that where needed with no extra array allocation.

June 28, 2013

Re: counting words

Posted by Benjamin Thaut
in reply to Brad Anderson

Benjamin Thaut

Posted in reply to Brad Anderson

Am 28.06.2013 19:01, schrieb Brad Anderson:
> On Friday, 28 June 2013 at 16:48:08 UTC, Benjamin Thaut wrote:
>> Am 28.06.2013 18:42, schrieb Brad Anderson:
>>> On Friday, 28 June 2013 at 16:25:25 UTC, Brad Anderson wrote:
>>>> On Friday, 28 June 2013 at 16:04:35 UTC, Benjamin Thaut wrote:
>>>>> I'm currently making a few tests with std.algorithm, std.range, etc
>>>>>
>>>>> I have a arry of words. Is it possible to count how often each word
>>>>> is contained in the array and then sort the array by the count of the
>>>>> individual words by chaining ranges? (e.g. without using a foreach
>>>>> loop + hashmap)?
>>>>
>>>> If you don't mind sorting twice:
>>>>
>>>> words.sort()
>>>>     .group()
>>>>     .array()
>>>>     .sort!((a, b)=> a[1] > b[1])
>>>>     .map!(a => a[0])
>>>>     .copy(words);
>>>>
>>>> You could also do it with a hashmap to keep the count.
>>>
>>> Like so:
>>>
>>>     size_t[string] dic;
>>>     words.map!((w) { ++dic[w.idup]; return w; })
>>>          .array // eager (so dic is filled first), sortable
>>>          .sort!((a, b) { bool less = dic[a] > dic[b]; return less ||
>>> less && a < b; })
>>>          .uniq
>>>          .copy(words);
>>>
>>> It's a bit ugly and abuses side effects with the hash map. The order
>>> will differ from the other program when words have identical counts.
>>
>> I figured something like this by now too. Thank you.
>> But I don't quite understand what the copy is for at the end?
>
> Just replacing your original word list with the sorted list (which I
> just realized is wrong because it will leave a bunch of words on the
> end, oops).  You could .array it instead to get a new array or just
> store the range with auto and consume that where needed with no extra
> array allocation.

Ok, thank you very much

-- 
Kind Regards
Benjamin Thaut

June 28, 2013

Re: counting words

Posted by bearophile
in reply to monarch_dodra

bearophile

Posted in reply to monarch_dodra

monarch_dodra:

> Well, I've had a PR open for this for about 8 months now...

Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)

Bye,
bearophile

June 28, 2013

Re: counting words

Posted by monarch_dodra
in reply to bearophile

monarch_dodra

Posted in reply to bearophile

On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
> monarch_dodra:
>
>> Well, I've had a PR open for this for about 8 months now...
>
> Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
> Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)
>
> Bye,
> bearophile

The answer is negative though :(

June 28, 2013

Re: counting words

Posted by Brad Anderson
in reply to monarch_dodra

Brad Anderson

Posted in reply to monarch_dodra

On Friday, 28 June 2013 at 18:53:44 UTC, monarch_dodra wrote:
> On Friday, 28 June 2013 at 17:59:58 UTC, bearophile wrote:
>> monarch_dodra:
>>
>>> Well, I've had a PR open for this for about 8 months now...
>>
>> Do you have Phobos commit rights? If the answer is positive then perhaps you can review and commit some Phobos code written by Andrei and he can do the same with yours.
>> Today it seems the major development bottleneck is the review queue and not having enough writing of patches :-)
>>
>> Bye,
>> bearophile
>
> The answer is negative though :(

95 merged phobos pull requests (109 total).  Long overdue, I'd say.  I don't know if there is really a formal process to getting write privileges but if any of the people who decide are reading this I nominate monarch_dodra.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation