Thread overview
Dev.to daily challenge Duplicate Encoder
Jun 24, 2020
Jesse Phillips
Jun 24, 2020
CraigDillabaugh
Jun 24, 2020
CraigDillabaugh
Jun 24, 2020
Vladimir Panteleev
Jun 25, 2020
Jesse Phillips
Jun 25, 2020
Jesse Phillips
Jun 25, 2020
Jesse Phillips
June 24, 2020
The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the  problem and instead took the top implementations done in different languages and implemented them in D[2].

I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change.

I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach)

This was just a fun little experiment.

1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder
June 24, 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
> The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the  problem and instead took the top implementations done in different languages and implemented them in D[2].
>
> I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change.
>
> I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach)
>
> This was just a fun little experiment.
>
> 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

Neat. It would be nice if the axes in your images were labeled. It is not immediately clear what the numbers mean.

June 24, 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
> The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the  problem and instead took the top implementations done in different languages and implemented them in D[2].
>
> I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change.
>
> I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach)
>
> This was just a fun little experiment.
>
> 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

For your pointer based implementation you might be able to finish with a single for loop over the array (psuedocode).

for each char ch in str:
     if this is the first time you see ch, replace it with '(' and
         store a pointer to it.
     else if this is the second time replace it with ')' and
         follow your pointer back, replacing the first occurrence with ')'
     else replace it with ')'

Likely results in somewhat uglier code.
June 24, 2020
On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`.

Replacing the type with `int[256]` results in roughly a 2x speedup.

I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all.

I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster:

string duplicateEncode_pointer(string str) {
    import std.ascii : toLower;
    auto result = str.dup;
    char*[256] locMap;

    foreach(ref c; result)
    {
        auto p = &locMap[c.toLower()];
        if (*p)
            **p = c = MANY;
        else
        {
            c = ONCE;
            *p = &c;
        }
    }

    return result.to!string;
}

June 25, 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:
>
> I also don't understand the choices that led to the duplicateEncode_pointer implementation.

It it was basically, what happens if I don't loop twice over but instead just store where things need to change. I did want a single loop but when I realized I failed I switched to gnuplot

This wasn't really an attempt to make efficient code, just a translation + some of my own curiosity.
June 25, 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:
> On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
>> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder
>
> In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`.

Yeah I noticed that too. Really just the first 'make it work' thing I did.
June 25, 2020
On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote:
> On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
>> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder
>
> Replacing the type with `int[256]` results in roughly a 2x speedup.
>
> I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all.
>
> I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster:

I've added those changes and included axis labels

https://github.com/JesseKPhillips/devto-challenge259-dupencoder