Dev.to daily challenge Duplicate Encoder

Jun 24, 2020

Jesse Phillips

Jun 24, 2020

CraigDillabaugh

Jun 24, 2020

Jun 24, 2020

Jun 25, 2020

Jun 25, 2020

Jun 25, 2020

The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the problem and instead took the top implementations done in different languages and implemented them in D[2]. I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change. I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach) This was just a fun little experiment. 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote: > The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the problem and instead took the top implementations done in different languages and implemented them in D[2]. > > I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change. > > I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach) > > This was just a fun little experiment. > > 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l > 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder Neat. It would be nice if the axes in your images were labeled. It is not immediately clear what the numbers mean.

June 24, 2020

Re: Dev.to daily challenge Duplicate Encoder

Posted by CraigDillabaugh
in reply to Jesse Phillips

Permalink

CraigDillabaugh

Posted in reply to Jesse Phillips

Permalink

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote:
> The Dev.to community does a Daily Challenge[1]. I didn't really set out to solve the  problem and instead took the top implementations done in different languages and implemented them in D[2].
>
> I need to emphasize I reference languages but this is not performance of those languages all data is from D. Also I did not play too much with optimizations on the compiler (using LDC) but feel free to recommend a change.
>
> I find the changes to Haskell's performance to be interesting as it went from very unreasonable to competing with Go (Ranges vs foreach)
>
> This was just a fun little experiment.
>
> 1. https://dev.to/thepracticaldev/daily-challenge-259-duplicate-encoder-2e8l
> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder

For your pointer based implementation you might be able to finish with a single for loop over the array (psuedocode).

for each char ch in str:
     if this is the first time you see ch, replace it with '(' and
         store a pointer to it.
     else if this is the second time replace it with ')' and
         follow your pointer back, replacing the first occurrence with ')'
     else replace it with ')'

Likely results in somewhat uglier code.

On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote: > 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`. Replacing the type with `int[256]` results in roughly a 2x speedup. I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster: string duplicateEncode_pointer(string str) { import std.ascii : toLower; auto result = str.dup; char*[256] locMap; foreach(ref c; result) { auto p = &locMap[c.toLower()]; if (*p) **p = c = MANY; else { c = ONCE; *p = &c; } } return result.to!string; }

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote: > > I also don't understand the choices that led to the duplicateEncode_pointer implementation. It it was basically, what happens if I don't loop twice over but instead just store where things need to change. I did want a single loop but when I realized I failed I switched to gnuplot This wasn't really an attempt to make efficient code, just a translation + some of my own curiosity.

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote: > On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote: >> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder > > In `duplicateEncode_go`, the code is inconsistent in whether it wants to process the string by code unit or code point. `occurences` [sic] is declared as `int[dchar]`, but later it uses a standard (`char`-wise) `foreach` loop, and the `std.ascii` variant of `toLower`. Yeah I noticed that too. Really just the first 'make it work' thing I did.

On Wednesday, 24 June 2020 at 19:13:39 UTC, Vladimir Panteleev wrote: > On Wednesday, 24 June 2020 at 17:21:34 UTC, Jesse Phillips wrote: >> 2. https://github.com/JesseKPhillips/devto-challenge259-dupencoder > > Replacing the type with `int[256]` results in roughly a 2x speedup. > > I think it might be possible to use an AVX (256-bit) register to hold state to avoid going to the RAM at all. > > I also don't understand the choices that led to the duplicateEncode_pointer implementation. This version is much faster: I've added those changes and included axis labels https://github.com/JesseKPhillips/devto-challenge259-dupencoder

Forums