March 24, 2014
Very nice example.   I'll test on ubuntu later.

On windows ...

D:\diamond\diamond\diamond\Release>diamond 1> nul
brad: time: 19544[ms]
printDiamond1: time: 1139[ms]
printDiamond2: time: 1656[ms]
printDiamond3: time: 663[ms]
jay1: time: 455[ms]
sergei: time: 11673[ms]
jay2: time: 411[ms]
diamondShape: time: 4399[ms]
printDiamond: time: 185[ms]
March 24, 2014
On Thursday, 20 March 2014 at 21:25:03 UTC, Ali Çehreli wrote:
> This is a somewhat common little exercise:

if you like similar puzzles, here is another:

Write a program that expects a 10-by-10 matrix from standard input. The program should compute sum of each row and each column and print the highest of these numbers to standard output.

    An example input:

    01 34 46 31 55 21 16 88 87 87
    32 40 82 40 43 96 08 82 41 86
    30 16 24 18 04 54 65 96 38 48
    32 00 99 90 24 75 89 41 04 01
    11 80 31 83 08 93 37 96 27 64
    09 81 28 41 48 23 68 55 86 72
    64 61 14 55 33 39 40 18 57 59
    49 34 50 81 85 12 22 54 80 76
    18 45 50 26 81 95 25 14 46 75
    22 52 37 50 37 40 16 71 52 17

    Expected output:

    615

The purpose is to write a "golfing" program, that is the shortest.

My current D solution is about 170 bytes (UNIX newlines):

void main(){
import std.stdio,std.range,std.algorithm,std.conv;
auto m=10.iota.map!(_=>readln.split.to!(int[]));
m.map!sum.chain(m.transposed.map!sum).reduce!max.write;
}


I am now trying to use std.file.slurp, but its documentation is insufficient.

A cryptic Python solution (not mine), 73 characters:

m=[map(int,_().split())for _ in[raw_input]*10]
_(max(map(sum,m+zip(*m))))

Bye,
bearophile
March 25, 2014
not through yet with the diamond.  This one is a little faster.  Appending the newline to the stars and calculating the slice backward from the end would save a w.put for the newlines ... probably faster.  I keep looking for a way to create a dynamic array of a specific size, filled with the init value I provide. Does it exist?

D:\diamond\diamond\diamond\Release>diamond 1>nul
brad: time: 19370[ms]
printDiamond1: time: 1140[ms]
printDiamond2: time: 1631[ms]
printDiamond3: time: 633[ms]
jay1: time: 459[ms]
sergei: time: 11886[ms]
jay2: time: 415[ms]
diamondShape: time: 4553[ms]
printDiamond: time: 187[ms]
printDiamonde2a: time: 139[ms]


void printDiamonde2a(in uint N)
{
    size_t N2 = N/2;
    char pSpace[] = uninitializedArray!(char[])(N2);
    pSpace[] = ' ';

    char pStars[] = uninitializedArray!(char[])(N);
    pStars[] = '*';

    char pNewLine[]=uninitializedArray!(char[])(2);
    pNewLine[] = '\n';

    auto w = appender!(char[])();
    w.reserve(N*4);

    foreach (n ; 0 .. N2 + 1){
        w.put(pSpace[0 .. N2 - n]);
        w.put(pStars[0 .. 2*n+1]);
        w.put(pNewLine[1]);
    }

    foreach_reverse (n ; 0 .. N2){
        w.put(pSpace[0 .. N2 - n]);
        w.put(pStars[0 .. 2*n+1]);
        w.put(pNewLine[1]);
    }
    write(w.data);
}
March 25, 2014
These were times on ubuntu. I may have printed debug build times previously, but these are dmd release build.  I gave up trying to figure out how to build ldc on ubuntu.  The dmd one click installer is much appreciated.

brad: time: 12425[ms]
printDiamond1: time: 380[ms]
printDiamond2: time: 728[ms]
printDiamond3: time: 378[ms]
jay1: time: 62[ms]
sergei: time: 3965[ms]
jay2: time: 27[ms]
diamondShape: time: 2778[ms]
printDiamond: time: 19[ms]
printDiamonde: time: 19[ms]
printDiamonde2b: time: 16[ms]


This was using the appended newlines to get rid of the extra wput in the loops.

void printDiamonde2b(in uint N)
{
    uint N2 = N/2;
    char pSpace[] = uninitializedArray!(char[])(N2);
    pSpace[] = ' ';

    char pStars[] = uninitializedArray!(char[])(N+1);
    pStars[] = '*';

    pStars[$-1] = '\n';

    auto w = appender!(char[])();
    w.reserve(N*3);

    foreach (n ; 0 .. N2 + 1){
        w.put(pSpace[0 .. N2 - n]);
	w.put(pStars[$-2*n-2 .. $]);
	}

    foreach_reverse (n ; 0 .. N2){
        w.put(pSpace[0 .. N2 - n]);
	w.put(pStars[$-2*n-2 .. $]);
	}

    write(w.data);
}
March 25, 2014
On Tuesday, 25 March 2014 at 02:25:57 UTC, Jay Norwood wrote:
> not through yet with the diamond.  This one is a little faster.
>  Appending the newline to the stars and calculating the slice backward from the end would save a w.put for the newlines ... probably faster.  I keep looking for a way to create a dynamic array of a specific size, filled with the init value I provide. Does it exist?
>
> D:\diamond\diamond\diamond\Release>diamond 1>nul
> brad: time: 19370[ms]
> printDiamond1: time: 1140[ms]
> printDiamond2: time: 1631[ms]
> printDiamond3: time: 633[ms]
> jay1: time: 459[ms]
> sergei: time: 11886[ms]
> jay2: time: 415[ms]
> diamondShape: time: 4553[ms]
> printDiamond: time: 187[ms]
> printDiamonde2a: time: 139[ms]
>
>
> void printDiamonde2a(in uint N)
> {
>     size_t N2 = N/2;
>     char pSpace[] = uninitializedArray!(char[])(N2);
>     pSpace[] = ' ';
>
>     char pStars[] = uninitializedArray!(char[])(N);
>     pStars[] = '*';
>
>     char pNewLine[]=uninitializedArray!(char[])(2);
>     pNewLine[] = '\n';
>
>     auto w = appender!(char[])();
>     w.reserve(N*4);
>
>     foreach (n ; 0 .. N2 + 1){
>         w.put(pSpace[0 .. N2 - n]);
>         w.put(pStars[0 .. 2*n+1]);
>         w.put(pNewLine[1]);
>     }
>
>     foreach_reverse (n ; 0 .. N2){
>         w.put(pSpace[0 .. N2 - n]);
>         w.put(pStars[0 .. 2*n+1]);
>         w.put(pNewLine[1]);
>     }
>     write(w.data);
> }

Interesting. I'd have thought the "extra copy" would be an overall slowdown, but I guess that's not the case.

I also tried your strategy of adding '\n' to the buffer, but I was getting some bad output on windows. I'm not sure why "\n\n" works though. On *nix, I'd have also expected a double line feed. Did you check the actual output?

Appender is better than "~=", but it's not actually that good either. Try this:

//----
void printDiamond3(size_t N)
{
    import core.memory;
    char* p = cast(char*)GC.malloc(N*N+16);
    p[0..N*N+16]='*';

    auto pp = p;
    N/=2;
    enum code = q{
        pp[0 .. N - n] = ' ';
        pp+=(1+N+n);
        version(Windows)
        {
            pp[0 .. 2] = "\r\n";
            pp+=2;
        }
        else
        {
            pp[0] = '\n';
            ++pp;
        }
    };
    foreach        (n; 0 .. N + 1) {mixin(code);}
    foreach_reverse(n; 0 .. N    ) {mixin(code);}
    write(p[0 .. pp-p]);
}
//----

This makes just 1 allocation of roughly the right size. It also eagerly fills the entire array with '*', since I *figure* that's faster than a lot of different writes.

I could be mistaken about that though, but I imagine the pre-allocation and not using Appender is definitely a boost.
March 25, 2014
> Interesting. I'd have thought the "extra copy" would be an overall slowdown, but I guess that's not the case.


>
> I also tried your strategy of adding '\n' to the buffer, but I was getting some bad output on windows. I'm not sure why "\n\n" works though. On *nix, I'd have also expected a double line feed. Did you check the actual output?

I checked the output.  The range selected is for one newline.
>
> Appender is better than "~=", but it's not actually that good either. Try this:
>
> //----
> void printDiamond3(size_t N)
> {
>     import core.memory;
>     char* p = cast(char*)GC.malloc(N*N+16);
>     p[0..N*N+16]='*';
>
>     auto pp = p;
>     N/=2;
>     enum code = q{
>         pp[0 .. N - n] = ' ';
>         pp+=(1+N+n);
>         version(Windows)
>         {
>             pp[0 .. 2] = "\r\n";
>             pp+=2;
>         }
>         else
>         {
>             pp[0] = '\n';
>             ++pp;
>         }
>     };
>     foreach        (n; 0 .. N + 1) {mixin(code);}
>     foreach_reverse(n; 0 .. N    ) {mixin(code);}
>     write(p[0 .. pp-p]);
> }
> //----
>
> This makes just 1 allocation of roughly the right size. It also eagerly fills the entire array with '*', since I *figure* that's faster than a lot of different writes.
>
> I could be mistaken about that though, but I imagine the pre-allocation and not using Appender is definitely a boost.

ok. I'll try it.  I was happy the appender was pretty fast.

March 25, 2014
These are times on ubuntu. printDiamond3 was slower than printDiamond.


brad: time: 12387[ms]
printDiamond1: time: 373[ms]
printDiamond2: time: 722[ms]
printDiamond3: time: 384[ms]
jay1: time: 62[ms]
sergei: time: 3918[ms]
jay2: time: 28[ms]
diamondShape: time: 2725[ms]
printDiamond: time: 19[ms]
printDiamonde2a: time: 18[ms]
printDiamonde2b: time: 14[ms]
printDiamond3: time: 26[ms]

March 25, 2014
On Tuesday, 25 March 2014 at 12:30:37 UTC, Jay Norwood wrote:
> These are times on ubuntu. printDiamond3 was slower than printDiamond.

Hum... Too bad :/

I was able to improve my first "printDiamon" by having a single slice that contains spaces then stars, and make writeln's of that.

It gave (on my windows) speeds comparable to your printDiamond3. But not any speed differences that warrants posting new code.

Thanks for the benches. This was fun :)

I love how D can achieve *great* performance, while still looking readable and maintainable.
March 25, 2014
On Tuesday, 25 March 2014 at 15:31:12 UTC, monarch_dodra wrote:
> I love how D can achieve *great* performance, while still looking readable and maintainable.

Yes,  I'm pretty happy to see the appender works well.  The parallel library also seems to work very well in my few experiences with it.

Maybe it would be useful to see how to use the parallel api to implement this, and if it can make a scalable impact on the execution time.




March 26, 2014
This is a first attempt at using parallel, but no improvement in speed on a corei7.  It is about 3x slower than the prior versions.  Probably the join was not a good idea.  Also, no foreach_reverse for the parallel, so it requires extra calculations for the reverse index.


void printDiamonde2cpa(in uint N)
{
    size_t N2 = N/2;
    char p[] = uninitializedArray!(char[])(N2+N);
    p[0..N2] = ' ';
    p[N2..$] = '*';
    char nl[] = uninitializedArray!(char[])(1);
    nl[] = '\n';

    char[][] wc = minimallyInitializedArray!(char[][])(N);

    auto w = appender!(char[])();

    foreach(n, ref elem; taskPool.parallel(wc[0..N2+1],100)){
        elem = p[n .. N2+2*n+1];
    }

    foreach (rn, ref elem ; taskPool.parallel(wc[0..N2],100)){
        int n = N2 - rn - 1;
        elem = p[n .. N2+2*n+1];
    }
    auto wj = join(wc,nl);
    w.put(wj);

    writeln(w.data);
}