August 03, 2018
On Friday, 3 August 2018 at 09:41:53 UTC, Radu wrote:
> You might wanna change some of the code to be more idiomatic, for example `DitherRCoefficient` can be turned into an enum manifest constant.

Thanks for the advice. I wanted to use something similar to C's #define, but with my quick Google search a mixin template looked like the closest I could get. Based on your comment and reading the docs on it, the enum manifest constant looks like what I was originally trying to do, though I think now I'm going to change it to actually compute the cbrt through a callback to JavaScript's Math.cbrt function.
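
For anyone following along, the manifest-constant approach can be sketched roughly as below. This is only an illustration: the value `0.5f` and the helper `scaleByCoefficient` are placeholders made up for the example, not the project's actual coefficient or code.

```d
// A manifest constant is a pure compile-time value: it has no address and
// occupies no storage, and is substituted at each use (similar to C's #define).
enum float DITHER_R_COEFFICIENT = 0.5f; // placeholder value, for illustration only

// Manifest constants may be derived from other compile-time values.
enum int BAYER_DIMENSIONS = 2;
enum int BAYER_LENGTH = BAYER_DIMENSIONS * BAYER_DIMENSIONS;

// Checked by the compiler, so it costs nothing at run time.
static assert(BAYER_LENGTH == 4);

// Hypothetical helper showing the constant used in an ordinary expression.
extern(C) float scaleByCoefficient(float value)
{
    return value * DITHER_R_COEFFICIENT;
}
```

Unlike a mixin template, nothing has to be mixed in at the use site; the name simply behaves like a literal.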


August 03, 2018
On Thursday, 2 August 2018 at 22:04:50 UTC, Allen Garvey wrote:
> 4. I was not able to use any of the optimization flags except for enable inlining, since using any of them would optimize out the entire program. I'm assuming this is because there is no main function, so the compiler can't tell which functions are actually being called.

I won't be able to run your full example (I'll keep my personal machine Node.js-free! ;)), so I just used `ldc2 -mtriple=wasm32-unknown-unknown-wasm -betterC -O main.wasm` with current LDC master (and added the missing __assert stub).
The produced .wasm is 620 bytes small (without -O: 3,235 bytes); loading it in Firefox shows that the functions are there, here's an excerpt:

  (func $fillBayerMatrix (;3;) (param $var0 i32) (param $var1 i32)
    get_local $var1
    i64.const 4469670136257392600
    i64.store align=4
    get_local $var1
    i32.const 8
    i32.add
    i64.const -4753701902744867000
    i64.store align=4
  )
  (func $dither (;5;) (param $var0 i32) (param $var1 i32) (param $var2 i32) (param $var3 i32)
    get_local $var2
    i64.const 4469670136257392600
    i64.store align=4
    get_local $var2
    i32.const 8
    i32.add
    i64.const -4753701902744867000
    i64.store align=4
    block $label0
      get_local $var0
      get_local $var1
      i32.mul
      i32.const 2
      i32.shl
      i32.const 1
      i32.ge_s
      br_if $label0
      return
    end $label0
    unreachable
    unreachable
  )

August 03, 2018
On Friday, 3 August 2018 at 19:01:02 UTC, kinke wrote:
> I won't be able to run your full example (I'll keep my personal machine Node.js-free! ;)), so I just used `ldc2 -mtriple=wasm32-unknown-unknown-wasm -betterC -O main.wasm` with current LDC master (and added the missing __assert stub).
> The produced .wasm is 620 bytes small (without -O: 3,235 bytes); loading it in Firefox shows that the functions are there, here's an excerpt:
>
>   (func $fillBayerMatrix (;3;) (param $var0 i32) (param $var1 i32)
>     get_local $var1
>     i64.const 4469670136257392600
>     i64.store align=4
>     get_local $var1
>     i32.const 8
>     i32.add
>     i64.const -4753701902744867000
>     i64.store align=4
>   )
>   (func $dither (;5;) (param $var0 i32) (param $var1 i32) (param $var2 i32) (param $var3 i32)
>     get_local $var2
>     i64.const 4469670136257392600
>     i64.store align=4
>     get_local $var2
>     i32.const 8
>     i32.add
>     i64.const -4753701902744867000
>     i64.store align=4
>     block $label0
>       get_local $var0
>       get_local $var1
>       i32.mul
>       i32.const 2
>       i32.shl
>       i32.const 1
>       i32.ge_s
>       br_if $label0
>       return
>     end $label0
>     unreachable
>     unreachable
>   )

Thanks for taking the time to look into this for me. I think on a slightly earlier version of my code, when compiling with optimizations, the output was around 350 bytes and the error was something about the dither function not being found. Compiling it now (still using beta 2) with the -O flag and the assert stub, I get a 618-byte output similar to yours, but when I try to run it, the browser gives this error: `Uncaught RuntimeError: unreachable`. The wasm output for the dither function is this:

(func $dither (export "dither") (type $t4) (param $p0 i32) (param $p1 i32) (param $p2 i32) (param $p3 i32)
    (i64.store align=4
      (get_local $p2)
      (i64.const 4469670136257392629))
    (i64.store align=4
      (i32.add
        (get_local $p2)
        (i32.const 8))
      (i64.const -4753701902744866827))
    (block $B0
      (br_if $B0
        (i32.ge_s
          (i32.shl
            (i32.mul
              (get_local $p0)
              (get_local $p1))
            (i32.const 2))
          (i32.const 1)))
      (return))
    (unreachable)
    (unreachable))

and the browser says the error is on the first unreachable declaration at the end of the function.

Also, if you have either Python 2 or Python 3 on your system, you can run a server by cd-ing into the docs directory of the project and running `python -m SimpleHTTPServer 3000` (Python 2) or `python3 -m http.server 3000` (Python 3). Both do the same thing as the npm script, which is to serve the site on localhost:3000.

I will try seeing if I can build the current LDC master from source to test with that, but I have had mixed success with building projects from source in the past.


August 03, 2018
On Friday, 3 August 2018 at 19:47:14 UTC, Allen Garvey wrote:
> I will try seeing if I can build the current LDC master from source to test with that, but I have had mixed success with building projects from source in the past.

Probably not worth it, the changes since beta2 are minimal.

> (func $dither (export "dither") (type $t4) (param $p0 i32) (param $p1 i32) (param $p2 i32) (param $p3 i32)
>     (i64.store align=4
>       (get_local $p2)
>       (i64.const 4469670136257392629))
>     (i64.store align=4
>       (i32.add
>         (get_local $p2)
>         (i32.const 8))
>       (i64.const -4753701902744866827))
>     (block $B0
>       (br_if $B0
>         (i32.ge_s
>           (i32.shl
>             (i32.mul
>               (get_local $p0)
>               (get_local $p1))
>             (i32.const 2))
>           (i32.const 1)))
>       (return))
>     (unreachable)
>     (unreachable))
>
> and the browser says the error is on the first unreachable declaration at the end of the function.

I haven't looked into the actual wasm semantics; guessing by your report, it looks as if the return after the conditional branch [the Firefox textual display is clearer IMO] would return from the enclosing block, and not from the function. Otherwise, the 2 (!) unreachables at the end would truly be unreachable. An LLVM issue, I guess.

August 03, 2018
> An LLVM issue, I guess.

Nope, LLVM apparently doesn't like the pixels array starting at null (so just pass a null pointer from JS to WebAssembly). With this diff and your python2 help, I got it to work locally now (849 bytes):

diff --git a/docs/js/worker.js b/docs/js/worker.js
index 57756e9..b512fad 100644
--- a/docs/js/worker.js
+++ b/docs/js/worker.js
@@ -162,7 +162,7 @@
             const heapSize = wasmHeap.length - imageByteSize;
             //dither image
             const performanceResults = Timer.megapixelsPerSecond('WASM ordered dithering performance', imageWidth * imageHeight, ()=>{
-                wasmExports.dither(imageWidth, imageHeight, heapOffset, heapSize);
+                wasmExports.dither(0, imageWidth, imageHeight, heapOffset, heapSize);
             });
             performanceResults[2] = ditherId;
             //dithered image is now in the wasmHeap
diff --git a/wasm_src/main.d b/wasm_src/main.d
index 4878a98..6959b7b 100644
--- a/wasm_src/main.d
+++ b/wasm_src/main.d
@@ -11,7 +11,7 @@ template TInitialize(T){
     //sort of halfway between static and dynamic array
     //like dynamic array in that length and offset can be runtime values
     //but like static array in that length cannot change after initialization without possible causing problems
-    T[] fixedArray(int offset, int length){
+    T[] fixedArray(void* offset, int length){
         //take pointer to (global/heap? not sure correct term) memory and convert to array by taking slice
         //(make sure you disable bounds checking in compiler since assert is not supported in wasm currently)
         return (cast(T*) offset)[0..length];
@@ -58,16 +58,16 @@ void fillBayerMatrix(float[] bayerMatrix){
        bayerMatrix[3] = -.166666667 * DITHER_R_COEFFICIENT;
 }

-void dither(int imageWidth, int imageHeight, int heapOffset, int heapLength){
+void dither(void* pixelsData, int imageWidth, int imageHeight, void* heapStart, int heapLength){
        //* 4 since RGBA format
        immutable int pixelsLength = imageWidth * imageHeight * 4;
     //pixels array starts at offset 0 in wasm heap
-    ubyte[] pixels = TInitialize!(ubyte).fixedArray(0, pixelsLength);
+    ubyte[] pixels = TInitialize!(ubyte).fixedArray(pixelsData, pixelsLength);

     //2x2 bayer matrix
     immutable int bayerDimensions = 2;
     //create array using heap memory
-    float[] bayerMatrix = TInitialize!(float).fixedArray(heapOffset, bayerDimensions*bayerDimensions);
+    float[] bayerMatrix = TInitialize!(float).fixedArray(heapStart, bayerDimensions*bayerDimensions);

     /*
     //adjust heapOffset and heapLength, in case we want to use them again
@@ -105,5 +105,7 @@ void dither(int imageWidth, int imageHeight, int heapOffset, int heapLength){
     }
 }

+void __assert(const(char)* msg, const(char)* file, uint line) {}
+
 // seems to be the required entry point
 void _start() {}

August 03, 2018
On Friday, 3 August 2018 at 21:05:09 UTC, kinke wrote:
>> An LLVM issue, I guess.
>
> Nope, LLVM apparently doesn't like the pixels array starting at null (so just pass a null pointer from JS to WebAssembly). With this diff and your python2 help, I got it to work locally now (849 bytes):

Thank you very much for your help! My knowledge of pointers is a bit sketchy, so I had assumed they were pretty much interchangeable with ints. With your changes I was able to get it to compile and run with beta2, with, I assume, similar output, as the binary was also 849 bytes. I'm seeing a speed increase of over 2x compared to the non-optimized version. This is surprising to me: since browsers JIT everything, I had assumed optimizing the wasm binary would not make that much of a difference.

It's also a bit strange how LLVM was able to output valid code with no optimizations, but not do it with optimizations turned on. After skimming through https://webassembly.org/docs/semantics/ for wasm semantics, it seems as though a function (or possibly a block as well?) can end with a return statement or an unreachable statement, but not both, as in the optimized dither function output (before your changes).

"""
unreachable: An instruction which always traps. It is intended to be used for example after calls to functions which are known by the producer not to return.
"""
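
One place this shows up from the D side: as far as I understand it, `assert(0)` marks a path the programmer claims is never taken, and in optimized builds LDC lowers it to a trap, which on the wasm target should appear as `unreachable`. A minimal sketch, with `checkedDivide` being a made-up function for illustration:

```d
// Hypothetical example: the zero-divisor path is declared impossible.
extern(C) int checkedDivide(int numerator, int denominator)
{
    if (denominator == 0)
    {
        // Never returns. Calling this function with a zero divisor
        // stops execution instead of producing a result.
        assert(0);
    }
    return numerator / denominator;
}
```

So an `unreachable` following a `return` is not contradictory in itself; it only traps if control actually gets there.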

Also, on an unrelated note, do you by any chance know if version(WebAssembly) is supported yet (or maybe it is in master but not the beta)? I was trying to use it to conditionally include the assert stub, but it doesn't seem to be working for me.

August 03, 2018
On Friday, 3 August 2018 at 21:55:56 UTC, Allen Garvey wrote:
> My knowledge of pointers is a bit sketchy, so I had assumed they were pretty much interchangeable with ints.

They apparently are (numbers on the JS side, pointers on wasm), so you can simply declare the params as appropriately typed pointers. That wasn't the problem though, see below.

> I'm seeing a speed increase of over 2x compared to the non-optimized version.

And further doubling of that performance [at least for me] with https://github.com/allen-garvey/wasm-dither-example/pull/1. :)

> It's also a bit strange how LLVM was able to output valid code with no optimizations, but not do it with optimizations turned on.

The optimizer apparently figured you were going to dereference a seemingly invalid address (probably `3`, the alpha channel of the 1st pixel, which is read unconditionally in each iteration) and so optimized the whole loop body to a trap (and branched directly to the first unreachable in the 1st iteration).
These low addresses are perfectly valid and to be expected in wasm though (well, I don't encourage using null though :]). Passing the pointer as argument prevents LLVM from such aggressive optimizations.
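
A minimal sketch of the typed-pointer idea, assuming an RGBA byte buffer as in the thread; `countOpaquePixels` is a made-up function for illustration and not part of the project:

```d
// The JS caller passes a plain number (a byte offset into wasm memory);
// on the D side the same value is received as a typed pointer.
extern(C) int countOpaquePixels(ubyte* pixelsData, int imageWidth, int imageHeight)
{
    // * 4 since RGBA format
    immutable int pixelsLength = imageWidth * imageHeight * 4;

    // Slicing a pointer yields a D array view over existing memory; nothing is copied.
    ubyte[] pixels = pixelsData[0 .. pixelsLength];

    int opaque = 0;
    // The alpha channel is the 4th byte of each pixel.
    for (int i = 3; i < pixelsLength; i += 4)
    {
        if (pixels[i] == 255)
            opaque++;
    }
    return opaque;
}
```

Because the base address arrives as an argument, nothing in this function tells the optimizer where the buffer starts.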

> Also, on an unrelated note, do you by any chance know if version(WebAssembly) is supported yet (or maybe it is in master but not the beta)? I was trying to use it to conditionally include the assert stub, but it doesn't seem to be working for me.

Ah yeah, that made it into master shortly after beta2. [In beta2, it's `WebAssembly32`.]
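
A sketch of how the stub could be included conditionally under both identifiers. The `targetsWasm` name is made up for the example, and marking the stub `extern(C)` is an assumption about how the symbol needs to be exposed:

```d
// Pick up either identifier: `WebAssembly` (master) or `WebAssembly32` (beta2).
version (WebAssembly)
    enum bool targetsWasm = true;
else version (WebAssembly32)
    enum bool targetsWasm = true;
else
    enum bool targetsWasm = false;

static if (targetsWasm)
{
    // Empty stub so failed assertions and bounds checks have something to call.
    // extern(C) is assumed here so the symbol name is not mangled.
    extern(C) void __assert(const(char)* msg, const(char)* file, uint line) {}
}
```

On any other target the stub is simply left out, so it cannot clash with the platform's own `__assert`.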

August 05, 2018
On Friday, 3 August 2018 at 23:36:18 UTC, kinke wrote:
> And further doubling of that performance [at least for me] with https://github.com/allen-garvey/wasm-dither-example/pull/1. :)

I was reviewing your pull request and it looks very nice, much more succinct and roughly 20% faster than my code. It was interesting to see what idiomatic D looks like; as I'm sure you noticed, my own D style is somewhere between C without the cruft and Java compiled to native code :). The only thing is that I've noticed a weird visual glitch that I've narrowed down to the use of the enum array for some reason, as storing the array on the heap is the only thing that seems to make it go away. To see what I'm seeing, create a PNG with width >= 256 pixels and make it either completely black or completely white. The glitch only shows up once and in the same general place no matter the image size, but not at the exact same array index, so I'm baffled as to what could be causing it.

> The optimizer apparently figured you were going to dereference a seemingly invalid address (probably `3`, the alpha channel of the 1st pixel, which is read unconditionally in each iteration) and so optimized the whole loop body to a trap (and branched directly to the first unreachable in the 1st iteration).
> These low addresses are perfectly valid and to be expected in wasm though (well, I don't encourage using null though :]). Passing the pointer as argument prevents LLVM from such aggressive optimizations.

I see, that makes sense. The thing that's counter-intuitive to me is that if you write the pointer address as 0 right in the code it gets interpreted as null, but if you pass in the same value as 0, everything works as expected. I guess that has to do with the legacy of other systems, as only on WebAssembly is 0 a valid memory address.

> Ah yeah, that made it into master shortly after beta2. [In beta2, it's `WebAssembly32`.]

That's good to hear.


August 05, 2018
On Sunday, 5 August 2018 at 03:01:46 UTC, Allen Garvey wrote:
> The thing that's counter-intuitive to me is that if you write the pointer address as 0 right in the code it gets interpreted as null, but if you pass in the same value as 0, everything works as expected.

If you provide the start address as null literal directly in the D code, the optimizer infers that you're going to read from address 0x3 in the first iteration. If you provide the start address as argument from outside code not available during optimization, LLVM cannot make any assumptions in this regard.
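
The two cases can be put side by side in a small sketch. Both function names are made up for illustration, and the comments describe what the optimizer is allowed to assume, not a guarantee of what any particular LLVM version emits:

```d
// Base address written directly in the source: the optimizer can see that
// pixels[3] reads through a pointer derived from null and may treat the
// access as invalid, replacing it with a trap.
extern(C) ubyte firstAlphaFromLiteral()
{
    ubyte* pixels = cast(ubyte*) 0;
    return pixels[3];
}

// Base address supplied by the caller: its value is unknown while optimizing,
// so the read has to be emitted as written, even if JS later passes 0.
extern(C) ubyte firstAlphaFromArgument(ubyte* pixels)
{
    return pixels[3];
}
```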

> The only thing is that I've noticed a weird visual glitch that I've narrowed down to the use of the enum array for some reason

I wasn't sure about the enum array; it'd probably make more sense to define it as static immutable directly in the function. I'd then expect this global to live in the module's memory (`exports.memory`) and be initialized properly during instantiation; this means that the start address/offset of the pixel data probably shouldn't be 0 and that you shouldn't overwrite any existing data after instantiation, just appending to it, in order not to overwrite any D globals.
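
Roughly what I have in mind for the function-local table, as a sketch only. The values are the standard 2x2 Bayer pattern [[0, 2], [3, 1]] divided by 4, used as placeholders and not the project's coefficients, and `bayerThreshold` is a made-up name:

```d
// Hypothetical lookup into a 2x2 threshold table that tiles across the image.
extern(C) float bayerThreshold(int x, int y)
{
    // static immutable: stored once as module data and initialized at
    // instantiation, so raw writes to that region of memory would corrupt it.
    // An enum array, by contrast, is a manifest constant whose literal is
    // re-created at each use.
    static immutable float[4] bayerMatrix = [0.0f, 0.5f, 0.75f, 0.25f];

    // Row-major index into the 2x2 table.
    return bayerMatrix[(y & 1) * 2 + (x & 1)];
}
```
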

August 05, 2018
On Sunday, 5 August 2018 at 14:06:25 UTC, kinke wrote:
> If you provide the start address as null literal directly in the D code, the optimizer infers that you're going to read from address 0x3 in the first iteration. If you provide the start address as argument from outside code not available during optimization, LLVM cannot make any assumptions in this regard.

I get that now. I guess I just meant it's sort of amusing in a philosophical sense: it's not what you know, but how you know it.

> I wasn't sure about the enum array; it'd probably make more sense to define it as static immutable directly in the function. I'd then expect this global to live in the module's memory (`exports.memory`) and be initialized properly during instantiation; this means that the start address/offset of the pixel data probably shouldn't be 0 and that you shouldn't overwrite any existing data after instantiation, just appending to it, in order not to overwrite any D globals.

Gotcha. I was wondering where exactly the enum array was being stored, but I see what you mean that it was being stored in the same place we were putting the pixel data. I've merged your changes and everything's looking good! Thanks again for taking the time to help me out with this.