April 02, 2015
On 3/04/2015 12:29 a.m., John Colvin wrote:
> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>> Hi,
>>>> I have a bunch of square r16 and png images which I need to flip
>>>> horizontally.
>>>>
>>>> My flip method looks like this:
>>>> void hFlip(T)(T[] data, int w)
>>>> {
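>>>>    import std.datetime : StopWatch;
>>>>    import std.stdio : writeln;
>>>>
>>>>    StopWatch sw;
>>>>    sw.start();
>>>>
>>>>    // The images are square, so w is both the row length and the row count.
>>>>    foreach(int i; 0..w)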
>>>>    {
>>>>      auto row = data[i*w..(i+1)*w];
>>>>      row.reverse();
>>>>    }
>>>>
>>>>    sw.stop();
>>>>    writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>> }
>>>>
>>>> With the simple r16 file format it's pretty fast, but with RGB PNG
>>>> files (2048x2048) I noticed it's somewhat slow, so I tried to
>>>> compare it with C# and was pretty surprised by the results.
>>>>
>>>> C#:
>>>> PNG load - 90ms
>>>> PNG flip - 10ms
>>>> PNG save - 380ms
>>>>
>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>> PNG load - 500ms
>>>> PNG flip - 30ms
>>>> PNG save - 950ms
>>>>
>>>> D using imageformats
>>>> (http://code.dlang.org/packages/imageformats):
>>>> PNG load - 230ms
>>>> PNG flip - 30ms
>>>> PNG save - 1100ms
>>>>
>>>> I used dmd 2.067 with -release -inline -O.
>>>> C# was just a debug build with Visual Studio attached to the process
>>>> for debugging, and even with that it is much faster.
>>>>
>>>> I know that System.Drawing uses Windows GDI+, which can be
>>>> used from D too, but not on Linux.
>>>> If we ignore the PNG loading and saving (I haven't tried libpng
>>>> yet), even the flip method itself is 3 times slower. I don't know D
>>>> well enough to be sure there isn't some more efficient way to do
>>>> the flip. I like how the slices can be used here.
>>>>
>>>> For a C# user who expects things to just work as fast as
>>>> possible in a systems programming language, it is
>>>> somewhat disappointing to see that the pure D version is about 3
>>>> times slower.
>>>>
>>>> Am I doing something utterly wrong?
>>>> Note that this example is not critical for me, it's just a simple
>>>> hobby script I use to move and flip some images, so I can wait. But
>>>> I'm posting it to see whether this can be brought closer to what is
>>>> expected from a systems programming language.
>>>>
>>>> dlib:
>>>> auto im = loadPNG(name);
>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>> savePNG(im, newName);
>>>>
>>>> imageformats:
>>>> auto im = read_image(name);
>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>
>>>> C# code:
>>>> static void Main(string[] args)
>>>> {
>>>>     var files = Directory.GetFiles(args[0]);
>>>>
>>>>     foreach (var f in files)
>>>>     {
>>>>         var sw = Stopwatch.StartNew();
>>>>         var img = Image.FromFile(f);
>>>>
>>>>         Debug.WriteLine("Img loaded in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>         Debug.WriteLine("Img flipped in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.Save(Path.Combine(args[0], "test_" + Path.GetFileName(f)));
>>>>         Debug.WriteLine("Img saved in {0}[ms]", (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Stop();
>>>>     }
>>>> }
>>>
>>>
>>> Assuming I've done it correctly, Devisualization.Image takes around 8ms
>>> to flip horizontally in debug mode using dmd, and 3ms in release mode.
>>>
>>> module test;
>>>
>>> void main() {
>>>     import devisualization.image;
>>>     import devisualization.image.mutable;
>>>     import devisualization.util.core.linegraph;
>>>
>>>     import std.stdio;
>>>
>>>     writeln("===============\nREAD\n===============");
>>>     Image img = imageFromFile("test/large.png");
>>>     img = new MutableImage(img);
>>>
>>>     import std.datetime : StopWatch;
>>>
>>>     StopWatch sw;
>>>     sw.start();
>>>
>>>     foreach(i; 0 .. 1000) {
>>>         img.flipHorizontal;
>>>     }
>>>
>>>     sw.stop();
>>>
>>>     writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
>>> }
>>>
>>> I was planning on doing this earlier, but I discovered that a PR I
>>> pulled, which fixed things for 2.067, broke the reading of chunk types.
>>
>> My bad, I forgot that I had decreased the test image resolution to
>> 256x256. I'm totally out of the running. I have some serious work to do
>> by the looks of it.
>
> Have you considered just being able to grab an object with changed
> iteration order instead of actually doing the flip? The same goes for
> transposes and 90° rotations. Sure, sometimes you do need to actually
> rearrange the memory, and in a subset of those cases you need it to be
> done fast, but a lot of the time you're better off* just using a
> different iteration scheme (which, for ranges, should probably be part
> of the type to avoid checking the scheme every iteration).
>
> *for speed and memory reasons. Need to keep the original and the
> transpose? No need for any duplicates.
>
> Note that this is what numpy does with transposes. The .T and .transpose
> methods of ndarray don't actually modify the data, they just change the
> memory order** whereas an explicit copy of the result (for example with
> ascontiguousarray) actually moves memory around.
>
> **using a runtime flag, which is ok for them because internal iteration
> lets you only branch once on it.

I've got it down to ~12ms using dmd now. If the image were much bigger (let's say a height of ushort.max), I wouldn't be able to use a little trick, but that is only because I'm using multithreading.
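
For readers following along: the post does not say what the trick is, so the following is only a minimal sketch of one way a horizontal flip can be spread over threads, not Devisualization.Image's actual code. The function name `hFlipParallel` and the row-major `T[]` layout are assumptions made for illustration. It relies on the fact that rows never overlap, so each row can be reversed independently with `std.parallelism`.

```d
import std.parallelism : parallel;
import std.range : iota;

/// Reverses every row of a w-by-h image in place, one row per work unit.
/// Hypothetical helper, not taken from any library in this thread.
void hFlipParallel(T)(T[] data, size_t w, size_t h)
{
    assert(data.length == w * h, "buffer size must match w * h");

    // Rows do not overlap, so the default task pool can process them
    // concurrently without any locking.
    foreach (y; parallel(iota(h)))
    {
        auto row = data[y * w .. (y + 1) * w];
        foreach (x; 0 .. w / 2)
        {
            T tmp = row[x];
            row[x] = row[w - 1 - x];
            row[w - 1 - x] = tmp;
        }
    }
}

void main()
{
    // 3x2 single-channel test image, stored row by row.
    ubyte[] img = [1, 2, 3,
                   4, 5, 6];
    hFlipParallel(img, 3, 2);
    assert(img == [3, 2, 1, 6, 5, 4]);
}
```

Whether this beats a single-threaded loop depends on image size and memory bandwidth, since the flip does very little work per byte moved.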
April 02, 2015
On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
> I've got it down to ~12ms using dmd now. If the image were much bigger (let's say a height of ushort.max), I wouldn't be able to use a little trick, but that is only because I'm using multithreading.

That would be an insanely large image. If it were square it would be a 4GiB image even at one byte per pixel. I think it's safe to say that someone with images that large will be looking for quite specialised solutions and wouldn't be disappointed if things aren't optimally fast off-the-shelf!
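
The 4GiB figure can be checked with a few lines of D. This is just a worked version of the arithmetic; the helper name `squareImageBytes` is made up for the example, and the bytes-per-pixel values (1 for 8-bit grayscale, 3 for 8-bit RGB) are assumptions about the pixel format.

```d
import std.stdio : writefln;

/// Bytes needed for a square image with the given side length.
/// Hypothetical helper used only for this calculation.
ulong squareImageBytes(ulong side, ulong bytesPerPixel)
{
    return side * side * bytesPerPixel;
}

void main()
{
    enum ulong side = ushort.max;            // 65_535
    enum double GiB = 1024.0 * 1024 * 1024;

    // 65_535 * 65_535 = 4_294_836_225 pixels
    writefln("8-bit gray: %.2f GiB", squareImageBytes(side, 1) / GiB);
    writefln("8-bit RGB:  %.2f GiB", squareImageBytes(side, 3) / GiB);
}
```

So the 4GiB estimate holds for one byte per pixel, and an RGB image of that size would need roughly three times as much.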
April 02, 2015
On Wednesday, 1 April 2015 at 14:00:52 UTC, bearophile wrote:
> If you have to perform performance benchmarks then use ldc or gdc.
>
> Also disable bound tests with your compilation switches.
>
> Add the usual pure/nothrow/@nogc/@safe annotations where you can (they don't increase speed much, usually).
>
> if you are using classes don't forget to make the method final.
>
> Profile the code and look for the performance bottlenecks.

This very text should be placed somewhere prominent on the D
homepage if we don't want to constantly disappoint people who
come with the impression that D should be at the same speed level
as C/C++ but whose test programs aren't.
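
To make the checklist concrete, here is a rough sketch of the flip from the original post with the attributes applied. It is an illustration under assumptions, not a measured fix: the explicit `h` parameter and the hand-written swap loop are my additions, and the timing uses `std.datetime.stopwatch`, which is where `StopWatch` lives in current Phobos. To my knowledge an optimised LDC build would look something like `ldc2 -O3 -release -boundscheck=off flip.d`.

```d
/// In-place horizontal flip of a w-by-h image stored row by row.
/// The attributes are checked by the compiler for every instantiation.
void hFlip(T)(T[] data, size_t w, size_t h) pure nothrow @nogc @safe
{
    assert(data.length == w * h);
    foreach (y; 0 .. h)
    {
        auto row = data[y * w .. (y + 1) * w];
        // Swap pixels from both ends of the row towards the middle.
        foreach (x; 0 .. w / 2)
        {
            T tmp = row[x];
            row[x] = row[w - 1 - x];
            row[w - 1 - x] = tmp;
        }
    }
}

void main()
{
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writeln;

    // Same shape as the images in the thread: 2048x2048, 3 bytes per pixel.
    auto img = new ubyte[3][](2048 * 2048);

    // Time only the flip, outside the function being measured.
    auto sw = StopWatch(AutoStart.yes);
    hFlip(img, 2048, 2048);
    sw.stop();
    writeln("Img flipped in: ", sw.peek.total!"msecs", " [ms]");
}
```

Keeping the stopwatch in the caller keeps `hFlip` free of I/O, which is what allows the `pure nothrow @nogc` annotations in the first place.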
April 03, 2015
On 3/04/2015 4:27 a.m., John Colvin wrote:
> That would be an insanely large image. If it was square it would be a
> 4GiB image. I think it's safe to say that someone with images that large
> will be looking for quite specialised solutions and wouldn't be
> disappointed if things aren't optimally fast off-the-shelf!

Most image editing software could definitely not handle it. I would be very surprised if e.g. libpng can even read such a file. Although I'm pretty sure mine can ;)

In the worst-case scenario, for more than ushort.max, I think it'll be a couple hundred ms.
April 06, 2015
On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:
> C#:
> PNG load - 90ms
> PNG flip - 10ms
> PNG save - 380ms
>
> D using dlib (http://code.dlang.org/packages/dlib):
> PNG load - 500ms
> PNG flip - 30ms
> PNG save - 950ms
>
> D using imageformats
> (http://code.dlang.org/packages/imageformats):
> PNG load - 230ms
> PNG flip - 30ms
> PNG save - 1100ms

My implementation of flip takes 0ms ;)

http://blog.thecybershadow.net/2014/03/21/functional-image-processing-in-d/
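
For anyone who doesn't want to click through: presumably the 0ms comes from the flip being a lazy view rather than a memory operation. The sketch below is my own minimal illustration of that idea, assuming an image is anything with `w`, `h` and an `[x, y]` index. The names `Image`, `HFlipView` and `hflip` are hypothetical here and should not be read as the API from the linked post.

```d
/// A plain 8-bit grayscale image stored row by row (illustrative type).
struct Image
{
    size_t w, h;
    ubyte[] pixels;

    ubyte opIndex(size_t x, size_t y) const
    {
        return pixels[y * w + x];
    }
}

/// Lazy view that mirrors the x coordinate of any source exposing
/// w, h and [x, y]. No pixel data is copied or moved.
struct HFlipView(V)
{
    V src;

    @property size_t w() const { return src.w; }
    @property size_t h() const { return src.h; }

    auto opIndex(size_t x, size_t y) const
    {
        return src[src.w - 1 - x, y];
    }
}

/// Wraps a source in a flipped view; constant time regardless of size.
auto hflip(V)(V src) { return HFlipView!V(src); }

void main()
{
    ubyte[] px = [1, 2, 3,
                  4, 5, 6];
    auto img = Image(3, 2, px);

    auto flipped = img.hflip();          // nothing is touched here
    assert(flipped[0, 0] == 3);
    assert(flipped[2, 1] == 4);

    // Views compose: flipping twice reads the original pixels again.
    assert(flipped.hflip()[0, 0] == 1);
}
```

The cost moves from the flip to every pixel read, so this pays off when the result is consumed once (for example while encoding), and less so when the flipped data is read many times.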