April 02, 2015 Re: Speed of horizontal flip | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On 3/04/2015 12:29 a.m., John Colvin wrote:
> On Thursday, 2 April 2015 at 09:55:15 UTC, Rikki Cattermole wrote:
>> On 2/04/2015 10:47 p.m., Rikki Cattermole wrote:
>>> On 2/04/2015 2:52 a.m., tchaloupka wrote:
>>>> Hi,
>>>> I have a bunch of square r16 and png images which I need to flip
>>>> horizontally.
>>>>
>>>> My flip method looks like this:
>>>> void hFlip(T)(T[] data, int w)
>>>> {
>>>>     import std.datetime : StopWatch;
>>>>     import std.stdio : writeln;
>>>>
>>>>     StopWatch sw;
>>>>     sw.start();
>>>>
>>>>     // The images are square, so w is both the row length and the row count.
>>>>     foreach(int i; 0..w)
>>>>     {
>>>>         auto row = data[i*w..(i+1)*w];
>>>>         row.reverse();
>>>>     }
>>>>
>>>>     sw.stop();
>>>>     writeln("Img flipped in: ", sw.peek().msecs, "[ms]");
>>>> }
>>>>
>>>> With the simple r16 file format it's pretty fast, but with RGB PNG
>>>> files (2048x2048) I noticed it's somewhat slow, so I tried to
>>>> compare it with C# and was pretty surprised by the results.
>>>>
>>>> C#:
>>>> PNG load - 90ms
>>>> PNG flip - 10ms
>>>> PNG save - 380ms
>>>>
>>>> D using dlib (http://code.dlang.org/packages/dlib):
>>>> PNG load - 500ms
>>>> PNG flip - 30ms
>>>> PNG save - 950ms
>>>>
>>>> D using imageformats
>>>> (http://code.dlang.org/packages/imageformats):
>>>> PNG load - 230ms
>>>> PNG flip - 30ms
>>>> PNG save - 1100ms
>>>>
>>>> I used dmd 2.067 with -release -inline -O.
>>>> C# was just a debug build with Visual Studio attached to the process for
>>>> debugging, and even with that it is much faster.
>>>>
>>>> I know that System.Drawing uses Windows GDI+, which can be
>>>> used with D too, but not on Linux.
>>>> If we ignore the PNG loading and saving (I haven't tried libpng
>>>> yet), even the flip method itself is 3 times slower. I don't know D
>>>> well enough to be sure whether there is a more efficient way to do
>>>> the flip. I like how the slices can be used here.
>>>>
>>>> For a C# user who expects things to just work as fast as
>>>> possible in a system-level programming language, it is
>>>> somewhat disappointing to see that the pure D version is about 3
>>>> times slower.
>>>>
>>>> Am I doing something utterly wrong?
>>>> Note that this example is not critical for me; it's just a simple
>>>> hobby script I use to move and flip some images, and I can wait. But
>>>> I am posting it to see if this can be brought somewhat closer to what can
>>>> be expected from a system-level programming language.
>>>>
>>>> dlib:
>>>> auto im = loadPNG(name);
>>>> hFlip(cast(ubyte[3][])im.data, cast(int)im.width);
>>>> savePNG(im, newName);
>>>>
>>>> imageformats:
>>>> auto im = read_image(name);
>>>> hFlip(cast(ubyte[3][])im.pixels, cast(int)im.w);
>>>> write_image(newName, im.w, im.h, im.pixels);
>>>>
>>>> C# code:
>>>> static void Main(string[] args)
>>>> {
>>>>     var files = Directory.GetFiles(args[0]);
>>>>
>>>>     foreach (var f in files)
>>>>     {
>>>>         var sw = Stopwatch.StartNew();
>>>>         var img = Image.FromFile(f);
>>>>
>>>>         Debug.WriteLine("Img loaded in {0}[ms]",
>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.RotateFlip(RotateFlipType.RotateNoneFlipX);
>>>>         Debug.WriteLine("Img flipped in {0}[ms]",
>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Restart();
>>>>
>>>>         img.Save(Path.Combine(args[0], "test_" +
>>>>             Path.GetFileName(f)));
>>>>         Debug.WriteLine("Img saved in {0}[ms]",
>>>>             (int)sw.Elapsed.TotalMilliseconds);
>>>>         sw.Stop();
>>>>     }
>>>> }
>>>
>>>
>>> Assuming I've done it correctly, Devisualization.Image takes around 8ms
>>> in debug mode to flip horizontally using dmd, but 3ms in release mode.
>>>
>>> module test;
>>>
>>> void main() {
>>>     import devisualization.image;
>>>     import devisualization.image.mutable;
>>>     import devisualization.util.core.linegraph;
>>>
>>>     import std.stdio;
>>>
>>>     writeln("===============\nREAD\n===============");
>>>     Image img = imageFromFile("test/large.png");
>>>     img = new MutableImage(img);
>>>
>>>     import std.datetime : StopWatch;
>>>
>>>     StopWatch sw;
>>>     sw.start();
>>>
>>>     foreach(i; 0 .. 1000) {
>>>         img.flipHorizontal;
>>>     }
>>>
>>>     sw.stop();
>>>
>>>     writeln("Img flipped in: ", sw.peek().msecs / 1000, "[ms]");
>>> }
>>>
>>> I was planning on doing this earlier, but I discovered that a PR I pulled,
>>> which fixed things for 2.067, broke the reading of chunk types.
>>
>> My bad, I forgot I had decreased the test image resolution to 256x256. I'm
>> totally out of the running. I have some serious work to do, by the looks of it.
>
> Have you considered just being able to grab an object with changed
> iteration order instead of actually doing the flip? The same goes for
> transposes and 90° rotations. Sure, sometimes you do actually need to
> rearrange the memory and in a subset of those cases you need it to be
> done fast, but a lot of the time you're better off* just using a
> different iteration scheme (which, for ranges, should probably be part
> of the type to avoid checking the scheme every iteration).
>
> *for speed and memory reasons. Need to keep the original and the
> transpose? No need for any duplicates.
>
> Note that this is what numpy does with transposes. The .T attribute and the
> .transpose method of ndarray don't actually modify the data, they just set the
> memory order**, whereas making a contiguous copy of the result actually moves
> memory around.
>
> **using a runtime flag, which is ok for them because internal iteration
> lets you only branch once on it.
I've got it down to ~12ms using dmd now. But if the image were much bigger (let's say a height of ushort.max), I wouldn't be able to use a little trick. But this is only because I'm using multithreading.
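The "little trick" is not spelled out in the post, so the following is only an illustrative sketch of how multithreading could be applied to a horizontal flip, not the Devisualization.Image implementation. It assumes a row-major buffer and uses `std.parallelism` from Phobos; the name `hFlipParallel` and the explicit `h` parameter are made up for this example.

```d
import std.algorithm.mutation : reverse;
import std.parallelism : parallel;
import std.range : iota;

/// Reverses every row of a w-by-h row-major image in place.
/// Rows never overlap, so each one can be handled by a different
/// worker thread without any locking.
void hFlipParallel(T)(T[] data, size_t w, size_t h)
{
    assert(data.length == w * h, "buffer size must match the dimensions");

    // parallel() distributes the row indices over the default task pool.
    foreach (r; parallel(iota(h)))
    {
        auto row = data[r * w .. (r + 1) * w];
        reverse(row);
    }
}

void main()
{
    import std.stdio : writeln;

    // A tiny 3x2 "image" with one int per pixel, just to show the effect.
    int[] pixels = [1, 2, 3,
                    4, 5, 6];
    hFlipParallel(pixels, 3, 2);
    writeln(pixels);
}
```

Whether this beats a single-threaded loop depends on the image size, since a flip is mostly memory-bound and starting work on the task pool has its own cost.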
|
April 02, 2015 Re: Speed of horizontal flip | ||||
---|---|---|---|---|
| ||||
Posted in reply to Rikki Cattermole | On Thursday, 2 April 2015 at 11:49:44 UTC, Rikki Cattermole wrote:
> I've got it down to ~12ms using dmd now. But if the image were much bigger (let's say a height of ushort.max), I wouldn't be able to use a little trick. But this is only because I'm using multithreading.
That would be an insanely large image. If it was square it would be a 4GiB image. I think it's safe to say that someone with images that large will be looking for quite specialised solutions and wouldn't be disappointed if things aren't optimally fast off-the-shelf!
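The 4GiB figure follows from simple arithmetic if one assumes a single byte per pixel; the post does not state a pixel format, so the bytes-per-pixel values below are assumptions, and `imageBytes` is a hypothetical helper written only for this illustration.

```d
import std.stdio : writefln;

/// Size in bytes of an uncompressed w-by-h image.
/// All arithmetic is done in ulong so the product cannot overflow 32 bits.
ulong imageBytes(ulong w, ulong h, ulong bytesPerPixel)
{
    return w * h * bytesPerPixel;
}

void main()
{
    // Side length taken from the post: ushort.max is 65_535.
    enum ulong side = ushort.max;

    // 1 byte/pixel matches the 4GiB estimate; 3 bytes/pixel corresponds to
    // the 8-bit RGB data discussed earlier in the thread.
    foreach (ulong bpp; [1UL, 3UL])
    {
        immutable bytes = imageBytes(side, side, bpp);
        writefln("%s byte(s) per pixel: %s bytes, about %.2f GiB",
                 bpp, bytes, bytes / (1024.0 ^^ 3));
    }
}
```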
|
April 02, 2015 Re: Speed of horizontal flip | ||||
---|---|---|---|---|
| ||||
Posted in reply to bearophile | On Wednesday, 1 April 2015 at 14:00:52 UTC, bearophile wrote:
> If you have to perform performance benchmarks then use ldc or gdc.
>
> Also disable bound tests with your compilation switches.
>
> Add the usual pure/nothrow/@nogc/@safe annotations where you can (they don't increase speed much, usually).
>
> If you are using classes, don't forget to make the methods final.
>
> Profile the code and look for the performance bottlenecks.
This very text should be placed somewhere prominent on the D
homepage if we don't want to constantly disappoint people who
come with the impression that D should be at the same speed level
as C/C++ but find that their test programs aren't.
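As a rough sketch of what the quoted advice could look like when applied to the flip routine from this thread: the attributes and `final` are shown on a hand-written swap loop, and the compiler switches are listed in a comment. The class `RgbImage` and its members are hypothetical, invented for this example, and the build lines are examples rather than a recommendation for any particular project.

```d
// Example build lines (bounds checks off, optimisations on):
//   ldc2 -O3 -release -boundscheck=off app.d
//   dmd  -O  -release -inline -boundscheck=off app.d

/// In-place horizontal flip of a row-major buffer with rows of length w.
/// The explicit attributes make the compiler verify that the function is
/// pure, cannot throw, does not allocate with the GC and is memory safe.
void hFlip(T)(T[] data, size_t w) pure nothrow @nogc @safe
{
    assert(w > 0 && data.length % w == 0, "data must hold whole rows");

    foreach (r; 0 .. data.length / w)
    {
        auto row = data[r * w .. (r + 1) * w];
        foreach (i; 0 .. w / 2)
        {
            // Plain three-step swap; fine for small value types like ubyte[3].
            T tmp = row[i];
            row[i] = row[w - 1 - i];
            row[w - 1 - i] = tmp;
        }
    }
}

/// Hypothetical image class, only here to show `final` on a method.
class RgbImage
{
    ubyte[3][] pixels;
    size_t width;

    this(ubyte[3][] pixels, size_t width) pure nothrow @nogc @safe
    {
        this.pixels = pixels;
        this.width = width;
    }

    // `final` rules out overriding, so the call does not have to be virtual.
    final void flipHorizontal() pure nothrow @nogc @safe
    {
        hFlip(pixels, width);
    }
}

void main()
{
    // 2x2 image; every pixel is filled with its own index.
    auto px = new ubyte[3][](4);
    foreach (i, ref p; px)
        p[] = cast(ubyte) i;

    auto img = new RgbImage(px, 2);
    img.flipHorizontal();

    assert(px[0][0] == 1 && px[1][0] == 0);
    assert(px[2][0] == 3 && px[3][0] == 2);
}
```

Note that `-boundscheck=off` removes array bounds checks even in `@safe` code, so it is best kept to benchmark and release builds of code that is already well tested.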
|
April 03, 2015 Re: Speed of horizontal flip | ||||
---|---|---|---|---|
| ||||
Posted in reply to John Colvin | On 3/04/2015 4:27 a.m., John Colvin wrote:
> That would be an insanely large image. If it was square it would be a
> 4GiB image. I think it's safe to say that someone with images that large
> will be looking for quite specialised solutions and wouldn't be
> disappointed if things aren't optimally fast off-the-shelf!
Most image editing software definitely could not handle it. I would be very surprised if e.g. libpng could even read such a file. Although I'm pretty sure mine can ;)
Worst-case scenario, for more than ushort.max, I think it'll be a couple hundred ms.
|
April 06, 2015 Re: Speed of horizontal flip | ||||
---|---|---|---|---|
| ||||
Posted in reply to tchaloupka | On Wednesday, 1 April 2015 at 13:52:06 UTC, tchaloupka wrote:
> C#:
> PNG load - 90ms
> PNG flip - 10ms
> PNG save - 380ms
>
> D using dlib (http://code.dlang.org/packages/dlib):
> PNG load - 500ms
> PNG flip - 30ms
> PNG save - 950ms
>
> D using imageformats
> (http://code.dlang.org/packages/imageformats):
> PNG load - 230ms
> PNG flip - 30ms
> PNG save - 1100ms
My implementation of flip takes 0ms ;)
http://blog.thecybershadow.net/2014/03/21/functional-image-processing-in-d/
|
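A flip that "takes 0ms" is possible when nothing is moved at flip time and the pixels are simply read in a different order later, which is also the changed-iteration-order idea raised earlier in the thread. The linked article has its own design; the following is only a minimal sketch of the general idea using Phobos ranges, with `hFlipView` being a name made up for this example.

```d
import std.algorithm.iteration : map;
import std.range : chunks, retro;

/// Lazy horizontally flipped view of a row-major image.
/// No pixel is copied or moved: each row is just iterated back to front.
auto hFlipView(T)(T[] data, size_t w)
{
    return data.chunks(w).map!(row => row.retro);
}

void main()
{
    import std.stdio : writeln;

    // A 3x2 "image" with one int per pixel.
    int[] img = [1, 2, 3,
                 4, 5, 6];

    // Building the view costs the same regardless of image size.
    auto flipped = hFlipView(img, 3);

    foreach (row; flipped)
        writeln(row);

    // The underlying buffer is left as it was.
    assert(img == [1, 2, 3, 4, 5, 6]);
}
```

The cost does not disappear, it moves to whoever consumes the view, so this pays off mainly when the flipped image is read once (for example while encoding it) rather than accessed repeatedly.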