October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nicholas Wilson | On Sunday, 9 October 2016 at 08:25:40 UTC, Nicholas Wilson wrote:
> How? All you need is an extra `each` e.g. r.inBatchesOf!(8).each!(a =>a[].map!(convertColor!RGBA8))
>
> perhaps define a helper function for it that does each + the explicit slice + map, but it certainly doesn't scream completely different API to me.
Ha, realised I went full circle. Still might be useful if the compiler is able to use the fact that the range's length is a multiple of N (particularly if N is a power of 2).
Are you able to apply arbitrary attributes to the delegate passed (i.e. things like @fastmath)?
|
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nicholas Wilson | On Sunday, 9 October 2016 at 08:39:57 UTC, Nicholas Wilson wrote:
> On Sunday, 9 October 2016 at 05:34:06 UTC, Ilya Yaroshenko wrote:
>> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>> [...]
>>>
>>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>>
>> Could you please give an example what type of operation should be vectorized?
>
> anything that is able to be. Given that ElementType!InBatchesOfN are a static array
> of ElementType!(R), the compiler can* (assuming no branching and anything else that impedes vectorisation) combine most expressions into equivalent vector instruction.
> This approach might not work so well for colours as is but should work if we "transpose" the colour i.e. rgbargbargbargba -> rrrrggggbbbbaaaa and then transpose it back.
>
> *I know this is the sufficiently intelligent compiler argument
static foreach can help for static arrays. For example, ndslice uses static foreach a lot. mir.ndslice.algorithm allows performing vectorized operations. Some conversion algorithms for ndslices will be added to Mir or DCV. Please open a pull request or file an issue.
|
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ilya Yaroshenko | On 9 October 2016 at 15:34, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>
>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> [...]
>>
>>
>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>
>
> Could you please give an example what type of operation should be vectorized?
Let's consider a super simple blend:
dest = src.rgb * src.a + dest.rgb * (1-src.alpha);
This is perhaps the most common blend that exists. If this is a ubyte[4] color, which is the most common format, then to do it efficiently, runs of 16 colors (4x ubyte[16] vectors) need to be rearranged into:
ubyte[16][3] rgb = [ [RGBRGBRGBRGBRGBR], [GBRGBRGBRGBRGBRG], [BRGBRGBRGBRGBRGB] ];
ubyte[16][3] a = [ [AAAaaaAAAaaaAAAa], [aaAAAaaaAAAaaaAA], [AaaaAAAaaaAAAaaa] ];
You can do this with gather loads, or with a couple of shuffles after loading. Then obviously do the work in this configuration.
Or you might expand it to [ [RRRRRRRRRRRRRRRR], [GGGGGGGGGGGGGGGG], [BBBBBBBBBBBBBBBB] ], etc, depending on the work, and which expansion is cheaper for the platform (ie, shuffling limitations).
Now, this might not look like much of a win for this blend, but as you extend the sequence of ops, the win gets much much bigger. Particularly so if you want to do gamma-correct stuff, which would usually involve expanding those ubytes into floats, then doing vector pow's and stuff like that.
Either way, you need to iterate the image 4 vectors at a time. That's the sort of batching I'm talking about.
Trouble is, this work needs to be wrapped into a function that receives inputs in batches, like:
RGBA8[16] doBulkBlend(RGBA8[16] buffer)
{
  ... bulk blend code ...
}
This sort of thing:
buffer.map!(e => src.rgb * src.a).copy(output);
Super readable! Would be really nice to express, but I have no idea how we can make that sort of thing efficient.
You could start writing this sort of thing:
buffer.chunksOf!16.map!(e => doBulkBlend(e[0..16])).deChunk.copy(output);
Yeah, it's code... it would compile, but I consider that to be completely obfuscated. You can't look at that and understand anything much about what it does... so I don't think that's a good goal-post at all. If I showed that to a colleague, I don't think they'd be impressed. We can't reach that point and say D is awesome for data-stream processing... we need to go a lot further than that.
Anyway, I think this sort of thing is a minimum target.
I'd like to see how this sort of batching would integrate into ndslice nicely, because it introduces the 'nd' iteration element... there's a heap of challenges; element alignment, unaligned line-strides, mid-vector slices, etc. I can imagine certain filter algorithms that work on 2d slices ('blocks') rather than 1d slices like the example above. What if the image is rotated or transposed? How can applying a per-pixel operation to a buffer iterate in a memory-linear fashion even though it's working in batches of elements?
I haven't sat and tried plugging this into ndslice much yet. Haven't had enough time, and I really wanted to get colour to a point I'm happy with.
|
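A minimal sketch of the batching and channel regrouping described in the post above, written as plain scalar D rather than SIMD. It assumes a four-byte `RGBA8` struct as a stand-in for the library type; `Planar16`, `toPlanar`, `fromPlanar` and `invertInBatches` are hypothetical names used only for illustration, and the inversion is just a placeholder for real per-batch work. Only `std.range.chunks` comes from Phobos.

```d
import std.range : chunks;

// Hypothetical stand-in for the library's 8-bit RGBA type.
struct RGBA8 { ubyte r, g, b, a; }

// One batch of 16 pixels with each channel stored contiguously.
struct Planar16 { ubyte[16] r, g, b, a; }

// rgbargba... -> rrrr... gggg... bbbb... aaaa...
Planar16 toPlanar(const(RGBA8)[] px)
{
    assert(px.length == 16);
    Planar16 p;
    foreach (i, c; px)
    {
        p.r[i] = c.r;
        p.g[i] = c.g;
        p.b[i] = c.b;
        p.a[i] = c.a;
    }
    return p;
}

// The inverse regrouping, written back into the pixel slice.
void fromPlanar(Planar16 p, RGBA8[] px)
{
    assert(px.length == 16);
    foreach (i, ref c; px)
        c = RGBA8(p.r[i], p.g[i], p.b[i], p.a[i]);
}

// Placeholder per-batch operation: invert the colour channels, keep alpha.
void invertInBatches(RGBA8[] image)
{
    assert(image.length % 16 == 0, "tail handling is omitted in this sketch");
    foreach (batch; image.chunks(16)) // each batch is a slice of image
    {
        auto p = toPlanar(batch);
        foreach (ref v; p.r) v = cast(ubyte)(255 - v);
        foreach (ref v; p.g) v = cast(ubyte)(255 - v);
        foreach (ref v; p.b) v = cast(ubyte)(255 - v);
        fromPlanar(p, batch);
    }
}
```

Whether a compiler turns the three inner loops into vector instructions depends on the compiler and flags; the sketch only shows the data movement that the batch API has to hide from the caller.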
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ilya Yaroshenko | On 9 October 2016 at 15:34, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>
>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> [...]
>>
>>
>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>
>
> Could you please give an example what type of operation should be vectorized?
Even operations that don't require shuffling, eg:
RGBA8[] a, b;
zip(a, b).map!(e => e[0] + e[1]).copy(output);
Which I've suggested before (and Walter liked the idea), could be
sugared up by making use of the language's largely under-used array
operation syntax:
output[] = a[] + b[]; // ie, types have overloaded addition operators, so this array expression would be lowered to the pipeline expression above
This would be super-useful for HEAPS of things!
Even these still need to be done in batches since colour adds are saturating operations, and there are SIMD instructions for saturating arithmetic, so we basically always have to do colour work in SIMD, which means batching, and that basically ruins any chance for natural, readable, expressions in your code. I want to find a way that we can express these operations naturally, without having to always manually handle the batching.
If we can get there, then I will say D is a good language for stream-data processing.
|
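A small sketch of the `zip`/`map`/`copy` pipeline from the post above with a saturating element type. The `RGBA8` struct and its `opBinary` here are assumptions made for illustration, not the actual type from the colour library, and `addImages` is a hypothetical helper name. The saturation is done per channel in scalar code; it does not use SIMD instructions.

```d
import std.algorithm : copy, map, min;
import std.range : zip;

// Hypothetical colour type with saturating per-channel addition.
struct RGBA8
{
    ubyte r, g, b, a;

    RGBA8 opBinary(string op)(RGBA8 rhs) const
    if (op == "+")
    {
        // Clamp to 255 instead of wrapping around.
        static ubyte sat(int v) { return cast(ubyte) min(v, 255); }
        return RGBA8(sat(r + rhs.r), sat(g + rhs.g),
                     sat(b + rhs.b), sat(a + rhs.a));
    }
}

// Element-wise a + b written into output, which must be at least as long as the inputs.
void addImages(const(RGBA8)[] a, const(RGBA8)[] b, RGBA8[] output)
{
    zip(a, b).map!(e => e[0] + e[1]).copy(output);
}
```

The pipeline reads naturally, but each element is still processed one at a time, which is the gap between readable code and batched saturating arithmetic that the post is pointing at.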
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nicholas Wilson | On 9 October 2016 at 18:25, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>
>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> How far would `r.inBatchesOf!(N)` go in terms of compiler optimisations
>>> (e.g. vectorisation) if N is a power of 2?
>>>
>>> auto inBatchesOf(size_t N,R)(R r) if(N!=0 &&isInputRange!R &&
>>> hasLength!R)
>>> {
>>> struct InBatchesOfN
>>> {
>>> R r;
>>> ElementType!(R)[N] batch;
>>> this(R _r)
>>> {
>>> assert(_r.length % N ==0);// could have overloads where
>>> undefined elements == ElementType!(R).init
>>> r = _r;
>>> foreach( i; 0..N)
>>> {
>>> batch[i] = r.front;
>>> r.popFront;
>>> }
>>> }
>>>
>>> bool empty() { return r.empty; }
>>> auto front { return batch; }
>>> void popFront()
>>> {
>>> foreach( i; 0..N)
>>> {
>>> batch[i] = r.front;
>>> r.popFront;
>>> }
>>> }
>>> }
>>>
>>> return InBatchesOfN(r);
>>> }
>>
>>
>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>
>
> How? All you need is an extra `each` e.g. r.inBatchesOf!(8).each!(a
> =>a[].map!(convertColor!RGBA8))
>
> perhaps define a helper function for it that does each + the explicit slice + map, but it certainly doesn't scream completely different API to me.
As you demonstrate; convertColor doesn't accept RGBA8[16], it accepts a single RGBA8... there's no way the optimiser will be able to magic-up an efficient inline of convertColor which works with 16 elements at a time, but I could easily write a super-fast version by hand.
My point about the separate API is, any function that works on a
single element would need a complement of functions that work on 'n'
elements, where 'n' is some context-specific number of elements that
suits that particular workload.
Now, that's conceivable, and it's even possible to make the magic meta
that calls these functions work out there is a batch overload and call
it if it can, but we're miles away from std.algorithm and common
ranges now.
The other issue is that every such efficient batch version would need
to be hand-written, and that sucks because there are too many
permutations.
|
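A sketch of a batching range along the lines of the quoted `inBatchesOf`, together with a per-element function and its hand-written batch overload. This is not the code from the thread: it tracks exhaustion with a separate `done` flag so that the last batch is still delivered after the source has been drained, and `twice` is a hypothetical example function. It assumes the source length is a multiple of `N` and that the element type is assignable.

```d
import std.range.primitives : ElementType, empty, front, isInputRange, popFront;

auto inBatchesOf(size_t N, R)(R r)
if (N != 0 && isInputRange!R)
{
    static struct InBatchesOfN
    {
        private R source;
        private ElementType!(R)[N] batch;
        private bool done;

        this(R r)
        {
            source = r;
            fill();
        }

        @property bool empty() const { return done; }
        @property auto front() { return batch; }
        void popFront() { fill(); }

        // Load the next N elements, or mark the range as finished.
        private void fill()
        {
            if (source.empty)
            {
                done = true;
                return;
            }
            foreach (i; 0 .. N)
            {
                assert(!source.empty, "source length must be a multiple of N");
                batch[i] = source.front;
                source.popFront();
            }
        }
    }

    return InBatchesOfN(r);
}

// A per-element function...
int twice(int x) { return 2 * x; }

// ...and the batch overload that would have to be written by hand.
int[4] twice(int[4] xs)
{
    foreach (ref x; xs)
        x *= 2;
    return xs;
}
```

Calling `twice(b)` on each `int[4]` produced by `inBatchesOf!4` selects the batch overload through ordinary overload resolution, which illustrates the point of the post: the range part is easy, while the batch overloads still have to exist for every function.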
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Sunday, 9 October 2016 at 13:18:22 UTC, Manu wrote:
> On 9 October 2016 at 15:34, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>>
>>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>
>>>> [...]
>>>
>>>
>>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>>
>>
>> Could you please give an example what type of operation should be vectorized?
>
> Let's consider a super simple blend:
> dest = src.rgb * src.a + dest.rgb * (1-src.alpha);
This code does not need transposition, and it can be partially vectorised using mir.ndslice.algorithm.
To perform full vectorization, the image should be regrouped in memory by channel. For example, many computer vision algorithms work better with each channel regrouped in memory.
Relevant issue for this type of optimization: https://github.com/libmir/mir/issues/343
Please comment on this issue and provide a set of functions you would like to be vectorised.
--Ilya
|
October 09, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nicholas Wilson | On 9 October 2016 at 18:39, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:34:06 UTC, Ilya Yaroshenko wrote:
>>
>> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>>
>>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>
>>>> [...]
>>>
>>>
>>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>>
>>
>> Could you please give an example what type of operation should be vectorized?
>
>
> anything that is able to be. Given that ElementType!InBatchesOfN are a
> static array
> of ElementType!(R), the compiler can* (assuming no branching and anything
> else that impedes vectorisation) combine most expressions into equivalent
> vector instruction.
> This approach might not work so well for colours as is but should work if we
> "transpose" the colour i.e. rgbargbargbargba -> rrrrggggbbbbaaaa and then
> transpose it back.
>
> *I know this is the sufficiently intelligent compiler argument
I've been waiting for that compiler for almost 2 decades. I've shipped
17 commercial games in that time while waiting, and I had to resort to
manual code that didn't yet have an opportunity to leverage such a
sufficiently intelligent compiler's awesome optimiser.
I've never seen an auto-vectoriser go ANYWHERE NEAR the sort of
intelligence required here. I've seen it start to do some good work
with arrays of floats... that's about where it ends.
Arrays of RGBA_10_10_10_2 require unpacking, and then shuffling. When
you stuff the unpack in the way, that will almost always throw an
auto-vectoriser off the scent. I'm also waiting to see saturation
expressions written in C code promote to saturating SIMD arithmetic.
Basically every operation is saturating when working with colours.
|
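To make the unpacking step mentioned above concrete, here is a scalar sketch for a packed 10-10-10-2 pixel. The bit layout (red in the low 10 bits, then green, then blue, with alpha in the top 2 bits) is an assumption chosen for illustration; real formats differ in channel order. `Unpacked`, `unpack1010102`, `pack1010102` and `satAdd10` are hypothetical names.

```d
import std.algorithm.comparison : min;

// Unpacked channels: 10 bits each for r, g, b and 2 bits for a.
struct Unpacked { ushort r, g, b; ubyte a; }

// Assumed layout: r = bits 0..9, g = bits 10..19, b = bits 20..29, a = bits 30..31.
Unpacked unpack1010102(uint p)
{
    return Unpacked(
        cast(ushort)( p        & 0x3FF),
        cast(ushort)((p >> 10) & 0x3FF),
        cast(ushort)((p >> 20) & 0x3FF),
        cast(ubyte) ( p >> 30));
}

uint pack1010102(Unpacked c)
{
    return (c.r & 0x3FF)
         | ((c.g & 0x3FF) << 10)
         | ((c.b & 0x3FF) << 20)
         | ((c.a & 0x3u) << 30);
}

// Saturating add for one 10-bit channel: clamps at 1023 instead of wrapping.
ushort satAdd10(ushort x, ushort y)
{
    return cast(ushort) min(x + y, 0x3FF);
}
```

The shifts and masks sit between the load and the arithmetic, which is the kind of extra step the post says tends to defeat auto-vectorisation.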
October 10, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ilya Yaroshenko | On 9 October 2016 at 23:36, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 13:18:22 UTC, Manu wrote:
>>
>> On 9 October 2016 at 15:34, Ilya Yaroshenko via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>
>>> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>>>
>>>>
>>>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
>>>>>
>>>>>
>>>>> [...]
>>>>
>>>>
>>>>
>>>> Well the trouble is the lambda that you might give to 'map' won't work anymore. Operators don't work on batches, you need to use a completely different API, and I think that's unfortunate.
>>>
>>>
>>>
>>> Could you please give an example what type of operation should be vectorized?
>>
>>
>> Let's consider a super simple blend:
>> dest = src.rgb * src.a + dest.rgb * (1-src.alpha);
>
>
> This code does not need transposition, and it can be partially vectorised using mir.ndslice.algorithm.
How so? Sorry, let me write the actual work in full:
RGBA8[] src, dest; // allocate somehow
zip(src, dest).map!((e) {
  ubyte r = cast(ubyte)clamp(((cast(int)e[0].r * e[0].a * 0x1011) >> 20) + ((cast(int)e[1].r * (255 - e[0].a) * 0x1011) >> 20), 0, 255);
  ubyte g = cast(ubyte)clamp(((cast(int)e[0].g * e[0].a * 0x1011) >> 20) + ((cast(int)e[1].g * (255 - e[0].a) * 0x1011) >> 20), 0, 255);
  ubyte b = cast(ubyte)clamp(((cast(int)e[0].b * e[0].a * 0x1011) >> 20) + ((cast(int)e[1].b * (255 - e[0].a) * 0x1011) >> 20), 0, 255);
  return RGBA8(r, g, b, e[0].a);
}).copy(dest);
If you can coerce the auto-vectoriser to do the right thing with that, I'll be shocked.
> To perform full vectorization, the image should be regrouped in memory by channel. For example, many computer vision algorithms work better with each channel regrouped in memory.
Yes, but many (most) applications just want to call a function, and don't want to rearrange their data.
Most people aren't researchers writing high-tech computer vision software ;)
Also, most images in realtime software are textures, which are usually compressed in some way, and don't generally work well split into channels; that would be multiple textures (for each plane), and multiple samples per texel (sample for each plane).
Image processing definitely needs to work without having the working dataset pre-arranged into planes.
> Relevant issue for this type of optimization: https://github.com/libmir/mir/issues/343
>
> Please comment on this issue and provide a set of functions you would like to be vectorised. --Ilya
I'll take a look.
|
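The blend written out in the post above can be factored so that the fixed-point divide is visible on its own. In this sketch, `(x * 0x1011) >> 20` stands in for `x / 255`, since 0x1011 / 2^20 is roughly 1/255; `RGBA8` is again a hypothetical stand-in struct, and `blendChannel`, `blend` and `blendOver` are illustrative names, not library functions. Results can be one step lower than an exact rounded division because each term is truncated.

```d
import std.algorithm : clamp, copy, map;
import std.range : zip;

// Hypothetical stand-in for the library's 8-bit RGBA type.
struct RGBA8 { ubyte r, g, b, a; }

// One channel of: src * alpha + dest * (1 - alpha), with alpha in 0 .. 255.
ubyte blendChannel(ubyte s, ubyte d, ubyte a)
{
    // (x * 0x1011) >> 20 approximates x / 255 for x in 0 .. 255 * 255.
    immutable int v = ((s * a * 0x1011) >> 20)
                    + ((d * (255 - a) * 0x1011) >> 20);
    return cast(ubyte) clamp(v, 0, 255);
}

RGBA8 blend(RGBA8 src, RGBA8 dst)
{
    return RGBA8(blendChannel(src.r, dst.r, src.a),
                 blendChannel(src.g, dst.g, src.a),
                 blendChannel(src.b, dst.b, src.a),
                 src.a);
}

// Blend src over dest in place, one pixel at a time.
void blendOver(const(RGBA8)[] src, RGBA8[] dest)
{
    zip(src, dest).map!(e => blend(e[0], e[1])).copy(dest);
}
```

Factoring the arithmetic this way makes it readable, but it remains a per-pixel function, so it does not by itself address the batching problem discussed in the thread.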
October 10, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Manu | On Thursday, 6 October 2016 at 14:53:52 UTC, Manu wrote:
> I've done another pass incorporating prior feedback, mostly focusing on documentation.
>
> http://dtest.thecybershadow.net/artifact/website-b6e2e44dd40dd7c70eb45829c02060b99ae3937b-57272ccdf902fa3f0c050d522129f2be/web/library-prerelease/std/experimental/color.html
>
> Can interested parties please give it another once-over and add
> further comments?
> How can I get this to a point where people would like to see it in phobos?
>
> Repo: https://github.com/TurkeyMan/color
> PR: https://github.com/dlang/phobos/pull/2845
Nice work!
colorFromString should be colorFromRGBString :)
|
October 10, 2016 Re: color lib | ||||
---|---|---|---|---|
| ||||
Posted in reply to Andrea Fontana | On 10 October 2016 at 17:29, Andrea Fontana via Digitalmars-d <digitalmars-d@puremagic.com> wrote:
> On Thursday, 6 October 2016 at 14:53:52 UTC, Manu wrote:
>>
>> I've done another pass incorporating prior feedback, mostly focusing on documentation.
>>
>>
>> http://dtest.thecybershadow.net/artifact/website-b6e2e44dd40dd7c70eb45829c02060b99ae3937b-57272ccdf902fa3f0c050d522129f2be/web/library-prerelease/std/experimental/color.html
>>
>> Can interested parties please give it another once-over and add
>> further comments?
>> How can I get this to a point where people would like to see it in phobos?
>>
>> Repo: https://github.com/TurkeyMan/color
>> PR: https://github.com/dlang/phobos/pull/2845
>
>
> Nice work!
>
> colorFromString should be colorFromRGBString :)
Nar. It parses any form of colour-in-a-string.
|