Need for (C++20) Contiguous Range (page 3)

On 10/8/2020 10:49 AM, Andrei Alexandrescu wrote: > It occured to me we could define that traite, but we got away without it by implicitly decreeing that anything contiguous is just supposed to be a T[]. We didn't "get away" with anything, we just did it right the first time :-)

On Thursday, 8 October 2020 at 16:42:51 UTC, Steven Schveighoffer wrote: > It should just be: > > enum isDynamicArray(T) = is(T == U[], U); So if I create my owner container that has an opCast overload to a slice of the underlying buffer, would isDynamicArray be true for my container? If not you need a new test.

On Friday, 9 October 2020 at 09:52:12 UTC, Ola Fosheim Grøstad wrote: > On Thursday, 8 October 2020 at 16:42:51 UTC, Steven Schveighoffer wrote: >> It should just be: >> >> enum isDynamicArray(T) = is(T == U[], U); > > So if I create my owner container that has an opCast overload to a slice of the underlying buffer, would isDynamicArray be true for my container? > > If not you need a new test. I think Steven's idea is not that the container itself is a contiguous range, but that slicing it would return a contiguous range, which itself is a D slice. I.e. you need to implement opSlice() (or opIndex()) with no parameters, which should return a slice of the container's contents. For example, std.array.Appender implements it: https://dlang.org/phobos/std_array#.Appender.opSlice However std.container.array doesn't and instead returns a custom Range struct. https://dlang.org/phobos/std_container_array#.Array.opSlice On the other hand, std.container.array has opSliceAssign: https://dlang.org/phobos/std_container_array#.Array.opSliceAssign which covers part of the need for C++'s contiguous range concept. Another interesting thing to note is that mir-algorithm (which implements multi-dimensional slices) has specifically the concept of ContiguousSlices: http://mir-algorithm.libmir.org/mir_ndslice_slice.html#SliceKind http://mir-algorithm.libmir.org/mir_ndslice_topology.html#.assumeContiguous http://mir-algorithm.libmir.org/mir_ndslice_traits.html http://mir-algorithm.libmir.org/mir_ndslice_slice.html#Slice (check the "Internal Binary Representation" section)

On Friday, 9 October 2020 at 13:05:44 UTC, Petar Kirov [ZombineDev] wrote: > I think Steven's idea is not that the container itself is a contiguous range, but that slicing it would return a contiguous range, which itself is a D slice. I.e. you need to implement opSlice() (or opIndex()) with no parameters, which should return a slice of the container's contents. Alright, I guess I would want to access the underlying buffer regardless of whether the container is indexable or not... Although, an optimization protocol probably should let you query whether elements can be out of order. And possibly also test whether an element is present or not. Then you could allow direct access to the buffer of a simple hash table or priority queue etc. Hm. Interesting topic.

On 10/9/20 5:52 AM, Ola Fosheim Grøstad wrote: > On Thursday, 8 October 2020 at 16:42:51 UTC, Steven Schveighoffer wrote: >> It should just be: >> >> enum isDynamicArray(T) = is(T == U[], U); > > So if I create my owner container that has an opCast overload to a slice of the underlying buffer, would isDynamicArray be true for my container? No. > > If not you need a new test. > No, just use a cast to an array (if that's what you implemented). We are talking about a range that is also contiguous in memory. The only logical reason to require this is to get a ptr and a length and use the CPU specialized instructions for these things. We have that type pattern, it's a slice. C++ didn't have that formally defined, so they needed to add something. When I developed iopipe, I made the declaration that the buffer window shall be a random-access range. I thought of the future, where people might find new cool ways to use buffers that weren't arrays. And when I got to thinking about the performance of such buffers (like, let's say a ring buffer that has 2 slices over the same underlying buffer, so you never have to copy), the cost of *indexing* is going to overwhelm the benefit of not having to copy. I even implemented a ringbuffer using memory map tricks, and it's not any better than a straight array. So even though I still define the buffer window as having to be a random access range, all current buffer types are arrays (that may change if I ever get around to implementing an inter-thread iopipe). Contiguous memory has the properties that allow the CPU to shine, and the way to say "I have contiguous memory" in D is: use a slice. -Steve

On Friday, 9 October 2020 at 13:31:18 UTC, Steven Schveighoffer wrote: > We are talking about a range that is also contiguous in memory. The only logical reason to require this is to get a ptr and a length and use the CPU specialized instructions for these things. We have that type pattern, it's a slice. There are ranges with contiguous memory that aren't slices - stride and occasionally map over a contiguous range comes to mind. Converting these to slices may not be possible, and certainly not efficient. -- Simen

October 09, 2020

Re: Need for (C++20) Contiguous Range

Posted by Ola Fosheim Grøstad
in reply to Steven Schveighoffer

Permalink

Ola Fosheim Grøstad

Posted in reply to Steven Schveighoffer

Permalink

On Friday, 9 October 2020 at 13:31:18 UTC, Steven Schveighoffer wrote:
> We are talking about a range that is also contiguous in memory. The only logical reason to require this is to get a ptr and a length and use the CPU specialized instructions for these things. We have that type pattern, it's a slice. C++ didn't have that formally defined, so they needed to add something.

Yes, C++ has data(), but that does not imply that you provide an array interface. Actually, I'm not even sure it implies that the elements are in order... but it probably does now that they require random-access properties.

Either way, you want a SIMD aligned slice. I guess if one had some kind of interface where you fetchUnaligned first and from there fetchAligned, and the finish with fetchUnaligned could work.

A better solution is to have a bitmask that tells you whether the element is active or not. Then you just provide an aligned pointer to the range-algorithm. But this isn't going to work for safe code.

Modern SIMD on x86 has this built in.

This also means that you can continue to use SIMD operations after filter.

So when you "pop" slices you might get a bit mask sequence like this

00001111 11111111 11100000

(I show 8 elements per slice here, probably should be configured to 2, 4, 8 16, 32, 64, 128, 256)

Ok, so then you have a filter than will return every other element, then you get the same slices, but a different mask pattern:

00001010 10101010 10100000

Pretty neat actually.

This would also work with randomly populated buffers (like a hash).

> And when I got to thinking about the performance of such buffers (like, let's say a ring buffer that has 2 slices over the same underlying buffer, so you never have to copy), the cost of *indexing* is going to overwhelm the benefit of not having to copy. I even implemented a ringbuffer using memory map tricks, and it's not any better than a straight array.

I've literally spent weeks implementing various (some threadsafe) ringbuffers. It can be surprisingly challenging when you start to deal with slices and "transactional-like" features. Simple concept, but it can be a challenge to get up to speed and also being sure that it is correct.

There are some neat tricks one can use when implementing ringbuffers that are 2^n depending on what operations you desire. One important question is whether you always require one element to be unused or not, that tiny detail can have a major impact on the implementation :-D.

Anyway, when implementing a ring buffer, only include the operations you actually need. The more generic it is, the more overhead you'll get. Sadly. It is mind-blowing how many different implementation strategies there are. (for such a trivial concept)

It is nearly impossible to create a best-of-breed library implementation for that reason. Very sensitive to the use context.

But this is getting off-topic...

> Contiguous memory has the properties that allow the CPU to shine, and the way to say "I have contiguous memory" in D is: use a slice.

But that kinda implies that the buffer is indexable and in order.

On Friday, 9 October 2020 at 13:55:19 UTC, Ola Fosheim Grøstad wrote: > Ok, so then you have a filter than will return every other element, then you get the same slices, but a different mask pattern: > > 00001010 10101010 10100000 I meant if you want to add a filter that returns every other element then you just do a bitwise and with 10101010 on the mask. You can do the same with conditionals. E.g. you can get the signs of all the elements in the SIMD register as a bitmask.

On Friday, 9 October 2020 at 13:38:28 UTC, Simen Kjærås wrote: > There are ranges with contiguous memory that aren't slices - stride and occasionally map over a contiguous range comes to mind. Converting these to slices may not be possible, and certainly not efficient. I guess it in some cases would be technically possible to emulate fixed stride with a recast to a slice of a struct with padding before and after the element you want. But not very clean. Microsoft's experimental gsl::span implementation did support strides, but I don't think that is possible with the std::span that ended up in the standard. (C++ span is similar to D slices)

Forums