Jump to page: 1 2
Thread overview
A couple of questions about arrays and slices
Jun 21, 2023
Cecil Ward
Jun 21, 2023
Jonathan M Davis
Jun 21, 2023
Cecil Ward
Jun 21, 2023
Cecil Ward
Jun 21, 2023
H. S. Teoh
Jun 22, 2023
Cecil Ward
Jun 22, 2023
Paul Backus
Jun 22, 2023
Jonathan M Davis
Jun 22, 2023
Cecil Ward
Jun 24, 2023
Cecil Ward
Jun 24, 2023
Jonathan M Davis
Jun 24, 2023
Cecil Ward
Jun 24, 2023
Cecil Ward
Jun 24, 2023
Jonathan M Davis
Jun 24, 2023
Cecil Ward
Jun 24, 2023
Jonathan M Davis
Jun 24, 2023
Cecil Ward
Jun 24, 2023
Cecil Ward
Jun 22, 2023
Cecil Ward
Jun 24, 2023
Ali Çehreli
June 21, 2023
First is an easy one:

1.) I have a large array and a sub-slice which I want to set up to be pointing into a sub-range of it. What do I write if I know the start and end indices ? Concerned about an off-by-one error, I have start_index and past_end_index (exclusive).

2.) I have a dynamic array and I wish to preinitialise its alloc cell to be a certain large size so that I don’t need to reallocate often initially. I tell myself that I can set the .length property. Is that true?

2a.) And what happens when the cell is extended, is the remainder zero-filled or remaining full of garbage, or is the size of the alloc cell something separate from the dynamic array’s knowledge of the number of valid elements in it ?
June 20, 2023
On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
> First is an easy one:
>
> 1.) I have a large array and a sub-slice which I want to set up to be pointing into a sub-range of it. What do I write if I know the start and end indices ? Concerned about an off-by-one error, I have start_index and past_end_index (exclusive).

When slicing, the end index is exclusive. e.g.

    auto a = "0123456789abcdef";
    assert(a[3 .. 8] == "34567");

As a consequence of that, the end index minus the start index is the length of the resulting slice. It also means that $ always refers to one past the end of the array.

> 2.) I have a dynamic array and I wish to preinitialise its alloc cell to be a certain large size so that I don’t need to reallocate often initially. I tell myself that I can set the .length property. Is that true?

Well, dynamic arrays have a length and a capacity. The length is the actual length of the slice that you're operating on. e.g.

auto a = new int[](100);
assert(a.length == 100);

or

int[] a;
a.length = 100;
assert(a.length == 100);

However, the underlying memory that the dynamic array is a slice of will generally be larger than the length of the array - in part because of how the allocator works and in part so that appending to the array will be more efficient. So, you can see how much space the dynamic array has to grow before it has to be reallocated when appending to it by using the capacity property. E.G. When I run this on my system

auto a = new int[](100);
writeln(a.length);
writeln(a.capacity);

I get

100
127

This means that I could append 27 more elements to the array (or otherwise increase its length by up to 27 elements) without needing a reallocation. But if the array tries to grow beyond that, then the GC will allocate a new block of memory, copy the elements over to that, and adjust the slice to point to the new block of memory (of course leaving any other slices pointing to the old block of memory).

One thing to note about that is that a dynamic array can only have a capacity that's larger than its length if it's the furthest slice into that block of memory, and no other slices have gone further into it (since it's only safe for the runtime to allow you to append to a dynamic array in-place when no other dynamic array can possibly refer to the memory after the array that you're trying to append to). If the dynamic array is not at the end, then it ends up with a capacity of 0. E.G.

auto a = new int[](100);
auto b = a[0 .. $ - 1];
assert(b.length == 99);
assert(b.capacity == 0);

So, capacity is a bit more complex than the max length that the array could grow to without needing a reallocation, but you can use it see if appending to a dynamic array will result in a reallocation.

Another part of this that you of course need to remember is that dynamic arrays can refer to memory that is not GC-allocated for dynamic arrays (be it stack-allocated or malloc-allocated or even GC-allocated for a different purpose), in which case, the capacity will be 0, because the underlying memory block is not a GC-allocated memory block for dynamic arrays, and appending anything to the array then has to result in a reallocation so that it then does point to a GC-allocated block of memory for dynamic arrays.

If you want to create a dynamic array of a specific length while also ensuring that that there is at least a specific amount of memory in the underlying memory block for it to grow into, then you need the reserve function. It allows you to tell the GC how much memory you would like to be allocated underneath the hood. E.G.

int[] a;
a.reserve(500);
assert(a.length == 0);
writeln(a.capacity);

prints 511 on my system.

https://dlang.org/spec/arrays.html#capacity-reserve

All that being said, if you're trying to minimize reallocations when appending to a dynamic array, you should consider using https://dlang.org/phobos/std_array.html#Appender since it has some other tricks to make it so that it queries the GC less and speeds the whole process up. But the basic idea of reserving capacity is the same, and when you're done appending, you get a normal dynamic array out of the deal.

> 2a.) And what happens when the cell is extended, is the remainder zero-filled or remaining full of garbage, or is the size of the alloc cell something separate from the dynamic array’s knowledge of the number of valid elements in it ?

A dynamic array is basically this:

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

It's just a slice of memory and has no clue whatsoever about what it points to. All of the logic for that comes from the druntime functions that you call to operate on dynamic arrays (e.g. ~= or capacity). It could be a slice of a static array, GC-allocated memory, malloced memory, etc. And the GC doesn't do anything to keep track of all of the dynamic arrays. They're basically just simple structs sitting on the stack or inside of the memory of whatever data structure they're a part of. To know what to free, the GC just scans them along with everything else when it does a collection.

Now, if the memory that the dynamic array points to is a block of GC-allocated memory allocated for dynamic arrays (as it would be if you used new or assigned a value to its length property), then there are some minor bookkeeping items at the end of that memory block IIRC. In particular, the GC needs to know the furthest into that block of memory that any dynamic array has referred to so that it can know whether it's safe to append to a dynamic array referring to that block of memory. But all of that is implementation-specific stuff that you don't really need to worry about.

Now as for garbage, no @safe code will ever have garbage in any memory that it can access (barring @trusted being misused anyway). Whenever you explicitly increase the length of a dynamic array rather than appending to it, the new elements are all initialized with the element type's init value - which is why you can't have dynamic arrays of structs that @disable their init value.

If what you're worried about is the memory in the memory block that the dynamic array is a slice of which is beyond the current length of the array, then IIRC, it's just zeroed out no matter what the type is, but I'd have to go digging in druntime to be sure. Either way, you can't access any of that memory in @safe code (it would require using pointers with pointer arithmetic rather than slices to access it).

Actually, a quick test should show us quite easily, and on my system

int[] a;
a.reserve(10);
writeln(*a.ptr);
writeln(*(a.ptr + 1));
writeln(*(a.ptr + 2));
writeln("----");
a ~= 47;
writeln(*a.ptr);
writeln(*(a.ptr + 1));
writeln(*(a.ptr + 2));

printed

655142960
8
609419264
----
47
8
609419264

on one run, and

682106928
8
634187776
----
47
8
634187776

on another. So, the memory isn't being zeroed out, and it looks like it's probably garbage, though the fact that the second spot is 8 in both cases implies that the GC is doing something rather than it all just being garbage. Regardless, you don't really have to worry about how valid that memory is, since you can't access it in @safe code, and if it _does_ ever get initialized with anything, it'll just be bit-blitted.

And of course, if your main concern here is appending speed, then you should probably be using Appender rather than directly appending to a dynamic array.

Hopefully all of that answers your questions or at least helps clarify things.

- Jonathan M Davis




June 21, 2023
On Wed, Jun 21, 2023 at 02:09:26AM +0000, Cecil Ward via Digitalmars-d-learn wrote:
> First is an easy one:
> 
> 1.) I have a large array and a sub-slice which I want to set up to be pointing into a sub-range of it. What do I write if I know the start and end indices ? Concerned about an off-by-one error, I have start_index and past_end_index (exclusive).

array[start_idx .. one_past_end_idx]


> 2.) I have a dynamic array and I wish to preinitialise its alloc cell to be a certain large size so that I don’t need to reallocate often initially. I tell myself that I can set the .length property. Is that true?

You can use `array.reserve(preinitSize);`.


> 2a.) And what happens when the cell is extended, is the remainder zero-filled or remaining full of garbage, or is the size of the alloc cell something separate from the dynamic array’s knowledge of the number of valid elements in it ?

The size of the allocated cell is managed by druntime. On the user code side, all you know is the slice (start pointer + length).  The allocated region outside the current array length is not initialized.  Assigning to array.length initializes the area that the array occupies after the length has been extended.


T

-- 
Ph.D. = Permanent head Damage
June 21, 2023
On Wednesday, 21 June 2023 at 04:52:06 UTC, Jonathan M Davis wrote:
> On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> [...]
>
> When slicing, the end index is exclusive. e.g.
>
> [...]

Thankyou to both posters for your exceptionally helpful and generous replies, which will serve as a help to others if I made the title searchable. I need to look at the appended and try to find some (more) example code.
June 21, 2023
On Wednesday, 21 June 2023 at 04:52:06 UTC, Jonathan M Davis wrote:
> On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via Digitalmars-d-learn wrote:
>> [...]
>
> When slicing, the end index is exclusive. e.g.
>
> [...]

Actually concerning the garbage, I was rather hoping that it _would_ be garbage so as to avoid the cost of a zero-fill in time and also possibly in cache pollution, unless it uses the non-temporal cache-friendliness management tech that is found on eg x86.
June 22, 2023
On Wednesday, 21 June 2023 at 15:48:56 UTC, H. S. Teoh wrote:
> On Wed, Jun 21, 2023 at 02:09:26AM +0000, Cecil Ward via Digitalmars-d-learn wrote:
>> First is an easy one:
>> 
>> 1.) I have a large array and a sub-slice which I want to set up to be pointing into a sub-range of it. What do I write if I know the start and end indices ? Concerned about an off-by-one error, I have start_index and past_end_index (exclusive).
>
> array[start_idx .. one_past_end_idx]
>
>
>> 2.) I have a dynamic array and I wish to preinitialise its alloc cell to be a certain large size so that I don’t need to reallocate often initially. I tell myself that I can set the .length property. Is that true?
>
> You can use `array.reserve(preinitSize);`.
>
>
>> 2a.) And what happens when the cell is extended, is the remainder zero-filled or remaining full of garbage, or is the size of the alloc cell something separate from the dynamic array’s knowledge of the number of valid elements in it ?
>
> The size of the allocated cell is managed by druntime. On the user code side, all you know is the slice (start pointer + length).  The allocated region outside the current array length is not initialized.  Assigning to array.length initializes the area that the array occupies after the length has been extended.
>
>
> T

Is .reserve()’s argument scaled by the entry size after it is supplied, that is it is quoted in elements or is it in bytes? I’m not sure whether the runtime has a knowledge of the element type so maybe it doesn’t know anything about scale factors, not sure.
June 22, 2023
On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
> Is .reserve()’s argument scaled by the entry size after it is supplied, that is it is quoted in elements or is it in bytes? I’m not sure whether the runtime has a knowledge of the element type so maybe it doesn’t know anything about scale factors, not sure.

length, reserve, and capacity all use the same unit, which is elements.

reserve passes a TypeInfo instance to the runtime so that it knows the size of the elements. You can see the implementation here:

https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910
June 21, 2023
On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via Digitalmars-d-learn wrote:
> On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
> > Is .reserve()’s argument scaled by the entry size after it is supplied, that is it is quoted in elements or is it in bytes? I’m not sure whether the runtime has a knowledge of the element type so maybe it doesn’t know anything about scale factors, not sure.
>
> length, reserve, and capacity all use the same unit, which is elements.
>
> reserve passes a TypeInfo instance to the runtime so that it knows the size of the elements. You can see the implementation here:
>
> https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910

To add to that, it _has_ to know the element type, because aside from anything related to a type's size, it bit-blits the type's init value onto the new elements when it increases the length of the dynamic array.

You'd probably be dealing with bytes if you were explicitly asking for memory and the like (e.g. with malloc), but a dynamic array is properly typed, and everything you do with it in @safe code is going to deal with it as properly typed. For it to be otherwise would require @system casts.

- Jonathan M Davis




June 22, 2023
On Thursday, 22 June 2023 at 01:05:28 UTC, Paul Backus wrote:
> On Thursday, 22 June 2023 at 00:10:19 UTC, Cecil Ward wrote:
>> Is .reserve()’s argument scaled by the entry size after it is supplied, that is it is quoted in elements or is it in bytes? I’m not sure whether the runtime has a knowledge of the element type so maybe it doesn’t know anything about scale factors, not sure.
>
> length, reserve, and capacity all use the same unit, which is elements.
>
> reserve passes a TypeInfo instance to the runtime so that it knows the size of the elements. You can see the implementation here:
>
> https://github.com/dlang/dmd/blob/v2.104.0/druntime/src/object.d#L3910

Many thanks Paul, I should have found that!
June 22, 2023
On Thursday, 22 June 2023 at 01:44:22 UTC, Jonathan M Davis wrote:
> On Wednesday, June 21, 2023 7:05:28 PM MDT Paul Backus via Digitalmars-d-learn wrote:
>> [...]
>
> To add to that, it _has_ to know the element type, because aside from anything related to a type's size, it bit-blits the type's init value onto the new elements when it increases the length of the dynamic array.
>
> You'd probably be dealing with bytes if you were explicitly asking for memory and the like (e.g. with malloc), but a dynamic array is properly typed, and everything you do with it in @safe code is going to deal with it as properly typed. For it to be otherwise would require @system casts.
>
> - Jonathan M Davis

Thankyou Jonathan!
« First   ‹ Prev
1 2