Dynamic Arrays Capacity
June 02, 2022

Hi,

Do I misunderstand? A dynamic array is allocated memory according to the nextPow2() algorithm (minus 1); strings, on the other hand, don't behave like this...

  string str = "0123456789ABCDEF";
  char[] chr = str.dup;

  assert(str.length == 16);
  assert(str.capacity == 0);

  import std.math: thus = nextPow2; //.algebraic

  assert(chr.capacity == thus(str.length) - 1);
  assert(chr.capacity == 31);

Also, .ptr keeps the address of the most recent first element, right?

  write("str[0]@", &str[0]);
  writeln(" == @", str.ptr);

  write("chr[0]@", &chr[0]);
  writeln(" == @", chr.ptr);

Print out (no errors):

  str[0]@5607593901E0 == @5607593901E0
  chr[0]@7F9430982000 == @7F9430982000

SDB@79

June 02, 2022

On Thursday, 2 June 2022 at 05:04:03 UTC, Salih Dincer wrote:

> Hi,
>
> Do I misunderstand? A dynamic array is allocated memory according to the nextPow2() algorithm (minus 1); strings, on the other hand, don't behave like this...
>
>   string str = "0123456789ABCDEF";
>   char[] chr = str.dup;
>
>   assert(str.length == 16);
>   assert(str.capacity == 0);
>
>   import std.math: thus = nextPow2; //.algebraic
>
>   assert(chr.capacity == thus(str.length) - 1);
>   assert(chr.capacity == 31);

You've initialized str with a string literal. No memory is allocated for these from the GC. They're stored in the binary, meaning they're loaded into memory from disk by the OS. So str.ptr points to a static memory location that's a fixed size, hence no extra capacity.

chr is allocated from the GC using whatever algorithm is implemented in the runtime. That it happens to be any given algorithm is an implementation detail that could change in any release.
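
A minimal sketch of that difference, assuming only what the thread shows (the helper name `appendRelocates` is made up for illustration). A slice of a string literal has no room to grow in place, so appending to it copies the data into a fresh GC block and leaves the literal itself untouched:

```d
import std.stdio : writeln;

// Hypothetical helper: appends one character and reports whether
// the slice now starts at a different address.
bool appendRelocates(ref string s)
{
    auto before = s.ptr;
    s ~= '!';
    return s.ptr !is before;
}

void main()
{
    string str = "0123456789ABCDEF"; // points into the binary's static data
    string original = str;           // second slice of the same literal

    assert(str.capacity == 0);       // no room to append in place
    assert(appendRelocates(str));    // the append had to allocate from the GC

    assert(original == "0123456789ABCDEF"); // the literal is unchanged
    assert(str.capacity >= str.length);     // now GC-backed and appendable
    writeln(str);
}
```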

> Also, `.ptr` keeps the address of the most recent first element, right?

More specifically, it points to the starting address of the allocated block of memory.

June 02, 2022

On Thursday, 2 June 2022 at 08:14:40 UTC, Mike Parker wrote:

> More specifically, it points to the starting address of the allocated block of memory.

I posted too soon.

Given an instance ts of type T[], array accesses essentially are this:

ts[0] == *(ts.ptr + 0);
ts[1] == *(ts.ptr + 1);
ts[2] == *(ts.ptr + 2);

Since the size of T is known, adding N to the pointer advances it by N * T.sizeof bytes. If you converted it to a ubyte array, you'd need to do that multiplication yourself.

And so, &ts[0] is the same as &(*ts.ptr + 0), or simply ts.ptr.

June 02, 2022

On Thursday, 2 June 2022 at 08:24:51 UTC, Mike Parker wrote:

> And so, &ts[0] is the same as &(*ts.ptr + 0), or simply ts.ptr.

That should be the same as &(*(ts.ptr + 0))!
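
As a rough sketch of those equivalences (the function name `indexingMatchesPointers` and the sample values are only illustrative), the following checks that indexing, typed pointer arithmetic, and hand-scaled byte arithmetic all reach the same elements:

```d
// Hypothetical checker: true if indexing and pointer arithmetic agree.
bool indexingMatchesPointers(T)(T[] ts)
{
    auto bytes = cast(ubyte*) ts.ptr;
    foreach (i; 0 .. ts.length)
    {
        // Typed pointer: + i advances by i * T.sizeof bytes automatically.
        if (ts[i] != *(ts.ptr + i))
            return false;
        // Byte pointer: the scaling has to be written out by hand.
        if (*cast(T*)(bytes + i * T.sizeof) != ts[i])
            return false;
    }
    // The address of element 0 is the pointer itself.
    return ts.length == 0 || (&ts[0] is ts.ptr);
}

void main()
{
    int[] ts = [10, 20, 30];
    assert(indexingMatchesPointers(ts));
    assert(&(*(ts.ptr + 0)) is ts.ptr);
}
```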

June 02, 2022

On 6/2/22 1:04 AM, Salih Dincer wrote:

> Hi,
>
> Do I misunderstand? A dynamic array is allocated memory according to the nextPow2() algorithm (minus 1); strings, on the other hand, don't behave like this...
>
>   string str = "0123456789ABCDEF";
>   char[] chr = str.dup;
>
>   assert(str.length == 16);
>   assert(str.capacity == 0);
>
>   import std.math: thus = nextPow2; //.algebraic
>
>   assert(chr.capacity == thus(str.length) - 1);
>   assert(chr.capacity == 31);

The capacity is how many elements of the array can be stored without reallocating when appending.

Why 0 for the string literal? Because it's not from the GC, and so has no capacity for appending (note that a capacity of 0 is returned even though the string currently has 16 characters in it).
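
A small sketch of what that definition means in practice, assuming a freshly duplicated array (the helper name `fillsInPlace` is hypothetical). Appending until the length reaches the reported capacity should never move the data:

```d
// Hypothetical helper: fills `a` up to its current capacity and reports
// whether the data stayed at the same address the whole time.
bool fillsInPlace(ref char[] a)
{
    auto start = a.ptr;
    immutable cap = a.capacity;
    while (a.length < cap)
    {
        a ~= '.';
        if (a.ptr !is start)
            return false;
    }
    return true;
}

void main()
{
    char[] chr = "0123456789ABCDEF".dup; // GC-allocated copy
    assert(chr.capacity >= chr.length);  // the exact value is an implementation detail
    assert(fillsInPlace(chr));           // no reallocation within the capacity
    assert(chr[0 .. 16] == "0123456789ABCDEF");
}
```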

Why 31 for the GC-allocated array? Because implementation details. But I can give you the details:

  1. The GC allocates in powers of 2 (mostly). The smallest block is 16 bytes, and the next size up is 32 bytes.
  2. In order to remember which parts of the block are used, it needs to allocate some space to record that value. For a 16-byte block, that requires 1 byte. So it can't fit your 16-byte string + 1 byte for the capacity tracker into a 16-byte block; it has to go into a 32-byte block. And of course, 1 byte of that 32-byte block is for the capacity tracker, hence capacity 31.
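
The arithmetic in those two points can be sketched as a toy model. This is not the runtime's actual code: `modelCapacity` is a made-up name, it only covers small blocks, and it ignores block sizes that are not powers of 2.

```d
// Hypothetical model of the two rules above; an illustration only.
size_t modelCapacity(size_t bytesNeeded)
{
    size_t block = 16;                // smallest block size
    while (block < bytesNeeded + 1)   // +1 byte for the used-length tracker
        block *= 2;                   // next power of 2
    return block - 1;                 // what is left for array data
}

void main()
{
    assert(modelCapacity(15) == 15);  // 15 + 1 fits a 16-byte block
    assert(modelCapacity(16) == 31);  // 16 + 1 needs the 32-byte block
    assert(modelCapacity(17) == 31);
}
```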

> Also, .ptr keeps the address of the most recent first element, right?

This statement suggests to me that you have an incorrect perception of a string. A string is a pointer paired with a length of how many characters after that pointer are valid. That's it. str.ptr is the pointer to the first element of the string.

There isn't a notion of "most recent first element".
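
To make the "pointer paired with a length" view concrete, here is a minimal sketch (the helper name `fromParts` is hypothetical). It rebuilds a string from its two fields and shows that a sub-slice is just a different pointer and length over the same memory:

```d
// Hypothetical helper: rebuilds a slice from its two fields.
string fromParts(immutable(char)* ptr, size_t length)
{
    return ptr[0 .. length];
}

void main()
{
    string str = "0123456789ABCDEF";

    // Same pointer, same length: the very same slice.
    assert(fromParts(str.ptr, str.length) is str);

    // A sub-slice only shifts the pointer and shortens the length.
    string tail = str[10 .. $];
    assert(tail.ptr is str.ptr + 10);
    assert(tail.length == 6);
    assert(tail == "ABCDEF");

    assert(fromParts(str.ptr, 4) == "0123");
}
```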

-Steve

June 03, 2022

On Thursday, 2 June 2022 at 08:14:40 UTC, Mike Parker wrote:

> You've initialized str with a string literal. No memory is allocated for these from the GC. They're stored in the binary, meaning they're loaded into memory from disk by the OS. So str.ptr points to a static memory location that's a fixed size, hence no extra capacity.

I didn't know that. Maybe this example demonstrates it; here is the test code that Ali started and I developed further:

import std.range;
import std.stdio;

/* toggle array:

alias chr = char*;
auto data = [' '];/*/

alias chr = immutable(char*);
auto data = " ";//*/

void main()
{
  chr[] ptrs;
  data.fill(3, ptrs);

  writeln;
  foreach(ref ptr; ptrs)
  {
    " 0x".write(ptr);
  }
} /* Print Out:

 0:           0 1:          15 2:          31 3:          47
 0x55B07E227020 0x7F2391F9F000 0x7F2391FA0000 0x7F2391FA1000
//*/


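// Appends to mostRecent until it has been seen to move limit + 1 times,
// recording each address it moved away from in ptrs.
void fill(R)(ref R mostRecent,
             int limit,
             ref chr[] ptrs)
{
  auto ptr = mostRecent.ptr;
  size_t capacity, depth;

  while (depth <= limit)
  {
    // Append ElementType!R.init; this may relocate the data.
    mostRecent ~= ElementType!R.init;

    // The data moved: record the old address and the capacity it had there.
    if(ptr != mostRecent.ptr)
    {
      ptrs ~= ptr;
      depth.writef!"%2s: %11s"(capacity);
      depth++;
    }

    // Remember the address and capacity of the current block.
    if (mostRecent.capacity != capacity)
    {
      ptr = mostRecent.ptr;
      capacity = mostRecent.capacity;
    }
  }
}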

As for the result I got from this code: the data is copied to another memory region as soon as the array outgrows its capacity (0x5...20 >> 0x7...00). We get the same result with an array; just add a / character to the beginning of the 4th line to try it.

Thank you all very much for the replies; all of these open my mind.

SDB@79

June 03, 2022

On Thursday, 2 June 2022 at 20:12:30 UTC, Steven Schveighoffer wrote:

> This statement suggests to me that you have an incorrect perception of a string. A string is a pointer paired with a length of how many characters after that pointer are valid. That's it. str.ptr is the pointer to the first element of the string.
>
> There isn't a notion of "most recent first element".
>
> -Steve

This isn't correct either, at least with unicode, since 1 byte isn't equal to 1 character and a character can be several bytes.

I believe it's only true in unicode for utf-32 since all characters do fit in the 4 byte space they have, but for utf-8 and utf-16 the characters will not all take up the same number of bytes.
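
A short sketch of that point, using the same three code points in each encoding (the sample text is arbitrary). In D, `.length` counts code units of the array's element type, while `std.utf.count` counts code points:

```d
import std.utf : count;

void main()
{
    // 'a', then U+00E9 (e with acute), then U+1F600 (an emoji).
    string  s8  = "a\u00E9\U0001F600";
    wstring s16 = "a\u00E9\U0001F600"w;
    dstring s32 = "a\u00E9\U0001F600"d;

    assert(s8.length  == 7); // UTF-8:  1 + 2 + 4 code units
    assert(s16.length == 4); // UTF-16: 1 + 1 + 2 code units
    assert(s32.length == 3); // UTF-32: one code unit per code point

    // The number of code points is the same in every encoding.
    assert(count(s8) == 3);
    assert(count(s16) == 3);
    assert(count(s32) == 3);
}
```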

June 03, 2022

On Friday, 3 June 2022 at 12:49:07 UTC, bauss wrote:

> I believe it's only true in unicode for utf-32 since all characters do fit in the 4 byte space they have

Depends how you define "character".

June 03, 2022

On Friday, 3 June 2022 at 12:52:30 UTC, Adam D Ruppe wrote:

> On Friday, 3 June 2022 at 12:49:07 UTC, bauss wrote:
>> I believe it's only true in unicode for utf-32 since all characters do fit in the 4 byte space they have
>
> Depends how you define "character".

I guess that's true as well, unicode really made it impossible to just say "this string has so many characters because it has this many bytes."
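
One way to see why the definition matters, as a minimal sketch (the helper name `graphemeCount` is hypothetical). Even in UTF-32, a single user-perceived character can span several code points, for example a letter followed by a combining accent:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: number of user-perceived characters (graphemes).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // 'e' followed by U+0301, a combining acute accent.
    dstring d = "e\u0301"d;
    assert(d.length == 2);                 // two UTF-32 code units
    assert(graphemeCount("e\u0301") == 1); // but one grapheme
    assert(graphemeCount("abc") == 3);
}
```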