Jump to page: 1 2
Thread overview
static array internal & dangling reference
Nov 12, 2016
Picaud Vincent
Nov 12, 2016
cym13
Nov 12, 2016
Picaud Vincent
Nov 12, 2016
Mike Parker
Nov 12, 2016
Picaud Vincent
Nov 12, 2016
Andrew
Nov 12, 2016
Mike Parker
Nov 12, 2016
Mike Parker
Nov 13, 2016
cym13
Nov 14, 2016
Picaud Vincent
Nov 14, 2016
Jonathan M Davis
Nov 14, 2016
Picaud Vincent
Nov 14, 2016
Jonathan M Davis
Nov 14, 2016
Picaud Vincent
November 12, 2016
Hi all,
Still learning... This time what surprised me is how static arrays work.
I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation:

First surprise: it is possible to share a static array:

void main() {

  int[10] sa;
  foreach(int i, ref sa_i;sa){
    sa_i=i;
  }
  writeln("\n vect init sa",sa);

  int[] sb=sa[3..6];       // <- did not think it was possible
  sb[2]=100;

  writeln("\n vect sa ",sa);
  writeln("\n vect sb ",sb);
}

which prints:

 vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

 vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are really shared

 vect sb [3, 4, 100]

Second surprise: I tried to create a dangling reference with:

int[] f()
{
  int[10] sa;
  foreach(int i, ref sa_i;sa){
    sa_i=i;
  }
  int[] sb=sa;
  return sb;
}

void f_test()
{
  auto sb=f();
  sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable)
}

However calling f_test seems ok and valgrind tool does not complain...

So my questions are:
0/ does I miss something? (in C++ for sure you create a dangling pointer)
1/ what is the internal mechanism for that? Is the GC involved? Performance impact?
2/ any link describing this in more details is welcome

Thank you

November 12, 2016
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent wrote:
> Hi all,
> Still learning... This time what surprised me is how static arrays work.
> I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation:

Yes, they're on the stack. They're just like a C-array: a plain region with items in sequence.

> First surprise: it is possible to share a static array:
>
> void main() {
>
>   int[10] sa;
>   foreach(int i, ref sa_i;sa){
>     sa_i=i;
>   }
>   writeln("\n vect init sa",sa);
>
>   int[] sb=sa[3..6];       // <- did not think it was possible
>   sb[2]=100;
>
>   writeln("\n vect sa ",sa);
>   writeln("\n vect sb ",sb);
> }

Note that sb is a slice: it's just a pointer and a length. A pointer to what? To the third element of sa because that's the beginning of the slice.

> which prints:
>
>  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
>  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are really shared

Well, sure, sb is a pointer to part of sa so you're accessing sa by reference when making use of sb (well, with an offset that is).

>  vect sb [3, 4, 100]
>
> Second surprise: I tried to create a dangling reference with:
>
> int[] f()
> {
>   int[10] sa;
>   foreach(int i, ref sa_i;sa){
>     sa_i=i;
>   }
>   int[] sb=sa;
>   return sb;
> }
>
> void f_test()
> {
>   auto sb=f();
>   sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable)
> }
>
> However calling f_test seems ok and valgrind tool does not complain...

Printing the address of sa[2] and sb[2] in f() as well as the address of sb[2] in f_test() we see that they're all the same. So yes you are using a dangling reference. It only gives you the good result because you haven't overwritten that part of the stack yet. Demonstration:

void f_test() {
    auto sb=f();
    sb[2] = 100;
    writeln(sb[2]); // prints 100
    int test[100];
    writeln(sb[2]); // prints 0
}

I don't know why valgrind isn't panicking.

> So my questions are:
> 0/ does I miss something? (in C++ for sure you create a dangling pointer)
> 1/ what is the internal mechanism for that? Is the GC involved? Performance impact?
> 2/ any link describing this in more details is welcome
>
> Thank you


November 12, 2016
On Saturday, 12 November 2016 at 10:33:05 UTC, Picaud Vincent wrote:
> Hi all,
> Still learning... This time what surprised me is how static arrays work.
> I assume (is it true?) that for efficiency reason static size arrays like int[10] are on the stack and do not involve dynamic memory allocation:

Yes, they are on the stack. That's what the 'static' in 'static array' implies. The same is true in C and C++ (and other programming languages that distinguish between static and dynamic arrays). It's why their length is fixed and known at compile time.

>
> First surprise: it is possible to share a static array:
>
> void main() {
>
>   int[10] sa;
>   foreach(int i, ref sa_i;sa){
>     sa_i=i;
>   }
>   writeln("\n vect init sa",sa);
>
>   int[] sb=sa[3..6];       // <- did not think it was possible
>   sb[2]=100;
>
>   writeln("\n vect sa ",sa);
>   writeln("\n vect sb ",sb);
> }
>
> which prints:
>
>  vect init sa[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
>  vect sa [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]  // <- sa and sb are really shared

sb is a slice of sa. A slice *always* shares the memory block backing the original array until the slice is reallocated, which on a fresh slice of an existing array is pretty much always going to happen as soon as you append to it.

>
> Second surprise: I tried to create a dangling reference with:
>
> int[] f()
> {
>   int[10] sa;
>   foreach(int i, ref sa_i;sa){
>     sa_i=i;
>   }
>   int[] sb=sa;
>   return sb;
> }
>
> void f_test()
> {
>   auto sb=f();
>   sb[2]=100; // I expected an "invalid access" (it points to "sa", a local variable)
> }
>
> However calling f_test seems ok and valgrind tool does not complain...
>
> So my questions are:
> 0/ does I miss something? (in C++ for sure you create a dangling pointer)
> 1/ what is the internal mechanism for that? Is the GC involved? Performance impact?
> 2/ any link describing this in more details is welcome
>

You *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file.

One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points.

Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this:

int[] f()
{
  int[10] sa;
  foreach(int i, ref sa_i;sa){
    sa_i=i;
  }
  return sa;
}

You will now get an error about an escaping reference.

If you haven't read it already, I suggest taking a look at Steven's array article [1]. I also answer these questions in Chapter 2 of Learning D [2], and I'm pretty sure Ali talks about it somewhere in 'Programming in D' [3].

[1] https://dlang.org/d-array-article.html
[2] https://www.packtpub.com/application-development/learning-d
[3] http://ddili.org/ders/d.en/index.html
November 12, 2016
Thank you for your answer cym13.

I reproduced your result for:

On Saturday, 12 November 2016 at 10:45:23 UTC, cym13 wrote:
> void f_test() {
>     auto sb=f();
>     sb[2] = 100;
>     writeln(sb[2]); // prints 100
>     int test[100];
>     writeln(sb[2]); // prints 0
> }

now I am convinced of the invalid access, and this time valgrind was not happy, insulting me with:

==13545== Conditional jump or move depends on uninitialised value(s)
==13545==    at 0x10A6BA: _D3std4conv55__T11toTextRangeTiTS3std5stdio4File17LockingTextWriterZ11toTextRangeFNfiS3std5stdio4File17LockingTextWriterZv (indexType.d:5123)
==13545==    by 0x10ABED: _D3std5stdio4File14__T5writeTiTaZ5writeMFNfiaZv (stdio.d:1424)
==13545==    by 0x10A08C: _D3std5stdio14__T7writelnTiZ7writelnFNfiZv (stdio.d:3211)
....

However for my first example, I have checked again and

void f_test()
{
  auto sb=f();
  sb[2]=100;
}

Valgrind is silent... I thought it was because of a compiler optimization that removed the unusued "sb[2]=100" statement, but even with:


void f_test()
{
  auto sb=f();
  sb[2]=100;
  writeln(sb[2]);
}

which prints 100, Valgrind still do not complain... ?!?

I will try to investigate this and give a feedback if I find an explanation.

Vincent
November 12, 2016
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:

Thank you very much for your clarifications & explanations, I am reassured to see that things work like in C.

I will also look the links you provided, thank you again for your time.

Vincent



November 12, 2016
On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
>>[...]
>
> You *have* created a dangling pointer. It's just that for such a simple little program, the part of the stack where the original array was allocated isn't stomped at the point where you access it after the function call. The same sort of program will also work in C, where it's not uncommon for functions to return string literals that are immediately printed to stdout or a log file.
>
> One difference from C in this case is that it's still possible to make things work even after the stack has been stomped and the memory is no longer valid simply by increasing the length of the returned slice: sb ~= 0, or perhaps sb.length+=n. This will cause an allocation and sb will then own the memory block to which it points.
>
> Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this:
>
> int[] f()
> {
>   int[10] sa;
>   foreach(int i, ref sa_i;sa){
>     sa_i=i;
>   }
>   return sa;
> }
>

I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.

To make it safe, should I be using:

auto sb = f().dup;

Thanks

Andrew
November 12, 2016
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:

>>
>> Bear in mind that static arrays are implicitly sliced when passed to or returned from a function anywhere a dynamic array is expected. If you modify f() to look like this:
>>
>> int[] f()
>> {
>>   int[10] sa;
>>   foreach(int i, ref sa_i;sa){
>>     sa_i=i;
>>   }
>>   return sa;
>> }
>>
>
> I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.

If the return type is declared as int[10], then yes, returning sa should copy all 10 elements in the return. I just verified what you said and also found that both have the same address. However, if you stomp the stack and then access any value from the array, you'll find that it's valid. When returning int[], this is not the case. This looks to me like an optimization by the compiler to avoid the copy, but I know nothing of the implementation details.

And for clarity, I think we should be careful with the term 'by value', as dynamic arrays are passed around by value, not by reference, just without the members being copied around. Keep in mind that a dynamic array is a length and a pointer. Those are not passed by reference, but by value. So that this:

void func1(int[] foo) { foo ~= 10; }

And this:

void func2(int[] foo) { foo ~= 10; }

Are not the same. func1 appends to the local slice, meaning it is no longer connected to the original that was passed in. func2 appends to the original array.

>
> To make it safe, should I be using:
>
> auto sb = f().dup;

Yes, if you intend to keep the contents around indefinitely, this is the proper way to do it. My point in my previous post was just demonstrating that sb remains valid as long as its stack memory has not yet been overwritten and, as long as it is made use of immediately, nothing is going to break.

November 12, 2016
On Saturday, 12 November 2016 at 13:11:02 UTC, Mike Parker wrote:

> void func2(int[] foo) { foo ~= 10; }

Sorry, this should be (ref int[] foo)
November 13, 2016
On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
> On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
>>[...]
>
> I thought that if you changed the function signature to int[10] f() to return a static array, this should be returned by value, and so should be safe to use, but it seems that this too points to the same memory.

This is likely because your examples are too simple and the local variables of each stack frame happen to be at the same place with the same value. There's nothing really conclusive here.

> To make it safe, should I be using:
>
> auto sb = f().dup;
>
> Thanks
>
> Andrew

November 13, 2016
On 11/13/16 12:52 PM, cym13 wrote:
> On Saturday, 12 November 2016 at 12:16:02 UTC, Andrew wrote:
>> On Saturday, 12 November 2016 at 11:03:31 UTC, Mike Parker wrote:
>>> [...]
>>
>> I thought that if you changed the function signature to int[10] f() to
>> return a static array, this should be returned by value, and so should
>> be safe to use, but it seems that this too points to the same memory.
>
> This is likely because your examples are too simple and the local
> variables of each stack frame happen to be at the same place with the
> same value. There's nothing really conclusive here.

No, it's some sort of compiler optimization.

Note that he is declaring an int[10] inside the function and then returning it. The compiler must see that the int[10] will be returned, and so it reuses the pre-allocated buffer for returning as the same address to avoid copying.

I would guess it's probably fine, with no dangling reference.

-Steve

« First   ‹ Prev
1 2