Thread overview
How is this code invalid?
Dec 17, 2022
thebluepandabear
Dec 17, 2022
ag0aep6g
Dec 17, 2022
H. S. Teoh
Dec 17, 2022
H. S. Teoh
Dec 17, 2022
thebluepandabear
Dec 17, 2022
H. S. Teoh
Dec 17, 2022
Ali Çehreli
December 17, 2022

I am reading the fantastic book about D by Ali Çehreli, and he gives the following example when he talks about variadic functions:

int[] numbersForLaterUse;

void foo(int[] numbers...) {
   numbersForLaterUse = numbers;
}

struct S {
  string[] namesForLaterUse;

  void foo(string[] names...) {
     namesForLaterUse = names;
  }
}

He says that the code above is a bug because:

"Both the free-standing function foo() and the member function S.foo() are in
error because they store slices to automatically-generated temporary arrays that
live on the program stack. Those arrays are valid only during the execution of the
variadic functions."

The thing is, when I run the code I get absolutely no error, so how is this exactly a 'bug' if the code runs properly? That's what I am confused about. What is the D compiler doing behind the scenes?

December 17, 2022

On Saturday, 17 December 2022 at 00:23:32 UTC, thebluepandabear wrote:

> int[] numbersForLaterUse;
>
> void foo(int[] numbers...) {
>    numbersForLaterUse = numbers;
> }
>
> struct S {
>   string[] namesForLaterUse;
>
>   void foo(string[] names...) {
>      namesForLaterUse = names;
>   }
> }
>
> [...]
>
> The thing is, when I run the code I get absolutely no error, so how is this exactly a 'bug' if the code runs properly? That's what I am confused about. What is the D compiler doing behind the scenes?

You're witnessing the wonders of undefined behavior. Invalid code can still produce the results you're hoping for, or it can produce garbage results, or it can crash, or it can do something else entirely. And just because running it once does one thing, does not mean that the next run will do the same.

For your particular code, here is an example where numbersForLaterUse ends up not being what we pass in:

int[] numbersForLaterUse;

void foo(int[] numbers...) {
   numbersForLaterUse = numbers; /* No! Don't! Bad programmer! Bad! */
}

void bar()
{
    int[3] n = [1, 2, 3];
    foo(n);
}

void main()
{
    bar();
    import std.stdio;
    writeln(numbersForLaterUse); /* prints garbage */
}

But again nothing at all is actually guaranteed about what that program does. It exhibits undefined behavior. So it could just as well print "[1, 2, 3]", making you think that everything is fine.
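
One way to see why the stored slice goes stale: a D slice is only a pointer plus a length, so storing it copies no elements. The sketch below checks this while the original array is still alive, so nothing in it is undefined. The names `saved` and `keep` are made up for this illustration; `keep` has the same shape as foo() above.

```D
import std.stdio;

int[] saved;   // hypothetical stand-in for numbersForLaterUse

// Same shape as foo(): stores the slice it was handed, without copying.
void keep(int[] numbers...) {
    saved = numbers;
}

void main() {
    int[3] n = [1, 2, 3];   // lives in main's stack frame
    keep(n[]);              // pass an explicit slice of the static array

    // The stored slice points at n's own memory; no elements were copied.
    assert(saved.ptr == n.ptr);

    n[0] = 42;
    writeln(saved);         // the change made through n is visible via saved
}
```

This is fine only because `n` outlives every use of `saved` here. In the bar() example above, the array dies when bar() returns, and the stored pointer is left dangling.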

December 16, 2022
On Sat, Dec 17, 2022 at 12:23:32AM +0000, thebluepandabear via Digitalmars-d-learn wrote: [...]
> ```D
> int[] numbersForLaterUse;
> 
> void foo(int[] numbers...) {
>    numbersForLaterUse = numbers;
> }
> 
> struct S {
>   string[] namesForLaterUse;
> 
>   void foo(string[] names...) {
>      namesForLaterUse = names;
>   }
> }
> ```
[...]
> The thing is, when I run the code I get absolutely no error, so how is this exactly a 'bug' if the code runs properly? That's what I am confused about.  What is the D compiler doing behind the scenes?

Try labelling the above functions with @safe and see what the compiler says.
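
As a rough sketch of what that could look like: below, both functions are marked @safe, the original assignment is left in as a comment, and the slice is copied with `.dup` before being stored. The copy lives on the GC heap, so it stays valid after the caller returns. The helper `fill` is a made-up name for this illustration, and the exact diagnostic the compiler prints for the commented-out line may differ between compiler versions and flags.

```D
import std.stdio;

int[] numbersForLaterUse;

@safe void foo(int[] numbers...) {
    // numbersForLaterUse = numbers;   // original line; expected to be rejected under @safe
    numbersForLaterUse = numbers.dup;  // store a GC-heap copy instead
}

struct S {
    string[] namesForLaterUse;

    @safe void foo(string[] names...) {
        namesForLaterUse = names.dup;  // same idea for the member function
    }
}

// Hypothetical helper: the temporary argument array dies when this returns.
void fill() {
    foo(1, 2, 3, 4, 5);
}

void main() {
    fill();
    writeln(numbersForLaterUse);   // [1, 2, 3, 4, 5]

    S s;
    s.foo("hello", "world!");
    writeln(s.namesForLaterUse);   // ["hello", "world!"]
}
```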

If you really want to see what could possibly have gone wrong, try this version of the code:

------------------------------snip-----------------------------------
int[] numbersForLaterUse;

void foo(int[] numbers...) {
   numbersForLaterUse = numbers;
}

struct S {
  string[] namesForLaterUse;

  void foo(string[] names...) {
     namesForLaterUse = names;
  }
}

void whatwentwrong() {
	import std.stdio;
	writeln(numbersForLaterUse);
}

void whatelsewentwrong(S s) {
	import std.stdio;
	writeln(s.namesForLaterUse);
}

void badCodeBad() {
  foo(1, 2, 3, 4, 5);
}

S alsoReallyBad() {
  S s;
  s.foo("hello", "world!");
  return s;
}

void main() {
  badCodeBad();
  whatwentwrong();

  auto s = alsoReallyBad();
  whatelsewentwrong(s);
}
------------------------------snip-----------------------------------

The results will likely differ depending on your OS and specific environment; but on my Linux machine, it outputs a bunch of garbage (instead of the expected numbers and "hello" "world!" strings) and crashes.


T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
December 16, 2022
On Fri, Dec 16, 2022 at 05:39:08PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> If you really want to see what could possibly have gone wrong, try this version of the code:
[...]
> The results will likely differ depending on your OS and specific environment; but on my Linux machine, it outputs a bunch of garbage (instead of the expected numbers and "hello" "world!" strings) and crashes.
[...]

In case you're wondering, here's a brief explanation of why the above code triggers a problem:

When your program is running, the CPU uses a LIFO (last-in, first-out) structure as scratch space for computations, called the runtime stack.  Function arguments are typically passed by having the calling function push the values on the stack, and having the called function retrieve these values from the stack. In addition to function arguments, the CPU also stores various other information on the stack, such as the return address to jump to once the called function returns, and potentially other stuff, depending on the specific OS and CPU. Furthermore, the called function itself also reserves some space on the stack for storing local variables.  Together, this information is called a "stack frame".

When you call badCodeBad(), the arguments [ 1, 2, 3, 4, 5 ] are
allocated on the stack and passed to foo().  foo() then stores a slice
to these arguments,  i.e., a slice of the stack locations that currently
contain [ 1, 2, 3, 4, 5 ].  Then foo() returns to badCodeBad(), and
badCodeBad() returns to main.  The stack frame that contains the [ 1, 2,
3, 4, 5 ] is now no longer in scope.  However, it may not necessarily
have been overwritten with new data yet.

Then main() calls whatwentwrong(). This involves creating a new stack
frame for whatwentwrong(), pushing the return address on the stack, and
so on.  At this point, whatwentwrong()'s stack frame overwrites the
original stack frame where badCodeBad() stored the [ 1, 2, 3, 4, 5 ].
The array elements are now overwritten with other data that aren't
supposed to be interpreted as integers.  That's why when whatwentwrong()
tries to print the contents of numbersForLaterUse, which now points to an
area on the stack that has just been overwritten by whatwentwrong()'s
stack frame, you get garbage output.
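
If you want to watch the stack being reused, here is a small sketch (the function names are invented for this illustration). Each helper returns the address of one of its own locals as a plain integer; the address is only printed, never dereferenced, so the program itself is well-defined. Whether the two addresses coincide depends on the compiler, the optimization level, and the platform, so treat the output as an observation rather than a guarantee.

```D
import std.stdio;

// Returns the address of a local as an integer. The address is never
// dereferenced after the function returns.
size_t firstCall() {
    int local = 1;
    return cast(size_t) &local;
}

size_t secondCall() {
    int other = 2;
    return cast(size_t) &other;
}

void main() {
    immutable a = firstCall();
    immutable b = secondCall();
    writefln("firstCall's local was at  0x%x", a);
    writefln("secondCall's local was at 0x%x", b);
    writeln(a == b ? "same stack slot reused" : "different slots this time");
}
```

When the two addresses match, a slice saved during the first call would be looking at the second call's data, which is exactly the situation described above.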

A similar thing happens when you call alsoReallyBad(). It allocates the
string array [ "hello", "world!" ] on the stack, and S.foo() wrongly
stores a slice to that location on the stack.  When alsoReallyBad()
returns, the stack frame that contains this array goes out of scope
(though not necessarily overwritten just yet).  When main() then calls
whatelsewentwrong(), that involves passing the instance of S as an
argument, and also creating a new stack frame for whatelsewentwrong(). All
of this new data overwrites the original stack frame, stomping all over
the [ "hello", "world!" ] array and overwriting it with stuff that isn't
supposed to be interpreted as a string array.

When whatelsewentwrong() then tries to print the contents of s.namesForLaterUse, the slice points to a location on the stack that no longer contains the string array; writeln tries to interpret whatever is there as a string array, which results in garbage being printed.  Since a string is also an array, consisting of a pointer and a length, interpreting random data as a string causes writeln to read a random amount of data from a random location in memory. On my system, it just so happens that part of that range of memory locations is outside the range mapped by the OS to the program; this causes an invalid memory access that makes the OS forcefully terminate the program.

//

The underlying cause of these problems is exactly what Ali said in his book: foo() and S.foo() tried to store a slice to a stack location past its lifetime.  Once the stack frame went out of scope, all bets are off as to what the slice now points to.  It could have been overwritten by other data that can no longer be interpreted as an int[] or string[]. In this case, it caused the program to print random garbage and crash. In more complicated scenarios, such a bug in the code can become a hole for a hacker to exploit.

Consider, for example, if the code tried to do some arithmetic on the int[] that it saved as numbersForLaterUse. Since the location that used to contain the int[] now contains a function stack frame, part of it could potentially contain a return address to main(). The hacker could exploit this by manipulating the program's input such that the arithmetic on the int[] overwrites this return address to point to something else, such as an OS call to format your hard drive.  Then when the function finishes what it's doing and tries to return, instead of returning to main() it jumps to the function that formats your hard drive.

The takeaway from all this is:

(1) It's Bad(tm) to store a slice to a stack location past its lifetime.

(2) Use @safe when possible so that the compiler will tell you when you're doing something wrong and potentially dangerous.


T

-- 
A computer doesn't mind if its programs are put to purposes that don't match their names. -- D. Knuth
December 17, 2022
>
> T

Thanks, I've tried to mark it with `@safe` and it did give me a warning.

I was also wondering, why is this code valid?

```D
int[] numbersForLaterUse;

@safe void foo(int[] numbers) {
	numbersForLaterUse = numbers;
}
```

December 16, 2022
On 12/16/22 18:20, H. S. Teoh wrote:

> scratch space for computations, called the runtime
> stack.

I called it "function call stack" where I gave a very simplistic view of it here:

  https://www.youtube.com/watch?v=NWIU5wn1F1I&t=236s

> (2) Use @safe when possible so that the compiler will tell you when
> you're doing something wrong and potentially dangerous.

Unfortunately, @safe is not as prominent in the book as it should be. Part of the reason, I think, is that its implementation is not complete, especially in how it changes with -dip1000.

Ali

December 17, 2022
On Sat, Dec 17, 2022 at 02:36:10AM +0000, thebluepandabear via Digitalmars-d-learn wrote: [...]
> Thanks, I've tried to mark it with `@safe` and it did give me a warning.
> 
> I was also wondering, why is this code valid?
> 
> ```D
> int[] numbersForLaterUse;
> 
> @safe void foo(int[] numbers) {
> 	numbersForLaterUse = numbers;
> }
> ```

This code is safe provided the arguments are not allocated on the stack, which is usually the case because you can no longer call it with:

	foo(1, 2, 3, 4);

but you have to write:

	foo([ 1, 2, 3, 4 ]);

The [] here will allocate a new array on the heap, so the array elements will not go out of scope when the caller returns. (They will be collected by the GC after all references to them have gone out of scope. This is one of the advantages of using a GC: it saves you from having to worry about complicated lifetimes in such cases.)
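
A minimal sketch of that case, with a made-up `caller` function standing in for whatever code calls foo(): the array literal is allocated on the GC heap, so the stored slice is still valid after `caller` has returned.

```D
import std.stdio;

int[] numbersForLaterUse;

@safe void foo(int[] numbers) {
    numbersForLaterUse = numbers;
}

// Hypothetical caller; its stack frame is gone by the time main() prints.
@safe void caller() {
    foo([1, 2, 3, 4]);   // array literal: elements live on the GC heap
}

void main() {
    caller();
    writeln(numbersForLaterUse);   // [1, 2, 3, 4]
}
```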

You may still run into trouble, though, if you do this:

	int[3] data = [ 1, 2, 3 ]; // N.B.: stack-allocated
	foo(data[]);	// uh oh

To guard against this, use @safe and -dip1000, which will cause the compiler to detect this dangerous usage and generate an error.
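
For completeness, a sketch of one way to keep the stack-allocated case working, assuming a reasonably recent DMD where the switch is spelled `-preview=dip1000`. The dangerous call is left as a comment, and a GC-heap copy made with `.dup` is passed instead. The `caller` function is a made-up name for this illustration.

```D
import std.stdio;

int[] numbersForLaterUse;

@safe void foo(int[] numbers) {
    numbersForLaterUse = numbers;
}

@safe void caller() {
    int[3] data = [1, 2, 3];   // N.B.: stack-allocated
    // foo(data[]);            // the dangerous call from above
    foo(data[].dup);           // pass a GC-heap copy instead
}

void main() {
    caller();
    writeln(numbersForLaterUse);   // [1, 2, 3]
}
```

The copy costs an allocation, but the stored slice no longer depends on the lifetime of `data`.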


T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?