H. S. Teoh
| On Fri, Dec 16, 2022 at 05:39:08PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...]
> If you really want to see what could possibly have gone wrong, try this version of the code:
[...]
> The results will likely differ depending on your OS and specific environment; but on my Linux machine, it outputs a bunch of garbage (instead of the expected numbers and "hello" "world!" strings) and crashes.
[...]
In case you're wondering, here's a brief explanation of why the above code triggers a problem:
When your program is running, the CPU uses a LIFO (last-in, first-out) region of memory as scratch space for computations, called the runtime stack. Function arguments are typically passed by having the calling function push the values on the stack, and having the called function retrieve these values from the stack. In addition to function arguments, the CPU also stores various other information on the stack, such as the return address to jump to once the called function returns, and potentially other stuff, depending on the specific OS and CPU. Furthermore, the called function itself also reserves some space on the stack for storing local variables. Together, this information is called a "stack frame".
When you call badCodeBad(), the arguments [ 1, 2, 3, 4, 5 ] are
allocated on the stack and passed to foo(). foo() then stores a slice
to these arguments, i.e., a slice of the stack locations that currently
contain [ 1, 2, 3, 4, 5 ]. Then foo() returns to badCodeBad(), and
badCodeBad() returns to main. The stack frame that contains the [ 1, 2,
3, 4, 5 ] is now no longer in scope. However, it may not necessarily
have been overwritten with new data yet.
Then main() calls whatwentwrong(). This involves creating a new stack
frame for whatwentwrong(), pushing the return address on the stack, and
so on. At this point, whatwentwrong()'s stack frame overwrites the
original stack frame where badCodeBad() stored the [ 1, 2, 3, 4, 5 ].
The array elements are now overwritten with other data that aren't
supposed to be interpreted as integers. That's why when whatwentwrong()
tries to print the contents of numbersForLaterUse, which now points to an
area on the stack that has just been overwritten by whatwentwrong()'s
stack frame, you get garbage output.
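If you want to watch the stack being reused without ever touching dangling memory, here is a minimal sketch. The functions first() and second() are made up for illustration and are not part of the code discussed above. Each one only reports where its local array lives, as a plain integer; nothing is dereferenced after the function returns. On many systems, with an unoptimized build, the two addresses come out identical or very close, but that is not guaranteed.

```d
import std.stdio;

// Hypothetical helpers, not from the original code. Each one reports
// the address of its own local array as an integer. The address is
// only printed and compared, never dereferenced.
size_t first()
{
    int[5] numbers = [1, 2, 3, 4, 5];
    auto addr = cast(size_t) numbers.ptr;
    return addr;
}

size_t second()
{
    int[5] other = [10, 20, 30, 40, 50];
    auto addr = cast(size_t) other.ptr;
    return addr;
}

void main()
{
    immutable a = first();   // first()'s stack frame is gone after this call
    immutable b = second();  // second()'s frame is typically built in the same area

    writefln("first()  local array at 0x%x", a);
    writefln("second() local array at 0x%x", b);
    writeln(a == b ? "same stack location was reused"
                   : "different locations this time");
}
```

A slice that still pointed at the first address would now be looking at whatever second() put there, which is the situation described above.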
A similar thing happens when you call alsoReallyBad(). It allocates the
string array [ "hello", "world!" ] on the stack, and S.foo() wrongly
stores a slice to that location on the stack. When alsoReallyBad()
returns, the stack frame that contains this array goes out of scope
(though not necessarily overwritten just yet). When main() then calls
whatelsewentwrong(), that involves passing the instance of S as
argument, and also creating a new stack frame for whatelsewentwrong(). All
of this new data overwrites the original stack frame, stomping all over
the [ "hello", "world!" ] array and overwriting it with stuff that isn't
supposed to be interpreted as a string array.
When whatelsewentwrong() then tries to print the contents of s.namesForLaterUse, the slice points to a location on the stack that no longer contains the string array; writeln tries to interpret whatever is there as a string array, which results in garbage being printed. Since a string is itself an array, consisting of a pointer and a length, interpreting random data as a string causes writeln to read a random amount of data from a random location in memory. On my system, it just so happens that part of that range of memory locations is outside the range mapped by the OS to the program; this causes an invalid memory access that makes the OS forcefully terminate the program.
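To make the "pointer and a length" part concrete, here is a small sketch of what a D slice looks like in memory. The variable names are only for illustration. It shows why overwritten stack data is so harmful here: each element of a string[] is itself a (length, pointer) pair, so garbage in those slots becomes a garbage length and a garbage address.

```d
import std.stdio;

void main()
{
    string s = "hello";

    // A slice is two machine words: a length and a pointer.
    static assert(string.sizeof == 2 * size_t.sizeof);
    static assert((int[]).sizeof == 2 * size_t.sizeof);

    // How many chars writeln will read, and where it will start reading.
    writeln("length:  ", s.length);
    writeln("pointer: ", cast(const(void)*) s.ptr);

    // A string[] is a slice whose elements are themselves
    // (length, pointer) pairs.
    string[] names = ["hello", "world!"];
    writeln("size of one element: ", names[0].sizeof, " bytes");
}
```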
The underlying cause of these problems is exactly what Ali said in his book: foo() and S.foo() tried to store a slice to a stack location past its lifetime. Once the stack frame went out of scope, all bets are off as to what the slice now points to. It could have been overwritten by other data that can no longer be interpreted as an int[] or string[]. In this case, it caused the program to print random garbage and crash. In more complicated scenarios, such a bug in the code can become a hole for a hacker to exploit.
Consider, for example, what would happen if the code tried to do some arithmetic on the int[] that it saved as numbersForLaterUse. Since the location that used to contain the int[] now contains a function stack frame, part of it could potentially contain a return address to main(). A hacker could exploit this by manipulating the program's input such that the arithmetic on the int[] overwrites this return address to point to something else, such as an OS call to format your hard drive. Then when the function finishes what it's doing and tries to return, instead of returning to main() it jumps to the function that formats your hard drive.
The takeaway from all this is:
(1) It's Bad(tm) to store a slice to a stack location past its lifetime.
(2) Use @safe when possible so that the compiler will tell you when you're doing something wrong and potentially dangerous.
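Here is one way to act on both points. This is only a sketch: the code under discussion is elided above, so the signatures below (typesafe variadic functions taking int[] and string[]) are my assumption about its shape. The idea is to copy the arguments with .dup, so that what gets stored is GC-allocated memory rather than a slice of the caller's stack, and to mark the functions @safe. Depending on your compiler version and flags such as -preview=dip1000, the compiler may reject the direct assignment (without .dup) in @safe code.

```d
import std.stdio;

int[] numbersForLaterUse;

// Assumed signature: a typesafe variadic function. The compiler is
// allowed to build `numbers` on the caller's stack.
void foo(int[] numbers...) @safe
{
    // .dup copies the elements into GC-allocated memory, which
    // outlives the caller's stack frame.
    numbersForLaterUse = numbers.dup;
}

struct S
{
    string[] namesForLaterUse;

    void foo(string[] names...) @safe
    {
        namesForLaterUse = names.dup;
    }
}

void main() @safe
{
    foo(1, 2, 3, 4, 5);

    auto s = S();
    s.foo("hello", "world!");

    writeln(numbersForLaterUse);   // [1, 2, 3, 4, 5]
    writeln(s.namesForLaterUse);   // ["hello", "world!"]
}
```

The copy costs an allocation per call, but the stored slices stay valid no matter how many other functions run afterwards.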
T
--
A computer doesn't mind if its programs are put to purposes that don't match their names. -- D. Knuth
|