November 16, 2009
Bartosz Milewski Wrote:

> I read Andrei's chapter on arrays and there's one thing that concerns me.

I tried to voice the same concern earlier, but my voice is not loud yet. ;)

> When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.

I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it.

The reason we've been living happily with it is, because it's been spelled out: "may invalidate all iterators."

In D2's slices, what we're missing is the correct language to describe the semantics. As long as we set the semantics right, there will be no problem. (Earlier, I proposed "discretionary sharing semantics" for D2's slice semantics.)

The first code examples that are used to describe slices better say "slice" on the left hand side:

  int[] slice = new int[2];  // GOOD for the semantics
  int[] array = new int[2];  // BAD for the semantics

Ali

November 16, 2009
Walter Bright wrote:
> It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.

On the same platform, with the same compiler, compiler settings, and standard library implementation.  That makes it harder to test, not easier.  You now have to test with multiple compilers.

> It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.

The current behavior is unsafe in that you can accidentally have two variables pointing at the same buffer.  Let's say one buffer holds network input and the other holds some bytecode to execute.  Boom - a bug that can be exploited to execute malicious (byte-)code.


-- 
Rainer Deyke - rainerd@eldwood.com
November 16, 2009
Ali Cehreli wrote:
> Bartosz Milewski Wrote:
> 
>> I read Andrei's chapter on arrays and there's one thing that concerns me.
> 
> I tried to voice the same concern earlier, but my voice is not loud yet. ;)
> 
>> When a slice is extended, the decision to re-allocate, and therefore to cut
>> its connection to other slices, is non-deterministic.
> 
> I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it.

It is different because invalid STL iterators could do anything.


Andrei
November 16, 2009
Rainer Deyke wrote:
> Walter Bright wrote:
>> It's deterministic in the sense that if you run the program again with
>> the same inputs, you will get the same result. This is a highly useful
>> attribute for testing and debugging.
> On the same platform, with the same compiler, compiler settings, and
> standard library implementation.  That makes it harder to test, not
> easier.  You now have to test with multiple compilers.

That is still determinate. Indeterminate means you get different results if your run it again on the same machine.


>> It's safe as in memory safe. This is as opposed to undefined-behavior,
>> which is not memory safe. A buffer overflow is an example of
>> undefined-behavior.
> 
> The current behavior is unsafe in that you can accidentally have two
> variables pointing at the same buffer.  Let's say one buffer holds
> network input and the other holds some bytecode to execute.  Boom - a
> bug that can be exploited to execute malicious (byte-)code.

It is not two random arrays randomly sharing data.
November 16, 2009
== Quote from Bartosz Milewski (bartosz-nospam@relisoft.com)'s article
> I read Andrei's chapter on arrays and there's one thing that concerns me. When a
slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system?
> Here's a piece of code I came up with--it's not realistic but it describes a
certain class of scenarios:
> void f(byte[] code)
> {
>     doSomeChecks(code);
>     execute(code);
> }
> void doSomeChecks(byte[] a)
> {
> >   // to avoid special-casing empty arrays
>   a.length += 1; // extension
> >   // get byte b (or a whole bunch) from somewhere
>    a[0] = b; // write
> >   // check what happens when a[0] = b
> }
> Of course this code is incorrect because doSomeChecks assumes that it holds a
unique array, and the caller failed to make a copy before sending it to doSomeChecks. Bad bad programmer!
> However imagine this scenario: Under testing conditions, the array extension
just happens to always cause reallocation, so the write is NOT visible in the calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program.
> In this case the non-determinism built into the language helped camouflage a
serious bug. Should we, or shouldn't we, worry about it in practice?

The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation?  How else could you **efficiently** implement dynamic arrays?
November 16, 2009
"Walter Bright" <newshound1@digitalmars.com> wrote in message news:hdr6mm$1bf0$1@digitalmars.com...
> Rainer Deyke wrote:
>> Walter Bright wrote:
>>> It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
>> On the same platform, with the same compiler, compiler settings, and standard library implementation.  That makes it harder to test, not easier.  You now have to test with multiple compilers.
>
> That is still determinate. Indeterminate means you get different results if your run it again on the same machine.
>

Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.


November 16, 2009
Nick Sabalausky wrote:
> Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.

It's not a security hole in any more serious manner than any other routine programming bug would be. Very few ordinary programming bugs are exploitable.

A buffer overflow, however, is much more of a security hole because they are nearly always exploitable, because it allows arbitrary user data to be executed. This is not the case with the array resizing issue.

That's why I drew a distinction between undefined-behavior and implementation-defined behavior. The former is a couple more orders of magnitude more serious.
November 17, 2009
Hello Walter,

> Nick Sabalausky wrote:
> 
>> Deterministic? Only in the same sense that "resize or realloc upon
>> appending" is deterministic.
>> 
> It's deterministic in the sense that if you run the program again with
> the same inputs, you will get the same result. This is a highly useful
> attribute for testing and debugging.
> 
> A non-deterministic problem will give you a different result every
> time you run it. Threading problems are an example of this, as are any
> dependencies on uninitialized data.
> 
> This particular issue is implementation-defined.
> 

would you agree that it is not something the programer can predict in advance?


November 17, 2009
BCS wrote:
> would you agree that it is not something the programer can predict in advance?

He can, but it is not reasonable to expect him to. But it's still deterministic.
November 17, 2009
Walter Bright Wrote:

> Denis Koroskin wrote:
> > It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
> 
> The LRU is thread local.

It will then prevent a moving GC from being possible in D.

Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.