| Thread overview | |||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
February 23, 2008 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
I'm still trying to upgrade my code base from 2.008 and it's getting closer
with 2.011. However, there are some points left which make the porting quite
annoying, unfortunately. On of those points are structs with invariant members.
Maybe I'm missing something - but I don't get what's the reasoning behind the
following error?
-----------------
struct FOO {
invariant int x;
int y;
}
int main()
{
FOO bar;
bar.y = 2;
return 0;
}
-----------------
gives the error:
test4.d(9): variable test4.main.bar cannot modify struct with immutable members
But on a related note on invariant - I think I have mentioned this already at
some point - but just in case. If 'invariant' should be theoretically safe in
the future, invariant values may not be allowed on the stack (the stack is never
invariant) - maybe something like "scope invariant" can be permitted - or some
heuristic which puts such values on the heap if a reference is takeen out of
scope (similar to closures). The same goes for invariant arrays and opCatAssign:
-----------------
import std.stdio;
int main()
{
invariant(int)[] arr = [1, 2, 3];
invariant(int)* someref = &arr[2];
arr.length = 2;
arr ~= 4;
writefln("array: %s - %s", arr, *someref);
return 0;
}
-----------------
prints "array: [1 2 4] - 4"
So currently any references to invariant data can be changed legally with
this stack principle. (Invariant arrays could be fixed by always reallocating on
opCatAssign)
| ||||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sönke Ludwig | On 23/02/2008, Sönke Ludwig <ludwig@informatik_dot_uni-luebeck.de> wrote: > Maybe I'm missing something - but I don't get what's the reasoning behind the > following error? > > ----------------- > struct FOO { > invariant int x; > int y; > } > > int main() > { > FOO bar; > bar.y = 2; > return 0; > } > ----------------- > gives the error: > test4.d(9): variable test4.main.bar cannot modify struct with immutable members Looks like a bug to me. > But on a related note on invariant - I think I have mentioned this already at > some point - but just in case. If 'invariant' should be theoretically safe in > the future, invariant values may not be allowed on the stack (the stack is never > invariant) That's not quite correct. In a declaration like: invariant int x = whatever; x will be invariant, and on the stack, and this is perfectly reasonable and safe because it's a local variable, and so won't exist when the function goes out of scope. If you're going to say "the stack is never invariant" because it gets reused, then you might as well say "the heap is never invariant", because it, too, gets reused. It's all about understanding the lifetime of an object. > int main() > { > invariant(int)[] arr = [1, 2, 3]; > invariant(int)* someref = &arr[2]; > arr.length = 2; > arr ~= 4; > writefln("array: %s - %s", arr, *someref); > return 0; > } > ----------------- > > prints "array: [1 2 4] - 4" > > So currently any references to invariant data can be changed legally with > this stack principle. This has nothing to do with the stack. The same bug would manifest if arr were on the heap. It certainly does look like a bug though. > (Invariant arrays could be fixed by always reallocating on > opCatAssign) No, that would be too inefficient. Appending to the end of a buffer is too common an operation to demand a reallocation every time. It is /much/ more sensible to allow the compiler to know the "reserved" size of an array, and only reallocate if the "reserved" area becomes full. I think the same problem will arise if you say arr = arr[0..2]; instead of arr.length = 2; I'm fairly sure this happens because arr.ptr points to the first byte of a resizable buffer. I'm fairly sure this is because ~= (opCatAssign) is making the /assumption/ that the array is unique. As has been mentioned before, there is no way to express uniqueness in D. The problem also exists in non-invariant arrays too. Both mutable and const arrays suffer from the same effect, whenever the arrays are non-unique. Perhaps D could partially work around the problem by banning ~= on all but mutable arrays (although as noted, that still wouldn't solve it completely), and then code which wants to append to a buffer would have to declare the buffer mutable, but could still use assumeUnique() to return it. But I'm not sure I like that idea. I think it just needs to be extremely well documented: /Do not use ~= on a non-unique array/. | |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron wrote: > Looks like a bug to me. I'd think so, too. > That's not quite correct. In a declaration like: > > invariant int x = whatever; > > x will be invariant, and on the stack, and this is perfectly > reasonable and safe because it's a local variable, and so won't exist > when the function goes out of scope. If you're going to say "the stack > is never invariant" because it gets reused, then you might as well say > "the heap is never invariant", because it, too, gets reused. It's all > about understanding the lifetime of an object. > That's right, of course, that this is a problem of lifetime. The difference is that objects on the heap are garbage collected and their lifetime is always long enough not to disturb invariant references (as long as delete is not used explicitly). The reason why I think that maybe there should be a difference between invariant and non-invariant stack values is that when you take a reference to a non-invariant value, you are not guaranteed to get the same data when you dereference it later anyways. Invariant, however, makes that claim. So ideally, whenever you get hold of a reference/pointer to invariant data, you should be able to store it and assume that it will never change. Is change in semantics (work on the compiler) is worth it in this case? I don't know. I can live with it this way, too - but nevertheless, it would be a similar generalization like with full-closures (and might be nice even for non-invariant data, or maybe it's overkill). > > This has nothing to do with the stack. The same bug would manifest if > arr were on the heap. > > It certainly does look like a bug though. > Just wanted to say that the array is used here like a stack. > >> (Invariant arrays could be fixed by always reallocating on >> opCatAssign) > > No, that would be too inefficient. Appending to the end of a buffer is > too common an operation to demand a reallocation every time. It is > /much/ more sensible to allow the compiler to know the "reserved" size > of an array, and only reallocate if the "reserved" area becomes full. > Right, I didn't think of incremental string building.. but maybe "always reallocate on length = X;" would be an OK solution? > I think the same problem will arise if you say > > arr = arr[0..2]; > > instead of > > arr.length = 2; > > I'm fairly sure this happens because arr.ptr points to the first byte > of a resizable buffer. I'm fairly sure this is because ~= > (opCatAssign) is making the /assumption/ that the array is unique. As > has been mentioned before, there is no way to express uniqueness in D. > > The problem also exists in non-invariant arrays too. Both mutable and > const arrays suffer from the same effect, whenever the arrays are > non-unique. ... but it's particularly bad when the invariant promise is broken at the same time. > > Perhaps D could partially work around the problem by banning ~= on all > but mutable arrays (although as noted, that still wouldn't solve it > completely), and then code which wants to append to a buffer would > have to declare the buffer mutable, but could still use assumeUnique() > to return it. But I'm not sure I like that idea. > > I think it just needs to be extremely well documented: /Do not use ~= > on a non-unique array/. > I think that if it's fixed, it has to be done transparently or it my be difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not. | |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sönke Ludwig | On 23/02/2008, Sönke Ludwig <ludwig@informatik_dot_uni-luebeck.de> wrote: > I think that if it's fixed, it has to be done transparently or it my be > difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not. You know, the more I think about it, the more I come to the conclusion that you're right, and that x ~= y; should behave /exactly/ like x = x ~ y; and cause a reallocation every time. Incremental string building is the usual reason given why this isn't the case, but I think there's another way round that problem. Think about what happens at runtime when x ~= y; is executed. First, the compiler has to decide whether or not x is resizable (that is, does it point to the start of a resizable block?). Well - how does it know that? Answer: it has to ask the garbage collector. OK, so how does the garbage collector know that? Well, presumably, the gc has to have some sort of map or table, so it can look up the address x, and if it's in the table, then the answer is yes. But that lookup operation has to be slower that O(1). No matter how you look at it, there has to be at least O(log N) CPU time being consumed there, where N is the number of gc allocated blocks. It seems to me that a better approach would be to ditch that strategy altogether, let ~= have predictable behavior even when the array is non-unique, and use a /different/ strategy for incremental array building. I would suggest something as simple as: struct Buffer!(T) { private T[] array; private uint reserved; void opCatAssign() {...} T[] toArray() {...} const(T)[] toCArray() {...} invariant(T)[] toIArray() {...} /*etc*/ } would do the job. Then instead of doing void buildString() { string r; r ~= something; r ~= somethingElse; return r; } you would do void buildString() { Buffer!(char) r; r ~= something; r ~= somethingElse; return r.toIArray; } toIArray() would have to call either assumeUnique or idup under the hood. We could argue about which was the most sensible choice. But either way, you saved a lot of CPU time because you no longer have to decide at runtime whether or not this is a resizable buffer - that's now something you know at compile time. And even when you do know that, finding the reserved size is O(1), not O(log N). It's gotta be a winner all round. | |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
On 23/02/2008, Janice Caron <caron800@googlemail.com> wrote:
> void buildString()
Except of course I meant
string buildString()
Ho hum.
| ||||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron wrote:
> On 23/02/2008, Sönke Ludwig <ludwig@informatik_dot_uni-luebeck.de> wrote:
>> I think that if it's fixed, it has to be done transparently or it my be
>> difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
>
> You know, the more I think about it, the more I come to the conclusion
> that you're right, and that
>
> x ~= y;
>
> should behave /exactly/ like
>
> x = x ~ y;
>
> and cause a reallocation every time. Incremental string building is
> the usual reason given why this isn't the case, but I think there's
> another way round that problem.
>
> Think about what happens at runtime when x ~= y; is executed. First,
> the compiler has to decide whether or not x is resizable (that is,
> does it point to the start of a resizable block?). Well - how does it
> know that? Answer: it has to ask the garbage collector. OK, so how
> does the garbage collector know that? Well, presumably, the gc has to
> have some sort of map or table, so it can look up the address x, and
> if it's in the table, then the answer is yes. But that lookup
> operation has to be slower that O(1). No matter how you look at it,
> there has to be at least O(log N) CPU time being consumed there, where
> N is the number of gc allocated blocks.
>
> It seems to me that a better approach would be to ditch that strategy
> altogether, let ~= have predictable behavior even when the array is
> non-unique, and use a /different/ strategy for incremental array
> building. I would suggest something as simple as:
>
> struct Buffer!(T)
> {
> private T[] array;
> private uint reserved;
> void opCatAssign() {...}
> T[] toArray() {...}
> const(T)[] toCArray() {...}
> invariant(T)[] toIArray() {...}
> /*etc*/
> }
>
> would do the job. Then instead of doing
>
> void buildString()
> {
> string r;
> r ~= something;
> r ~= somethingElse;
> return r;
> }
>
> you would do
>
> void buildString()
> {
> Buffer!(char) r;
> r ~= something;
> r ~= somethingElse;
> return r.toIArray;
> }
>
> toIArray() would have to call either assumeUnique or idup under the
> hood. We could argue about which was the most sensible choice. But
> either way, you saved a lot of CPU time because you no longer have to
> decide at runtime whether or not this is a resizable buffer - that's
> now something you know at compile time. And even when you do know
> that, finding the reserved size is O(1), not O(log N). It's gotta be a
> winner all round.
>
Sounds like a good approach. The rare cases where the in-place resizing
behavior of the build-in arrays is needed for performance could very
well be handled with a standard Buffer/vector/whatever. To me, it feels a bit
strange to use such an optimization without an explicit reserve()/capacity()
interface anyway... and the lookup performance argument is sound, too.
| |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sönke Ludwig | Sönke Ludwig wrote: > So ideally, whenever you get hold of a reference/pointer to invariant data, you should be able to store it and assume that it will never change. > Is change in semantics (work on the compiler) is worth it in this case? > I don't know. I can live with it this way, too - but nevertheless, it would > be a similar generalization like with full-closures (and might be nice even > for non-invariant data, or maybe it's overkill). > Incidentally, this idea just came up in a thread on D.learn (although the post contains some oversights): http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=11410 Although this would create some performance pitfalls, it would definitely make the language much more robust and remove a whole class of bugs. And those kind of bugs are pretty frequent mainly among beginners of C-like languages. | |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | On Sat, 23 Feb 2008 13:37:10 +0000, Janice Caron wrote:
> On 23/02/2008, Sönke Ludwig <ludwig@informatik_dot_uni-luebeck.de> wrote:
>> I think that if it's fixed, it has to be done transparently or it my be
>> difficult to explain why s = s ~ "x"; is allowed but s ~= "x"; is not.
>
> You know, the more I think about it, the more I come to the conclusion that you're right, and that
>
> x ~= y;
>
> should behave /exactly/ like
>
> x = x ~ y;
>
> and cause a reallocation every time. Incremental string building is the usual reason given why this isn't the case, but I think there's another way round that problem.
>
> Think about what happens at runtime when x ~= y; is executed. First, the compiler has to decide whether or not x is resizable (that is, does it point to the start of a resizable block?). Well - how does it know that? Answer: it has to ask the garbage collector. OK, so how does the garbage collector know that? Well, presumably, the gc has to have some sort of map or table, so it can look up the address x, and if it's in the table, then the answer is yes. But that lookup operation has to be slower that O(1). No matter how you look at it, there has to be at least O(log N) CPU time being consumed there, where N is the number of gc allocated blocks.
>
> It seems to me that a better approach would be to ditch that strategy altogether, let ~= have predictable behavior even when the array is non-unique, and use a /different/ strategy for incremental array building. I would suggest something as simple as:
>
> struct Buffer!(T)
> {
> private T[] array;
> private uint reserved;
> void opCatAssign() {...}
> T[] toArray() {...}
> const(T)[] toCArray() {...}
> invariant(T)[] toIArray() {...}
> /*etc*/
> }
>
> would do the job. Then instead of doing
>
> void buildString()
> {
> string r;
> r ~= something;
> r ~= somethingElse;
> return r;
> }
>
> you would do
>
> void buildString()
> {
> Buffer!(char) r;
> r ~= something;
> r ~= somethingElse;
> return r.toIArray;
> }
>
> toIArray() would have to call either assumeUnique or idup under the hood. We could argue about which was the most sensible choice. But either way, you saved a lot of CPU time because you no longer have to decide at runtime whether or not this is a resizable buffer - that's now something you know at compile time. And even when you do know that, finding the reserved size is O(1), not O(log N). It's gotta be a winner all round.
Is it possible for you to actually benchmark the performance of the two approaches? (We don't know what else happens behind the scenes, so analysis alone isn't enough)
I don't think Walter would be able to turn down your approach if you have evidence to back it up.
| |||
February 23, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
On 23/02/2008, Janice Caron <caron800@googlemail.com> wrote:
> struct Buffer!(T)
> {
> private T[] array;
> private uint reserved;
> void opCatAssign() {...}
> T[] toArray() {...}
> const(T)[] toCArray() {...}
> invariant(T)[] toIArray() {...}
> /*etc*/
> }
Actually, I think we could simplify that interface to:
struct Buffer!(T)
{
private T[] array;
private uint reserved;
void opCatAssign() {...}
T[] toArray() {...}
}
because if you want to return a const or invariant array, you'd just
have to change the initial declaration of the Buffer. As in
Buffer!(invariant(char)) instead of Buffer!(char).
| ||||
February 24, 2008 Re: 2.011 invariant question | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sönke Ludwig | Sönke Ludwig wrote:
> Maybe I'm missing something - but I don't get what's the reasoning behind the
> following error?
>
> -----------------
> struct FOO {
> invariant int x;
> int y;
> }
>
> int main()
> {
> FOO bar;
> bar.y = 2;
> return 0;
> }
> -----------------
> gives the error:
> test4.d(9): variable test4.main.bar cannot modify struct with immutable members
Compiler bug.
| |||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply