Jump to page: 1 2 3
Thread overview
[Issue 1654] New: Array concatenation should result in mutable or invariant depending on usage
Nov 09, 2007
d-bugmail
Jan 31, 2008
d-bugmail
Jan 31, 2008
d-bugmail
Jan 31, 2008
d-bugmail
Jan 31, 2008
d-bugmail
Jan 31, 2008
d-bugmail
Jan 31, 2008
d-bugmail
Feb 01, 2008
d-bugmail
Feb 01, 2008
d-bugmail
Mar 26, 2008
d-bugmail
Mar 26, 2008
Janice Caron
Mar 26, 2008
d-bugmail
Mar 28, 2008
d-bugmail
May 01, 2008
d-bugmail
May 01, 2008
d-bugmail
May 28, 2008
d-bugmail
May 28, 2008
d-bugmail
Jan 05, 2013
yebblies
Apr 26, 2013
yebblies
Apr 28, 2013
timon.gehr@gmx.ch
Apr 28, 2013
Kenji Hara
November 09, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1654

           Summary: Array concatenation should result in mutable or
                    invariant depending on usage
           Product: D
           Version: 2.007
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: schveiguy@yahoo.com


One of the great aspects of D is the array concatenation operator.  However, the current compiler sets the type of the result according to the operands. This prevents using invariant or const arrays to build a mutable array without doing extra work.  For example, the toStringz function:

char *toStringz(const(char)[] s)
{
  copy = new char[s.length + 1];
  copy[0..s.length] = s;
  copy[s.length] = 0;
  return copy.ptr;
}

This could easily be written as:
return (s ~ "\0").ptr;

Except that the result of the concatenation is a const(char)[].

But why does it need to be const?  I'm guessing that the reason is because when dealing with functions that take invariant strings, it would be ugly to always have to cast to invariant when doing concatenation.  And of course, functions that use invariant strings can be optimized differently than mutable or even const strings.

The issue I see is that the true result *IS* mutable, because it is generated from the heap.  It's artificially cast to invariant to keep the type the same.

So here is a proposal that would allow efficiency, and utilization of the fact that concatenation always results in newly allocated data:

First, there should be two array concatenation operator internal functions. One that takes two invariant arrays and one that takes two const arrays.  I'll call them icat and ccat respectively.  Both will return mutable arrays.  The icat function can be pure when pure functions are supported.

When the compiler encounters an array concatenation in code, if at least one argument is not invariant, then ccat will be used.  If both arguments are invariant, then icat will be used.

Regardless of the method used, if the resulting rvalue is expected to be invariant, then an implicit invariant cast is allowed.  This means that the following code does not need invariant casts:

char[] blah = "hello".dup;
string blah2 = blah ~ "world";

If the result is assigned to an lvalue that is either const or mutable, then no cast is needed (mutable array is implicitly castable to const).

This would allow us to use the full potential of array concatenation without having to worry about casting away invariant or writing several lines of code to get around this problem.

I'll further point out that the workaround for the current problem does not allow efficient concatenation.  For example:

const(char)[] a1, a2;

// method 1, just dup it.
// the problem is that we make a needless temporary copy of the data
char[] cat1 = (a1 ~ a2).dup;

// method 2, build a temporary array
// the problem is that the initialization of the array is needless.
char[] cat2 = new char[a1.length + a2.length]; // needless init of memory
cat2[0..a1.length] = a1;
cat2[a1.length..$] = a2;

In addition, idup is not needed for creating an invariant array out of the concatenation of two mutable or const arrays.


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654


andrei@metalanguage.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




------- Comment #1 from andrei@metalanguage.com  2008-01-31 14:14 -------
We are well aware of the limitation, and are considering a number of approaches to it, some along the lines suggested by this enhancement request. It is a hard problem for multiple reasons. One of them is that the solution suggested in the enhancement request is unsound for a number of types, including arrays of arrays. This is because concatenating two arrays of arrays is not "deep" - it will create a new outer array but it will not duplicate each of the contained array. Consider:

    int[][] array1 = new int[][10];
    invariant(int[]) x = ([ 1, 2, 3 ]).idup;
    invariant(int[])[] array2 = [ x ];
    int[][] array3 = array1 ~ array2;
    array3[10][0] = 5; // unsound, changes invariant array

This illustrates how code with the proposed semantics can violate immutability.

To recap today's semantics, consider:

import std.stdio;

struct S
{
    int * p;
}

void main()
{
    test!(int);
    test!(int[]);
    test!(S);
}

void test(T)()
{
    auto m = new T[10];
    auto c = new const(T)[15];
    auto i = new invariant(T)[20];
    auto mm = m ~ m;
    auto mc = m ~ c;
    auto mi = m ~ i;
    auto cc = c ~ c;
    auto ci = c ~ i;
    auto ii = i ~ i;
    writeln("mm: ", typeof(mm).stringof);
    writeln("mc: ", typeof(mc).stringof);
    writeln("mi: ", typeof(mi).stringof);
    writeln("cc: ", typeof(cc).stringof);
    writeln("ci: ", typeof(ci).stringof);
    writeln("ii: ", typeof(ii).stringof);
}

This program prints:

mm: int[]
mc: int[]
mi: int[]
cc: const(int)[]
ci: int[]
ii: invariant(int)[]
mm: int[][]
mc: const(int)[][]
mi: const(int)[][]
cc: const(int[])[]
ci: const(int)[][]
ii: invariant(int[])[]
mm: S[]
mc: const(S)[]
mi: const(S)[]
cc: const(S)[]
ci: const(S)[]
ii: invariant(S)[]

which is quite inconsistent (handles int, int[], and S all differently) and
worthy a bug report.

One direction explored to address this problem is allowing definition of polysemous types - types that can exist only as rvalues and allow implicit conversion to a number of other types. Within that framework it should be possible to design a parameterized type called Unique!(T) that allows conversions to invariant(T) and T alike. Then a function such as listdir could return Unique!(char[]) which can be converted, to its client's desire, to either a mutable or immutable array.

Unique!(T) would remain unsound because there is no statically checkable way to initialize it properly, so it needs to be used with care. Basically the issue of typing "fresh" data reveals a hole in our type system that would be hard to fix without complicating it to an extent we considered unacceptable. To estimate the work involved for both compiler designers and users, I suggest consulting the paper archjava.fluid.cs.cmu.edu/papers/oopsla02.pdf (look for "unique).


Andrei


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #2 from schveiguy@yahoo.com  2008-01-31 14:30 -------
Thank you for looking at this issue.  You are correct in your statement that my proposed solution does not solve your specific example.  There could be other examples, such as if you had a struct S that contained a pointer, then concatenating an invariant(S)[] into a mutable S[] would result in the pointer violating the immutability.

However, this could be alleviated if we had the following rule:

if X is a data type, and v1 is of the type invariant(X) and v2 is of the type X, it is allowed to copy v1 to v2 provided that the X data type is not a pointer or a reference or contains any pointers or references.

I think this rule is sound and really should be an enhancement on its own, but it provides a way that my solution could work.  With this rule, my example passes the rule since copying an invariant(char) to a char is allowed, but your example fails because copying an invariant(int[]) to a int[] fails.

What do you think?


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #3 from andrei@metalanguage.com  2008-01-31 15:45 -------
The rule you mention was suggested and rejected. It is sound, but the problem with it is that the semantics of a type becomes implicitly dependent on an implementation detail (e.g. a private pointer member). We don't want to separate "types with at least a pointer in'em" from "types with no pointers in'em".


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #4 from schveiguy@yahoo.com  2008-01-31 16:32 -------
Forgive me for pushing this point, but it seems that D already has types that are inherently considered differently because they have pointers.  i.e. classes:

X *x = new X;

If X is a struct or int, this works.  If it is a class, it fails.

Also, how is treating value types different from pointer-containing types a problem?  I don't see how it is detrimental to the language.

Thanks for hearing my opinions on this, I'm sure you've discussed this type of thing ad-nauseam.


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #5 from andrei@metalanguage.com  2008-01-31 17:26 -------
It's great to push the point. It is true that classes are reference types and as such are distinct from structs and builtins, which are value types. But that's a decision that is taken up front by the type designer: do I want dynamic polymorphism, inheritance, infinite lifetime (garbage collection), reference semantics? Then I write:

class Foo { ... }

Conversely: do I want eager copy, value semantics, deterministic termination (that's to come in D 2.0 in a couple of weeks)? Then I write:

struct Bar { ... }

This does introduce an inconsistency in the type system, but it tends to work well in practice and is documented upfront in the definition of the type. In contrast, agreeing to further part types depending on whether they contain pointers or not is murkier.

That is not even the main problem. At the risk of spreading FUD, Real Soon Now, not all types containing pointers will be non-values. In a couple of weeks we'll be able to write (syntax details may change):

struct Value {
  int * p = null;
  this() { p = new int; }
  ~this() { delete p; }
  this(scope) { p = new int(*p); }
}

This would effectively implement a structure containing a pointer but with 100% value semantics (the third function is automatically invoked whenever an object is copied into a new scope, right after the bitwise copy).

Note that the assignment operator will do the right thing, which is: (1)
duplicate the source into a temporary by bitwise copying it and then invoking
this(scope), (2) destroy the assignment target, (3) bitwise copy the temporary
into the assignment target.


Andrei


-- 

January 31, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #6 from schveiguy@yahoo.com  2008-01-31 17:49 -------
I still believe the rule works.

The general rule I am extending is, for any basic type(int, char, byte, etc.), you can copy an invariant version to a mutable version without issue.  i.e.:

invariant(int) x = 1;
int y = x; // works (I think :)

If we imagine that any time you copy a structure or a buffer, instead of doing a memcpy (which is an implementation detail), you do a copy of each member/element.

so:

char[5] x;
x[] = "hello";

is equivalent to:

foreach(i, c; "hello")
   x[i] = c;

Which should work because of the first rule (I can copy an invariant(char) to a
char).

Now, for a struct, you can postulate the same thing:

struct S
{
  int x;
}

invariant(S) s1 = cast(invariant)S(1);
S s2;

s2 = s1;

is equivalent to:

s2.x = s1.x;

Which should work because of the first rule.

However, if we think about pointers:

invariant(char)* cp = "hello".ptr;
char * cp2;
cp2 = cp;

this fails because now you have a mutable pointer to invariant data.  Here is the rule I am trying to exploit.  Because a type has a pointer in it:

struct S
{
  char *cp;
}

invariant(S) s1;
S s2;
s2 = s1;

This is equivalent to:
s2.cp = s1.cp;

which fails the rule above, so the whole thing fails.

With your new syntax, let's look at how it works:

struct Value {
  int * p = null;
  this() { p = new int; }
  ~this() { delete p; }
  this(scope) { p = new int(*p); }
}

invariant(Value) v1;
Value v2;

v2 = v1;
Now, this is possible.  Because instead of using p, we are using *p, which is
of type invariant(int).  invariant(int) can be casted to int implicitly, and so
the rule still works.

I'm not sure if this was intended as a use for your new syntax, but I certainly still believe that it doesn't break the rules (BTW, I like the idea).

Addressing your point about parting types if they contain pointers are not, I don't really think of it that way.  What I think of it as is that an invariant pointer cannot be implicitly copied to a mutable pointer.  If by copying a type, you have to copy an invariant pointer to a mutable pointer, then the compile fails.  If you do not have to copy an invariant pointer to a mutable pointer, then the copy works.  It's just easier to explain as "types that contain invariant pointers cannot be copied to a non-invariant version of the type".  The compiler can check this without adding extra runtime information, so implementation-wise it does not affect the output.  Semantic-wise, it doesn't break existing code, it just allows operations that should be valid but aren't under the current scheme.  In fact, I think it's more consistent with the way builtin types work.

BTW, the same applies to const.

-Steve


-- 

February 01, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654


andrei@metalanguage.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |major




------- Comment #7 from andrei@metalanguage.com  2008-01-31 19:02 -------
That's a very interesting idea. Among other things, you show how our copy construction scheme is incomplete because it "loses" the qualifiers of the source, information that turns out to be essential.

I'll make sure I'll bring this up to Walter, and to give you proper credit if this idea makes it into the language. Thanks!


-- 

February 01, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654





------- Comment #8 from schveiguy@yahoo.com  2008-01-31 21:09 -------
*blushes* :)  Thanks for considering the idea!


-- 

March 26, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=1654


htvennik@zonnet.nl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |htvennik@zonnet.nl




------- Comment #9 from htvennik@zonnet.nl  2008-03-26 15:05 -------
I read all comments on this enhancement request yesterday, and now I was thinking about it again and I suddenly realized that the solution is very simple in fact. D should be extended with a new implicit type qualifier 'unique'. By 'implicit' I mean that such a qualifier should NOT appear anywhere in the syntax, just like 'mutable' does not appear anywhere in the syntax, but still exists conceptually. The unique qualifier should be automatically applied by the compiler to the result type of any expression returning a reference to newly constructed (or copied) data. Anything of a type qualified as unique could be implicitly cast to any of mutable, invariant and const.

So if the array concatenation operator is modifier to always return a unique reference, the following would all be perfectly valid:

char[] mutableString = "foo" ~ "bar";
invariant(char[]) invariantString = "foo" ~ "bar";
invariant(char)[] reassignableInvariantString = "foo" ~ "bar";

Of course a NewExpression would also return a unique reference:

struct S {
    int i;
}

invariant(S) s = new S(4);   // valid


-- 

« First   ‹ Prev
1 2 3