Documentation of D arrays
January 12, 2007
Hello!

I'm trying to understand array handling in D. Unfortunately, the official documentation[1] is not very helpful.

[1] http://www.digitalmars.com/d/arrays.html

By trial and error I found out that arrays are passed by some COW magic (where is this documented?). So, if I want to change the content of an array in a way that is visible to the caller, I have to pass it as an inout parameter (this works, but is it the canonical way?).

Next question: How can I initialize an array?
It seems like COW works only for parameters. E.g.:

void foo(inout char[] s)
{
        s = "blub";
}
void bar()
{
	char[] s;
	foo(s);
	s[1] = 'a'; // will crash
}

So, how can I copy the string "blub" into s? s[] = "blub" doesn't work
because the .length won't be adjusted.
Oh, while writing this I noticed "blub".dup does work. Is this the
preferred way or should I manually alter the .length?

So what exactly is T[]? According to the documentation it's a tuple
(pointer, length). So, if I pass a T[] to a function, pointer and length
are passed by value (unless I specify an (in)out storage class)? Is this
some array magic or can I use this for my own types?

I also found out that I can write
void foo(inout int[] a)
{
	a ~= 1;
}
So "~=" does not only support T[] as RHS but also T. Where is the
documentation for this?

Sorry if these are obvious questions, but I can't figure this out from the official documentation (or I'm blind).

Regards,
Sebastian
January 12, 2007
Sebastian Biallas wrote:
> Hello!
> 
> I'm trying to understand array handling in D. Unfortunately the official
> documentation[1] is not very helpful..
> 
> [1] http://www.digitalmars.com/d/arrays.html
> 
> By trial and error I found out that arrays are passed by some COW magic
> (where is this documented?). So, if I want to change the content of an
> array in a way that is visible to the caller, I have to pass it as an
> inout parameter (this works, but is it the canonical way?).

Almost.  Dynamic arrays are declared internally like so in D:

struct Array
{
    size_t len;
    byte*  ptr;
}

So passing a dynamic array by value is essentially the same as passing around a pointer.  The only effect adding 'inout' to your function will have is that the length of the array can be altered and those changes will persist when the function completes.
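A minimal sketch of that behaviour, assuming a current D compiler (D2), where the `inout` storage class discussed in this thread is spelled `ref`. The function names are hypothetical and exist only for this illustration:

```d
import std.stdio;

// Hypothetical helpers, named only for this illustration.

// The (ptr, length) pair is copied, but both copies point at the same memory.
void setFirst(int[] a)
{
    a[0] = 99;      // written through the copied pointer: the caller sees it
}

// Changing the length here only touches the callee's copy of the pair.
void shrinkLocal(int[] a)
{
    a.length = 1;
}

// `ref` (D1: `inout`) passes the caller's pair itself.
void shrinkRef(ref int[] a)
{
    a.length = 1;   // the caller's length changes too
}

void main()
{
    int[] arr = [1, 2, 3];

    setFirst(arr);
    assert(arr[0] == 99);

    shrinkLocal(arr);
    assert(arr.length == 3);    // unchanged in the caller

    shrinkRef(arr);
    assert(arr.length == 1);    // changed in the caller

    writeln(arr);
}
```

The same applies to the pointer: appending inside `shrinkLocal` could make the callee's copy point somewhere else without the caller noticing.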

There is a brief mention of this in:

http://www.digitalmars.com/d/function.html

"For dynamic array and object parameters, which are passed by reference, in/out/inout apply only to the reference and not the contents."

> Next question: How can I initialize an array?
> It seems like COW works only for parameters. Eg.
> 
> void foo(inout char[] s)
> {
>         s = "blub";
> }
> void bar()
> {
> 	char[] s;
> 	foo(s);
> 	s[1] = 'a'; // will crash
> }

Doing:

    s = "blub";

allocates no memory, but rather just changes Array.ptr to point to "blub" and sets Array.len appropriately.  The above code will actually work on Windows because the data segment where string constants are stored is not read-only.

> So, how can I copy the string "blub" into s? s[] = "blub" doesn't work
> because the .length won't be adjusted.

    s = "blub".dup;

> Oh, while writing this I noticed "blub".dup does work. Is this the
> preferred way or should I manually alter the .length?

Yes :-)
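Both variants, as a small sketch assuming a current D compiler (D2). There a string literal is immutable, so assigning it directly to a `char[]` is rejected at compile time instead of crashing at run time:

```d
void main()
{
    // .dup allocates a fresh, mutable copy of the literal's characters.
    char[] s = "blub".dup;
    s[1] = 'a';                 // fine: s owns its own memory
    assert(s == "baub");

    // Alternative: set the length first, then copy element-wise.
    char[] t;
    t.length = 4;
    t[] = "blub";               // lengths must match for the slice copy
    assert(t == "blub");
}
```

`.dup` is shorter and cannot get the length wrong, which is probably why it is the usual choice.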

> So what exactly is T[]? According to the documentation it's a tuple
> (pointer, length). So, if I pass a T[] to a function, pointer and length
> are passed by value (unless I specify an (in)out storage class)? Is this
> some array magic or can I use this for my own types?

See above.  You could duplicate this in your own code by creating a struct containing pointers.  Also, I don't think it's a good idea to call T[] a Tuple in D because the term has a fairly specific connotation.  See the section entitled "Tuple Parameters" at http://www.digitalmars.com/d/template.html and also http://www.digitalmars.com/d/phobos/std_typetuple.html

> I also found out that I can write
> void foo(inout int[] a)
> {
> 	a ~= 1;
> }
> So "~=" does not only support T[] as RHS but also T. Where is the
> documentation for this?

http://www.digitalmars.com/d/arrays.html I suppose, though the description isn't explicit.  Rather, it's implied by "the ~= operator means append."
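For illustration, a short sketch showing both right-hand sides that `~=` accepts (a single element and another array):

```d
void main()
{
    int[] a;
    a ~= 1;         // right-hand side is a single element (T)
    a ~= [2, 3];    // right-hand side is another array (T[])
    assert(a == [1, 2, 3]);
}
```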

> Sorry, if these are obvious questions, but I can't figure this out by
> the official documentation (or I'm blind).

Not at all.  I've been using D for a few years now, and I still have trouble finding things in the spec.  It's pretty much all there, but not always in the most obvious location.


Sean
January 12, 2007
Reply to Sebastian,

> Hello!
> 
> I'm trying to understand array handling in D. Unfortunately the
> official documentation[1] is not very helpful..
> 
> [1] http://www.digitalmars.com/d/arrays.html
> 
> By trial and error I found out that arrays are passed by some COW
> magic (where is this documented?). So, if I want to change the
> content of an array in a way that is visible to the caller, I have to
> pass it as an inout parameter (this works, but is it the canonical way?).

Arrays are reference types. If you pass an array to a function, the function gets a copy of the pointer/length pair that the caller uses. The function can change the contents of the memory the reference points to, but can't change the caller's reference to that data (unless you use out or inout). As to it seeming to be COW: if you change the length of an array, sometimes the GC can't extend it in place and moves the whole thing to a bigger chunk of RAM (this doesn't always happen). When the ~ and ~= operators are used, the GC always makes a copy.

[...]
> the official documentation (or I'm blind).

I often feel that way myself. I have had so much trouble finding things that /were/ put in a good place that I have a CGI script on my box that gives me a grep of the whole D spec, converted into a web page with links and everything.

> 
> Regards,
> Sebastian


January 12, 2007
Sean Kelly wrote:
> Sebastian Biallas wrote:
>> Hello!
>>
>> I'm trying to understand array handling in D. Unfortunately, the official documentation[1] is not very helpful.
>>
>> [1] http://www.digitalmars.com/d/arrays.html
>>
>> By trial and error I found out that arrays are passed by some COW magic (where is this documented?). So, if I want to change the content of an array in a way that is visible to the caller, I have to pass it as an inout parameter (this works, but is it the canonical way?).
> 
> Almost.  Dynamic arrays are declared internally like so in D:
> 
> struct Array
> {
>     size_t len;
>     byte*  ptr;
> }
> 
> So passing a dynamic array by value is essentially the same as passing around a pointer.

But not (Array *) but (len, byte *), I guess?

> The only effect adding 'inout' to your function will
> have is that the length of the array can be altered and those changes
> will persist when the function completes.

Hmm, I'm quite sure I can alter the ptr, too (Implicitly, when I append to the array and there is not enough room).

> There is a brief mention of this in:
> 
> http://www.digitalmars.com/d/function.html
> 
> "For dynamic array and object parameters, which are passed by reference, in/out/inout apply only to the reference and not the contents."

Well, the word "reference" is far too overloaded. Here you don't
pass the Array (you mentioned above) by reference but the content (the
object ptr points to).

>> Next question: How can I initialize an array?
>> It seems like COW works only for parameters. Eg.
>>
>> void foo(inout char[] s)
>> {
>>         s = "blub";
>> }
>> void bar()
>> {
>>     char[] s;
>>     foo(s);
>>     s[1] = 'a'; // will crash
>> }
> 
> Doing:
> 
>     s = "blub";
> 
> allocates no memory, but rather just changes Array.ptr to point to "blub" and sets Array.len appropriately.

Yeah, I guess I understood this already.

Is there something similar to the "const" keyword of C/C++ in D? It
looks a little bit fishy to me that you can write illegal code in D so
easily. In C/C++ you can return a constant array in a way that
a) the caller knows it's constant
b) errors are detected at compile time.

> The above code will actually
> work in Windows because the data segment where string constants are
> stored is not read-only.

For some values of "work" :)

>> So what exactly is T[]? According to the documentation it's a tuple
>> (pointer, length). So, if I pass a T[] to a function, pointer and length
>> are passed by value (unless I specify an (in)out storage class)? Is this
>> some array magic or can I use this for my own types?
> 
> See above.  You could duplicate this in your own code by creating a struct containing pointers.

But without the COW part?

> Also, I don't think it's a good idea to
> call T[] a Tuple in D because the term has a fairly specific
> connotation.  See the section entitled "Tuple Parameters" at
> http://www.digitalmars.com/d/template.html and also
> http://www.digitalmars.com/d/phobos/std_typetuple.html

Yes, you're right.

>> I also found out that I can write
>> void foo(inout int[] a)
>> {
>>     a ~= 1;
>> }
>> So "~=" does not only support T[] as RHS but also T. Where is the
>> documentation for this?
> 
> http://www.digitalmars.com/d/arrays.html I suppose, though the description isn't explicit.  Rather, it's implied by "the ~= operator means append."

Hmm, that's not the answer I hoped I'd get :)
It's nice to have a language without surprises, but I could only figure
out the above part by trying it.

>> Sorry, if these are obvious questions, but I can't figure this out by the official documentation (or I'm blind).
> 
> Not at all.  I've been using D for a few years now, and I still have trouble finding things in the spec.  It's pretty much all there, but not always in the most obvious location.

That's sad. At first glance the documentation looks really good, but then it is mostly about syntax, not about semantics.
January 12, 2007
"Sebastian Biallas" <groups.5.sepp@spamgourmet.com> wrote in message news:eo6ofq$1q2b$2@digitaldaemon.com...
>
> But not (Array *) but (len, byte *), I guess?

Yeah, it's more like (in fact, _exactly_ like) passing around a two-element struct by value.  If you pass a struct by value into a function and modify its members, those changes won't be reflected in the calling function unless you use 'inout'.  The same thing applies for arrays since this is really what's going on behind the scenes.
>
> Hmm, I'm quite sure I can alter the ptr, too (Implicitly, when I append to the array and there is not enough room).

Yes, that's right.

> Well, the word "reference" is far too overloaded. Here you don't
> pass the Array (you mentioned above) by reference but the content (the
> object ptr points to).

Yes, and this got me a few times too.  Though most of the time I don't need a function to modify an array that's passed into it, just one that's a class member, or maybe modify it and then return it.

> Is there something similar to the "const" keyword of C/C++ in D? It
> looks a little bit fishy to me that you can write illegal code in D so
> easily. In C/C++ you can return a constant array in a way that
> a) the caller knows it's constant
> b) errors are detected at compile time.

No, and this issue has been beaten absolutely to death.  I really don't care what happens with this issue.  I've never actually run into any bugs that would be solved by having const, but your mileage may vary, I guess. PLEASE, I don't want to start another topic about this :)

> For some values of "work" :)

Hehe

> But without the COW part?

COW is not part of the language.  It's just a convention you can follow when writing array-processing functions.  These functions also typically return the array, so the function should be called as "s = foo(s)" instead of "foo(s)".  The "COW" behavior that you were talking about before -- how resizing/reallocating the array in the function had no effect in the caller -- was really just an effect of what I mentioned at the beginning of this post.  The local array "structure" members were changed in the array processing function when you resized the array, and those changes aren't reflected in the calling function.
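A minimal sketch of that convention, assuming a current D compiler (D2). The function `capitalizeFirst` is hypothetical, made up for this illustration: it copies only if it actually has to write, and the caller uses the returned array.

```d
// Hypothetical example of the copy-on-write convention:
// never write into the caller's data, copy before the first write.
char[] capitalizeFirst(char[] s)
{
    // Nothing to change: hand the input back without copying.
    if (s.length == 0 || s[0] < 'a' || s[0] > 'z')
        return s;

    char[] r = s.dup;                       // copy before writing
    r[0] = cast(char)(r[0] - 'a' + 'A');    // modify only the copy
    return r;
}

void main()
{
    char[] orig = "blub".dup;
    char[] s = capitalizeFirst(orig);   // called as "s = foo(s)" style

    assert(s == "Blub");
    assert(orig == "blub");             // caller's original data untouched
}
```

Nothing in the language enforces this; a function is free to write through the slice it received, so the convention only works if every function follows it.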

> Hmm, that's not the answer I hoped I'd get :)
> It's nice to have a language without surprises, but I could only figure
> out the above part by trying it.

At least it's a nice surprise :)


January 12, 2007
BCS wrote:
> Reply to Sebastian,
> 
>> Hello!
>>
>> I'm trying to understand array handling in D. Unfortunately, the official documentation[1] is not very helpful.
>>
>> [1] http://www.digitalmars.com/d/arrays.html
>>
>> By trial and error I found out that arrays are passed by some COW magic (where is this documented?). So, if I want to change the content of an array in a way that is visible to the caller, I have to pass it as an inout parameter (this works, but is it the canonical way?).
> 
> Arrays are reference types. If you pass an array to a function, the function gets a copy of the pointer/length pair that the caller uses. The function can change the contents of the memory the reference points to, but can't change the caller's reference to that data (unless you use out or inout).

Ah, you're right. I guess I have a better picture now.

> As to it seeming to be COW: if you change the length of an array, sometimes the GC can't extend it in place and moves the whole thing to a bigger chunk of RAM (this doesn't always happen). When the ~ and ~= operators are used, the GC always makes a copy.

Yeah, that's the trick. You can change it in-place (without
inout-statement), and the COW-part happens once you alter the length
(implicitly or explicitly).

I guess what I called COW isn't even the right term.

So, new question: How do I pass T[] to foo(), so that foo() isn't
allowed to change the content of T[]?
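For what it's worth, the D of this thread (early 2007) has no such qualifier, but later compilers (D2) added one. A small sketch assuming a D2 compiler; the function `sum` is hypothetical:

```d
// Hypothetical function: the const(int)[] parameter type promises that
// the elements are not modified through this slice.
int sum(const(int)[] a)
{
    int total = 0;
    foreach (x; a)
        total += x;
    // a[0] = 0;    // would be rejected at compile time
    return total;
}

void main()
{
    int[] data = [1, 2, 3];
    assert(sum(data) == 6);         // int[] converts implicitly to const(int)[]
    assert(data == [1, 2, 3]);
}
```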
January 12, 2007
BCS wrote:
> bigger chunk of RAM (this doesn't always happen). When the ~ and ~= operators are used, the GC always makes a copy.

~ always makes a copy, but ~= only does so when necessary.
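A small sketch of the difference, assuming a current D compiler (D2). It only asserts what is guaranteed: whether `~=` reallocates depends on how much spare room the runtime has behind the array, so nothing is assumed about that.

```d
void main()
{
    int[] a = [1, 2, 3];

    // ~ builds a new array; neither operand is modified.
    int[] b = a ~ 4;
    assert(b == [1, 2, 3, 4]);
    assert(b.ptr !is a.ptr);        // b lives in freshly allocated memory
    assert(a == [1, 2, 3]);

    // ~= may extend in place or move the data, whichever the runtime needs.
    int[] c = a;
    c ~= 4;
    assert(c == [1, 2, 3, 4]);
    assert(a.length == 3);          // a's own (ptr, length) pair is untouched
}
```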
January 12, 2007
Jarrett Billingsley wrote:
> "Sebastian Biallas" <groups.5.sepp@spamgourmet.com> wrote in message
>> Is there something similar to the "const" keyword of C/C++ in D? It
>> looks a little bit fishy to me that you can write illegal code in D so
>> easily. In C/C++ you can return a constant array in a way that
>> a) the caller knows it's constant
>> b) errors are detected at compile time.
> 
> No, and this issue has been beaten absolutely to death.  I really don't care what happens with this issue.  I've never actually run into any bugs that would be solved by having const, but your mileage may vary, I guess. PLEASE, I don't want to start another topic about this :)

Sorry, I'm new to this newsgroup, will google :)

I'm from the C/C++/Java/Ruby world (not to mention the functional languages) and these languages have pretty easy constraints:

C: you pass everything by value
C++: you pass by value or by reference (and a reference is -- more or
less -- just a pointer)
Java: you pass either PODs or references(pointers) by value
Ruby: you pass everything by reference

D doesn't fit into these categories that well. That arrays are passed by reference means something different, because an array in D isn't a first-class object (or whatever I should call this).

Well, I guess there are some D idioms which avoid the const array problem.

January 12, 2007
Frits van Bommel wrote:
> BCS wrote:
>> bigger chunk of RAM (this doesn't always happen). When the ~ and ~= operators are used, the GC always makes a copy.
> 
> ~ always makes a copy, but ~= only does so when necessary.

The first one is documented on the array page, but where is the documentation for ~=? Is it common knowledge from using D?

BTW: What exactly happens on:

a = b ~ c and a ~= b

? Is this some built-in opCat? What are the semantics?
January 12, 2007
Sebastian Biallas wrote:
> Frits van Bommel wrote:
>> BCS wrote:
>>> bigger chunk of RAM (this doesn't always happen). When the ~ and ~=
>>> operators are used, the GC always makes a copy.
>> ~ always makes a copy, but ~= only does so when necessary.
> 
> The first one is documented on the array page, but where is the
> documentation for ~=? Is it common knowledge from using D?

Not sure, but it should be in the spec somewhere...

> BTW: What exactly happens on:
> 
> a = b ~ c and a ~= b
> 
> ? Is this some built-in opCat? What are the semantics?

You can see it as a built-in opCat if you like.
What happens behind the scenes is that a function in the runtime is called.
The source to these functions is in dmd/src/phobos/internal/gc/gc.d if you really want to know exactly what they do... (_d_arraycat for ~, _d_arrayappend for ~= with array, _d_arrayappendc for ~= with single element)