Thread overview
array .dup bug in 0.97 and 0.98? - matrix_dup_bug.zip
Aug 05, 2004: Dave
Aug 05, 2004: Nick
Aug 05, 2004: Dave
Aug 05, 2004: Dave
Aug 05, 2004: Regan Heath
Aug 06, 2004: Dave
Aug 06, 2004: J C Calvarese
Aug 06, 2004: Juanjo Álvarez
Aug 06, 2004: Dave
Aug 06, 2004: Juanjo Álvarez
Aug 07, 2004: Dave
Aug 08, 2004: Regan Heath
Aug 10, 2004: Dave
Aug 19, 2004: Juanjo Álvarez
Aug 19, 2004: Dave
August 05, 2004
Wow! Great language Walter!!

Please see attached. It contains the .d and two .cod files (one with the bug and one without). Comments in the .d file explain the potential problem fairly well, hopefully.

Also, would you like reports and examples of performance issues if I run into any? I would be more than happy to take the time to provide this information if it is wanted.

Thanks,
Dave


August 05, 2004
In article <cetin9$1oo1$1@digitaldaemon.com>, Dave says...

>Wow! Great language Walter!!
>
>Please see attached. It contains the .d and two .cod files (one with the bug and one without). Comments in the .d file explain the potential problem fairly well, hopefully.

I'm not quite sure I understand the problem. Are you trying to copy a nested array (an array of arrays) with a dup? In an array such as int[][] mm, each element of mm is a reference to an array int[]. Writing mm.dup gives you a copy of the table of row references, but those references still point to the same actual data. If you want to copy the entire matrix you would write something like

# int[][] copyMatrix(int[][] m)
# {
#   int[][] m2;
#   m2.length = m.length;
#   // dup each row so the copy gets its own data
#   foreach(int i, inout int[] row; m2)
#     row = m[i].dup;
#   return m2;
# }
#
# ...
# m1 = copyMatrix(m2);

Incidentally, you can also write the last part as m1 = m2.copyMatrix(), but this
only works on arrays (as discussed recently on one of these NGs.)

Nick


August 05, 2004
In article <cetnkl$1r0r$1@digitaldaemon.com>, Nick says...
>
>In article <cetin9$1oo1$1@digitaldaemon.com>, Dave says...
>
>I'm not quite sure I understand the problem. Are you trying to copy a double array with a dup? In a double array such as int[][] mm, each element mm[] is a pointer to an array int[]. Writing mm.dup gives you a copy of the table of row pointer, but the pointers still point to the same actual data. If you want to copy the entire matrix you would write something like
>

Yes, I was trying to copy the whole matrix with a dup on m1.

I guess what you say makes sense (if you are used to the C/C++ 'way' at least), but I gotta say it is not explained well in the docs; the compiler didn't complain at all, and the thing didn't crash at runtime.

Besides that, I tried this:

int[][] m1 = mkmatrix(SIZE, SIZE); // mkmatrix allocates/init's each element
printf("%d, %d, %d\n",m1.length,m1[0].length, m1[m1.length - 1].length);
printf("%d %d %d %d\n",m1[0][0],m1[2][3],m1[3][2],m1[4][4]);

int[][] mx = m1.dup;
printf("%d, %d, %d\n",mx.length,mx[0].length, mx[mx.length - 1].length);
printf("%d %d %d %d\n",mx[0][0],mx[2][3],mx[3][2],mx[4][4]);

The results were the same for both:

30, 30, 30
1 64 93 125
30, 30, 30
1 64 93 125

The way I look at it, and how the compiler apparently looks at it, is that m1 and mx are both of type int[][], so a 'dup' applied directly to m1 should allocate for and copy the entire thing.

Either way, there is a bug somewhere in the compiler: either it is copying the whole matrix as it is supposed to and the buglet is further along in the code, or it is copying the entire matrix and is _not_ supposed to do that. If it is not supposed to allocate and copy everything, then at least a warning should be issued, because the m1.dup code is pretty intuitive (and nice! if that is how it is supposed to work).

I think the dup part is working the way it should but the buglet is further along in the code, like in the mmult(...) part.

Maybe the compiler is getting references mixed up after a 'copy on write' or something..

- Dave


August 05, 2004
In article <cetvkv$1vpp$1@digitaldaemon.com>, Dave says...
>
>In article <cetnkl$1r0r$1@digitaldaemon.com>, Nick says...
>>
>>In article <cetin9$1oo1$1@digitaldaemon.com>, Dave says...
>>
>>I'm not quite sure I understand the problem. Are you trying to copy a double array with a dup? In a double array such as int[][] mm, each element mm[] is a pointer to an array int[]. Writing mm.dup gives you a copy of the table of row pointer, but the pointers still point to the same actual data. If you want to copy the entire matrix you would write something like
>>
>
>Yes, I was trying to copy the whole matrix with a dup. on m1.
>

I apologize - you are exactly correct in what is happening. I guess it would not necessarily be considered a bug and my reply just confused things more.

After the dup and subsequent writes to mm, the m1 data was being modified to be the same as mm, as you implied.

What was further confusing me was the copy-on-write stuff, for example, for operations on char[].

Which begs the question, since 1-D arrays are copy-on-write, shouldn't the 2nd dimension in a dup'ed matrix also be copy-on-write?? Or does copy-on-write only apply to char[]'s and not any other type of array??

I think that is really non-intuitive. I think of dup as a synonym for 'copy', and most object.copy()-type methods I've used and written actually make a deep copy of the object and its contents (in this case, I'm looking at an int[][] as an object).

- Dave


August 05, 2004
On Thu, 5 Aug 2004 20:54:08 +0000 (UTC), Dave <Dave_member@pathlink.com> wrote:
> In article <cetvkv$1vpp$1@digitaldaemon.com>, Dave says...
>>
>> In article <cetnkl$1r0r$1@digitaldaemon.com>, Nick says...
>>>
>>> In article <cetin9$1oo1$1@digitaldaemon.com>, Dave says...
>>>
>>> I'm not quite sure I understand the problem. Are you trying to copy a double
>>> array with a dup? In a double array such as int[][] mm, each element mm[] is a
>>> pointer to an array int[]. Writing mm.dup gives you a copy of the table of row
>>> pointer, but the pointers still point to the same actual data. If you want to
>>> copy the entire matrix you would write something like
>>>
>>
>> Yes, I was trying to copy the whole matrix with a dup. on m1.
>>
>
> I apologize - you are exactly correct in what is happening. I guess it would not
> necessarily be considered a bug and my reply just confused things more.
>
> After the dup and subsequent writes to mm, the m1 data was being modified to be
> the same as mm, as you implied.
>
> What was further confusing me was the copy-on-write stuff, for example, for
> operations on char[].

char[] doesn't exactly have copy-on-write semantics, for example:

char[] a = "regan";
char[] b = a;

b[0] = 'm';
assert(b[0] == a[0]);

will _not_ fail; this is because a and b are references, in this case to the same data.

Similarly, this..

char[] a = "regan";
char[] b = a[1..3];

b[0] = 'o';
assert(a[1] == b[0]);

will also not fail, because b simply references the same data as a.

This however...

char[] a = "regan";
char[] b;

b = a ~ " was here";

will copy the contents of a into new storage for b and append the new data.
The docs specifically state that an append will copy.

> Which begs the question, since 1-D arrays are copy-on-write, shouldn't the 2nd
> dimension in a dup'ed matrix also be copy-on-write?? Or does copy-on-write only
> apply to char[]'s and not any other type of array??

An 'int[][]' is a reference, to an array of references, to arrays of ints.
So...

int[][] a;
a.dup

says: duplicate the data 'a' references. That data is an array of references to arrays of ints; the array of references gets duplicated, but the arrays of ints those references point to do not.

Nick's function was duplicating the arrays of ints as well.
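
To make the difference concrete, here is a small sketch (names are just for illustration, not taken from your attached test) contrasting a plain .dup with a row-by-row copy:

int[][] a;
a.length = 2;
a[0] = new int[3];
a[1] = new int[3];
a[0][0] = 42;

int[][] b = a.dup;       // only the outer array of references is copied
b[0][0] = 7;
assert(a[0][0] == 7);    // passes: a and b share the same rows

int[][] c;
c.length = a.length;
foreach(int i, inout int[] row; c)  // dup each row, as Nick's copyMatrix does
  row = a[i].dup;
c[0][0] = 99;
assert(a[0][0] == 7);    // passes: c owns its rows, a is untouched

So with a plain dup, writes through b show up in a; after the row-by-row copy they don't.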

> I think that is really non-intuitive. I think of dup as a synonym for 'copy' and
> most object.copy() type of methods I've used and written actually make a deep
> copy of the object and it's contents (in this case, I'm looking at an int[][] as
> an object).

And it does, the difference is what is considered the object in this case.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
August 06, 2004
In article <opsb99pvnp5a2sq9@digitalmars.com>, Regan Heath says...
>
>And it does, the difference is what is considered the object in this case.
>
>Regan

Got it - thanks for the quick replies.

In that same attachment with the original post, there is a performance comparison between dmd, dmc, Intel and gcc for that code.

I suspect the question of array performance is going to be on many potential users' minds, and gcc and Intel perform 3x better for this. Of course, Java is still living down its bad performance reputation even though most runtimes are now pretty good, at least for native data types, and I would like to see D avoid that rap.

I know this code is kind of trivial, artificial, etc., but a 3x difference is still pretty large (and yes, I used -O -inline -release).

Not complaining - just trying to make the language and tools better. Actually, I am quite pleased with things like OutBuffer performance, etc. Other things, though, like the native associative arrays, seem pretty slow compared to C++'s hash_map<> and Java's HashMap.
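
For what it's worth, the kind of AA exercise I mean is roughly along these lines (just an illustrative sketch, not the code I actually timed; I'd time it externally, e.g. with the shell's time command):

import std.c.stdio;

void main()
{
    const int N = 1000000;

    int[int] aa;                // built-in associative array

    // insert N keys
    for (int i = 0; i < N; i++)
        aa[i] = i;

    // look every key up again
    int found = 0;
    for (int i = 0; i < N; i++)
        if (i in aa)
            found++;

    printf("%d entries, %d found\n", aa.length, found);
}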

Does anyone know if there are planned improvements before what I presume will be the big release of 1.0? Is there any "roadmap" around I could look at?

Thanks.


August 06, 2004
Dave wrote:
> In article <opsb99pvnp5a2sq9@digitalmars.com>, Regan Heath says...
> 
>>And it does, the difference is what is considered the object in this case.
>>
>>Regan
> 
> 
> Got it - thanks for the quick replies.
> 
> In that same attachment with the original post, there is a performance
> comparison between dmd, dmc, Intel and gcc for that code.
> 
> I suspect the question of array performance is going to be on many potential
> users minds and gcc and intel perform 3x better for this. Of course, Java is
> still living bad perf. down even though most runtimes are now pretty good, at
> least for native data types, and I would like to see D avoid that wrap.
> 
> I know, this code is kind-of trivial, artificial, etc. but a 3x difference is
> still pretty large (and yes, I used -O -inline -release).
> 
> Not complaining - just trying to make the language and tools better. Actually am
> quite pleased with things like OutBuffer performance, etc. Other things, like
> the native associative arrays though seem pretty slow compared to C++ hash_map<>
> and Java HashMap().
> 
> Does anyone know if there are planned improvements before what I presume will be
> the big release of 1.0? Is there any "roadmap" around I could look at?
> 
> Thanks.

From: http://www.digitalmars.com/d/future.html

Future Directions
The following new features for D are planned, but the details have not been worked out:

   1. Mixins.
   2. Template inheritance.
   3. Array literal expressions.

Since #1 is essentially done, this page is likely out of date.

I get the impression that once the important bugs are squashed (and don't ask me which ones are important because I don't know), it'll be stamped D 1.0 and sent out the door. I suspect that the only major features that might be added before 1.0 would be the ones inspired/demanded by the D Template Library project.

Then, I think Walter's planning on adding optimizations in the trek to 2.0. And new features that fit into the D paradigm would be accumulated, too.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
August 06, 2004
Dave wrote:

> In article <opsb99pvnp5a2sq9@digitalmars.com>, Regan Heath says...
>>
>>And it does, the difference is what is considered the object in this case.
>>
>>Regan
> 
> Got it - thanks for the quick replies.
> 
> In that same attachment with the original post, there is a performance comparison between dmd, dmc, Intel and gcc for that code.
> 
> I suspect the question of array performance is going to be on many potential users minds and gcc and intel perform 3x better for this. Of course, Java is still living bad perf. down even though most runtimes are now pretty good, at least for native data types, and I would like to see D avoid that wrap.
> 
> I know, this code is kind-of trivial, artificial, etc. but a 3x difference is still pretty large (and yes, I used -O -inline -release).
> 
> Not complaining - just trying to make the language and tools better. Actually am quite pleased with things like OutBuffer performance, etc. Other things, like the native associative arrays though seem pretty slow compared to C++ hash_map<> and Java HashMap().
> 
> Does anyone know if there are planned improvements before what I presume will be the big release of 1.0? Is there any "roadmap" around I could look at?
> 
> Thanks.

Somebody with time should implement and run the "Great Language Shootout" (updated version) to find problems like that in the current DMD compiler.

http://shootout.alioth.debian.org/

PS: How does GDC compare to DMD on array handling?
August 06, 2004
In article <cevk5l$3e4$1@digitaldaemon.com>, Juanjo Álvarez says...
>
>Somebody with time should implement and run the "Great Language Shootout" (updated version) to find problems like that in the current DMD compiler.
>
>http://shootout.alioth.debian.org/
>
>PS: How GDC compare to DMD on array handling?

Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

For this particular test, GDC actually took over 2x as long, compiled with '-O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4 -static'. Based on similar differences between g++ and gcj, I think this has a lot to do with how the frontend is implemented against the gcc backend and doesn't have anything to do with D itself (gdc is brand-new, right?).

I would be happy to start on this comparison (but I can't 'promise' that I'll have the time to finish it anytime real soon).

I therefore need to know your thoughts on:

- Use the built-in Associative Array, or not? If not, is there a DTL hash_map-like class I should use? I would vote for the built-in AA because that's what'll be used most and so far it seems pretty slow.
- Use the built-in string, OutBuffer or DTL string (if there is one)?
- I plan to always use the built-in arrays for the numeric stuff, unless somebody can give me a real good reason to use something else (like DTL).

Or...

- Just make it go as fast as possible (barring inline assembler) no matter what the intent of the language design?

Thanks..


August 06, 2004
Dave wrote:


>>Somebody with time should implement and run the "Great Language Shootout" (updated version) to find problems like that in the current DMD compiler.
>>
>>http://shootout.alioth.debian.org/
>>
>>PS: How GDC compare to DMD on array handling?
> 
> Where is the updated version? I've already run half the tests, thanks to a quick start from http://www.functionalfuture.com/d/

The URL I posted is the updated version.

> I would be happy to start on this comparison (but I can't 'promise' that I'll have the time to finish it anytime real soon).
> 
> I therefore need to know thoughts on:
> 
> - Use the built-in Associative Array, or not? If not, is there a DTL hash_map-like class I should use? I would vote for the built-in AA because that's what'll be used most and so far it seems pretty slow.

The objective would be to fix the slow parts of the language; if the native associative array is damn slow, use it so maybe big W will fix it :)

> - Use the built-in string, OutBuffer or DTL string (if there is one)?
> - I plan to always use the built-in arrays for the numeric stuff, unless
> somebody can give me a real good reason to use something else (like DTL).

Native; it will show better how nice string manipulation can be in D.

> - Just make it go as fast as possible (barring inline assembler) no matter what the intent of the language design?

Please, no.
