Thread overview
Associative Array - where's my error?
Apr 28, 2011
Adrian Mercieca
Apr 28, 2011
Jesse Phillips
Apr 28, 2011
bearophile
Apr 28, 2011
Jesse Phillips
Apr 28, 2011
bearophile
Apr 28, 2011
Jesse Phillips
Apr 28, 2011
Andrej Mitrovic
Apr 28, 2011
Jesse Phillips
Apr 28, 2011
Andrej Mitrovic
Apr 28, 2011
Adrian Mercieca
April 28, 2011
Hi,

I am currently learning D, but have ran into a problem.

I wrote this simple program:

import std.stdio;
import std.string;
import std.conv;

void main()
{
	int[string] dataMap;

	foreach(line; stdin.byLine() ) {
		auto s = line.split("\t");
		auto anum = to!int(s[0]);
		auto aname = cast(string)s[1];
		dataMap[aname] = anum;
	}

	foreach(s; dataMap.keys.sort) {
		writeln(s);
	}
}

The code is in a file named 'rl.d'.

I am running this on Linux (Linux 2.6.32-31-generic-pae #61-Ubuntu SMP i686 GNU/Linux)

The compiler flags used are: dmd -w -wi -O -inline -noboundscheck -release rl.d.

I invoke the program from the command line with this command:

cat data.txt | ./rl

Basically, this is piping the contents of the data.txt file (below) into the program.

data.txt contents:

1	ABC
2	DEF
3	GHI
4	JKL
5	MNO
6	PQR
7	STU
8	VWX
9	YZ

There are two 'fields' per line, a number and a string separated by a tab. Now, when I run the program, I get this output:


YZ
YZ

YZ

YZ

YZ

YZ

YZ

YZ

YZ


Which is nowhere close to what I was expecting.

The program is supposed to read the input line by line, split each line on tab character and populate an associative array with the key being the 2nd string value and set the corresponding value to the value of the 1st field. Then it goes into a loop printing out the sorted keys of the associative array.

I was expecting output like:

ABC
DEF
GHI
JKL
MNO
PQR
STU
VWX
YZ

but, as shown, I get something completely different and unexpected.

Can anyone shed any light on this for me please? What am I doing wrong?

Thanks.
- Adrian.

April 28, 2011
Adrian Mercieca Wrote:

> Hi,
> 
> I am currently learning D, but have ran into a problem.
> 
> I wrote this simple program:
> 
> import std.stdio;
> import std.string;
> import std.conv;
> 
> void main()
> {
> 	int[string] dataMap;
> 
> 	foreach(line; stdin.byLine() ) {
> 		auto s = line.split("\t");
> 		auto anum = to!int(s[0]);
> 		auto aname = cast(string)s[1];
> 		dataMap[aname] = anum;
> 	}
> 
> 	foreach(s; dataMap.keys.sort) {
> 		writeln(s);
> 	}
> }
> 

Quite simple your error is in using cast(string). Solution use to!string(s[1])

The explanation:

byLine reuses the buffer as it iterates, this is why you see YZ, though I would expect remains of some other words in there. Associative Array keys must be immutable (so that their value is not changed by another reference). By casting to string you have told the type system "This value cannot be changed by me or any other reference to this data." But in reality you do continue to modify it because byLine still has the reference.

Casting types is making statements about the type, it is not converting the type (in general). You must have inserted the cast because the compiler said s[1] wasn't immutable and figured you could "force it" into being so. In such cases the compiler is telling something is wrong and you are saying "I don't care. I know I'm right." Don't use cast, to!() is much safer.

Also you could use s[1].idup instead of cast or to!().
April 28, 2011
Jesse Phillips:

> Casting types is making statements about the type, it is not converting the type (in general). You must have inserted the cast because the compiler said s[1] wasn't immutable and figured you could "force it" into being so. In such cases the compiler is telling something is wrong and you are saying "I don't care. I know I'm right." Don't use cast, to!() is much safer.

You have written a nice explanation of the situation, and indeed if you take care of using casts correctly, this situation is not bad. But in practice I think people will not use casts so carefully. This is why I have suggested a different design, essentially: the by line dups on default and doesn't do it on request (this difference is expressed with two different function names or a template argument boolean, etc). So when you don't know what you are doing the code works correctly.

Bye,
bearophile
April 28, 2011
bearophile Wrote:

> You have written a nice explanation of the situation, and indeed if you take care of using casts correctly, this situation is not bad. But in practice I think people will not use casts so carefully. This is why I have suggested a different design, essentially: the by line dups on default and doesn't do it on request (this difference is expressed with two different function names or a template argument boolean, etc). So when you don't know what you are doing the code works correctly.
> 
> Bye,
> bearophile

Thank you.

I think this just hides the issue. To fix the issue you would need byLine to return a string, since user code my still save and modify the returned value and store it in an AA.

Even then this is "fixing" one small area in Phobos. To really make this better you would need to change the semantics of cast to be what new users would "expect." However cast is supposed to be a very blunt tool, Java and C# have gone with changing the semantics and adding new features to provide some of the behavior. My friend recently discovered the 'as' keyword in C#. It was great because he could check for null instead of catching cast exceptions. D on the other hand left cast as a "dangerous" language feature and provided a "safe" library function.

So yes new users will get it wrong. I believe the correct solution is to educate at emphasize to!() over cast in examples.
April 28, 2011
Jesse Phillips:

> I think this just hides the issue. To fix the issue you would need byLine to return a string, since user code my still save and modify the returned value and store it in an AA.

I don't agree. If you perform a cast of the dup-ed char array to a string you will not have problems. The main problem is the missed duplication (and the bad usage of casts in general, for people coming from C/C++).

Bye,
bearophile
April 28, 2011
bearophile Wrote:

> Jesse Phillips:
> 
> > I think this just hides the issue. To fix the issue you would need byLine to return a string, since user code my still save and modify the returned value and store it in an AA.
> 
> I don't agree. If you perform a cast of the dup-ed char array to a string you will not have problems. The main problem is the missed duplication (and the bad usage of casts in general, for people coming from C/C++).
> 
> Bye,
> bearophile

_This_ code wouldn't not have problems, that doesn't mean code won't be written that does have problems.
April 28, 2011
There's more issues with string assignments. For example see this: http://d.puremagic.com/issues/show_bug.cgi?id=5059

In that example, key.name() returns a string. But "key" itself will be reused on every loop, and the string that key holds will be overwritten on each pass.

I haven't gotten any comments on that bug report yet. I'm not sure if it really is a bug, but it certainly is unexpected that an immutable string gets overwritten.
April 28, 2011
Andrej Mitrovic Wrote:

> There's more issues with string assignments. For example see this: http://d.puremagic.com/issues/show_bug.cgi?id=5059
> 
> In that example, key.name() returns a string. But "key" itself will be reused on every loop, and the string that key holds will be overwritten on each pass.

Commented on your bug. It is because of a cast(string) inside the KeySequence's opApply. My point still stands, to!() is safer.

> I haven't gotten any comments on that bug report yet. I'm not sure if it really is a bug, but it certainly is unexpected that an immutable string gets overwritten.

April 28, 2011
Thanks. I'm glad it wasn't a language implementation issue.
April 28, 2011
Hi everyone,

Ok - I think I got the gist of what you guys are saying.

Thanks for the help.
Coming from C/C++ and Java, there's quite a bit for me to learn.
But I'll get there; so far, I am very impressed with D.

Regards.
- Adrian.