Thread overview
Reading a file eats whole memory
Oct 21, 2007
Emil Wojak
Oct 21, 2007
div0
Oct 21, 2007
Frank Benoit
Oct 21, 2007
Emil Wojak
October 21, 2007
Hi!

Could someone please explain why this code tries to eat all of my 1 GB of memory and then gets killed by the kernel? If I set a memory ulimit before launching the program, it prints "Error: Out of memory" instead.

The code:
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[0]);

	char[] data;
	input.read(data);
	input.close();
	return 0;
}

My intention was to read the executable itself, which is about 444 kB.
I'm running Linux and compiling with Digital Mars D Compiler v1.022.
October 21, 2007
Emil Wojak wrote:
> The code:
> import std.stream;
> 
> int main(char [][] args) {
>     Stream input=new File(args[0]);
> 
>     char[] data;
>     input.read(data);
>     input.close();
>     return 0;
> }

You are trying to read a string in, so I guess the routine is treating the first four bytes as the string's length. That's how Tango works, anyway, IIRC.
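Applying this guess to the executable itself shows where the roughly 1 GB request could come from. The sketch below assumes the binary is an ELF file on little-endian x86 (the usual setup for DMD on Linux at the time) and that the stream reads a native 4-byte length; asLittleEndianLength is a hypothetical helper name, and the code targets a current D2 compiler rather than DMD 1.022. An ELF file always begins with the bytes 0x7F 'E' 'L' 'F', and read as such a length they claim a string of 1,179,403,647 bytes.

```d
import std.stdio : writefln;

/// Hypothetical helper: interprets four bytes as a little-endian 32-bit
/// unsigned integer (the byte order of x86).
uint asLittleEndianLength(const(ubyte)[4] b)
{
    return cast(uint) b[0]
         | (cast(uint) b[1] << 8)
         | (cast(uint) b[2] << 16)
         | (cast(uint) b[3] << 24);
}

void main()
{
    // ELF magic number: 0x7F 'E' 'L' 'F'
    const(ubyte)[4] elfMagic = [0x7F, 0x45, 0x4C, 0x46];
    uint bogus = asLittleEndianLength(elfMagic);
    // prints: 1179403647 bytes, about 1.10 GiB
    writefln("%s bytes, about %.2f GiB", bogus, bogus / (1024.0 * 1024.0 * 1024.0));
}
```

A buffer of that size would plausibly exceed what a 1 GB machine or a memory ulimit allows, which matches the symptoms in the original post.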

-- 
My enormous talent is exceeded only by my outrageous laziness.
October 21, 2007
"div0" <div0@users.sourceforge.net> wrote in message news:fffqid$1csk$1@digitalmars.com...
>
> You are trying to read a string in, so I guess the routine is using the 1st four bytes as a string length count. That's how tango works anyway IIRC.
>
> -- 

You are precisely right.

If you just want all the data in a file, do this:

import std.file;

int main(char[][] args)
{
    ubyte[] data = cast(ubyte[])std.file.read(args[0]);
    return 0;
}

Two things. First, std.file.read returns a void[], which is roughly D's equivalent of a void*: it can point to anything, but you can't access or modify its elements until you cast it to a concrete array type, and it also carries a length giving the number of bytes in the data.

Second, I'm casting to ubyte[] instead of char[]. Do NOT use char[] for "plain old data" as you would in C. char is a UTF-8 code unit, not a generic "one byte" type, so you'll most likely get errors unless your input file is plain ASCII or valid UTF-8 text. D provides the byte and ubyte types for raw byte data.
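A small sketch of both points, written for current D2 Phobos rather than the D1 compiler used in this thread. The ubyte[] literal stands in for what std.file.read would return; looksLikeUtf8 is a hypothetical helper built on std.utf.validate, which throws a UTFException for malformed UTF-8. The byte 0xFF never occurs in valid UTF-8, so the same bytes that are perfectly usable as ubyte[] fail validation as char[].

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

/// Hypothetical helper: true if the bytes form valid UTF-8,
/// i.e. if they could safely be treated as char[].
bool looksLikeUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Raw bytes: "\x7FELF" followed by 0xFF.
    ubyte[] bytes = [0x7F, 0x45, 0x4C, 0x46, 0xFF];
    void[] raw = bytes;              // ubyte[] converts implicitly to void[]
    assert(raw.length == 5);         // a void[] length counts bytes

    ubyte[] data = cast(ubyte[]) raw;  // the cast recommended above
    writeln(data[4]);                  // 255: fine as raw data
    writeln(looksLikeUtf8(data));      // false: not valid as char[]
}
```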


October 21, 2007
Emil Wojak wrote:
> My intention was to read the executable itself, which is about 444 kB. I'm running Linux, compiling with Digital Mars D Compiler v1.022

Others have already commented on the file reading...

Using args[0] to access the program's binary is not safe: if the program is invoked through the PATH variable, args[0] does not contain the path.

On Linux, /proc/self/exe is a symbolic link to your executable.
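A Linux-only sketch of this suggestion in current D2 Phobos (std.file.readLink exists only on POSIX systems, and none of this applies as-is to DMD 1.022). selfPath is a hypothetical helper name. It resolves /proc/self/exe and then reads the binary as raw bytes, as recommended earlier in the thread. Modern Phobos also offers std.file.thisExePath for the same purpose on several platforms.

```d
import std.file : read, readLink;
import std.stdio : writefln;

/// Hypothetical helper: absolute path of the running executable, regardless
/// of how the program was invoked (Linux only: relies on /proc).
string selfPath()
{
    // The kernel keeps this symlink pointing at the current process's binary.
    return readLink("/proc/self/exe");
}

void main()
{
    immutable path = selfPath();
    // Read the whole binary as untyped bytes, then view it as ubyte[].
    auto bytes = cast(ubyte[]) read(path);
    writefln("%s is %s bytes long", path, bytes.length);
}
```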


October 21, 2007
On 21-10-2007 at 17:45:54, Frank Benoit <keinfarbton@googlemail.com> wrote:

Thank you all for your explanations. The test below confirms what you wrote:

$ echo -en '\x03\x00\x00\x00abcdefgh' > string.dat

Test code:
-----------------
import std.stdio;
import std.stream;

int main(char [][] args) {
	Stream input=new File(args[1], FileMode.In);
	char[] data;
	input.read(data);
	writefln("data.length=", data.length, " data=", data);
	input.close();
	return 0;
}
-----------------
$ dmd test.d
$ ./test ./string.dat
data.length=3 data=abc

So the program reads 7 bytes: the array length (4 bytes) plus 3 bytes of data.
Switching type of data to ubyte[5] makes the program read exactly 5 bytes ("\x03\x00\x00\x00a").
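The layout observed in this test can be decoded by hand. The sketch below mimics it under the same assumptions as the test (a 4-byte little-endian length followed by the payload) and targets a current D2 compiler; decodeLengthPrefixed is a hypothetical name, not part of std.stream. Unlike the behaviour at the start of the thread, it checks the claimed length against the data actually available before allocating anything.

```d
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical helper: decodes a 4-byte little-endian length followed by
/// that many bytes of payload. Sets `consumed` to the bytes used in total.
char[] decodeLengthPrefixed(const(ubyte)[] buf, out size_t consumed)
{
    enforce(buf.length >= 4, "truncated length prefix");
    immutable uint len = cast(uint) buf[0]
                       | (cast(uint) buf[1] << 8)
                       | (cast(uint) buf[2] << 16)
                       | (cast(uint) buf[3] << 24);
    // Reject a length that exceeds the available data instead of
    // allocating it blindly.
    enforce(buf.length - 4 >= len, "length prefix exceeds available data");
    consumed = 4 + len;
    return cast(char[]) buf[4 .. consumed].dup;
}

void main()
{
    // Same bytes as string.dat above: length 3, then "abcdefgh".
    auto fileBytes = cast(const(ubyte)[]) "\x03\x00\x00\x00abcdefgh";
    size_t used;
    char[] s = decodeLengthPrefixed(fileBytes, used);
    writeln(s, " ", used); // prints: abc 7
}
```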

> Using args[0] to access the program's binary is not safe, because if it is
> called via the PATH variable it does not contain the path.
> /proc/self/exe is a link to your executable.

Well, argv[0] was just a quick-and-dirty way to get a test file, but thanks for the hint nevertheless :)