Thread overview
Converting from std.file.read's void[]
Sep 21, 2010 - Jonathan M Davis
Sep 21, 2010 - bearophile
Sep 22, 2010 - bearophile
Sep 22, 2010 - Jonathan M Davis
Sep 22, 2010 - Jonathan M Davis
Sep 22, 2010 - Kagamin
September 21, 2010
Okay, it seems that the way to read in a binary file is to use std.file.read(), which reads in the file as a void[]. This immediately raises the question of how to convert the void[] into something useful. It seems to me that casting the void[] to a ubyte[] is the appropriate thing to do, because then you can properly index it and grab the bytes that need to be converted into useful values. However, that still leaves the question of how to get anything useful out of those bytes. UTF-8 strings are easy because their chars are the same size as ubytes: casting the portion of the data that you want as a string to char[] seems to work just fine. But what about other types? Is the correct approach to cast a section of the data to T[], where T is whatever type that section represents, index into it to get the values you want, then cast the next section to U[], where U is the type for that section, and so on? Or is there a better way to handle this?
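A minimal sketch of the cast-per-section approach described above, written against current D2/Phobos rather than the 2010 library. The file layout (a 4-byte ASCII tag followed by two native-endian ints), the file name example.bin, and the Record/parse names are hypothetical, chosen only for illustration. Each slice is cast to the element type it holds; such a cast fails at run time if the slice length is not a multiple of the element size, and on some platforms the slice must also be suitably aligned for that type.

```d
import std.file : read, remove, write;

// Hypothetical layout, used only for this sketch: a 4-byte ASCII tag followed by
// two ints stored in the machine's native byte order (12 bytes in total).
struct Record
{
    char[] tag;
    int[] values;
}

Record parse(ubyte[] data)
{
    assert(data.length == 12, "unexpected size for this layout");
    Record r;
    // UTF-8 text is byte-sized, so its slice can be reinterpreted as chars directly.
    r.tag = cast(char[]) data[0 .. 4];
    // Wider types: reinterpret the matching slice as T[]. Its length must be a
    // multiple of T.sizeof, otherwise the cast fails at run time.
    r.values = cast(int[]) data[4 .. 12];
    return r;
}

void main()
{
    // Build the hypothetical file: tag bytes, then the raw bytes of two ints.
    int[2] values = [10, -3];
    ubyte[] bytes = cast(ubyte[]) "HDR1".dup;
    bytes ~= cast(ubyte[]) values[];
    write("example.bin", bytes);
    scope(exit) remove("example.bin");

    // std.file.read returns void[]; cast it to ubyte[] so it can be indexed and sliced.
    auto rec = parse(cast(ubyte[]) read("example.bin"));
    assert(rec.tag == "HDR1");
    assert(rec.values == [10, -3]);
}
```

Note that the resulting slices alias the buffer returned by read, so nothing is copied.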

- Jonathan M Davis
September 21, 2010
Jonathan M Davis:

> UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine.

D2 strings are immutable(char)[], not char[].
Strings must be valid UTF-8, while the raw bytes you read from a file may contain anything, so in some situations you need to check them with std.utf's validate function.
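A sketch of that check, assuming current Phobos: std.utf.validate throws a UTFException when the bytes are not well-formed UTF-8, and idup turns the checked char slice into an immutable string. The helper name toCheckedString is hypothetical.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: check that raw bytes are well-formed UTF-8, then return
// an immutable copy so the result is a genuine D2 `string`.
string toCheckedString(const(ubyte)[] bytes)
{
    auto chars = cast(const(char)[]) bytes;
    validate(chars);   // throws UTFException on malformed UTF-8
    return chars.idup;
}

void main()
{
    const(ubyte)[] good = [0x68, 0x69, 0xC3, 0xA9];   // "hi" followed by U+00E9
    assert(toCheckedString(good) == "hi\u00E9");

    const(ubyte)[] bad = [0x68, 0xFF];                 // 0xFF never appears in UTF-8
    assertThrown!UTFException(toCheckedString(bad));
}
```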


> But what about other types? Is it the correct thing to cast to T[] where T is whatever type the data represents and then index into it to get the values that you want of that type and then cast the next section of the data to U[] where U is the type for the next section of the data, etc.? Or is there a better way to handle this?

It's better to avoid casts when possible, and SafeD may even restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.
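A sketch of how rawRead/rawWrite can replace the casts, assuming current Phobos. The layout (a uint count followed by that many doubles, native byte order), the file name samples.raw, and the helper names are hypothetical. rawRead returns the slice of the buffer it actually filled, and current Phobos rejects an empty buffer, hence the early return for a zero count.

```d
import std.file : remove;
import std.stdio : File;

// Hypothetical layout for this sketch: a uint element count followed by that many
// doubles, all in native byte order. rawRead fills typed buffers, so no casts are needed.
void writeSamples(string path, const(double)[] samples)
{
    auto f = File(path, "wb");
    uint[1] count = [cast(uint) samples.length];
    f.rawWrite(count[]);
    if (samples.length > 0)
        f.rawWrite(samples);
}   // f is closed (and flushed) when it goes out of scope

double[] readSamples(string path)
{
    auto f = File(path, "rb");
    uint[1] count;
    // rawRead returns the slice of the buffer it actually filled.
    if (f.rawRead(count[]).length != 1)
        throw new Exception("missing header");
    if (count[0] == 0)
        return null;   // rawRead rejects empty buffers, so skip the second read
    auto samples = new double[count[0]];
    if (f.rawRead(samples).length != samples.length)
        throw new Exception("file ended early");
    return samples;
}

void main()
{
    writeSamples("samples.raw", [0.5, 1.5, 2.5]);
    scope(exit) remove("samples.raw");
    assert(readSamples("samples.raw") == [0.5, 1.5, 2.5]);
}
```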

Bye,
bearophile
September 22, 2010
> Take a look at the rawWrite/rawRead methods of std.stdio.File.

I have just tried those a little. Python file objects don't have an eof() method. This D2 program shows that eof() is false even when the whole file has been read. Is this correct?


import std.stdio: File;
void main() {
    double[3] data = [0.5, 1.5, 2.5];
    auto f = File("test.raw", "wb");
    f.rawWrite(data);
    f.close();
    f = File("test.raw", "rb");
    assert(!f.eof());
    f.rawRead(data);
    assert(f.eof()); // Assertion failure
}

Bye,
bearophile
September 22, 2010
On Tuesday, September 21, 2010 16:41:57 bearophile wrote:
> Jonathan M Davis:
> > UTF-8 strings are easy because they're the same size as ubytes. Casting to char[] for the portion of the data that you want as a string seems to work just fine.
> 
> D2 strings are immutable(char)[] and not char[].
> Strings are UTF-8, while the raw bytes you read from a file may contain
> anything, so in some situations you need to use the validate function.

Well, yes. I was talking about strings in the general sense (though UTF-8 strings), not necessarily the specific type string. The fact that you can cast to char[] makes getting strings easy, while the correct way to deal with types which aren't bytes isn't as obvious.

> 
> > But what about other types? Is it the correct thing to
> > cast to T[] where T is whatever type the data represents and then index
> > into it to get the values that you want of that type and then cast the
> > next section of the data to U[] where U is the type for the next section
> > of the data, etc.? Or is there a better way to handle this?
> 
> It's better to avoid casts when possible, and SafeD may even restrict their usage. Take a look at the rawWrite/rawRead methods of std.stdio.File.

That does look like a better way to handle it. Thanks. Normally, I don't mess with binary files, so I'm not particularly well-versed in the correct ways to read them.

- Jonathan M Davis
September 22, 2010
On Tuesday, September 21, 2010 17:34:26 bearophile wrote:
> > Take a look at the rawWrite/rawRead methods of std.stdio.File.
> 
> I have just tried those a little. Python file objects don't have an eof()
> method. This D2 program shows that eof() is false even when the whole file
> has been read. Is this correct?
> 
> 
> import std.stdio: File;
> void main() {
>     double[3] data = [0.5, 1.5, 2.5];
>     auto f = File("test.raw", "wb");
>     f.rawWrite(data);
>     f.close();
>     f = File("test.raw", "rb");
>     assert(!f.eof());
>     f.rawRead(data);
>     assert(f.eof()); // Assertion failure
> }
> 
> Bye,
> bearophile

I believe that the typical behaviour in C and C++ is that eof() is false until you've tried to read beyond the end of the file, so you get one more read than you might expect. You do the read and then check eof(), rather than checking eof() and then doing the read if it isn't true.
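A sketch of that read-then-check pattern with rawRead, reusing the test.raw example; countDoubles is a hypothetical helper. It assumes a C runtime that behaves as described (the assertion failure above suggests it does): reading exactly the file's size leaves eof() false, and the next read, which comes back short, is what sets it.

```d
import std.file : remove;
import std.stdio : File;

// Read-then-check: call rawRead first and treat a short result as end of input,
// rather than testing eof() before each read. Returns how many doubles the file holds.
size_t countDoubles(string path)
{
    auto f = File(path, "rb");
    double[2] buf;
    size_t total;
    while (true)
    {
        auto got = f.rawRead(buf[]);   // the part of buf that was actually filled
        total += got.length;
        if (got.length < buf.length)   // short read: the end of the file was reached
            break;
    }
    assert(f.eof());                   // the short read is what set the EOF flag
    return total;
}

void main()
{
    double[3] data = [0.5, 1.5, 2.5];
    File("test.raw", "wb").rawWrite(data[]);
    scope(exit) remove("test.raw");

    auto f = File("test.raw", "rb");
    auto first = f.rawRead(data[]);
    assert(first.length == 3);    // all 24 bytes consumed; eof() is typically still false here
    auto second = f.rawRead(data[]);
    assert(second.length == 0);   // this read runs into the end of the file...
    assert(f.eof());              // ...and only now does eof() report true
    f.close();

    assert(countDoubles("test.raw") == 3);
}
```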

- Jonathan M Davis
September 22, 2010
Jonathan M Davis Wrote:

> Okay, it seems that the way to read in a binary file is to use std.file.read() which reads in the file as a void[]. This immediately raises the question as to how to convert the void[] into something useful.

You may like the .NET BinaryReader interface: http://msdn.microsoft.com/en-us/library/system.io.binaryreader_members.aspx
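For comparison, a BinaryReader-like style in D might look like this. It is not a port of the .NET class: it uses std.bitmanip.read from current Phobos, which decodes one value from the front of a ubyte[] and advances the slice, with the byte order given explicitly on each call. The Header layout and the readHeader name are hypothetical.

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical little-endian header layout, used only for this sketch.
struct Header
{
    uint magic;
    ushort versionNo;   // `version` is a keyword in D
    short offset;
}

// Each read consumes T.sizeof bytes from the front of `bytes` (a local copy of the
// caller's slice), so fields are decoded in order without any casts.
Header readHeader(ubyte[] bytes)
{
    Header h;
    h.magic = bytes.read!(uint, Endian.littleEndian)();
    h.versionNo = bytes.read!(ushort, Endian.littleEndian)();
    h.offset = bytes.read!(short, Endian.littleEndian)();
    return h;
}

void main()
{
    ubyte[] raw = [0x44, 0x33, 0x22, 0x11, 0x02, 0x00, 0xFE, 0xFF];
    auto h = readHeader(raw);
    assert(h.magic == 0x11223344);
    assert(h.versionNo == 2);
    assert(h.offset == -2);
    assert(raw.length == 8);   // only readHeader's local copy of the slice advanced
}
```

Stating the byte order on every call also makes the file format independent of the machine doing the reading, which the plain casts are not.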
September 22, 2010
On Tue, 21 Sep 2010 19:06:43 -0400, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> Okay, it seems that the way to read in a binary file is to use std.file.read()
> which reads in the file as a void[]. This immediately raises the question as to
> how to convert the void[] into something useful. It seems to me that casting
> void[] to a ubyte[] is then the appropriate thing to do because then you can
> properly index it and grab the appropriate bytes that need to be converted into
> useful values. However, that still raises the question of how to get anything
> useful out of the bytes. UTF-8 strings are easy because they're the same size as
> ubytes. Casting to char[] for the portion of the data that you want as a string
> seems to work just fine. But what about other types? Is it the correct thing to
> cast to T[] where T is whatever type the data represents and then index into it
> to get the values that you want of that type and then cast the next section of
> the data to U[] where U is the type for the next section of the data, etc.? Or
> is there a better way to handle this?

You can slice void arrays, even though you cannot index them.  If you know, for instance, that a struct S starts at byte offset 15, you can do:

(cast(S[])arr[15 .. 15 + S.sizeof])[0];

or:

*cast(S*)(arr.ptr + 15);

There are various ways to get the data.  Only if you know the data is an *array* of a certain type is it useful to cast the entire array.
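A sketch of the slicing and pointer approaches, with a hypothetical struct S whose fields are all single bytes, so the odd offset causes no alignment trouble (a struct with wider fields read at an odd offset may be misaligned on some platforms, where copying the bytes out is safer). The key details: cast a slice of exactly S.sizeof bytes, and add the byte offset to the untyped pointer before casting, since adding it to an S* advances in units of S.sizeof. The helper names are hypothetical.

```d
// Hypothetical record type: every field is one byte, so S.alignof == 1 and
// reading it from an odd offset is safe on any platform.
struct S
{
    ubyte tag;
    ubyte[3] rgb;
}

// Cast a slice of exactly S.sizeof bytes. Casting arr[offset .. $] instead fails
// at run time unless the remaining length happens to be a multiple of S.sizeof.
S viaSlice(const(void)[] arr, size_t offset)
{
    return (cast(const(S)[]) arr[offset .. offset + S.sizeof])[0];
}

// Apply the byte offset to the untyped pointer before casting:
// (cast(const(S)*) arr.ptr) + offset would advance offset * S.sizeof bytes.
S viaPointer(const(void)[] arr, size_t offset)
{
    assert(offset + S.sizeof <= arr.length, "record runs past the end of the buffer");
    return *cast(const(S)*) (arr.ptr + offset);
}

void main()
{
    auto raw = new ubyte[32];
    raw[15 .. 19] = [7, 255, 128, 0];   // tag, then the three rgb bytes
    const(void)[] arr = raw;

    auto a = viaSlice(arr, 15);
    auto b = viaPointer(arr, 15);
    assert(a == b);
    assert(a.tag == 7 && a.rgb[0] == 255 && a.rgb[1] == 128 && a.rgb[2] == 0);
}
```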

-Steve