Jump to page: 1 2
Thread overview
[Issue 4572] New: std.file.read return type
Aug 03, 2010
David Simcha
Aug 03, 2010
nfxjfg@gmail.com
Aug 03, 2010
nfxjfg@gmail.com
Aug 04, 2010
nfxjfg@gmail.com
Oct 15, 2010
Sobirari Muhomori
August 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572

           Summary: std.file.read return type
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: bearophile_hugs@eml.cc


--- Comment #0 from bearophile_hugs@eml.cc 2010-08-02 17:25:39 PDT ---
I am unsure about this, but I think that it is better for std.file.read() to return something like a ubyte[] instead of a void[].

It's a problem to perform a cast(ubyte[]) in SafeD:
auto data = cast(ubyte[])std.file.read();

Or maybe in SafeD I have to use other safer functions to load binary data, like slurp() or something similar.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572


David Simcha <dsimcha@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dsimcha@yahoo.com


--- Comment #1 from David Simcha <dsimcha@yahoo.com> 2010-08-03 05:59:07 PDT ---
I agree, and here's another reason to add to the list:

void[] implies an array that may contain pointers.  I just looked at the impl. of std.file and was surprised to learn that it allocates the array as not containing pointers via GC.malloc.  However, when you do a new void[N], you get an array that may contain pointers.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572


nfxjfg@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg@gmail.com


--- Comment #2 from nfxjfg@gmail.com 2010-08-03 07:15:34 PDT ---
I say using the void[] type is a design error in this case. void[] is just a safer void*. void[] should only be used in cases when you have to reference a specific region of memory. You use it because the type system is not expressive enough to describe the actual type (see Variant and so on).

But in this case, it's very clear what the actual data type is: it's an array of bytes. File contents are nothing else than untyped bag of bytes. Thus the return type should be ubyte[].

Note that reinterpret casting ubyte[] slices to other arrays or structs is not clean: there are issues of byte order and padding. This too doesn't apply to void[]. There's a clear difference.

Tange makes the same error.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #3 from bearophile_hugs@eml.cc 2010-08-03 09:09:48 PDT ---
Thank you David and nfxjfg :-)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 03, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #4 from nfxjfg@gmail.com 2010-08-03 09:22:36 PDT ---
One can add that .dup-ing a void[] will make all the precaution in std.file.read of allocating the void[] with GC.BlkAttr.NO_SCAN useless. The dup'ed array won't have the NO_SCAN flag set. It really shows that void[] is not the natural type to use here.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572


Steven Schveighoffer <schveiguy@yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy@yahoo.com


--- Comment #5 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-04 05:24:44 PDT ---
(In reply to comment #4)
> One can add that .dup-ing a void[] will make all the precaution in std.file.read of allocating the void[] with GC.BlkAttr.NO_SCAN useless. The dup'ed array won't have the NO_SCAN flag set. It really shows that void[] is not the natural type to use here.

any array operation which copies a block to another copies the flags from the original array block.  So the NO_SCAN flag should persist.  If it doesn't, that's a bug (looking at it, dup is the only function that doesn't copy the block attributes, I'll address this on the mailing list).

I think the void[] type is more consistent than ubyte.

Consider two things.  First, a read where the array is provided by the caller. If this function is typed:

ubyte[] read(ubyte[] into);

then, you must cast your data to a ubyte[].  But void[] can be implicitly cast to from anything, so void[] wins here.

Second, a write operation:

int write(ubyte[] from);

Again, cast is necessary where void[] would not require it.

To be sure std.file.read() could return a ubyte[], and unless you intend to actually deal with it as a ubyte[], you need to cast to the type you need.  But shouldn't it be consistent with other operations?

But we can forgo all this stuff if we add a template parameter to read, so you can specify exactly the type you want.  If you know your file is a bunch of int's, you could do:

std.file.read!(int)();

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #6 from bearophile_hugs@eml.cc 2010-08-04 05:52:31 PDT ---
Thank you for your answer Steven.

> then, you must cast your data to a ubyte[].  But void[] can be implicitly cast to from anything, so void[] wins here.

If you compile this program with dmd 2.047:

void main() {
    auto v = new void[10];
    ubyte a1, a2;
    a1 = v;
    v = a2;
}


It produces the errors:
test.d(4): Error: cannot implicitly convert expression (v) of type void[] to
ubyte
test.d(5): Error: cannot implicitly convert expression (a2) of type ubyte to
void[]


> But we can forgo all this stuff if we add a template parameter to read, so you
> can specify exactly the type you want.  If you know your file is a bunch of
> int's, you could do:
> std.file.read!(int)();

It seems a good idea. So I presume std.file.read!(int[])(); reads a matrix.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #7 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-04 06:02:21 PDT ---
(In reply to comment #6)
> Thank you for your answer Steven.
> 
> > then, you must cast your data to a ubyte[].  But void[] can be implicitly cast to from anything, so void[] wins here.
> 
> If you compile this program with dmd 2.047:
> 
> void main() {
>     auto v = new void[10];
>     ubyte a1, a2;
>     a1 = v;
>     v = a2;
> }
> 
> 
> It produces the errors:
> test.d(4): Error: cannot implicitly convert expression (v) of type void[] to
> ubyte
> test.d(5): Error: cannot implicitly convert expression (a2) of type ubyte to
> void[]

Yes, D does not convert non-arrays to void[].  You can do something like this:

v = (&a2)[0..1];

which is typically what is done.  ubyte[] provides no automatic conversion from anything, including other array types.

> > But we can forgo all this stuff if we add a template parameter to read, so you
> > can specify exactly the type you want.  If you know your file is a bunch of
> > int's, you could do:
> > std.file.read!(int)();
> 
> It seems a good idea. So I presume std.file.read!(int[])(); reads a matrix.

Well, I would assume it would return an int[][], which probably would mean nothing since arrays are pointer/length values, and any pointer/length values read from a file would be bogus.  I'd say read should reject reading elements that have references in them.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #8 from bearophile_hugs@eml.cc 2010-08-04 08:20:32 PDT ---
Steven Schveighoffer:

> Well, I would assume it would return an int[][], which probably would mean nothing since arrays are pointer/length values, and any pointer/length values read from a file would be bogus.  I'd say read should reject reading elements that have references in them.

Yes, you are right, I probably meant a static array, so this reads a dynamic
array of ints:
std.file.read!(int[])();
A static array of two ints:
std.file.read!(int[2])();
A static array of static array of ints (the memory of a fixed-size matrix is
contiguous in D):
std.file.read!(int[2][5])();

The problem is that such syntax doesn't read the items of the static array in-place, the result is a static array that gets copied. So you need an argument by ref:

int[2][5] m;
std.file.read(T)(ref T m);

I don't like this a lot.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 04, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4572



--- Comment #9 from nfxjfg@gmail.com 2010-08-04 09:23:45 PDT ---
(In reply to comment #5)
> any array operation which copies a block to another copies the flags from the original array block.  So the NO_SCAN flag should persist.  If it doesn't, that's a bug (looking at it, dup is the only function that doesn't copy the block attributes, I'll address this on the mailing list).

Concatenating arrays doesn't preserve the bit either. And if the user allocates the void[], the NO_SCAN bit won't be set anyway. The user must remember to allocate a ubyte[] array to get it right. How many users will even understand why? (Also if you copy one array into another, flags won't be preserved. If you allocate the void[] with C-malloc, lifetime.d won't do the right thing either.)

Let me repeat: the only reason why you want to use void[] here is because void[] implicitly converts to any other array type.

But doing that is bogus anyway. If you convert it to an int[], you implicitly assume the data stored on the disk is in the same endian as your machine's. If it's an array of struct, you assume there are no padding issues (or further byte order issues). If it's a char[], you assume it's valid utf-8.

No really, there should be specialized functions for proper I/O with certain types. In no case should the user be encouraged to do the false thing.

void[] encourages to do 2 false things: 1. allocating the incorrect type, and 2. reading/writing data in a machine specific format.

I think it's a good thing to remind the user of 2. by forcing him to explicitly cast the array. Yes, dear user, you ARE reinterpret casting your data to a bunch of bytes!

I guess Andrei will remove the usage of void[] anyway because of his beloved Java subset of D. void[] would allow aliasing pointers with integers, which is a no-no in SafeD.

> But we can forgo all this stuff if we add a template parameter to read, so you can specify exactly the type you want.  If you know your file is a bunch of int's, you could do:
> 
> std.file.read!(int)();

I guess nothing can stop the devs from killing Phobos with even more template bloat. Your function call signature forgets anything about endians, though.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2