Thread overview
randomIO, std.file, core.stdc.stdio
Jul 25, 2016: Charles Hixson
Jul 26, 2016: ketmar
Jul 26, 2016: Charles Hixson
Jul 26, 2016: ketmar
Jul 26, 2016: Charles Hixson
Jul 26, 2016: ketmar
Jul 26, 2016: Charles Hixson
Jul 27, 2016: ketmar
Jul 26, 2016: Charles Hixson
Jul 26, 2016: Charles Hixson
Jul 26, 2016: Charles Hixson
Jul 26, 2016: Adam D. Ruppe
Jul 27, 2016: Charles Hixson
Jul 27, 2016: Rene Zwanenburg
Jul 27, 2016: Charles Hixson
July 25, 2016
Are there reasons why one would use rawRead and rawWrite rather than fread and fwrite when doing binary random I/O?  What are the advantages?

In particular, if one is reading and writing structs rather than arrays or ranges, are there any advantages?

July 26, 2016
On Monday, 25 July 2016 at 18:54:27 UTC, Charles Hixson wrote:
> Are there reasons why one would use rawRead and rawWrite rather than fread and fwrite when doing binary random I/O?  What are the advantages?
>
> In particular, if one is reading and writing structs rather than arrays or ranges, are there any advantages?

yes: keeping API consistent. ;-)

for example, my stream i/o modules work with anything that has `rawRead`/`rawWrite` methods, and don't bother to check for anything else.

besides, `rawRead` just looks cleaner, even with all the `(&a)[0..1]` noise.

so, a question of style.
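[Editor's note: for a struct, the two styles being contrasted might look like this side by side. A sketch only; `Rec` and "data.bin" are made up for illustration.]

```d
import core.stdc.stdio : FILE, fclose, fopen, fread;
import std.stdio : File;

struct Rec { int id; double value; }

void main()
{
    Rec r;

    // C-style: raw pointer plus explicit element size and count.
    FILE* fp = fopen("data.bin", "rb");
    if (fp !is null)
    {
        fread(&r, Rec.sizeof, 1, fp);
        fclose(fp);
    }

    // D-style: a one-element slice over the struct; the length travels
    // with the slice instead of being a separate argument.
    auto f = File("data.bin", "rb");
    f.rawRead((&r)[0 .. 1]);
}
```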
July 25, 2016
On 07/25/2016 05:18 PM, ketmar via Digitalmars-d-learn wrote:
> On Monday, 25 July 2016 at 18:54:27 UTC, Charles Hixson wrote:
>> Are there reasons why one would use rawRead and rawWrite rather than fread and fwrite when doing binary random I/O?  What are the advantages?
>>
>> In particular, if one is reading and writing structs rather than arrays or ranges, are there any advantages?
>
> yes: keeping API consistent. ;-)
>
> for example, my stream i/o modules work with anything that has `rawRead`/`rawWrite` methods, and don't bother to check for anything else.
>
> besides, `rawRead` just looks cleaner, even with all the `(&a)[0..1]` noise.
>
> so, a question of style.
>
OK.  If it's just a question of "looking cleaner" and "style", then I will prefer the core.stdc.stdio approach.  I find its appearance very much cleaner...except that that's understating things. I'll probably wrap those routines in a struct to ensure things like files being properly closed, and to avoid explicit pointers persisting over large areas of code.

(I said a lot more, but it was just a rant about how ugly I find rawRead/rawWrite syntax, so I deleted it.)
July 26, 2016
On Tuesday, 26 July 2016 at 01:19:49 UTC, Charles Hixson wrote:
> then I will prefer the core.stdc.stdio approach.  I find its appearance very much cleaner...

only if you are really used to writing C code. when you see a pointer, or an explicit type size argument in D, it is a sign of C disease.

> I'll probably wrap those routines in a struct to ensure things like files being properly closed, and not have explicit pointers persisting over large areas of code.

exactly what std.stdio.File did! ;-)
July 25, 2016
On 07/25/2016 07:11 PM, ketmar via Digitalmars-d-learn wrote:
> On Tuesday, 26 July 2016 at 01:19:49 UTC, Charles Hixson wrote:
>> then I will prefer the core.stdc.stdio approach.  I find its appearance very much cleaner...
>
> only if you are really used to writing C code. when you see a pointer, or an explicit type size argument in D, it is a sign of C disease.
>
>> I'll probably wrap those routines in a struct to ensure things like files being properly closed, and not have explicit pointers persisting over large areas of code.
>
> exactly what std.stdio.File did! ;-)
>
Yes, but I really despise the syntax they came up with.  It's probably good if most of your I/O is ranges, but mine never has been.  (Combining ranges with random I/O?)
July 26, 2016
On Tuesday, 26 July 2016 at 04:05:22 UTC, Charles Hixson wrote:
> Yes, but I really despise the syntax they came up with.  It's probably good if most of your I/O is ranges, but mine never has been.  (Combining ranges with random I/O?)

that's why i wrote iv.stream, and then iv.vfs, with convenient things like `readNum!T`, for example. you absolutely don't need to reimplement the whole std.stdio.File if all you need is a better API. thanks to UFCS, you can write your new API as free functions accepting std.stdio.File as the first arg. or even a generic stream, like i did in iv.stream:


enum isReadableStream(T) = is(typeof((inout int=0) {
  auto t = T.init;
  ubyte[1] b;
  auto v = cast(void[])b;
  t.rawRead(v);
}));

enum isWriteableStream(T) = is(typeof((inout int=0) {
  auto t = T.init;
  ubyte[1] b;
  t.rawWrite(cast(void[])b);
}));

T readInt(T : ulong, ST) (auto ref ST st) if (isReadableStream!ST) {
  T res;
  ubyte* b = cast(ubyte*)&res;
  foreach (immutable idx; 0..T.sizeof) {
    if (st.rawRead(b[idx..idx+1]).length != 1) throw new Exception("read error");
  }
  return res;
}


and then:
  auto fl = File("myfile");
  auto i = fl.readInt!uint;

something like that.
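[Editor's note: the write side can mirror that loop. A sketch of a hypothetical `writeInt`, built on the `isWriteableStream` constraint from the snippet above; it is not part of iv.stream as posted.]

```d
// Hypothetical counterpart to readInt above: the same byte-at-a-time
// loop, but pushing bytes out through rawWrite.
void writeInt(T : ulong, ST) (auto ref ST st, T v) if (isWriteableStream!ST) {
  ubyte* b = cast(ubyte*)&v;
  foreach (immutable idx; 0..T.sizeof) {
    st.rawWrite(b[idx..idx+1]);
  }
}
```

Usage would then be symmetric with the read side: `fl.writeInt(42u);`.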
July 26, 2016
On 7/25/16 9:19 PM, Charles Hixson via Digitalmars-d-learn wrote:
> On 07/25/2016 05:18 PM, ketmar via Digitalmars-d-learn wrote:
>> On Monday, 25 July 2016 at 18:54:27 UTC, Charles Hixson wrote:
>>> Are there reasons why one would use rawRead and rawWrite rather than
>>> fread and fwrite when doing binary random I/O?  What are the advantages?
>>>
>>> In particular, if one is reading and writing structs rather than
>>> arrays or ranges, are there any advantages?
>>
>> yes: keeping API consistent. ;-)
>>
>> for example, my stream i/o modules work with anything that has
>> `rawRead`/`rawWrite` methods, and don't bother to check for anything else.
>>
>> besides, `rawRead` just looks cleaner, even with all the `(&a)[0..1]`
>> noise.
>>
>> so, a question of style.
>>
> OK.  If it's just a question of "looking cleaner" and "style", then I
> will prefer the core.stdc.stdio approach.  I find its appearance
> very much cleaner...except that that's understating things. I'll
> probably wrap those routines in a struct to ensure things like files
> being properly closed, and to avoid explicit pointers persisting over
> large areas of code.

It's more than just that. Having a bounded array is safer than a pointer/length separated parameters. Literally, rawRead and rawWrite are inferred @safe, whereas fread and fwrite are not.

But D is so nice with UFCS, you don't have to live with APIs you don't like. Allow me to suggest adding a helper function to your code:

void rawReadItem(T)(File f, ref T item) @trusted
{
   f.rawRead((&item)[0 .. 1]);
}

-Steve
July 26, 2016
On 07/25/2016 09:22 PM, ketmar via Digitalmars-d-learn wrote:
> On Tuesday, 26 July 2016 at 04:05:22 UTC, Charles Hixson wrote:
>> Yes, but I really despise the syntax they came up with.  It's probably good if most of your I/O is ranges, but mine never has been.  (Combining ranges with random I/O?)
>
> that's why i wrote iv.stream, and then iv.vfs, with convenient things like `readNum!T`, for example. you absolutely don't need to reimplement the whole std.stdio.File if all you need is a better API. thanks to UFCS, you can write your new API as free functions accepting std.stdio.File as the first arg. or even a generic stream, like i did in iv.stream:
>
>
> enum isReadableStream(T) = is(typeof((inout int=0) {
>   auto t = T.init;
>   ubyte[1] b;
>   auto v = cast(void[])b;
>   t.rawRead(v);
> }));
>
> enum isWriteableStream(T) = is(typeof((inout int=0) {
>   auto t = T.init;
>   ubyte[1] b;
>   t.rawWrite(cast(void[])b);
> }));
>
> T readInt(T : ulong, ST) (auto ref ST st) if (isReadableStream!ST) {
>   T res;
>   ubyte* b = cast(ubyte*)&res;
>   foreach (immutable idx; 0..T.sizeof) {
>     if (st.rawRead(b[idx..idx+1]).length != 1) throw new Exception("read error");
>   }
>   return res;
> }
>
>
> and then:
>   auto fl = File("myfile");
>   auto i = fl.readInt!uint;
>
> something like that.
>
That's sort of what I have in mind, but I want to do what in Fortran would be (would have been?) called record I/O, except that I want a file header that specifies a few things like magic number, records allocated, head of free list, etc.  In practice I don't see any need for a record size not known at compile time...except that if there are different versions of the program, they might include different things, so, e.g., the size of the file header might need to be variable.

This is a design problem I'm still trying to wrap my head around. Efficiency seems to say "you need to know the size at compile time", but flexibility says "you can't depend on the size at compile time".  The only middle ground seems to compromise safety (by depending on void* and record-size parameters that aren't guaranteed safe).  I'll probably eventually decide in favor of "size fixed at compile time", but I'm still dithering.  But clearly efficiency dictates that the read unit not be a basic type.  I'm currently thinking of a struct that's about 1 KB in size.  As far as the I/O routines are concerned this will probably all be uninterpreted bytes, unless I throw in some sequencing for error recovery...but that's probably making things too complex, and should be left for a higher level.

Clearly this is a bit of a specialized case, so I wouldn't be considering implementing all of stdio, only the relevant bits, and those wrapped with an interpretation based around record number.

The thing is, I'd probably be writing this wrapper anyway, what I was wondering originally is whether there was any reason to use std.file as the underlying library rather than going directly to core.stdc.stdio.
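[Editor's note: a minimal sketch of the header-plus-fixed-size-records layout described above. All names and field choices are illustrative, not from the thread.]

```d
import std.stdio : File;

// Illustrative file header: magic number, allocated record count, and
// the head of the free list, as described above.  Storing headerSize
// in the file lets later program versions grow the header.
struct FileHeader
{
    ulong magic;
    ulong headerSize;
    ulong records;
    ulong freeHead;  // record number of the first free record; 0 = none
}

// Position the file at a fixed-size record, just past the header.
void seekRecord(T)(File f, in FileHeader h, size_t recNo)
{
    f.seek(h.headerSize + recNo * T.sizeof);
}
```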
July 26, 2016
On 07/26/2016 05:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
> On 7/25/16 9:19 PM, Charles Hixson via Digitalmars-d-learn wrote:
>> On 07/25/2016 05:18 PM, ketmar via Digitalmars-d-learn wrote:
>>> On Monday, 25 July 2016 at 18:54:27 UTC, Charles Hixson wrote:
>>>> Are there reasons why one would use rawRead and rawWrite rather than
>>>> fread and fwrite when doing binary random I/O?  What are the advantages?
>>>>
>>>> In particular, if one is reading and writing structs rather than
>>>> arrays or ranges, are there any advantages?
>>>
>>> yes: keeping API consistent. ;-)
>>>
>>> for example, my stream i/o modules work with anything that has
>>> `rawRead`/`rawWrite` methods, and don't bother to check for anything else.
>>>
>>> besides, `rawRead` just looks cleaner, even with all the `(&a)[0..1]`
>>> noise.
>>>
>>> so, a question of style.
>>>
>> OK.  If it's just a question of "looking cleaner" and "style", then I
>> will prefer the core.stdc.stdio approach.  I find its appearance
>> very much cleaner...except that that's understating things. I'll
>> probably wrap those routines in a struct to ensure things like files
>> being properly closed, and to avoid explicit pointers persisting over
>> large areas of code.
>
> It's more than just that. Having a bounded array is safer than a pointer/length separated parameters. Literally, rawRead and rawWrite are inferred @safe, whereas fread and fwrite are not.
>
> But D is so nice with UFCS, you don't have to live with APIs you don't like. Allow me to suggest adding a helper function to your code:
>
> void rawReadItem(T)(File f, ref T item) @trusted
> {
>    f.rawRead((&item)[0 .. 1]);
> }
>
> -Steve
>
That *does* make the syntax a lot nicer, and I understand the safety advantage of not using pointer/length separated parameters.  But I'm going to be wrapping the I/O anyway, and the external interface is going to be more like:
struct RF (T, long magic)
{
....
void read (size_t recNo, ref T val){...}
size_t read (ref T val){...}
...
}
where a sequential read returns the record number, or you specify the record number and get an indexed read.  So the length will be T.sizeof, and will be specified at the time the file is opened.  To me this seems to eliminate the advantage of stdfile, and stdfile seems to add a level of indirection.
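[Editor's note: a hedged sketch of how those two `read` overloads might sit on top of std.stdio.File. The `headerSize` field and the `tell`-based record-number computation are assumptions, not from the thread.]

```d
import std.stdio : File;

struct RF (T, long magic)
{
    File f;
    ulong headerSize;  // assumed: bytes reserved at the front of the file

    // Indexed read: seek to the record's byte offset, then read one T.
    void read (size_t recNo, ref T val)
    {
        f.seek(headerSize + recNo * T.sizeof);
        f.rawRead((&val)[0 .. 1]);
    }

    // Sequential read: read one T at the current position and return
    // the record number it occupied.
    size_t read (ref T val)
    {
        immutable recNo = cast(size_t)((f.tell - headerSize) / T.sizeof);
        f.rawRead((&val)[0 .. 1]);
        return recNo;
    }
}
```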

Ranges aren't free, are they? If so then I should probably use stdfile, because that is probably less likely to change than core.stdc.stdio.  When I see "f.rawRead((&item)[0 .. 1])" it looks to me as if unneeded code is being generated explicitly to be thrown away.  (I don't like using pointer/length either, but it's actually easier to understand than this kind of thing, and this LOOKS like it's generating extra code.)

That said, perhaps I should use stdio anyway.  When doing I/O it's the disk speed that's the really slow part, and that so dominates things that worrying about trivialities is foolish.  And since it's going to be wrapped anyway, the ugly will be confined to a very small routine.
July 26, 2016
On 7/26/16 12:58 PM, Charles Hixson via Digitalmars-d-learn wrote:

> Ranges aren't free, are they? If so then I should probably use stdfile,
> because that is probably less likely to change than core.stdc.stdio.

Do you mean slices?

> When I see "f.rawRead((&item)[0 .. 1])" it looks to me as if unneeded code
> is being generated explicitly to be thrown away.  (I don't like using
> pointer/length either, but it's actually easier to understand than this
> kind of thing, and this LOOKS like it's generating extra code.)

This is probably a misunderstanding on your part.

&item is accessing the item as a pointer. Since the compiler already has it as a reference, this is a noop -- just an expression to change the type.

[0 .. 1] is constructing a slice out of a pointer. It's all done inline by the compiler (there is no special _d_constructSlice function), so it is very quick. There is no bounds checking, because pointers do not have bounds checks.

So there is pretty much zero overhead for this. Just push the pointer and length onto the stack (or registers, not sure of ABI), and call rawRead.

> That said, perhaps I should use stdio anyway.  When doing I/O it's the
> disk speed that's the really slow part, and that so dominates things
> that worrying about trivialities is foolish.  And since it's going to be
> wrapped anyway, the ugly will be confined to a very small routine.

Having written a very templated io library (https://github.com/schveiguy/iopipe), I can tell you that in my experience, the slowdown comes from 2 things: 1) spending time calling the kernel, and 2) not being able to inline.

This of course assumes that proper buffering is done. Buffering should mitigate most of the slowdown from the disk. It is expensive, but you amortize the expense by buffering.
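[Editor's note: the C-level buffer behind a std.stdio.File can be enlarged explicitly. A sketch; the 1 MiB size and "data.bin" are arbitrary choices for illustration.]

```d
import core.stdc.stdio : _IOFBF;
import std.stdio : File;

void main()
{
    auto f = File("data.bin", "rb");
    // Ask the underlying C stream for a 1 MiB fully-buffered cache, so
    // many small rawRead calls hit memory rather than the kernel.
    f.setvbuf(1 << 20, _IOFBF);
}
```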

C's i/o is pretty much as good as it gets for an opaque non-inlinable system, as long as your requirements are simple enough. The std.stdio code should basically inline into the calls you should be making, and it handles a bunch of stuff that optimizes the calls (such as locking the file handle for one complex operation).

-Steve