Thread overview
Re: Canonical/Idiomatic in memory files
May 28, 2013
Jonathan M Davis
May 28, 2013
Adam D. Ruppe
May 29, 2013
Adam D. Ruppe
May 29, 2013
Russel Winder
May 29, 2013
Diggory
May 29, 2013
Walter Bright
May 29, 2013
Russel Winder
May 29, 2013
Walter Bright
May 29, 2013
Paulo Pinto
May 29, 2013
Russel Winder
May 29, 2013
Paulo Pinto
May 29, 2013
Walter Bright
May 29, 2013
Jonathan M Davis
May 28, 2013
On Tuesday, May 28, 2013 16:04:32 Russel Winder wrote:
> Hi,
> 
> I think I am missing something very obvious…
> 
> For unit tests I want a std.stdio.File object that is backed by an in memory string buffer rather than an actual file on disc, be it temporary or otherwise. I think I am missing where this is documented in the documentation. Or is it actually not available?

Do you mean something like std.mmfile.MmFile, which operates on a file as an array in memory using mmap, or do you mean operating on memory that has no connection to a file at all (in which case, I'm not sure why you'd want to use File)? If the former, well, use std.mmfile.MmFile. If the latter, I expect that you're out of luck. Unfortunately, std.stdio.File is currently just a wrapper around FILE and is thus limited by what you can do with FILE - that and I don't think that the File API took into consideration the possibility of operating on anything other than an actual file (and I honestly don't know what you'd be trying to do with it where it would make any sense for it to operate on anything other than an actual file).

- Jonathan M Davis
May 28, 2013
On Tuesday, 28 May 2013 at 17:46:36 UTC, Jonathan M Davis wrote:
> (and I honestly don't know what
> you'd be trying to do with it where it would make any sense for it to operate on anything other than an actual file).

That's easy! Suppose you have a nice library that's practically unchangeable:

SomethingHard readComplicatedFileFormat(File f);

and you'd like to embed the data in the exe. If it took string, that'd be easy:

auto t = readComplicatedFileFormat(import("my.file"));

but since it only takes File, you need some kind of adapter.
May 29, 2013
On Tue, 2013-05-28 at 13:46 -0400, Jonathan M Davis wrote: […]
> Do you mean something like std.mmfile.MmFile, which operates on a file as an array in memory using mmap, or do you mean operating on memory that has no connection to a file at all (in which case, I'm not sure why you'd want to use File)? If the former, well, use std.mmfile.MmFile. If the latter, I expect that you're out of luck. Unfortunately, std.stdio.File is currently just a wrapper around FILE and is thus limited by what you can do with FILE - that and I don't think that the File API took into consideration the possibility of operating on anything other than an actual file (and I honestly don't know what you'd be trying to do with it where it would make any sense for it to operate on anything other than an actual file).

Looks like I am out of luck then. :-(

The context is writing a small program that can be a filter or operate on files: actually it is the wc program. So it needs to work with opened files and stdin. That is fine (sort of). The issue comes when writing unit tests for the code: unit tests should not touch the file system, so I need a memory-buffer-backed std.stdio.File for the tests. A mock file in a sense. As noted earlier, Go, Python, and all JVM languages have such things, and it really needs to be part of D. If there really is nothing like this, I should add a JIRA issue and see if I can create a pull request later in the summer.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


May 29, 2013
On Wednesday, 29 May 2013 at 04:48:07 UTC, Russel Winder wrote:
> On Tue, 2013-05-28 at 13:46 -0400, Jonathan M Davis wrote:
> […]
>> Do you mean something like std.mmfile.MmFile, which operates on a file as an array in memory using mmap, or do you mean operating on memory that has no connection to a file at all (in which case, I'm not sure why you'd want to use File)? If the former, well, use std.mmfile.MmFile. If the latter, I expect that you're out of luck. Unfortunately, std.stdio.File is currently just a wrapper around FILE and is thus limited by what you can do with FILE - that and I don't think that the File API took into consideration the possibility of operating on anything other than an actual file (and I honestly don't know what you'd be trying to do with it where it would make any sense for it to operate on anything other than an actual file).
>
> Looks like I am out of luck then. :-(
>
> The context is writing a small program that can be a filter or operate
> on files: actually it is the wc program. So it needs to work with opened
> files and stdin. That is fine (sort of). The issue comes when writing
> unit tests for the code: unit tests should not touch the file system, so
> I need a memory buffer backed std.stdio.File for the tests. A mock file
> in a sense. As noted earlier, Go, Python, and all JVM languages have such
> things, and it really needs to be part of D. If there really is nothing
> like this, I should add a JIRA issue and see if I can create a pull
> request later in the summer.

It could be implemented in Phobos fairly easily: on POSIX there is "fmemopen" or "open_memstream" to get a FILE* from a memory buffer. On Windows there is "CreateFileMapping" in conjunction with "MapViewOfFile". The semantics are slightly different, but both allow the functionality you are after.
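
As a rough illustration of the POSIX route, here is a minimal sketch that wraps an fmemopen stream in a std.stdio.File. It assumes a POSIX C library that provides fmemopen; the extern (C) declaration is written by hand for this sketch, and countLines is a hypothetical helper. File.wrapFile does not take ownership of the FILE*, so the stream is closed manually.

```d
import core.stdc.stdio : FILE, fclose;
import std.stdio : File;

// Declared by hand for this sketch; POSIX only (assumes the C library has it).
extern (C) FILE* fmemopen(void* buf, size_t size, const(char)* mode);

// Hypothetical helper: counts lines by reading `data` through a std.stdio.File.
size_t countLines(char[] data)
{
    FILE* fp = fmemopen(data.ptr, data.length, "r");
    assert(fp !is null, "fmemopen failed");
    scope (exit) fclose(fp); // wrapFile does not take ownership of fp

    // Code that only accepts a File can now read from the memory buffer.
    File f = File.wrapFile(fp);

    size_t lines;
    foreach (line; f.byLine())
        ++lines;
    return lines;
}

void main()
{
    // .dup yields the mutable buffer the C API expects.
    assert(countLines("one two\nthree\n".dup) == 2);
}
```

The Windows calls mentioned above would need a different wrapper; this sketch does not cover them.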
May 29, 2013
On 5/28/2013 9:47 PM, Russel Winder wrote:
> Looks like I am out of luck then. :-(

Not at all.

> The context is writing a small program that can be a filter or operate
> on files: actually it is the wc program. So it needs to work with opened
> files and stdin. That is fine (sort of). The issue comes when writing
> unit tests for the code: unit tests should not touch the file system, so
> I need a memory buffer backed std.stdio.File for the tests. A mock file
> in a sense. As noted earlier, Go, Python, and all JVM languages have such
> things, and it really needs to be part of D. If there really is nothing
> like this, I should add a JIRA issue and see if I can create a pull
> request later in the summer.

Coincidentally, I wrote a wc program a year ago:
-------------------------
import std.stdio, std.file, std.string, std.array, std.algorithm, std.typecons;
import lazysplit;

alias Tuple!(int, "lines", int, "words", int, "chars") Lwc;

void main(string[] args) {
    writeln("   lines   words   bytes file");

    auto total = args[1 .. args.length].wctotal();

    if (args.length > 2)
        writefln("--------------------------------------\n%8s%8s%8s total",
            total[0..3]);
}

auto wctotal(R)(R args) {
    Lwc total;
    foreach (arg; args) {
        auto t = arg.File().byLine(KeepTerminator.yes).wc();

        writefln("%8s%8s%8s %s", t[0..3], arg);

        foreach(i, v; t)
            total[i] += v;
    }
    return total;
}

auto wc(R)(R r) {
    Lwc t;
    foreach (line; r) {
        t.lines += 1;
        t.words += line.lazySplit().count();
        t.chars += line.length;
    }
    return t;
}
--------------------------------------

Just replace "arg.File().byLine(KeepTerminator.yes)" with a string filled with your mocked data.
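
To make that substitution concrete, here is a small sketch of a unit test that feeds wc() an array of lines instead of a File. It is an assumption-laden variant of the code above: std.array.split stands in for lazySplit (which is not a Phobos module), and the mock data is invented for the test. Build with -unittest to run it.

```d
import std.array : split;
import std.typecons : Tuple;

alias Lwc = Tuple!(int, "lines", int, "words", int, "chars");

// Same shape as wc() above; std.array.split replaces lazySplit in this sketch.
Lwc wc(R)(R r)
{
    Lwc t;
    foreach (line; r)
    {
        t.lines += 1;
        t.words += cast(int) line.split().length; // split() breaks on whitespace
        t.chars += cast(int) line.length;
    }
    return t;
}

unittest
{
    // An array of lines plays the role of File.byLine(KeepTerminator.yes).
    auto mock = ["hello world\n", "\n", "a b c\n"];
    auto t = wc(mock);
    assert(t.lines == 3);
    assert(t.words == 5);
    assert(t.chars == 12 + 1 + 6);
}

void main() {}
```

Because wc() is a template over any range of lines, the function under test is the same one the real program calls; only its input differs.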
May 29, 2013
On Wednesday, May 29, 2013 05:47:55 Russel Winder wrote:
> On Tue, 2013-05-28 at 13:46 -0400, Jonathan M Davis wrote: […]
> 
> > Do you mean something like std.mmfile.MmFile which operates on a file as
> > an
> > array in memory using mmap, or do you mean operating on memory that has no
> > connection to a file at all (in which case, I'm not sure why you'd want to
> > use File)? If the former, well, use std.mmfile.MmFile. If the latter, I
> > expect that you're out of luck. Unfortunately, std.stdio.File is
> > currently just a wrapper around FILE and is thus limited by what you can
> > do with FILE - that and I don't think that the File API took into
> > consideration the possibility of operating on anything other than an
> > actual file (and I honestly don't know what you'd be trying to do with it
> > where it would make any sense for it to operate on anything other than an
> > actual file).
> 
> Looks like I am out of luck then. :-(
> 
> The context is writing a small program that can be a filter or operate on files: actually it is the wc program. So it needs to work with opened files and stdin. That is fine (sort of). The issue comes when writing unit tests for the code: unit tests should not touch the file system, so I need a memory-buffer-backed std.stdio.File for the tests. A mock file in a sense. As noted earlier, Go, Python, and all JVM languages have such things, and it really needs to be part of D. If there really is nothing like this, I should add a JIRA issue and see if I can create a pull request later in the summer.

A replacement for std.stdio is in the works, and it will likely have this sort of thing in it (certainly, it will have the stuff necessary for enabling streams and the like for std I/O). Unfortunately, the person who is working on it has been very busy, so it's not ready yet. But there's no question that we want to overhaul std.stdio. And part of that effort will be adding the stuff which will replace the various stream modules in Phobos.

- Jonathan M Davis
May 29, 2013
On Tue, 2013-05-28 at 22:33 -0700, Walter Bright wrote: […]
> Coincidentally, I wrote a wc program a year ago:

I note the one on the D web site could do with being made more idiomatic. cf. http://dlang.org/wc.html

> -------------------------
> import std.stdio, std.file, std.string, std.array, std.algorithm, std.typecons;
> import lazysplit;

As far as I know currently (which may mean I am very wrong), D imports import all symbols defined in the module into the current name space. This is like "star imports" in Python and Java which are now seen as not the right thing to do. Python's default is to import the namespace not the symbols in it and many believe this is the right thing to do.

> alias Tuple!(int, "lines", int, "words", int, "chars") Lwc;

I used an int[3] for this, but I like the labelled tuple. Is this really just a three-item dictionary, though? Might a dictionary be a better structure for this?

> void main(string[] args) {
>      writeln("   lines   words   bytes file");
> 
>      auto total = args[1 .. args.length].wctotal();
> 
>      if (args.length > 2)
>          writefln("--------------------------------------\n%8s%8s%8s total",
>              total[0..3]);
> }

To emulate /usr/bin/wc, I dispensed with the - sequence. Less code ;-)

Is there an idiom of when to use 1..args.length and when to use 1..$ ?

> auto wctotal(R)(R args) {
>      Lwc total;
>      foreach (arg; args) {
>          auto t = arg.File().byLine(KeepTerminator.yes).wc();
> 
>          writefln("%8s%8s%8s %s", t[0..3], arg);
> 
>          foreach(i, v; t)
>              total[i] += v;
>      }
>      return total;
> }

Why a template? R is always string? The above does not cope with a parameter being "-" to indicate use the stdin.

> auto wc(R)(R r) {
>      Lwc t;
>      foreach (line; r) {
>          t.lines += 1;
>          t.words += line.lazySplit().count();
>          t.chars += line.length;
>      }
>      return t;
> }

The body of this function looks almost, but not quite, exactly like mine :-)

> Just replace "arg.File().byLine(KeepTerminator.yes)" with a string filled with your mocked data.

No, that is not acceptable: the code under test must remain unchanged in order to be tested.

I have been having trouble with UFCS: x.f() has not been compiling, so I have had to use f(x). There must be a reason why rdmd and ldc have not allowed me to use whichever of the forms I want; I will have to investigate further to create a smaller exemplar of the issue.

-- 
Russel.


May 29, 2013
On 5/29/2013 12:02 AM, Russel Winder wrote:
> On Tue, 2013-05-28 at 22:33 -0700, Walter Bright wrote:
> […]
>> Coincidentally, I wrote a wc program a year ago:
>
> I note the one on the D web site could do with being made more
> idiomatic. cf. http://dlang.org/wc.html

Yup.


>> -------------------------
>> import std.stdio, std.file, std.string, std.array, std.algorithm, std.typecons;
>> import lazysplit;
>
> As far as I know currently (which may mean I am very wrong), D imports
> import all symbols defined in the module into the current name space.

This is incorrect. They are implemented as sort of "second class citizens" in the current name space. This means that any declaration in the current name space overrides any in the import name space. If the name is not found in the current name space, and is found in more than one import, an ambiguity error is generated.
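
A minimal sketch of the first rule, using a deliberately silly local function named writeln (hypothetical, for illustration only): the declaration in the current module wins over the imported one, and the imported one is still reachable by its fully qualified name.

```d
import std.stdio; // makes std.stdio's names available as imported names

// A declaration in the current module takes precedence over the import.
string writeln(string s)
{
    return "local: " ~ s;
}

void main()
{
    // Resolves to the local function, not std.stdio.writeln.
    assert(writeln("hi") == "local: hi");

    // The imported function is still reachable by its fully qualified name.
    std.stdio.writeln("from std.stdio");
}
```

The ambiguity case needs two separate modules that both define the same name, so it is not shown here.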


> This is like "star imports" in Python and Java which are now seen as not
> the right thing to do. Python's default is to import the namespace not
> the symbols in it and many believe this is the right thing to do.

D imports also allow "cherry-picking" of individual names out of it.
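
For example, a short sketch of selective and renamed imports (the particular names picked here are only for illustration):

```d
import std.stdio : writeln;   // selective import: only writeln is brought in
import std.algorithm : count; // likewise, only count
import str = std.string;      // renamed import: names need the str. prefix

void main()
{
    assert([1, 2, 2].count(2) == 2);
    assert(str.strip("  x ") == "x");
    writeln("ok");
}
```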

>> alias Tuple!(int, "lines", int, "words", int, "chars") Lwc;
>
> I used an int[3] for this, but I like the labelled tuple. Is this really
> just a three-item dictionary, though? Might a dictionary be a better
> structure for this?

It's equivalent to:
   struct Tuple { int lines; int words; int chars; }
which is much more efficient than a dictionary.
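
A small sketch of how the labelled tuple behaves, assuming the current alias syntax rather than the older form used above: fields can be read by name or by index, and the storage is just the three ints.

```d
import std.typecons : Tuple;

alias Lwc = Tuple!(int, "lines", int, "words", int, "chars");

void main()
{
    auto t = Lwc(3, 5, 19);

    // Named and positional access refer to the same fields.
    assert(t.lines == 3 && t[0] == 3);
    assert(t.words == t[1]);

    // No hashing or heap allocation: the tuple is three ints side by side.
    static assert(Lwc.sizeof == 3 * int.sizeof);

    // Field-wise accumulation, as in wctotal() above.
    Lwc total;
    foreach (i, v; t.expand)
        total[i] += v;
    assert(total == t);
}
```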


>> void main(string[] args) {
>>       writeln("   lines   words   bytes file");
>>
>>       auto total = args[1 .. args.length].wctotal();
>>
>>       if (args.length > 2)
>>           writefln("--------------------------------------\n%8s%8s%8s total",
>>               total[0..3]);
>> }
>
> To emulate /usr/bin/wc, I dispensed with the - sequence. Less code ;-)
>
> Is there an idiom of when to use 1..args.length and when to use 1..$ ?

Not really.
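
Inside an index or slice expression, $ is shorthand for the length of the array being indexed, so the two spellings are interchangeable. A tiny sketch with a made-up argument list:

```d
void main()
{
    auto args = ["prog", "a.txt", "b.txt"]; // stand-in for main's args

    // Both slices drop the program name and keep the rest.
    assert(args[1 .. args.length] == args[1 .. $]);

    // $ also works in plain indexing.
    assert(args[$ - 1] == "b.txt");
}
```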


>> auto wctotal(R)(R args) {
>>       Lwc total;
>>       foreach (arg; args) {
>>           auto t = arg.File().byLine(KeepTerminator.yes).wc();
>>
>>           writefln("%8s%8s%8s %s", t[0..3], arg);
>>
>>           foreach(i, v; t)
>>               total[i] += v;
>>       }
>>       return total;
>> }
>
> Why a template? R is always string?

It could be a string, wstring or a dstring.

> The above does not cope with a
> parameter being "-" to indicate use the stdin.

That's true. For stdin, you'd replace "arg.File()" with "stdin".
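
One way that replacement might look is a small helper that picks the source per argument. This is only a sketch; openArg is a hypothetical name and is not part of the program above.

```d
import std.stdio : File, stdin;

// Hypothetical helper: "-" selects standard input, anything else is a path.
File openArg(string arg)
{
    if (arg == "-")
        return stdin;
    return File(arg);
}

void main()
{
    // "-" yields a File that refers to the process's standard input.
    assert(openArg("-").fileno == stdin.fileno);
}
```

With such a helper, "arg.File()" in wctotal() would become "arg.openArg()" and the rest of the pipeline would stay the same.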

>> auto wc(R)(R r) {
>>       Lwc t;
>>       foreach (line; r) {
>>           t.lines += 1;
>>           t.words += line.lazySplit().count();
>>           t.chars += line.length;
>>       }
>>       return t;
>> }
>
> The body of this function looks almost, but not quite, exactly like
> mine :-)
>
>> Just replace "arg.File().byLine(KeepTerminator.yes)" with a string filled with
>> your mocked data.
>
> No, that is not acceptable: the code under test must remain unchanged in
> order to be tested.

You just need to get the component programming religion! and get away from using FILE*. There isn't anything fundamentally different from using a fake FILE* and using a template with a different InputRange. If that's still unacceptable, you can create an InputRange that is a class with virtual functions empty(), front(), and popFront(), then use derived classes for the File or string.
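
A minimal sketch of the class-based variant, assuming Phobos's InputRange interface and inputRangeObject wrapper from std.range rather than hand-written derived classes; countWords is a hypothetical function, and std.array.split stands in for lazySplit.

```d
import std.array : split;
import std.range : InputRange, inputRangeObject;

// Not a template: the source of lines is chosen at run time through the
// InputRange interface (virtual empty, front, and popFront).
int countWords(InputRange!string lines)
{
    int words;
    foreach (line; lines)
        words += cast(int) line.split().length;
    return words;
}

void main()
{
    // Mock data wrapped in a class object implementing InputRange!string.
    auto mock = inputRangeObject(["one two\n", "three\n"]);
    assert(countWords(mock) == 3);

    // A real file could be wrapped the same way, for example
    // inputRangeObject(f.byLine(KeepTerminator.yes).map!(l => l.idup)),
    // where the map copies each reused line buffer into a string.
}
```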

May 29, 2013
On Wednesday, 29 May 2013 at 07:32:53 UTC, Walter Bright wrote:
> On 5/29/2013 12:02 AM, Russel Winder wrote:
>> On Tue, 2013-05-28 at 22:33 -0700, Walter Bright wrote:
>> […]
>
> You just need to get the component programming religion! and get away from using FILE*. There isn't anything fundamentally different from using a fake FILE* and using a template with a different InputRange. If that's still unacceptable, you can create an InputRange that is a class with virtual functions empty(), front(), and popFront(), then use derived classes for the File or string.

One of the things that made me a fan of OO was when I understood
the types of file manipulations that were possible with the IO abstractions
available in most languages in comparison with what is possible in
a purely procedural world.

Sure iostreams, Java IO, .NET, Smalltalk might offer complex IO models, but
they are quite powerful for doing generic code over abstract data sources.

--
Paulo
May 29, 2013
On Wed, 2013-05-29 at 00:32 -0700, Walter Bright wrote: […]
> This is incorrect. They are implemented as sort of "second class citizens" in the current name space. This means that any declaration in the current name space overrides any in the import name space. If the name is not found in the current name space, and is found in more than one import, an ambiguity error is generated.

OK so this is a quasi-namespace import which helps. Dealing with multiple names in imports by just creating a compile error stops tragedy but seems a little awkward. It does reinforce my belief in individual importing of names though.

[…]
> It's equivalent to:
>     struct Tuple { int lines; int words; int chars; }
> which is much more efficient than a dictionary.

Interesting. I probably should already have known this. I suspect further exchanges on this should move to the learn mailing list!

[…]
> You just need to get the component programming religion! and get away from using FILE*. There isn't anything fundamentally different from using a fake FILE* and using a template with a different InputRange. If that's still unacceptable, you can create an InputRange that is a class with virtual functions empty(), front(), and popFront(), then use derived classes for the File or string.

I'm not sure I would call it component – didn't components die in the 1980s when no-one could agree what a component was?

I had missed that there was the possibility of polymorphism over a sequence of strings and an open file. Which is silly, as that is exactly what you can do in Python and which I do a lot. Doh moment.

Thanks.

-- 
Russel.

