December 11
I'd like to read from a file, one byte at a time, without loading the whole file in memory.

I was hoping I could do something like

   auto f = File("somefile");
   foreach(c; f.byChar) {
       process(c);
   }

but there appears to be no such way to do it anymore. Instead, the stdlib seems to provide several functions to do chunked reads from the file where I have to manually manage the buffer. I see that D1 had a stream, but it's no longer here and I understand ranges are supposed to be used instead.

What's the explanation here? Why is there no more stream and what am I supposed to use instead? Do I really need to be manually managing the read buffer myself?
December 11
On 12/11/17 3:51 PM, Jordi Gutiérrez Hermoso wrote:
> I'd like to read from a file, one byte at a time, without loading the whole file in memory.
> 
> I was hoping I could do something like
> 
>     auto f = File("somefile");
>     foreach(c; f.byChar) {
>         process(c);
>     }
> 
> but there appears to be no such way to do it anymore. Instead, the stdlib seems to provide several functions to do chunked reads from the file where I have to manually manage the buffer. I see that D1 had a stream, but it's no longer here and I understand ranges are supposed to be used instead.
> 
> What's the explanation here? Why is there no more stream and what am I supposed to use instead? Do I really need to be manually managing the read buffer myself?

Use the undead repository:

http://code.dlang.org/packages/undead

https://github.com/dlang/undeaD

https://github.com/dlang/undeaD/blob/master/src/undead/stream.d

-Steve
December 11
On Monday, 11 December 2017 at 21:21:51 UTC, Steven Schveighoffer wrote:
> Use the undead repository:

Wow, really? Is the removal of stream from D some kind of error that hasn't been corrected yet?
December 11
On 12/11/17 5:58 PM, Jordi Gutiérrez Hermoso wrote:
> On Monday, 11 December 2017 at 21:21:51 UTC, Steven Schveighoffer wrote:
>> Use the undead repository:
> 
> Wow, really? Is the removal of stream from D some kind of error that hasn't been corrected yet?

No, it was removed because it was considered subpar/obsolete. It doesn't jibe with the rest of Phobos.

But modules that are removed are put into the undead repository for those who wish to continue using them.

-Steve
December 11
On Monday, December 11, 2017 22:58:53 Jordi Gutiérrez Hermoso via Digitalmars-d-learn wrote:
> On Monday, 11 December 2017 at 21:21:51 UTC, Steven Schveighoffer wrote:
> > Use the undead repository:
>
> Wow, really? Is the removal of stream from D some kind of error that hasn't been corrected yet?

std.stream was deemed to not be up to Phobos' current standards, and it's not in line with Phobos' current design and implementation (most notably, it doesn't support ranges at all). No one has cared enough to come up with an alternative implementation and propose it for inclusion in Phobos. However, for most needs, ranges do what you might do with a stream solution, and std.bitmanip provides useful functions for byte-level manipulation (e.g. taking the first elements from a range of ubytes and converting them to int). Depending on what you're looking for, http://code.dlang.org/packages/iopipe could also fit in quite well, though it's very much a work in progress, and there are several serialization libraries on code.dlang.org if that's more what you're looking for.
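
As a rough sketch of the std.bitmanip point (taking the first elements from a range of ubytes and converting them to int): `read` consumes `T.sizeof` bytes from the front of the range and is big-endian unless told otherwise. The helper name `takeInt` and the sample bytes are made up for illustration only.

```d
import std.bitmanip : read;
import std.system : Endian;

// Hypothetical helper (not part of Phobos): pull one big-endian int off the
// front of a byte slice. `read` advances the slice by int.sizeof bytes.
int takeInt(ref ubyte[] bytes)
{
    return bytes.read!int();
}

void main()
{
    // Made-up sample data: a 4-byte big-endian value followed by one more byte.
    ubyte[] data = [0x00, 0x00, 0x01, 0x02, 0xFF];
    assert(takeInt(data) == 258);   // 0x00000102
    assert(data == [0xFF]);         // the four header bytes were consumed

    // Same value stored little-endian; the byte order is a template argument.
    ubyte[] le = [0x02, 0x01, 0x00, 0x00];
    assert(le.read!(int, Endian.littleEndian)() == 258);
}
```

Since `read` works on any input range of ubytes, the same call should also apply to a lazily read file range, though I have only sketched the array case here.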

- Jonathan M Davis


December 11
On Monday, 11 December 2017 at 20:51:41 UTC, Jordi Gutiérrez Hermoso wrote:
> I'd like to read from a file, one byte at a time, without loading the whole file in memory.
>
> I was hoping I could do something like
>
>    auto f = File("somefile");
>    foreach(c; f.byChar) {
>        process(c);
>    }
>
> but there appears to be no such way to do it anymore. Instead, the stdlib seems to provide several functions to do chunked reads from the file where I have to manually manage the buffer. I see that D1 had a stream, but it's no longer here and I understand ranges are supposed to be used instead.
>
> What's the explanation here? Why is there no more stream and what am I supposed to use instead? Do I really need to be manually managing the read buffer myself?

This should work:

   import std.mmfile : MmFile;

   // the file is memory-mapped; the OS pages it in on demand
   scope f = new MmFile("somefile");
   foreach(c; cast(string)f[]) {
       process(c);
   }
December 11
On Monday, 11 December 2017 at 22:58:53 UTC, Jordi Gutiérrez Hermoso wrote:
> On Monday, 11 December 2017 at 21:21:51 UTC, Steven Schveighoffer wrote:
>> Use the undead repository:
>
> Wow, really? Is the removal of stream from D some kind of error that hasn't been corrected yet?

Well of course you can use ranges for it, see e.g. this simple example:

---
void main(string[] args)
{
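    import std.algorithm : joiner;
    import std.conv, std.range, std.stdio;
    // joiner is lazy; join would read the whole file into one array first
    foreach (d; File(__FILE_FULL_PATH__).byChunk(4096).joiner.take(5)) {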
        writefln("%s", d.to!char);
    }
}
---

Run here: https://run.dlang.io/is/Ann9e9

Though if you need superb performance, iopipe or similar will be faster.
December 12
On Monday, 11 December 2017 at 20:51:41 UTC, Jordi Gutiérrez Hermoso wrote:
> I'd like to read from a file, one byte at a time, without loading the whole file in memory.
>

just playing around with this....

// --------------------

module test;

import std.stdio, std.file, std.exception;

void main()
{
    string filename = "test.txt";
    enforce(filename.exists, "Umm..that file does not exist!");

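    auto file = File(filename, "r");
    char[1] charBuf; // one-byte buffer, allocated once and reused

    while (true)
    {
        // rawRead returns the slice it actually filled; empty means end of file
        auto got = file.rawRead(charBuf[]);
        if (got.length == 0)
            break;

        process(got[0]);
    }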

    return;
}

void process(char someChar)
{
    import std.ascii : isPrintable;

    if( isPrintable(someChar) )
        writeln("found a printable character: ", someChar);
    else
        writeln("found a non printable character");

}
// --------------------

December 12
On Tuesday, 12 December 2017 at 02:15:13 UTC, codephantom wrote:
>
> just playing around with this....
>

also...in case you only want to read n bytes..

// -----------------------

module test;

import std.stdio, std.file, std.exception;
import std.datetime.stopwatch;


void main()
{
    string filename = "test.txt";  // a text file
    //string filename = "test.exe"; // a binary file

    enforce(filename.exists, "Umm..that file does not exist!");

    auto file = File(filename, "r");
    ubyte[1] buf; // one-byte buffer, allocated once and reused

    import std.datetime : MonoTime;
    auto t2 = MonoTime.currTime;

    // just read the first n bytes.
    int bytesToRead = 4; // change this n
    int bufCount;
    while ( bufCount < bytesToRead )
    {
        // rawRead returns the slice it actually filled; empty means end of file
        auto got = file.rawRead(buf[]);
        if (got.length == 0)
            break;

        process(cast(char)(got[0]));
        bufCount++;
    }


    writeln("-------------------------------------");
    writeln("this took : ", MonoTime.currTime - t2);
    writeln("-------------------------------------");
    writeln();

    return;
}

void process(char someChar)
{
    import std.ascii : isPrintable;

    if( isPrintable(someChar) )
        writeln("found a printable character: ", someChar);
    else
        writeln("found a non printable character");

}
// -----------------------

December 12
On 12/11/17 6:33 PM, Seb wrote:
> Though if you need superb performance, iopipe or similar will be faster.
Since iopipe was mentioned several times, I will say a couple things:

1. iopipe is not made for processing one element at a time; it focuses on buffers. The reason is that certain tasks (e.g. parsing) are much more efficient with buffered data than when using the range API. Even with FILE *, using fgetc for every character is going to suck when compared to calling fread and processing the resulting array in memory.

2. If you do want to process by element, I recommend the following chain:

// an example that uses iopipe's file stream and assumes it's UTF8 text.
// other mechanisms are available.
auto mypipe = openDev("somefile") // open a file
    .bufd                         // buffer it
    .assumeText                   // assume it's utf-8 text
    .ensureDecodeable;            // ensure there are no partial code-points in the window

// convert to range of "chunks", and then join into one large range
foreach(c; mypipe.asInputRange.joiner) {
    process(c);
}

Note, due to Phobos's auto-decoding, joiner is going to auto-decode all of the data. This means typeof(c) is going to be dchar, and not char, and everything needs to be proper utf-8. If you want to process the bytes raw, you can omit the .assumeText.ensureDecodeable part, and the data will be ubytes.
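
For comparison, point 1 (read a buffer, then walk it in memory) can be sketched with plain Phobos as well. This is only a minimal illustration, not iopipe's approach: `eachByte` is a hypothetical helper name, and the 4096-byte chunk size is an arbitrary assumption.

```d
import std.stdio : File, writeln;

// Hypothetical helper (not a Phobos function): hand every byte of the file
// at `path` to `process`, holding at most `chunkSize` bytes in memory.
void eachByte(string path, scope void delegate(ubyte) process,
              size_t chunkSize = 4096)
{
    // byChunk refills one internal buffer per iteration; the inner loop is
    // plain array iteration with no I/O call per byte.
    foreach (ubyte[] chunk; File(path, "rb").byChunk(chunkSize))
        foreach (b; chunk)
            process(b);
}

void main(string[] args)
{
    if (args.length < 2)
        return; // usage: prog <file>

    size_t count;
    eachByte(args[1], (ubyte b) { ++count; });
    writeln(count, " bytes read");
}
```

The caller still sees one byte at a time, but the buffer management stays inside byChunk, so nothing has to be managed by hand.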

-Steve