Read and write gzip files easily.
February 19, 2014
Hi there, I'm new to D and have a lot of learning ahead of me. It would
be extremely helpful to me if someone with D experience could show me
some code examples.

I'd like to neatly read and write gzipped files for my work. I have read
several threads on these forums on the topic of std.zlib or std.zip and I haven't been able to figure it out.

Here's a Python script that does what I want. Can you please show me
an example in D that does the same thing?

<code>
#!/usr/bin/env python

import gzip

# Read a gzipped file and print the contents line by line.
with gzip.open("input.gz") as stream:
    for line in stream:
        print line

# Write some text to a gzipped file.
with gzip.open("output.gz", "w") as stream:
    stream.write("some output goes here\n")
</code>


I have a second request. I would like to start using D more in my work,
and in particular I would like to use and extend the BioD library. Artem
Tarasov made a nice module to handle BGZF, and I would like to see an
example like my Python code above using Artem's module.

Read more about BGZF:
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

BioD:
https://github.com/biod/BioD/blob/d2bea0a0da63eb820fcf11ae367456b2c367ec04/bio/core/bgzf/compress.d
February 19, 2014
Wow, that's unexpected :)

Unfortunately, there's no standard module for processing gzip/bz2. The former can be dealt with using etc.c.zlib, but there's no convenient interface for working with a file as a stream. Thus, the easiest way that I know of is as follows:

import std.stdio, std.process;
auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
File input = pipe.stdout;
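
A slightly fuller sketch of the same idea, for reading line by line as in the Python script. It assumes a `gunzip` executable is on the PATH; the helper name `gunzipLines` is made up for illustration, and `pipeProcess` is swapped in for `pipeShell` so that the file name is passed as an argument rather than pasted into a shell command.

```d
import std.process : pipeProcess, Redirect, wait;
import std.stdio : writeln;

/// Return the lines of a gzipped text file, decompressed by an external gunzip.
/// (Hypothetical helper; assumes `gunzip` is installed and on the PATH.)
string[] gunzipLines(string path)
{
    // Passing the arguments as an array bypasses the shell, so the
    // file name needs no quoting.
    auto pipes = pipeProcess(["gunzip", "-c", path], Redirect.stdout);

    string[] lines;
    foreach (line; pipes.stdout.byLine)
        lines ~= line.idup; // byLine reuses its buffer, so copy each line

    // Reap the child process; a non-zero status means gunzip failed.
    immutable status = wait(pipes.pid);
    if (status != 0)
        throw new Exception("gunzip failed on " ~ path);
    return lines;
}

void main(string[] args)
{
    // Usage: ./prog input.gz
    if (args.length > 1)
        foreach (line; gunzipLines(args[1]))
            writeln(line);
}
```

Collecting every line into an array keeps the sketch short; for large files you would more likely process each line inside the loop instead of storing it.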

Regarding your second request, this forum is not an appropriate place to provide usage examples for a library, so that will go into a private e-mail.


On Wed, Feb 19, 2014 at 7:51 PM, Kamil Slowikowski <kslowikowski@gmail.com> wrote:

>
> I have a second request. I would like to start using D more in my work, and in particular I would like to use and extend the BioD library. Artem Tarasov made a nice module to handle BGZF, and I would like to see an example like my Python code above using Artem's module.
>


February 19, 2014
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
> I'd like to neatly read and write gzipped files for my work. [...]

It is not part of the standard library, but you may want to have a look at the GzipInputStream in vibe.d.

http://vibed.org/api/vibe.stream.zlib/GzipInputStream

February 19, 2014
On Wednesday, 19 February 2014 at 16:32:54 UTC, Craig Dillabaugh wrote:
> On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
>
> It is not part of the standard library, but you may want to have a look at the GzipInputStream in vibeD.
>
> http://vibed.org/api/vibe.stream.zlib/GzipInputStream

Also meant to add, this thread belongs in the D.learn forum rather than here.

February 19, 2014
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
> Unfortunately, there's no standard module for processing gzip/bz2.

std.zlib handles gzip, but it doesn't present a file or range interface over it.

This will work though:

void main() {
        import std.zlib;
        import std.stdio;
        auto uc = new UnCompress();

        foreach(chunk; File("testd.gz").byChunk(1024)) {
                auto uncompressed = uc.uncompress(chunk);
                writeln(cast(string) uncompressed);
        }

        // also look at anything left in the buffer
        writeln(cast(string) uc.flush());
}


And if you are writing, use new Compress(HeaderFormat.gzip), then call the compress method and write what it returns to the file.
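
A minimal sketch of the writing side, assuming the std.zlib `Compress` class with `HeaderFormat.gzip` as described above. The helper name `gzipBytes` is made up for illustration, and the call to `flush()` at the end is what completes the gzip stream.

```d
import std.file : write;
import std.zlib : Compress, HeaderFormat;

/// Compress `data` into a complete gzip stream held in memory.
/// (Hypothetical helper, not part of std.zlib.)
ubyte[] gzipBytes(const(void)[] data)
{
    auto gz = new Compress(HeaderFormat.gzip);
    ubyte[] result;

    // compress() may return little or nothing while zlib buffers input.
    result ~= cast(const(ubyte)[]) gz.compress(data);
    // flush() emits whatever is still buffered plus the gzip trailer.
    result ~= cast(const(ubyte)[]) gz.flush();
    return result;
}

void main()
{
    // Same output as the Python example: one line of text in output.gz.
    write("output.gz", gzipBytes("some output goes here\n"));
}
```

This buffers the whole result in memory to keep the example short. For large outputs you would more likely write each piece returned by `compress()` to the file as it arrives, then write the result of `flush()` last.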
February 19, 2014
Ah, indeed. I dismissed it because it allocates on each call, and heavy GC usage in a multithreaded app is a performance killer.

On Wed, Feb 19, 2014 at 8:36 PM, Adam D. Ruppe <destructionator@gmail.com> wrote:

>
> std.zlib handles gzip but it doesn't present a file nor range interface over it.
>


February 19, 2014
On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
> the easiest way that I
> know of is as follows:
>
> import std.stdio, std.process;
> auto pipe = pipeShell("gunzip -c " ~ filename); // replace with pigz if you wish
> File input = pipe.stdout;

Artem, thank you! I've used a similar trick in the past with Python, because calling the system's gzip or pigz through a subprocess pipe is faster than using the Python gzip module. I'm very glad to see how easy it is in D.

> Regarding your second request, this forum is not an appropriate place to
> provide usage examples for a library, so that will go into a private e-mail.

Thanks, again! I'm looking forward to hearing from you :)


@Adam D. Ruppe
Thanks for your example! I couldn't find such an example anywhere on the web.


@Craig Dillabaugh
Please feel free to move the thread, sorry for posting in the wrong place.
February 19, 2014
On Wednesday, 19 February 2014 at 15:51:53 UTC, Kamil Slowikowski wrote:
> I'd like to neatly read and write gzipped files for my work. [...]

Welcome, Kamil :)

Feel free to also visit the #d channel on the freenode IRC network.
February 19, 2014
> @Craig Dillabaugh
> Please feel free to move the thread, sorry for posting in the wrong place.

Actually, I believe the thread can't be moved; it is here forever.

Not a big deal though. Lots of people new to D post questions here and miss the D.learn forum, so you are not alone. Since I didn't have a good answer to your original question, I decided I should let you know about D.learn.


February 19, 2014
On Wednesday, 19 February 2014 at 16:36:29 UTC, Adam D. Ruppe wrote:
> On Wednesday, 19 February 2014 at 16:27:32 UTC, Artem Tarasov wrote:
>> Unfortunately, there's no standard module for processing gzip/bz2.
>
> std.zlib handles gzip but it doesn't present a file nor range interface over it.
>
> This will work though:
>
> void main() {
>         import std.zlib;
>         import std.stdio;
>         auto uc = new UnCompress();
>
>         foreach(chunk; File("testd.gz").byChunk(1024)) {
>                 auto uncompressed = uc.uncompress(chunk);
>                 writeln(cast(string) uncompressed);
>         }
>
>         // also look at anything left in the buffer
>         writeln(cast(string) uc.flush());
> }
>

Regrettably, the above code has a bug. Currently, std.zlib stores a reference to the buffer passed to it, and since byChunk reuses the buffer, the code will fail when uncompressing multiple chunks.
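
One possible workaround, sketched under the assumption that handing `uncompress` a private copy of each chunk (via `.dup`) is enough to sidestep the buffer reuse. The helper name `gunzipFile` and the chunk size are made up for illustration, and the extra copy costs one allocation per chunk.

```d
import std.stdio : File, writeln;
import std.string : lineSplitter;
import std.zlib : HeaderFormat, UnCompress;

/// Decompress a gzip file chunk by chunk and return its contents as text.
/// (Hypothetical helper, not part of std.zlib.)
string gunzipFile(string path, size_t chunkSize = 4096)
{
    auto uc = new UnCompress(HeaderFormat.gzip);
    char[] text;

    foreach (chunk; File(path, "rb").byChunk(chunkSize))
    {
        // byChunk reuses one buffer, so give zlib its own copy of the data.
        text ~= cast(const(char)[]) uc.uncompress(chunk.dup);
    }
    // Pick up anything still held in the decompressor.
    text ~= cast(const(char)[]) uc.flush();
    return text.idup;
}

void main(string[] args)
{
    // Usage: ./prog input.gz
    if (args.length > 1)
        foreach (line; gunzipFile(args[1]).lineSplitter)
            writeln(line);
}
```

Accumulating the output and splitting it afterwards also avoids printing a stray newline in the middle of a line whenever a chunk boundary falls there, at the price of holding the whole decompressed file in memory.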