Help optimizing UnCompress for gzipped files
January 02, 2018
Hi all,

over the holidays, I played around with processing some gzipped json data. The first version was implemented in ruby, but took too long, so I tried dlang. This was already faster, but still not satisfyingly fast. Then I wrote another version in java, which was much faster.

After this I analyzed the first step of the process (gunzipping the data from a file to memory) and found out that dlang's UnCompress is much slower than java, ruby, and plain c.

There was some discussion on the forum a while ago: http://forum.dlang.org/thread/pihxxhjgnveulcdtadvg@forum.dlang.org

The code I used and the numbers I got are here: https://github.com/gizmomogwai/benchmarks/tree/master/gunzip

I used an i7 macbook with os x 10.13.2, ruby 2.5.0 built via rvm, python3 installed by homebrew, the built-in clang compiler, ldc-1.7.0-beta1, and java 1.8.0_152.

Is there anything I can do to speed up the dlang stuff?

Thanks in advance,
Christian
January 02, 2018
On Tuesday, 2 January 2018 at 10:27:11 UTC, Christian Köstlin wrote:
> Hi all,
>
> over the holidays, I played around with processing some gzipped json data. The first version was implemented in ruby, but took too long, so I tried dlang. This was already faster, but still not satisfyingly fast. Then I wrote another version in java, which was much faster.
>
> After this I analyzed the first step of the process (gunzipping the data from a file to memory) and found out that dlang's UnCompress is much slower than java, ruby, and plain c.
>
> There was some discussion on the forum a while ago: http://forum.dlang.org/thread/pihxxhjgnveulcdtadvg@forum.dlang.org
>
> The code I used and the numbers I got are here: https://github.com/gizmomogwai/benchmarks/tree/master/gunzip
>
> I used an i7 macbook with os x 10.13.2, ruby 2.5.0 built via rvm, python3 installed by homebrew, the built-in clang compiler, ldc-1.7.0-beta1, and java 1.8.0_152.
>
> Is there anything I can do to speed up the dlang stuff?
>
> Thanks in advance,
> Christian

Yes indeed. You can make it much faster by using a sliced static array as buffer.
I suspect that most of the slowdown is caused by the GC, as there should otherwise be only calls to the gzip library.
January 02, 2018
On Tuesday, 2 January 2018 at 10:27:11 UTC, Christian Köstlin wrote:
> After this I analyzed the first step of the process (gunzipping the data from a file to memory) and found out that dlang's UnCompress is much slower than java, ruby, and plain c.

Yeah, std.zlib is VERY poorly written. You can get much better performance by just calling the C functions yourself instead. (You can just import etc.c.zlib; it is still included.)

Improving it would mean changing the public API. I think the one-shot compress/uncompress functions are ok, but the streaming class does a lot of unnecessary work inside like copying stuff around.
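
For illustration, a direct loop over etc.c.zlib could look roughly like this (just a sketch, assuming the whole compressed file has already been read into memory; the initial buffer size and the growth strategy are arbitrary choices, not something the C API prescribes):

import etc.c.zlib;
import std.exception : enforce;

ubyte[] gunzip(ubyte[] compressed)
{
    z_stream zs;
    // 15 = maximum window size, +32 = auto-detect gzip/zlib headers
    enforce(inflateInit2(&zs, 15 + 32) == Z_OK, "inflateInit2 failed");
    scope (exit) inflateEnd(&zs);

    zs.next_in = compressed.ptr;
    zs.avail_in = cast(uint) compressed.length;

    auto output = new ubyte[](compressed.length * 4 + 1024); // rough initial guess
    size_t written;
    while (true)
    {
        if (written == output.length)
            output.length = output.length * 2; // grow when the buffer is full
        zs.next_out = output.ptr + written;
        zs.avail_out = cast(uint)(output.length - written);
        const rc = inflate(&zs, Z_NO_FLUSH);
        written = output.length - zs.avail_out;
        if (rc == Z_STREAM_END)
            break;
        enforce(rc == Z_OK, "inflate failed");
    }
    return output[0 .. written];
}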
January 02, 2018
On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
> You can make it much faster by using a sliced static array as buffer.

Only if you want data corruption! It keeps a copy of your pointer internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605

It also will always overallocate new buffers on each call <https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>

There is no efficient way to use it. The implementation is substandard because the API limits the design.

If we really want a fast std.zlib, the API will need to be extended with new functions to fix these. Those new functions will probably look a LOT like the underlying C functions... which is why I say just use them right now.

> I suspect that most of the slowdown is caused by the GC, as there should
> otherwise be only calls to the gzip library.

plz measure before spreading FUD about the GC.
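
One cheap way to measure (a sketch, assuming a reasonably recent druntime; the binary name is just a placeholder):

// Run the existing benchmark with druntime's GC profiler enabled:
//   ./benchmark --DRT-gcopt=profile:1
// At exit, druntime prints the number of collections and the total time spent in the GC.

// Or query the GC heap from inside the program:
import core.memory : GC;
import std.stdio : writefln;

void reportGC()
{
    const stats = GC.stats(); // used/free bytes on the GC heap
    writefln("GC heap: %s bytes used, %s bytes free", stats.usedSize, stats.freeSize);
}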
January 02, 2018
On 1/2/18 8:57 AM, Adam D. Ruppe wrote:
> On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
>> You can make it much faster by using a sliced static array as buffer.
> 
> Only if you want data corruption! It keeps a copy of your pointer internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605
> 
> It also will always overallocate new buffers on each call <https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>
> 
> There is no efficient way to use it. The implementation is substandard because the API limits the design.

iopipe handles this quite well. And deals with the buffers properly (yes, it is very tricky. You have to ref-count the zstream structure, because it keeps internal pointers to *itself* as well!). And no, iopipe doesn't use std.zlib, I use the etc.zlib functions (but I poached some ideas from std.zlib when writing it).

https://github.com/schveiguy/iopipe/blob/master/source/iopipe/zip.d

I even wrote a json parser for iopipe. But it's far from complete. And probably needs updating since I changed some of the iopipe API.

https://github.com/schveiguy/jsoniopipe

Depending on the use case, it might be enough, and should be very fast.

-Steve
January 02, 2018
On 02.01.18 15:09, Steven Schveighoffer wrote:
> On 1/2/18 8:57 AM, Adam D. Ruppe wrote:
>> On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
>>> You can make it much faster by using a sliced static array as buffer.
>>
>> Only if you want data corruption! It keeps a copy of your pointer internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605
>>
>> It also will always overallocate new buffers on each call <https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>
>>
>> There is no efficient way to use it. The implementation is substandard because the API limits the design.
> 
> iopipe handles this quite well. And deals with the buffers properly (yes, it is very tricky. You have to ref-count the zstream structure, because it keeps internal pointers to *itself* as well!). And no, iopipe doesn't use std.zlib, I use the etc.zlib functions (but I poached some ideas from std.zlib when writing it).
> 
> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/zip.d
> 
> I even wrote a json parser for iopipe. But it's far from complete. And probably needs updating since I changed some of the iopipe API.
> 
> https://github.com/schveiguy/jsoniopipe
> 
> Depending on the use case, it might be enough, and should be very fast.
> 
> -Steve
Thanks Steve for this proposal (actually I already had an iopipe version on my hard disk that I applied to this problem). It's more or less your unzip example plus putting the data into an appender (I hope this is how it should be done to get the data into RAM).
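Roughly like this (a sketch only, not verbatim the code from my repo; the module providing openDev may differ between iopipe versions):

import iopipe.bufpipe : bufd;
import iopipe.zip : unzip, CompressionFormat;
import iopipe.stream : openDev; // location of openDev is an assumption here
import std.array : appender;

ubyte[] readGzipped(string path)
{
    auto mypipe = openDev(path).bufd.unzip(CompressionFormat.gzip);
    auto app = appender!(ubyte[])();
    while (mypipe.extend(0) != 0)
    {
        app.put(mypipe.window);               // copy the freshly decompressed chunk
        mypipe.release(mypipe.window.length); // let the pipe reuse its buffer
    }
    app.put(mypipe.window); // anything still left in the window
    return app.data;
}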

iopipe is already better than the normal dlang version, almost as fast as java, but still far from the fastest solution. I updated https://github.com/gizmomogwai/benchmarks/tree/master/gunzip

I will give the direct gunzip calls a try ...

In terms of json parsing, I had really nice results with the fast.json pull parser, but it's comparing apples with oranges a little bit, because I did not pull out all the data there.

---
Christian
January 02, 2018
On 1/2/18 1:01 PM, Christian Köstlin wrote:
> On 02.01.18 15:09, Steven Schveighoffer wrote:
>> On 1/2/18 8:57 AM, Adam D. Ruppe wrote:
>>> On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
>>>> You can make it much faster by using a sliced static array as buffer.
>>>
>>> Only if you want data corruption! It keeps a copy of your pointer
>>> internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605
>>>
>>> It also will always overallocate new buffers on each call
>>> <https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>
>>>
>>> There is no efficient way to use it. The implementation is substandard
>>> because the API limits the design.
>>
>> iopipe handles this quite well. And deals with the buffers properly
>> (yes, it is very tricky. You have to ref-count the zstream structure,
>> because it keeps internal pointers to *itself* as well!). And no, iopipe
>> doesn't use std.zlib, I use the etc.zlib functions (but I poached some
>> ideas from std.zlib when writing it).
>>
>> https://github.com/schveiguy/iopipe/blob/master/source/iopipe/zip.d
>>
>> I even wrote a json parser for iopipe. But it's far from complete. And
>> probably needs updating since I changed some of the iopipe API.
>>
>> https://github.com/schveiguy/jsoniopipe
>>
>> Depending on the use case, it might be enough, and should be very fast.
>>
> Thanks Steve for this proposal (actually I already had an iopipe version
> on my hard disk that I applied to this problem). It's more or less your
> unzip example plus putting the data into an appender (I hope this is how
> it should be done to get the data into RAM).

Well, you don't need to use appender for that (and doing so is copying a lot of the data an extra time). All you need is to extend the pipe until there isn't any more new data, and it will all be in the buffer.

// almost the same line from your current version
auto mypipe = openDev("../out/nist/2011.json.gz")
                  .bufd.unzip(CompressionFormat.gzip);

// This line here will work with the current release (0.0.2):
while(mypipe.extend(0) != 0) {}

//But I have a fix for a bug that hasn't been released yet, this would work if you use iopipe-master:
mypipe.ensureElems();

// getting the data is as simple as looking at the buffer.
auto data = mypipe.window; // ubyte[] of the data

> iopipe is already better than the normal dlang version, almost as fast
> as java, but still far from the fastest solution. I updated
> https://github.com/gizmomogwai/benchmarks/tree/master/gunzip
> 
> I will give the direct gunzip calls a try ...
> 
> In terms of json parsing, I had really nice results with the fast.json
> pull parser, but it's comparing apples with oranges a little bit,
> because I did not pull out all the data there.

Yeah, with jsoniopipe being very raw, I wouldn't be sure it was usable in your case. The end goal is to have something fast, but very easy to construct. I wasn't planning on focusing on the speed (yet) like other libraries do, but ease of writing code to use it.

-Steve
January 02, 2018
On 1/2/18 3:13 PM, Steven Schveighoffer wrote:
> // almost the same line from your current version
> auto mypipe = openDev("../out/nist/2011.json.gz")
>                    .bufd.unzip(CompressionFormat.gzip);

Would you mind telling me the source of the data? When I do get around to it, I want to have a good dataset to test things against, and it would be good to use what others reach for.

-Steve
January 03, 2018
On 02.01.18 21:48, Steven Schveighoffer wrote:
> On 1/2/18 3:13 PM, Steven Schveighoffer wrote:
>> // almost the same line from your current version
>> auto mypipe = openDev("../out/nist/2011.json.gz")
>>                    .bufd.unzip(CompressionFormat.gzip);
> 
> Would you mind telling me the source of the data? When I do get around to it, I want to have a good dataset to test things against, and it would be good to use what others reach for.
> 
> -Steve
Hi Steve,

thanks for looking into this.
I use data from nist.gov; the Makefile includes this download instruction:

    curl -s https://static.nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-2011.json.gz > out/nist/2011.json.gz

--
Christian Köstlin

January 03, 2018
On 02.01.18 21:13, Steven Schveighoffer wrote:
> Well, you don't need to use appender for that (and doing so is copying a lot of the data an extra time). All you need is to extend the pipe until there isn't any more new data, and it will all be in the buffer.
> 
> // almost the same line from your current version
> auto mypipe = openDev("../out/nist/2011.json.gz")
>                   .bufd.unzip(CompressionFormat.gzip);
> 
> // This line here will work with the current release (0.0.2):
> while(mypipe.extend(0) != 0) {}
Thanks for this input; I updated the program to make use of this method and to compare it to the appender variant as well.

>> I will give the direct gunzip calls a try ...
I added direct gunzip calls as well... Those are really good, as long as I do not try to get the data into RAM :) then it is "bad" again.
I wonder what the real difference is between the low-level solution with my own appender and the c version. To me they look almost the same (ugly; only the performance seems to be nice).

The funny thing is that if I add the clang address sanitizer to the c program, I get almost the same numbers as for java :)


> Yeah, with jsoniopipe being very raw, I wouldn't be sure it was usable in your case. The end goal is to have something fast, but very easy to construct. I wasn't planning on focusing on the speed (yet) like other libraries do, but ease of writing code to use it.
> 
> -Steve

--
Christian Köstlin