May 12, 2017
Re: Processing a gzipped csv-file by line-by-line

Posted in reply to H. S. Teoh

On 5/11/17 8:18 PM, H. S. Teoh via Digitalmars-d-learn wrote:
> On Wed, May 10, 2017 at 11:40:08PM +0000, Jesse Phillips via Digitalmars-d-learn wrote:
>> If you can get the zip to decompress into a range of dchar then
>> std.csv will work with it. It is by far not the fastest, but much
>> speed is lost since it supports input ranges and doesn't specialize on
>> any other range type.
>
> I actually spent some time today to look into whether fastcsv can
> possibly be made to work with general input ranges as long as they
> support slicing... and immediately ran into the infamous autodecoding
> issue: strings are not random-access ranges because of autodecoding, so
> it would require either extensive code surgery to make it work, or ugly
> hacks to bypass autodecoding. I'm quite tempted to attempt the latter,
> in fact, but not now since it's getting busier at work and I don't have
> that much free time to spend on a major refactoring of fastcsv.

Yeah, iopipe treats char[] as a random-access sliceable range :)

Autodecoding gets annoying if you want to do anything fancy (like chain(somestr, someotherstr))

> Alternatively, I could possibly hack together a version of fastcsv that
> took a range of const(char)[] as input (rather than a single string), so
> that, in theory, it could handle arbitrarily large input files as long
> as the caller can provide a range of data blocks, e.g., File.byChunk, or
> in this particular case, a range of decompressed data blocks from
> whatever decompressor is used to extract the data. As long as you
> consume the individual rows without storing references to them
> indefinitely (don't try to make an array of the entire dataset),
> fastcsv's optimizations should still work, since unreferenced blocks
> will eventually get cleaned up by the GC when memory runs low.

I'm interested in getting a fast CSV parser built on top of iopipe. I may fork your code and see if I can get it to work. Since you already work on arrays, it should be quite simple, since arrays are also iopipes by default.

-Steve
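
The block-range approach discussed above (a range of decompressed data blocks feeding a row parser that keeps no reference to earlier rows) can be sketched independently of fastcsv or iopipe. What follows is only a minimal illustration in Python rather than D, using nothing beyond the standard library. The helper names `decompressed_blocks`, `lines_from_blocks`, `csv_rows`, and `chunked` are hypothetical and do not belong to any library mentioned in this thread, and the sketch assumes a single-member gzip stream containing UTF-8 text.

```python
import codecs
import csv
import gzip
import zlib


def decompressed_blocks(raw_chunks):
    """Yield decompressed byte blocks from an iterable of gzip-compressed chunks."""
    # Adding 16 to the window bits tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in raw_chunks:
        block = inflater.decompress(chunk)
        if block:
            yield block
    tail = inflater.flush()
    if tail:
        yield tail


def lines_from_blocks(blocks, encoding="utf-8"):
    """Reassemble text lines from blocks that may end mid-line or mid-code-point."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""  # partial line carried over to the next block
    for block in blocks:
        pending += decoder.decode(block)
        *complete, pending = pending.split("\n")
        for line in complete:
            yield line + "\n"
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending  # last line without a trailing newline


def csv_rows(raw_chunks):
    """Lazily parse CSV rows; csv.reader pulls lines from the generator on demand."""
    return csv.reader(lines_from_blocks(decompressed_blocks(raw_chunks)))


def chunked(data, size):
    """Hypothetical stand-in for a block source such as a chunked file read."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Demonstration on an in-memory gzip stream, delivered 7 bytes at a time.
sample = gzip.compress(b'name,qty\nwidget,3\n"gadget, large",5\n')
for row in csv_rows(chunked(sample, 7)):
    print(row)
```

Because each row is handed to the caller and then dropped, memory use in this sketch stays roughly bounded by the block size plus the longest record, provided the caller does not collect every row into a list.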