September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Fredrik Boulund

On Tuesday, 15 September 2015 at 08:53:37 UTC, Fredrik Boulund wrote:
>> my favourite for streaming a file:
>> enum chunkSize = 4096;
>> File(fileName).byChunk(chunkSize).map!"cast(char[])a".joiner()
>
> Is this an efficient way of reading this type of file? What should one keep in mind when choosing chunkSize?
It provides you only one char at a time instead of a whole line. It will be quite constraining for your code if not mind-bending.
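A minimal sketch of what consuming that range looks like (the file name and the take(20) limit are placeholders, not from the thread):

    import std.algorithm : joiner, map;
    import std.range : take;
    import std.stdio : File, write;

    void main()
    {
        enum chunkSize = 4096;
        // The joined chunks form an input range of single characters,
        // not of lines: each popFront advances by one char.
        auto chars = File("hits.blast").byChunk(chunkSize)
                         .map!(a => cast(char[]) a)
                         .joiner();
        foreach (c; chars.take(20))   // prints only the first 20 characters
            write(c);
    }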

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Kagamin

On Tuesday, 15 September 2015 at 09:09:00 UTC, Kagamin wrote:
> On Tuesday, 15 September 2015 at 08:53:37 UTC, Fredrik Boulund wrote:
>>> my favourite for streaming a file:
>>> enum chunkSize = 4096;
>>> File(fileName).byChunk(chunkSize).map!"cast(char[])a".joiner()
>>
>> Is this an efficient way of reading this type of file? What should one keep in mind when choosing chunkSize?

reasonably efficient, yes. See http://stackoverflow.com/a/237495 for a discussion of chunk sizing when streaming a file.

> It provides you only one char at a time instead of a whole line. It will be quite constraining for your code if not mind-bending.

http://dlang.org/phobos/std_string.html#.lineSplitter

File(fileName).byChunk(chunkSize).map!"cast(char[])a".joiner().lineSplitter()

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Fredrik Boulund

On Tuesday, 15 September 2015 at 08:45:00 UTC, Fredrik Boulund wrote:
> On Monday, 14 September 2015 at 15:04:12 UTC, John Colvin wrote:
>> [...]
>
> Thanks for the offer, but don't go out of your way for my sake. Maybe I'll just build this in a clean environment instead of on my work computer to get rid of all the hassle. The Red Hat llvm-devel packages are broken, dependent on libffi-devel which is unavailable. Getting the build environment up to speed on my main machine would take me a lot more time than I have right now.
>
> Tried building LDC from scratch but it fails because of missing LLVM components, despite having LLVM 3.4.2 installed (though lacking devel components).

try this: https://dlangscience.github.io/resources/ldc-0.16.0-a2_glibc2.11.3.tar.xz

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to John Colvin

On Tuesday, 15 September 2015 at 09:19:29 UTC, John Colvin wrote:
>> It provides you only one char at a time instead of a whole line. It will be quite constraining for your code if not mind-bending.
>
> http://dlang.org/phobos/std_string.html#.lineSplitter
>
> File(fileName).byChunk(chunkSize).map!"cast(char[])a".joiner().lineSplitter()
lineSplitter doesn't work with input ranges.

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Kagamin

On Tuesday, 15 September 2015 at 13:01:06 UTC, Kagamin wrote:
> On Tuesday, 15 September 2015 at 09:19:29 UTC, John Colvin wrote:
>>> It provides you only one char at a time instead of a whole line. It will be quite constraining for your code if not mind-bending.
>>
>> http://dlang.org/phobos/std_string.html#.lineSplitter
>>
>> File(fileName).byChunk(chunkSize).map!"cast(char[])a".joiner().lineSplitter()
>
> lineSplitter doesn't work with input ranges.
Ugh

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to John Colvin

On Tuesday, 15 September 2015 at 10:01:30 UTC, John Colvin wrote:
> try this: https://dlangscience.github.io/resources/ldc-0.16.0-a2_glibc2.11.3.tar.xz
Nope, :(
$ ldd ldc2
./ldc2: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ldc2)
linux-vdso.so.1 => (0x00007fff2ffd8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000318a000000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000318a400000)
libncurses.so.5 => /lib64/libncurses.so.5 (0x000000319bc00000)
librt.so.1 => /lib64/librt.so.1 (0x000000318a800000)
libz.so.1 => /lib64/libz.so.1 (0x000000318ac00000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000318dc00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003189c00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000318c000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003189800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003189400000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x0000003199000000)
Thanks for trying though!

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Fredrik Boulund

On Tuesday, 15 September 2015 at 13:49:04 UTC, Fredrik Boulund wrote:
> On Tuesday, 15 September 2015 at 10:01:30 UTC, John Colvin wrote:
>> [...]
>
> Nope, :(
>
> [...]
Oh well, worth a try I guess.

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Fredrik Boulund

On Tue, Sep 15, 2015 at 08:55:43AM +0000, Fredrik Boulund via Digitalmars-d-learn wrote:
> On Monday, 14 September 2015 at 18:31:38 UTC, H. S. Teoh wrote:
> > I tried implementing a crude version of this (see code below), and found that manually calling GC.collect() even as frequently as once every 5000 loop iterations (for a 500,000 line test input file) still gives about 15% performance improvement over completely disabling the GC. Since most of the arrays involved here are pretty small, the frequency could be reduced to once every 50,000 iterations and you'd pretty much get the 20% performance boost for free, and still not run out of memory too quickly.
>
> Interesting, I'll have to go through your code to understand exactly what's going on. I also noticed some GC-related stuff high up in my profiling, but had no idea what could be done about that. Appreciate the suggestions!

It's very simple, actually. Basically you just call GC.disable() at the beginning of the program to disable automatic collection cycles, then at periodic intervals you manually trigger collections by calling GC.collect().

The way I implemented it in my test code was to use a global counter that I decrement once every loop iteration. When the counter reaches zero, GC.collect() is called, and then the counter is reset to its original value. This is encapsulated in the gcTick() function, so that it's easy to tweak the frequency of the manual collections without modifying several different places in the code each time.

T

--
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL
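A minimal sketch of the scheme described above (the counter value, file name, and loop body are illustrative; only GC.disable(), GC.collect(), and the gcTick() idea come from the post):

    import core.memory : GC;
    import std.stdio : File;

    enum gcFrequency = 5_000;     // collect once every 5000 loop iterations
    int gcCounter = gcFrequency;

    void gcTick()
    {
        if (--gcCounter <= 0)
        {
            GC.collect();         // manual collection cycle
            gcCounter = gcFrequency;
        }
    }

    void main()
    {
        GC.disable();             // turn off automatic collections up front
        foreach (line; File("hits.blast").byLine)
        {
            // ... per-line parsing that allocates small arrays ...
            gcTick();             // periodic manual collection
        }
    }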

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Fredrik Boulund

I had some luck building a local copy of llvm in my home directory, using a linux version about as old as yours (llvm 3.5 I used), specifying:

--configure --prefix=/home/andrew/llvm

so make install would install it somewhere I had permissions. Then I changed the cmake command to:

cmake -L -DLLVM_CONFIG="/home/andrew/llvm/bin/llvm-config" ..

and I got a working install of ldc. Make yourself a cup of tea while you wait though if you try it; llvm took about an hour and a half to compile.

On Tuesday, 15 September 2015 at 13:49:04 UTC, Fredrik Boulund wrote:
> On Tuesday, 15 September 2015 at 10:01:30 UTC, John Colvin wrote:
>> try this: https://dlangscience.github.io/resources/ldc-0.16.0-a2_glibc2.11.3.tar.xz
>
> Nope, :(
>
> $ ldd ldc2
> ./ldc2: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ./ldc2)
> linux-vdso.so.1 => (0x00007fff2ffd8000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x000000318a000000)
> libdl.so.2 => /lib64/libdl.so.2 (0x000000318a400000)
> libncurses.so.5 => /lib64/libncurses.so.5 (0x000000319bc00000)
> librt.so.1 => /lib64/librt.so.1 (0x000000318a800000)
> libz.so.1 => /lib64/libz.so.1 (0x000000318ac00000)
> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000318dc00000)
> libm.so.6 => /lib64/libm.so.6 (0x0000003189c00000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000318c000000)
> libc.so.6 => /lib64/libc.so.6 (0x0000003189800000)
> /lib64/ld-linux-x86-64.so.2 (0x0000003189400000)
> libtinfo.so.5 => /lib64/libtinfo.so.5 (0x0000003199000000)
>
> Thanks for trying though!

September 15, 2015 Re: Speeding up text file parser (BLAST tabular format)
Posted in reply to Andrew Brown

On Tuesday, 15 September 2015 at 18:42:29 UTC, Andrew Brown wrote:
> I had some luck building a local copy of llvm in my home directory, using a linux version about as old as yours (llvm 3.5 i used) specifying:
>
> --configure --prefix=/home/andrew/llvm
>
> so make install would install it somewhere I had permissions.
>
> Then I changed the cmake command to:
>
> cmake -L -DLLVM_CONFIG="/home/andrew/llvm/bin/llvm-config" ..
>
> and I got a working install of ldc.
>
> Make yourself a cup of tea while you wait though if you try it, llvm was about an hour and a half to compile.
>
Thanks for your suggestion. I'm amazed by the amount of effort you guys put into helping me. Unfortunately the only precompiled version of libstdc++ available for the system in question via the Red Hat repos is 4.4.7, and compiling llvm from scratch requires at least 4.7. I'll be fine using DMD for now as I'm still learning more about D :).