March 14, 2014 Improving IO Speed
I have a program in C++ that I am translating to D as a way to investigate and learn D. The program is used to process potentially hundreds of TB's of financial transactions data so it is crucial that it be performant. Right now the C++ version is orders of magnitude faster.

Here is a simple example of what I am doing in D:

import std.stdio : writefln;
import std.stream;

align(1) struct TaqIdx
{
    align(1) char[10] symbol;
    align(1) int tdate;
    align(1) int begrec;
    align(1) int endrec;
}

void main()
{
    auto input = new File("T201212A.IDX");
    TaqIdx tmp;
    int count;

    while(!input.eof())
    {
        input.readExact(&tmp, TaqIdx.sizeof);
        // Do something with the data
    }
}

Do you have any suggestions for improving the speed in this situation?

Thank you!

TJB

March 14, 2014 Re: Improving IO Speed
Posted in reply to TJB
TJB:
> Do you have any suggestions for improving the speed in this situation?
I have never used readExact so far, so I don't have many suggestions. But try to not pack the struct.
Bye,
bearophile

March 14, 2014 Re: Improving IO Speed
Posted in reply to TJB
On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
> Do you have any suggestions for improving the speed in this situation?
>
> Thank you!
>
> TJB
I expect you'd get better performance with std.stdio rather than std.stream. std.stream is class-based and (AFAIK) not as optimized for performance.
I'd make it look like this:
void main()
{
    auto input = File("T201212A.IDX"); // a struct, not a class, so no `new`
    TaqIdx tmp;
    ...
From there, I'd use either `byChunk` or `rawRead`; I don't know which is more efficient.
    TaqIdx[] buf = (&tmp)[0 .. 1];
    while (input.rawRead(buf).length)
    {
        ...
    }
or
    ubyte[] buf = (cast(ubyte*)&tmp)[0 .. TaqIdx.sizeof];
    foreach (b; input.byChunk(buf))
    {
        ...
    }
Give it a try and see if it runs faster.
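For anyone who wants to try the two variants side by side, here is a minimal sketch. The helper `writeSample`, the function names, and the temporary file name are assumptions made for this example only; the struct layout is taken from the original post. It has not been benchmarked, so it says nothing about which variant is faster.

```d
import std.stdio : File, writeln;

// Packed on-disk record, copied from the original post.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper for this sketch only: writes n dummy records.
void writeSample(string path, size_t n)
{
    auto recs = new TaqIdx[](n);
    foreach (i, ref r; recs)
    {
        r.symbol[] = ' ';
        r.begrec = cast(int) i;
        r.endrec = cast(int) i;
    }
    File(path, "wb").rawWrite(recs);
}

// One record per rawRead call. rawRead returns the part of buf it filled,
// which is empty once no complete record is left.
size_t countWithRawRead(string path)
{
    auto input = File(path, "rb");
    TaqIdx tmp;
    TaqIdx[] buf = (&tmp)[0 .. 1];
    size_t count;
    while (input.rawRead(buf).length)
        ++count; // tmp holds the current record here
    return count;
}

// Same loop via byChunk, which fills a caller-supplied ubyte[] buffer.
size_t countWithByChunk(string path)
{
    auto input = File(path, "rb");
    TaqIdx tmp;
    ubyte[] buf = (cast(ubyte*) &tmp)[0 .. TaqIdx.sizeof];
    size_t count;
    foreach (chunk; input.byChunk(buf))
    {
        if (chunk.length == TaqIdx.sizeof) // ignore a truncated trailing record
            ++count;
    }
    return count;
}

void main()
{
    import std.file : remove, tempDir;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "taqidx_rawread_sketch.idx");
    writeSample(path, 1000);
    scope (exit) remove(path);
    writeln(countWithRawRead(path), " ", countWithByChunk(path));
}
```

Both functions still issue one library call per 22-byte record, so any speedup over std.stream would come from the C runtime's buffering rather than from fewer calls.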

March 14, 2014 Re: Improving IO Speed
Posted in reply to bearophile
On Friday, 14 March 2014 at 18:26:36 UTC, bearophile wrote:
> TJB:
>
>> Do you have any suggestions for improving the speed in this situation?
>
> I have never used readExact so far, so I don't have many suggestions. But try to not pack the struct.
Given he's using a raw read, I suspect he doesn't have a choice. That said, depending on how heavily the struct is used, he could unpack the struct post-rawRead.
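A rough sketch of what unpacking after the read could look like. `TaqIdxUnpacked` and `unpack` are hypothetical names introduced here, and the sample values in `main` are made up; whether the extra copy pays off depends on how often each record's fields are touched afterwards.

```d
import std.stdio : writeln;

// Packed on-disk layout, copied from the original post.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical in-memory counterpart with default (natural) alignment.
struct TaqIdxUnpacked
{
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// The packed form is 22 bytes; the natural layout pads symbol out to 12.
static assert(TaqIdx.sizeof == 22);
static assert(TaqIdxUnpacked.sizeof == 24);

// Copies field by field, so the unaligned loads happen once per record.
TaqIdxUnpacked unpack(ref const TaqIdx p)
{
    TaqIdxUnpacked u;
    u.symbol = p.symbol;
    u.tdate = p.tdate;
    u.begrec = p.begrec;
    u.endrec = p.endrec;
    return u;
}

void main()
{
    TaqIdx p;                 // stands in for a record filled by rawRead
    p.symbol[] = ' ';
    p.symbol[0 .. 4] = "AAPL"; // made-up sample value
    p.tdate = 20121203;
    p.begrec = 1;
    p.endrec = 42;

    auto u = unpack(p);
    writeln(u.symbol, " ", u.tdate, " ", u.begrec, " ", u.endrec);
}
```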

March 14, 2014 Re: Improving IO Speed
Posted in reply to TJB
On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
> I have a program in C++ that I am translating to D as a way to investigate and learn D. The program is used to process potentially hundreds of TB's of financial transactions data so it is crucial that it be performant. Right now the C++ version is orders of magnitude faster.
>
> Here is a simple example of what I am doing in D:
>
> import std.stdio : writefln;
> import std.stream;
>
> align(1) struct TaqIdx
> {
> align(1) char[10] symbol;
> align(1) int tdate;
> align(1) int begrec;
> align(1) int endrec;
> }
>
> void main()
> {
> auto input = new File("T201212A.IDX");
> TaqIdx tmp;
> int count;
>
> while(!input.eof())
> {
> input.readExact(&tmp, TaqIdx.sizeof);
> // Do something with the data
> }
> }
>
> Do you have any suggestions for improving the speed in this situation?
>
> Thank you!
>
> TJB
I am not sure how std.stream buffers data (the library has been marked for removal, so perhaps not very efficiently), but what happens if you read in a large array of your TaqIdx structs with each read?
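One way to try this with std.stdio is to hand `rawRead` a large slice of records instead of a single one. The sketch below assumes a batch size of 65536 records, which is an arbitrary choice, and uses the hypothetical helpers `writeSample` and `countBatched` so that it runs without the real data file.

```d
import std.stdio : File, writeln;

// Packed on-disk record, copied from the original post.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper for this sketch only: writes n dummy records.
void writeSample(string path, size_t n)
{
    auto recs = new TaqIdx[](n);
    foreach (ref r; recs)
        r.symbol[] = ' ';
    File(path, "wb").rawWrite(recs);
}

// Reads up to `batch` records per rawRead call and visits each one.
size_t countBatched(string path, size_t batch = 64 * 1024)
{
    auto input = File(path, "rb");
    auto buf = new TaqIdx[](batch); // allocated once, reused for every read
    size_t count;
    while (true)
    {
        auto got = input.rawRead(buf); // slice of buf that was actually filled
        if (got.length == 0)
            break;                     // end of file
        foreach (ref rec; got)
            ++count;                   // do something with rec here
    }
    return count;
}

void main()
{
    import std.file : remove, tempDir;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "taqidx_batch_sketch.idx");
    writeSample(path, 100_000);
    scope (exit) remove(path);
    writeln(countBatched(path));
}
```

The last batch is usually shorter than the buffer, which is why the loop iterates over the returned slice rather than over `buf` itself.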

March 14, 2014 Re: Improving IO Speed
Posted in reply to Craig Dillabaugh
On Friday, 14 March 2014 at 19:11:12 UTC, Craig Dillabaugh wrote:
> On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
>> I have a program in C++ that I am translating to D as a way to investigate and learn D. The program is used to process potentially hundreds of TB's of financial transactions data so it is crucial that it be performant. Right now the C++ version is orders of magnitude faster.
>>
>> Here is a simple example of what I am doing in D:
>>
>> import std.stdio : writefln;
>> import std.stream;
>>
>> align(1) struct TaqIdx
>> {
>> align(1) char[10] symbol;
>> align(1) int tdate;
>> align(1) int begrec;
>> align(1) int endrec;
>> }
>>
>> void main()
>> {
>> auto input = new File("T201212A.IDX");
>> TaqIdx tmp;
>> int count;
>>
>> while(!input.eof())
>> {
>> input.readExact(&tmp, TaqIdx.sizeof);
>> // Do something with the data
>> }
>> }
>>
>> Do you have any suggestions for improving the speed in this situation?
>>
>> Thank you!
>>
>> TJB
>
> I am not sure how std.stream buffers data (the library has been marked for removal, so perhaps not very efficiently), but what happens if you read in a large array of your TaqIdx structs with each read.
Well, one thing that I found out by experimentation was that if I replace
auto input = new File("T201212A.IDX");
with
auto input = new BufferedFile("T201212A.IDX");
the performance gap vanishes. Now I have nearly identical execution times for the C++ and D versions. But if std.stream is scheduled for removal, perhaps I shouldn't be using it?

March 15, 2014 Re: Improving IO Speed
Posted in reply to TJB
Did you try the setvbuf method of std.stdio.File?
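A small sketch of how `setvbuf` might be used here, assuming a 1 MiB buffer (an arbitrary size, not a measured optimum). `writeSample` and `countWithBigBuffer` are hypothetical names for this example. The call has to come before the first read on the file.

```d
import std.stdio : File, writeln;

// Packed on-disk record, copied from the original post.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical helper for this sketch only: writes n dummy records.
void writeSample(string path, size_t n)
{
    auto recs = new TaqIdx[](n);
    foreach (ref r; recs)
        r.symbol[] = ' ';
    File(path, "wb").rawWrite(recs);
}

// Reads record by record, but asks the C runtime for a larger stream buffer.
size_t countWithBigBuffer(string path, size_t bufferBytes = 1 << 20)
{
    auto input = File(path, "rb");
    input.setvbuf(bufferBytes); // before any I/O; mode defaults to full buffering
    TaqIdx tmp;
    TaqIdx[] buf = (&tmp)[0 .. 1];
    size_t count;
    while (input.rawRead(buf).length)
        ++count; // tmp holds the current record here
    return count;
}

void main()
{
    import std.file : remove, tempDir;
    import std.path : buildPath;

    auto path = buildPath(tempDir, "taqidx_setvbuf_sketch.idx");
    writeSample(path, 10_000);
    scope (exit) remove(path);
    writeln(countWithBigBuffer(path));
}
```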

March 21, 2014 Re: Improving IO Speed
Posted in reply to TJB
On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote:
> align(1) struct TaqIdx
> {
> align(1) char[10] symbol;
> align(1) int tdate;
> align(1) int begrec;
> align(1) int endrec;
> }
Won't help with speed, but you can write it with less repetition:
align(1) struct TaqIdx
{
align(1):
char[10] symbol;
int tdate;
int begrec;
int endrec;
}
The outer align(1) is still necessary to avoid the padding.
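The layout can be checked at compile time. In the sketch below, `TaqIdxInnerOnly` is a hypothetical comparison type that drops the outer attribute; its size is printed rather than asserted, since I am not certain every compiler version lays it out the same way.

```d
import std.stdio : writeln;

// Shortened form from the post above: outer align(1) plus an align(1): block.
align(1) struct TaqIdx
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// Hypothetical comparison type: fields packed, but no outer align(1).
struct TaqIdxInnerOnly
{
align(1):
    char[10] symbol;
    int tdate;
    int begrec;
    int endrec;
}

// 10 + 4 + 4 + 4 bytes, with no gaps between fields and no tail padding.
static assert(TaqIdx.sizeof == 22);
static assert(TaqIdx.symbol.offsetof == 0);
static assert(TaqIdx.tdate.offsetof == 10);
static assert(TaqIdx.begrec.offsetof == 14);
static assert(TaqIdx.endrec.offsetof == 18);

void main()
{
    writeln("packed size:     ", TaqIdx.sizeof);
    writeln("inner-only size: ", TaqIdxInnerOnly.sizeof,
            ", alignof ", TaqIdxInnerOnly.alignof);
}
```

A `static assert` like this next to the struct definition also guards against the record size silently drifting away from the file format.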

May 09, 2014 Re: Improving IO Speed
Posted in reply to TJB
Posted in reply to TJB | Try this; import std.mmfile; scope mmFile = new MmFile("T201212A.IDX"); TaqIdx* arr = cast(TaqIdx*)mmFile[0..mmFile.length].ptr; for (ulong i = 0; i < mmFile.length/TaqIdx.sizeof; ++i) { // do something... writeln(arr[i].symbol); } On Friday, 14 March 2014 at 18:00:58 UTC, TJB wrote: > I have a program in C++ that I am translating to D as a way to investigate and learn D. The program is used to process potentially hundreds of TB's of financial transactions data so it is crucial that it be performant. Right now the C++ version is orders of magnitude faster. > > Here is a simple example of what I am doing in D: > > import std.stdio : writefln; > import std.stream; > > align(1) struct TaqIdx > { > align(1) char[10] symbol; > align(1) int tdate; > align(1) int begrec; > align(1) int endrec; > } > > void main() > { > auto input = new File("T201212A.IDX"); > TaqIdx tmp; > int count; > > while(!input.eof()) > { > input.readExact(&tmp, TaqIdx.sizeof); > // Do something with the data > } > } > > Do you have any suggestions for improving the speed in this situation? > > Thank you! > > TJB |