August 08, 2015 file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
I'm playing around with the range based operations and with raw file io. I couldn't figure out a way to get rid of the outer foreach loops.
Nice execution time of 537 msec for this, which creates and reads back a file of about 160MB (20_000_000 doubles).
import std.algorithm;
import std.stdio;
import std.conv;
import std.math;
import std.range;
import std.file;
import std.datetime;
import std.array;
void main()
{
    auto fn = "numberList.db";
    auto f = File(fn,"wb");
    scope(exit) std.file.remove(fn);
    std.datetime.StopWatch sw;
    sw.start();
    foreach(elem; chunks(iota(10.5,20_000_010.5,1.0),1000000)){
        f.rawWrite(elem.array());
    }
    f.close();
    f = File(fn,"rb");
    const int n = 1000000;
    double dbv[] = new double[n];
    foreach(i; iota(10,20_000_000+10,n)){
        f.rawRead!(double)(dbv);
    }
    f.close();
    long tm = sw.peek().msecs;
    writeln("time msecs:", tm);
}
|
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On 08/08/2015 04:11 PM, Jay Norwood wrote:
> I'm playing around with the range based operations and with raw file
> io. I couldn't figure out a way to get rid of the outer foreach loops.
When the body of the foreach loop is there only for its side effect, std.algorithm.each can be useful:
import std.algorithm;
import std.stdio;
import std.range;
import std.datetime;
import std.file;    // needed for std.file.remove below
void main()
{
    auto fn = "numberList.db";
    std.datetime.StopWatch sw;
    sw.start();
    scope(exit) std.file.remove(fn);
    {
        // f is closed when this scope ends
        auto f = File(fn,"wb");
        iota(10.5, 20_000_010.5, 1.0)
            .chunks(1000000)
            .each!(a => f.rawWrite(a.array));
    }
    {
        auto f = File(fn,"rb");
        const int n = 1000000;
        // NOTE: D-style syntax on the left-hand side
        double[] dbv = new double[n];
        // NOTE: No need to tell rawRead the type as double
        iota(10, 20_000_000 + 10, n)
            .each!(a => f.rawRead(dbv));
    }
    long tm = sw.peek().msecs;
    writeln("time msecs:", tm);
}
Ali
|
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>
> {
> auto f = File(fn,"wb");
>
> iota(10.5, 20_000_010.5, 1.0)
> .chunks(1000000)
> .each!(a => f.rawWrite(a.array));
> }
>
> Ali
Thanks. There are many examples of numeric to string data output in the docs, saving byLine. Those are on the order of 30x slower than this rawWrite example. This will be more useful to many people.
|
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
> // NOTE: No need to tell rawRead the type as double
> iota(10, 20_000_000 + 10, n)
> .each!(a => f.rawRead(dbv));
> }
>
> Ali
Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) produces a compiler error in 2.067.1. The f.rawRead!(double)(dbv) form works.
Error: template instance rawRead!(dbv) does not match template declaration rawRead(T)(T[] buffer)
|
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jay Norwood | On 08/08/2015 07:07 PM, Jay Norwood wrote:
> On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>> // NOTE: No need to tell rawRead the type as double
>> iota(10, 20_000_000 + 10, n)
>> .each!(a => f.rawRead(dbv));
>> }
>>
>> Ali
>
> Your f.rawRead(dbv) form compiles, but f.rawRead!(dbv) produces a
> compiler error in 2.067.1. The f.rawRead!(double)(dbv)
> form works.
rawRead is a member function template with one template parameter and one function parameter:
    T[] rawRead(T)(T[] buffer);
The single template parameter T is the element type of its function parameter, which is a dynamic array. In this case, function template type deduction works and the template parameter need not be provided because dbv is of type double[] and it is obvious that T is double:
    f.rawRead(dbv) // <- compiles
It is the same thing as providing T explicitly as double:
    f.rawRead!(double)(dbv) // <- compiles
The code that does not compile has an error because it provides dbv as a template argument (because it is in the parameter list that comes right after !):
    f.rawRead!(dbv) // oops, dbv should be the function argument not the template argument
Ali
|
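The deduction rule described in the post above can be tried without touching a file. The following is a minimal sketch; `fillWith` is a hypothetical function invented here only because it has the same shape as `rawRead` (one template parameter `T`, one `T[]` function parameter). It is not part of Phobos.

```d
import std.stdio : writeln;

// Hypothetical stand-in with the same shape as File.rawRead:
// one template parameter T and a function parameter of type T[].
T[] fillWith(T)(T[] buffer, T value)
{
    buffer[] = value;   // array-wise assignment to every element
    return buffer;
}

void main()
{
    auto dbv = new double[4];

    fillWith(dbv, 1.5);            // T is deduced as double from double[]
    fillWith!(double)(dbv, 2.5);   // T is given explicitly
    fillWith!double(dbv, 3.5);     // parentheses are optional for a single token

    // fillWith!(dbv)(dbv, 4.5) would be rejected by the compiler:
    // dbv is a variable, but T must be a type.

    writeln(dbv);
}
```

All three calls instantiate the same `fillWith!double`; only the commented-out form puts a variable where a type is expected.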
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Ali Çehreli | On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
> Ali
Now benchmarks write and read separately:
https://github.com/nordlow/justd/blob/0d746b2c1800a82a61a6cb7edcabfd9664066b2c/tests/t_rawio.d
Couldn't the chunk logic be deduced as well? Something like:
void rawWriteInAutoChunks(R)(File f, R r)
{
    // number of elements that fit in one preferred-size write
    const count = preferred_disk_write_size / (ElementType!R).sizeof;
    r.chunks(count).each!(a => f.rawWrite(a.array));
}
What would a suitable value for `preferred_disk_write_size` be?
|
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
> Couldn't the chunk logic be deduced as well?
Yes :)
See update at:
https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d
> What would a suitable value for `preferred_disk_write_size` be?
Is there a suitable constant somewhere in Phobos?
|
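One way to sidestep the open question about a preferred size is to make the byte budget a parameter. The following is a sketch under assumptions, not the code from the linked repository: the helper name `rawWriteInChunks`, the file name, and the 64 KiB default are all made up for illustration, and the default is not a measured optimum.

```d
import std.algorithm : each, max;
import std.array : array;
import std.range : chunks, iota, ElementType;
import std.stdio : File;
static import std.file;

// Hypothetical helper: the name and the chunkBytes default are assumptions,
// not Phobos API and not a tuned value.
void rawWriteInChunks(R)(File f, R r, size_t chunkBytes = 64 * 1024)
{
    alias E = ElementType!R;
    // Keep at least one element per chunk, even if E is larger than chunkBytes.
    immutable count = max(size_t(1), chunkBytes / E.sizeof);
    r.chunks(count).each!(c => f.rawWrite(c.array));
}

void main()
{
    enum fn = "chunkDemo.db";
    scope(exit) std.file.remove(fn);
    {
        auto f = File(fn, "wb");
        f.rawWriteInChunks(iota(0.5, 1000.5, 1.0));   // 1000 doubles
    } // f goes out of scope here, which closes the file
    assert(std.file.getSize(fn) == 1000 * double.sizeof);
}
```

With the budget exposed as an argument, different sizes can be benchmarked on a given drive instead of hard-coding a guess.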
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On Sunday, 9 August 2015 at 11:06:34 UTC, Nordlöw wrote:
> On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
>> Couldn't the chunk logic be deduced as well?
>
> Yes :)
>
> See update at:
>
> https://github.com/nordlow/justd/blob/a633b52876388921ec49c189f374746f7b4d8c93/tests/t_rawio.d
>
>> What would a suitable value for `preferred_disk_write_size` be?
>
> Is there a suitable constant somewhere in Phobos?
So, to be clear, I think you must be saying that you want to specify the disk chunk size separate from the array size. Is that correct?
I stepped through the original code (with the foreach loops) and I see single calls to fwrite and fread for each array.
The rawWrite is executing a single fwrite per array
f.rawWrite(elem.array())
auto result =
.fwrite(buffer.ptr, T.sizeof, buffer.length, _p.handle);
The rawRead is executing a single fread per array
immutable result =
fread(buffer.ptr, T.sizeof, buffer.length, _p.handle);
|
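Because each `rawRead` call maps to one `fread`, the returned slice matters whenever the file length is not a multiple of the buffer length: `rawRead` returns the part of the buffer it actually filled, which is shorter at end of file. The sketch below shows a read loop that relies on this; `readAllDoubles` and the file name are hypothetical names chosen for the example.

```d
import std.stdio : File;
static import std.file;

// Hypothetical helper: reads a whole file of doubles through a fixed-size
// buffer. bufferLen must be non-zero, since rawRead rejects an empty buffer.
double[] readAllDoubles(string path, size_t bufferLen)
{
    auto f = File(path, "rb");
    auto buffer = new double[bufferLen];
    double[] result;
    for (;;)
    {
        auto got = f.rawRead(buffer);   // may be shorter than buffer at end of file
        if (got.length == 0) break;
        result ~= got;                  // appending copies, so reusing buffer is safe
    }
    return result;
}

void main()
{
    enum fn = "shortReadDemo.db";
    scope(exit) std.file.remove(fn);
    {
        auto f = File(fn, "wb");
        f.rawWrite([1.0, 2.0, 3.0, 4.0, 5.0]);
    }
    // A buffer of 2 doubles against 5 stored values: reads of 2, 2, 1, then 0.
    assert(readAllDoubles(fn, 2) == [1.0, 2.0, 3.0, 4.0, 5.0]);
}
```

The benchmark loops earlier in the thread ignore the return value, which is harmless there because 20_000_000 is an exact multiple of the 1_000_000-element buffer.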
August 09, 2015 Re: file rawRead and rawWrite in chunks example | ||||
---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On Sunday, 9 August 2015 at 10:40:06 UTC, Nordlöw wrote:
> On Sunday, 9 August 2015 at 00:50:16 UTC, Ali Çehreli wrote:
>> Ali
>
> Now benchmarks write and read separately:
>
>
I benchmarked my first results:
D:\visd\raw\raw\Release>raw
time write msecs:457
time read msecs:75
This is for 160MB of data. The write includes initialization of the values.
The read time is faster than my SSD can deliver, so I have to assume this is win7 or the SSD caching the data.
If I increase the count of doubles to 200,000,000 (1.6GB of data), the times are:
D:\visd\raw\raw\Release>raw
time write msecs:7236
time read msecs:11979
08/09/2015 10:12 AM 1,600,000,000 numberList.db
So that's around 220MB/sec for the writes and 133MB/sec for the reads. That's an Intel 520 series 180GB SSD, but on a SATA 3Gb/s interface in a laptop. Sequential write speed for that SSD should be about 257MB/sec. Sequential read should be close to 395MB/sec for this drive on a 6Gb/sec SATA interface. So read speed is lower than I'd expect.
If I move this program over to my work computer, the same 1.6GB measurement returns the times below on a Samsung 840 SSD, which is on a 6Gb/sec SATA interface. I believe the 458MB/sec write speed. I suspect the read timing is again just measuring win7's cached data.
J:\visd>raw
time write msecs:3489
time read msecs:579
|
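The throughput figures quoted in the post above follow from the file size and the elapsed milliseconds. A small sketch of that arithmetic, assuming decimal megabytes (1 MB = 1_000_000 bytes); `mbPerSec` is a hypothetical helper name:

```d
import std.stdio : writefln;

// Decimal megabytes per second (1 MB = 1_000_000 bytes).
double mbPerSec(ulong bytes, long msecs)
{
    return (bytes / 1_000_000.0) / (msecs / 1000.0);
}

void main()
{
    // 200,000,000 doubles of 8 bytes each: 1,600,000,000 bytes.
    enum ulong bytes = 200_000_000UL * double.sizeof;

    // Timings taken from the two runs reported in the post.
    writefln("laptop write: %.0f MB/s", mbPerSec(bytes, 7236));
    writefln("laptop read:  %.0f MB/s", mbPerSec(bytes, 11979));
    writefln("work write:   %.0f MB/s", mbPerSec(bytes, 3489));
    writefln("work read:    %.0f MB/s", mbPerSec(bytes, 579));
}
```

The last figure works out to well over 2 GB/s, more than a 6Gb/s SATA link can carry (about 600 MB/s after 8b/10b encoding), which is consistent with the suspicion that the fast read was served from cache rather than from the drive.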