September 24, 2011 Multithreaded file IO?
Hi folks,
I wasn't sure whether this should go here or in the D devel list...
I'm trying to port a program where threads read from a file, process the data, then write the output data. The program is cpu-bound. In C++ I can do something like this:
class QueueIn {
    ifstream in;
    mutex m;
    string get() {
        string s;
        m.lock();
        getline(in, s);
        m.unlock();
        return s;
    }
};
class QueueOut {
    ofstream out;
    mutex m;
    void put(string s) {
        m.lock();
        out.write(s);
        m.unlock();
    }
};
In D, I'm so far having trouble figuring out the right idiom to do what I want. I looked at std.parallel, but it seems a bit tricky to make my stuff work in this setting. A std.stdio File cannot be part of shared class.
How would you do this with the latest D2?
Thanks
Jerry
September 24, 2011 Re: Multithreaded file IO?
Posted in reply to Jerry Quinn
On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
> Hi folks,
>
> I wasn't sure whether this should go here or in the D devel list...
>
> I'm trying to port a program where threads read from a file, process the data, then write the output data. The program is cpu-bound. In C++ I can do something like this:
>
> class QueueIn {
>     ifstream in;
>     mutex m;
>
>     string get() {
>         string s;
>         m.lock();
>         getline(in, s);
>         m.unlock();
>         return s;
>     }
> };
>
> class QueueOut {
>     ofstream out;
>     mutex m;
>     void put(string s) {
>         m.lock();
>         out.write(s);
>         m.unlock();
>     }
> };
>
>
> In D, I'm so far having trouble figuring out the right idiom to do what I want. I looked at std.parallel, but it seems a bit tricky to make my stuff work in this setting. A std.stdio File cannot be part of shared class.
>
> How would you do this with the latest D2?
A direct rewrite would involve using shared and synchronized (either on the class or a synchronized block around the code that you want to lock). However, the more idiomatic way to do it would be to use std.concurrency and have the threads pass messages to each other using send and receive.
So, what you'd probably do is spawn 3 threads from the main thread. One would read the file and send the data to another thread. That second thread would process the data, then it would send it to the third thread, which would write it to disk.
Unfortunately, I'm not aware of any good code examples of this sort of thing online. TDPL has some good examples, but obviously you'd have to have the book to read it. Given some time, I could probably cook up an example, but I don't have anything on hand.
- Jonathan M Davis
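The three-thread layout described above can be sketched with std.concurrency roughly as follows. This is a minimal sketch, not code from the thread: the `Done` sentinel type, the `process` placeholder (upper-casing stands in for the real CPU-bound work), and the function names are all assumptions made for illustration.

```d
import std.concurrency : Tid, ownerTid, receive, receiveOnly, send, spawn;
import std.stdio : File;
import std.uni : toUpper;

// Hypothetical sentinel message that marks the end of the input.
struct Done {}

// Placeholder for the real CPU-bound work (assumption).
string process(string line)
{
    return line.toUpper;
}

// Stage 3: receives processed lines and writes them to outPath.
void writer(string outPath)
{
    auto output = File(outPath, "w");
    bool running = true;
    while (running)
    {
        receive(
            (string line) { output.writeln(line); },
            (Done _) { running = false; }
        );
    }
    output.close();
    ownerTid.send(Done()); // tell the spawning thread the file is complete
}

// Stage 2: processes each line and forwards the result to the writer.
void processor(Tid writerTid)
{
    bool running = true;
    while (running)
    {
        receive(
            (string line) { writerTid.send(process(line)); },
            (Done d) { writerTid.send(d); running = false; }
        );
    }
}

// Stage 1: reads the input and feeds the processor.
void reader(string inPath, Tid processorTid)
{
    foreach (line; File(inPath).byLine)
        processorTid.send(line.idup); // byLine reuses its buffer, so copy
    processorTid.send(Done());
}

// Spawns the three stages and blocks until the writer reports completion.
void runPipeline(string inPath, string outPath)
{
    auto writerTid = spawn(&writer, outPath);
    auto processorTid = spawn(&processor, writerTid);
    spawn(&reader, inPath, processorTid);
    receiveOnly!Done();
}

void main(string[] args)
{
    runPipeline(args[1], args[2]);
}
```

Messages from one sender arrive in the order they were sent, so with a single processor thread the output keeps the input order. Mailboxes are unbounded by default; for large inputs, std.concurrency's `setMaxMailboxSize` is probably worth a look so that a fast reader cannot queue the whole file in memory.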
September 24, 2011 Re: Multithreaded file IO?
Posted in reply to Jonathan M Davis
Jonathan M Davis Wrote:
> On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
>
> A direct rewrite would involve using shared and synchronized (either on the class or a synchronized block around the code that you want to lock). However, the more idiomatic way to do it would be to use std.concurrency and have the threads pass messages to each other using send and receive.
I'm trying the direct rewrite but having problems with shared and synchronized.
class queue {
    File file;
    this(string infile) {
        file.open(infile);
    }
    synchronized void put(string s) {
        file.writeln(s);
    }
}
queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any function template declaration
queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce template function from argument types !()(string)
Remove the synchronized and it compiles fine with 2.055.
> So, what you'd probably do is spawn 3 threads from the main thread. One would read the file and send the data to another thread. That second thread would process the data, then it would send it to the third thread, which would write it to disk.
I think that would become messy when you have multiple processing threads. The reader and writer would have to handshake with all the processors.
> Unfortunately, I'm not aware of any good code examples of this sort of thing online. TDPL has some good examples, but obviously you'd have to have the book to read it. Given some time, I could probably cook up an example, but I don't have anything on hand.
std.parallelism actually looks the closest to what I want. Not sure if I can make it work easily though.
September 24, 2011 Re: Multithreaded file IO?
Posted in reply to Jerry Quinn
On Saturday, September 24, 2011 01:05:52 Jerry Quinn wrote:
> Jonathan M Davis Wrote:
> > On Friday, September 23, 2011 23:01:17 Jerry Quinn wrote:
> >
> > A direct rewrite would involve using shared and synchronized (either on the class or a synchronized block around the code that you want to lock). However, the more idiomatic way to do it would be to use std.concurrency and have the threads pass messages to each other using send and receive.
>
> I'm trying the direct rewrite but having problems with shared and synchronized.
>
> class queue {
>     File file;
>     this(string infile) {
>         file.open(infile);
>     }
>     synchronized void put(string s) {
>         file.writeln(s);
>     }
> }
>
> queue.d(10): Error: template std.stdio.File.writeln(S...) does not match any function template declaration
> queue.d(10): Error: template std.stdio.File.writeln(S...) cannot deduce template function from argument types !()(string)
>
> Remove the synchronized and it compiles fine with 2.055.
Technically, synchronized should go on the _class_ and not the function, but I'm not sure if dmd is currently correctly implemented in that respect (since if it is, it should actually be an error to stick synchronized on a function). Regardless of that, however, unless you use shared (I don't know if you are), each instance of queue is going to be on its own thread. So, no mutexes will be necessary, but you won't be able to have multiple threads writing to the same File. It could get messy.
You could use a synchronized block instead of synchronizing the class
void put(string s)
{
    synchronized(this)
    {
        file.writeln(s);
    }
}
and see if that works. But you still need the variable to be shared.
> > So, what you'd probably do is spawn 3 threads from the main thread. One would read the file and send the data to another thread. That second thread would process the data, then it would send it to the third thread, which would write it to disk.
>
> I think that would become messy when you have multiple processing threads. The reader and writer would have to handshake with all the processors.
The reader thread would be free to read at whatever rate it can. The processing thread would be free to process at whatever rate it can. The writer thread would be free to write at whatever rate it can. I/O issues would be reduced (especially if the files being read and written are on separate drives), since the reading and writing threads would be separate. I don't see why it would be particularly messy. It's essentially what TDPL suggests a file copy function should do, except that there's a thread in the middle which does some processing of the data before sending it to the thread doing the writing. Really, this is a classic example of the sort of situation which std.concurrency's message passing is intended for.
> > Unfortunately, I'm not aware of any good code examples of this sort of thing online. TDPL has some good examples, but obviously you'd have to have the book to read it. Given some time, I could probably cook up an example, but I don't have anything on hand.
>
> std.parallelism actually looks the closest to what I want. Not sure if I can make it work easily though.
For std.parallelism to work, each iteration of a parallel foreach _must_ be _completely_ separate from the others. They can't access the same data. So, for instance, they could access separate elements in an array, but they can't ever access the same element (unless none of them write to it), and something like a file is not going to work at all, since then you'd be trying to write to the file from multiple threads at once with no synchronization whatsoever. std.parallelism is for when you do the same thing many times _in parallel_ with each other, and your use case does not sound like that at all. I really think that std.concurrency is what you should be using if you don't want to do it the C/C++ way and use shared.
- Jonathan M Davis
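For comparison, the synchronized-block variant discussed above might look roughly like the sketch below. It makes one loud assumption: instead of typing the object as `shared`, it stores the reference in a `__gshared` global, which bypasses the compiler's sharing checks entirely, so correctness rests on every access going through the lock. The names `QueueOut`, `sink`, `makeWorker`, and `writeFromThreads` are hypothetical.

```d
import core.thread : Thread;
import std.conv : to;
import std.stdio : File;

// Hypothetical D counterpart of the C++ QueueOut: one File, one lock.
final class QueueOut
{
    private File file;

    this(string path)
    {
        file = File(path, "w");
    }

    // Only one thread at a time may be inside the synchronized block.
    void put(string s)
    {
        synchronized (this)
        {
            file.writeln(s);
        }
    }

    void close()
    {
        synchronized (this)
        {
            file.close();
        }
    }
}

// ASSUMPTION: __gshared makes this a process-wide global without the
// `shared` qualifier; the compiler no longer checks cross-thread access.
__gshared QueueOut sink;

// Builds the thread body in a helper so each thread gets its own `id`.
void delegate() makeWorker(int id, int lineCount)
{
    return () {
        foreach (n; 0 .. lineCount)
            sink.put("worker " ~ id.to!string ~ ", line " ~ n.to!string);
    };
}

// Starts threadCount threads that each write lineCount lines to path.
void writeFromThreads(string path, int threadCount, int lineCount)
{
    sink = new QueueOut(path);
    Thread[] workers;
    foreach (id; 0 .. threadCount)
        workers ~= new Thread(makeWorker(id, lineCount)).start();
    foreach (t; workers)
        t.join();
    sink.close();
}

void main()
{
    writeFromThreads("out.txt", 4, 3);
}
```

Lines from different workers interleave in an unspecified order, but each line is written whole because `put` holds the lock for the full `writeln` call.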
September 24, 2011 Re: Multithreaded file IO?
Posted in reply to Jerry Quinn
If you didn't know, the concurrency chapter of TDPL is a free chapter: http://www.informit.com/articles/article.aspx?p=1609144
It has an example of file copying with message passing: http://www.informit.com/articles/article.aspx?p=1609144&seqNum=7
September 25, 2011 Re: Multithreaded file IO?
Posted in reply to Lutger Blijdestijn
Lutger Blijdestijn Wrote:
> If you didn't know, the concurrency chapter of tdpl is a free chapter: http://www.informit.com/articles/article.aspx?p=1609144
>
> It has an example of file copying with message passing: http://www.informit.com/articles/article.aspx?p=1609144&seqNum=7
What I really want is a shared fifo where the input is lines from a file, and many workers grab something from the fifo. They then push their results into a shared reordering output queue.
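The "reordering output queue" half of this idea can be sketched independently of any threading. The sketch below assumes each input line is tagged with a sequence number when it is read, and that a single writer thread owns the buffer; the `Reorderer` type and its `put` method are hypothetical names, not an existing Phobos facility.

```d
// Hypothetical helper: holds results that arrive early and releases them
// strictly in sequence order. Intended to be owned by one writer thread.
struct Reorderer
{
    private string[size_t] pending; // early results, keyed by sequence number
    private size_t next;            // sequence number that must be emitted next

    // Stores result number `seq` and returns every result that is now
    // ready to be written, in order (possibly none).
    string[] put(size_t seq, string result)
    {
        pending[seq] = result;
        string[] ready;
        while (true)
        {
            auto p = next in pending;
            if (p is null)
                break;
            ready ~= *p;
            pending.remove(next);
            ++next;
        }
        return ready;
    }
}

void main()
{
    Reorderer r;
    assert(r.put(1, "b").length == 0);     // 0 has not arrived, so hold 1 back
    assert(r.put(0, "a") == ["a", "b"]);   // 0 arrives and releases 0 and 1
    assert(r.put(2, "c") == ["c"]);        // 2 is next in line already
}
```

Memory use is bounded by how far the slowest outstanding item lags behind the fastest worker, not by the size of the input.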
September 26, 2011 Re: Multithreaded file IO?
Posted in reply to Jerry Quinn
Jerry Quinn, in message (digitalmars.D.learn:29763), wrote:
> Lutger Blijdestijn Wrote:
>
>> If you didn't know, the concurrency chapter of tdpl is a free chapter: http://www.informit.com/articles/article.aspx?p=1609144
>>
>> It has an example of file copying with message passing: http://www.informit.com/articles/article.aspx?p=1609144&seqNum=7
>
> What I really want is a shared fifo where the input is lines from a file, and many workers grab something from the fifo. They then push their results into a shared reordering output queue.
My two cents:
Does the queue really have to be a file?
You could read it completely before starting, and then just share
your instructions as strings, for example.
That may make your life easier.
I would actually use a file if I were using multiprocessing, but not
necessarily for multithreading.
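If the input does fit in memory, the read-everything-first approach pairs naturally with std.parallelism. The following is a minimal sketch under that assumption; `process` is a placeholder for the real work and `processAll` is a hypothetical name.

```d
import std.array : array;
import std.parallelism : parallel;
import std.stdio : File, writeln;
import std.uni : toUpper;

// Placeholder for the real CPU-bound work (assumption).
string process(string line)
{
    return line.toUpper;
}

// Processes all lines in parallel and returns the results in input order.
string[] processAll(string[] lines)
{
    auto results = new string[lines.length];
    // Iteration i reads lines[i] and writes only results[i], so no two
    // iterations ever touch the same element.
    foreach (i, line; parallel(lines))
        results[i] = process(line);
    return results;
}

void main(string[] args)
{
    // ASSUMPTION: the whole input fits in memory.
    auto lines = File(args[1]).byLineCopy.array;
    foreach (result; processAll(lines))
        writeln(result);
}
```

Because each result lands in the slot matching its input index, the output order is preserved without any explicit reordering step.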
September 28, 2011 Re: Multithreaded file IO?
Posted in reply to Christophe
travert@phare.normalesup.org (Christophe) writes:
> Jerry Quinn, in message (digitalmars.D.learn:29763), wrote:
>> What I really want is a shared fifo where the input is lines from a file, and many workers grab something from the fifo. They then push their results into a shared reordering output queue.
>
> My two cents:
>
> Does the queue really have to be a file?
> You could read it completely before starting, and then just share
> your instructions as strings, for example.
Yes, these files could be large enough that the memory cost of loading is an issue. Also, I should be able to do this with input from stdin.
At this point, I'm trying to figure out how to implement a shared fifo in D as much as solve my original problem :-)
Jerry
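One way a bounded shared FIFO could be put together from druntime's `core.sync` primitives is sketched below. It is an illustration under stated assumptions, not an established idiom from this thread: `BlockingQueue`, `roundTrip`, and their members are hypothetical names, the class is deliberately not typed as `shared` (so the compiler does not check cross-thread use), and the capacity is assumed to be at least 1.

```d
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical bounded FIFO guarded by one mutex and two conditions.
final class BlockingQueue(T)
{
    private T[] items;
    private bool closed;
    private immutable size_t capacity;
    private Mutex mtx;
    private Condition notEmpty;
    private Condition notFull;

    this(size_t maxItems) // ASSUMPTION: maxItems >= 1
    {
        capacity = maxItems;
        mtx = new Mutex;
        notEmpty = new Condition(mtx);
        notFull = new Condition(mtx);
    }

    // Blocks while the queue is full, then appends the item.
    void put(T item)
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        while (items.length >= capacity)
            notFull.wait();
        items ~= item;
        notEmpty.notify();
    }

    // Signals that no more items will be added and wakes all consumers.
    void close()
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        closed = true;
        notEmpty.notifyAll();
    }

    // Blocks until an item is available; returns false once the queue
    // has been closed and drained.
    bool take(out T item)
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        while (items.length == 0 && !closed)
            notEmpty.wait();
        if (items.length == 0)
            return false;
        item = items[0];
        items = items[1 .. $]; // simple, not the most memory-efficient
        notFull.notify();
        return true;
    }
}

// Pushes input through the queue to one consumer thread and returns
// what the consumer saw.
string[] roundTrip(string[] input, size_t capacity)
{
    auto queue = new BlockingQueue!string(capacity);
    string[] seen;
    auto consumer = new Thread({
        string s;
        while (queue.take(s))
            seen ~= s;
    }).start();
    foreach (s; input)
        queue.put(s);
    queue.close();
    consumer.join();
    return seen;
}

void main()
{
    assert(roundTrip(["a", "b", "c", "d"], 2) == ["a", "b", "c", "d"]);
}
```

The bound on the queue addresses the memory concern: a reader that outpaces the workers blocks in `put` instead of buffering the whole file.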
September 29, 2011 Re: Multithreaded file IO?
Posted in reply to Jerry
Jerry, in message (digitalmars.D.learn:29830), wrote:
> travert@phare.normalesup.org (Christophe) writes:
>
>> Jerry Quinn, in message (digitalmars.D.learn:29763), wrote:
>>> What I really want is a shared fifo where the input is lines from a file, and many workers grab something from the fifo. They then push their results into a shared reordering output queue.
>>
>> My two cents:
>>
>> Does the queue really have to be a file?
>> You could read it completely before starting, and then just share
>> your instructions as strings, for example.
>
> Yes, these files could be large enough that the memory cost of loading is an issue. Also, I should be able to do this with input from stdin.
>
> At this point, I'm trying to figure out how to implement a shared fifo in D as much as solve my original problem :-)
OK, one idea then would be to make a separate thread that deals with the File object (or a big shared object well protected by a mutex) and distributes instructions that are as immutable as possible.
Good luck.
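The "one thread owns the File" idea could be sketched with std.concurrency as a pull model: workers ask the owner for a line, and the owner answers with an immutable string or an end marker. This is a sketch under assumptions, not a tested design from the thread; `NoMoreWork`, `Result`, `worker`, `runPool`, and the upper-casing `process` placeholder are all hypothetical.

```d
import std.concurrency : Tid, ownerTid, receive, send, spawn, thisTid;
import std.stdio : File, stdin, stdout;
import std.uni : toUpper;

struct NoMoreWork {}                       // hypothetical end-of-input reply
struct Result { size_t seq; string text; } // hypothetical result message

// Placeholder for the real CPU-bound work (assumption).
string process(string line)
{
    return line.toUpper;
}

// Worker: repeatedly asks the owner for a line, processes it, and
// sends the tagged result back.
void worker()
{
    bool running = true;
    while (running)
    {
        ownerTid.send(thisTid); // request the next line
        receive(
            (size_t seq, string line) { ownerTid.send(Result(seq, process(line))); },
            (NoMoreWork _) { running = false; }
        );
    }
}

// Owner: the only thread that touches the input and output files.
// Returns the number of lines handed out.
size_t runPool(File input, File output, size_t workerCount)
{
    foreach (_; 0 .. workerCount)
        spawn(&worker);

    auto lines = input.byLineCopy; // yields immutable strings, safe to send
    size_t seq = 0;
    size_t retired = 0;
    while (retired < workerCount)
    {
        receive(
            (Tid who)
            {
                if (lines.empty)
                {
                    who.send(NoMoreWork());
                    ++retired;
                }
                else
                {
                    who.send(seq++, lines.front);
                    lines.popFront();
                }
            },
            (Result r) { output.writeln(r.seq, "\t", r.text); }
        );
    }
    return seq;
}

void main()
{
    runPool(stdin, stdout, 4);
}
```

Each worker sends its result before its next request, and messages from one sender are delivered in order, so by the time every worker has been told there is no more work, all results have been written. Results are written in completion order with their sequence number attached; restoring input order would need an extra reordering step on the owner side.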
Copyright © 1999-2021 by the D Language Foundation