Hello everyone. We have some D code running in production that reads files containing lines of JSON data, which we then parse and process.
These files can be processed in parallel, so we create one thread per file. However, I noticed a significant slowdown when processing multiple files in parallel: with three files, each thread takes roughly three times as long (about 4.3 s) as a single thread processing one file on its own (about 1.5 s).
The code at the end of this post reproduces the issue. Each input file contains the same JSON line copy-pasted 100k times, like so:
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
...
Running the program with one file, then with three files, gives the following output:
➜ ./test 1
(file ) (thread id 140310703728384) starting processing file
(file )Done in 1 sec, 549 ms, 257 μs, and 6 hnsecs
➜ ./test 3
(file ) (thread id 140071550318336) starting processing file
(file ) (thread id 140078235236096) starting processing file
(file ) (thread id 140078221063936) starting processing file
(file )Done in 4 secs, 296 ms, 780 μs, and 9 hnsecs
(file )Done in 4 secs, 360 ms, 498 μs, and 3 hnsecs
(file )Done in 4 secs, 393 ms, 342 μs, and 6 hnsecs
Another curious thing is that the slowdown disappears when the code is compiled with the --build=profile option.
For reference:
➜ ldc2 --version
LDC - the LLVM D compiler (1.24.0):
based on DMD v2.094.1 and LLVM 11.0.1
And the code:

import std.file;
import core.thread.osthread;
import std.conv;
import std.concurrency;
import std.json;
import std.stdio;
import std.encoding;
import std.datetime.systime : Clock;
import std.process;
import std.functional;
import std.algorithm;
import std.bitmanip;

// Parses every line as JSON and reports how long the whole batch took.
void parseInThread(string[] lines)
{
    writefln("(file %s) (thread id %s) starting processing file", "", thisThreadID);
    auto startTime = Clock.currTime;
    foreach (line; lines)
    {
        line.parseJSON;
    }
    writefln("(file %s )Done in %s", "", Clock.currTime - startTime);
}

// Wraps one thread together with the lines it has to parse.
class T
{
    Thread t_;
    string _filename;
    string[] _lines;

    this(string[] lines)
    {
        _lines = lines.dup;
        t_ = new Thread(() { parseInThread(_lines); });
    }

    void opCall()
    {
        t_.start;
    }

    void join()
    {
        t_.join;
    }
}

int main(string[] args)
{
    T[] threads;
    string filenameBase = "./file";
    // args[1] is the number of files; they are named ./file1, ./file2, ...
    foreach (k; 1 .. args[1].to!int + 1)
    {
        auto v = filenameBase ~ k.to!string;
        auto newFile = File(v ~ "", "r");
        // Read the whole file into memory before any thread is started.
        string[] lines;
        foreach (line; newFile.byLine)
        {
            lines ~= (line.to!string);
        }
        newFile.close;
        threads ~= new T(lines);
    }
    // Start all threads, then wait for all of them to finish.
    foreach (thread; threads)
    {
        thread();
    }
    foreach (thread; threads)
    {
        thread.join;
    }
    return 0;
}
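In case it helps anyone reproduce this, here is a minimal sketch of how the input files could be generated. The helper name writeSampleFile and the choice of three files are only assumptions for illustration; the file names follow the ./file1, ./file2, ... pattern that the program reads, and each file gets 100k copies of the sample line.

```d
import std.conv : to;
import std.stdio : File;

// Hypothetical helper (not part of the program in this post): writes
// `count` copies of the sample JSON line to `path`, one per line.
void writeSampleFile(string path, size_t count)
{
    auto f = File(path, "w");
    foreach (_; 0 .. count)
        f.writeln(`{ "s" : "string", "i" : 42}`);
    // File is reference-counted, so it is closed when `f` goes out of scope.
}

void main()
{
    // Assumption: three input files, enough to run the test with 1 to 3 threads.
    foreach (k; 1 .. 4)
        writeSampleFile("./file" ~ k.to!string, 100_000);
}
```

After running this once in the working directory, ./test 1 and ./test 3 should find the files they expect.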
Thanks in advance. This has been bothering me for a couple of days and I have no idea what the problem might be. Strangely enough, I see the same problem when using the vibe-d JSON library for parsing.