Thread overview
Why is JSON parsing slower in multiple threads?
Jun 20, 2023 - Kagamin
Jun 20, 2023 - Stefan Koch
Jun 20, 2023 - FeepingCreature
Jun 20, 2023 - Sergey
Jun 21, 2023 - Andrej Mitrovic
June 20, 2023

Hello everyone. We have some D code running in production that reads files containing lines of JSON data that we would like to parse and process.

These files can be processed in parallel, so we create one thread for processing each file. However, I noticed significant slowdowns when processing multiple files in parallel, as opposed to processing only one file.

Here is a simple code snippet reproducing the issue. It reads from a file containing the same JSON object repeated 100k times, like so:

{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
{ "s" : "string", "i" : 42}
...

It gives the following output:

➜ ./test 1
(file ) (thread id 140310703728384) starting processing file
(file  )Done in 1 sec, 549 ms, 257 μs, and 6 hnsecs

➜ ./test 3
(file ) (thread id 140071550318336) starting processing file
(file ) (thread id 140078235236096) starting processing file
(file ) (thread id 140078221063936) starting processing file
(file  )Done in 4 secs, 296 ms, 780 μs, and 9 hnsecs
(file  )Done in 4 secs, 360 ms, 498 μs, and 3 hnsecs
(file  )Done in 4 secs, 393 ms, 342 μs, and 6 hnsecs

Another curious thing is that this behaviour is not present when compiling the code with the --build=profile option.

For reference:

➜ ldc2 --version
LDC - the LLVM D compiler (1.24.0):
  based on DMD v2.094.1 and LLVM 11.0.1

And here is the code:
import core.thread.osthread : Thread;
import std.conv : to;
import std.datetime.systime : Clock;
import std.json : parseJSON;
import std.process : thisThreadID;
import std.stdio : File, writefln;



void parseInThread(string[] lines)
{
    writefln("(file %s) (thread id %s) starting processing file", "", thisThreadID);

    auto startTime = Clock.currTime;

    foreach (line; lines)
    {
        line.parseJSON;
    }

    writefln("(file %s )Done in %s", "", Clock.currTime - startTime);
}

class T
{
    Thread t_;
    string[] _lines;

    this(string[] lines)
    {
        _lines = lines.dup;
        t_ = new Thread(() { parseInThread(_lines); });
    }

    void opCall()
    {
        t_.start;
    }

    void join()
    {
        t_.join;
    }
}

int main(string[] args)
{

    T[] threads;

    string filenameBase = "./file";
    foreach (k; 1 .. args[1].to!int + 1)
    {
        auto v = filenameBase ~ k.to!string;

        auto newFile = File(v, "r");

        string[] lines;

        foreach (line; newFile.byLine)
        {
            lines ~= line.to!string;
        }
        newFile.close;

        threads ~= new T(lines);
    }

    foreach (thread; threads)
    {
        thread();
    }

    foreach (thread; threads)
    {
        thread.join;
    }

    return 0;
}

Thanks in advance, this has been annoying me for a couple of days and I have no idea what might be the problem. Strangely enough, I also have the same problem when using the vibe-d JSON library for parsing.

June 20, 2023

The program does 3 times more work and gets it done in 3 times more time: 1.5*3=4.5

June 20, 2023

On Tuesday, 20 June 2023 at 10:29:04 UTC, Kagamin wrote:

> The program does 3 times more work and gets it done in 3 times more time: 1.5*3=4.5

Thanks for your reply.
I am using threads, so the work should get done in roughly the time it takes to process one file, since it's distributed across cores.

I even wrote a similar C++ program just to be sure, and it performs as expected.

June 20, 2023

On Tuesday, 20 June 2023 at 10:39:42 UTC, Alexandre Bourquelot wrote:

> On Tuesday, 20 June 2023 at 10:29:04 UTC, Kagamin wrote:
>
> > The program does 3 times more work and gets it done in 3 times more time: 1.5*3=4.5
>
> Thanks for your reply.
> I am using threads, so the work should get done in roughly the time it takes to process one file, since it's distributed across cores.
>
> I even wrote a similar C++ program just to be sure, and it performs as expected.

Try preallocating the memory you need.
It might very well be that the GC allocation lock is slowing you down.
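As a rough sketch of that idea (hypothetical file name and capacity; note this only avoids the line-array growth, since parseJSON still allocates internally):

```d
import std.array : appender;
import std.stdio : File;

void readLines()
{
    auto lines = appender!(string[]);
    lines.reserve(100_000); // one up-front allocation instead of repeated GC-locked grows
    foreach (line; File("./file1").byLine)
        lines ~= line.idup; // idup still allocates one copy per line
}
```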

June 20, 2023

On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot wrote:

> Hello everyone. We have some D code running in production that reads files containing lines of JSON data that we would like to parse and process.
>
> These files can be processed in parallel, so we create one thread for processing each file. However, I noticed significant slowdowns when processing multiple files in parallel, as opposed to processing only one file.
>
> ...
>
> Thanks in advance, this has been annoying me for a couple of days and I have no idea what might be the problem. Strangely enough, I also have the same problem when using the vibe-d JSON library for parsing.

Yeah, if you look with perf record, you will see that the program spends essentially all of its runtime in the garbage collector. JSON parsing is very memory hungry. So you get no parallelization, because the allocator takes a lock, and you also get the overhead of lots and lots of lock waits.

I recommend using a streaming JSON parser like std_data_json https://github.com/dlang-community/std_data_json and loading into a well-typed data structure directly, to keep the amount of unnecessary allocations to a minimum.
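If pulling in a library is not an option, even a hand-rolled, schema-specific extraction into a plain struct illustrates the point. This is a hypothetical sketch for the exact `{ "s" : ..., "i" : ... }` lines in the example, not a general JSON parser:

```d
import std.algorithm.searching : findSplitAfter, findSplitBefore;
import std.conv : to;
import std.string : strip;

struct Record { string s; int i; }

// Naive scan that assumes each line looks exactly like { "s" : "...", "i" : N }
Record parseRecord(const(char)[] line)
{
    Record r;
    auto sTail = line.findSplitAfter(`"s" : "`)[1];
    r.s = sTail.findSplitBefore(`"`)[0].idup; // the only allocation: the string value
    auto iTail = line.findSplitAfter(`"i" : `)[1];
    r.i = iTail.findSplitBefore(`}`)[0].strip.to!int;
    return r;
}

unittest
{
    assert(parseRecord(`{ "s" : "string", "i" : 42}`) == Record("string", 42));
}
```

No DOM, no AAs, so the threads barely touch the GC in the hot loop.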

June 20, 2023

On Tuesday, 20 June 2023 at 09:31:57 UTC, Alexandre Bourquelot wrote:

> Hello everyone. We have some D code running in production that reads files containing lines of JSON data that we would like to parse and process.
>
> Thanks in advance, this has been annoying me for a couple of days and I have no idea what might be the problem. Strangely enough, I also have the same problem when using the vibe-d JSON library for parsing.

Btw, if you want a really fast solution, I can recommend trying the ASDF library.
There is also its successor, mir-ion, but I haven't tried it.

With the help of ASDF I was able to prepare almost the best solution for the JSON serde problem, and with low memory consumption too!
https://programming-language-benchmarks.vercel.app/problem/json-serde
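Its serde API, as I remember it from the README (worth double-checking against the current docs), makes each line a one-step parse into a typed struct:

```d
import asdf : deserialize;

struct Record { string s; int i; }

void main()
{
    // deserialize straight into a typed value, skipping the generic DOM
    auto r = `{ "s" : "string", "i" : 42}`.deserialize!Record;
    assert(r.s == "string" && r.i == 42);
}
```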

June 20, 2023

On 6/20/23 5:31 AM, Alexandre Bourquelot wrote:

> Thanks in advance, this has been annoying me for a couple of days and I have no idea what might be the problem. Strangely enough, I also have the same problem when using the vibe-d JSON library for parsing.

The issue, undoubtedly, is memory allocation. Your JSON parsers (both std.json and vibe-d) allocate an AA for each object, and parse the entire string into a DOM structure. The D GC has a single global lock to allocate memory -- even memory that might be on a free list. So the threads are all bottlenecked on waiting their turn for the lock.

Depending on what you want to do, like others here, I'd recommend a stream-based json parser. And then you also don't have to split it into lines.

If the goal is to build a huge representation of all the data, then there's not much else to be done, unless you want to pre-allocate. But you may end up having to drive that yourself.

I can possibly recommend, in addition to what others have mentioned, my jsoniopipe library.

-Steve

June 21, 2023

On Wednesday, 21 June 2023 at 00:35:42 UTC, Steven Schveighoffer wrote:

> The D GC has a single global lock to allocate memory -- even memory that might be on a free list. So the threads are all bottlenecked on waiting their turn for the lock.

This seems important enough to document on the spec page for the GC: https://dlang.org/spec/garbage.html

It's only mentioned in passing here: https://dlang.org/articles/d-array-article.html#caching
in the sentence "not to mention acquiring the global GC lock".

In theory the GC is replaceable but I think we should document the behavior of the default one.

I'll submit an issue for it.

June 22, 2023

On Wednesday, 21 June 2023 at 00:35:42 UTC, Steven Schveighoffer wrote:

> The issue, undoubtedly, is memory allocation. Your JSON parsers (both std.json and vibe-d) allocate an AA for each object, and parse the entire string into a DOM structure. The D GC has a single global lock to allocate memory -- even memory that might be on a free list. So the threads are all bottlenecked on waiting their turn for the lock.

This makes a lot of sense. I ended up using asdf and it works great.

Thank you everyone for your insight.