Thread overview
Issues with std.net.curl on Win 10 x64
Mar 25, 2019
cptgrok
Mar 25, 2019
Andre Pany
Mar 25, 2019
cptgrok
Mar 25, 2019
Seb
Mar 25, 2019
Boris Carvajal
March 25, 2019
I need to review syslogs for over 160 systems monthly, and I am trying to write a utility to automate bulk downloads from a custom web service where they are hosted. I need to calculate a date range for the prior month, add start and end date and a serial number to the query string for each system, which is easy, and in a foreach(system; systems) loop in main() I call a function passing the string url in to download and write a log to file. For a small number of systems, it works.
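
As a rough illustration of the date-range and query-string step described above, here is a minimal sketch. The names `DateRange`, `priorMonthRange` and `buildLogUrl`, the base URL, and the query parameters `serial`, `start` and `end` are all assumptions made for the example; the real web service may expect different parameter names and date formats.

```d
import core.time : days;
import std.datetime.date : Date;
import std.format : format;
import std.stdio : writeln;
import std.uri : encodeComponent;

/// Hypothetical holder for the first and last day of a month.
struct DateRange
{
    Date start;
    Date end;
}

/// Returns the first and last day of the month before `today`.
DateRange priorMonthRange(Date today)
{
    // Stepping back one day from the first of this month lands on
    // the last day of the prior month, including across a year boundary.
    auto firstOfThisMonth = Date(today.year, today.month, 1);
    auto lastOfPrior = firstOfThisMonth - days(1);
    auto firstOfPrior = Date(lastOfPrior.year, lastOfPrior.month, 1);
    return DateRange(firstOfPrior, lastOfPrior);
}

/// Builds a URL with assumed parameter names (serial, start, end).
string buildLogUrl(string baseUrl, string serial, DateRange range)
{
    return format("%s?serial=%s&start=%s&end=%s", baseUrl,
            encodeComponent(serial),
            range.start.toISOExtString(),
            range.end.toISOExtString());
}

void main()
{
    // Fixed date so the result is reproducible; a real tool would
    // take today's date from the system clock instead.
    auto range = priorMonthRange(Date(2019, 3, 25));
    writeln(buildLogUrl("https://example.invalid/logs", "SN001", range));
}
```

Keeping the URL construction in a pure function like this makes it easy to check the date arithmetic separately from any network code.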

My trouble is with std.net.curl. If I use get(URL) to fetch the entire text in a single call and write it to file, memory usage spirals out of control immediately; within 20 or so calls it reaches about 1.3 GB and the program crashes. If I use byLineAsync(URL) and then foreach(line; range) write the lines to file one at a time, memory usage never gets above 5 MB, but it always hangs at the 51st call in the loop, regardless of what parameters are in the query string or how much data I have downloaded. The program never terminates, even after hours, and I can't see ANY activity on the process: CPU, memory or network. I can break my download jobs into batches of <=50 systems (and that seems to work), but that feels like sweeping something under the rug and will probably lead to future issues.

I'm using the 32-bit binary from libcurl-7.64.0-WinSSL-zlib-x86-x64.zip in the release archive, and DMD 2.085.0. I've tried curl 7.63 and 7.57 but the behavior is the same.

Am I doing something wrong or is there some issue with curl or something else? I'm pretty new to D and I'm not sure if I need to go right down to raw sockets and re-invent the wheel or if there is some other library that can help. If I get this working, it could potentially save myself and many others hours per week.
March 25, 2019
On Monday, 25 March 2019 at 16:25:37 UTC, cptgrok wrote:
> I need to review syslogs for over 160 systems monthly, and I am trying to write a utility to automate bulk downloads from a custom web service where they are hosted. I need to calculate a date range for the prior month, add start and end date and a serial number to the query string for each system, which is easy, and in a foreach(system; systems) loop in main() I call a function passing the string url in to download and write a log to file. For a small number of systems, it works.
>
> [...]

First idea, please switch to x86_64 if possible. This will also be the default of Dub in the next dmd release or the release after.

Kind regards
Andre
March 25, 2019
On Monday, 25 March 2019 at 16:44:12 UTC, Andre Pany wrote:
> First idea, please switch to x86_64 if possible. This will also be the default of Dub in the next dmd release or the release after.
>
> Kind regards
> Andre

Figured out --arch=x86_64, thanks! Sadly I don't see any change. I'm not having luck finding known curl issues similar to what I am experiencing. I have a sneaking suspicion that the web service I am using is doing some nonsense in the background. Might try a packet capture to better see what's up.
March 25, 2019
On Monday, 25 March 2019 at 19:02:18 UTC, cptgrok wrote:
> On Monday, 25 March 2019 at 16:44:12 UTC, Andre Pany wrote:
>> First idea, please switch to x86_64 if possible. This will also be the default of Dub in the next dmd release or the release after.
>>
>> Kind regards
>> Andre
>
> Figured out --arch=x86_64, thanks! Sadly I don't see any change. I'm not having luck finding known curl issues similar to what I am experiencing. I have a sneaking suspicion that the web service I am using is doing some nonsense in the background. Might try a packet capture to better see what's up.

Alternatively, you could always give requests a shot:

https://code.dlang.org/packages/requests

It's the unofficial successor of std.net.curl.
March 25, 2019
On Monday, 25 March 2019 at 16:25:37 UTC, cptgrok wrote:
> Am I doing something wrong or is there some issue with curl or something else? I'm pretty new to D and I'm not sure if I need to go right down to raw sockets and re-invent the wheel or if there is some other library that can help. If I get this working, it could potentially save myself and many others hours per week.

There is a limit of 50 concurrent messages per thread [1] in byLineAsync, and the transmitBuffers argument also plays a part in it.
So using multiple byLineAsync ranges at the same time in the same thread is going to block the process. I'm not sure whether this is a bug or by design.

You could use download() in a parallel foreach, something like this:

import std.stdio;
import std.parallelism;
import std.net.curl;
import std.typecons;

void main()
{
    auto connections = 3; // 3 parallel downloads
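    // the main thread also takes part in a parallel foreach,
    // so the pool only needs connections - 1 worker threads
    defaultPoolThreads(connections - 1);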
    auto retries = 4; // try up to 4 times if it fails
    auto logList = [
                tuple("dlang.org", "log1.txt"), tuple("dlang.org", "log2.txt"),
                tuple("dlang.org", "log3.txt"), tuple("dlang.org", "log4.txt"),
                tuple("dlang.org", "log5.txt"), tuple("dlang.org", "log6.txt")];

    foreach (log; parallel(logList, 1))
    {
        HTTP conn = HTTP();

        foreach (i; 0 .. retries)
        {
            try
            {
                writeln("Downloading ", log[0]);
                download(log[0], log[1], conn);

                if(conn.statusLine.code == 200)
                {
                    writeln("File ", log[1], " created.");
                    break;
                }
            }
            catch (CurlException e)
            {
                writeln("Retrying ", log[0]);
            }
        }
    }
}

[1] https://github.com/dlang/phobos/blob/master/std/net/curl.d#L1679
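
The retry loop in the example above could also be factored into a small helper so that it can be exercised without touching the network. This is only a sketch: `withRetries` is a hypothetical name, not part of Phobos, and the delegate in `main` is a stand-in for a real `download()` call.

```d
import std.stdio : writeln;

/// Runs `attempt` up to `maxAttempts` times and returns true as soon
/// as one attempt returns true. A thrown exception or a false result
/// counts as a failed attempt.
bool withRetries(int maxAttempts, scope bool delegate() attempt)
{
    foreach (i; 0 .. maxAttempts)
    {
        try
        {
            if (attempt())
                return true;
        }
        catch (Exception e)
        {
            writeln("Attempt ", i + 1, " failed: ", e.msg);
        }
    }
    return false;
}

void main()
{
    int calls = 0;
    // Stand-in for a download that fails twice before succeeding.
    bool ok = withRetries(4, () {
        ++calls;
        if (calls < 3)
            throw new Exception("simulated network error");
        return true;
    });
    writeln(ok ? "succeeded" : "gave up", " after ", calls, " calls");
}
```

In the parallel foreach, the delegate body would wrap the `download(log[0], log[1], conn)` call and return whether `conn.statusLine.code == 200`.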