On Fri, Jun 9, 2017 at 9:34 AM, uncorroded via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com> wrote:
Hi guys,

I am a beginner in D. As a project, I converted a log-parsing script in Python which we use at work, to D. This link was helpful - ( https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ ) I compiled it with dmd and ldc. The log file is 52 MB. With dmd (not release build), it takes 1.1 sec and with ldc, it takes 0.3 sec.

The Python script (run with system python, not Pypy) takes 0.75 sec. The D and Python functions are here and on pastebin ( D - https://pastebin.com/SeUR3wFP , Python - https://pastebin.com/F5JbfBmE ).

Basically, I am reading a line and checking for 2 constants. If either one is found, some processing is done on the line and the result is stored in an array for later analysis. I tried reading the file entirely in one go using std.file : readText and using std.algorithm : splitter for lazily splitting on newlines, but there is no difference in speed, so I used the byLine approach mentioned in the linked blog. Is there a better way of doing this in D?

There is no difference in speed because you are not processing your data lazily: the code makes many allocations, and that is the main reason why it is so slow. I could improve it, but I would need to see some example data of the kind you are trying to parse.

But some general rules:
1.) instead of ~= you should use std.array.appender
2.) instead of std.string.split you could use std.algorithm.splitter or std.algorithm.findSplit
3.) instead of indexOf I would use std.algorithm.startsWith (in case the constant is at the beginning of the line)
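
Without the real data this is only a rough sketch of how the three rules could fit together, not a rewrite of the pastebin code. The marker constants ("ERROR", "WARN"), the space-separated field layout, and the name collectFields are all assumptions made up for illustration; it also assumes the constants sit at the start of the line.

```d
import std.algorithm : splitter, startsWith;
import std.array : appender;
import std.stdio : File, writeln;

// Hypothetical markers; the real constants are in the pastebin script.
enum markerA = "ERROR";
enum markerB = "WARN";

/// Collects the second space-separated field of every line that starts
/// with one of the markers. Works on any range of lines (e.g. byLine).
string[] collectFields(R)(R lines)
{
    auto results = appender!(string[])();   // rule 1: appender instead of ~=
    foreach (line; lines)
    {
        // rule 3: startsWith returns 0 when neither marker matches
        if (!line.startsWith(markerA, markerB))
            continue;

        // rule 2: splitter is lazy, so no array of fields is allocated
        auto fields = line.splitter(' ');
        fields.popFront();                  // skip the marker itself
        if (fields.empty)
            continue;

        // byLine reuses its buffer, so copy only the piece that is kept
        results.put(fields.front.idup);
    }
    return results.data;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: parselog <logfile>");
        return;
    }
    auto results = collectFields(File(args[1]).byLine);
    writeln(results.length, " matching lines");
}
```

The only allocations left are the appender's occasional growth and one idup per stored field; everything else works on slices of the line buffer.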