how to be faster than perl? - D Programming Language Discussion Forum

January 30, 2007

how to be faster than perl?

Posted by Boris Bukowski

Hi,

I am currently testing D for log processing.
My Perl script is more than ten times faster than my D program.
How can I read lines from a file faster?

Boris

---snip---
private import std.stream;
private import std.stdio;
private import std.string;

void main (char[][] args) {
        int c;
        Stream file = new BufferedFile(args[1]);
        foreach(ulong n, char[] line; file) {
                if(std.regexp.find(line, "horizontal") > -1){
                        c++;
                }
        }

        writefln("%d", c);

}
---snip---

#!/usr/bin/perl

while($line=<>) {
        if ($line=~/horizontal/) {
                $c++;
        }
}

print "$c\n";

---snip---

January 30, 2007

Re: how to be faster than perl?

Posted by Frits van Bommel
in reply to Boris Bukowski

Boris Bukowski wrote:
> currently I am testing D for log processing.
> My perl script is more than ten times faster than my D Prog.
> How can I get Lines faster from a File?

I don't think the file handling is your problem.

> ---snip---
> private import std.stream;
> private import std.stdio;
> private import std.string;
> 
> void main (char[][] args) {
>         int c;
>         Stream file = new BufferedFile(args[1]);
>         foreach(ulong n, char[] line; file) {
>                 if(std.regexp.find(line, "horizontal") > -1){
>                         c++;
>                 }
>         }
> 
>         writefln("%d", c);
> 
> }

std.regexp.find instantiates a RegExp object, compiles the regex and uses it once, then deletes it. This is fine for one-time searches, but if you're using it for each line of a file, you're allocating and deleting an object for each line and performing unnecessary work to recompile the same regex over and over.

Try something like this instead:
-----
// (untested code)
auto re = new RegExp("horizontal");
foreach (ulong n, char[] line; file) {
    if (re.find(line) > -1) {
// ...
-----
as the start of your foreach loop.
That should be faster.
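
The compile-once idea is not specific to D. Here is a minimal sketch of it in Python, the language of the data-generating script elsewhere in this thread. `count_regex_lines` is a made-up helper name, and this is only an illustration of hoisting the compile step out of the loop, not a translation of std.regexp:

```python
import re


def count_regex_lines(lines, pattern):
    """Count the lines that contain at least one match for pattern."""
    # Compile the pattern once, before the loop, and reuse the object.
    compiled = re.compile(pattern)
    count = 0
    for line in lines:
        # search() scans the whole line, like an unanchored regex match.
        if compiled.search(line) is not None:
            count += 1
    return count
```

It could be called as `count_regex_lines(open("access.log"), "horizontal")`. Note that Python's `re` module also keeps an internal cache of recently compiled patterns, so the difference there is likely smaller than the one described above for std.regexp.find.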

I don't know how fast it'll be compared to Perl; I don't know anything about the relative performance of D vs. Perl regexes. (In fact, I hardly ever use regexes, and have never used Perl)

January 30, 2007

Re: how to be faster than perl?

Posted by Boris Bukowski
in reply to Frits van Bommel

> // (untested code)
> auto re = new RegExp("horizontal");
> foreach (ulong n, char[] line; file) {
>      if (re.find(line) > -1) {
> // ...
> -----
> as the start of your foreach loop.
> That should be faster.
> 
> I don't know how fast it'll be compared to Perl; I don't know anything about the relative performance of D vs. Perl regexes. (In fact, I hardly ever use regexes, and have never used Perl)

buko01@dizit:~/d$ time ./lineio.pl access.log
1087

real    0m0.105s
user    0m0.092s
sys     0m0.012s
buko01@dizit:~/d$ time ./lineio2 access.log
1087

real    0m1.547s
user    0m1.528s
sys     0m0.020s

Still 15 times slower :-(
Perl's string handling and I/O must be some kind of black magic.
It looks like I will have to write my own line reader.

Boris

January 30, 2007

Re: how to be faster than perl?

Posted by Frits van Bommel
in reply to Boris Bukowski

Boris Bukowski wrote:
> buko01@dizit:~/d$ time ./lineio.pl access.log
> 1087
> 
> real    0m0.105s
> user    0m0.092s
> sys     0m0.012s
> buko01@dizit:~/d$ time ./lineio2 access.log
> 1087
> 
> real    0m1.547s
> user    0m1.528s
> sys     0m0.020s
> 
> still 15 times slower :-(
> Perl strings/IO must be somehow black magic.
> Looks like I have to write my own lineReader.

Some obvious questions:

Did you compile with -O -inline? If not, try those, though I don't think they'll make much difference.

Do you actually search for "horizontal" (or a similar fixed string)? For a plain string that needs no regex features, std.string.find will likely be faster.
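
As a rough illustration of the fixed-string suggestion, here is a small Python sketch that counts matching lines with a plain substring test instead of a regex. `count_substring_lines` is a hypothetical helper invented for this example; it assumes the search term contains no regex syntax that is meant to be interpreted:

```python
def count_substring_lines(lines, needle):
    """Count the lines that contain needle as a literal substring."""
    # `needle in line` is a plain substring search; no regex engine is used.
    return sum(1 for line in lines if needle in line)
```

Whether this beats a precompiled regex on a given log file is something to measure rather than assume.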

Other than that, I'm out of ideas.

IIRC Perl compiles regexes inline, presumably optimizing them along with the rest of the code, so that might explain why it's faster. This sort of stuff is what Perl was designed for...

January 30, 2007

Re: how to be faster than perl?

Posted by mario pernici
in reply to Boris Bukowski

Boris Bukowski Wrote:

> 
> > // (untested code)
> > auto re = new RegExp("horizontal");
> > foreach (ulong n, char[] line; file) {
> >      if (re.find(line) > -1) {
> > // ...
> > -----
> > as the start of your foreach loop.
> > That should be faster.
> > 
> > I don't know how fast it'll be compared to Perl; I don't know anything about the relative performance of D vs. Perl regexes. (In fact, I hardly ever use regexes, and have never used Perl)
> 
> buko01@dizit:~/d$ time ./lineio.pl access.log
> 1087
> 
> real    0m0.105s
> user    0m0.092s
> sys     0m0.012s
> buko01@dizit:~/d$ time ./lineio2 access.log
> 1087
> 
> real    0m1.547s
> user    0m1.528s
> sys     0m0.020s
> 
> still 15 times slower :-(
> Perl strings/IO must be somehow black magic.
> Looks like I have to write my own lineReader.
> 
> Boris

Hello,
on my PC the D example is faster than the Perl one, with the data produced by
the Python script

f = open('data','w')
for j in range(1087):
  for i in range(100):
    f.write("%d\n" % i)
  f.write("horizontal\n")
  for i in range(100):
    f.write("%d\n" % i)
f.close()

On my PC the Perl example takes 0.068s and the D example with auto re takes 0.068s.

Bye
   Mario

January 30, 2007

Re: how to be faster than perl?

Posted by mario pernici
in reply to Boris Bukowski

Boris Bukowski Wrote:

> 
> > // (untested code)
> > auto re = new RegExp("horizontal");
> > foreach (ulong n, char[] line; file) {
> >      if (re.find(line) > -1) {
> > // ...
> > -----
> > as the start of your foreach loop.
> > That should be faster.
> > 
> > I don't know how fast it'll be compared to Perl; I don't know anything about the relative performance of D vs. Perl regexes. (In fact, I hardly ever use regexes, and have never used Perl)
> 
> buko01@dizit:~/d$ time ./lineio.pl access.log
> 1087
> 
> real    0m0.105s
> user    0m0.092s
> sys     0m0.012s
> buko01@dizit:~/d$ time ./lineio2 access.log
> 1087
> 
> real    0m1.547s
> user    0m1.528s
> sys     0m0.020s
> 
> still 15 times slower :-(
> Perl strings/IO must be somehow black magic.
> Looks like I have to write my own lineReader.
> 
> Boris


CORRECTION:
on my PC the D example is faster than the Perl one, with the data produced by
the Python script

f = open('data','w')
for j in range(1087):
  for i in range(100):
    f.write("%d\n" % i)
  f.write("horizontal\n")
  for i in range(100):
    f.write("%d\n" % i)
f.close()

On my PC the Perl example takes 0.148s and the D example with auto re takes 0.068s.

Bye
   Mario

January 31, 2007

Re: how to be faster than perl?

Posted by Unknown W. Brackets
in reply to Boris Bukowski

I'm a bit tired, but does BufferedFile's opApply use a fixed buffer? I doubt it does. If it doesn't, the foreach method is going to be a lot slower than reading lines into a buffer that you reuse.

Check on the other methods of BufferedFile.

Sorry, I'd give a code example but I'm just doing a drive by.
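
For readers who want something concrete, here is a rough sketch of the reusable-buffer idea, written in Python rather than D. It assumes newline-terminated text and a fixed search string. `count_lines_chunked` and its `bufsize` parameter are invented for this illustration and say nothing about how BufferedFile works internally:

```python
def count_lines_chunked(path, needle, bufsize=65536):
    """Count the lines in the file at path that contain needle."""
    needle_bytes = needle.encode()
    buf = bytearray(bufsize)  # read buffer, allocated once and reused
    tail = b""                # unfinished last line from the previous chunk
    count = 0
    # buffering=0 gives a raw file object, so readinto() fills buf directly.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes read means end of file
                break
            # Prepend the leftover partial line, then split into lines.
            # Only the read buffer is reused; the split still allocates.
            lines = (tail + bytes(buf[:n])).split(b"\n")
            tail = lines.pop()  # may be incomplete, keep it for the next round
            count += sum(1 for line in lines if needle_bytes in line)
    # A final line without a trailing newline is still a line.
    if needle_bytes in tail:
        count += 1
    return count
```

A call would look like `count_lines_chunked("access.log", "horizontal")`. The point of the sketch is only that the file is read in large fixed-size chunks and that lines spanning two chunks are stitched together by hand.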

-[Unknown]


> Hi,
> 
> currently I am testing D for log processing.
> My perl script is more than ten times faster than my D Prog.
> How can I get Lines faster from a File?
> 
> Boris 
> 
> ---snip---
> private import std.stream;
> private import std.stdio;
> private import std.string;
> 
> void main (char[][] args) {
>         int c;
>         Stream file = new BufferedFile(args[1]);
>         foreach(ulong n, char[] line; file) {
>                 if(std.regexp.find(line, "horizontal") > -1){
>                         c++;
>                 }
>         }
> 
>         writefln("%d", c);
> 
> }
> ---snip---
> 
> #!/usr/bin/perl
> 
> while($line=<>) {
>         if ($line=~/horizontal/) {
>                 $c++;
>         }
> }
> 
> print "$c\n";
> 
> ---snip---
>

January 31, 2007

Re: how to be faster than perl?

Posted by Derek Parnell
in reply to Boris Bukowski

On Tue, 30 Jan 2007 13:21:53 +0100, Boris Bukowski wrote:

> currently I am testing D for log processing.
> My perl script is more than ten times faster than my D Prog.
> How can I get Lines faster from a File?

Your example code seemed to be trying to count the number of times a certain string occurred in a file, so I didn't bother working with 'lines' as such. Anyhow, here is one way to do it...

// findtext.d ---------
private import std.file;
private import std.stdio;
private import std.string;

void main (char[][] args) {
        char[] lFileText; // Buffer for file contents.

        int lCnt;   // Number of hits
        int lPos;   // Found at position, or Not Found flag.
        int lFrom;  // Where in the file to look from.

        // Grab the whole file into RAM
        lFileText = cast(char[]) std.file.read(args[1]);

        // Start scanning for the substring.
        lFrom = 0;
        while(lFrom < lFileText.length)
        {
            lPos = std.string.find(lFileText[lFrom..$], args[2]);
            if (lPos != -1)
            {
                // Adjust next starting position.
                lFrom += lPos + args[2].length;
                // And count the hits, of course.
                lCnt++;
            }
            else
            {
                // Force end of scanning.
                lFrom = lFileText.length;
            }
        }

        writefln("Count of '%s' found in '%s': %d",
                        args[2], args[1], lCnt);
}

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
31/01/2007 7:18:25 PM

January 31, 2007

Re: how to be faster than perl?

Posted by Boris Bukowski
in reply to mario pernici

mario pernici wrote:

> Boris Bukowski Wrote:
> 
>> 
>> > // (untested code)
>> > auto re = new RegExp("horizontal");
>> > foreach (ulong n, char[] line; file) {
>> >      if (re.find(line) > -1) {
>> > // ...
>> > -----
>> > as the start of your foreach loop.
>> > That should be faster.
>> > 
>> > I don't know how fast it'll be compared to Perl; I don't know anything about the relative performance of D vs. Perl regexes. (In fact, I hardly ever use regexes, and have never used Perl)
>> 
>> buko01@dizit:~/d$ time ./lineio.pl access.log
>> 1087
>> 
>> real    0m0.105s
>> user    0m0.092s
>> sys     0m0.012s
>> buko01@dizit:~/d$ time ./lineio2 access.log
>> 1087
>> 
>> real    0m1.547s
>> user    0m1.528s
>> sys     0m0.020s
>> 
>> still 15 times slower :-(
>> Perl strings/IO must be somehow black magic.
>> Looks like I have to write my own lineReader.
>> 
>> Boris
> 
> Hello,
> on my PC the D example is faster than the Perl one, with the data produced
> by the Python script
> 
> f = open('data','w')
> for j in range(1087):
>   for i in range(100):
>     f.write("%d\n" % i)
>   f.write("horizontal\n")
>   for i in range(100):
>     f.write("%d\n" % i)
> f.close()
> 
> the Perl example takes on my PC  0.068s,  the D example with auto re takes 0.068s.

Hi,

With that generated data file D is faster, because Perl spends more time in the loop.
I use a 20 MB Squid access log for testing.
It looks like I have to write my own readline for this.

Boris

January 31, 2007

Re: how to be faster than perl?

Posted by David Medlock
in reply to Boris Bukowski

Boris Bukowski wrote:
> Hi,
> 
> currently I am testing D for log processing.
> My perl script is more than ten times faster than my D Prog.
> How can I get Lines faster from a File?
> 
> Boris 
> 
> ---snip---
> private import std.stream;
> private import std.stdio;
> private import std.string;
> 
> void main (char[][] args) {
>         int c;
>         Stream file = new BufferedFile(args[1]);
>         foreach(ulong n, char[] line; file) {
>                 if(std.regexp.find(line, "horizontal") > -1){
>                         c++;
>                 }
>         }
> 
>         writefln("%d", c);
> 
> }
> ---snip---
> 
> #!/usr/bin/perl
> 
> while($line=<>) {
>         if ($line=~/horizontal/) {
>                 $c++;
>         }
> }
> 
> print "$c\n";
> 
> ---snip---
> 

I am too lazy to look, but does the regexp module cache any regexes passed to it? If not, that's probably the major slowdown.

I am pretty sure all 'fixed' regexen in Perl are pre-compiled into the AST so they aren't re-evaluated each time they are used.

I may be wrong though, I've only been using Perl for about 7 months (and I despise it).

-DavidM

Copyright © 1999-2021 by the D Language Foundation