How to accelerate this program? (page 2)

In article <e0slt4$2cki$1@digitaldaemon.com>, Li Jie says...
>
>In article <e0rcej$10it$1@digitaldaemon.com>, Wang Zhen says...
>
>>Two improvements based on your first D version:
>>0. Output in a separate loop.
>>1. Remove the "if(!(email in emails))" check.
>>
>>Code:
>>
>>while(!feof(fin)){
>>    fgets(cast(char*)buffer, READ_SIZE, fin);
>>    emails[toString(buffer)] = 0;
>>}
>>foreach(char[] k, int v; emails)
>>    fputs(cast(char*)k, fout);
>
>Thanks.
>
>It takes 1080 ms on my system, it's not fast enough.
>I think "cast(char*)(char[])" and "toString" called too much, and it's very
>slowly.
>

That won't give the correct output because the buffer is overwritten with each fgets and a memcpy is *not* done somewhere in the background for the AA keys.
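
The aliasing problem described above can be sketched as follows. This is only an illustration, assuming a current D2 compiler rather than the DMD 0.150 used in this thread. `keepSlices` and `keepCopies` are hypothetical helper names, and plain arrays stand in for the AA so the effect is easy to see.

```d
import std.stdio;

// Bug: every stored element is a slice of the same reused buffer.
// Assumes each line fits in the buffer.
char[][] keepSlices(string[] lines, char[] buffer)
{
    char[][] kept;
    foreach (line; lines)
    {
        buffer[0 .. line.length] = line[];   // overwrite in place, like fgets
        kept ~= buffer[0 .. line.length];    // slice only, no copy is made
    }
    return kept;
}

// Fix: take an independent copy of the buffer contents before storing.
string[] keepCopies(string[] lines, char[] buffer)
{
    string[] kept;
    foreach (line; lines)
    {
        buffer[0 .. line.length] = line[];
        kept ~= buffer[0 .. line.length].idup;   // snapshot of the current text
    }
    return kept;
}

void main()
{
    char[16] buffer;
    auto lines = ["a@x.org", "b@y.org"];     // equal lengths keep the demo simple
    writeln(keepSlices(lines, buffer[]));    // every entry shows the last line read
    writeln(keepCopies(lines, buffer[]));    // each entry keeps its own text
}
```

The copy costs one allocation per stored line, which is why the later posts copy only when the key is not yet in the AA.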

In article <e0ssod$2q8b$1@digitaldaemon.com>, Dave says...
>Please send:
>
>- the output from 'dmd -v'

lijie t # dmd -v
Digital Mars D Compiler v0.150
Copyright (c) 1999-2006 by Digital Mars written by Walter Bright
Documentation: www.digitalmars.com/d/index.html
Usage:
dmd files.d ... { -switch }
....

>- output from 'll /usr/lib/libphobos.a'.

lijie t # ll /usr/lib/libphobos.a
-rw-r--r--  1 root root 1157824  4月  4 13:05 /usr/lib/libphobos.a

>- what type of system is 'your system'

My system is gentoo linux 2006.0
Linux kernel version is 2.6.15
g++ version is 3.4.4
dmd version is 0.150

>- how large is the email file in bytes

lijie t # ll email.txt
-rw-r--r--  1 root root 19388869  4月  3 09:27 email.txt

>- which c++ compiler and flags are you using?

I'm using -O2, it takes ~680 ms.
And change to -O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4, it takes ~630 ms.

>- dmd compiler flags for your test.

dmd testd.d -O -release -inline

some tests:
=======================================
lijie t # g++ -o test7 test7.cpp -O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4
lijie t # ./test7 email.txt email-new.txt
Total used: 637 ms.
lijie t # ./test7 email.txt email-new.txt
Total used: 628 ms.
lijie t # ./test7 email.txt email-new.txt
Total used: 629 ms.
lijie t # ./test7 email.txt email-new.txt
Total used: 639 ms.
lijie t # ./test7 email.txt email-new.txt
Total used: 627 ms.

lijie t # dmd dave_test.d -O -release -inline
gcc dave_test.o -o dave_test -lphobos -lpthread -lm
lijie t # ./dave_test email.txt email-new.txt
3360
lijie t # ./dave_test email.txt email-new.txt
3315
lijie t # ./dave_test email.txt email-new.txt
3344
lijie t # ./dave_test email.txt email-new.txt
3333
lijie t # ./dave_test email.txt email-new.txt
3398

Thanks.

April 04, 2006

Re: How to accelerate this program?

Posted by Unknown W. Brackets
in reply to Li Jie

DMD 0.150 had some inlining bugs.  Try upgrading to 0.151.

Thanks,
-[Unknown]


> In article <e0ssod$2q8b$1@digitaldaemon.com>, Dave says...
>> Please send:
>>
>> - the output from 'dmd -v' 
> lijie t # dmd -v
> Digital Mars D Compiler v0.150
> Copyright (c) 1999-2006 by Digital Mars written by Walter Bright
> Documentation: www.digitalmars.com/d/index.html
> Usage:
> dmd files.d ... { -switch }
> .... 
> 
>> - output from 'll /usr/lib/libphobos.a'.
> lijie t # ll /usr/lib/libphobos.a
> -rw-r--r--  1 root root 1157824  4月  4 13:05 /usr/lib/libphobos.a
> 
>> - what type of system is 'your system'
> My system is gentoo linux 2006.0
> Linux kernel version is 2.6.15
> g++ version is 3.4.4
> dmd version is 0.150
> 
>> - how large is the email file in bytes
> lijie t # ll email.txt
> -rw-r--r--  1 root root 19388869  4月  3 09:27 email.txt
> 
>> - which c++ compiler and flags are you using?
> I'm using -O2, it takes ~680 ms.
> And change to -O3 -fomit-frame-pointer -funroll-loops -mtune=pentium4, it takes
> ~630 ms.
> 
>> - dmd compiler flags for your test.
> dmd testd.d -O -release -inline
> 
> 
> some tests:
> =======================================
> lijie t # g++ -o test7 test7.cpp -O3 -fomit-frame-pointer -funroll-loops
> -mtune=pentium4
> lijie t # ./test7 email.txt email-new.txt
> Total used: 637 ms.
> lijie t # ./test7 email.txt email-new.txt
> Total used: 628 ms.
> lijie t # ./test7 email.txt email-new.txt
> Total used: 629 ms.
> lijie t # ./test7 email.txt email-new.txt
> Total used: 639 ms.
> lijie t # ./test7 email.txt email-new.txt
> Total used: 627 ms.
> 
> lijie t # dmd dave_test.d -O -release -inline
> gcc dave_test.o -o dave_test -lphobos -lpthread -lm
> lijie t # ./dave_test email.txt email-new.txt
> 3360
> lijie t # ./dave_test email.txt email-new.txt
> 3315
> lijie t # ./dave_test email.txt email-new.txt
> 3344
> lijie t # ./dave_test email.txt email-new.txt
> 3333
> lijie t # ./dave_test email.txt email-new.txt
> 3398
> 
> 
> Thanks.
> 
>

Dave wrote:
> From what I've seen, the bottleneck is probably in I/O. Use BufferedFile and a
> buffer for each readline.

C's FILE and fopen also create a buffered stream, so I doubt that will make a difference.

But why the sort?? Try this instead:

#import std.stdio;
#import std.perf;
#import std.stream;
#
#int main(char[][] argv)
#{
#    if (argv.length < 3)
#    {
#        writefln("Wrong arguments");
#        return 1;
#    }
#
#    char[8192] bufr;
#    int[char[]] emails;
#    char[] email;
#
#    PerformanceCounter counter = new PerformanceCounter();
#    counter.start();
#
#    BufferedFile bsi = new BufferedFile(argv[1]);
#    BufferedFile bso = new BufferedFile(argv[2],FileMode.Out);
#    while(!bsi.eof)
#    {
#        email = bsi.readLine(bufr); // bufr is key to perf.
#        if (!(email in emails))
#        {
#            emails[email.dup] = 0; // Note .dup
#            bso.writeLine(email);
#        }
#    }
#    bso.close;
#    bsi.close;
#
#    counter.stop();
#    writefln(counter.milliseconds());
#
#    return 0;
#}
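
For readers on a current D2 toolchain, the same read, check, copy, write loop might look like the sketch below. It assumes std.stdio's `File.byLine` and `std.datetime.stopwatch` in place of the `std.stream` and `std.perf` modules used above. `firstSeen` is a hypothetical helper, not part of the posted code.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

// Returns true the first time a line is seen and records it in the AA.
// The line is copied only when it is new, mirroring the ".dup" above.
bool firstSeen(ref bool[string] seen, const(char)[] line)
{
    if (line in seen)
        return false;
    seen[line.idup] = true;   // byLine reuses its buffer, so store a copy
    return true;
}

int main(string[] args)
{
    if (args.length < 3)
    {
        writeln("Wrong arguments");
        return 1;
    }

    auto sw = StopWatch(AutoStart.yes);

    bool[string] seen;
    auto fin = File(args[1], "r");
    auto fout = File(args[2], "w");

    foreach (line; fin.byLine)      // line is a view into a reused buffer
    {
        if (firstSeen(seen, line))
            fout.writeln(line);     // written in first-occurrence order
    }

    sw.stop();
    writeln(sw.peek.total!"msecs");
    return 0;
}
```

As in the original, duplicate lines cost only a lookup; the allocation happens once per distinct address.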

In article <e0tkil$kkn$1@digitaldaemon.com>, Lionello Lunesu says...
>
>Dave wrote:
>> From what I've seen, the bottleneck is probably in I/O. Use BufferedFile and a buffer for each readline.
>
>C's FILE and fopen also create a buffered stream, so I doubt that will make a difference.

Yea, but it does make about a 15% difference (with readLine(bufr) faster). I/O is definitely not the bottleneck though - my test data was screwed up in that it was only doing 4 AA inserts, and AA inserts are the actual bottleneck.

>
>But why the sort?? Try this instead:

You're right, that is not needed. I saw the sort in the C++ version (but didn't look closely enough at what it was doing).

>
>#import std.stdio;
>#import std.perf;
>#import std.stream;
>#
>#int main(char[][] argv)
>#{
>#    if (argv.length < 3)
>#    {
>#        writefln("Wrong arguments");
>#        return 1;
>#    }
>#
>#    char[8192] bufr;
>#    int[char[]] emails;
>#    char[] email;
>#
>#    PerformanceCounter counter = new PerformanceCounter();
>#    counter.start();
>#
>#    BufferedFile bsi = new BufferedFile(argv[1]);
>#    BufferedFile bso = new BufferedFile(argv[2],FileMode.Out);
>#    while(!bsi.eof)
>#    {
>#        email = bsi.readLine(bufr); // bufr is key to perf.
>#        if (!(email in emails))
>#        {
>#            emails[email.dup] = 0; // Note .dup
>#            bso.writeLine(email);
>#        }
>#    }
>#    bso.close;
>#    bsi.close;
>#
>#    counter.stop();
>#    writefln(counter.milliseconds());
>#
>#    return 0;
>#}

In article <e0svke$2tok$1@digitaldaemon.com>, Li Jie says...
>
>In article <e0ssod$2q8b$1@digitaldaemon.com>, Dave says...
>>Please send:
>>

Something was screwy - my test data! <g>

My test file wasn't even close to what it should've been - the bottleneck is in the AA operations.

I apologize for any wasted time...

- Dave

Dave wrote:
> Lionello Lunesu says...
>> Dave wrote:
>>
>>> From what I've seen, the bottleneck is probably in I/O. Use
>>> BufferedFile and a buffer for each readline.
>>
>> C's FILE and fopen also create a buffered stream, so I doubt that
>> will make a difference.
>
> Yea, but it does make about a 15% difference (with readLine(bufr)
> faster). I/O is definitely not the bottleneck though - my test data
> was screwed up in that it was only doing 4 AA inserts, which is the
> actual bottleneck.

If I were writing this program, the first thing I'd look up is the "sector size" of the media the file is read from. Then I'd make the buffer that size.
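
The buffer-size suggestion above could be tried with something like the following sketch, assuming a current D2 compiler and std.stdio's `File.setvbuf`. `countLines` is a hypothetical helper, and the 4096 passed in `main` is a placeholder: the real sector or block size has to be looked up for the device in question, and whether matching it helps here has not been measured.

```d
import std.stdio;

// Reads a file line by line through a stdio buffer of the given size
// and returns the number of lines.
size_t countLines(string path, size_t bufferSize)
{
    auto fin = File(path, "r");
    fin.setvbuf(bufferSize);    // must be set before the first read

    size_t lines;
    foreach (line; fin.byLine)
        ++lines;
    return lines;
}

int main(string[] args)
{
    if (args.length < 2)
    {
        writeln("Wrong arguments");
        return 1;
    }

    // 4096 is an assumed value, not one queried from the device.
    writeln(countLines(args[1], 4096));
    return 0;
}
```

Timing this with a few different sizes against the same file would show whether the buffer size matters at all compared with the AA inserts.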
