Jump to page: 1 2
Thread overview
[Issue 4721] New: compilation slow when compiling unittests on dcollections
Aug 24, 2010
Andrej Mitrovic
Aug 24, 2010
Andrej Mitrovic
Aug 24, 2010
Andrej Mitrovic
Aug 26, 2010
Walter Bright
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721

           Summary: compilation slow when compiling unittests on
                    dcollections
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: schveiguy@yahoo.com


--- Comment #0 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-24 06:44:06 PDT ---
When compiling dcollections unit tests, it takes over 1 minute to compile the small library.  I am unsure what the exact trigger is to make it slow.  At the request of Walter, I profiled the compiler to try and see where the slowdown was.  Here are the first few lines of the results using gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 77.76      6.68     6.68     2952     2.26     2.26  elf_findstr(Outbuffer*,
char const*, char const*)
  2.10      6.86     0.18     4342     0.04     0.04  searchfixlist
  1.28      6.97     0.11   663755     0.00     0.00  ScopeDsymbol::search(Loc,
Identifier*, int)
  1.05      7.06     0.09  2623497     0.00     0.00  isType(Object*)
  0.76      7.12     0.07   911667     0.00     0.00  match(Object*, Object*,
TemplateDeclaration*, Scope*)
  0.76      7.19     0.07   656268     0.00     0.00  _aaGetRvalue(AA*, void*)
  0.58      7.24     0.05  2507041     0.00     0.00  isTuple(Object*)

Walter said it looks like elf_findstr is a problem area and asked me to file
this bug.  The relevant discussion is here:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=116007

All I can say to duplicate this is to download the latest svn of dcollections and compile unit tests.  If you are running the profiled version, it takes a long long time to complete (I gave up after 5 minutes).  I suggest compiling just one class, HashMap.  The compile time for this takes 4 seconds.  The appropriate commands to run are:

svn co http://svn.dsource.org/projects/dcollections/branches/d2 dcollections
cd dcollections
dmd -unittest dcollections/HashMap.d dcollections/Hash.d
dcollections/Iterators.d dcollections/model/*

The compilation will fail, because there is no main function, but it will demonstrate the problem well enough.

If you want to compile the entire unit test suite, use the command:

./build-lib-linux.sh unittest

Which builds and runs the unit tests for the whole library.  This takes over a minute to compile.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721


Andrej Mitrovic <andrej.mitrovich@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich@gmail.com


--- Comment #1 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2010-08-24 07:31:19 PDT ---
I don't know about Linux (I might try it in a VM if you want me to), but on Windows I can compile the collection in 27 seconds:

C:\d2\dcollections>timeit build-lib-win32.bat unittest unittest

C:\d2\dcollections>dmd -unittest unit_test.d  dcollections\ArrayList.d
dcollections\DefaultAllocator
.d dcollections\DefaultFunctions.d dcollections\Hash.d dcollections\HashMap.d
dcollections\HashMulti
set.d dcollections\HashSet.d dcollections\Iterators.d dcollections\Link.d
dcollections\LinkList.d dc
ollections\RBTree.d dcollections\TreeMap.d dcollections\TreeMultiset.d
dcollections\TreeSet.d dcolle
ctions\model\Addable.d dcollections\model\Iterator.d dcollections\model\Keyed.d
dcollections\model\L
ist.d dcollections\model\Map.d dcollections\model\Multiset.d
dcollections\model\Set.d
running unit tests...

Version Number:   Windows NT 5.1 (Build 2600)
Exit Time:        4:29 pm, Tuesday, August 24 2010
Elapsed Time:     0:00:27.328
Process Time:     0:00:00.015
System Calls:     161445
Context Switches: 36381
Page Faults:      165602
Bytes Read:       10062897
Bytes Written:    5792330
Bytes Other:      131570

This is with DMD 2.046 as stated in the Readme. Do you need my system specs?


With the newer 2.048 I get errors:

C:\d2>build-lib-win32.bat unittest
unittest

C:\d2>dmd -unittest unit_test.d  dcollections\ArrayList.d
dcollections\DefaultAllocator.d
ctions\Hash.d dcollections\HashMap.d dcollections\HashMultiset.d
dcollections\HashSet.d d
nk.d dcollections\LinkList.d dcollections\RBTree.d dcollections\TreeMap.d
dcollections\Tr
llections\model\Addable.d dcollections\model\Iterator.d
dcollections\model\Keyed.d dcolle
Map.d dcollections\model\Multiset.d dcollections\model\Set.d
std.contracts has been scheduled for deprecation. Please use std.exception
instead.
dcollections\HashSet.d(28): Error: template std.algorithm.find(alias pred = "a
== b",R,E)
ryFun!(pred)(haystack.front,needle)) : bool)) does not match any function
template declar
dcollections\HashSet.d(28): Error: template std.algorithm.find(alias pred = "a
== b",R,E)
ryFun!(pred)(haystack.front,needle)) : bool)) cannot deduce template function
from argume
),ubyte)
dcollections\HashSet.d(301): Error: template instance
dcollections.HashSet.rangeEqual!(ub
dcollections\HashSet.d(996):        instantiated from here: HashSet!(ubyte)
running unit tests...
'.\unit_test' is not recognized as an internal or external command,
operable program or batch file.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #2 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-24 07:49:15 PDT ---
It might be a linux-only problem, I'm not sure.  I also have an older machine, maybe yours is twice as fast?

27 seconds still seems quite long for a small library.

Re the errors, I'm working on them now, I just downloaded 2.048.  Apparently there is some issue with AssumeSorted not working as a range anymore...  It's not critical to the library, but the unit tests use it.  I'm just removing the assumeSorted for now (as unit test performance isn't important, it was more the "correct" thing to do), asked about it in d.learn.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #3 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2010-08-24 09:55:25 PDT ---
(In reply to comment #2)
> It might be a linux-only problem, I'm not sure.  I also have an older machine, maybe yours is twice as fast?
> 
> 27 seconds still seems quite long for a small library.
> 
> Re the errors, I'm working on them now, I just downloaded 2.048.  Apparently there is some issue with AssumeSorted not working as a range anymore...  It's not critical to the library, but the unit tests use it.  I'm just removing the assumeSorted for now (as unit test performance isn't important, it was more the "correct" thing to do), asked about it in d.learn.

Yeah I guess it could be rather long, there's not that many size and they're mostly small from what I can tell.

Btw your updated d2.0b library now compiles in 11 seconds on my system with DMD 2.048.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #4 from Andrej Mitrovic <andrej.mitrovich@gmail.com> 2010-08-24 09:55:55 PDT ---
* not that many files, not size.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #5 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-24 10:13:43 PDT ---
(In reply to comment #3)
y small from what I can tell.
> 
> Btw your updated d2.0b library now compiles in 11 seconds on my system with DMD 2.048.

Still over a minute for me on Linux 2.048.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 24, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #6 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-24 14:10:26 PDT ---
More testing, I did some printf debugging.

According to the comments and the code of the function, it's a *linear* search through the symbol table for a match to a given symbol + suffix.

The symbol table is a null-separated single buffer.  So not only is it linear through the strings, but it's linear on the *characters*.  That is, when comparing the current symbol and it's a mismatch, the code must iterate through all the characters anyways to find the next null character.

I added some printouts to determine the eventual size of the symbol table, and the number of times the function is called.  I also added printouts to show the number of matches found (those should theoretically not do a linear search through the entire table, but likely would search through most of it).

The eventual numbers for dcollections:

symbol table size: 4,253,953
number of calls:   12,677
number of matches: 648

doing the math, thats probably conservatively about 20 billion loop iterations -- way unacceptable for something that should be done via a hash lookup, or at least a tree/trie/binary search.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721


Walter Bright <bugzilla@digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla@digitalmars.com
         Resolution|                            |FIXED


--- Comment #7 from Walter Bright <bugzilla@digitalmars.com> 2010-08-25 23:13:11 PDT ---
Changed to a hash table.

http://www.dsource.org/projects/dmd/changeset/628

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #8 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-26 06:22:33 PDT ---
This helps, but only reduces it to 20 seconds (but a 66% reduction is pretty good!).  I ran another round of profiling, and found we have a new bottleneck:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 75.79      6.51     6.51     8103     0.80     0.80
TemplateDeclaration::toJsonBuffer(OutBuffer*)
  3.14      6.78     0.27  1668093     0.00     0.00
StructDeclaration::semantic(Scope*)
  2.10      6.96     0.18        1   180.00   180.00  do32bit(FL, evc*, int)
  1.98      7.13     0.17    15445     0.01     0.01
EnumDeclaration::toJsonBuffer(OutBuffer*)
  0.70      7.19     0.06   656268     0.00     0.00
Port::isSignallingNan(long double)
  0.47      7.23     0.04   915560     0.00     0.00
StructDeclaration::toCBuffer(OutBuffer*, HdrGenState*)
  0.47      7.27     0.04                             Dsymbol::searchX(Loc,
Scope*, Identifier*)

This time, I question whether toJsonBuffer should be called at all, as I'm not outputting any JSON data?

Walter, reopen if you think this could still use improvement.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 26, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4721



--- Comment #9 from Steven Schveighoffer <schveiguy@yahoo.com> 2010-08-26 11:42:02 PDT ---
(In reply to comment #8)
> Flat profile:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  75.79      6.51     6.51     8103     0.80     0.80
> TemplateDeclaration::toJsonBuffer(OutBuffer*)
>   3.14      6.78     0.27  1668093     0.00     0.00
> StructDeclaration::semantic(Scope*)

That table of functions is invalid -- I somehow compiled dmd wrong when running that profile.  I think I used a unit test build.  However, that was only the profiled version, the non-profiled version still takes 20 seconds to compile dcollections.

In any case, here is the correctly profiled version's heavy hitters:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 80.31     11.99    11.99    19000     0.63     0.63  searchfixlist
  0.67     12.09     0.10   203173     0.00     0.00  StringTable::search(char
const*, unsigned int)
  0.60     12.18     0.09   369389     0.00     0.00  Lexer::scan(Token*)
  0.54     12.26     0.08   953613     0.00     0.00  ScopeDsymbol::search(Loc,
Identifier*, int)
  0.47     12.33     0.07  1449798     0.00     0.00  calccodsize

Note, this profile is different from the first in that I was compiling the *entire* dcollections library, not just HashMap (which is now bearable due to the improved performance!).  This is why the # calls is much higher on this one compared to the original.

So the last significant performance problem here is searchfixlist. Looking back, it was 2% of the runtime before, now it's jumped to 80%.

Looking at the function in backend/cod3.c, it appears to be another linear search through a linked list.  I understand much less about this function than the other, so I'm not sure how/if it should be solved.  Perhaps another hash map?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
« First   ‹ Prev
1 2