Thread overview
[Issue 2331] New: Enum hashes many times slower than normal hashes
Sep 03, 2008
d-bugmail
Oct 11, 2008
d-bugmail
Oct 11, 2008
d-bugmail
Oct 11, 2008
d-bugmail
Oct 11, 2008
d-bugmail
Sep 22, 2010
Mitch Hayenga
Sep 22, 2010
nfxjfg@gmail.com
September 03, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2331

           Summary: Enum hashes many times slower than normal hashes
           Product: D
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: andrei@metalanguage.com


This is quite incredible. Consider this map of stopwords for the English language:

bool[string] stopWords;

static this() {
    stopWords = [
    "a"[]:1, "b":1, "c":1, "d":1, "e":1, "f":1, "g":1, "h":1, "i":1, "j":1,
"k":1, "l":1, "m":1, "n":1, "o":1, "p":1, "q":1, "r":1, "s":1, "t":1, "u":1,
"v":1, "w":1, "x":1, "y":1, "z":1,
    "the":1, "a":1,
    "about":1, "above":1, "across":1, "afterwards":1, "after":1, "again":1,
"against":1, "ago":1, "almost":1, "along":1,
    "already":1, "always":1, "among":1, "anywhere":1, "around":1, "as":1,
"at":1, "away":1,
    "back":1, "before":1, "behind":1, "beside":1, "between":1, "beyond":1,
"by":1,
    "down":1, "downstairs":1, "during":1,
    "else":1, "enough":1, "every":1, "everywhere":1,
    "far":1, "for":1, "from":1,
    "here":1,
    "in":1, "inside":1, "into":1,
    "just":1,
    "last":1, "lot":1, "lots":1,
    "many":1, "middle":1, "much":1,
    "near":1, "next":1, "never":1, "not":1, "now":1, "nowhere":1,
    "of":1, "off":1, "often":1, "on":1, "once":1, "over":1, "outside":1,
"over":1,
    "quite":1,
    "rather":1, "recently":1, "rarely":1, "round":1,
    "seldom":1, "sometimes":1, "somewhere":1,
    "there":1, "through":1, "till":1, "today":1, "to":1, "tomorrow":1,
"tonight":1, "too":1, "towards":1,
    "until":1, "up":1, "upstairs":1, "usually":1, "under":1,
    "very":1,
    "while":1, "with":1, "within":1, "without":1,
    "yes":1, "yesterday":1, "yet":1,
    "you":1, "he":1, "she":1, "it":1, "we":1, "they":1,
    "me":1, "him":1, "her":1, "it":1, "us":1, "them":1,
    "myself":1, "yourself":1, "himself":1, "herself":1, "itself":1,
"ourselves":1, "yourselves":1,
    "themselves":1, "oneself":1,
    "my":1, "your":1, "its":1, "our":1, "their":1,
    "mine":1, "yours":1, "hers":1, "ours":1, "theirs":1,
    "this":1, "these":1, "those":1,
    "some":1, "any":1, "no":1, "none":1,
    "other":1, "another":1, "every":1, "all":1, "others":1, "each":1,
"whole":1, "both":1,
    "neither":1, "none":1,
    "someone":1, "somebody":1, "something":1,
    "anyoneanybodyanything":1,
    "nobody":1, "nothing":1,
    "everyone":1, "everybody":1, "everything":1,
    "and":1, "or":1, "but":1, "because":1, "if":1,
    "as":1, "like":1, "such":1,
    "how":1, "who":1, "why":1, "what":1, "where":1, "whose":1, "when":1,
"whom":1, "which":1,
"bye":1, "hello":1,
    "be":1, "was":1, "been":1, "am":1, "is":1, "are":1, "were":1,
    "can":1, "could":1,
    "come":1, "came":1, "comes":1,
    "do":1, "did":1, "done":1, "does":1,
    "get":1, "got":1, "gets":1,
    "have":1, "had":1, "has":1, "having":1,
    "may":1, "might":1,
    "must":1,
    "shall":1, "should":1,
    "ought":1,
    "take":1, "took":1, "taken":1, "takes":1, "taking":1,
    "use":1, "uses":1, "used":1,
    "will":1, "would":1,
    "aren't":1,
    "cannot":1,
    "can't":1,
    "couldn't":1,
    "didn't":1,
    "doesn't":1,
    "don't":1,
    "wasn't":1,
    "wouldn't":1,
    "hadn't":1,
    "isn't":1,
    "one":1, "two":1, "three":1, "four":1, "five":1, "six":1, "seven":1,
    "eight":1, "nine":1, "ten":1, "nought":1, "zero":1,
    "first":1, "second":1, "third":1, "fourth":1, "fifth":1, "sixth":1,
"seventh":1,
    "eighth":1, "ninth":1, "tenth":1,
    "ii":1, "iii":1, "iv":1, "vi":1, "vii":1, "viii":1, "ix":1,
    "sunday":1, "monday":1, "tuesday":1, "wednesday":1, "thursday":1,
"friday":1, "saturday":1,
    "january":1, "february":1, "march":1, "april":1, "may":1, "june":1,
    "julyaugustseptember":1, "october":1, "november":1, "december":1,
    "date":1, "false":1, "e.geg":1, "i.e":1, "ie":1, "etc":1, "example":1,
"examples":1,
    "jrmiss":1, "thing":1, "things":1, "true":1, "year":1,
    "badbig":1, "close":1, "difficult":1, "easy":1, "empty":1, "full":1,
"good":1,
    "little":1, "long":1, "ready":1, "open":1, "short":1, "tall":1
];
}

The map can be used in help for parsing English text. Now say we make this change:

enum bool[string] stopWords = [ ... same contents ... ];

Amazingly enough, looking up this new map (with "word in stopWords") is many many times slower than looking up the old map. What's happening?


-- 

September 03, 2008
On Tue, Sep 2, 2008 at 11:18 PM, <d-bugmail@puremagic.com> wrote:

> http://d.puremagic.com/issues/show_bug.cgi?id=2331
>
>           Summary: Enum hashes many times slower than normal hashes
>           Product: D
>           Version: unspecified
>          Platform: PC
>        OS/Version: Linux
>            Status: NEW
>          Severity: normal
>          Priority: P2
>         Component: DMD
>        AssignedTo: bugzilla@digitalmars.com
>        ReportedBy: andrei@metalanguage.com
>
>
> This is quite incredible. Consider this map of stopwords for the English language:
>
> bool[string] stopWords;
>
> static this() {
>    stopWords = [
>    "a"[]:1, "b":1, "c":1, "d":1, "e":1, "f":1, "g":1, "h":1, "i":1, "j":1,
> "k":1, "l":1, "m":1, "n":1, "o":1, "p":1, "q":1, "r":1, "s":1, "t":1,
> "u":1,
> "v":1, "w":1, "x":1, "y":1, "z":1,
>    "the":1, "a":1,
>    "about":1, "above":1, "across":1, "afterwards":1, "after":1, "again":1,
> "against":1, "ago":1, "almost":1, "along":1,
>    "already":1, "always":1, "among":1, "anywhere":1, "around":1, "as":1,
> "at":1, "away":1,
>    "back":1, "before":1, "behind":1, "beside":1, "between":1, "beyond":1,
> "by":1,
>    "down":1, "downstairs":1, "during":1,
>    "else":1, "enough":1, "every":1, "everywhere":1,
>    "far":1, "for":1, "from":1,
>    "here":1,
>    "in":1, "inside":1, "into":1,
>    "just":1,
>    "last":1, "lot":1, "lots":1,
>    "many":1, "middle":1, "much":1,
>    "near":1, "next":1, "never":1, "not":1, "now":1, "nowhere":1,
>    "of":1, "off":1, "often":1, "on":1, "once":1, "over":1, "outside":1,
> "over":1,
>    "quite":1,
>    "rather":1, "recently":1, "rarely":1, "round":1,
>    "seldom":1, "sometimes":1, "somewhere":1,
>    "there":1, "through":1, "till":1, "today":1, "to":1, "tomorrow":1,
> "tonight":1, "too":1, "towards":1,
>    "until":1, "up":1, "upstairs":1, "usually":1, "under":1,
>    "very":1,
>    "while":1, "with":1, "within":1, "without":1,
>    "yes":1, "yesterday":1, "yet":1,
>    "you":1, "he":1, "she":1, "it":1, "we":1, "they":1,
>    "me":1, "him":1, "her":1, "it":1, "us":1, "them":1,
>    "myself":1, "yourself":1, "himself":1, "herself":1, "itself":1,
> "ourselves":1, "yourselves":1,
>    "themselves":1, "oneself":1,
>    "my":1, "your":1, "its":1, "our":1, "their":1,
>    "mine":1, "yours":1, "hers":1, "ours":1, "theirs":1,
>    "this":1, "these":1, "those":1,
>    "some":1, "any":1, "no":1, "none":1,
>    "other":1, "another":1, "every":1, "all":1, "others":1, "each":1,
> "whole":1, "both":1,
>    "neither":1, "none":1,
>    "someone":1, "somebody":1, "something":1,
>    "anyoneanybodyanything":1,
>    "nobody":1, "nothing":1,
>    "everyone":1, "everybody":1, "everything":1,
>    "and":1, "or":1, "but":1, "because":1, "if":1,
>    "as":1, "like":1, "such":1,
>    "how":1, "who":1, "why":1, "what":1, "where":1, "whose":1, "when":1,
> "whom":1, "which":1,
> "bye":1, "hello":1,
>    "be":1, "was":1, "been":1, "am":1, "is":1, "are":1, "were":1,
>    "can":1, "could":1,
>    "come":1, "came":1, "comes":1,
>    "do":1, "did":1, "done":1, "does":1,
>    "get":1, "got":1, "gets":1,
>    "have":1, "had":1, "has":1, "having":1,
>    "may":1, "might":1,
>    "must":1,
>    "shall":1, "should":1,
>    "ought":1,
>    "take":1, "took":1, "taken":1, "takes":1, "taking":1,
>    "use":1, "uses":1, "used":1,
>    "will":1, "would":1,
>    "aren't":1,
>    "cannot":1,
>    "can't":1,
>    "couldn't":1,
>    "didn't":1,
>    "doesn't":1,
>    "don't":1,
>    "wasn't":1,
>    "wouldn't":1,
>    "hadn't":1,
>    "isn't":1,
>    "one":1, "two":1, "three":1, "four":1, "five":1, "six":1, "seven":1,
>    "eight":1, "nine":1, "ten":1, "nought":1, "zero":1,
>    "first":1, "second":1, "third":1, "fourth":1, "fifth":1, "sixth":1,
> "seventh":1,
>    "eighth":1, "ninth":1, "tenth":1,
>    "ii":1, "iii":1, "iv":1, "vi":1, "vii":1, "viii":1, "ix":1,
>    "sunday":1, "monday":1, "tuesday":1, "wednesday":1, "thursday":1,
> "friday":1, "saturday":1,
>    "january":1, "february":1, "march":1, "april":1, "may":1, "june":1,
>    "julyaugustseptember":1, "october":1, "november":1, "december":1,
>    "date":1, "false":1, "e.geg":1, "i.e":1, "ie":1, "etc":1, "example":1,
> "examples":1,
>    "jrmiss":1, "thing":1, "things":1, "true":1, "year":1,
>    "badbig":1, "close":1, "difficult":1, "easy":1, "empty":1, "full":1,
> "good":1,
>    "little":1, "long":1, "ready":1, "open":1, "short":1, "tall":1
> ];
> }
>
> The map can be used in help for parsing English text. Now say we make this change:
>
> enum bool[string] stopWords = [ ... same contents ... ];
>
> Amazingly enough, looking up this new map (with "word in stopWords") is
> many
> many times slower than looking up the old map. What's happening?
>
>
> --
>
>
It may (and this is a big "may", you'd have to look at the disassembly to know) be related to http://d.puremagic.com/issues/show_bug.cgi?id=2237.  In short, I had a constant array (D1, same as an enum array in D2) which, for some reason, DMD was re-constructing the array from scratch upon _every access to it_ instead of putting it in the static data segment.  It's possible that the compiler is making the same stupid mistake here.


October 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2331





------- Comment #1 from bugzilla@digitalmars.com  2008-10-11 02:26 -------
What's happening is that the static this() constructor builds the hash table once. The enum version builds it every time it is used, as the enum name is replaced with its initializer.


-- 

October 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2331





------- Comment #2 from 2korden@gmail.com  2008-10-11 06:47 -------
I believe enum should be smarter than that. If not, it is no different than C-style define macro.


-- 

October 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2331





------- Comment #3 from andrei@metalanguage.com  2008-10-11 09:06 -------
(In reply to comment #1)
> What's happening is that the static this() constructor builds the hash table once. The enum version builds it every time it is used, as the enum name is replaced with its initializer.
> 

I hope this is an explanation of the bug's cause, not a justification that this is not a bug. :o)


-- 

October 11, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=2331





------- Comment #4 from jarrett.billingsley@gmail.com  2008-10-11 16:23 -------
Sounds eerily similar to http://d.puremagic.com/issues/show_bug.cgi?id=2237

Coincidence?


-- 

September 22, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=2331


Mitch Hayenga <mitch.hayenga@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mitch.hayenga@gmail.com


--- Comment #5 from Mitch Hayenga <mitch.hayenga@gmail.com> 2010-09-22 10:22:20 PDT ---
I recently hit this performance issue myself while trying to use a lookup table, rather than branching on logic for a function.  It can be avoided by declaring the field as invariant, but I had originally used Enum as thats one of the ways suggested by TDPL for doing CTFE.


pseudocode:

bool[256] generate_lookup_table(); // function declaration

// Performance = terrible here
enum lookup_as_enum = generate_lookup_table();

// Performance = great here
invariant lookup_as_enum = generate_lookup_table();

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 22, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=2331


nfxjfg@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg@gmail.com


--- Comment #6 from nfxjfg@gmail.com 2010-09-22 10:51:38 PDT ---
(In reply to comment #1)
> What's happening is that the static this() constructor builds the hash table once. The enum version builds it every time it is used, as the enum name is replaced with its initializer.

That's quite hilarious. There are now half a dozen of bugs related to dmd being stupid with static data construction. E.g. see bug 4397, bug 2526, bug 2356, bug 4881. They possibly are all caused by the same underlying issue. Walter, don't you think your users are finally annoyed enough so that you could look into fixing it?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
September 22, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=2331


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #7 from bearophile_hugs@eml.cc 2010-09-22 11:40:48 PDT ---
(In reply to comment #6)
> Walter, don't you think your users are finally annoyed enough so that you could look into fixing it?

Be more gentle, please.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------