Thread overview
Notes 5, GC performance
Apr 16, 2008
bearophile
Apr 16, 2008
Sean Kelly
Apr 16, 2008
Bill Baxter
Apr 17, 2008
Janice Caron
Apr 17, 2008
bearophile
April 16, 2008
1) A more uniform syntax may be better, so the following can both work:
s.sizeof
s.typeof
So later typeof() function can then be removed.



2) If I want to create an AA of simple structs I think I have to do the following:


import std.stdio: putr = writefln;
import std.string: cmp;

void main() {
  auto s1a = "hello".dup;
  auto s1b = "hello".dup;
  int[string] aa1;
  aa1[s1a] = 10;
  aa1[s1b] = 20;
  putr(aa1);


  struct S2 {
    string s;
    string toString() { return "<" ~ this.s ~ ">"; }
  }
  auto s2a = S2("hello".dup);
  auto s2b = S2("hello".dup);
  int[S2] aa2;
  aa2[s2a] = 10;
  aa2[s2b] = 20;
  putr(aa2); // bug


  struct S3 {
    string s;

    uint toHash() {
      // return s.hash; // not possible yet
      // One-at-a-Time hash by Bob Jenkins
      // slower than built-in hash, but more uniform
      uint h = 0;
      foreach (c; cast(ubyte[])s) {
        h += c;
        h += (h << 10);
        h ^= (h >> 6);
      }
      h += (h << 3);
      h ^= (h >> 11);
      h += (h << 15);
      return h;
    }

    int opEquals(S3* other) {
      return this.s == other.s;
    }

    int opCmp(S3* other) {
      return cmp(this.s, other.s);
    }

    string toString() { return "<" ~ this.s ~ ">"; }
  }

  auto s3a = S3("hello".dup);
  auto s3b = S3("hello".dup);
  int[S3] aa3;
  aa3[s3a] = 10;
  aa3[s3b] = 20;
  putr(aa3);
}

(Strings already have a hashing functions defined, but how can I use that hash value to avoid such One-at-a-Time hash function?)
Even better: what's the rationale behind the current (IHMO wrong/corner-case) behavior of structs? I think they have to define an automatic toString and hashing too. Better is to have a .hash or .toHash or tohash attribute (property) on any built-in, or to have a global function hash() that gives the has of anything (calling toHash when necessary, to find it recursively).



3) In thousands of functions/classes in my D libs I have code like the following; all those functions/classes scan/use keys of the given AAs (because an AA key is much more useful, with a key you can find the value, while with a value you generally can't quickly find the key).

static if (IsAA!(TyItems)) {
  foreach (key, val; items) {
    // key processing...
  }
} else {
  foreach (el; items) {
    // el processing...
  }
}

So far I haven't found a way to avoid such silly code duplication.
Note IsAA!() is template true if the given type is an AA.
So I think I'd like the D foreach with 1 variable to scan by default the keys of a given AA:
foreach(key; someAA) {...}


4) The syntax of a language changes the typical usage of its constructs, and it's better for such constructs to be very fit to their typical usage. In D AAs can be defined and used with a simple & short syntax, so I think people (expecially people used to scripting languages) use them quite often, even in situations where there are only few keys (while in C++ the usage of hash maps is less easy, so they are used in situations where the programmer wants to put more keys). So I think D AAs have to be quick even when there are few keys (like 6-25 keys). I have written few benchmarks for such situations (that I think are common), and now I think D AAs aren't much optimized for them.
I already know that scanning a short array is generally faster than looking in an AA, but I think D AA implementation may be optimized a bit for the situation with few keys.
In my benchmarks the scan of the array (to see if an item is present) is faster than the "in" AA if the arrays is less than about 18 items long, if the retrieval probability is about 1/2 and items/keys are integers.
Adding an element to a dynamic array, with a scan to be sure it's not already present, is faster than adding an item to an AA if the number of items is less than about 180, and this is a lot!
There are various places around too look for ideas regarding possible ways to speed up the AAs for the situations with few keys, one of such places is the implementation of Python dicts (in Python dicts are used everywhere (each variable lookup is one or more dict accesses!) so they are tuned for the situation for few keys too). If you want to look at Python dicts implementation you can look inside its C sources, in the directory Objects, the files dictnotes.txt and dictobject.c (now I can't give direct URLS). Python C sources are quite more tidy than Perl ones :-)



5) I think I may have found another performance problem of the GC used by DMD, this is a small benchmark I have created reducing a program of mine:

import std.file: read;
static import std.gc;
static import std.c.time;

double clock() {
  return std.c.time.clock() / cast(double)std.c.time.CLOCKS_PER_SEC;
}

struct xsplit {
  string str;

  int opApply(int delegate(ref string) dg) {
    int result;
    size_t i;
    size_t istart = 0;
    bool inword = false;
    size_t str_length = str.length;
    string part;

    for (i = 0; i < str_length; i++) {
      switch (str[i]) {
        case ' ', '\t', '\f', '\r', '\n', '\v':
          if (inword) {
            part = str[istart .. i];
            result = dg(part);
            if (result) goto END;
            inword = false;
          }
          break;

        default:
          if (!inword) {
            istart = i;
            inword = true;
          }
        break;
      }
    }
    if (inword) {
      part = str[istart .. i];
      result = dg(part);
      if (result) goto END;
    }

    END:
      return result;
  }
}

void main() {
  float t0 = clock();
  //string filename = "dictionary.txt"; // 503_000 distinct words, 6.3 MB
  auto words = xsplit(cast(string)read(filename));
  std.gc.disable();
  int[string] parts;
  foreach(w; words)
    parts[w] = 10;
  float t1 = clock();
  printf("%f\n", t1-t0);
}

(The std.gc.disable is nearly necessary, otherwise it burns more than 5 seconds to create the AA).
The timings on 503_000 distinct words, 6.3 MB are:
  t1-t0 = 1.76 s
  Total running time: 3.57 s
I think it's strange that it takes more time to destroy an AA than to find all the hashing valees and create it.

------------------------

Few little things I have recently read that may be interesting:

6) Is this problem possible in D too? If the answer is positive, how can the language (help) avoid such bugs? http://cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html


7) About OcaML:
>OcaML saves us [...] reversing the default behavior that you see in C++/Java/Perl/Ruby: instead of variables being nullable by default, you have to explicitly declare that this variable you’re using is possibly null (called "None").<


8) About const in C++:
>I actually meant I would have liked C++ to have const inference. I guess I mean I would have liked some sort of system where I didn't have to type the keyword "const" everywhere, but the compiler would look for inconsistencies just as ML finds type inconsistencies without needing manifest type declarations everywhere. The flip side of that argument is that if the compiler did the checking without declarations, I wouldn't have been forced to wade through the code, thinking about it carefully. It's a good exercise... once...<



9) Something class-related I have read regarding the "Fan" programming language:

>
Mixins

Building software is often a modeling problem - figuring out how to map a domain model into code. In an object oriented language, this typically means modeling via classes and interfaces. Both Java and C# use a similar approach: classes support single inheritance of both type and implementation. Interfaces support multiple inheritance of type, but do not support inheritance of implementation.

Anyone who has worked in Java or C# knows that choosing between a class or an interface is often a decision that haunts you. Because once you choose a class you've burned your only chance for implementation inheritance. If you have a complicated domain model, then interfaces become a necessary burden - but often end up resulting in a lot of busy work if you need them to have common implementation code. Interfaces are also fraught with peril when it comes to versioning because you can't add a method without breaking all of the implementing code.

There are plenty of good reasons why Java and C# ended up using the class/interface model. Multiple inheritance offers lots of power, but comes at the expense of complexity and some pretty nasty pitfalls. Fan takes a middle of the road approach called mixins. Mixins are essentially Java or C# interfaces that can have method implementations. To avoid some of the pitfalls of true multiple inheritance, mixins restrict features such as fields which store state. Mixins are a very nice feature in the Fan toolbox when it comes to designing your object oriented models.
<

Bye,
bearophile
April 16, 2008
== Quote from bearophile (bearophileHUGS@lycos.com)'s article
...
> (The std.gc.disable is nearly necessary, otherwise it burns more than 5 seconds to create the AA).
> The timings on 503_000 distinct words, 6.3 MB are:
>   t1-t0 = 1.76 s
>   Total running time: 3.57 s
> I think it's strange that it takes more time to destroy an AA than to find all the hashing valees and create it.

The Phobos GC is not aware of types contained in AAs, so it will walways scan AA nodes regardless of
what's in them.  Tango's GC is a bit smarter (and I have a ticket pending to improve it further), but it
will still not be perfect because the compiler only passes in TypeInfo for the key, not the value, and
both are stored in the same memory block so if one must be scanned then the other will be also.
This could be the cause of the slowdown you're seeing.


Sean
April 16, 2008
bearophile wrote:
> 1) A more uniform syntax may be better, so the following can both work:
> s.sizeof
> s.typeof
> So later typeof() function can then be removed.

I wouldn't mind .typeof for variables.  But typeof() can be applied to an expression too.
So would you recommend we do (a+b).typeof in that case?

> 3) In thousands of functions/classes in my D libs I have code like the following; all those functions/classes scan/use keys of the given AAs (because an AA key is much more useful, with a key you can find the value, while with a value you generally can't quickly find the key).
> 
> static if (IsAA!(TyItems)) {
>   foreach (key, val; items) {
>     // key processing...
>   }
> } else {
>   foreach (el; items) {
>     // el processing...
>   }
> }
> 
> So far I haven't found a way to avoid such silly code duplication.
> Note IsAA!() is template true if the given type is an AA.
> So I think I'd like the D foreach with 1 variable to scan by default the keys of a given AA:
> foreach(key; someAA) {...}

Agreed.  This is how Python does it too.  Makes a lot more sense to me.

I think Walter's argument was that an AA is just an array with generalized indices.  foreach(x; array) doesn't iterate over keys (the numbers 0..array.length-1), so therefore foreach(x; someAA) shouldn't either.

I disagree, but that's what Walters's argued in the past.  They are similar syntactically, but the use cases are quite distinct.  I don't think the syntactic similarity should be put ahead of utility.

But it's a good example you bring up -- because I think it directly counter's Walter's argument.  His argument is that the syntactic similarity should be honored over functionality concerns because of "generic code".   And yet here you have an example of generic code where you are saying that adherence to syntactic similarities is actually causing you headaches.  So that's good evidence from real-world code that the generic code argument is bogus.


> 4) The syntax of a language changes the typical usage of its constructs, and it's better for such constructs to be very fit to their typical usage. In D AAs can be defined and used with a simple & short syntax, so I think people (expecially people used to scripting languages) use them quite often, even in situations where there are only few keys (while in C++ the usage of hash maps is less easy, so they are used in situations where the programmer wants to put more keys). So I think D AAs have to be quick even when there are few keys (like 6-25 keys). I have written few benchmarks for such situations (that I think are common), and now I think D AAs aren't much optimized for them.
> I already know that scanning a short array is generally faster than looking in an AA, but I think D AA implementation may be optimized a bit for the situation with few keys.
> In my benchmarks the scan of the array (to see if an item is present) is faster than the "in" AA if the arrays is less than about 18 items long, if the retrieval probability is about 1/2 and items/keys are integers.
> Adding an element to a dynamic array, with a scan to be sure it's not already present, is faster than adding an item to an AA if the number of items is less than about 180, and this is a lot!
> There are various places around too look for ideas regarding possible ways to speed up the AAs for the situations with few keys, one of such places is the implementation of Python dicts (in Python dicts are used everywhere (each variable lookup is one or more dict accesses!) so they are tuned for the situation for few keys too). If you want to look at Python dicts implementation you can look inside its C sources, in the directory Objects, the files dictnotes.txt and dictobject.c (now I can't give direct URLS). Python C sources are quite more tidy than Perl ones :-)
> 

Agreed.  Smart people have spent a lot of time benchmarking and improving the performance of dicts in Python.  And Python's license is quite liberal.
http://www.python.org/psf/license/
Basically you can do anything you want with it, but you'd probably have to tack on a "Portions copyright PSF" in any place where you list DMD's copyright info.

--bb
April 17, 2008
On 16/04/2008, bearophile <bearophileHUGS@lycos.com> wrote:
>  3) In thousands of functions/classes in my D libs I have code like the following; all those functions/classes scan/use keys of the given AAs (because an AA key is much more useful, with a key you can find the value, while with a value you generally can't quickly find the key).
>
>  static if (IsAA!(TyItems)) {
>   foreach (key, val; items) {
>     // key processing...
>   }
>  } else {
>   foreach (el; items) {
>     // el processing...
>   }
>  }
>
>  So far I haven't found a way to avoid such silly code duplication.

What's wrong with

    foreach (key, val; items) {...}

? That works for AAs, regular arrays, and any class which overloads opApply with two variables.
April 17, 2008
Janice Caron:
>? That works for AAs, regular arrays, and any class which overloads opApply with two variables.<

Generally you can't be sure an iterable class/struct supports two indexes too (for example my hyper-fast primes generator doesn't offer the index argument too to avoid code duplication).

Bye,
bearophile