View mode: basic / threaded / horizontal-split · Log in · Help
May 04, 2009
Associative Arrays and Interior Pointers
I've been playing around with associative array and it's led to two pretty
interesting questions for other D users:

1.  I have a prototype associative array implementation that's designed with
D's GC implementation in mind.  It is much easier on the GC (allocates much
less frequently, causes almost no false pointers) than the current
implementation.  However, this comes at the price of worse cache performance
that can cause lookups to take up to twice as long as the current
implementation for large AAs.  On the other hand, building these large AAs is
an order of magnitude faster.  Furthermore, my implementation only allocates
occasionally, meaning in multithreaded mode, inserting an element will seldom
require a lock.  Therefore, the question is, how much cache/lookup performance
are you willing to give up for an implementation that's more GC friendly,
allows faster insertions, and requires less locking?  Also, in your opinion,
should D's builtin AA be designed around the GC implementation, or is this
kind of coupling unacceptable?

2.  Other than its abysmal interaction with the current GC and the lack of
ability to iterate using ranges, the current AA implementation actually seems
pretty good.  One way to remedy a large portion of the GC issues without a
massive overhaul of the GC would be to introduce a feature into the GC where a
block of memory can be flagged as NO_INTERIOR.  This would mean that it could
only be kept alive by pointers to the base address.  If this feature were
implemented and used in memory blocks allocated by the current AA
implementation, it would drastically reduce the false pointer issue.  (What
seems to happen is, first of all, the current implementation generates a lot
of false pointers, and secondly, it allocates so many blocks of memory that a
few of them are guaranteed to be kept around by false pointers, leading to
massive heap fragmentation.)  I've submitted a patch that implements this
NO_INTERIOR feature.  (See bugzilla 2927).  Do you believe that this patch
belongs in the GC?
May 05, 2009
Re: Associative Arrays and Interior Pointers
dsimcha:

I can see you are working a lot to try to improve the implementation of the language. Thank you for this.

> this comes at the price of worse cache performance
> that can cause lookups to take up to twice as long as the current
> implementation for large AAs.  On the other hand, building these large AAs is
> an order of magnitude faster.  Furthermore, my implementation only allocates
> occasionally, meaning in multithreaded mode, inserting an element will seldom
> require a lock.

The built-in AAs have to be flexible and not too much bad for most purposes.
This implementation of yours has different trade offs compared to the built-in one. One way to answer what implementation is to prefer is to collect some amount of typical D1 code, and benchmark it with the two different AA implementations, and keep the AA implementation that leads to an average smaller total running time.
Then one or more different implementations may be added to Phobos for special usages (even Python has a second kind of associative array in the std lib, named defaultdict(), it's based on the same hashing machinery, but has some differences).


> One way to remedy a large portion of the GC issues without a
> massive overhaul of the GC would be to introduce a feature into the GC where a
> block of memory can be flagged as NO_INTERIOR.

Improving current things in a gradual way like this seems a good starting point (this is essentially the same answer I have given to Don regarding complex numbers).

Bye,
bearophile
May 10, 2009
Re: Associative Arrays and Interior Pointers
dsimcha wrote:
> 
> 2.  Other than its abysmal interaction with the current GC and the lack of
> ability to iterate using ranges, the current AA implementation actually seems
> pretty good.  One way to remedy a large portion of the GC issues without a
> massive overhaul of the GC would be to introduce a feature into the GC where a
> block of memory can be flagged as NO_INTERIOR.

Neat idea.  Some GCs (like the Boehm GC) can be set NO_INTERIOR 
globally, but it never crossed my mind to do this per block.  For 
certain data structures, this might be pretty useful.
May 10, 2009
Re: Associative Arrays and Interior Pointers
== Quote from Sean Kelly (sean@invisibleduck.org)'s article
> dsimcha wrote:
> >
> > 2.  Other than its abysmal interaction with the current GC and the lack of
> > ability to iterate using ranges, the current AA implementation actually seems
> > pretty good.  One way to remedy a large portion of the GC issues without a
> > massive overhaul of the GC would be to introduce a feature into the GC where a
> > block of memory can be flagged as NO_INTERIOR.
> Neat idea.  Some GCs (like the Boehm GC) can be set NO_INTERIOR
> globally, but it never crossed my mind to do this per block.  For
> certain data structures, this might be pretty useful.

I get the impression that D's GC will always have a significant degree of
conservatism.  Of course, there's always unions, but unioning a pointer type w/ a
non-pointer type is such an edge case that, if this is the only thing that's
conservative, then the GC can be considered precise for all practical purposes.
For now, I'd love to see this added, because it would be an extremely "cheap" way
to solve a lot of annoying problems.

A few questions:
1.  (For Leonardo, especially):  Is the GC likely to get precise enough in the
near future that something like this would end up being considered a piece of cruft?
2.  Other than that, is there any reason this should not go in?
3.  Sean, if you're seriously thinking of putting this in in the near future, let
me know so I can fix that patch, too.  I did it the same way as my other GC patch,
with no whitespace, so the line numbers might be screwy here, too.
May 10, 2009
Re: Associative Arrays and Interior Pointers
On 2009-05-10 20:04:40 +0200, dsimcha <dsimcha@yahoo.com> said:

> == Quote from Sean Kelly (sean@invisibleduck.org)'s article
>> dsimcha wrote:
>>> 
>>> 2.  Other than its abysmal interaction with the current GC and the lack of
>>> ability to iterate using ranges, the current AA implementation actually seems
>>> pretty good.  One way to remedy a large portion of the GC issues without a
>>> massive overhaul of the GC would be to introduce a feature into the GC where a
>>> block of memory can be flagged as NO_INTERIOR.
>> Neat idea.  Some GCs (like the Boehm GC) can be set NO_INTERIOR
>> globally, but it never crossed my mind to do this per block.  For
>> certain data structures, this might be pretty useful.
> 
> I get the impression that D's GC will always have a significant degree of
> conservatism.  Of course, there's always unions, but unioning a pointer 
> type w/ a
> non-pointer type is such an edge case that, if this is the only thing that's
> conservative, then the GC can be considered precise for all practical purposes.
> For now, I'd love to see this added, because it would be an extremely 
> "cheap" way
> to solve a lot of annoying problems.
> 
> A few questions:
> 1.  (For Leonardo, especially):  Is the GC likely to get precise enough in the
> near future that something like this would end up being considered a 
> piece of cruft?
> 2.  Other than that, is there any reason this should not go in?
> 3.  Sean, if you're seriously thinking of putting this in in the near 
> future, let
> me know so I can fix that patch, too.  I did it the same way as my 
> other GC patch,
> with no whitespace, so the line numbers might be screwy here, too.

Yes the idea of NO_INTERIOR was floated before, and it is a good idea 
and works for 99% of the usages of AA and similar objects.

Normally it is not very different from having a GC managed wrapper 
object that alloc non GC managed memory, which for example is what I do 
with multidimensional arrays.
For these there isn't a big advantage in NO_INTERIOR,because basically 
always you have a wrapper object, and for these wrapper objects one can 
explicitly say that internal pointers are valid only if a pointer to 
the AA is kept

For the AA usage these is still the 0.1% usage in which one might lose 
the pointer to the AA, but still have a pointer to some of its 
contents. At the moment this is legal (I think) your change would 
disallow it. This might still be ok, but one should be aware of it

Fawzi
May 10, 2009
Re: Associative Arrays and Interior Pointers
dsimcha, el 10 de mayo a las 18:04 me escribiste:
> == Quote from Sean Kelly (sean@invisibleduck.org)'s article
> > dsimcha wrote:
> > >
> > > 2.  Other than its abysmal interaction with the current GC and the lack of
> > > ability to iterate using ranges, the current AA implementation actually seems
> > > pretty good.  One way to remedy a large portion of the GC issues without a
> > > massive overhaul of the GC would be to introduce a feature into the GC where a
> > > block of memory can be flagged as NO_INTERIOR.
> > Neat idea.  Some GCs (like the Boehm GC) can be set NO_INTERIOR
> > globally, but it never crossed my mind to do this per block.  For
> > certain data structures, this might be pretty useful.
> 
> I get the impression that D's GC will always have a significant degree of
> conservatism.  Of course, there's always unions, but unioning a pointer type w/ a
> non-pointer type is such an edge case that, if this is the only thing that's
> conservative, then the GC can be considered precise for all practical purposes.
> For now, I'd love to see this added, because it would be an extremely "cheap" way
> to solve a lot of annoying problems.

I really like the idea of NO_INTERIOR too. I don't think it's possible to
be the default because if that is the case you can't have array slices
and/or pointers to a member of a struct/class. But I think it's great for
very special cases (specially to implement high performance data
structures) to be able to instruct the GC to discard interior pointers.

> A few questions:
> 1.  (For Leonardo, especially):  Is the GC likely to get precise enough in the
> near future that something like this would end up being considered a piece of cruft?

I'm sorry, but I finally decided to add concurrency to the current GC. The
main idea is to reduce (almost vanish) pauses. If is not too hard, and
time help, I'll try to add some preciseness if it only involves changes in
the internal runtime, but I honestly don't think I'll have the time.

I'm talking about finishing my thesis here. When it's finished I think and
hope to keep working on the D GC...

> 2.  Other than that, is there any reason this should not go in?

I don't see any reason, is a backward compatible change.

> 3.  Sean, if you're seriously thinking of putting this in in the near future, let
> me know so I can fix that patch, too.  I did it the same way as my other GC patch,
> with no whitespace, so the line numbers might be screwy here, too.

I think it would be great to have a centralized place where to put this
improvements. This is another situation where I think Tango vs. Phobos
issue is killing D. When I started my work in the thesis I had to decide
whether to work with Phobos or Tango. I finally decided for Tango, because
is the only option for LDC and because is way better organized (and more
receptive to patches). But I hate knowing that my work will be available
(in the best case) only for people using Tango.

I hate to see the D "community" fragmented.

-- 
Leandro Lucarella (luca) | Blog colectivo: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Estamos cantando a la sombra de nuestra parra
una canción que dice que uno sólo conserva lo que no amarra
May 10, 2009
Re: Associative Arrays and Interior Pointers
== Quote from Leandro Lucarella (llucax@gmail.com)'s article
> I really like the idea of NO_INTERIOR too. I don't think it's possible to
> be the default because if that is the case you can't have array slices
> and/or pointers to a member of a struct/class. But I think it's great for
> very special cases (specially to implement high performance data
> structures) to be able to instruct the GC to discard interior pointers.

Of course, I don't think any reasonable person thinks it should be the default.
The idea is that it would be an unsafe optimization for people who really know
what they're doing or are really having serious trouble with false pointers.  It
should be thought of kind of like using manual memory management in D.

> > A few questions:
> > 1.  (For Leonardo, especially):  Is the GC likely to get precise enough in the
> > near future that something like this would end up being considered a piece of
cruft?
> I'm sorry, but I finally decided to add concurrency to the current GC. The
> main idea is to reduce (almost vanish) pauses. If is not too hard, and
> time help, I'll try to add some preciseness if it only involves changes in
> the internal runtime, but I honestly don't think I'll have the time.
> I'm talking about finishing my thesis here. When it's finished I think and
> hope to keep working on the D GC...

Ok, cool.  I just wanted to make sure that, if we took the time to put this in,
you weren't likely to come back in 6 months and say "I've made the GC almost
completely precise.  All false pointer stuff is now irrelevant and just a bunch of
cruft."  On the other hand, I certainly wouldn't mind if the GC became more
precise instead.

> > 2.  Other than that, is there any reason this should not go in?
> I don't see any reason, is a backward compatible change.
> > 3.  Sean, if you're seriously thinking of putting this in in the near future, let
> > me know so I can fix that patch, too.  I did it the same way as my other GC patch,
> > with no whitespace, so the line numbers might be screwy here, too.
> I think it would be great to have a centralized place where to put this
> improvements. This is another situation where I think Tango vs. Phobos
> issue is killing D. When I started my work in the thesis I had to decide
> whether to work with Phobos or Tango. I finally decided for Tango, because
> is the only option for LDC and because is way better organized (and more
> receptive to patches). But I hate knowing that my work will be available
> (in the best case) only for people using Tango.
> I hate to see the D "community" fragmented.

You're right, this is unfortunate.  Basically every contribution I've made to D is
D2-oriented and completely irrelevant to D1.
Top | Discussion index | About this forum | D home