November 12, 2014 On heap segregation, GC optimization and @nogc relaxing
Hi all,

I want to get back to the subject of ownership and lifetime and propose a solution, but first I'd like to state the problem in a way I haven't seen it stated before (even if I have no doubt some have come to the same conclusion in the past).

The problem at hand is twofold: memory management and thread safety. Number one has been a hot topic for ages, and number two has become one over the past years with the spread of multicore CPUs.

The problem at hand here is ownership of data. There are 3 roads you can go down:
- Immutability and GC. Effectively, these 2 techniques allow you to get rid of ownership. There are advantages and drawbacks I'm going to discuss later.
- Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows you to implement almost any scheme you want, but comes at a great cost for the developer.
- Annotations. This is the Rust road. It also comes at a great cost for the developer, as some schemes may be non-trivial to express given the type system, but, contrary to the C++ road, it is safe.

These approaches all have some very nice things going for them, but also some killer scenarios.

Immutability+GC gives you safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies: it is easy and safe to share data without bookkeeping, which allows one to fit more in cache and reduces the amount of garbage created. Most text processing apps fall into this category, and this is why D is that good at them. Another big goody is that many lock-free algorithms become possible: once you remove the need to bookkeep ownership, many operations can be implemented in an atomic manner. Additionally, it is possible to implement various GC optimizations on the immutable heap, which makes the GC generally more efficient. But the cost is also real. For some use cases, this means having a large amount of garbage generated (Carmack wrote a piece on Haskell where he mentions the disastrous effect an immutable framebuffer would have: you'd have to clone it every time you draw into it, which is a no-go). GC also tends to cause unpredictable runtime characteristics, which programs with real-time constraints can have a hard time dealing with.

Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interfaces simple. The obvious drawback is that it is time consuming and error prone. It also makes a lot of things unclear, and devs choose the better-safe-than-sorry road. That means excessive copying to make sure one owns the data, which is wasteful (in terms of the work for the copy itself, garbage generation and cache pressure). While this must remain an option locally for system code, it doesn't seem like the right option at program scale, and we do it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but they generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

Ideally, we want to map to what the hardware does. So what does the hardware do?

Multicore CPUs have various cores, each of them having layers of cache. Cache is organized in cache lines, and each cache line can be in various modes. Actual systems are quite complex and deal with problems we are not very interested in here (like write-back), but the general idea is that every cache line is owned in one of several modes.
Either the cache line is owned by a single core, which can write to it, or the cache line is shared by several cores, each of them holding a local copy of the line that none of them can write to. There is an internal bus over which cores exchange cache lines with each other, along with messages to acquire a cache line in read or read/write mode. That means CPUs are good at thread-local read/write, shared immutable data, and transfer of ownership from one core to another. They are bad at shared writable data (effectively, the cache line has to bounce back and forth between cores, and all memory accesses need to be serialized instead of performed out of order).

In that world, D has a bizarro position where it uses a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution: use annotations for the common cases, and fall back on GC/unsafe code when these annotations fall short.

Before going into why it is falling short, a digression on the GC and the benefits of segregating the heap. In D, the heap is almost segregated into 3 groups: thread-local, shared and immutable. These groups are very interesting for the GC:
- The thread-local heap can be collected while disturbing only one thread. It should be possible to use different strategies in different threads.
- The immutable heap can be collected 100% concurrently, without any synchronization with the program.
- The shared heap is the only one that requires disturbing the whole program, but as a matter of good practice, this heap should be small anyway.

Various ML-family languages (like OCaml) have adopted a segregated heap strategy and get great benefits out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios.

We are sitting on a huge GC goldmine here, but 3 things prevent us from exploiting it:
- Exceptions. They can bubble from one thread to another and create implicit sharing.
- Uniqueness (as it is defined now), as it allows a unique object to be merged into any heap.
- Message passing. Ownership transfer is not possible, so unsafe casting ensues (a sketch of this status quo follows below).

* It has to be noted that delegates allow for this kind of stunt as well, but this is recognized as a bug by now and hopefully it is going to be fixed.

D has a type qualifier system for which we pay a big price. Getting everything const-correct is difficult. We'd want to get the most bang for the buck, and one of the bangs we are not far from being able to get is segregating the heap; not getting it means a shitty GC and unsafe code.

Let's present a concrete example using ownership:

pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulation to come up with the object, including various allocations. These allocations are mutable inside foo, which makes it impossible to allocate them on the immutable heap (as a GC relying on this immutability could mess things up pretty badly). They also cannot be allocated on the TL heap, as once promoted to immutable, the data becomes shared as well.

On the other hand, ownership means that the compiler can know when things go out of scope and free them explicitly. That is a plus, as generating less garbage is always a way to improve garbage collection: the most efficient work there is, is the work that does not need to be done.

I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, which does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC.
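As promised above, a small sketch of the message-passing status quo in today's D (plain std.concurrency, nothing hypothetical): even though the object is, in practice, uniquely referenced, the type system cannot express that, so the transfer needs an unsafe cast.

import std.concurrency;

class Message
{
    int payload;
}

void worker()
{
    // The receiving side only ever sees the object as shared.
    auto msg = receiveOnly!(shared(Message));
}

void main()
{
    auto tid = spawn(&worker);

    auto m = new Message;       // freshly allocated, thread-local, unique in practice
    m.payload = 42;

    // Ownership transfer is not expressible, so an unsafe cast is needed
    // to satisfy send's "no unshared aliasing" requirement.
    tid.send(cast(shared) m);
}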
We already pay a cost at the interface with type qualifiers, so let's make the best of it! I'm proposing to introduce a new type qualifier for owned data.

Concretely, this means that the throw statement expects an owned(Throwable), that pure functions which currently return an implicitly unique object return owned(Object), and that message passing accepts owned values.

The GC heap can be segregated into islands. We currently have 3 types of islands: thread-local, shared and immutable. These are built-in islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island.

An owned island can only refer to other owned islands and to immutable data. It can be merged into any other island at any time (that is why it can't refer to TL or shared data).

owned(T) can be passed around as a function parameter, returned, or stored in fields. When doing so, it is consumed. When an owned value is not consumed and goes out of scope, the whole island is freed.

That means owned(T) can implicitly decay into T, immutable(T) or shared(T) at any time. When that happens, a call into the runtime merges the owned island into the corresponding island. Once ownership has been transferred this way, all local references to the island are invalidated (using them is an error).

On an implementation level, a call to a pure function that returns an owned value could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);

    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.

@nogc. Remember? It was in the title. What does a @nogc function look like? A @nogc function does not produce any garbage or trigger a collection cycle. There is no reason per se to prevent @nogc code from allocating on the GC heap as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned things into the TL, shared or immutable heaps.

This solves the problem of @nogc + Exception. As exceptions are isolated, they can be allocated, thrown and caught in @nogc code without generating garbage. They can safely bubble out of the @nogc section of the code and still be safe.

In the same way, it opens the door for a LOT of code that is not @nogc to become @nogc. If the code allocates memory in an owned island and returns it, then it is now up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance).

The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if the approach proposed by Andrei is better, I don't think it is a good one. The approach proposed here allows a lot of code to be marked as @nogc and lets the caller decide. That is ultimately what we want libraries to look like.
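To give a feel for how the pieces above fit together, here is a sketch in the proposed syntax. To be clear: owned does not exist in D today, none of this compiles, and the exact spelling is only assumed for illustration.

// Hypothetical syntax: owned(T) is the proposed qualifier, not current D.

owned(Exception) makeError() pure @nogc
{
    // Allocated in a fresh owned island; no garbage is produced as long as
    // the island is never merged into the TL, shared or immutable heaps.
    return new Exception("boom");
}

void mayThrow(bool fail) @nogc
{
    if (fail)
        throw makeError();          // throw now expects owned(Throwable)
}

void caller() @nogc
{
    try
    {
        mayThrow(true);
    }
    catch (owned(Exception) e)
    {
        // e goes out of scope unconsumed here, so its island is freed:
        // the whole round trip stays garbage-free.
    }
}

void gcCaller()
{
    // A non-@nogc caller may instead consume the owned value, merging its
    // island into the immutable heap and handing it over to the GC.
    immutable Exception kept = makeError();
}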
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:
> [...]
I think a combination of the C++ standard library's approach and Rust's approach would actually be the best possible. If we were to follow C++'s strategy, I think it would be important to make sure it doesn't require specifically adding template parameters and constraints, and instead allows the use of a concept-like system. Being able to default the allocator parameter to the GC, provided the current method is not @nogc, would also be a good idea. If C++'s approach were taken, it would also be very beneficial to allow a syntax such as `auto obj = new MyClass() with allocator` and `delete obj with allocator`.

I do think that the definition of @nogc would have to be slightly expanded, though, to mean that any values allocated with a given allocator are also freed with that allocator before returning. To connect back with your proposal and allow even more code to be @nogc, an owned(MyClass, allocator) object could be returned from an @nogc function. This would transfer ownership of the data, and responsibility for deletion, to the caller, provided that the caller is @nogc. If the caller is @nogc and fails to free the memory, DMD should produce an error. If the caller is not @nogc, then DMD would say nothing and assume that the allocator used to allocate the object will do the cleanup.

This would allow far more situations to be accounted for by the allocation system without needing a GC, while still allowing programs that want to use the GC to do so. @nogc would simply mean that no garbage is produced; it would not make a guarantee about which allocator was used to perform the allocation. @nogc would also mean that no allocators, other than the ones passed in by the user, would be used to perform the allocations. This allows the current definition of @nogc to still be present, while opening @nogc up for use in a much larger variety of situations.
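As a rough illustration of the kind of allocator-parameterised helpers this is getting at, here is a small sketch in plain library code. Mallocator, make and dispose are made-up names for illustration (not a reference to any existing module), and a real design would presumably hang off whatever allocator interface Phobos ends up with.

import core.stdc.stdlib : malloc, free;

struct Mallocator
{
    void[] allocate(size_t n) @nogc nothrow
    {
        auto p = malloc(n);
        if (p is null)
            return null;
        return p[0 .. n];
    }

    void deallocate(void[] b) @nogc nothrow
    {
        free(b.ptr);
    }
}

// The caller picks the allocator; nothing in here touches the GC.
T* make(T, Alloc, Args...)(ref Alloc alloc, Args args) @nogc
{
    auto p = cast(T*) alloc.allocate(T.sizeof).ptr;
    assert(p !is null, "allocation failed");
    *p = T(args);       // fine for plain structs; a full version would use emplace
    return p;
}

void dispose(T, Alloc)(ref Alloc alloc, T* p) @nogc
{
    alloc.deallocate((cast(void*) p)[0 .. T.sizeof]);
}

struct Point { int x, y; }

void example() @nogc
{
    Mallocator alloc;
    auto p = alloc.make!Point(1, 2);    // allocation strategy chosen by the caller
    scope (exit) alloc.dispose(p);      // caller is responsible for freeing
}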
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On 12/11/2014 3:34 p.m., deadalnix wrote:
> [...]
Humm.
import std.stdio;

struct TypeOwnerShip(T) {
    T value;
    alias value this;

    this(T value) {
        this.value = value;
    }

    // implicit casts to immutable, shared?
    // on cast to immutable/shared, change islands
}

// Wrap a freshly created value in the ownership type.
TypeOwnerShip!T owned(T)(T value) {
    return TypeOwnerShip!T(value);
}

class Bar {
    int x;

    this(int x) pure {
        this.x = x;
    }
}

Bar foo() pure {
    return owned(new Bar(5));
}

struct IslandAllocationStrategy {
    this(ubyte v) {
    }

    void opWithIn() {
        writeln("opWithIn");
        // thread local overriding
    }

    void opWithOut() {
        writeln("opWithOut");
        // reset thread local overriding
    }
}

@property IslandAllocationStrategy island() {
    return IslandAllocationStrategy();
}

void main() {
    writeln("start");

    with (island) {
        opWithIn;
        writeln("{");

        Bar myValue = foo();
        writeln(myValue.x);

        writeln("}");
        opWithOut;
    }

    writeln("end");
}
I feel like I've suggested this, just without the CS theory.
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On 11/11/2014 6:34 PM, deadalnix wrote:
> [...]

Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to Walter Bright | On Wednesday, 12 November 2014 at 06:16:34 UTC, Walter Bright wrote:
> On 11/11/2014 6:34 PM, deadalnix wrote:
> > [...]
>
> Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
That is quite difficult to explain without drawings. Maybe I can discuss it with Andrei when he has some time and a whiteboard around.
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to Rikki Cattermole | On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:
> [...]
Yes and no. The idea is similar, but it is not doable at the library level if we want to get safety and the full benefit out of it, as it would require the compiler to introduce calls to the runtime at strategic places, and it interacts with @nogc (a rough sketch of the kind of lowering involved follows below).
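For illustration, here is a rough idea of the compiler-inserted runtime call meant here, written as a pseudo-lowering in comments. gc_merge_island and __islandOf are made-up hooks in the spirit of the IslandID sketch from the original post; the point is that a library wrapper has no hook at the decay point, so it cannot emit this call itself.

// Source as the user would write it (hypothetical owned qualifier):
//
//     immutable(Foo) f = makeFoo();    // makeFoo() returns owned(Foo)
//
// What the compiler would conceptually emit at the decay point:
//
//     auto __tmp = makeFoo();
//     gc_merge_island(__islandOf(__tmp), Heap.immutable_);   // runtime call
//     immutable(Foo) f = cast(immutable) __tmp;
//     // from here on, any other local reference into __tmp's island is invalid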
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:

> The problem at hand here is ownership of data.

"Ownership of data" is one possible solution, but not the problem. We are facing 2 problems:

1. A performance problem: concurrency in writes (multiple writers, one writer, periodic locking during clean-up, etc).
2. A structural problem: releasing resources correctly.

I suggest that the ownership focus is on the latter, to support solid non-GC implementations. Then rely on conventions for multi-threading.

> - Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows you to implement almost any scheme you want, but comes at a great cost for the developer.

All performant solutions are going to be "unsafe" in the sense that you need to select a duplication/locking level that is optimal for the characteristics of the actual application. Copying data when you have no writers is too inefficient in real applications. Hardware support for transactional memory is going to be the easy approach for speeding up locking.

> - Annotations. This is the Rust road. [...]

I think Rust's approach would favour an STM approach where you create thread-local copies for processing, then merge the result back into the "shared" memory.

> Immutability+GC gives you safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies: it is easy and safe to share data without bookkeeping, which allows one to fit more in cache and reduces the amount of garbage created.

How does GC fit more data in the cache? A GC usually has overhead and would typically generate more cache misses, due to unreachable in-cache ("hot") memory not being available for reallocation.

> Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interfaces simple. [...]
>
> Finally, annotations are a great way to combine safety and speed, but they generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

The core problem is that if you are unhappy with single-threaded applications, then you are looking for high throughput using multi-threading. And in that case, sacrificing performance by not using the optimal strategy becomes problematic. The optimal strategy is entirely dependent on the application and the dataset. Therefore you need to support multiple approaches:

- per-data-structure GC
- thread-local GC
- lock annotations of types or variables
- speculative lock optimisations (transactional memory)

And in the future you will also need to support the integration of GPUs/co-processors into mainstream CPUs. Metal and OpenCL are only a beginning…

> Ideally, we want to map to what the hardware does. So what does the hardware do?

That changes over time. The current focus in upcoming hardware is on:

1. Heterogeneous architecture with high performance co-processors
2. Hardware support for transactional memory

Intel CPUs might have buffered transactional memory within 5 years.

> [...] They are bad at shared writable data (effectively, the cache line has to bounce back and forth between cores, and all memory accesses need to be serialized instead of performed out of order).

This will vary a lot. On x86 you can write to a whole cache line (buffered) without reading it first, and it uses a convenient cache coherency protocol (so that read/write ops are in order). This is not true for all CPUs.

I agree with others who say that a heterogeneous approach, like C++'s, is the better alternative. If parity with C++ is important then D needs to look closer at OpenMP, but that probably goes beyond what D can achieve in terms of implementation.

Some observations:

1. If you are not going to rely on conventions for syncing threads, then you need a pretty extensive framework if you want good performance.
2. Safety will harm performance.
3. Safety with high performance requires a very complicated static analysis that will probably not work very well for larger programs.
4. For most applications, performance will come through co-processors (GPGPU etc).
5. If hardware progresses faster than compiler development, then you will never reach the performance frontier…

I think D needs to cut down on implementation complexity and ensure that the implementation can catch up with hardware developments. The way to do it is:

1. Accept that generally, performant multi-threaded code is unsafe and application/hardware optimized.
2. Focus on making @nogc single-threaded code robust and fast. And I agree that ownership is key.
3. Use semantic analysis to automatically generate a tailored runtime with application-optimized allocators.
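For what it's worth, the thread-local-copy-then-merge pattern mentioned above can be sketched in today's D with nothing more than a mutex (this is not STM, just the plain locking version of the idea; the names and numbers are made up for illustration):

import core.sync.mutex : Mutex;
import std.array : array;
import std.parallelism : parallel;
import std.range : chunks, iota;

__gshared int[] results;        // shared result buffer, guarded by resultsLock
__gshared Mutex resultsLock;

void processChunk(const(int)[] chunk)
{
    int[] local;                        // thread-local working copy
    foreach (x; chunk)
        local ~= x * 2;                 // compute privately, no contention

    synchronized (resultsLock)          // merge back under the lock
        results ~= local;
}

void main()
{
    resultsLock = new Mutex;

    auto data = iota(0, 1_000).array;
    foreach (chunk; parallel(data.chunks(100)))
        processChunk(chunk);
}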
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to Ola Fosheim Grøstad | On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim Grøstad wrote:
> That changes over time. The current focus in upcoming hardware is on:
>
> 1. Heterogeneous architecture with high performance co-processors
>
> 2. Hardware support for transactional memory
>
> Intel CPUs might have buffered transactional memory within 5 years.
>
I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody knows what is in it. But ultimately, the crux of the problem is the thing quoted above.

1. No, that does not change much over time. The implementation details are changing, and recent schemes become more complex to accommodate heterogeneous chips, but that is irrelevant here. What I've mentioned is true for all of them, and has been for at least 2 decades by now. There is no sign that this is going to change.

2. The transactional memory thing is completely orthogonal to the subject at hand, so, like the implementation details of modern chips, it doesn't belong here. In addition, the whole CPU industry is backpedaling on the transactional memory concept. It is awesome on paper, but it didn't work.

There are only 2 ways to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. I won't follow you down the second road, so please stay on track.
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:

> I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody knows what is in it. [...]

My point is that you are making too many assumptions about both applications and hardware.

> 2. The transactional memory thing is completely orthogonal to the subject at hand [...] In addition, the whole CPU industry is backpedaling on the transactional memory concept. It is awesome on paper, but it didn't work.

STM is used quite a bit. Hardware-backed TM is used by IBM. For many computationally intensive applications, high levels of parallelism are achieved using speculative computation. TM supports that.

> There are only 2 ways to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. [...]

Good design is achieved by understanding different patterns of concurrency in applications and how they can reach peak performance in the environment (hardware). If D is locked to a narrow memory model, then you can only reach high performance on a subset of applications. If D wants to support system level programming, then it needs to take an open approach to the memory model.
November 12, 2014 Re: On heap segregation, GC optimization and @nogc relaxing
Posted in reply to deadalnix | On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:
> On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim
> In addition, the whole CPU industry is backpedaling on the transactional memory concept. It is awesome on paper, but it didn't work.
Given the support in Haskell, Clojure and C++, I am not sure they are really backpedaling on it.

The Haswell bugs are supposed to have been fixed in the next generation. And there is PowerPC A2 as well.

Not that I have any use for it, though.
--
Paulo