December 11, 2012
On 11-12-2012 20:09, Dmitry Olshansky wrote:
> 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>
>>
>>
>> Destroy.
>>
>
> Aside from the fact that I can use slices without GC just fine? :)
> The base pointers would then be either counted, released manually or
> implicitly as part of stack unwinding.

Yes, in theory. But that's not how most idiomatic D code written today works.
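For concreteness, here is a minimal sketch of the GC-free slice usage being described: a slice over C-allocated memory, released implicitly during stack unwinding (illustrative only):

```d
import core.stdc.stdlib : free, malloc;

void main()
{
    // A slice over manually managed memory; no GC involvement at all.
    auto p = cast(int*) malloc(4 * int.sizeof);
    scope(exit) free(p); // released implicitly as part of stack unwinding
    int[] arr = p[0 .. 4];
    arr[] = 42;
    assert(arr[0] == 42 && arr.length == 4);
}
```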

>
> I personally think that managed VMs are going to have to emulate slices
> and pointers as an array object plus one or a pair of offsets. In fact it
> could be implemented as an abstract object with the implementation
> depending on where you got that pointer from.
>


-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
December 11, 2012
12/11/2012 11:23 PM, Alex Rønne Petersen wrote:
> On 11-12-2012 20:09, Dmitry Olshansky wrote:
>> 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
>>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>>
>>>
>>>
>>>
>>> Destroy.
>>>
>>
>> Aside from the fact that I can use slices without GC just fine? :)
>> The base pointers would then be either counted, released manually or
>> implicitly as part of stack unwinding.
>
> Yes, in theory. But that's not how most idiomatic D code written today
> works.
>

I'd mention that most idiomatic D code is agnostic with respect to the origin of a slice. The major reason to use slices is to avoid allocations, and thus the allocation scheme is not important up to the point of an explicit copy.

And at that point, e.g. Phobos plays it safe and does everything that has to copy or incrementally build via the GC. And it gets bashed for it every once in a while. To put it simply, it's because there is no concept of allocators in idiomatic D code _yet_.

And separating slices from the allocation mechanism behind them is key to the usability of slices as they stand. If we add stuff that makes them 50% more bulky and helps only a certain scheme of GC memory allocation, we are screwed.
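To illustrate the point, a quick sketch of allocation-agnostic slice code (`sum` is a hypothetical helper; its body never cares where the slice's memory came from):

```d
import core.stdc.stdlib : free, malloc;

// Works identically on stack, GC, and C-allocated memory.
int sum(const(int)[] arr)
{
    int s = 0;
    foreach (x; arr)
        s += x;
    return s;
}

void main()
{
    int[3] onStack = [1, 2, 3];

    int[] onHeap = new int[](3);
    onHeap[] = 2;

    auto c = cast(int*) malloc(3 * int.sizeof);
    scope(exit) free(c);
    c[0 .. 3] = 5;

    assert(sum(onStack[]) == 6);
    assert(sum(onHeap) == 6);
    assert(sum(c[0 .. 3]) == 15);
}
```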

Also, what would direct operations on the ptr field translate to in your scheme? E.g.:
arr.ptr = arr.ptr + x;

where the ptr + x is hidden behind some function and not obvious to the compiler?

Same question with slicing a raw pointer - what will the base contain? (The pointer _might_ have been interior.)
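To make the question concrete, a sketch of the case in point (the interior pointer could just as well come out of an opaque function the compiler can't see through):

```d
void main()
{
    int[8] buf = [0, 1, 2, 3, 4, 5, 6, 7];
    int[] arr = buf[];

    // An interior pointer, possibly produced by code the compiler
    // cannot analyze.
    int* p = arr.ptr + 2;

    // Slicing a raw pointer: what base would a fat slice store here?
    int[] inner = p[0 .. 4];
    assert(inner[0] == 2 && inner.length == 4);
}
```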

>>
>> I personally think that managed VMs are going to have to emulate slices
>> and pointers as an array object + one or pair of offsets. In fact it
>> could be implemented as an abstract object with implementation depending
>> on where you did get that pointer from.
>>


-- 
Dmitry Olshansky
December 11, 2012

On 11.12.2012 18:25, Alex Rønne Petersen wrote:
> On 11-12-2012 08:29, Rainer Schuetze wrote:
>>
>> On 11.12.2012 01:04, Alex Rønne Petersen wrote:
>>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>>
>>
>>  > This is clearly a huge problem for type-precise garbage collection.
>>
>> I don't see problems here. If a memory block is referenced, all of its
>> contents remain in memory, so they are scanned with their full type
>> info. Or do you want to chop off unreferenced parts of the memory block?
>
> No, the problem I was getting at is:
>
> Suppose we have a field int* p; somewhere in the GC heap. With the
> current state of affairs, we have to consider that this field can hold a
> value that is either:
>
> a) null (we don't care)
> b) a pointer into C memory (we don't care)
> c) a base pointer into the GC heap (unlikely but possible if "new int"
> was used somewhere)
> d) an interior pointer into the GC heap (much more likely; a pointer to
> a field of another object)
>
> So we have to look at the pointer and first figure out what kind of
> memory block it is /actually/ pointing to before we have any kind of
> type info available (just the knowledge that it's of type int* is not
> particularly useful by itself other than knowing that it could be a
> pointer at all).

At least for the D GC, the major work is to figure out if the pointer is pointing to GC memory or not. Once that is done (i.e. a pool of contiguous memory is found that contains the addressed memory) it's just a table lookup for the size and corresponding address alignment to get the base of the referenced GC memory block.
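A rough sketch of that lookup, assuming equally sized blocks within a pool (the real druntime tables are more involved; names here are illustrative):

```d
// Resolve an interior pointer to the base of its GC block: check that it
// lands in the pool, then round the offset down to the block size.
ubyte* baseOf(ubyte* p, ubyte* poolStart, ubyte* poolEnd, size_t blockSize)
{
    if (p < poolStart || p >= poolEnd)
        return null; // not GC memory; the scanner ignores it
    immutable offset = cast(size_t)(p - poolStart);
    return poolStart + (offset / blockSize) * blockSize;
}

void main()
{
    ubyte[256] pool;
    // With 32-byte blocks, an interior pointer at offset 70 resolves to
    // the block starting at offset 64.
    assert(baseOf(pool.ptr + 70, pool.ptr, pool.ptr + 256, 32) == pool.ptr + 64);
    // A pointer outside the pool is not GC memory.
    assert(baseOf(pool.ptr + 300, pool.ptr, pool.ptr + 256, 32) is null);
}
```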

>
> With my scheme, the possibilities would be:
>
> a) null (we don't care)
> b) a pointer into C memory (we don't care)
> c) a base pointer into the GC heap where the memory block is of type int*
>
> Notice how we did not have to do any significant work to figure out what
> we're dealing with; we immediately know what kind of typed memory the
> pointer is pointing to.

This stores the type info with the reference, not with the memory block, but it does not make a big difference. (Actually it does: if the reference is only a reference to a base class of the actual instance, type info is lost.)

>
> This becomes more of an advantage with aggregates. Suppose we have:
>
> struct A
> {
>      // ... more fields ...
> }
>
> And we have a field A* p; somewhere in the GC heap. We can now look at
> it and immediately tell whether it's a case of a, b, or c above and can
> trivially continue scanning into the pointed-to memory (if needed).
>
> So the TL;DR is: We avoid extra work to figure out the actual type of
> the memory something is pointing to by simply making such cases illegal.
>
> Whether that is practical, I do not know, and I don't plan to push for
> it anytime soon at least. But it has to be done for D to ever run on the
> CLI.

I understand that the CLI forbids interior pointers, but that seems an implementation detail of its GC.

>
>>
>>  From your post, it seems these are restrictions imposed by the .NET GC,
>> not by slices in general. If you take a pointer to a field inside a
>> struct, you will again get an interior pointer. Do you want "fat pointers"
>> for this as well?
>
> Sure, there's nothing wrong with slices if we assume all GCs that'll be
> running in a D implementation support interior pointers. But if we make
> this assumption, D can never run on the CLI.
>
> Interior pointers are OK in the stack and registers, so taking pointers
> to fields inside aggregates should be fine so long as they are not
> stored in the heap.
>

I don't think we should introduce rather strange semantics with different kinds of pointers and targets depending on whether they live on the heap or the stack.

The best that could be done for a .NET target build would be to let the compiler create fat pointers that always store the base of the memory block and an offset, not just for slices.
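A sketch of such a lowering (purely illustrative; a compiler targeting the CLI would generate something along these lines for every pointer stored in the heap):

```d
struct Pair { int a; int b; }

// A "fat pointer": always carries the block base plus a byte offset, so
// the heap never stores a bare interior address.
struct FatPtr(T)
{
    void*  base;   // start of the memory block; never interior
    size_t offset; // byte offset of the referenced T inside the block

    T* raw() { return cast(T*) (cast(ubyte*) base + offset); }
}

void main()
{
    Pair p;
    auto q = FatPtr!int(&p, Pair.b.offsetof);
    *q.raw() = 7; // behaves like a plain int* to p.b
    assert(p.b == 7);
}
```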

BTW, I was also wondering whether "instrumented" pointers should be used to support a GC that works without "stopping the world". E.g. they would allow keeping track of references to each memory block continuously, or remembering which references were changed since the last scan, in the hope of doing incremental/generational scans.
December 11, 2012
On 11-12-2012 21:24, Dmitry Olshansky wrote:
> 12/11/2012 11:23 PM, Alex Rønne Petersen wrote:
>> On 11-12-2012 20:09, Dmitry Olshansky wrote:
>>> 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
>>>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Destroy.
>>>>
>>>
>>> Aside from the fact that I can use slices without GC just fine? :)
>>> The base pointers would then be either counted, released manually or
>>> implicitly as part of stack unwinding.
>>
>> Yes, in theory. But that's not how most idiomatic D code written today
>> works.
>>
>
> I'd mention that most idiomatic D code is agnostic with respect
> to the origin of a slice. The major reason to use slices is to avoid
> allocations, and thus the allocation scheme is not important up to the
> point of an explicit copy.
>
> And at that point, e.g. Phobos plays it safe and does everything that has
> to copy or incrementally build via the GC. And it gets bashed for it every
> once in a while. To put it simply, it's because there is no concept of
> allocators in idiomatic D code _yet_.
>
> And separating slices from the allocation mechanism behind them is key to
> the usability of slices as they stand. If we add stuff that makes them 50%
> more bulky and helps only a certain scheme of GC memory allocation, we
> are screwed.

Then our current slice design is broken.

int[] arr;
arr.length = 1024; // guess where this memory comes from?

>
> Also, what would direct operations on the ptr field translate to in your
> scheme? E.g.:
> arr.ptr = arr.ptr + x;
>
> where the ptr + x is hidden behind some function and not obvious to the compiler?

Exactly what you wrote. Remember, the ptr field doesn't change meaning.

>
> Same question with slicing a raw pointer - what will the base contain?
> (The pointer _might_ have been interior.)

Of course you have to take care not to slice an interior pointer and let the base pointer go out of scope.
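The hazard looks roughly like this (D's current GC does honor interior pointers, so this sketch is only a problem under a GC scheme that ignores them):

```d
// Under a GC that ignores heap-stored interior pointers, this would
// produce a dangling slice: nothing keeps the block base alive.
int[] sliceInterior()
{
    int[] big = new int[](1024);
    int* interior = big.ptr + 512; // interior pointer into the block
    return interior[0 .. 16];      // once `big` dies, only the interior
                                   // pointer survives in the returned slice
}

void main()
{
    auto s = sliceInterior();
    // Valid today, because D's GC scans interior pointers too.
    assert(s.length == 16);
}
```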

>
>>>
>>> I personally think that managed VMs are going to have to emulate slices
>>> and pointers as an array object plus one or a pair of offsets. In fact it
>>> could be implemented as an abstract object with the implementation
>>> depending on where you got that pointer from.
>>>
>
>


-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
December 11, 2012
On 11-12-2012 21:24, Rainer Schuetze wrote:
>
>
> On 11.12.2012 18:25, Alex Rønne Petersen wrote:
>> On 11-12-2012 08:29, Rainer Schuetze wrote:
>>>
>>> On 11.12.2012 01:04, Alex Rønne Petersen wrote:
>>>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>>>
>>>>
>>>
>>>  > This is clearly a huge problem for type-precise garbage collection.
>>>
>>> I don't see problems here. If a memory block is referenced, all of its
>>> contents remain in memory, so they are scanned with their full type
>>> info. Or do you want to chop off unreferenced parts of the memory block?
>>
>> No, the problem I was getting at is:
>>
>> Suppose we have a field int* p; somewhere in the GC heap. With the
>> current state of affairs, we have to consider that this field can hold a
>> value that is either:
>>
>> a) null (we don't care)
>> b) a pointer into C memory (we don't care)
>> c) a base pointer into the GC heap (unlikely but possible if "new int"
>> was used somewhere)
>> d) an interior pointer into the GC heap (much more likely; a pointer to
>> a field of another object)
>>
>> So we have to look at the pointer and first figure out what kind of
>> memory block it is /actually/ pointing to before we have any kind of
>> type info available (just the knowledge that it's of type int* is not
>> particularly useful by itself other than knowing that it could be a
>> pointer at all).
>
> At least for the D GC, the major work is to figure out if the pointer is
> pointing to GC memory or not. Once that is done (i.e. a pool of
> contiguous memory is found that contains the addressed memory) it's just
> a table lookup for the size and corresponding address alignment to get
> the base of the referenced GC memory block.

I see. That probably does make it less of a problem for D's GC.

>
>>
>> With my scheme, the possibilities would be:
>>
>> a) null (we don't care)
>> b) a pointer into C memory (we don't care)
>> c) a base pointer into the GC heap where the memory block is of type int*
>>
>> Notice how we did not have to do any significant work to figure out what
>> we're dealing with; we immediately know what kind of typed memory the
>> pointer is pointing to.
>
> This stores the type info with the reference, not with the memory block,
> but it does not make a big difference. (Actually it does: if the
> reference is only a reference to a base class of the actual instance,
> type info is lost.)

This got me thinking a bit.

In the current on-going precise GC work, what is type info actually used for? It seems to me that given the current GC semantics, the only thing it's useful for is figuring out what parts of memory do /not/ contain pointers, and nothing else.

>
>>
>> This becomes more of an advantage with aggregates. Suppose we have:
>>
>> struct A
>> {
>>      // ... more fields ...
>> }
>>
>> And we have a field A* p; somewhere in the GC heap. We can now look at
>> it and immediately tell whether it's a case of a, b, or c above and can
>> trivially continue scanning into the pointed-to memory (if needed).
>>
>> So the TL;DR is: We avoid extra work to figure out the actual type of
>> the memory something is pointing to by simply making such cases illegal.
>>
>> Whether that is practical, I do not know, and I don't plan to push for
>> it anytime soon at least. But it has to be done for D to ever run on the
>> CLI.
>
> I understand that the CLI forbids interior pointers, but that seems an
> implementation detail of its GC.

It's standardized: Ecma 335, I.8.2.1.1.

And it's something we need to deal with if we care about D on the CLI.

>
>>
>>>
>>>  From your post, it seems these are restrictions imposed by the .NET GC,
>>> not by slices in general. If you take a pointer to a field inside a
>>> struct, you will again get an interior pointer. Do you want "fat pointers"
>>> for this as well?
>>
>> Sure, there's nothing wrong with slices if we assume all GCs that'll be
>> running in a D implementation support interior pointers. But if we make
>> this assumption, D can never run on the CLI.
>>
>> Interior pointers are OK in the stack and registers, so taking pointers
>> to fields inside aggregates should be fine so long as they are not
>> stored in the heap.
>>
>
> I don't think we should introduce rather strange semantics with
> different kinds of pointers and targets depending on whether
> they live on the heap or the stack.

I don't think those semantics are particularly strange. It's how almost all virtual machines work.

>
> The best that could be done for a .NET target build would be to let the
> compiler create fat pointers that always store the base of the memory
> block and an offset, not just for slices.

Perhaps, but realistically, we can't do this because most code assumes pointers are the same size as the machine's word size (i.e. (void*).sizeof == size_t.sizeof).

There's also the problem that, strictly speaking, I.8.2.1.1 says that interior pointers are outright forbidden in the heap regardless of whether a live base pointer exists...

>
> BTW, I was also wondering whether "instrumented" pointers should be used
> to support a GC that works without "stopping the world". E.g. they would
> allow keeping track of references to each memory block continuously, or
> remembering which references were changed since the last scan, in the
> hope of doing incremental/generational scans.

I'm not familiar with anything like that so I can't comment on it. Sounds interesting, though.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
December 11, 2012

On 11.12.2012 22:08, Alex Rønne Petersen wrote:
> On 11-12-2012 21:24, Rainer Schuetze wrote:
>>
>> This stores the type info with the reference, not with the memory block,
>> but it does not make a big difference. (Actually it does: if the
>> reference only is a reference a base class of the actual instance, type
>> info is lost.)
>
> This got me thinking a bit.
>
> In the current on-going precise GC work, what is type info actually used
> for? It seems to me that given the current GC semantics, the only thing
> it's useful for is figuring out what parts of memory do /not/ contain
> pointers, and nothing else.

Yes, it is only interested in pointers. The current implementation creates a bitmap from type introspection at compile time, where each bit specifies whether the respective word of an instance is a pointer. When a memory block is allocated, the bitmap is copied (with some complications) from the TypeInfo object into a memory bitmap that is used for GC scanning later. This seems slightly inefficient with respect to memory usage, but it allows faster scanning, as the complications have to be dealt with only once, not on every scan. It also allows changing the scanning information of only part of a block later, e.g. to integrate emplace!T with precise scanning (though this isn't implemented yet).
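A much-simplified sketch of deriving such a bitmap from compile-time introspection (one bit per word; the real implementation also handles static arrays, unions, and copying the bitmap into pool metadata):

```d
// One bit per word of T: set if that word holds a pointer.
size_t pointerBitmap(T)()
{
    size_t bits = 0;
    static foreach (i, _; T.tupleof)
    {
        static if (is(typeof(T.tupleof[i]) == U*, U)
                || is(typeof(T.tupleof[i]) == class))
            bits |= size_t(1) << (T.tupleof[i].offsetof / size_t.sizeof);
    }
    return bits;
}

struct Node
{
    size_t value; // word 0: not a pointer
    Node*  next;  // word 1: pointer
}

void main()
{
    assert(pointerBitmap!Node() == 0b10);
}
```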

>>
>> I understand that the CLI forbids interior pointers, but that seems an
>> implementation detail of its GC.
>
> It's standardized: Ecma 335, I.8.2.1.1.

I still read that restriction as driven by an implementation detail ("For performance reasons..."), not by some design necessity.

>
> And it's something we need to deal with if we care about D on the CLI.
>

December 11, 2012
On Tuesday, 11 December 2012 at 18:11:32 UTC, Robert Jacques wrote:
> On Tue, 11 Dec 2012 11:25:44 -0600, Alex Rønne Petersen wrote:
>> Interior pointers are OK in the stack and registers, so taking pointers to fields inside aggregates should be fine so long as they are not stored in the heap.
>
> So what about unions?

 The pointer & length fields won't work well together if you mix them. Consider:

  struct S {
    union {
      int[] i;
      byte[] b;
    }
  }

  S s;

  s.i.length = 4;
  assert(s.i.length == 4);
  assert(s.b.length == 16); // fails: logically 4 ints are 16 bytes...
  assert(s.b.length == 4);  // ...but the implementation just reuses the length field

  s.b = cast(byte[]) s.i;
  assert(s.b.length == 16); // true
  assert(s.i.length == 4);  // fails
  assert(s.i.length == 16); // the implementation (reading the last twelve
                            // "ints" probably segfaults)

 The only way to properly use such a union is to designate one of the member types that you always convert from/to. But the GC wouldn't know that and might try them all; although only the base pointer might be considered, so...
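A sketch of the "one canonical type" workaround alluded to above (illustrative only: byte[] is picked as the stored view, and the int view is always re-derived from it):

```d
// Instead of a union, store one canonical slice and re-derive other
// views on demand, so length and pointer always agree.
struct S
{
    byte[] b; // canonical storage

    // A reinterpreting view: 16 bytes become 4 ints, length adjusted.
    int[] ints() { return cast(int[]) b; }
}

void main()
{
    S s;
    s.b = new byte[](16);
    assert(s.ints().length == 4);
}
```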
December 12, 2012
12/12/2012 12:59 AM, Alex Rønne Petersen wrote:
[snip]
>>
>> I'd mention that most idiomatic D code is agnostic with respect
>> to the origin of a slice. The major reason to use slices is to avoid
>> allocations, and thus the allocation scheme is not important up to the
>> point of an explicit copy.
>>
>> And at that point, e.g. Phobos plays it safe and does everything that has
>> to copy or incrementally build via the GC. And it gets bashed for it every
>> once in a while. To put it simply, it's because there is no concept of
>> allocators in idiomatic D code _yet_.
>>
>> And separating slices from the allocation mechanism behind them is key to
>> the usability of slices as they stand. If we add stuff that makes them 50%
>> more bulky and helps only a certain scheme of GC memory allocation, we
>> are screwed.
>
> Then our current slice design is broken.
>
> int[] arr;
> arr.length = 1024; // guess where this memory comes from?
>
Nice one ;)
Guess this point was destroyed.

-- 
Dmitry Olshansky
December 12, 2012
On 12-12-2012 09:30, Dmitry Olshansky wrote:
> 12/12/2012 12:59 AM, Alex Rønne Petersen wrote:
> [snip]
>>>
>>> I'd mention that most idiomatic D code is agnostic with respect
>>> to the origin of a slice. The major reason to use slices is to avoid
>>> allocations, and thus the allocation scheme is not important up to the
>>> point of an explicit copy.
>>>
>>> And at that point, e.g. Phobos plays it safe and does everything that has
>>> to copy or incrementally build via the GC. And it gets bashed for it every
>>> once in a while. To put it simply, it's because there is no concept of
>>> allocators in idiomatic D code _yet_.
>>>
>>> And separating slices from the allocation mechanism behind them is key to
>>> the usability of slices as they stand. If we add stuff that makes them 50%
>>> more bulky and helps only a certain scheme of GC memory allocation, we
>>> are screwed.
>>
>> Then our current slice design is broken.
>>
>> int[] arr;
>> arr.length = 1024; // guess where this memory comes from?
>>
> Nice one ;)
> Guess this point was destroyed.
>

Just to clarify: I'm not saying you're wrong. I think the fact that that particular slice feature is tied to the GC is actually a pretty bad thing.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
December 12, 2012
On 11-12-2012 22:38, Rainer Schuetze wrote:
>
>
> On 11.12.2012 22:08, Alex Rønne Petersen wrote:
>> On 11-12-2012 21:24, Rainer Schuetze wrote:
>>>
>>> This stores the type info with the reference, not with the memory block,
>>> but it does not make a big difference. (Actually it does: if the
>>> reference is only a reference to a base class of the actual instance,
>>> type info is lost.)
>>
>> This got me thinking a bit.
>>
>> In the current on-going precise GC work, what is type info actually used
>> for? It seems to me that given the current GC semantics, the only thing
>> it's useful for is figuring out what parts of memory do /not/ contain
>> pointers, and nothing else.
>
> Yes, it is only interested in pointers. The current implementation
> creates a bitmap from type introspection at compile time, where each bit
> specifies whether the respective word of an instance is a pointer. When
> a memory block is allocated, the bitmap is copied (with some
> complications) from the TypeInfo object into a memory bitmap that is
> used for GC scanning later. This seems slightly inefficient with respect
> to memory usage, but it allows faster scanning, as the complications have
> to be dealt with only once, not on every scan. It also allows changing
> the scanning information of only part of a block later, e.g. to integrate
> emplace!T with precise scanning (though this isn't implemented yet).

OK, makes sense, and point taken. The D GC would not benefit from getting rid of interior pointers in any significant way. The only advantage would then be for VMs like the CLI.

>
>>>
>>> I understand that the CLI forbids interior pointers, but that seems an
>>> implementation detail of its GC.
>>
>> It's standardized: Ecma 335, I.8.2.1.1.
>
> I still read that restriction as driven by an implementation detail
> ("For performance reasons..."), not by some design necessity.

Oh, sure, all I'm saying is that since it is in the standard (and both MS.NET and Mono require it), we have to deal with it one way or another.

>
>>
>> And it's something we need to deal with if we care about D on the CLI.
>>
>

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org