May 26, 2017
On Fri, May 26, 2017 at 06:06:42PM +0000, Mike B Johnson via Digitalmars-d-learn wrote:
> On Friday, 26 May 2017 at 14:05:34 UTC, ag0aep6g wrote:
> > On 05/26/2017 10:15 AM, realhet wrote:
> > > But hey, the GC knows that it should not search for any pointers in those large blocks. And the buffer is full of 0-s at the start, so there can't be any 'false pointers' in it. And I think the GC will not search in it either.
> > 
> > The issue is not that the block contains a false pointer, but that there's a false pointer elsewhere that points into the block. The bigger the block, the more likely it is that something (e.g. an int on the stack) is mistaken for a pointer into it.
> 
> Wow, if that is the case then the GC has some real issues. The GC should be informed about all pointers and an int is not a pointer.

Unfortunately, it can't, because (1) D interfaces with C code, and you don't have this kind of information from a C object file, and (2) you can turn a pointer into an int with a cast or a union in @system code, and since the GC cannot assume @safe for all code, it needs to be conservative and assume any int-like data could potentially be a pointer.
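
To illustrate point (2), here is a minimal sketch of how a pointer can end up stored as a plain integer in @system code. The names `PtrOrInt`, `hide` and `reveal` are made up for this example and are not part of any library.

```d
// Hypothetical illustration: a pointer disguised as an integer.
union PtrOrInt
{
    int* ptr;     // the same bits viewed as a pointer...
    size_t bits;  // ...or as an integer
}

// Hide a pointer inside an integer with a cast (only legal in @system code).
size_t hide(int* p) @system
{
    return cast(size_t) p;
}

// Turn the integer back into a usable pointer.
int* reveal(size_t bits) @system
{
    return cast(int*) bits;
}

void main() @system
{
    auto p = new int;
    *p = 42;

    size_t hidden = hide(p);   // looks like a number, is really an address

    PtrOrInt u;
    u.ptr = p;                 // the union route gives the same bits
    assert(u.bits == hidden);

    // The "integer" still reaches the GC-allocated int.
    assert(*reveal(hidden) == 42);
}
```

If `hidden` were the only remaining copy of the address, a collector that treated it as "just an integer" could free the block while it is still reachable through `reveal`.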

You could improve GC performance by giving it type info from @safe code so that it skips over blocks that *definitely* have no pointers (it already does this to some extent, e.g., data in an int[] will never be scanned for pointers because the GC knows it can't contain any). But you can't make the GC fully non-conservative because it may crash the program when it wrongly assumes a memory block is dead when it's actually still live. All it takes is one pointer on the stack that's wrongly assumed to be just int, and you're screwed.
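
As a rough sketch of the "already does this to some extent" part, the block attributes can be inspected through `core.memory`. The helper `isNoScan` is a made-up name for this example, and the exact attribute bits are an implementation detail of druntime, so treat the result as something to check on your own compiler rather than a guarantee.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: does the GC block containing p carry NO_SCAN?
bool isNoScan(const(void)* p)
{
    // GC.query accepts pointers to the interior of a block as well.
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Element type has no pointers, so the block need not be scanned.
    auto ints = new int[](16);
    // Element type is a pointer, so the block has to be scanned.
    auto ptrs = new void*[](16);

    writeln("int[]   NO_SCAN: ", isNoScan(ints.ptr));
    writeln("void*[] NO_SCAN: ", isNoScan(ptrs.ptr));
}
```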


T

-- 
Dogs have owners ... cats have staff. -- Krista Casada
May 27, 2017
On Friday, 26 May 2017 at 18:19:48 UTC, H. S. Teoh wrote:
> On Fri, May 26, 2017 at 06:06:42PM +0000, Mike B Johnson via Digitalmars-d-learn wrote:
>> On Friday, 26 May 2017 at 14:05:34 UTC, ag0aep6g wrote:
>> > On 05/26/2017 10:15 AM, realhet wrote:
>> > > But hey, the GC knows that it should not search for any pointers in those large blocks. And the buffer is full of 0-s at the start, so there can't be any 'false pointers' in it. And I think the GC will not search in it either.
>> > 
>> > The issue is not that the block contains a false pointer, but that there's a false pointer elsewhere that points into the block. The bigger the block, the more likely it is that something (e.g. an int on the stack) is mistaken for a pointer into it.
>> 
>> Wow, if that is the case then the GC has some real issues. The GC should be informed about all pointers and an int is not a pointer.
>
> Unfortunately, it can't, because (1) D interfaces with C code, and you don't have this kind of information from a C object file, and (2) you can turn a pointer into an int with a cast or a union in @system code, and since the GC cannot assume @safe for all code, it needs to be conservative and assume any int-like data could potentially be a pointer.
>
> You could improve GC performance by giving it type info from @safe code so that it skips over blocks that *definitely* have no pointers (it already does this to some extent, e.g., data in an int[] will never be scanned for pointers because the GC knows it can't contain any). But you can't make the GC fully non-conservative because it may crash the program when it wrongly assumes a memory block is dead when it's actually still live. All it takes is one pointer on the stack that's wrongly assumed to be just int, and you're screwed.
>
>

And what if one isn't interfacing to C? All pointers should be known. You can't access memory by an int or any other non-pointer type! Hence, when pointers are created or ints are cast to pointers, the GC should be informed and then handle them appropriately (then, instead of scanning a 100MB block of memory for "pointers", it should scan the list of possible pointers, which will generally be much, much smaller).

Therefore, in a true D program (no outsourcing) with no pointers used, the GC should never have to scan anything.

It seems the GC could be smarter than it is, instead of just making blanket assumptions about the entire program (which rarely hold), which is generally a poor choice when it comes to performance...

In fact, when interfacing with C or other programs, memory could be partitioned, and any memory that may escape D could be treated differently from the memory used only by D code.

After all, if we truly want to be safe, why not scan the entire memory of the system? Who knows, some pointer externally might be peeping in on our hello world program.


May 27, 2017
On Saturday, 27 May 2017 at 17:57:03 UTC, Mike B Johnson wrote:

> And what if one isn't interfacing to C? All pointers should be known. You can't access memory by an int or any other non-pointer type! Hence, when pointers are created or ints are cast to pointers, the GC should be informed and then handle them appropriately

Eh? So *every* cast from and to a pointer should become a call into the runtime, poking the GC? Or rather, every variable declaration should somehow be made magically known to the GC without any runtime cost?

> (then, instead of scanning a 100MB block of memory for "pointers", it should scan the list of possible pointers, which will generally be much, much smaller).

That's precisely what it does: it scans the possible suspects, nothing more. That is, the stack (it has no idea what's there, it's just a block of untyped memory), memory it itself allocated, but *only if* it needs to (e.g. you allocated a typed array, and the type has pointers), and memory you've specifically asked it to scan. It won't scan that block of 500k ints the OP allocated, unless told to do so. It would scan it if it was a void[] block though.
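
For the "memory you've specifically asked it to scan" case, a minimal sketch using `GC.addRange` on memory obtained from C's `malloc` might look like this. The helpers `allocSlots` and `freeSlots` are hypothetical names for this example only.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: malloc'd table of int* that the GC is told to scan.
int** allocSlots(size_t n)
{
    auto slots = cast(int**) malloc(n * (int*).sizeof);
    assert(slots !is null, "malloc failed");
    slots[0 .. n] = null;                    // no stale bits that look like pointers
    GC.addRange(slots, n * (int*).sizeof);   // ask the GC to scan this range
    return slots;
}

// Hypothetical helper: unregister the range before giving the memory back.
void freeSlots(int** slots)
{
    GC.removeRange(slots);
    free(slots);
}

void main()
{
    auto slots = allocSlots(4);
    scope (exit) freeSlots(slots);

    // A GC allocation referenced from the malloc'd table.
    slots[0] = new int;
    *slots[0] = 7;

    GC.collect();
    // The registered range is scanned, so the int is treated as live.
    assert(*slots[0] == 7);
}
```

Without the `addRange` call the GC would have no reason to look inside the malloc'd table, and the int could be collected once no other reference to it remains.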

> Therefore, in a true D program (no outsourcing) with no pointers used, the GC should never have to scan anything.

No pointers used? No arrays, no strings, no delegates?.. That's a rather limited program. But the thing is, you're right: in such a program the GC will indeed never have to scan anything. If you never allocate, GC collection never occurs either.

> It seems the GC could be smarter than it is, instead of just making blanket assumptions about the entire program (which rarely hold), which is generally a poor choice when it comes to performance...

Unnecessary interaction with the GC, e.g. informing it about every cast, is a poor choice for performance.

> After all, if we truly want to be safe, why not scan the entire memory of the system? Who knows, some pointer externally might be peeping in on our hello world program.

What?
May 27, 2017
On Saturday, 27 May 2017 at 17:57:03 UTC, Mike B Johnson wrote:

> And what if one isn't interfacing to C? All pointers should be known.

Apparently some people are (were?) working on semi-precise GC: https://github.com/dlang/druntime/pull/1603
That still scans the stack conservatively, though.

> Therefore, in a true D program (no outsourcing) with no pointers used, the GC should never have to scan anything.
All realistic programs (in any language) use a lot of pointers - for example, all slices in D have embedded pointers (slice.ptr), references are pointers, classes are references, etc.
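
A small sketch of where those pointers hide; nothing here goes beyond what the language itself provides, and the variable names are only for illustration.

```d
void main()
{
    int[] a = [1, 2, 3, 4];
    int[] s = a[1 .. 3];

    // A slice is a (length, pointer) pair.
    static assert((int[]).sizeof == size_t.sizeof + (int*).sizeof);
    assert(s.ptr == a.ptr + 1);   // the slice points into a's memory
    assert(s.length == 2);

    // Strings are slices of immutable chars, so they carry a pointer too.
    string str = "hello";
    assert(str.ptr !is null);

    // A class variable is a reference, i.e. pointer-sized.
    Object o = new Object;
    static assert(o.sizeof == (void*).sizeof);
}
```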

> It seems the GC could be smarter than it is, instead of just making blanket assumptions about the entire program (which rarely hold), which is generally a poor choice when it comes to performance...
If you only have compile-time information, making blanket assumptions is inevitable. After all, the compiler can't understand how a nontrivial program actually works. The alternative is doing more work at runtime (marking pointers that changed since the previous collection, etc.), which is also not good for performance.

> Who knows, some pointer externally might be peeping in on our hello world program.
Of course, there is a pointer :)

void main()
{
    import std.stdio;

    writeln("hello world".ptr);
}