December 03, 2015
On Thursday, 3 December 2015 at 14:24:52 UTC, Andrei Alexandrescu wrote:
> On 12/03/2015 06:13 AM, Vladimir Panteleev wrote:
>> You said that "at the end of the day there's documentation". I would
>> argue that at least in this case, it may not be enough. Consider, for
>> example, a hypothetical user type "Pack", which takes a tuple/struct and
>> automatically arranges the fields into a struct such that space is used
>> optimally (e.g. all bools are clumped together into a bitfield, enums
>> are only given as many bits as their .max needs, etc.). Pack only needs
>> to know one property of each field: how many bits it really needs, and
>> as such, it might elect to be agnostic of what a pointer is. If it uses
>> std.bitmanip.bitfields as its backend, it will happily pack a pointer at
>> its user's request, and the user will never see the pointer warning in
>> Pack's documentation. And yes, although it's easy to point the finger at
>> users and say "ha ha, it's your own fault, you did not RTFM", I think we
>> should strive for better than that.
>
> I understand how you feel but I really don't know what else to do, which makes the entire discussion somewhat theoretical.

It doesn't matter which, as long as it's explicitly opt-in, and there's more than one way to do that: a template parameter flag, an alternative declaration such as unsafeBitfield, disallowing pointers but allowing a shallow wrapper around them ("UnmanagedPtr" or something like that), etc.
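
To make the wrapper option concrete, here is a minimal sketch in D. Both `UnmanagedPtr` (the name floated above) and `isPackable` are hypothetical; neither exists in Phobos. The idea is only that a packing template refuses raw pointers at compile time, so the user has to spell out the wrapper to opt in.

```d
import std.traits : isPointer;

// Hypothetical shallow wrapper. Writing this name is the explicit opt-in.
struct UnmanagedPtr(T)
{
    T* ptr;
}

// Hypothetical gate that a packing template could apply to every field type:
// raw pointers are refused, everything else (including the wrapper) passes.
enum isPackable(T) = !isPointer!T;

static assert(!isPackable!(int*));
static assert( isPackable!(UnmanagedPtr!int));
static assert( isPackable!bool);
```

A user of a Pack-like type would then hit a compile error on `int*` and would have to write `UnmanagedPtr!int` deliberately, which is where the warning can no longer be missed.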

December 03, 2015
On 12/3/15 9:28 AM, Andrei Alexandrescu wrote:
> On 12/03/2015 07:45 AM, Steven Schveighoffer wrote:
>>
>> taggedPointer and taggedClassRef are GC safe (despite the incorrect
>> warning listed in the docs). Your proposed mechanism is not.
>
> It can be restricted to support what tagged* do.

This is a possibility. Allowing manipulation of the high-order bits is no good for the GC. Allowing manipulation of the low-order bits that extends past a single element is no good either. The tagged functions enforce these restrictions at compile time.

>
>> IMO, we should keep those and close your enhancement, it doesn't add
>> anything useful. Seems to me something that can break very easily.
>
> Please leave it open, thanks.

I of course would not close it, that is not my place.

>> Phobos should in no way support such egregious casts implicitly. Even in
>> @system code.
>>
>> Do you have any rationale to prefer arbitrary bitfield pointers over GC
>> safe ones?
>
> 1. The less restricted version offers use of high-order bits as well.

Again, this is not GC-safe. But another thing taggedPointer and taggedClassRef do (that I think your proposal does not) is restrict the lower bits that can be manipulated based on the alignment of the target type.


> If
> we don't support that, those who need it will do that in client code
> with the usual liabilities.

The usual liabilities aren't mitigated by the proposal. I was under the impression that D should allow error-prone code, but shouldn't promote it.

> 2. There's no reason for taggedPointer and taggedClassRef to exist. They
> should be integrated within bitfields.

One departure from bitfields for tagged* is that the API does not allow invalid pointer/bitfield specifications (the number of bits reserved for the pointer is implied from the pointer type and arch). Your proposal uses an assert to verify the bits from the pointer source are zero, allowing a possible corruption to occur if you compile with -release. The tagged* types prove at compile time that you will ALWAYS see zero bits there (because of alignment).

As long as bitfields follows the same rules and API, then I think it could potentially be merged. But I don't see huge value in this; it seems like unnecessary code breakage.
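
For readers who have not used the tagged* API, a short sketch of `std.bitmanip.taggedPointer` follows. The struct and field names (`Node`, `value`, `visited`, `dirty`) are made up for illustration. The final `static assert` reflects the compile-time check described above, assuming the failed instantiation is reported as non-compiling.

```d
import std.bitmanip : taggedPointer;

struct Node
{
    // uint is 4-byte aligned, so the two low bits of a uint* are always
    // zero and can hold two 1-bit flags.
    mixin(taggedPointer!(
        uint*, "value",
        bool, "visited", 1,
        bool, "dirty", 1));
}

// char has alignment 1, so a char* has no spare low bits: asking for even
// one tag bit is rejected at compile time rather than checked at run time.
static assert(!__traits(compiles,
    taggedPointer!(char*, "p", bool, "flag", 1)));
```

The number of bits left for the pointer is never written by the user; it follows from the pointee type.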

-Steve
December 03, 2015
On 12/3/15 8:02 AM, Steven Schveighoffer wrote:
> An interior pointer is a pointer that is *properly aligned* but does not
> point at the first byte of a piece of memory. taggedPointer and
> taggedClassRef create *interior pointers*, not *misaligned pointers*.
> Andrei's proposal will create *misaligned pointers*. There is a huge
> difference.
>

I need to correct this. Andrei's proposal does not create misaligned pointers (as he specifically calls for no shifting for such pointers), just pointers to memory that is unrelated to the referenced memory. The effect is the same -- you cannot rely on such pointers to keep the memory in the GC.
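
A small sketch of the distinction, using hypothetical helpers `withBits` and `pointsInto`: setting low bits that alignment guarantees to be zero yields an address still inside the referenced object, while touching high-order bits yields an address somewhere else entirely.

```d
// Hypothetical helper: flip `bits` in the address held by p.
size_t withBits(T)(T* p, size_t bits)
{
    return cast(size_t) p ^ bits;
}

// True if `addr` falls inside the T object that p points at.
bool pointsInto(T)(T* p, size_t addr)
{
    auto base = cast(size_t) p;
    return addr >= base && addr - base < T.sizeof;
}
```

With a 4-byte-aligned `uint*`, `withBits(p, 3)` still satisfies `pointsInto`; flipping the top bit of the word does not.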

-Steve
December 04, 2015
First, it checks for alignment. Considering this:

On Thursday, 3 December 2015 at 09:11:12 UTC, Vladimir Panteleev wrote:
> True, assuming that:
>
> 1. The pointers are still aligned at machine word boundaries

No. The pointer needs to be aligned as the underlying data type expects. If it isn't aligned, then the operation that produced the unaligned pointer is what must be unsafe, not the bitfield capability.

> 2. The underlying storage type is one that the GC will scan for pointers (e.g. void or void*, not size_t/ubyte)

Yes, this one is not a problem currently, but it can become one (and in fact should, with a better GC). This is something I already want to improve, and it is not a blocker as far as the API is concerned, just an implementation detail.

> 3. The setters enforce that the discarded pointer bits were zero

If these bits aren't 0, the operation that set them to 1 is the one that is unsafe.

> 4. No more than 4 bits are reused (as the smallest GC object size is 16 bytes)
>

Not correct. Given a pointer of type T*, it is safe as long as align(T*) <= sizeof(T), which holds on all architectures I know of (though I wouldn't be surprised if some weird arch not used since the 70s broke this constraint).

The only valid concern here is point 2, but it is currently not a problem with the GC we have, and is simply an implementation issue.
December 04, 2015
On Thursday, 3 December 2015 at 12:45:10 UTC, Steven Schveighoffer wrote:
> Do you have any rationale to prefer arbitrary bitfield pointers over GC safe ones?
>

There are various valid uses of this, in HHVM for instance. One of the nasty tricks it uses is to allocate the memory for JIT code in the lower 32 bits of the address space, and then pad the pointers with zeros to retrieve them.

Because of this, various data structures can be compacted, and code addresses can be crammed directly into the instruction stream (at least on x86), which isn't possible with full 64-bit addresses.
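
As a rough illustration of the trick (not HHVM's actual code; `compress` and `expand` are hypothetical names), a pointer known to live in the low 4 GiB can be stored in 32 bits and recovered by zero-extension:

```d
// Store a pointer in 32 bits. Only valid if the allocator guarantees
// that the address fits in the low 4 GiB.
uint compress(const(void)* p)
{
    auto addr = cast(size_t) p;
    assert(addr <= uint.max, "address is not in the low 4 GiB");
    return cast(uint) addr;
}

// Recover the pointer by padding the upper bits with zeros.
void* expand(uint stored)
{
    return cast(void*) cast(size_t) stored;
}
```

Note that the `assert` disappears under -release, and that a GC scanning for pointer-sized words will not recognize the 32-bit value as a reference.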

There is a talk by Drew Paroski where he explains it (https://www.youtube.com/watch?v=XqK8Yuoq4ig I think, but not sure).

However, I agree with the sentiment. This is the kind of feature you reach for to get the last few percent, and it shouldn't be encouraged. It is highly non-portable and probably doesn't belong in a std module.

NB: I considered adding this functionality when doing the taggedPointer thing (x64 has 48 bits of effective address space), but eventually decided against it.
December 04, 2015
On Friday, 4 December 2015 at 01:35:45 UTC, deadalnix wrote:
> First, it checks for alignment. Considering this:
>
> On Thursday, 3 December 2015 at 09:11:12 UTC, Vladimir Panteleev wrote:
>> True, assuming that:
>>
>> 1. The pointers are still aligned at machine word boundaries
>
> No. The pointer needs to be aligned as the underlying data type expects. If it isn't aligned, then the operation that produced the unaligned pointer is what must be unsafe, not the bitfield capability.

You misunderstood. The bitfield must *store* the pointers at addresses that are aligned at machine word boundaries.

>> 3. The setters enforce that the discarded pointer bits were zero
>
> If these bits aren't 0, the operation that set them to 1 is the one that is unsafe.

Well, that depends on how many bits are discarded? And that's not log2(T.sizeof). `cast(size_t)ptr % T.sizeof` may not be 0 in all cases.

>> 4. No more than 4 bits are reused (as the smallest GC object size is 16 bytes)
>
> Not correct. Given a pointer of type T*, it is safe as long as align(T*) <= sizeof(T), which holds on all architectures I know of (though I wouldn't be surprised if some weird arch not used since the 70s broke this constraint).

I realized this was off after posting but I don't understand your reasoning either. The size and alignment just put a bound on the number of bits, but without verification in the setter you can't be sure, right?

December 04, 2015
On Friday, 4 December 2015 at 10:31:19 UTC, Vladimir Panteleev wrote:
> I realized this was off after posting but I don't understand your reasoning either. The size and alignment just put a bound on the number of bits, but without verification in the setter you can't be sure, right?

If one of the bits within the alignment is not 0, that means you did something unsafe previously to create that pointer. There should be no safe way (and I know of no safe way) to create such a pointer.

In fact, some hardware will outright fault if you try to manipulate unaligned data.

December 05, 2015
On Friday, 4 December 2015 at 23:38:08 UTC, deadalnix wrote:
> On Friday, 4 December 2015 at 10:31:19 UTC, Vladimir Panteleev wrote:
>> I realized this was off after posting but I don't understand your reasoning either. The size and alignment just put a bound on the number of bits, but without verification in the setter you can't be sure, right?
>
> If one of the bits within the alignment is not 0, that means you did something unsafe previously to create that pointer.

But this only applies to ... pointers to pointers, right? In D, only pointer variables have to be aligned to maintain safety, and even then that only applies to GC pointers (a C function may return an "unaligned" pointer pointer). struct { align(1): ubyte a; int b; } is still quite safe, and so is interpreting a pointer into an array of ubytes as a uint.

December 05, 2015
On Saturday, 5 December 2015 at 00:33:15 UTC, Vladimir Panteleev wrote:
> On Friday, 4 December 2015 at 23:38:08 UTC, deadalnix wrote:
>> On Friday, 4 December 2015 at 10:31:19 UTC, Vladimir Panteleev wrote:
>>> I realized this was off after posting but I don't understand your reasoning either. The size and alignment just put a bound on the number of bits, but without verification in the setter you can't be sure, right?
>>
>> If one of the bits within the alignment is not 0, that means you did something unsafe previously to create that pointer.
>
> But this only applies to ... pointers to pointers, right? In D, only pointer variables have to be aligned to maintain safety, and even then that only applies to GC pointers (a C function may return an "unaligned" pointer pointer). struct { align(1): ubyte a; int b; } is still quite safe, and so is interpreting a pointer into an array of ubytes as a uint.

No, a pointer has an alignment that depends on the data it points to. You cannot steal any bits from a char* (and taggedPointer will reject it); you can steal one bit from a short*, and so on. This is checked statically.
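
The rule can be sketched as follows. `zeroBitsFor` and `stealableBits` are hypothetical names for illustration, not the Phobos internals: the number of bits available is the base-2 logarithm of the pointee's alignment.

```d
// Number of low bits guaranteed to be zero in an address aligned to
// `alignment` (assumed to be a power of two).
size_t zeroBitsFor(size_t alignment)
{
    size_t n = 0;
    while (alignment > 1)
    {
        alignment >>= 1;
        ++n;
    }
    return n;
}

// Hypothetical trait: how many tag bits a T* can give up.
enum stealableBits(T) = zeroBitsFor(T.alignof);

static assert(stealableBits!char == 0);   // nothing to steal from a char*
static assert(stealableBits!short == 1);  // one bit from a short*
static assert(stealableBits!int == 2);    // two bits from an int*
```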
December 04, 2015
On 12/4/15 10:57 PM, deadalnix wrote:
> On Saturday, 5 December 2015 at 00:33:15 UTC, Vladimir Panteleev wrote:
>> On Friday, 4 December 2015 at 23:38:08 UTC, deadalnix wrote:
>>> On Friday, 4 December 2015 at 10:31:19 UTC, Vladimir Panteleev wrote:
>>>> I realized this was off after posting but I don't understand your
>>>> reasoning either. The size and alignment just put a bound on the
>>>> number of bits, but without verification in the setter you can't be
>>>> sure, right?
>>>
>>> If one of the bits within the alignment is not 0, that means you did
>>> something unsafe previously to create that pointer.
>>
>> But this only applies to ... pointers to pointers, right? In D, only
>> pointer variables have to be aligned to maintain safety, and even then
>> that only applies to GC pointers (a C function may return an
>> "unaligned" pointer pointer). struct { align(1): ubyte a; int b; } is
>> still quite safe, and so is interpreting a pointer into an array of
>> ubytes as a uint.
>
> No, a pointer has an alignment that depends on the data it points to.
> You cannot steal any bits from a char* (and taggedPointer will reject
> it); you can steal one bit from a short*, and so on. This is checked
> statically.

I think what Vladimir is referring to is an align(1) struct:

struct Foo
{
   align(1):
   ubyte a;
   int b;
}

Foo foo;
int *ptr = &foo.b; // not pointing at aligned integer

I think we should identify that tagged* does not support such pointers, and probably the ctor should assert this situation isn't occurring.
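
A sketch of what such a check might look like, assuming a hypothetical helper `isAlignedFor` (this is not existing Phobos code):

```d
struct Foo
{
    align(1):
    ubyte a;
    int b;
}

// b sits at offset 1, so &foo.b need not be a multiple of int.alignof.
static assert(Foo.b.offsetof == 1);

// Hypothetical runtime check that a tagged* setter or ctor could assert
// before reusing the low bits of the pointer.
bool isAlignedFor(T)(const(T)* p)
{
    return cast(size_t) p % T.alignof == 0;
}
```

Something like `assert(isAlignedFor(ptr))` in the setter would catch the case above in non-release builds.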

-Steve