December 02, 2015
On 12/2/15 7:22 PM, Vladimir Panteleev wrote:
> On Thursday, 3 December 2015 at 00:08:17 UTC, Vladimir Panteleev wrote:
>> On Wednesday, 2 December 2015 at 19:59:14 UTC, Andrei Alexandrescu wrote:
>>> On 12/02/2015 02:54 PM, Vladimir Panteleev wrote:
>>>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu
>>>> wrote:
>>>>> [...]
>>>>
>>>> Warning, this is very unsafe and incompatible with the GC.
>>>> Bit-twiddling
>>>> GC pointers can lead to memory corruption and very hard-to-track bugs.
>>>> Such a feature must be opt-in in a very explicit way.
>>>
>>> Well it's @system. -- Andrei
>>
>> Considering that @system is the default, I really don't think that's
>> enough.
>
> That just reminded me to file this:
>
> https://issues.dlang.org/show_bug.cgi?id=15399

Nice. What I'd say is that at the end of the day there's documentation. -- Andrei
December 03, 2015
On Wednesday, 2 December 2015 at 19:54:26 UTC, Vladimir Panteleev wrote:
> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu wrote:
>> Once done, this is a fantastic example of (a) the power of generative programming, and (b) the advantages of using library facilities instead of built-in features.
>>
>> https://issues.dlang.org/show_bug.cgi?id=15397
>>
>> Who would want to take it?
>
> Warning, this is very unsafe and incompatible with the GC. Bit-twiddling GC pointers can lead to memory corruption and very hard-to-track bugs. Such a feature must be opt-in in a very explicit way.

IIRC, I'm the one who originally brought this up. There's no reason for LSB smuggling in pointers to be unsafe.

I personally think tricks like these are important in advertising D as a systems language, as I'm often missing some low level features compared to GNU C.

Bye.
December 03, 2015
On Thursday, 3 December 2015 at 01:31:05 UTC, rsw0x wrote:
> On Wednesday, 2 December 2015 at 19:54:26 UTC, Vladimir Panteleev wrote:
>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu wrote:
>>> Once done, this is a fantastic example of (a) the power of generative programming, and (b) the advantages of using library facilities instead of built-in features.
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=15397
>>>
>>> Who would want to take it?
>>
>> Warning, this is very unsafe and incompatible with the GC. Bit-twiddling GC pointers can lead to memory corruption and very hard-to-track bugs. Such a feature must be opt-in in a very explicit way.
>
> IIRC, I'm the one who originally brought this up. There's no reason for LSB smuggling in pointers to be unsafe.

True, assuming that:

1. The pointers are still aligned at machine word boundaries
2. The underlying storage type is one that the GC will scan for pointers (e.g. void or void*, not size_t/ubyte)
3. The setters enforce that the discarded pointer bits were zero
4. No more than 4 bits are reused (as the smallest GC object size is 16 bytes)

Point 4 actually relies on the GC's current implementation, which could be an issue: it ties what is allowed to compile in code using the standard library to an implementation detail.
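
As a rough illustration of the four conditions above, here is a minimal sketch of a low-bit tagged pointer. `LowTagged` and its members are hypothetical names, not an existing Phobos API, and the 4-bit limit bakes in the current GC's 16-byte minimum block size, which is exactly the implementation detail point 4 worries about.

```d
// Hypothetical sketch; LowTagged is not part of Phobos or druntime.
struct LowTagged(T, uint tagBits)
{
    // Condition 4: assumes the current GC's 16-byte minimum block size.
    static assert(tagBits <= 4, "at most 4 low bits can be reused");

    private enum size_t tagMask = (size_t(1) << tagBits) - 1;

    // Conditions 1 and 2: a naturally aligned, pointer-typed field,
    // so a scanning GC still sees this word as a pointer.
    private void* store;

    @property T* ptr()
    {
        return cast(T*)(cast(size_t) store & ~tagMask);
    }

    @property void ptr(T* p)
    {
        // Condition 3: the bits we are about to reuse must be zero.
        assert((cast(size_t) p & tagMask) == 0, "pointer is not aligned enough");
        store = cast(void*)(cast(size_t) p | (cast(size_t) store & tagMask));
    }

    @property size_t tag()
    {
        return cast(size_t) store & tagMask;
    }

    @property void tag(size_t t)
    {
        assert(t <= tagMask, "tag does not fit in the reserved bits");
        store = cast(void*)((cast(size_t) store & ~tagMask) | t);
    }
}

unittest
{
    auto value = new int;
    *value = 7;

    LowTagged!(int, 2) t;
    t.ptr = value;
    t.tag = 3;
    assert(t.ptr is value && *t.ptr == 7 && t.tag == 3);
}
```

The block can be exercised with `dmd -main -unittest -run`. Note that `assert` is compiled out in `-release` builds, so a setter that must always enforce condition 3 would probably want a check that survives release mode.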

December 03, 2015
On Thursday, 3 December 2015 at 00:31:46 UTC, Andrei Alexandrescu wrote:
> Nice. What I'd say is that at the end of the day there's documentation. -- Andrei

Just to provide a bit of perspective...

Although memory corruption may not seem so scary in short-lived programs where it can be trivially reproduced with the same input, it can be an absolute nightmare when it occurs in long-running server processes, for developers, sysadmins and end-users alike.

A long time ago, a network service I wrote started having severe memory corruption issues. It went from crashing about once a day to once every few hours, and every time it crashed it pissed off a dozen users or more. Great ire and vitriol[1] were expressed towards the service, and everything I tried only seemed to make the situation worse.

It took months of studying and playing with D GC internals (incl. unsuccessful attempts to use the GC's own debugging code and additional debugging GC proxies - see Diamond), and finally, after three nights of replay debugging a virtual machine which recorded a crash on my home PC and trying to infer meaning from memory dumps of GC control structures, I tracked down the bug. The final result was the addition of InvalidMemoryOperationError to Druntime.

So, this is why I am on a war campaign against anything that might result in memory corruption in D. Another recent example was the controversy over readln (yes, std.stdio.File.readln, one of the basic operations in any programming language) corrupting memory: sure, this patch will make D programs no longer crash and burn in weird ways, but it will also make readln slower! Think of the benchmarks! Thankfully a solution was found which was both safe and acceptably fast.

You said that "at the end of the day there's documentation". I would argue that at least in this case, it may not be enough. Consider, for example, a hypothetical user type "Pack", which takes a tuple/struct and automatically arranges the fields into a struct such that space is used optimally (e.g. all bools are clumped together into a bitfield, enums are only given as many bits as their .max needs, etc.). Pack only needs to know one property of each field: how many bits it really needs, and as such, it might elect to be agnostic of what a pointer is. If it uses std.bitmanip.bitfields as its backend, it will happily pack a pointer at its user's request, and the user will never see the pointer warning in Pack's documentation. And yes, although it's easy to point the finger at users and say "ha ha, it's your own fault, you did not RTFM", I think we should strive for better than that.

I was recently close to having a repeat of the memory corruption situation (with the same service, too) due to the struct alignment issue. Luckily there was only one week of strife before I narrowed down the bug. Memory corruption may only manifest in certain conditions (e.g. release mode, after updating/switching compilers), and rollback/bisect are not always viable or useful options. Even if the issue I filed had been fixed at the time, it would not have helped in that particular case, since, well, D is unsafe by default, and it is situations such as these that make me occasionally glance in Rust's direction.

[1]: http://dump.thecybershadow.net/41a97cdab29f9cd56340ce8a5163d6f8/rage.txt

December 03, 2015
On Wednesday, 2 December 2015 at 23:38:33 UTC, deadalnix wrote:
> On Wednesday, 2 December 2015 at 23:04:16 UTC, ZombineDev wrote:
>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu wrote:
>>> Once done, this is a fantastic example of (a) the power of generative programming, and (b) the advantages of using library facilities instead of built-in features.
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=15397
>>>
>>> Who would want to take it?
>>>
>>>
>>> Andrei
>>
>> So, something like http://dlang.org/phobos/std_bitmanip.html#.taggedPointer?
>
> Yeah, that'd be great if we could remove these scary warnings about the GC on these; this is only FUD. It works just fine with the GC.

With the current GC, yes. If we allow this, any future GC implementation will have to expect pointers to be misaligned. If a GC is type aware, it can use this information to reject false pointers without having to look them up. Anyway, I guess that will not affect performance much, so it's probably ok.
December 03, 2015
On 12/2/15 6:51 PM, Andrei Alexandrescu wrote:
> On 12/02/2015 06:04 PM, ZombineDev wrote:
>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu wrote:
>>> Once done, this is a fantastic example of (a) the power of generative
>>> programming, and (b) the advantages of using library facilities
>>> instead of built-in features.
>>>
>>> https://issues.dlang.org/show_bug.cgi?id=15397
>>>
>>> Who would want to take it?
>>>
>>>
>>> Andrei
>>
>> So, something like
>> http://dlang.org/phobos/std_bitmanip.html#.taggedPointer?
>
> Sigh, yes. Both taggedPointer and taggedClassRef should be features of
> bitfields, not distinct names. One good thing to do would be to
> integrate those within bitfields, and then later perhaps undocument them.

taggedPointer and taggedClassRef are GC safe (despite the incorrect warning listed in the docs). Your proposed mechanism is not.
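
For reference, the existing facility is used roughly as follows. This is a minimal sketch based on the documented `std.bitmanip.taggedPointer` mixin; the struct `Node` and its field names are made up for illustration.

```d
import std.bitmanip : taggedPointer;

// Node and its field names are hypothetical; taggedPointer is the Phobos mixin.
struct Node
{
    // A uint* is 4-byte aligned, so its two lowest bits are free for flags.
    mixin(taggedPointer!(
        uint*, "target",
        bool, "visited", 1,
        bool, "dirty", 1));
}

unittest
{
    auto value = new uint;
    *value = 42;

    Node n;
    n.target = value;
    n.visited = true;
    n.dirty = false;

    // The flags do not disturb the pointer that comes back out.
    assert(n.target is value && *n.target == 42);
    assert(n.visited && !n.dirty);
}
```

The mixin only accepts as many flag bits as the pointee's alignment leaves free, so the stored value never points outside the original object.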

IMO, we should keep those and close your enhancement; it doesn't add anything useful. It seems to me like something that can break very easily.

Phobos should in no way support such egregious casts implicitly. Even in @system code.

Do you have any rationale to prefer arbitrary bitfield pointers over GC safe ones?

-Steve
December 03, 2015
On 12/3/15 7:20 AM, Marc Schütz wrote:
> On Wednesday, 2 December 2015 at 23:38:33 UTC, deadalnix wrote:
>> On Wednesday, 2 December 2015 at 23:04:16 UTC, ZombineDev wrote:
>>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu
>>> wrote:
>>>> Once done, this is a fantastic example of (a) the power of
>>>> generative programming, and (b) the advantages of using library
>>>> facilities instead of built-in features.
>>>>
>>>> https://issues.dlang.org/show_bug.cgi?id=15397
>>>>
>>>> Who would want to take it?
>>>>
>>>>
>>>> Andrei
>>>
>>> So, something like
>>> http://dlang.org/phobos/std_bitmanip.html#.taggedPointer?
>>
>> Yeah, that'd be great if we could remove these scary warnings about the
>> GC on these; this is only FUD. It works just fine with the GC.
>
> With the current GC, yes. If we allow this, any future GC implementation
> will have to expect pointers to be misaligned. If a GC is type aware, it
> can use this information to reject false pointers without having to look
> them up. Anyway, I guess that will not affect performance much, so it's
> probably ok.

First I will say, there is confusion about what is valid and what is not. Misaligned pointers are pointers that are stored misaligned. In other words, they are not stored on a 4-byte or 8-byte boundary on 32-bit or 64-bit architectures, respectively.

An interior pointer is a pointer that is *properly aligned* but does not point at the first byte of a piece of memory. taggedPointer and taggedClassRef create *interior pointers*, not *misaligned pointers*. Andrei's proposal will create *misaligned pointers*. There is a huge difference.

I can make an interior pointer without casts on any type:

SomeType *pointer = ...;
void[] p = pointer[0..1];
p = p[1..$];

If the GC does not support this being the only pointer to a memory location, then the GC is not suitable for D. Period. Code will break in subtle ways if you use such a GC.

I can't see how a language with void* and/or unions could allow such a GC.
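
To make the slicing example above concrete, here is a small self-contained sketch. `SomeType` is given an arbitrary two-field layout and `interiorSlice` is a hypothetical helper; the `GC.addrOf` check reflects how the current druntime GC behaves and is not a guarantee about every possible collector.

```d
import core.memory : GC;

// SomeType here is a stand-in; any GC-allocated type would do.
struct SomeType { int a; int b; }

// Returns a slice whose .ptr points one byte past the start of *pointer.
void[] interiorSlice(SomeType* pointer)
{
    void[] p = pointer[0 .. 1]; // SomeType[] converts implicitly to void[]
    return p[1 .. $];           // drop the first byte: now an interior pointer
}

unittest
{
    auto obj = new SomeType(1, 2);
    void[] p = interiorSlice(obj);

    assert(p.length == SomeType.sizeof - 1);
    assert(cast(ubyte*) p.ptr == cast(ubyte*) obj + 1);
    // The current druntime GC maps the interior pointer back to its block.
    assert(GC.addrOf(p.ptr) is cast(void*) obj);
}
```

No cast is needed to build the interior pointer itself; the casts in the unittest are only there to compare addresses.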

-Steve
December 03, 2015
On 12/03/2015 06:13 AM, Vladimir Panteleev wrote:
> You said that "at the end of the day there's documentation". I would
> argue that at least in this case, it may not be enough. Consider, for
> example, a hypothetical user type "Pack", which takes a tuple/struct and
> automatically arranges the fields into a struct such that space is used
> optimally (e.g. all bools are clumped together into a bitfield, enums
> are only given as many bits as their .max needs, etc.). Pack only needs
> to know one property of each field: how many bits it really needs, and
> as such, it might elect to be agnostic of what a pointer is. If it uses
> std.bitmanip.bitfields as its backend, it will happily pack a pointer at
> its user's request, and the user will never see the pointer warning in
> Pack's documentation. And yes, although it's easy to point the finger at
> users and say "ha ha, it's your own fault, you did not RTFM", I think we
> should strive for better than that.

I understand how you feel but I really don't know what else to do, which makes the entire discussion somewhat theoretical. At the end of the day Pack must document its own behavior and its users should have some notion of its characteristics. If Pack wishes to disallow pointers that can be easily done with static introspection. If it doesn't have enough information, it's poorly designed - it can't just take any bits and shove them in any way. -- Andrei
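
A minimal sketch of what "disallow pointers with static introspection" could look like, assuming `std.traits.hasIndirections` is an acceptable test. `Pack` is the hypothetical type from the discussion and its body is only a placeholder, not a real bit-packing implementation.

```d
import std.meta : anySatisfy;
import std.traits : hasIndirections;

// Pack is the hypothetical type from the discussion; the body is a placeholder.
struct Pack(Fields...)
if (!anySatisfy!(hasIndirections, Fields))
{
    Fields fields; // a real Pack would compute a bit layout here
}

unittest
{
    static assert( is(Pack!(bool, ubyte)));
    static assert(!is(Pack!(bool, int*)));  // raw pointer rejected
    static assert(!is(Pack!(bool, int[]))); // slices carry a pointer too
}
```

With the constraint in place, a user who passes a pointer gets a compile-time error at the point of instantiation rather than a warning buried in documentation.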
December 03, 2015
On Thursday, 3 December 2015 at 13:02:24 UTC, Steven Schveighoffer wrote:
> First I will say, there is confusion about what is valid and what is not. Misaligned pointers are pointers that are stored misaligned. In other words, they are not stored on a 4-byte or 8-byte boundary on 32-bit or 64-bit architectures, respectively.
>
> An interior pointer is a pointer that is *properly aligned* but does not point at the first byte of a piece of memory. taggedPointer and taggedClassRef create *interior pointers*, not *misaligned pointers*. Andrei's proposal will create *misaligned pointers*. There is a huge difference.
>
> I can make an interior pointer without casts on any type:
>
> SomeType *pointer = ...;
> void[] p = pointer[0..1];
> p = p[1..$];
>
> If the GC does not support this being the only pointer to a memory location, then the GC is not suitable for D. Period. Code will break in subtle ways if you use such a GC.
>
> I can't see how a language with void* and/or unions could allow such a GC.

Indeed, I was talking about interior pointers. But you're right, I missed the fact that void pointers (and some others) can be valid interior pointers even to unstructured values. So the optimization I had in mind is not applicable in D, anyway.

We should then just adjust the specification to specifically allow changing the LSBs.
December 03, 2015
On 12/03/2015 07:45 AM, Steven Schveighoffer wrote:
> On 12/2/15 6:51 PM, Andrei Alexandrescu wrote:
>> On 12/02/2015 06:04 PM, ZombineDev wrote:
>>> On Wednesday, 2 December 2015 at 19:39:47 UTC, Andrei Alexandrescu
>>> wrote:
>>>> Once done, this is a fantastic example of (a) the power of generative
>>>> programming, and (b) the advantages of using library facilities
>>>> instead of built-in features.
>>>>
>>>> https://issues.dlang.org/show_bug.cgi?id=15397
>>>>
>>>> Who would want to take it?
>>>>
>>>>
>>>> Andrei
>>>
>>> So, something like
>>> http://dlang.org/phobos/std_bitmanip.html#.taggedPointer?
>>
>> Sigh, yes. Both taggedPointer and taggedClassRef should be features of
>> bitfields, not distinct names. One good thing to do would be to
>> integrate those within bitfields, and then later perhaps undocument them.
>
> taggedPointer and taggedClassRef are GC safe (despite the incorrect
> warning listed in the docs). Your proposed mechanism is not.

It can be restricted to support what tagged* do.

> IMO, we should keep those and close your enhancement; it doesn't add
> anything useful. It seems to me like something that can break very easily.

Please leave it open, thanks.

> Phobos should in no way support such egregious casts implicitly. Even in
> @system code.
>
> Do you have any rationale to prefer arbitrary bitfield pointers over GC
> safe ones?

1. The less restricted version offers use of high-order bits as well. If we don't support that, those who need it will do that in client code with the usual liabilities.

2. There's no reason for taggedPointer and taggedClassRef to exist. They should be integrated within bitfields.
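
To illustrate point 1, this is roughly what hand-rolled high-bit tagging in client code tends to look like. It is only a sketch under loud assumptions: `HighTagged` is a hypothetical name, pointers are 64 bits wide, and user-space addresses fit in the low 48 bits. None of that is promised by the language or by Phobos.

```d
// Hypothetical client-side sketch; HighTagged is not a Phobos type.
struct HighTagged(T)
{
    static assert(size_t.sizeof == 8, "sketch assumes 64-bit pointers");

    private enum uint shift = 48; // assumes addresses fit in the low 48 bits
    private enum size_t addrMask = (size_t(1) << shift) - 1;

    // With the tag in the high bits this word no longer looks like a valid
    // address to the GC, so it must not hold the only reference to GC memory.
    private size_t bits;

    this(T* p, ushort tag)
    {
        immutable raw = cast(size_t) p;
        assert((raw & ~addrMask) == 0, "pointer already uses the high bits");
        bits = raw | (cast(size_t) tag << shift);
    }

    T* ptr() const { return cast(T*)(bits & addrMask); }
    ushort tag() const { return cast(ushort)(bits >> shift); }
}

unittest
{
    import core.stdc.stdlib : free, malloc;

    // malloc'ed memory sidesteps the GC question entirely.
    auto p = cast(int*) malloc(int.sizeof);
    assert(p !is null);
    scope (exit) free(p);
    *p = 5;

    auto t = HighTagged!int(p, 0xBEEF);
    assert(t.ptr() is p && *t.ptr() == 5 && t.tag() == 0xBEEF);
}
```

The liabilities are visible in the comments: the scheme depends on the platform's address layout and gives up GC visibility of the pointer.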


Andrei