On 7 November 2013 18:05, Johannes Pfau <nospam@example.com> wrote:
On Thu, 7 Nov 2013 16:14:59 +0000,
Iain Buclaw <ibuclaw@ubuntu.com> wrote:

> On 3 November 2013 10:20, Johannes Pfau <nospam@example.com> wrote:
>
> > On Sun, 3 Nov 2013 02:10:20 +0000,
> > Iain Buclaw <ibuclaw@ubuntu.com> wrote:
> >
> > > last time I
> > > checked, returning 0 disables aliasing rules from taking effect.
> >
> > That should work. Alias set 0 is a special alias set which conflicts
> > with everything. I'll check if it works as expected.
> >
>
>
> This is taken from hunks in the 2.064 merge I'm testing:
>
> Pastebin link here:  http://pastebin.com/jxQQL68N
>

Some probably stupid questions about this patch:

// If the type is a dynamic array, use the alias set of the basetype.

What exactly happens in that case? The Tarray type is the
two-field type consisting of length and ptr, right? Currently
TypeDArray->toCtype constructs a two_field_type with size_t and
typeof(Element)*. So according to the C aliasing rules, the TypeDArray
alias set does already conflict with size_t and Element*. It does not
conflict with Element. But I don't know why it should conflict with
Element if we're talking about the slice type here. It would
allow code like this to work: "char[] a; char* b = (cast(char*)&a)" but
I don't see why this should work, it's illegal anyway?


That would be seen as two distinct alias sets that would break strict aliasing in that example.

Though I will have to implement -Wstrict-aliasing in the front-end to get any feel for what could potentially be utterly wrong.  The idea is that, for dynamic arrays, we tell gcc not to rely on structural equality to determine whether or not two dynamic arrays are part of the same alias set.

e.g.:
byte[] a; long[] b = *(cast(long[]*)&a); should be seen as being in different alias sets, and so *will* be flagged as breaking strict aliasing.

In contrast, string a; char[] b = *cast(char[]*)&a; should be seen as being part of the same alias set, so the compiler must know that the two types (which are distinct structures to the backend) could potentially be referencing the same slice of memory, so as not to cause any problems.

For people trying to work around the cast system for dynamic arrays, IMO they should be punished for it, and told to do it in the correct way that invokes _d_arraycopy, or do their unsafe work through unions.
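
For anyone who wants the union route spelled out, here is a rough sketch in C, chosen because it maps directly onto what the gcc backend sees. The struct, union and function names are made up for illustration; they only model the two-field (length, ptr) layout of a slice and are not what gdc actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C models of two D slice types sharing a base type. */
struct const_char_slice { size_t length; const char *ptr; }; /* models string */
struct char_slice       { size_t length; char *ptr; };       /* models char[] */

/* Both members start at the same address, so reading one member after
   writing the other reinterprets the same bytes. */
union slice_view {
    struct const_char_slice as_const;
    struct char_slice       as_mutable;
};

/* View a string-like slice header as a char[]-like slice header.
   Writing through the result is only valid if the memory is writable. */
struct char_slice view_as_mutable(struct const_char_slice s)
{
    union slice_view v;

    /* Sanity check on the assumption that both headers have the same size. */
    assert(sizeof(struct const_char_slice) == sizeof(struct char_slice));

    v.as_const = s;
    return v.as_mutable;
}
```

The point is only that the access goes through the union type, so the compiler can see that both struct types may refer to the same storage.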

 
Also, AFAICS it does not help with the problem in std.algorithm:
char[] a;
//cast(ubyte[])a generates:
*cast(ubyte[]*)&a;

Do you think this cast should be illegal in D?
I think if we want to support strict aliasing for the code above we'll
have to do what gcc does for pointers:
http://code.metager.de/source/xref/gnu/gcc/gcc/alias.c#819
Put all array slices - regardless of element type - into the same alias
set and make size_t and void* subsets of this alias set.


// Permit type-punning when accessing a union

Isn't that already guaranteed by GCC? See:
http://code.metager.de/source/xref/gnu/gcc/gcc/alias.c#982
Unions have all their member types added as subsets. So as long as the
reference is through the union GCC knows the union type and it'll
conflict with all member types.


There is no harm in enforcing it in the front-end as well, even if it is just there to speed up the process of returning what the backend will no doubt return too.  There's also the (extremely) unlikely event that the guarantee by GCC might be removed in a later version.
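
To illustrate what "through the union" means in practice, a minimal C sketch follows. The function name is hypothetical, and it assumes float is 32 bits wide with the usual IEEE-754 single-precision encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Return the bit pattern of a float by writing one union member and
   reading the other. Assumes a 32-bit float. */
uint32_t float_bits(float f)
{
    union { float f; uint32_t u; } pun;

    assert(sizeof(float) == sizeof(uint32_t));
    pun.f = f;      /* store as float */
    return pun.u;   /* load the same bytes as an integer */
}
```

Both the store and the load name the union object, which is the case GCC documents as permitted even with -fstrict-aliasing.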
 

But even if we make those changes to aliasing rules, we'll have to fix
many places in phobos. For example:
https://github.com/D-Programming-Language/phobos/blob/master/std/math.d#L1965
real value;
ushort* vu = cast(ushort*)&value;
AFAICS this will always be invalid with strict aliasing.


Yep, as it should be.  std.math is a danger point for type punning between pointers and reals: we have to ensure that the type-punning/casting does not get DCE'd, etc.  This needs to be fixed.
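
As a sketch of one aliasing-safe way to get at the bits of a floating-point value, here is the byte-copy approach in C. It uses double rather than real to keep the layout assumption simple (IEEE-754 binary64 is assumed), and the function name is hypothetical; it is not a proposal for how std.math should be written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the biased exponent field of a double without casting pointers.
   Assumes IEEE-754 binary64: 1 sign bit, 11 exponent bits, 52 mantissa bits. */
unsigned biased_exponent(double value)
{
    uint64_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);   /* copy the object representation */
    return (unsigned)((bits >> 52) & 0x7FFu);
}
```

Because the bytes are copied rather than accessed through a differently typed pointer, there is nothing here for strict aliasing to object to.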


https://github.com/D-Programming-Language/phobos/blob/master/std/uuid.d#L468
casts ubyte[16]* to size_t*, which is also illegal, AFAICS.

Are there any statistics about the performance improvements with strict
aliasing? I'm not really sold on the idea of strict aliasing, right now
it looks to me as if it's mainly a way to introduce subtle, hard to
debug and often latent bugs (as whether you really see a problem
depends on optimization).

http://stackoverflow.com/questions/1225741/performance-impact-of-fno-strict-aliasing

Not that I'm aware of (other than Ada boasting its use).  But I'd like to push the opinion that, although it isn't in the spec, D should be a strict-aliasing language, and people should be aware of the problems of breaking strict aliasing (see: http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html )

But first... plan of attack:

- We first disable strict aliasing entirely (lang_hook.get_alias_set => 0)
- Implement -Wstrict-aliasing
- Start turning on strict aliasing for types guaranteed not to be referencing the same memory location as other types (eg: TypeBasic, TypeDelegate, TypeSArray, TypeVector).
- Identify gdc problems with implicit code generation that could break strict aliasing (these are our bugs).
- Identify frontend/library problems that could break strict aliasing (these are the dmd/phobos developers' bugs).
- Turn on strict aliasing for the remaining types.  For those that cause problems, we can define a TYPE_LANG_FLAG macro to allow us to tell the backend if the type can alias any other types.
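
To make the first and third steps concrete, here is a toy model in C of a staged alias-set hook. Everything in it is hypothetical (the enum, the constants, the function and its stage flag); it is not the gdc hook. It relies on alias set 0 conflicting with everything, and on my understanding that returning -1 from the language hook leaves the decision to the backend's default handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type kinds, loosely named after the front-end types above. */
enum type_kind {
    KIND_BASIC, KIND_DELEGATE, KIND_SARRAY, KIND_VECTOR,
    KIND_DARRAY, KIND_POINTER, KIND_CLASS
};

enum { ALIAS_EVERYTHING = 0, USE_BACKEND_DEFAULT = -1 };

/* strict_for_safe_types == false models "disable strict aliasing entirely";
   true models "turn it on for types guaranteed not to alias other types". */
int toy_get_alias_set(enum type_kind kind, bool strict_for_safe_types)
{
    assert(kind >= KIND_BASIC && kind <= KIND_CLASS);

    if (!strict_for_safe_types)
        return ALIAS_EVERYTHING;

    switch (kind) {
    case KIND_BASIC:
    case KIND_DELEGATE:
    case KIND_SARRAY:
    case KIND_VECTOR:
        return USE_BACKEND_DEFAULT;   /* let the backend assign a distinct set */
    default:
        return ALIAS_EVERYTHING;      /* not yet audited: stay conservative */
    }
}
```

The remaining kinds would move out of the default branch one at a time as the gdc and phobos problems get fixed.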


I still stand by what I say on aliasing rules of D:
- Permit type-punning when accessing through a union
- Dynamic arrays of the same basetype (regardless of qualifiers) may alias each other/occupy the same slice of memory.

Other possible considerations:

- Most D code pretty much assumes that any object may be accessed via a void[] or void*.

- The C standard allows aliasing between signed and unsigned variants.  It is therefore likely not unreasonable to do the same for convenience.

- In fact, for the sake of std.math, we could go one step further and simply build up an alias set list based on type size rather than type distinction.  In this model double/long/byte[8]/short[4]/int[2] would all be considered as types that could be referencing each other.
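
As a quick illustration of the signed/unsigned point, a small C sketch follows. The function name is made up, and the value read back afterwards assumes a two's complement int.

```c
#include <assert.h>
#include <limits.h>

/* Store to an int object through an unsigned int lvalue. C permits this
   because unsigned int is the unsigned variant of the object's type. */
void set_all_bits(int *target)
{
    unsigned int *as_unsigned;

    assert(target != 0);
    as_unsigned = (unsigned int *)target;
    *as_unsigned = UINT_MAX;   /* sets every value bit of the int object */
}
```

The same cast between, say, int* and float* would not be covered by that allowance, which is where a size-based model would differ from the C rules.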

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';