June 06, 2012
An interesting situation: the current compiler will happily compile pure functions that accept shared data.

I believed that when we relaxed the purity rules, shared data should be taboo for pure functions, even weak-pure ones.  Note that at least at the time, Don agreed with me: http://forum.dlang.org/post/i7d60m$2smf$1@digitalmars.com

Now, technically, there's nothing really *horrible* about this: you can't actually have truly shared data inside a strong-pure function.  Any data marked 'shared' will not in fact be shared, because a strong-pure function cannot receive any shared data.

So if you were then to call a weak-pure function with shared parameters from a strong-pure function, you would simply be wasting cycles locking or issuing memory barriers on data that is not truly shared.  I don't see a compelling reason for weak-pure functions to accept shared data explicitly.

*Except* that template functions which use IFTI have good reason to be able to be marked pure.

For example:

void inc(T)(ref T i) pure
{
   ++i;
}

Now we have a template function that we know will only affect i, and the compiler enforces that.

But what happens here?

shared int x;

void main()
{
   x.inc();
}

here, T == shared int.

One solution (if shared isn't allowed on pure functions) is to not mark inc pure and let purity be inferred.  But then we lose the contract by which the compiler helps us enforce purity.

I'll also point out that inc isn't a valid function for data that is actually shared: ++i is not atomic.  So disallowing shared actually helps us in this regard, by refusing to compile a function that would be dangerous when used on shared data.
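To illustrate the non-atomicity point in C11 terms (the issue is language-agnostic; function names here are hypothetical, not druntime's): a plain increment compiles to a separate load, add, and store, so concurrent increments can lose updates, while an atomic increment is a single locked read-modify-write.

```c
#include <stdatomic.h>

/* Hypothetical illustration: a plain increment is three steps
 * (load, add, store), so two threads can interleave and lose updates. */
void inc_plain(int *i)
{
    ++*i;                      /* load; add; store: not atomic */
}

/* An atomic increment is one indivisible read-modify-write,
 * e.g. `lock add` on x86. */
void inc_atomic(_Atomic int *i)
{
    atomic_fetch_add(i, 1);
}
```

Rejecting shared arguments to the plain version forces the caller to reach for the atomic one explicitly.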

The compiler, however, *currently* compiles this just fine.

I'm strongly leaning towards this being a bug that needs to be fixed in the compiler.

Some background on why this came up: https://github.com/D-Programming-Language/druntime/pull/147

Opinions?

-Steve
June 06, 2012
On 06-06-2012 23:39, Steven Schveighoffer wrote:
> An interesting situation, the current compiler happily will compile pure
> functions that accept shared data.
>
> I believed when we relaxed purity rules, shared data should be taboo for
> pure functions, even weak-pure ones. Note that at least at the time, Don
> agreed with me: http://forum.dlang.org/post/i7d60m$2smf$1@digitalmars.com
>
> Now, technically, there's nothing really *horrible* about this, I mean
> you can't really have truly shared data inside a strong-pure function.
> Any data that's marked as 'shared' will not be shared because a
> strong-pure function cannot receive any shared data.
>
> So if you then were to call a weak-pure function that had shared
> parameters from a strong-pure function, you simply would be wasting
> cycles locking or using a memory-barrier on data that is not truly
> shared. I don't really see a compelling reason to have weak-pure
> functions accept shared data explicitly.
>
> *Except* that template functions which use IFTI have good reason to be
> able to be marked pure.
>
> For example:
>
> void inc(T)(ref T i) pure
> {
> ++i;
> }
>
> Now, we have a template function that we know only will affect i, and
> the compiler enforces that.
>
> But what happens here?
>
> shared int x;
>
> void main()
> {
> x.inc();
> }
>
> here, T == shared int.
>
> One solution (if shared isn't allowed on pure functions) is, don't mark
> inc pure, let it be inferred. But then we are losing the contract to
> have the compiler help us enforce purity.
>
> I'll also point out that inc isn't a valid function for data that is
> actually shared: ++i is not atomic. So disallowing shared actually helps
> us in this regard, by refusing to compile a function that would be
> dangerous when used on shared data.

Man, shared is such a mess.

(I'm going to slightly hijack a branch of your thread because I think we need to address the below concerns before we can make this decision properly.)

We need to be crystal clear on what we're talking about here. People usually describe shared as being supposed to insert memory barriers; others say it makes operations on shared data atomic.

(And of course, neither is actually implemented in any compiler, and I doubt they ever will be.)

A memory barrier is what the x86 sfence, lfence, and mfence instructions represent. They simply make various useful guarantees about ordering of loads and stores. Nothing else.

Atomic operations are what the lock prefix is used for: lock add, lock cmpxchg, and so on. These operate indivisibly on the most recent value at the memory location in question, so a stale cached value is never observed.

Memory barriers and atomic operations are not the same thing, and we should avoid conflating them. Yes, they can be used together to write low-level, lock-free data structures, but the use of one does not include the other automatically.
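The distinction can be sketched in C11 terms (illustrative only, with hypothetical names; this is not what any D compiler emits):

```c
#include <stdatomic.h>

/* A fence only orders the surrounding loads and stores; it touches no
 * data itself.  A seq_cst fence lowers to `mfence` on x86. */
void publish(int *data, _Atomic int *ready)
{
    *data = 42;                                /* plain store       */
    atomic_thread_fence(memory_order_release); /* barrier: ordering */
    atomic_store_explicit(ready, 1, memory_order_relaxed);
}

/* An atomic read-modify-write operates on one location indivisibly;
 * it lowers to `lock xadd` on x86, a different mechanism entirely. */
int bump(_Atomic int *counter)
{
    return atomic_fetch_add(counter, 1);
}
```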

(At this point, I probably don't need to point out how x86-biased and unportable shared is.....)

So, my question to the community is: What should shared *really* mean?

I don't think that having shared imply memory barriers is going to be terribly useful to anyone. In fact, I don't know how the compiler would even determine where to insert memory barriers efficiently. And *actually*, I think memory barriers are not what people mean at *all* when they refer to shared's effect on code generation. I think what people *really* want is atomic operations.

Steven, in your particular case, I don't agree entirely. The operation can be made atomic quite trivially by implementing inc() like so (for the shared int case):

void inc(ref shared int i) pure nothrow
{
    // just pretend the compiler emitted this
    asm
    {
        mov EDX, i;
        lock;
        inc [EDX];
    }
}

But I may be misunderstanding you. Of course, it gets a little more complex if you use the result of the ++ operation afterwards, but it's still not impossible to do atomically. What can *not* be done is the increment and the load of the result in one purely atomic instruction (and I suspect this is what you may have been referring to). It's worth pointing out that most atomic operations can be implemented with a spin lock (which is exactly what core.atomic does for most binary operations), so while this cannot be done with a single x86 instruction, it can be achieved through such a mechanism, and most real-world atomic APIs do this (see InterlockedIncrement on Windows, for example).
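The spin-lock fallback mentioned above can be sketched like this (a minimal C11 illustration with hypothetical names, not core.atomic's actual implementation):

```c
#include <stdatomic.h>

/* Guard a non-atomic operation with a tiny spin lock built from an
 * atomic test-and-set: the same idea core.atomic falls back on when
 * the ISA offers no native atomic op for the type. */
static atomic_flag inc_lock = ATOMIC_FLAG_INIT;

long long locked_increment(long long *value)
{
    /* spin until we acquire the lock */
    while (atomic_flag_test_and_set_explicit(&inc_lock, memory_order_acquire))
        ;
    long long result = ++*value;  /* safe: we hold the lock */
    atomic_flag_clear_explicit(&inc_lock, memory_order_release);
    return result;
}
```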

Further, if shared is going to be useful at all, stuff like this *has* to be atomic, IMO.

I'm still of the opinion that bringing atomicity and memory barriers into the type system is a horrible can of worms that we should never have opened, but now shared is there and we need to make up our minds already.

>
> The compiler *currently* however, will simply compile this just fine.
>
> I'm strongly leaning towards this being a bug, and needs to be fixed in
> the compiler.
>
> Some background of why this got brought up:
> https://github.com/D-Programming-Language/druntime/pull/147
>
> Opinions?
>
> -Steve

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 07, 2012
On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
> (And of course, neither is actually implemented in any compiler, and I
> doubt they ever will be.)

Why do you doubt shared semantics will be implemented?

Andrei

June 07, 2012
On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
> (At this point, I probably don't need to point out how x86-biased and
> unportable shared is.....)

I confess I'll need that spelled out. How is shared biased towards x86 and nonportable?

Thanks,

Andrei
June 07, 2012
On 07-06-2012 03:11, Andrei Alexandrescu wrote:
> On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
>> (At this point, I probably don't need to point out how x86-biased and
>> unportable shared is.....)
>
> I confess I'll need that spelled out. How is shared biased towards x86
> and nonportable?
>
> Thanks,
>
> Andrei

The issue lies in its assumption that the architecture being targeted supports atomic operations and/or memory barriers at all. Some architectures simply don't support these; others do, but not for certain data sizes, such as 64-bit ints. x86 is probably the architecture with the best support for low-level memory control as far as atomicity and memory barriers go.

The problem is that shared is supposed to guarantee that operations on shared data *always* obey whatever atomicity/memory barrier rules we end up defining for it (obviously we don't want generated code to have different semantics across architectures due to subtle issues like the lack of certain operations in the ISA). Right now, based on what I've read in the NG and on mailing lists, people seem to assume that shared will provide full-blown x86-level atomicity and/or memory barriers. Providing these features on e.g. ARM is a pipe dream at best (for instance, ARM has no atomic load for 64-bit values).

All this being said, shared could probably be implemented with plain old locks on these architectures if correctness is the only goal. But, from a more pragmatic point of view, this would completely butcher performance, add potential for deadlocks, and bring along all the other issues associated with thread synchronization in general. We really shouldn't have such a core feature of the language fall back to a dirty hack like this on low-end/embedded architectures (where performance of this kind of stuff is absolutely critical), IMO.

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 07, 2012
On 07-06-2012 03:07, Andrei Alexandrescu wrote:
> On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
>> (And of course, neither is actually implemented in any compiler, and I
>> doubt they ever will be.)
>
> Why do you doubt shared semantics will be implemented?
>
> Andrei
>

I think there are two fundamental issues making implementation difficult and unlikely to happen:

1) The x86 bias (I replied to your other post wrt this).
2) The overall complexity of generating correct code for shared.

If we ignore the portability issues that I pointed out in my other reply, point (1) is irrelevant. I'm fairly certain the shared semantics that people expect can be implemented just fine at ISA level on x86 without dirty hacks like locks. But if we do care about portability (which we ***really*** should - ARM and PowerPC, for example, are becoming increasingly important!), then we need to reconsider shared very carefully.

The thing about (2) is that literally every operation on shared data has to be special-cased in the compiler. This adds a crazy amount of complexity, since there are basically two code paths for every single part of the code generation phase: the unshared path and the shared path. This is mostly caused by the fact that shared is transitive and can be applied to virtually any type. But even if we ignore that complexity, we have the problem of certain operations that cannot be atomic (even though they may look like it):

* Would you expect an array indexing operation (where the array slice is shared) to index the array atomically? Would you expect the read of the value at the calculated memory location to be atomic?
* Would you expect a slicing operation to be atomic? (Slicing something involves reading two words of memory which cannot be done atomically even on x86.)
* Would you expect 'in' to be atomic? (It can only really, kinda-sorta be if you use locks inside the AA implementation...)

etc.
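To make the slicing point concrete: a D slice is roughly a (length, pointer) pair, and copying it takes two separate word-sized loads, so a concurrent writer can update the pair in between and the reader sees a "torn" slice. A minimal C sketch (hypothetical layout, not the actual runtime's):

```c
#include <stddef.h>

/* A D slice is approximately this two-word struct. */
typedef struct {
    size_t length;
    void  *ptr;
} slice;

/* Copying the slice takes two separate word-sized loads.  Another
 * thread can store a new slice between them, so the copy may pair the
 * old length with the new pointer (or vice versa): a torn read.  No
 * single plain x86 instruction loads two arbitrary words atomically. */
slice read_slice(const slice *s)
{
    slice copy;
    copy.length = s->length;  /* load #1                              */
    /* <-- a concurrent writer may replace the whole slice right here */
    copy.ptr = s->ptr;        /* load #2                              */
    return copy;
}
```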

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org
June 07, 2012
Funny, I always imagined 'shared' was just a type constructor like 'const'... meant to help you check the correctness of your code.

Not that it would actually /do/ something for you at runtime...
June 07, 2012
On 6/6/12 8:19 PM, Alex Rønne Petersen wrote:
> On 07-06-2012 03:11, Andrei Alexandrescu wrote:
>> On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
>>> (At this point, I probably don't need to point out how x86-biased and
>>> unportable shared is.....)
>>
>> I confess I'll need that spelled out. How is shared biased towards x86
>> and nonportable?
>>
>> Thanks,
>>
>> Andrei
>
> The issue lies in its assumption that the architecture being targeted
> supports atomic operations and/or memory barriers at all. Some
> architectures plain don't support these, others do, but for certain data
> sizes like 64-bit ints, they don't, etc. x86 is probably the
> architecture that has the best support for low-level memory control as
> far as atomicity and memory barriers go.

Actually x86 is one of the more forgiving architectures (most code works even when written without barriers). Indeed we assume the target architecture supports double-word atomic load.

> The problem is that shared is supposed to guarantee that operations on
> shared data *always* obeys whatever atomicity/memory barrier rules we
> end up defining for it (obviously we don't want generated code to have
> different semantics across architectures due to subtle issues like the
> lack of certain operations in the ISA). Right now, based on what I've
> read in the NG and on mailing lists, people seem to assume that shared
> will provide full-blown x86-level atomicity and/or memory barriers.
> Providing these features on e.g. ARM is a pipe dream at best (for
> instance, ARM has no atomic load for 64-bit values).

http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html mentions that there is a way to implement atomic load for 64-bit values.
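For instance, a portable atomic 64-bit load can be written in C11 and the compiler emits whatever the target offers; per the mappings above, on 32-bit ARM that is an ldrexd/strexd retry loop. A minimal sketch (hypothetical name):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Lowered to an ldrexd/strexd loop on 32-bit ARM, and to an ordinary
 * aligned load (plus any required ordering) on x86-64. */
uint64_t load64(_Atomic uint64_t *p)
{
    return atomic_load(p);
}
```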

> All this being said, shared could probably be implemented with plain old
> locks on these architectures if correctness is the only goal. But, from
> a more pragmatic point of view, this would completely butcher
> performance and adds potential for deadlocks, and all other issues
> associated with thread synchronization in general. We really shouldn't
> have such a core feature of the language fall back to a dirty hack like
> this on low-end/embedded architectures (where performance of this kind
> of stuff is absolutely critical), IMO.

That's how C++'s atomic<T> does things, by the way. But I sympathize with your viewpoint that there should be no hidden locks. We could define shared to refuse compilation on odd machines, and THEN provide an atomic template with the expected performance of a lock.


Andrei

June 07, 2012
On 6/6/12 8:32 PM, Alex Rønne Petersen wrote:
> On 07-06-2012 03:07, Andrei Alexandrescu wrote:
>> On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
>>> (And of course, neither is actually implemented in any compiler, and I
>>> doubt they ever will be.)
>>
>> Why do you doubt shared semantics will be implemented?
>>
>> Andrei
>>
>
> I think there are two fundamental issues making implementation difficult
> and unlikely to happen:
>
> 1) The x86 bias (I replied to your other post wrt this).
> 2) The overall complexity of generating correct code for shared.
>
> If we ignore the portability issues that I pointed out in my other
> reply, point (1) is irrelevant. I'm fairly certain the shared semantics
> that people expect can be implemented just fine at ISA level on x86
> without dirty hacks like locks. But if we do care about portability
> (which we ***really*** should - ARM and PowerPC, for example, are
> becoming increasingly important!), then we need to reconsider shared
> very carefully.

Agreed. I don't think (1) is irrelevant, but I don't see it as an impossible obstacle.

> The thing about (2) is that literally every operation on shared data has
> to be special-cased in the compiler.

Yes, like for volatile (no code motion etc) plus barriers.

> This adds a crazy amount of
> complexity, since there are basically two code paths for every single
> part of the code generation phase: the unshared path and the shared
> path. This is mostly caused by the fact that shared is transitive and
> can be applied to virtually any type. But even if we ignore that
> complexity, we have the problem of certain operations that cannot be
> atomic (even though they may look like it):
>
> * Would you expect an array indexing operation (where the array slice is
> shared) to index the array atomically? Would you expect the read of the
> value at the calculated memory location to be atomic?
> * Would you expect a slicing operation to be atomic? (Slicing something
> involves reading two words of memory which cannot be done atomically
> even on x86.)
> * Would you expect 'in' to be atomic? (It can only really, kinda-sorta
> be if you use locks inside the AA implementation...)

Operations with shared data are intentionally very limited. Indexing, slicing, and "in" should not compile for shared arrays and associative arrays.


Andrei


June 07, 2012
On 07-06-2012 04:00, Andrei Alexandrescu wrote:
> On 6/6/12 8:32 PM, Alex Rønne Petersen wrote:
>> On 07-06-2012 03:07, Andrei Alexandrescu wrote:
>>> On 6/6/12 6:01 PM, Alex Rønne Petersen wrote:
>>>> (And of course, neither is actually implemented in any compiler, and I
>>>> doubt they ever will be.)
>>>
>>> Why do you doubt shared semantics will be implemented?
>>>
>>> Andrei
>>>
>>
>> I think there are two fundamental issues making implementation difficult
>> and unlikely to happen:
>>
>> 1) The x86 bias (I replied to your other post wrt this).
>> 2) The overall complexity of generating correct code for shared.
>>
>> If we ignore the portability issues that I pointed out in my other
>> reply, point (1) is irrelevant. I'm fairly certain the shared semantics
>> that people expect can be implemented just fine at ISA level on x86
>> without dirty hacks like locks. But if we do care about portability
>> (which we ***really*** should - ARM and PowerPC, for example, are
>> becoming increasingly important!), then we need to reconsider shared
>> very carefully.
>
> Agreed. I don't think (1) is irrelevant but I don't see it an impossible
> obstacle.
>
>> The thing about (2) is that literally every operation on shared data has
>> to be special-cased in the compiler.
>
> Yes, like for volatile (no code motion etc) plus barriers.
>
>> This adds a crazy amount of
>> complexity, since there are basically two code paths for every single
>> part of the code generation phase: the unshared path and the shared
>> path. This is mostly caused by the fact that shared is transitive and
>> can be applied to virtually any type. But even if we ignore that
>> complexity, we have the problem of certain operations that cannot be
>> atomic (even though they may look like it):
>>
>> * Would you expect an array indexing operation (where the array slice is
>> shared) to index the array atomically? Would you expect the read of the
>> value at the calculated memory location to be atomic?
>> * Would you expect a slicing operation to be atomic? (Slicing something
>> involves reading two words of memory which cannot be done atomically
>> even on x86.)
>> * Would you expect 'in' to be atomic? (It can only really, kinda-sorta
>> be if you use locks inside the AA implementation...)
>
> Operations with shared data are intentionally very limited. Indexing,
> slicing, and "in" should not compile for shared arrays and associative
> arrays.

Are you suggesting that we define shared such that any 'basic' operation that can't be atomic on the target architecture is disallowed? Or that we figure out the lowest common denominator and go with that?

>
>
> Andrei
>
>

-- 
Alex Rønne Petersen
alex@lycus.org
http://lycus.org