October 24, 2013
On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
> On 24 October 2013 08:18, Mike <none@none.com> wrote:
>> On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
>>>
>>> On 24 October 2013 06:37, Walter Bright <newshound2@digitalmars.com>
>>> wrote:
>>>>
>>>> On 10/23/2013 5:43 PM, Mike wrote:
> 'shared' guarantees that all reads and writes specified in source code
> happen in the exact order specified with no omissions, as there may be
> other threads reading/writing to the variable at the same time.

All that's missing is a guarantee that the reading/writing actually occur at the intended address and not in some compiler cache.

October 24, 2013
"Mike" <none@none.com> wrote in message news:bifrvifzrhgocrejepvc@forum.dlang.org...
> I've read a few discussions on the D forums about the volatile keyword debate, but no one seemed to reconcile the need for volatile in memory-mapped IO.  Was this an oversight?
>
> What's D's answer to this?  If one were to use D to read from memory-mapped IO, how would one ensure the compiler doesn't cache the value?

There are a few options:

1. Use shared in place of volatile.  I'm not sure this actually works, but otherwise this is pretty good.

2. Use the deprecated volatile statement.  D got it right that volatile access is a property of the load/store and not the variable, but missed the point that it's a huge pain to have to remember volatile at use.  Could be made better with a wrapper.  I think this still works.

3. Use inline assembly.  This sucks.

4. Defeat the optimizer with inline assembly.

asm { nop; } // Haha, gotcha
*my_hardware_register = 999;
asm { nop; }

This might be harder with gdc/ldc than it is with dmd, but I'm pretty sure there's a way to trick it into thinking an asm block could clobber/read arbitrary memory.

5. Lobby for/implement some nice new volatile_read and volatile_write intrinsics.

Old discussion: http://www.digitalmars.com/d/archives/digitalmars/D/volatile_variables_in_D...._51984.html
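
For illustration, option 5 might look like the following in C terms. The names volatile_read/volatile_write are hypothetical; the cast through a volatile-qualified pointer is what forces the access to really happen:

```c
#include <stdint.h>

/* A sketch of the proposed volatile_read/volatile_write intrinsics,
   in C terms (hypothetical names). Casting through a volatile-qualified
   pointer forces the compiler to emit the load or store at this exact
   point instead of caching the value in a register. */
static inline uint32_t volatile_read(uint32_t *p)
{
    return *(volatile uint32_t *)p;
}

static inline void volatile_write(uint32_t *p, uint32_t value)
{
    *(volatile uint32_t *)p = value;
}
```

The same shape would work for a D intrinsic: the volatility lives in the operation, not in the variable's type.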


October 24, 2013
On 24 October 2013 12:10, John Colvin <john.loughran.colvin@gmail.com> wrote:
> On Thursday, 24 October 2013 at 09:43:51 UTC, Iain Buclaw wrote:
>>>>
>>>> 'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions
>
>
>> If you require memory barriers to access shared data, that is what 'synchronized' and core.atomic is for.  There are *no* implicit locks occurring when accessing the data.
>
>
> If there are no memory barriers, then there is no guarantee* of ordering of reads or writes. Sure, the compiler can promise not to rearrange them, but the CPU is a different matter.
>
> *dependent on CPU architecture, of course. E.g. IIRC the Intel Atom never reorders anything.

I was talking about the compiler, not CPU.


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
October 24, 2013
On 24 October 2013 12:22, eles <eles@eles.com> wrote:
> On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
>>
>> On 24 October 2013 08:18, Mike <none@none.com> wrote:
>>>
>>> On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
>>>>
>>>>
>>>> On 24 October 2013 06:37, Walter Bright <newshound2@digitalmars.com> wrote:
>>>>>
>>>>>
>>>>> On 10/23/2013 5:43 PM, Mike wrote:
>>
>> 'shared' guarantees that all reads and writes specified in source code happen in the exact order specified with no omissions, as there may be other threads reading/writing to the variable at the same time.
>
>
> All that's missing is a guarantee that the reading/writing actually occur at the intended address and not in some compiler cache.
>

The compiler does not cache shared data (at least in GDC).


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
October 24, 2013
On Thursday, 24 October 2013 at 13:05:58 UTC, Iain Buclaw wrote:
> On 24 October 2013 12:22, eles <eles@eles.com> wrote:
>> On Thursday, 24 October 2013 at 08:20:43 UTC, Iain Buclaw wrote:
>>>
>>> On 24 October 2013 08:18, Mike <none@none.com> wrote:
>>>>
>>>> On Thursday, 24 October 2013 at 06:37:08 UTC, Iain Buclaw wrote:
>>>>>
>>>>>
>>>>> On 24 October 2013 06:37, Walter Bright <newshound2@digitalmars.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/23/2013 5:43 PM, Mike wrote:

> The compiler does not cache shared data (at least in GDC).

Well, that should not be a matter of implementation, but of the language standard.

Besides not caching, still MIA is a guarantee that these read/write operations occur when asked, not later. Orderly execution means almost nothing if all those operations are executed by the compiler at some later time, possibly without taking into account the sleep()s between operations - sometimes the hardware needs, let's say, 500ms to guarantee a register is filled with a meaningful value - and so on.

So it is about the correct memory location, the immediacy of those operations (which will also ensure orderly execution), and about not caching.
October 24, 2013
On 24 October 2013 12:50, Daniel Murphy <yebblies@nospamgmail.com> wrote:
> "Mike" <none@none.com> wrote in message news:bifrvifzrhgocrejepvc@forum.dlang.org...
>> I've read a few discussions on the D forums about the volatile keyword debate, but no one seemed to reconcile the need for volatile in memory-mapped IO.  Was this an oversight?
>>
>> What's D's answer to this?  If one were to use D to read from memory-mapped IO, how would one ensure the compiler doesn't cache the value?
>
> There are a few options:
>
> 1. Use shared in place of volatile.  I'm not sure this actually works, but otherwise this is pretty good.
>
> 2. Use the deprecated volatile statement.  D got it right that volatile access is a property of the load/store and not the variable, but missed the point that it's a huge pain to have to remember volatile at use.  Could be made better with a wrapper.  I think this still works.
>
> 3. Use inline assembly.  This sucks.
>
> 4. Defeat the optimizer with inline assembly.
>
> asm { nop; } // Haha, gotcha
> *my_hardware_register = 999;
> asm { nop; }
>
> This might be harder with gdc/ldc than it is with dmd, but I'm pretty sure there's a way to trick it into thinking an asm block could clobber/read arbitrary memory.
>

In gdc:
---
asm {"" ::: "memory";}

An asm instruction without any output operands will be treated identically to a volatile asm instruction in gcc, which indicates that the instruction has important side effects.  So it creates a point in the code which may not be deleted (unless it is proved to be unreachable).

The "memory" clobber will tell the backend to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.  (That does not prevent a CPU from reordering loads and stores with respect to another CPU, though; you need real memory barrier instructions for that.)
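
The same compiler barrier can be sketched in plain C (GCC/Clang extended asm, as described above). It forces `flag` to be reloaded from memory on every iteration; note it is a compiler fence only, not a CPU memory barrier:

```c
static int flag;  /* deliberately not volatile */

/* Spin until flag becomes non-zero. The empty asm with a "memory"
   clobber may not be deleted, and it tells the compiler that any
   memory may have changed, so flag is re-read from memory on each
   iteration instead of being cached in a register. */
static void wait_for_flag(void)
{
    while (flag == 0)
        __asm__ volatile ("" ::: "memory");
}
```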


-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
October 24, 2013
On Thursday, 24 October 2013 at 13:22:50 UTC, Iain Buclaw wrote:

> In gdc:
> ---
> asm {"" ::: "memory";}
>
> An asm instruction without any output operands will be treated
> identically to a volatile asm instruction in gcc, which indicates that
> the instruction has important side effects.  So it creates a point in
> the code which may not be deleted (unless it is proved to be
> unreachable).
>
> The "memory" clobber will tell the backend to not keep memory values
> cached in registers across the assembler instruction and not optimize
> stores or loads to that memory.  (That does not prevent a CPU from
> reordering loads and stores with respect to another CPU, though; you
> need real memory barrier instructions for that.)

I have not (yet) had any problems when writing I/O registers, but more with read access. Any read after a write should fetch the register back from real memory, not from a processor register.  Any repeated read should always read the real I/O register in memory, since the hardware may change the register value at any time.

Now a very common task like
while (regs.status==0) ...
may be optimized to an endless loop because the memory is read only once before the loop starts.

I understood from earlier posts that variables should not be volatile, but the operations on them should. It seems it is possible to guide the compiler as shown above. So would the right solution be to have a volatile block, similar to synchronized? Inside that block no memory access would be optimized.  This way no information about volatility is needed outside the block or in the variables used there.
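
The status-polling problem can be sketched in C. Here `status` stands in for a memory-mapped register, and the volatile qualifier is what stops the compiler from hoisting the load out of the loop:

```c
#include <stdint.h>

static volatile uint32_t status;  /* stand-in for a hardware status register */

/* Without volatile, an optimizing compiler may read status once and
   turn this into an endless loop; with volatile it must perform a
   fresh load from memory on every iteration. */
static uint32_t poll_status(void)
{
    while (status == 0)
        ;
    return status;
}
```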



October 24, 2013
On Thu, 2013-10-24 at 08:19 +0200, Mike wrote:
[…]
> >
> >     int peek(int* p);
> >     void poke(int* p, int value);
> >
> > Implement them in the obvious way, and compile them separately so the optimizer will not try to inline/optimize them.
> 
> Thanks for the answer, Walter. I think this would be acceptable in many (most?) cases, but not where high performance is needed. I think these functions add too much overhead if they are not inlined and are in a critical path (bit-banging IO, for example). After all, a read/write to a volatile address is a single atomic instruction, if done properly.
> 
> Is there a way to tell D to remove the function overhead, for example, like a "naked" attribute, yet still retain the "volatile" behavior?

Also this (peek and poke) is not a viable approach if you wanted to
write an operating system in D.

I think it should be an aim to have the replacement for Windows, OS X, Linux, etc. written in D instead of C/C++.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

October 24, 2013
On Thursday, 24 October 2013 at 14:53:18 UTC, Russel Winder wrote:
> On Thu, 2013-10-24 at 08:19 +0200, Mike wrote:
> […]
> I think it should be an aim to have the replacement for Windows, OS X,
> Linux, etc. written in D instead of C/C++.

I pray strongly that W&A believe the same.

October 24, 2013
On 10/24/2013 4:18 AM, eles wrote:
> On Thursday, 24 October 2013 at 06:48:07 UTC, Walter Bright wrote:
>> On 10/23/2013 11:19 PM, Mike wrote:
>>> Thanks for the answer, Walter. I think this would be acceptable in many (most?)
>>> cases, but not where high performance is needed. I think these functions add too
>>> much overhead if they are not inlined and are in a critical path (bit-banging IO,
>>> for example). After all, a read/write to a volatile address is a single atomic
>>> instruction, if done properly.
>>>
>>> Is there a way to tell D to remove the function overhead, for example, like a
>>> "naked" attribute, yet still retain the "volatile" behavior?
>>
>> You have to give up on volatile. Nobody agrees on what it means. What does
>> "don't optimize" mean? And that's not at all the same thing as "atomic".
>
> Is not about "atomize me", it is about "really *read* me" or "really *write* me"
> at that memory location, don't fake it, don't cache me. And do it now, not 10
> seconds later.

Like I said, nobody (on the standards committees) could agree on exactly what that meant.
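
The distinction can be illustrated in C terms, where the two properties are separate qualifiers (a minimal sketch): volatile means every access is really performed, while C11 _Atomic means an access is indivisible and ordered. Neither implies the other:

```c
#include <stdatomic.h>

static volatile int vreg;      /* volatile: every access must be performed */
static _Atomic int counter;    /* atomic: accesses are indivisible and ordered */

/* Touch both. Volatility and atomicity are independent properties:
   the volatile store is never elided or cached, but is not guaranteed
   to be atomic; the atomic increment is indivisible, but plain atomic
   accesses may still be reordered or combined by the optimizer. */
static void touch(void)
{
    vreg = 1;
    atomic_fetch_add(&counter, 1);
}
```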