August 17, 2014
On 08/17/14 13:57, Johannes Pfau via D.gnu wrote:
> On Sun, 17 Aug 2014 13:38:36 +0200,
> "Artur Skawina via D.gnu" <d.gnu@puremagic.com> wrote:
> 
>> On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
>>> On Sat, 16 Aug 2014 13:15:57 +0200,
>>> "Artur Skawina via D.gnu" <d.gnu@puremagic.com> wrote:
>>>
>>>> It already does. Apparently there are some problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed...
>>
>>> So if you know all these problems and exactly how to fix them, where's your contribution?
>>
>> *I* haven't encountered any problems and have been using function-sections + data-sections + gc-sections for years...
>>
> 
> Then I don't understand your statement at all. You said 'instead of addressing those problems' but there are no problems?

I don't know; it wasn't me who proposed:

- attribute("noinit")
- attribute("notypeinfo")
- attribute("nocode")
- pragma(GNU_nomoduleinfo)

etc.

> Also what exactly are 'more /language/ hacks'?

The above, the volatile attribute, etc. Note that I agree (some of) those
are necessary; it's just that each of them is only useful for certain very
specific cases. They are not a general solution to the codegen
bloat problem. A situation where practically every declaration and
almost every scope in a D program needs to be annotated with
compiler-specific, non-portable annotations is not a good one. Nor is it
even a practical one: it's not reasonable to expect everyone to modify
the source of every library they use (!) to match the requirements of
every project (some may need RTTI, others may not want it at all, etc.).

artur

August 17, 2014
On 08/17/14 10:49, Johannes Pfau via D.gnu wrote:

> That's a good start. Can you also get unary operators working?
> e.g.:
> TimerB++;

Unary ops are easy. If you mean post-increment and post-decrement, that's a language problem. For volatile they will at least cause a compile error; for atomic ops, the naive `post-op->tmp-load+op+tmp` rewrite can introduce bugs... D would need to make the post-ops overloadable to get rid of these issues.
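To make the atomic case concrete, here is a minimal sketch assuming druntime's `core.atomic` (`atomicLoad`, `atomicOp`); the function names `racyPostInc` and `postInc` are hypothetical. The naive rewrite splits the post-increment into two separate atomic operations, whereas the race-free version derives the old value from the result of a single read-modify-write.

```d
import core.atomic : atomicLoad, atomicOp;

// Naive post-increment: load into a temporary, then increment.
// Another thread may modify `x` between these two atomic operations,
// so the returned "old" value may not be the value actually replaced.
int racyPostInc(ref shared int x)
{
    int tmp = atomicLoad(x);   // first atomic operation
    atomicOp!"+="(x, 1);       // second, separate atomic operation
    return tmp;
}

// Race-free post-increment: atomicOp performs one read-modify-write
// and returns the new value, so the old value is that result minus one.
int postInc(ref shared int x)
{
    return atomicOp!"+="(x, 1) - 1;
}
```

With a single thread both functions return the same value; the difference only shows up when another thread updates `x` between the load and the increment in `racyPostInc`.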

> Do you think it's possible to combine this with the other solution you posted for struct fields? Or do we need separate Volatile!T and VolatileField!T types?

Right now, I'd prefer this approach:

--------------------------------------------------------------
   module volat;

   version (GNU) {
   static import gcc.attribute;
   enum inline = gcc.attribute.attribute("forceinline");
   }

   extern int volatile_dummy;

   @inline T volatile_load(T)(ref T v) nothrow {
      asm { "" : "+m" v, "+m" volatile_dummy; }
      T res = v;
      asm { "" : "+g" res, "+m" v, "+m" volatile_dummy; }
      return res;
   }

   @inline void volatile_store(T, A)(ref T v, A a) nothrow {
      asm { "" : "+m" volatile_dummy : "m" v; }
      v = a;
      asm { "" : "+m" v, "+m" volatile_dummy; }
   }

   @inline void volatile_barrier(T)(ref T v) nothrow {
      asm { "" : "+m" v, "+m" volatile_dummy; }
   }

   struct Volatile(T) {
      T raw;
      nothrow: @inline:
      @disable this(this);
      void opAssign(A)(A a) { volatile_store(raw, a); }
      T load() @property { return volatile_load(raw); }
      alias load this;
      void opOpAssign(string OP)(const T b) {
           volatile_barrier(raw);
           mixin("raw " ~ OP ~ "= b;");
           volatile_barrier(raw);
      }
      T opUnary(string OP)() {
           volatile_barrier(raw);
           auto result = mixin(OP ~ "raw");
           volatile_barrier(raw);
           return result;
      }
   }
--------------------------------------------------------------
   import volat;

   struct Timer
   {
       Volatile!uint control;
       Volatile!uint data;
   }

   enum timerA = cast(Timer*)0xDEADBEAF;

   int main() {
      timerA.control |= 0b1;
      timerA.control += 1;
      timerA.control = 42;
      int a = timerA.data - timerA.data;
      int b = ++timerA.control;
      --timerA.data;
      timerA.control /= 2;
      return b;
   }
--------------------------------------------------------------

compiles to:

--------------------------------------------------------------
0000000000403620 <_Dmain>:
  403620:       ba af be ad de          mov    $0xdeadbeaf,%edx
  403625:       b9 b3 be ad de          mov    $0xdeadbeb3,%ecx
  40362a:       83 0a 01                orl    $0x1,(%rdx)
  40362d:       83 02 01                addl   $0x1,(%rdx)
  403630:       c7 02 2a 00 00 00       movl   $0x2a,(%rdx)
  403636:       8b 42 04                mov    0x4(%rdx),%eax
  403639:       8b 72 04                mov    0x4(%rdx),%esi
  40363c:       8b 02                   mov    (%rdx),%eax
  40363e:       83 c0 01                add    $0x1,%eax
  403641:       89 02                   mov    %eax,(%rdx)
  403643:       83 6a 04 01             subl   $0x1,0x4(%rdx)
  403647:       d1 2a                   shrl   (%rdx)
  403649:       c3                      retq
--------------------------------------------------------------

Do you see any problems with it? (Other than gcc not removing
that dead constant load)

[The struct-with-volatile-fields can be built from a "normal"
 struct at compile time. But that's just syntax sugar.]
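A minimal sketch of that compile-time conversion, assuming Phobos' `std.traits.FieldNameTuple`. `VolatileOf` and `Reg` are hypothetical names; `Reg!T` is only a layout-compatible stand-in for `Volatile!T`, so the sketch compiles without GDC's inline asm. Each field of the plain struct is re-declared under the same name with its type wrapped, so the offsets stay identical.

```d
import std.traits : FieldNameTuple;

// Hypothetical stand-in for Volatile!T: a single field, so it has
// the same size and alignment as T.
struct Reg(T)
{
    T raw;
}

// Re-declares every field of S under the same name, with its type
// wrapped in Reg!(...). The declarations are generated at compile time.
struct VolatileOf(S)
{
    static foreach (name; FieldNameTuple!S)
        mixin("Reg!(typeof(S." ~ name ~ ")) " ~ name ~ ";");
}

// A "normal" register description...
struct Timer
{
    uint control;
    uint data;
}

// ...and its wrapped counterpart.
alias VolatileTimer = VolatileOf!Timer;

// The layout is unchanged, so the wrapped struct can still be
// overlaid on the hardware address.
static assert(VolatileTimer.sizeof == Timer.sizeof);
static assert(VolatileTimer.data.offsetof == Timer.data.offsetof);
```

Swapping `Reg` for the real `Volatile` would yield a register block equivalent to a hand-written struct of `Volatile!uint` fields.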

artur
August 17, 2014
On Sunday, 17 August 2014 at 11:35:33 UTC, Artur Skawina via D.gnu wrote:

> It works for me:
>
>    import volat; // module w/ the last Volatile(T) implementation.
>
>    struct uartreg {
>        Volatile!int sr;
>        Volatile!int dr;
>        Volatile!int brr;
>        Volatile!int cr1;
>        Volatile!int cr2;
>        Volatile!int cr3;
>        Volatile!int gtpr;
>
>        // send a byte to the uart
>        void send(int t) {
>          while ((sr&0x80)==0)
>          {  }
>          dr=t;
>        }
>    }
>
>    enum uart = cast(uartreg*)0xDEADBEAF;
>
>    void main() {
>       uart.send(42);
>    }
>
> =>
>
> 0000000000403620 <_Dmain>:
>   403620:       b8 af be ad de          mov    $0xdeadbeaf,%eax
>   403625:       0f 1f 00                nopl   (%rax)
>   403628:       b9 af be ad de          mov    $0xdeadbeaf,%ecx
>   40362d:       8b 11                   mov    (%rcx),%edx
>   40362f:       81 e2 80 00 00 00       and    $0x80,%edx
>   403635:       74 f1                   je     403628 <_Dmain+0x8>
>   403637:       bf b3 be ad de          mov    $0xdeadbeb3,%edi
>   40363c:       31 c0                   xor    %eax,%eax
>   40363e:       c7 07 2a 00 00 00       movl   $0x2a,(%rdi)
>   403644:       c3                      retq
>
> Except for some obviously missed optimizations (dead eax load,
> unnecessary ecx reload), the code seems fine. What platform
> are you using and what does the emitted code look like?
>
>> Also if I have:
>> cr1=cr2=0;
>> I get: expression this.cr2.opAssign(0) is void and has no value
>
> That's because the opAssign returns void, which prevents this
> kind of chaining. This was a deliberate choice, as I /wanted/ to
> disallow that; it's already a bad idea for normal assignments;
> for volatile ones, which can require a specific order, it's an
> even worse one.
> But it's trivial to "fix", just change
>
>    void opAssign(A)(A a) { volatile_store(raw, a); }
>
> to
>
>    T opAssign(A)(A a) { volatile_store(raw, a); return a; }
>
> artur

I am compiling for ARM, and I'm sorry, I misinterpreted the optimized code. The code is actually correct, but it still does not work.
The problem is that the call to get the TLS pointer for volatile_dummy seems to corrupt the register (r3) that holds the `this` pointer. The call is inside the while loop. After removing that call by hand in the assembly, everything works. r3 is usually pushed onto the stack when it is used in a function. I have to check what is wrong in this case.
August 17, 2014
On 08/17/14 15:44, Timo Sintonen via D.gnu wrote:

> I am compiling for ARM, and I'm sorry, I misinterpreted the optimized code. The code is actually correct, but it still does not work.
> The problem is that the call to get the TLS pointer for volatile_dummy seems to corrupt the register (r3) that holds the `this` pointer. The call is inside the while loop. After removing that call by hand in the assembly, everything works. r3 is usually pushed onto the stack when it is used in a function. I have to check what is wrong in this case.

Does declaring it as:

   extern __gshared int volatile_dummy;

help?
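Background for this suggestion, as a hedged sketch: D module-level variables are thread-local unless marked `__gshared` (or `shared`), so a plain `extern int volatile_dummy;` is a TLS variable, and on Timo's ARM setup referencing it involved a call to obtain the TLS pointer. The portable example below (names are illustrative) uses `core.thread.Thread` to show that a write from another thread is invisible through the thread-local variable but visible through the `__gshared` one.

```d
import core.thread : Thread;

int tlsCounter;           // thread-local: D's default for module-level variables
__gshared int gsCounter;  // one process-wide instance, like a C global

// Writes both variables from a second thread, then returns what the
// calling thread sees: [tlsCounter, gsCounter].
int[2] observeWrites()
{
    auto t = new Thread({
        tlsCounter = 1;   // changes only the new thread's own copy
        gsCounter = 1;    // changes the single shared instance
    });
    t.start();
    t.join();
    return [tlsCounter, gsCounter];
}
```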

artur
August 17, 2014
On Sun, 17 Aug 2014 15:15:12 +0200,
"Artur Skawina via D.gnu" <d.gnu@puremagic.com> wrote:

> Do you see any problems with it? (Other than gcc not removing
> that dead constant load)

It's perfect for structs, but when simply declaring a Volatile!uint the pointer dereference must be done manually, right?

----
enum TimerB = cast(Volatile!(uint)*)0xDEADBEEF;

*TimerB |= 0b1;
----

I don't think that's a huge problem, though, just a little bit inconvenient.
August 17, 2014
On Sunday, 17 August 2014 at 13:59:03 UTC, Artur Skawina via D.gnu wrote:
> Does declaring it as:
>
>    extern __gshared int volatile_dummy;
> 
> help?
>
> artur

Yes, now it works.

But the register corruption is still an issue. My TLS function clearly uses r3 and does not save it.

Johannes, do you know the ARM calling convention? Is it the caller or the callee that should save r3?
In this case, my function has one function inlined, which has another function inlined, which contains a compiler-generated function call. Could this be a compiler bug, where it does not recognize the innermost call and therefore does not save registers?
August 17, 2014
On Sun, 17 Aug 2014 14:36:53 +0000,
"Timo Sintonen" <t.sintonen@luukku.com> wrote:

> But the register corruption is still an issue. My tls function clearly uses r3 and does not save it.
> 
> Johannes, do you know the arm calling system? Is it caller or
> callee that should save r3?
> In this case it is my function that has one function inlined that
> has another function inlined that contains a compiler generated
> function call. Could this be a bug in the compiler that it does
> not recognize the innermost call and does not save registers?

r3 is an argument/scratch register; the callee can't rely on its contents after a function call. This could also be caused by the inline ASM.
August 17, 2014
On Sun, 17 Aug 2014 16:45:15 +0200,
Johannes Pfau <nospam@example.com> wrote:

> the callee can't rely on its
caller of course ;-)
August 17, 2014
On Sunday, 17 August 2014 at 14:47:57 UTC, Johannes Pfau wrote:
> r3 is an argument/scratch register, the callee can't rely on its
> contents after a function call. This could also be caused by the inline
> ASM.

So is this a bug or just undefined behavior?
August 17, 2014
On 08/17/14 16:16, Johannes Pfau via D.gnu wrote:
> On Sun, 17 Aug 2014 15:15:12 +0200,
> "Artur Skawina via D.gnu" <d.gnu@puremagic.com> wrote:
> 
>> Do you see any problems with it? (Other than gcc not removing
>> that dead constant load)
> 
> It's perfect for structs, but when simply declaring a Volatile!uint the pointer dereference must be done manually, right?
> 
> ----
> enum TimerB = cast(Volatile!(uint)*)0xDEADBEEF;
> 
> *TimerB |= 0b1;
> ----
> 
> I don't think that's a huge problem, though, just a little bit inconvenient.

Another D problem: the language doesn't have /real/ refs. But a `ref` @property function comes close:

   import volat;

   @inline ref @property timerA() { return *cast(Volatile!uint*)0xDEADBEAF; }

   int main() {
      timerA |= 0b1;
      timerA += 1;
      timerA = 42;
      int a = timerA - timerA;
      int b = ++timerA;
      --timerA;
      timerA /= 2;
      return b;
   }

=>

0000000000403620 <_Dmain>:
  403620:       ba af be ad de          mov    $0xdeadbeaf,%edx
  403625:       83 0a 01                orl    $0x1,(%rdx)
  403628:       83 02 01                addl   $0x1,(%rdx)
  40362b:       c7 02 2a 00 00 00       movl   $0x2a,(%rdx)
  403631:       8b 02                   mov    (%rdx),%eax
  403633:       8b 0a                   mov    (%rdx),%ecx
  403635:       8b 02                   mov    (%rdx),%eax
  403637:       83 c0 01                add    $0x1,%eax
  40363a:       89 02                   mov    %eax,(%rdx)
  40363c:       83 2a 01                subl   $0x1,(%rdx)
  40363f:       d1 2a                   shrl   (%rdx)
  403641:       c3                      retq
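The same trick, sketched portably: a ref-returning @property function acts as a named alias for a storage location, so operators apply to the location itself and no explicit dereference is needed at the call site. Here a plain variable stands in for the memory-mapped register at 0xDEADBEAF, and `backing`, `reg` and `exercise` are illustrative names only (a real accessor would return `Volatile!uint` as above).

```d
uint backing;  // stand-in for the hardware register

// Returns a reference, not a copy: every use of `reg` is an lvalue
// naming `backing`.
ref uint reg() @property { return backing; }

uint exercise()
{
    reg = 42;     // assignment through the reference
    reg |= 0b1;   // compound assignment on the same location
    ++reg;        // pre-increment on the same location
    return backing;
}
```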

artur