August 17, 2014
Am Sat, 16 Aug 2014 11:58:49 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu@puremagic.com>:

> On 08/16/14 09:33, Johannes Pfau via D.gnu wrote:
> > https://github.com/D-Programming-GDC/GDC/pull/82
> 
> [Only noticed this accidentally; using a mailing list instead of some web forum would increase visibility...]
> 
> >  enum var = Volatile!(T,addr)(): doesn't allow |= on enum literals,
> > even if the type implements opAssign as there's no this pointer
> 
>    T volatile_load(T)(ref T v) nothrow {
>       asm { "" : "+m" v; }
>       T res = v;
>       asm { "" : "+g" res; }
>       return res;
>    }
> 
>    void volatile_store(T)(ref T v, const T a) nothrow {
>       asm { "" : : "m" v; }
>       v = a;
>       asm { "" : "+m" v; }
>    }
> 
>    struct Volatile(T, alias /* T* */ A) {
>        void opOpAssign(string OP)(const T rhs) nothrow {
>            auto v = volatile_load(*A);
>            mixin("v " ~ OP ~ "= rhs;");
>            volatile_store(*A, v);
>        }
>    }
> 
>    enum TimerB = Volatile!(uint, cast(uint*)0xDEADBEEF)();
> 
>    void main() {
>       TimerB |= 0b1;
>       TimerB += 1;
>    }

That's a good start. Can you also get unary operators working?
e.g
TimerB++;

Do you think it's possible to combine this with the other solution you posted for struct fields? Or do we need separate Volatile!T and VolatileField!T types?
August 17, 2014
Am Sun, 17 Aug 2014 07:57:15 +0000
schrieb "Timo Sintonen" <t.sintonen@luukku.com>:

> On Saturday, 16 August 2014 at 20:01:06 UTC, Artur Skawina via D.gnu wrote:
> > On 08/16/14 20:40, Artur Skawina wrote:
> >>> How can I use this with struct members ?
> >> 
> >> One possibility would be to declare all members as `Volatile!...`, or
> >
> > I did not like that required dereference in the previous
> > version,
> > and tried a different approach:
> >
> >    struct Timer
> >    {
> >        Volatile!uint control;
> >        Volatile!uint data;
> >    }
> >
> >    enum timerA = cast(Timer*)0xDEADBEAF;
> >
> >    int main() {
> >       timerA.control |= 0b1;
> >       timerA.control += 1;
> >       timerA.control = 42;
> >       int a = timerA.data - timerA.data;
> >       int b = timerA.control;
> >       return timerA.control;
> >    }
> >
> >    version (GNU) {
> >    static import gcc.attribute;
> >    enum inline = gcc.attribute.attribute("forceinline");
> >    }
> > 
> >    extern int volatile_dummy;
> > 
> >    @inline T volatile_load(T)(ref T v) nothrow {
> >       asm { "" : "+m" v, "+m" volatile_dummy; }
> >       T res = v;
> >       asm { "" : "+g" res, "+m" v, "+m" volatile_dummy; }
> >       return res;
> >    }
> >
> >    @inline void volatile_store(T, A)(ref T v, A a) nothrow {
> >       asm { "" : "+m" volatile_dummy : "m" v; }
> >       v = a;
> >       asm { "" : "+m" v, "+m" volatile_dummy; }
> >    }
> > 
> >    struct Volatile(T) {
> >       T raw;
> >       nothrow: @inline:
> >       @disable this(this);
> >       void opAssign(A)(A a) { volatile_store(raw, a); }
> >       T load() @property { return volatile_load(raw); }
> >       alias load this;
> >       void opOpAssign(string OP)(const T rhs) {
> >            auto v = volatile_load(raw);
> >            mixin("v " ~ OP ~ "= rhs;");
> >            volatile_store(raw, v);
> >       }
> >    }
> >
> >
> > artur
> 
> This seems to work. With inlining the code is quite compact.
> 
> Not tested yet but the code for these constructs looks correct:
> for (f=0;f<50;f++) { regs.txreg = śomebuf[f] }
> while (regs.status == 0) {}
> 
> What is the purpose of volatile_dummy? Even if it is not used, the address for it is calculated in several places.
> 
> The struct members are defined saparately. This means the address of every member is stored and fetched separately. The compiler seems to remove some of these and use the pointer, but I am not sure what happens when the structs are bigger.
> 
> It seems all loads and stores access the real memory, like volatile should do. It is hard to follow the optimized code so I am not yet sure that they have not been reordered in any way.
> 
> Anyway, this seems acceptable solution to me.
> 
> Johannes, is this good starting point to you or is your work with compiler builtins giving us some more?

You mean __builtin_volatile_load/store? I'm not sure if compiler barriers and these builtins are 100% equal, I think I managed to produce example code where the barriers didn't work 100% as expected. But these builtins will need to be introduced anyway as core.bitop.volatileLoad or whatever final name the DMD devs decide on.

Regarding nocode/typeinfo/noinit/GNU_nomoduleinfo: I think these are useful anyway. The linker can strip these out, but I don't want to rely on the linker and on the user to know all special linker flags only to avoid some binary bloat which can be avoided in the compiler.

But overall this approach looks fine.

August 17, 2014
On Sunday, 17 August 2014 at 07:57:15 UTC, Timo Sintonen wrote:
>
>
> This seems to work.
>

This does not work with member functions

struct uartreg {
    Volatile!int sr;
    Volatile!int dr;
    Volatile!int brr;
    Volatile!int cr1;
    Volatile!int cr2;
    Volatile!int cr3;
    Volatile!int gtpr;

    // send a byte to the uart
    void send(int t) {
      while ((sr&0x80)==0)
      {  }
      dr=t;
    }

}

In this function the fetch of sr is omitted but compare is still
made against an invalid register value. Then address of dr is
omitted and store is made from wrong register to invalid address.
So the generated code is totally invalid.

If I move this function out of the struct then it is ok.
I use -O2, not tested what it woud do without optimization.

Also if I have:
cr1=cr2=0;
I get: expression this.cr2.opAssign(0) is void and has no value
August 17, 2014
On Saturday, 16 August 2014 at 11:16:09 UTC, Artur Skawina via
D.gnu wrote:

> A `@nocode` attribute would be a good idea, yes, but there's no need
> to make it implicit for `@inline`.
>
>> But this situation demonstrates why having an intelligent linker is a better solution than decorating with attributes.  The linker should know if you took an address of an always-inlined function or not and decide whether or not to remove it from the binary.
>
> It already does. Apparently there are some kind of problems with
> certain setups, but, instead of addressing those problems, more and
> more /language/ hacks are proposed...
>

Do you mean the problems with --gc-sections breaking code?

Mike
August 17, 2014
On Sunday, 17 August 2014 at 08:26:40 UTC, Johannes Pfau wrote:
>
> Great! But I think this pull request addresses a different monitor
> problem: There's an implicit __monitor field in every class right now,
> which makes every class _instance_ bigger.
>
> But the monitor in TypeInfo/ClassInfo is different: ClassInfo exists
> only once per class, it doesn't matter how many class instances you've
> got. AFAIR this monitor is to support synchronize(ClassType) which
> synchronizes on the class type, not on an instance.

I looked through the source code, and couldn't find any such
monitor.  Can you please point it out for me?

Thanks,
Mike
August 17, 2014
On 08/17/14 11:24, Timo Sintonen via D.gnu wrote:
> On Sunday, 17 August 2014 at 07:57:15 UTC, Timo Sintonen wrote:
>>
>>
>> This seems to work.
>>
> 
> This does not work with member functions
> 
> struct uartreg {
>     Volatile!int sr;
>     Volatile!int dr;
>     Volatile!int brr;
>     Volatile!int cr1;
>     Volatile!int cr2;
>     Volatile!int cr3;
>     Volatile!int gtpr;
> 
>     // send a byte to the uart
>     void send(int t) {
>       while ((sr&0x80)==0)
>       {  }
>       dr=t;
>     }
> 
> }
> 
> In this function the fetch of sr is omitted but compare is still made against an invalid register value. Then address of dr is omitted and store is made from wrong register to invalid address. So the generated code is totally invalid.
> 
> If I move this function out of the struct then it is ok.
> I use -O2, not tested what it woud do without optimization.

It works for me:

   import volat; // module w/ the last Volatile(T) implementation.

   struct uartreg {
       Volatile!int sr;
       Volatile!int dr;
       Volatile!int brr;
       Volatile!int cr1;
       Volatile!int cr2;
       Volatile!int cr3;
       Volatile!int gtpr;

       // send a byte to the uart
       void send(int t) {
         while ((sr&0x80)==0)
         {  }
         dr=t;
       }
   }

   enum uart = cast(uartreg*)0xDEADBEAF;

   void main() {
      uart.send(42);
   }

=>

0000000000403620 <_Dmain>:
  403620:       b8 af be ad de          mov    $0xdeadbeaf,%eax
  403625:       0f 1f 00                nopl   (%rax)
  403628:       b9 af be ad de          mov    $0xdeadbeaf,%ecx
  40362d:       8b 11                   mov    (%rcx),%edx
  40362f:       81 e2 80 00 00 00       and    $0x80,%edx
  403635:       74 f1                   je     403628 <_Dmain+0x8>
  403637:       bf b3 be ad de          mov    $0xdeadbeb3,%edi
  40363c:       31 c0                   xor    %eax,%eax
  40363e:       c7 07 2a 00 00 00       movl   $0x2a,(%rdi)
  403644:       c3                      retq

Except for some obviously missed optimizations (dead eax load, unnecessary ecx reload), the code seems fine. What platform are you using and what does the emitted code look like?

> Also if I have:
> cr1=cr2=0;
> I get: expression this.cr2.opAssign(0) is void and has no value

That's because the opAssign returns void, which prevents this
kind of chaining. This was a deliberate choice, as I /wanted/ to
disallow that; it's already a bad idea for normal assignments;
for volatile ones, which can require a specific order, it's an
even worse one.
But it's trivial to "fix", just change

   void opAssign(A)(A a) { volatile_store(raw, a); }

to

   T opAssign(A)(A a) { volatile_store(raw, a); return a; }

artur

August 17, 2014
On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
> Am Sat, 16 Aug 2014 13:15:57 +0200
> schrieb "Artur Skawina via D.gnu" <d.gnu@puremagic.com>:
> 
>> It already does. Apparently there are some kind of problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed...

> So as you know all these problems and you know exactly how to fix them, where's your contribution?

*I* haven't encountered any problems and have been using functions+ data+gc-sections for years...

artur

August 17, 2014
Am Sun, 17 Aug 2014 10:44:34 +0000
schrieb "Mike" <none@none.com>:

> On Sunday, 17 August 2014 at 08:26:40 UTC, Johannes Pfau wrote:
> >
> > Great! But I think this pull request addresses a different
> > monitor
> > problem: There's an implicit __monitor field in every class
> > right now,
> > which makes every class _instance_ bigger.
> >
> > But the monitor in TypeInfo/ClassInfo is different: ClassInfo
> > exists
> > only once per class, it doesn't matter how many class instances
> > you've
> > got. AFAIR this monitor is to support synchronize(ClassType)
> > which
> > synchronizes on the class type, not on an instance.
> 
> I looked through the source code, and couldn't find any such monitor.  Can you please point it out for me?
> 
> Thanks,
> Mike

In gcc/d/d-objfile.cc: Search for

  /* Put out the ClassInfo.
   * The layout is:
   *  void **vptr;
   *  monitor_t monitor;
   *  byte[] initializer;         // static initialisation data

Actually I just realized that this is also true for all TypeInfo, so I'll have to revert the commit which placed TypeInfo into .rodata

(Thinking more about it, it's more or less the same monitor as the
one referred in the pull request: TypeInfo are classes and for every
type there is one instance, then these instances have __monitor fields.
But the implementation in the compiler is slightly different)
August 17, 2014
On 08/17/14 09:57, Timo Sintonen via D.gnu wrote:

> What is the purpose of volatile_dummy? Even if it is not used,

Ensuring ordering, w/o it the compiler could reorder operations
on different volatile objects. (Which isn't necessarily a bad thing,
but people expect certain semantics of 'volatile', so it would be
a bad and dangerous default)

> the address for it is calculated in several places.

It's completely optimized away for me (I'm testing on x86). Can you
show the emitted code?

> The struct members are defined saparately. This means the address of every member is stored and fetched separately. The compiler seems to remove some of these and use the pointer, but I am not sure what happens when the structs are bigger.

Yes, the compiler does not to generate optimal code, but so far I've only seen dead immediate-constant->register loads; so it's not a huge problem.

artur
August 17, 2014
Am Sun, 17 Aug 2014 13:38:36 +0200
schrieb "Artur Skawina via D.gnu" <d.gnu@puremagic.com>:

> On 08/17/14 10:31, Johannes Pfau via D.gnu wrote:
> > Am Sat, 16 Aug 2014 13:15:57 +0200
> > schrieb "Artur Skawina via D.gnu" <d.gnu@puremagic.com>:
> > 
> >> It already does. Apparently there are some kind of problems with certain setups, but, instead of addressing those problems, more and more /language/ hacks are proposed...
> 
> > So as you know all these problems and you know exactly how to fix them, where's your contribution?
> 
> *I* haven't encountered any problems and have been using functions+ data+gc-sections for years...
> 

Then I don't understand your statement at all. You said 'instead of addressing those problems' but there are no problems? Also what exactly are 'more /language/ hacks'?