March 22, 2002
"Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7g4uv$2q8r$1@digitaldaemon.com...
> While I agree that you can use inline asm, and there are ways to code that could cause trouble, in practice, it works pretty well.  People don't do things like post increment external registers when reading them.  I know the syntax allows it, but programmers, especially embedded programmers learn pretty quickly what things to do and what not to do with the hardware they have.  In practice, most uses of stuff like this is to read the whole register and test some bits or extract a field, or to create a word with the desired contents and write it in one piece to the external register.  So, while volatile isn't a complete solution, it avoids having to delve into asm for the vast majority of such uses.

Wouldn't it be better to have a more reliable method than trial and error? Trial and error is subject to subtle changes if a new compiler is used.

I also wish to point out that volatile permeates the typing system in a C/C++ compiler. There is a great deal of code to keep everything straight in the contexts of overloading, casting, type copying, etc.
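For readers who have not done embedded work, the usage pattern in the quoted text (read the whole register, then test bits or extract a field; build the whole word, then write it in one piece) might look like the sketch below. The register layout, the REG_* names, and the helper functions are all invented for illustration; a real device's datasheet would define them.

```c
#include <stdint.h>

/* Hypothetical register layout, invented for this example:
   bit 0 = READY flag, bits 4..7 = 4-bit MODE field. */
#define REG_READY      0x01u
#define REG_MODE_SHIFT 4
#define REG_MODE_MASK  0xF0u

/* Read the whole register once, then test a bit in the local copy. */
int reg_is_ready(volatile uint32_t *reg)
{
    uint32_t v = *reg;
    return (v & REG_READY) != 0;
}

/* Read the whole register once, then extract the MODE field. */
uint32_t reg_mode(volatile uint32_t *reg)
{
    uint32_t v = *reg;
    return (v & REG_MODE_MASK) >> REG_MODE_SHIFT;
}

/* Build the complete word first, then store it in one piece. */
void reg_write(volatile uint32_t *reg, uint32_t mode, int ready)
{
    uint32_t v = ((mode << REG_MODE_SHIFT) & REG_MODE_MASK)
               | (ready ? REG_READY : 0u);
    *reg = v;
}
```

Each helper touches the register exactly once, which is the discipline the quoted text describes; volatile only asks the compiler not to drop or merge those accesses.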

I don't see why volatile is that necessary for hardware registers. You can still easily read a hardware register by setting a pointer to it and going *p. The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem). Any reads through a pointer are not cached across any assignments through a pointer, including any function calls (again, due to the aliasing problem). For example, the second read of *p will not get cached away:

    x = *p;        // first read
    func();        // call function to prevent caching of pointer results
    y = *p;        // second read

func() can simply consist of RET. To do, say, a spin lock on *p:

    while (*p != value)
        func();
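
A self-contained sketch of the spin loop above, so it can be run without hardware. Everything here is an assumption made for illustration: fake_reg stands in for the register, and func() is given a body that bumps it, where the post has it do nothing and the device change the value.

```c
#include <stdint.h>

/* Stand-in for a hardware register; a real one would live at a fixed
   address given by the platform. */
static uint32_t fake_reg;

/* Simulates the device updating the register on its own. */
static void device_tick(void) { fake_reg++; }

/* Calling through a volatile function pointer keeps the compiler from
   seeing what the callee does, so it has to assume *p may have changed. */
static void (*volatile func)(void) = device_tick;

/* Returns the address of the simulated register. */
uint32_t *reg_address(void) { return &fake_reg; }

/* Spin until *p equals value; returns how many times func() was called. */
uint32_t spin_until(uint32_t *p, uint32_t value)
{
    uint32_t calls = 0;
    while (*p != value) {   /* *p is loaded again after every call */
        func();
        calls++;
    }
    return calls;
}
```

The technique relies on the compiler not being able to look inside func(). If func() sits in the same file, or the build uses whole-program optimization, the compiler may inline it and see that it does nothing. The sketch sidesteps that with a volatile function pointer; in a real program func() would more likely live in a separately compiled file.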


March 22, 2002
"Serge K" <skarebo@programmer.net> wrote in message news:a7gclf$ej$1@digitaldaemon.com...
> > BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.
> You should try Visual C++ for Alpha.
> It can optimize not only the surrounding code,
> but inline assembly code as well.
> I was truly amazed when I've noticed that.

D's instruction scheduler (and peephole optimizer) is specifically prevented from operating on the inline assembler blocks. I'm a little surprised that another compiler wouldn't do the same. The whole point of inline asm is to wrest control away from the compiler and precisely lay out the instructions.


March 23, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a7gfrs$35e$1@digitaldaemon.com...
>
> "Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7g4uv$2q8r$1@digitaldaemon.com...
> > While I agree that you can use inline asm, and there are ways to code that could cause trouble, in practice, it works pretty well.  People don't do things like post increment external registers when reading them.  I know the syntax allows it, but programmers, especially embedded programmers learn pretty quickly what things to do and what not to do with the hardware they have.  In practice, most uses of stuff like this is to read the whole register and test some bits or extract a field, or to create a word with the desired contents and write it in one piece to the external register.  So, while volatile isn't a complete solution, it avoids having to delve into asm for the vast majority of such uses.
>
> Wouldn't it be better to have a more reliable method than trial and error?

Of course!  :-)

> Trial and error is subject to subtle changes if a new compiler is used.

Yes.


> I also wish to point out that volatile permeates the typing system in a C/C++ compiler. There is a great deal of code to keep everything straight in the contexts of overloading, casting, type copying, etc.

I'll take your word for what is required within the compiler.  I'm a compiler user, not a designer.


> I don't see why volatile is that necessary for hardware registers. You can still easily read a hardware register by setting a pointer to it and going *p.

Sure.  But I am trying, as I thought you were with D, to minimize or eliminate the use of pointers in the source code, since they are a major source of error.

> The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem).

Again, I am not a compiler designer, but "very, very hard" implies that it isn't impossible, and therefore some future compiler *could* do it, thus breaking code in the way you described above.  :-(

> Any reads through a pointer are not cached across any
> assignments through a pointer, including any function calls (again, due to
> the aliasing problem). For example, the second read of *p will not get
> cached away:
>
>     x = *p;        // first read
>     func();        // call function to prevent caching of pointer results
>     y = *p;        // second read
>
> func() can simply consist of RET. To do, say, a spin lock on *p:
>
>     while (*p != value)
>         func();
>

Oh, that's intuitive!  :-(  Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization.  Uccccch! There has got to be a better way to address the problem than this.  I'm not wedded to the "volatile" syntax and certainly not wedded to how C does things.  I was just pointing out, for those who have never done embedded programming, a major reason for that syntax.  If you can come up with a better solution (I guess I don't count the ones you have proposed so far as better), then I am all for it.  You have shown such imagination in solving other C/C++ deficiencies that I have reason to hope you can solve this one elegantly.

- Sorry to put you on the spot.  :-)

--
 - Stephen Fuld
   e-mail address disguised to prevent spam


March 23, 2002
On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter@digitalmars.com> wrote:

> BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.
> 

Watcom has a form of asm that allows optimization.

#pragma aux  setSP = \
    "mov ESP,  eax"  \
    parm [eax]            \
   modify [EAX] ;

#pragma aux getSP = \
    "mov edx, esp" \
    value [edx] modify [eax];

Then:
    ...
    current_sp = getSP()
   ---
is fully optimized.

It also has the 'asm("mov eax, esp")' form, which I believe is opaque
to the compiler.


Watcom also allows register passing convention in addition
to the standard _stdcall and _stddecl.  This, and extensive optimization,
enables it to produce the fastest C code of any compiler
that I am aware of. An excellent back-end for D, someday.

free too ;-)

Karl Bochert



March 23, 2002
"Karl Bochert" <kbochert@ix.netcom.com> wrote in message news:1103_1016902791@bose...

> Watcom also allows register passing convention in addition
> to the standard _stdcall and _stddecl.  This, and extensive optimization,
> enables it to produce the fastest C code of any compiler

AFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.

> that I am aware of. An excellent back-end for D, someday.
>
> free too ;-)

Hm? Where can I get it, then?


March 23, 2002
On Sat, 23 Mar 2002 22:49:09 +0300, "Pavel Minayev" <evilone@omen.ru> wrote:
> "Karl Bochert" <kbochert@ix.netcom.com> wrote in message news:1103_1016902791@bose...
> 
> > Watcom also allows register passing convention in addition
> > to the standard _stdcall and _stddecl.  This, and extensive optimization,
> > enables it to produce the fastest C code of any compiler
> 
> AFAIK, D chooses calling convention on its own, and might use fastcall where it seems better.
> 
> > that I am aware of. An excellent back-end for D, someday.
> >
> > free too ;-)
> 
> Hm? Where can I get it, then?
> 

To quote from a message on the Euphoria newsgroup
"
OpenWatcom is available as most of you know. The Beta to 11c does
compile Euphoria Translated Code and runs much faster than LCC
or Borland but you have to know a few tricks to get Watcom to work at all
because the libraries and header files aren't included in the beta release.
I have the solution to this problem!

Download Watcom 11c beta
Download Masm32 by Hutch
"

I did this and the only problem I had was that I downloaded the file groups individually and missed one. Also the Watcom resource compiler is missing.

The URLs are:
http://www.openwatcom.org/
http://www.movsd.com/masm.htm

A couple of benchmarks:
http://www.byte.com/art/9801/sec12/art7.htm
http://www.geocities.com/SiliconValley/Vista/6552/compila.html

Karl Bochert




March 25, 2002
Watcom did run circles around the competition back in the day.  GCC's inline asm provides a similar amount of information to the optimizer, so in theory it should be able to perform as well as Watcom (but in practice it doesn't, from what I can tell so far).

Watcom's inline asm had one main problem, which GCC doesn't:  Watcom didn't let your inline asm request an empty register from the compiler... you just used a given register and the asm around the call would be rearranged to make room for the register your inline asm used.  For recursive functions that doesn't work so well.  For instance, if you made a vector add routine where the vectors are pointed to by edx and eax, then edx and eax would become bottleneck registers whilst doing lots of vector adds and would end up getting pushed and popped a lot.
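
As a rough illustration of what "requesting a register from the compiler" means in GCC's extended asm syntax, here is a sketch. It assumes GCC or Clang targeting x86 and falls back to plain C anywhere else; add_u32 is a made-up name, not something from either compiler.

```c
#include <stdint.h>

/* Adds b to a. On x86 with GCC or Clang the addition is done in inline
   asm; the "r" constraints let the compiler choose which registers hold
   a and b instead of the asm naming eax or edx itself. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"   /* AT&T syntax: %0 = %0 + %1 */
            : "+r"(a)       /* read and written, any general register */
            : "r"(b)        /* read only, any general register */
            : "cc");        /* the add changes the flags */
    return a;
#else
    return a + b;           /* portable fallback for other targets */
#endif
}
```

Because no specific register is named, the compiler is free to pick whichever ones are idle at each call site, which is the flexibility the Watcom pragma form described above lacks.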

Sean


> Watcom also allows register passing convention in addition
> to the standard _stdcall and _stddecl.  This, and extensive optimization,
> enables it to produce the fastest C code of any compiler
> that I am aware of. An excellent back-end for D, someday.
>
> free too ;-)
>
> Karl Bochert



March 26, 2002
"Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7gom0$96n$1@digitaldaemon.com...
>
> "Walter" <walter@digitalmars.com> wrote in message news:a7gfrs$35e$1@digitaldaemon.com...
> > I don't see why volatile is that necessary for hardware registers. You can still easily read a hardware register by setting a pointer to it and going *p.
> Sure.  But I am trying, as I thought you were with D, to minimize or eliminate the use of pointers in the source code, since they are a major source of error.

Pointers are still in D, for the reason that sometimes you just gotta have them. Minimizing them is a design goal, though. Also, to access hardware registers, you're going to need pointers because there is no way to specify absolute addresses for variables.

> > The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem).
> Again, I am not a compiler designer, but "very, very hard" implies that it isn't impossible, and therefore some future compiler *could* do it, thus breaking code in the way you described above.  :-(

To make it impossible, just have the pointer set in a function that the compiler doesn't know about.

> > Any reads through a pointer are not cached across any
> > assignments through a pointer, including any function calls (again, due to
> > the aliasing problem). For example, the second read of *p will not get
> > cached away:
> >     x = *p;        // first read
> >     func();        // call function to prevent caching of pointer results
> >     y = *p;        // second read
> > func() can simply consist of RET. To do, say, a spin lock on *p:
> >     while (*p != value)
> >         func();
> Oh, that's intuitive!  :-(  Add an extra empty function call in order to prevent the compiler from doing some undesirable optimization.  Uccccch! There has got to be a better way to address the problem than this.  I'm not wedded to the "volatile" syntax and certainly not wedded to how C does things.  I was just pointing out, for those who have never done embedded programming, a major reason for that syntax.  If you can come up with a better solution (I guess I don't count the ones you have proposed so far as better), then I am all for it.

Yeah, I understand it isn't the greatest, but it'll work reliably. I also happen to be fond of inline assembler when dealing with hardware <g>.

> You have shown such imagination in
> solving other C/C++ deficiencies that I have reason to hope you can solve
> this one elegantly.

Ahem. I'm on to that tactic!


March 26, 2002
"Karl Bochert" <kbochert@ix.netcom.com> wrote in message news:1103_1016902791@bose...
> On Fri, 22 Mar 2002 10:20:54 -0800, "Walter" <walter@digitalmars.com> wrote:
> > BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.
> Watcom has a form of asm that allows optimization.
>
> #pragma aux  setSP = \
>     "mov ESP,  eax"  \
>     parm [eax]            \
>    modify [EAX] ;
> #pragma aux getSP = \
>     "mov edx, esp" \
>     value [edx] modify [eax];
> Then:
>     current_sp = getSP()
> is fully optimized.

The Digital Mars optimizer doesn't need those hints to be specified by the user; it just analyzes the instructions.

> Watcom also allows register passing convention in addition
> to the standard _stdcall and _stddecl.  This, and extensive optimization,
> enables it to produce the fastest C code of any compiler
> that I am aware of.

My marketing has always been bad. I remember magazine compiler reviews where the reviewer's own numbers showed us to be the fastest compiler, but Borland got the writeup as fastest. Where we produced the fastest benchmarks according to the reviewer's own numbers, but Watcom got the writeup as fastest. It's all a bit maddening <g>.


March 26, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a7gfrs$35e$1@digitaldaemon.com...
> I don't see why volatile is that necessary for hardware registers. You can still easily read a hardware register by setting a pointer to it and going *p. The compiler isn't going to skip the write to it through *p (it's very, very hard for a C optimizer to remove dead stores through pointers, due to the aliasing problem).

The Linux crowd had the devil of a time with a new release of GCC.  It seems that the C standard states that accessing the bytes of one object does not necessarily alias the bytes of any other object if the accesses are through different types, unless one of them is char.

This means that in:

    auto float f;
    *(volatile long *)&f = 0;

...this need not visibly affect the object f.  Yep.
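
A sketch of the usual way around this, assuming float is a 32-bit IEEE 754 value: copy the bytes with memcpy instead of writing through a cast pointer. The function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f. Assumes sizeof(float) == 4. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* byte copy, no type-punned pointer */
    return bits;
}

/* Overwrites the bytes of *f with zero bits. Assumes sizeof(float) == 4. */
void clear_float(float *f)
{
    uint32_t zero = 0;
    memcpy(f, &zero, sizeof zero);
}
```

memcpy works on the object's bytes, so the compiler has to treat the float as modified, whatever type the source bytes came from.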