Volatile
March 21, 2002
Please pardon my ignorance if this has been hashed and re-hashed.  I just got a pointer to D from another list, came over for a quick look-see, and liked what I saw.  So I thought I'd toss in a few thoughts.

I notice there is no support for volatile, which perplexes me.  Volatile is necessary to warn an optimizer that another thread may change a data item without warning.  It isn't necessary in a JVM because those types of optimization can be expressed in byte codes, although it does limit what a JIT compiler can do.  D is intended for real compilation, however, and when the instruction set guys give us enough registers, the compiler is going to want to stick intermediates in them.  Without volatile, this ain't gonna work.

That said, the C concept of a volatile declaration doesn't go far enough.  While it does warn the compiler that an unexpected change in value is fair game, it doesn't tell the compiler when or if to generate multi-process safe instruction sequences.

The obvious response is that data structures should be protected by a mutex or synchronize.  The problem is that these are vastly too expensive to use in a tight, fine-grained multi-threaded application.  Modern multi-processors do a wonderful job of implementing processor-interlocked atomic instructions.  Modern OSes do a reasonable job of scheduling threads on multi-processors.  Modern languages, however, do a rotten job of providing the primitives to exploit these environments.  Yeah, I know I can write an inline "lock xsub decl" yada yada yada.  But it's painful and non-portable.  And we all know that writing assembler rots the soul.

So, guys, I would like the following:

    1.  A volatile declaration so the compiler can do smart things while I do fast things.
    2.  A "volatile volatile" declaration, or a distinct operator or operator modifier, to tell the compiler to use a processor interlock instruction sequence OR give me a compile-time error explaining why it can't.
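
For readers who want to see what request 2 could look like in practice, here is a minimal sketch in C11 rather than D. It is an illustration under assumptions, not something proposed in this thread: the counter `hits` and the functions `record_hit` and `hit_count` are hypothetical names. The `static_assert` stands in for the "compile-time error" half of the request, and `atomic_fetch_add` for the "interlocked instruction sequence" half.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdatomic.h>

/* Refuse to compile unless int-sized atomics are always lock-free on this
   target, i.e. the compiler can use the processor's interlocked
   instructions instead of falling back to a hidden lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2,
              "atomic_int is not always lock-free on this target");

/* Hypothetical shared counter, for illustration only (zero-initialized). */
static atomic_int hits;

/* Atomically add one and return the new value. */
int record_hit(void)
{
    return atomic_fetch_add(&hits, 1) + 1;
}

/* Atomic read of the current value. */
int hit_count(void)
{
    return atomic_load(&hits);
}
```

On targets where the assertion holds, compilers typically lower `atomic_fetch_add` to a single interlocked instruction; on targets where it does not, the build stops with the message instead of silently using a lock.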

There are probably smarter ways to do this than a volatile declaration.  But something is needed in that niche.

Or, alternatively, I could have my head thoroughly wedged.  But I'll take on all comers until that is so obvious that I can see it myself.

March 21, 2002
"Jim Starkey" <jas@netfrastructure.com> wrote in message news:3C9A43BC.AFBA03BA@netfrastructure.com...
> I notice there is no support for volatile, which perplexes me.  Volatile
> is necessary to
> warn an optimizer that another thread may change a data item without
> warning.

They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.

> That said, the C concept of a volatile declaration doesn't go far enough.
> While it does warn the compiler that an unexpected change in value is fair
> game, it doesn't tell the compiler when or if to generate multi-process
> safe instruction sequences.

I agree that the C definition of volatile is next to useless.

> The obvious response is that data structures should be protected by a mutex or synchronize.  The problem is that these are vastly too expensive to use in a tight, fine-grained multi-threaded application.  Modern multi-processors do a wonderful job of implementing processor-interlocked atomic instructions.  Modern OSes do a reasonable job of scheduling threads on multi-processors.  Modern languages, however, do a rotten job of providing the primitives to exploit these environments.  Yeah, I know I can write an inline "lock xsub decl" yada yada yada.  But it's painful and non-portable.  And we all know that writing assembler rots the soul.
>
> So, guys, I would like the following:
>
>     1.  A volatile declaration so the compiler can do smart things while I do fast things.
>     2.  A "volatile volatile" declaration, or a distinct operator or operator modifier, to tell the compiler to use a processor interlock instruction sequence OR give me a compile-time error explaining why it can't.
>
> There are probably smarter ways to do this than a volatile declaration.  But something is needed in that niche.

You're wrong, writing assembler puts one into a State of Grace <g>.



March 22, 2002
> > I notice there is no support for volatile, which perplexes me.  Volatile
> > is necessary to
> > warn an optimizer that another thread may change a data item without
> > warning.
>
> They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".

"volatile" does not mean "atomic" or even "synchronized".
It's just an indication that some variable in the memory can be changed from "outside".
And nobody cares when *exactly* it happens, as long as it happens.

For example, the variable may be changed:

    by another thread on the same processor
        => everything is in the same cache, so no problem here.

    by another processor, or any other hardware (DMA, ...)
        => any modern processor supports cache coherency (MESI or better);
        in fact, it's a "must" for any processor with a cache,
        so no problem there either (even the i486 had it).

> I agree that the C definition of volatile is next to useless.

Is it?



March 22, 2002
"Serge K" <skarebo@programmer.net> wrote in message news:a7e2kc$17qp$1@digitaldaemon.com...
> > > I notice there is no support for volatile, which perplexes me.  Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
> > They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
> "volatile" does not mean "atomic" or even "synchronized".

It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".

> It's just an indication that some variable in the memory can be changed from "outside".
> And nobody cares when *exactly* it happens, as long as it happens.
> For example:
>     by another thread on the same processor.
>         => everything is in the same cache - no problem here.
>     by another processor, or any other hardware (DMA, ...)
>         => any modern processor has support for cache coherency
>         (MESI or better), in fact - it's a "must" thing for any processor with the cache.
>         - no problem there. (..even i486 had it..)

If you are writing to, say, a long (64 bits in D), the write takes two write cycles on a 32-bit CPU. Between those two cycles, another thread could change part of the value, resulting in a scrambled write.
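
A minimal sketch of that scrambled write, in C. It does not race real threads; it replays one unlucky interleaving deterministically in a single thread, and it assumes a 64-bit value stored as two 32-bit halves. The type `split64` and the helper names are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as a 32-bit CPU writes it: two separate 32-bit words. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} split64;

/* First write cycle: store only the low word of v. */
void store_low(split64 *dst, uint64_t v)
{
    assert(dst);
    dst->lo = (uint32_t)v;
}

/* Second write cycle: store only the high word of v. */
void store_high(split64 *dst, uint64_t v)
{
    assert(dst);
    dst->hi = (uint32_t)(v >> 32);
}

/* What a reader sees when it loads both words. */
uint64_t load_both(const split64 *src)
{
    return ((uint64_t)src->hi << 32) | src->lo;
}

/* Replay of one bad interleaving: writer B's two cycles land between
   writer A's two cycles, so the final value mixes A's high word with
   B's low word. */
uint64_t torn_result(uint64_t a, uint64_t b)
{
    split64 x = {0, 0};
    store_low(&x, a);   /* writer A, cycle 1 */
    store_low(&x, b);   /* writer B, cycle 1 */
    store_high(&x, b);  /* writer B, cycle 2 */
    store_high(&x, a);  /* writer A, cycle 2 */
    return load_both(&x);
}
```

With `a = 0x1111111122222222` and `b = 0x3333333344444444`, `torn_result` returns `0x1111111144444444`, a value that neither writer stored.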

> > I agree that the C definition of volatile is next to useless.
> Is it?

Since it does not guarantee atomic writes, yes, I believe it is useless.


March 22, 2002
Walter wrote in message ...
>> I notice there is no support for volatile, which perplexes me.  Volatile
>> is necessary to
>> warn an optimizer that another thread may change a data item without
>> warning.
>
>They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.
>

No, it is neither necessary nor desirable to use mutexes.  Yes, there are
restrictions on the interlocked instructions, but since volatile is
implemented/enforced by the compiler, this should be acceptable.  The
compiler's responsibility should be to either implement an operation
atomically or generate a diagnostic explaining why it can't.

An example of something that can be cheaply handled by enhanced volatile
is use counts on objects shared across threads.  An atomic interlocked
decrement implemented with "lock xsub decl" does the trick correctly
with no more cost than an extra bus cycle, where a mutex requires an
OS call.  The ratio of costs is probably three orders of magnitude or
more.
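
The use-count case can be sketched in C11 as follows. This is an illustration under assumptions, not code from the thread: `shared_obj` and the `obj_*` functions are hypothetical names, and the atomic operations come from `<stdatomic.h>`, which did not exist in 2002. On x86, compilers typically lower these calls to a lock-prefixed instruction rather than a mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared object header; only the use count matters here. */
typedef struct {
    atomic_int use_count;
} shared_obj;

/* A new object starts with one owner. */
void obj_init(shared_obj *o)
{
    atomic_init(&o->use_count, 1);
}

/* Take an additional reference with one interlocked increment. */
void obj_retain(shared_obj *o)
{
    atomic_fetch_add(&o->use_count, 1);
}

/* Drop a reference with one interlocked decrement.
   Returns true when the caller released the last reference and is
   therefore the one who should destroy the object. */
bool obj_release(shared_obj *o)
{
    int previous = atomic_fetch_sub(&o->use_count, 1);
    assert(previous > 0);   /* releasing more often than retaining is a bug */
    return previous == 1;
}
```

Because `atomic_fetch_sub` returns the value before the decrement, exactly one thread observes `previous == 1`, so the object is destroyed once even when several threads release concurrently.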

>
>I agree that the C definition of volatile is next to useless.
>

I didn't mean to imply that the C definition of volatile is next to
useless -- it is, in fact, absolutely critical for all but the most
primitive multi-threaded code.  Even when used with mutexes, volatile is
necessary to warn the optimizer off unwarranted assumptions of invariance.

If D is going to succeed, it is necessary to anticipate where computer
architectures are going.  Everyone, I hope, understands that memory is
cheap and plentiful, larger virtual address spaces are in easy sight,
and dirt cheap multi-processors are here.  Although we're in a period
of rapidly increasing clock rates, we're also approaching physical
limits on feature size.  In the not-too-distant future it will be cheaper
to add more processors than to buy/build faster ones.  At that point
performance will be gated by the degree to which doubling the
number of processors doubles the speed of the system.

There is a hierarchy of synchronization primitives -- interlocked
instructions, shared/exclusive locks, and mutexes -- with a large
variation in cost.  Interlocked instructions are almost free; mutexes
cost an arm and a leg.  Forcing all synchronization to use mutexes
is an unnecessary waste of resources.  In the absence of
volatile, however, it is impossible to implement finer grained
synchronization primitives.  This doesn't strike me as wise....


March 22, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a7entf$1hik$2@digitaldaemon.com...
> "Serge K" <skarebo@programmer.net> wrote in message news:a7e2kc$17qp$1@digitaldaemon.com...
> > > > I notice there is no support for volatile, which perplexes me.  Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
> > > They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
> > "volatile" does not mean "atomic" or even "synchronized".
>
> It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".

This is necessary in many embedded systems, even when they are single threaded, and even in some operating system applications.  For example, it is common in embedded systems to make external hardware visible by memory mapping its registers into the process memory space.  This makes it easy to use standard syntax to manipulate the registers, and it is the only way to implement I/O on some processors.

However, you can't let the CPU keep the "data" in a CPU register or it won't work.  For example, an update to the register has to actually go to the external register to be effective.  It doesn't accomplish anything to update the copy in a CPU register without doing the store, as the external hardware might not see it for a long time.

Similarly, of course, these external registers can change their contents as the state of the external hardware changes (for example, a status register showing the completion of some external operation).  You can't let the data stay in a CPU register, because subsequent reads, in say a polling loop, wouldn't go to the actual hardware or, worse yet, might even be "optimized" away altogether.  Note that this is a different issue from cache coherence.
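
The polling-loop case can be sketched in C like this. It is a minimal illustration under assumptions: the `STATUS_DONE` bit and the functions `wait_done` and `send_command` are hypothetical, and on real hardware the pointers would hold the addresses of memory-mapped registers rather than ordinary variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DONE 0x1u   /* hypothetical "operation complete" bit */

/* Poll a status register up to max_polls times.
   Because the pointee is volatile, the compiler must perform a real load
   on every iteration; it may not keep the first value in a CPU register
   or hoist the read out of the loop. */
bool wait_done(volatile const uint32_t *status, unsigned max_polls)
{
    assert(status != 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_DONE)
            return true;
    }
    return false;
}

/* Write a command word to a control register.
   The volatile qualifier obliges the compiler to emit the store even
   though the program never reads the value back. */
void send_command(volatile uint32_t *control, uint32_t word)
{
    assert(control != 0);
    *control = word;
}
```

Without the `volatile` qualifier, an optimizer would be entitled to read `*status` once and turn the loop into a test of a stale register copy, which is exactly the failure described above.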

--
 - Stephen Fuld
   e-mail address disguised to prevent spam


March 22, 2002
"Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7fq36$2uha$1@digitaldaemon.com...
> > It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".
> This is necessary in many embedded systems, even when they are single threaded, and even in some operating system applications.  For example, it is common in embedded systems to make external hardware visible by memory mapping its registers into the process memory space.  This makes it easy to use standard syntax to manipulate the registers, and it is the only way to implement I/O on some processors.  However, you can't let the CPU keep the "data" in a CPU register or it won't work.  For example, an update to the register has to actually go to the external register to be effective.  It doesn't accomplish anything to update the copy in a CPU register without doing the store, as the external hardware might not see it for a long time.  Similarly, of course, these external registers can change their contents as the state of the external hardware changes (for example, a status register showing the completion of some external operation).  You can't let the data stay in a CPU register, because subsequent reads, in say a polling loop, wouldn't go to the actual hardware or, worse yet, might even be "optimized" away altogether.  Note that this is a different issue from cache coherence.

I understand what you mean. It's still problematic how that actually winds
up being implemented in the compiler. C doesn't really define how many reads
are done to an arbitrary expression in order to implement it, for example:
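    j = i++;
How many times is i read? Once or twice?
    mov eax, i      ; first read of i
    inc i           ; read-modify-write: reads i a second time
    mov j, eax
or:
    mov eax, i      ; the only read of i
    mov j, eax
    inc eax
    mov i, eax      ; the only write of i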
These ambiguities to me mean that if you need precise control over memory
read and write cycles, the appropriate thing to use is the inline assembler.
Volatile may happen to work, but to my mind is unreliable and may change
behavior from compiler to compiler.

BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.


March 22, 2002
"Jim Starkey" <jas@netfrastructure.com> wrote in message news:a7fk2p$20rj$1@digitaldaemon.com...
> >They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.
> No, it is neither necessary nor desirable to use mutexes.  Yes, there are
> restrictions on the interlocked instructions, but since volatile is
> implemented/enforced by the compiler, this should be acceptable.  The
> compiler's responsibility should be to either implement an operation
> atomically or generate a diagnostic explaining why it can't.

Writes to bytes and aligned words/dwords are done atomically by the CPU, misaligned data and multiword data is not.

> An example of something that can be cheaply handled by enhanced volatile
> is use counts on objects shared across threads.  An atomic interlocked
> decrement implemented with "lock xsub decl" does the trick correctly
> with no more cost than an extra bus cycle, where a mutex requires an
> OS call.  The ratio of costs is probably three orders of magnitude or
> more.

Synchronizing mutexes do not require an OS call most of the time, although they are still slower than a simple lock. None of the modern Java VMs do an OS call for each synchronize.
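
One way to picture a lock that avoids the OS in the common case is an uncontended fast path built on a single interlocked test-and-set. The sketch below, in C11, is an assumption-laden illustration and not how any particular Java VM implements `synchronized`: `fast_lock`, `fast_trylock` and `fast_unlock` are hypothetical names, and a real mutex would add a slow path that asks the OS to block the thread when the try fails.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space lock: one flag, no kernel object. */
typedef struct {
    atomic_flag held;
} fast_lock;

#define FAST_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Fast path: one interlocked test-and-set, no OS call.
   Returns true if the lock was free and is now owned by the caller. */
bool fast_trylock(fast_lock *l)
{
    assert(l);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Release the lock so the next test-and-set succeeds. */
void fast_unlock(fast_lock *l)
{
    assert(l);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Only when `fast_trylock` returns false would a full mutex need to involve the OS, which is why the uncontended case can stay cheap.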

> >I agree that the C definition of volatile is next to useless.
> I didn't mean to imply that the C definition of volatile is next to
> useless -- it is, in fact, absolutely critical for all but the most
> primitive multi-threaded code.  Even when used with mutexes, volatile is
> necessary to warn the optimizer off unwarranted assumptions of invariance.

I'm sorry, I just don't see how. See my other post here about j=i++; and how volatile doesn't help.

> If D is going to succeed, it is necessary to anticipate where computer
> architectures are going.  Everyone, I hope, understands that memory is
> cheap and plentiful, larger virtual address spaces are in easy sight,
> and dirt cheap multi-processors are here.  Although we're in a period
> of rapidly increasing clock rates, we're also approaching physical
> limits on feature size.  In the not-too-distant future it will be cheaper
> to add more processors than to buy/build faster ones.  At that point
> performance will be gated by the degree to which doubling the
> number of processors doubles the speed of the system.

I think you're right.

> There is a hierarchy of synchronization primitives -- interlocked
> instructions, shared/exclusive locks, and mutexes -- with a large
> variation in cost.  Interlocked instructions are almost free; mutexes
> cost an arm and a leg.  Forcing all synchronization to use mutexes
> is an unnecessary waste of resources.  In the absence of
> volatile, however, it is impossible to implement finer grained
> synchronization primitives.  This doesn't strike me as wise....

I think your points merit further investigation, though I don't see how volatile is the answer.


March 22, 2002
"Walter" <walter@digitalmars.com> wrote in message news:a7ft6h$1ccq$1@digitaldaemon.com...
>
> "Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7fq36$2uha$1@digitaldaemon.com...
> > > It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".
> > This is necessary in many embedded systems, even when they are single threaded, and even in some operating system applications.  For example, it is common in embedded systems to make external hardware visible by memory mapping its registers into the process memory space.  This makes it easy to use standard syntax to manipulate the registers, and it is the only way to implement I/O on some processors.  However, you can't let the CPU keep the "data" in a CPU register or it won't work.  For example, an update to the register has to actually go to the external register to be effective.  It doesn't accomplish anything to update the copy in a CPU register without doing the store, as the external hardware might not see it for a long time.  Similarly, of course, these external registers can change their contents as the state of the external hardware changes (for example, a status register showing the completion of some external operation).  You can't let the data stay in a CPU register, because subsequent reads, in say a polling loop, wouldn't go to the actual hardware or, worse yet, might even be "optimized" away altogether.  Note that this is a different issue from cache coherence.
>
> I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how many reads are done to an arbitrary expression in order to implement it, for example:
>     j = i++;
> How many times is i read? Once or twice?
>     mov eax, i
>     inc i
>     mov j, eax
> or:
>     mov eax, i
>     mov j, eax
>     inc eax
>     mov i, eax
> These ambiguities to me mean that if you need precise control over memory read and write cycles, the appropriate thing to use is the inline assembler. Volatile may happen to work, but to my mind is unreliable and may change behavior from compiler to compiler.
>
> BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.

While I agree that you can use inline asm, and that there are ways to code that could cause trouble, in practice volatile works pretty well.  People don't do things like post-increment external registers when reading them.  I know the syntax allows it, but programmers, especially embedded programmers, learn pretty quickly what to do and what not to do with the hardware they have.  In practice, most uses of this kind either read the whole register and test some bits or extract a field, or build a word with the desired contents and write it in one piece to the external register.  So, while volatile isn't a complete solution, it avoids having to delve into asm for the vast majority of such uses.
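
The "read the whole register once, write one composed word" discipline can be sketched in C as follows. The register layout is invented for the illustration: `MODE_SHIFT`, `MODE_MASK`, `read_mode` and `write_mode` are hypothetical, assuming a 4-bit field in bits 4..7 of a 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout: bits 4..7 hold a 4-bit "mode" field. */
#define MODE_SHIFT 4u
#define MODE_MASK  (0xFu << MODE_SHIFT)

/* Exactly one read of the register; the field is then extracted from
   the local snapshot, not from the hardware. */
uint32_t read_mode(volatile const uint32_t *reg)
{
    uint32_t snapshot = *reg;
    return (snapshot & MODE_MASK) >> MODE_SHIFT;
}

/* Exactly one read and one write: the new word is composed in a local
   variable and stored to the register in one piece. */
void write_mode(volatile uint32_t *reg, uint32_t mode)
{
    assert(mode <= 0xFu);
    uint32_t word = *reg;                            /* the only read  */
    word = (word & ~MODE_MASK) | (mode << MODE_SHIFT);
    *reg = word;                                     /* the only write */
}
```

Written this way, each volatile access appears exactly once in the source, so the number of hardware reads and writes does not depend on how a compiler chooses to expand an expression such as `i++`.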

--
 - Stephen Fuld
   e-mail address disguised to prevent spam


March 22, 2002
> BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.

You should try Visual C++ for Alpha.
It can optimize not only the surrounding code,
but the inline assembly code as well.
I was truly amazed when I noticed that.


