|
March 21, 2002 Volatile | ||||
---|---|---|---|---|
| ||||
Please pardon my ignorance if this has been hashed and re-hashed. I just got a pointer to D from another list, came over for a quick look-see, and liked what I saw. So I thought I'd toss in a few thoughts.

I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning. It isn't necessary in a JVM because those types of optimization can be expressed in byte codes, although it does limit what a JIT compiler can do. D is intended for real compilation, however, and when the instruction set guys give us enough registers, the compiler is going to want to stick intermediates in them. Without volatile, this ain't gonna work.

That said, the C concept of volatile declaration doesn't go far enough. While it does warn the compiler that an unexpected change in value is fair game, it doesn't tell the compiler when or if to generate multi-process safe instruction sequences.

The obvious response is that data structures should be protected by a mutex or synchronize. The problem is that these are vastly too expensive to use in a tight, fine-grained multi-thread application. Modern multi-processors do a wonderful job of implementing processor interlocked atomic instructions. Modern OSes do a reasonable job of scheduling threads on multi-processors. Modern languages, however, do a rotten job of giving the primitives to exploit these environments. Yeah, I know I can write an inline "lock xsub decl" yada yada yada. But it's painful and non-portable. And we all know that writing assembler rots the soul.

So, guys, I would like the following:

1. A volatile declaration so the compiler can do smart things while I do fast things.
2. A "volatile volatile" declaration or distinct operator or operator modified to tell the compiler to use a processor interlock instruction sequence OR give me a compile-time error explaining why it can't.

There are probably smarter ways to do this than a volatile declaration. But something is needed in that niche. Or, alternatively, I could have my head thoroughly wedged. But I'll take on all comers until that is so obvious that I can see it myself. |
March 21, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jim Starkey | "Jim Starkey" <jas@netfrastructure.com> wrote in message news:3C9A43BC.AFBA03BA@netfrastructure.com...
> I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.

They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32-bit word boundary, which can happen writing doubles, longs, or even misaligned ints.

> That said, the C concept of volatile declaration doesn't go far enough. While it does warn the compiler that an unexpected change in value is fair game, it doesn't tell the compiler when or if to generate multi-process safe instruction sequences.

I agree that the C definition of volatile is next to useless.

> The obvious response is that data structures should be protected by a mutex or synchronize. The problem is that these are vastly too expensive to use in a tight, fine-grained multi-thread application. Modern multi-processors do a wonderful job of implementing processor interlocked atomic instructions. Modern OSes do a reasonable job of scheduling threads on multi-processors. Modern languages, however, do a rotten job of giving the primitives to exploit these environments. Yeah, I know I can write an inline "lock xsub decl" yada yada yada. But it's painful and non-portable. And we all know that writing assembler rots the soul.
>
> So, guys, I would like the following:
>
> 1. A volatile declaration so the compiler can do smart things while I do fast things.
> 2. A "volatile volatile" declaration or distinct operator or operator modified to tell the compiler to use a processor interlock instruction sequence OR give me a compile-time error explaining why it can't.
>
> There are probably smarter ways to do this than a volatile declaration. But something is needed in that niche.

You're wrong, writing assembler puts one into a State of Grace <g>. |
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter |
> > I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
>
> They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".

"volatile" does not mean "atomic" or even "synchronized".
It's just an indication that some variable in the memory can be changed from "outside".
And nobody cares when *exactly* it happens, as long as it happens.
For example:
  by another thread on the same processor.
    => everything is in the same cache - no problem here.
  by another processor, or any other hardware (DMA, ...)
    => any modern processor has support for cache coherency (MESI or better), in fact - it's a "must" thing for any processor with the cache.
    - no problem there. (..even i486 had it..)

> I agree that the C definition of volatile is next to useless.

Is it? |
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Serge K | "Serge K" <skarebo@programmer.net> wrote in message news:a7e2kc$17qp$1@digitaldaemon.com...
> > > I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
> >
> > They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
>
> "volatile" does not mean "atomic" or even "synchronized".

It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".

> It's just an indication that some variable in the memory can be changed from "outside".
> And nobody cares when *exactly* it happens, as long as it happens.
> For example:
>   by another thread on the same processor.
>     => everything is in the same cache - no problem here.
>   by another processor, or any other hardware (DMA, ...)
>     => any modern processor has support for cache coherency (MESI or better), in fact - it's a "must" thing for any processor with the cache.
>     - no problem there. (..even i486 had it..)

If you are writing to, say, a long, the long will be two write cycles. In between those two, another thread could change part of it, resulting in a scrambled write.

> > I agree that the C definition of volatile is next to useless.
>
> Is it?

Since it does not guarantee atomic writes, yes, I believe it is useless. |
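The scrambled-write concern above can be sketched in C. This is only an illustration, assuming a C11 compiler with `<stdatomic.h>` (which postdates this thread); the names `shared_value`, `publish` and `snapshot` are hypothetical. Declaring the 64-bit value `_Atomic` asks the compiler to make each store and load indivisible, whereas plain `volatile` makes no such promise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* The value under discussion is wider than one 32-bit word. */
static_assert(sizeof(uint64_t) == 8, "value spans two 32-bit words");

/* Hypothetical shared 64-bit value; the name is illustrative only. */
static _Atomic uint64_t shared_value;

/* Store all 64 bits as one indivisible write: a concurrent reader sees
   either the old value or the new one, never a mix of the two halves. */
void publish(uint64_t v)
{
    atomic_store(&shared_value, v);
}

/* Load all 64 bits as one indivisible read. */
uint64_t snapshot(void)
{
    return atomic_load(&shared_value);
}
```

How the compiler achieves this (a single wide instruction, a locked sequence, or an internal lock) depends on the target, so the sketch says nothing about cost.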
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | Walter wrote in message ...
>> I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
>
>They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.
>

No, it is neither necessary nor desirable to use mutexes. Yes, there are restrictions on the interlocked instructions, but since volatile is implemented/enforced by the compiler, this should be acceptable. The compiler's responsibility should be to either implement an operation atomically or generate a diagnostic explaining why it can't.

An example of something that can be cheaply handled by enhanced volatile is use counts by objects shared across threads. An atomic interlocked decrement implemented with "lock xsub decl" does the trick correctly with no more cost than an extra bus cycle, where a mutex requires an OS call. The ratio of costs is probably three orders of magnitude or more.

>
>I agree that the C definition of volatile is next to useless.
>

I didn't mean to imply that the C definition of volatile is next to useless -- it is, in fact, absolutely critical for all but the most primitive multi-threaded code. Even when used with mutexes, volatile is necessary to warn the optimizer off unwarranted assumptions of invariance.

If D is going to succeed, it is necessary to anticipate where computer architectures are going. Everyone, I hope, understands that memory is cheap and plentiful, larger virtual address spaces are in easy sight, and dirt cheap multi-processors are here. Although we're in a period of rapidly increasing clock rates, we're also approaching physical limits on feature size. In the not distant future it will be cheaper to add more processors than buy/build faster ones. At that point performance will be gated by the degree to which doubling the number of processors doubles the speed of the system.

There is a hierarchy of synchronization primitives -- interlocked instructions, shared/exclusive locks, and mutexes -- with a large variation in cost. Interlocked instructions are almost free, mutexes cost an arm and a leg. Forcing all synchronization to use mutexes is an unnecessary waste of resources. In the absence of volatile, however, it is impossible to implement finer grained synchronization primitives. This doesn't strike me as wise.... |
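The use-count example can be sketched with C11 atomics, which postdate this thread and are used here only as a stand-in for the "enhanced volatile" being proposed. The `shared_obj` type and the `obj_*` function names are hypothetical. On x86, compilers typically lower these operations to lock-prefixed instructions, but that is a property of the target rather than something the sketch guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared object header; only the use count matters here. */
struct shared_obj {
    atomic_int use_count;
};

void obj_init(struct shared_obj *o)
{
    atomic_init(&o->use_count, 1);   /* creator holds the first reference */
}

void obj_acquire(struct shared_obj *o)
{
    int prev = atomic_fetch_add(&o->use_count, 1);
    assert(prev > 0);                /* acquiring a dead object is a bug */
    (void)prev;
}

/* Returns true when the caller dropped the last reference and should free. */
bool obj_release(struct shared_obj *o)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    return atomic_fetch_sub(&o->use_count, 1) == 1;
}
```

No mutex is involved: the decrement and the test for "last reference" happen as one atomic step, so two threads releasing at the same time cannot both see the count reach zero.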
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | "Walter" <walter@digitalmars.com> wrote in message news:a7entf$1hik$2@digitaldaemon.com...
> "Serge K" <skarebo@programmer.net> wrote in message news:a7e2kc$17qp$1@digitaldaemon.com...
> > > > I notice there is no support for volatile, which perplexes me. Volatile is necessary to warn an optimizer that another thread may change a data item without warning.
> > >
> > > They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized".
> >
> > "volatile" does not mean "atomic" or even "synchronized".
>
> It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".

This is necessary in many embedded systems, even when they are single threaded, and even in some operating system applications. For example, it is common in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate the register and is the only way to implement I/O on some processors. However, you can't let the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update the copy in a CPU register without doing the store, as the external hardware might not see it for a long time. Similarly, of course, these external registers can change their contents as the state of the external hardware changes (for example, a status register showing the completion of some external operation). You can't let the data stay in a register, as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, or worse yet, might even be "optimized" away altogether. Note that this is a different issue than cache coherence.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam |
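The polling-loop case described above looks roughly like this in C. It is a sketch under stated assumptions: the `STATUS_DONE` bit and the `wait_done` function are hypothetical, and on real hardware the pointer would hold a fixed address taken from the device manual rather than the address of an ordinary variable.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_DONE 0x1u   /* hypothetical "operation complete" bit */

/* Spin until the device sets STATUS_DONE, giving up after max_spins reads.
   Because *status is volatile, every iteration performs a real load; the
   compiler may not hoist the read out of the loop or drop the loop.
   Returns the number of reads performed, or -1 on timeout. */
int wait_done(volatile const uint32_t *status, int max_spins)
{
    assert(status != 0);
    for (int reads = 1; reads <= max_spins; reads++) {
        if (*status & STATUS_DONE)
            return reads;
    }
    return -1;
}
```

Without the `volatile` qualifier, an optimizer would be entitled to read the location once, keep the result in a CPU register, and treat the loop condition as invariant, which is exactly the failure the post describes.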
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Stephen Fuld | "Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7fq36$2uha$1@digitaldaemon.com...
> > It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".
>
> This is necessary in many embedded systems, even when they are single threaded and even some operating system applications. For example, it is common in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate the register and is the only way to implement I/O on some processors. However, you can't let the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update the copy in a CPU register without doing the store as the external hardware might not see it for a long time. Similarly, of course, these external registers can change their contents as the state of the external hardware changes (For example, a status register showing the completion of some external operation.) You can't let the data stay in a register as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, or worse yet, even be "optimized" away altogether. Note that this is a different issue than cache coherence.

I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how many reads are done to an arbitrary expression in order to implement it, for example:

    j = i++;

How many times is i read? Once or twice?

    mov eax, i
    inc i
    mov j, eax

or:

    mov eax, i
    mov j, eax
    inc eax
    mov i, eax

These ambiguities to me mean that if you need precise control over memory read and write cycles, the appropriate thing to use is the inline assembler. Volatile may happen to work, but to my mind is unreliable and may change behavior from compiler to compiler.

BTW, D's inline assembler is well integrated with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of. |
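One way to sidestep the `j = i++` ambiguity without dropping to assembler is to spell the read and the write out as separate statements. The sketch below is in C with a hypothetical function name, `read_then_bump`. The C standard requires volatile accesses to follow the abstract machine while leaving "what constitutes an access" implementation-defined, so the one-read, one-write behaviour should be treated as the expected outcome on mainstream compilers rather than an absolute guarantee. It corresponds to the second instruction sequence above and is not atomic.

```c
#include <assert.h>
#include <stdint.h>

/* Post-increment spelled out: one load of *reg, then one store to *reg.
   Returns the value read, which is what `j = (*reg)++` would assign to j. */
uint32_t read_then_bump(volatile uint32_t *reg)
{
    assert(reg != 0);
    uint32_t old = *reg;   /* the only read of the volatile object  */
    *reg = old + 1;        /* the only write to the volatile object */
    return old;
}
```

Another thread or device can still modify the location between the load and the store; making the pair indivisible needs an interlocked instruction or a lock.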
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Jim Starkey | "Jim Starkey" <jas@netfrastructure.com> wrote in message news:a7fk2p$20rj$1@digitaldaemon.com...
> >They'd have to be implemented with mutexes anyway, so might as well just wrap them in "synchronized". Note: the X86 CPU doesn't guarantee that writing to memory will be atomic if the item crosses a 32 bit word boundary, which can happen writing doubles, longs, or even misaligned ints.
>
> No, it is neither necessary nor desirable to use mutexes. Yes, there are restrictions on the interlocked instructions, but since volatile is implemented/enforced by the compiler, this should be acceptable. The compiler's responsibility should be to either implement an operation atomically or generate a diagnostic explaining why it can't.

Writes to bytes and aligned words/dwords are done atomically by the CPU; misaligned data and multiword data are not.

> An example of something that can be cheaply handled by enhanced volatile is use counts by objects shared across threads. An atomic interlocked decrement implemented with "lock xsub decl" does the trick correctly with no more cost than an extra bus cycle, where a mutex requires an OS call. The ratio of costs is probably three orders of magnitude or more.

Synchronizing mutexes do not require an OS call most of the time, although they still are slower than a simple lock. None of the modern Java VMs do an OS call for each synchronize.

> >I agree that the C definition of volatile is next to useless.
>
> I didn't mean to imply that the C definition of volatile is next to useless -- it is, in fact, absolutely critical for all but the most primitive multi-threaded code. Even when used with mutexes, volatile is necessary to warn the optimizer off unwarranted assumptions of invariance.

I'm sorry, I just don't see how. See my other post here about j=i++; and how volatile doesn't help.

> If D is going to succeed, it is necessary to anticipate where computer architectures are going. Everyone, I hope, understands that memory is cheap and plentiful, larger virtual address spaces are in easy sight, and dirt cheap multi-processors are here. Although we're in a period of rapidly increasing clock rates, we're also approaching physical limits on feature size. In the not distant future it will be cheaper to add more processors than buy/build faster ones. At that point performance will be gated by the degree to which doubling the number of processors doubles the speed of the system.

I think you're right.

> There is a hierarchy of synchronization primitives -- interlocked instructions, shared/exclusive locks, and mutexes -- with a large variation in cost. Interlocked instructions are almost free, mutexes cost an arm and a leg. Forcing all synchronization to use mutexes is an unnecessary waste of resources. In the absence of volatile, however, it is impossible to implement finer grained synchronization primitives. This doesn't strike me as wise....

I think your points merit further investigation, though I don't see how volatile is the answer. |
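The remark that a mutex need not make an OS call most of the time can be sketched as a user-space fast path. This is an illustration in C11 with hypothetical names (`fast_lock`, `fast_try_lock`, `fast_unlock`), not a description of how any particular Java VM or D runtime implements `synchronized`. A free lock is taken with one atomic test-and-set; only a failed attempt would need a slower path that asks the OS to wait, and that path is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: a single flag, clear when the lock is free. */
typedef struct {
    atomic_flag held;
} fast_lock;

/* Uncontended fast path: one atomic test-and-set, no OS call.
   Returns true if the caller now owns the lock, false if it was taken. */
bool fast_try_lock(fast_lock *l)
{
    assert(l != 0);
    /* atomic_flag_test_and_set returns the previous state of the flag. */
    return !atomic_flag_test_and_set(&l->held);
}

/* Release the lock so the next test-and-set succeeds. */
void fast_unlock(fast_lock *l)
{
    assert(l != 0);
    atomic_flag_clear(&l->held);
}
```

A caller would initialize the lock with `fast_lock l = { ATOMIC_FLAG_INIT };` and fall back to a blocking wait only when `fast_try_lock` returns false.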
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | "Walter" <walter@digitalmars.com> wrote in message news:a7ft6h$1ccq$1@digitaldaemon.com...
>
> "Stephen Fuld" <s.fuld.pleaseremove@att.net> wrote in message news:a7fq36$2uha$1@digitaldaemon.com...
> > > It does in Java, which to me makes it more useful than C's notion of "don't put it in a register".
> >
> > This is necessary in many embedded systems, even when they are single threaded and even some operating system applications. For example, it is common in embedded systems to have external hardware be made visible by memory mapping the external hardware registers into the process memory space. This makes it easy to use standard syntax to manipulate the register and is the only way to implement I/O on some processors. However, you can't let the CPU keep the "data" in a CPU register or it won't work. For example, an update to the register has to actually go to the external register to be effective. It doesn't accomplish anything to update the copy in a CPU register without doing the store as the external hardware might not see it for a long time. Similarly, of course, these external registers can change their contents as the state of the external hardware changes (For example, a status register showing the completion of some external operation.) You can't let the data stay in a register as subsequent reads, in say a polling loop, wouldn't go to the actual hardware, or worse yet, even be "optimized" away altogether. Note that this is a different issue than cache coherence.
>
> I understand what you mean. It's still problematic how that actually winds up being implemented in the compiler. C doesn't really define how many reads are done to an arbitrary expression in order to implement it, for example:
>
>     j = i++;
>
> How many times is i read? Once or twice?
>
>     mov eax, i
>     inc i
>     mov j, eax
>
> or:
>
>     mov eax, i
>     mov j, eax
>     inc eax
>     mov i, eax
>
> These ambiguities to me mean that if you need precise control over memory read and write cycles, the appropriate thing to use is the inline assembler. Volatile may happen to work, but to my mind is unreliable and may change behavior from compiler to compiler.
>
> BTW, D's inline assembler is well integrated with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.

While I agree that you can use inline asm, and that there are ways to code that could cause trouble, in practice it works pretty well. People don't do things like post-increment external registers when reading them. I know the syntax allows it, but programmers, especially embedded programmers, learn pretty quickly what things to do and what not to do with the hardware they have. In practice, most uses of stuff like this are to read the whole register and test some bits or extract a field, or to create a word with the desired contents and write it in one piece to the external register. So, while volatile isn't a complete solution, it avoids having to delve into asm for the vast majority of such uses.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam |
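The two usage patterns named above, reading the whole register once and decoding it, and composing a word and writing it in one piece, can be sketched in C as follows. The register layout (`READY_BIT`, a 4-bit `MODE` field) and both function names are invented for illustration and do not describe any real device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only:
   bit 0 = READY flag, bits 4..7 = 4-bit MODE field. */
#define READY_BIT   0x01u
#define MODE_SHIFT  4
#define MODE_MASK   0x0Fu

/* Read the register exactly once, then decode from the local copy.
   Returns the MODE field, or -1 if READY is not set. */
int read_mode_if_ready(volatile const uint32_t *reg)
{
    uint32_t snap = *reg;                       /* single hardware read */
    if (!(snap & READY_BIT))
        return -1;
    return (int)((snap >> MODE_SHIFT) & MODE_MASK);
}

/* Build the full word in a local, then write it in one piece. */
void write_control(volatile uint32_t *reg, unsigned mode, int ready)
{
    assert(mode <= MODE_MASK);
    uint32_t word = (uint32_t)mode << MODE_SHIFT;
    if (ready)
        word |= READY_BIT;
    *reg = word;                                /* single hardware write */
}
```

Because each function touches the volatile location in exactly one simple statement, the question of how many reads an expression like `j = i++` performs does not arise.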
March 22, 2002 Re: Volatile | ||||
---|---|---|---|---|
| ||||
Posted in reply to Walter | > BTW, D's inline assembler is well integrated in with the compiler. The compiler can track register usage even in asm blocks, and can still optimize the surrounding code, unlike any other inline implementation I'm aware of.
You should try Visual C++ for Alpha.
It can optimize not only the surrounding code,
but inline assembly code as well.
I was truly amazed when I noticed that.
|
Copyright © 1999-2021 by the D Language Foundation