April 29, 2008
Me Here wrote:
> Just as the answer to the occasional hit-and-run death is not banning cars, so fixing unintentional aliasing in threaded applications does not lie in forcing all character arrays to be immutable.

D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:

	char[]

Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.
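A minimal sketch of the three flavors, written in current D2 syntax. The keyword spelled `invariant` in this thread is `immutable` today, and `string` is an alias for `immutable(char)[]`. The `demo` function is only an illustration:

```d
import std.stdio : writeln;

// Shows what a const view and an immutable copy each see after the
// underlying mutable buffer is changed.
string[2] demo()
{
    char[] buf = "hello".dup;            // mutable: may be changed in place
    buf[0] = 'j';                        // buf is now "jello"

    const(char)[] view = buf;            // read-only view of buf's memory
    // view[0] = 'x';                    // would not compile: view is const

    immutable(char)[] frozen = buf.idup; // immutable copy, "jello" for good
    buf[1] = 'a';                        // buf (and therefore view) is "jallo"

    return [view.idup, frozen];
}

void main()
{
    auto r = demo();
    writeln(r[0]); // jallo: the const view saw the change
    writeln(r[1]); // jello: the immutable copy did not
}
```

`const` only promises that *you* will not write through that reference. `immutable` promises that *nobody* will.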
April 29, 2008
Walter Bright wrote:

>Me Here wrote:
>>Just as the answer to the occasional hit-and-run death is not banning cars, so fixing unintentional aliasing in threaded applications does not lie in forcing all character arrays to be immutable.
>
>D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:
>
>	char[]
>
>Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.

Well no, but having the string libraries only accept and return invariant strings amounts to much the same thing.

I'm disappointed that's the only point from my post worthy of reaction :(

-- 

April 29, 2008
Me Here wrote:
> Janice Caron wrote:
>> Functions don't overload on return value.
> They don't? Why not? Seems like a pretty obvious step to me.

Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.
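D will not accept two functions that differ only in return type. One common workaround, offered here as an illustration and not something proposed in this thread, is to make the result type an explicit template argument. The caller then names the type up front, so inference can still run bottom up. `parseAs` is a hypothetical name, and `std.conv.to` does the actual conversion:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: the caller picks the return type explicitly
// (parseAs!int, parseAs!double). The compiler never infers it from the
// context the result is assigned to.
T parseAs(T)(string s)
{
    return to!T(s);
}

void main()
{
    auto i = parseAs!int("42");     // i is an int
    auto d = parseAs!double("2.5"); // d is a double
    writeln(i, " ", d);
}
```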


>>>  The idea that runtime obtained or derived strings can be made truly
>>>  invariant is purely theoretical.
>> But the fact that someone else might be sharing the data is not.
> By "someone else" you mean 'another thread'?

No, it could be the same thread, via another alias to the same data. Using invariant strings allows the programmer to treat them as if they were value types and being copied for every use (like ints are), except they don't need to be actually copied.

With mutable strings, one always has to be careful to keep track of who 'owns' the string, and who has references to it. When mutating the string, one must manually ensure that there are no other references to it that would be surprised by the data changing. For example, if you insert a string into a symbol table, and then later some other reference to that string changes it, it could wind up corrupting the symbol table.
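A minimal sketch of the symbol-table hazard just described, in current D2 syntax (`immutable`/`string` in place of `invariant`). Both table types are hypothetical. The first stores whatever slice it is given, and the second accepts only immutable strings:

```d
import std.stdio : writeln;

// Hypothetical table that stores the caller's slice, without copying.
struct SymbolTable
{
    const(char)[][] names;
    void add(const(char)[] name) { names ~= name; }
}

// Hypothetical table that accepts only immutable strings, so whatever
// it stores can never change underneath it.
struct SafeSymbolTable
{
    string[] names;
    void add(string name) { names ~= name; }
}

void main()
{
    char[] buf = "alpha".dup;

    SymbolTable t;
    t.add(buf);
    buf[0] = 'A';        // another reference mutates the shared data...
    writeln(t.names[0]); // ...and the table's entry is now "Alpha"

    SafeSymbolTable s;
    // s.add(buf);       // would not compile: char[] is not a string
    s.add(buf.idup);     // the caller must hand over an immutable copy
    buf[0] = 'a';
    writeln(s.names[0]); // still "Alpha"
}
```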

The point about main(char[][] args) and modifying those strings in place is very valid: nothing is said about where those strings actually reside, who else may have references to the same data, and whether you can modify them with impunity or not. You could argue "this should be better documented" and you'd be right, but if the declaration instead said main(invariant(char)[][] args) then I *know* that I am not allowed to change them, and whoever calls main() *knows* that those arg strings won't get changed. We can both sleep comfortably.

Invariant strings offer a guarantee that the data won't change, which clarifies the API of the functions. (Whenever I see an API function that takes a char*, say putenv(), it rarely says whether it saves a copy of the data or saves a copy of the reference. That just sucks.)


> If so, then if that is a possibility, if my code is using threads, then I, the programmer,
> will be aware of that  and will be able to take appropriate choices.
> 
> I /might/ choose to use invariance to 'protect' this particular piece of data from the problems
> of shared-state concurrency, if there is any possibility that I intend to share this particular piece of data.
> But in truth, it is very unlikely that I *will* make /that/ choice. Here's why.
> 
> What does it mean to make and hold multiple (mutated) copies of a single entity?
> 
> That is, I obtain a piece of data from somewhere and make it invariant.
> Somehow two threads obtain references to that piece of data.
> If none of them attempt to change it, then it makes no difference that it is marked invariant.
> If, however, one of them is programmed to change it, then it now has a different
> version of that entity from the other thread. But what does that mean? Who has the 'right' version?
> 
> Show me a real situation where two threads can legitimately be making disparate modifications to a single entity,
> string or otherwise, and I'll show you a programming error. Once two threads make disparate modifications to an entity,
> they are separate entities. And they should have been given copies, not references to a single copy, in the first place.
> 
> If the intent is that they share a single entity, then any legitimate modifications to that single entity should be reflected
> in the views of that single entity by both threads. And therefore subjected to locking, or STM or whatever mechanism is
> used to control that modification.
> 
> This whole thing of invariance and concurrency seems to be aimed at enabling the use of COW.

Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at its core?

> And if that is the case, and I very much hope it isn't, then let me tell you, as someone who is intimately familiar with the
> one existing system that went this route (iThreads: look'em up), that it is a total disaster.

ithreads copies the entire user data per thread. Using invariant is, of course, a way to avoid copying the data.
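A minimal sketch of the "no copy" side of this, in current D2 syntax (`immutable` in place of `invariant`), using Phobos's `std.concurrency`. Immutable data may be handed to another thread directly, and the receiving thread sees the very same memory. The `worker` and `sharedWithoutCopy` functions are illustrative:

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.stdio : writeln;

// Reports back the address of the array this thread received.
void worker(immutable(int)[] data)
{
    ownerTid.send(cast(size_t) data.ptr);
}

// Hands immutable data to another thread and returns true if that
// thread saw the same memory, i.e. nothing was copied.
bool sharedWithoutCopy()
{
    immutable(int)[] data = [1, 2, 3];
    spawn(&worker, data); // allowed: immutable data has no unshared aliasing
    return receiveOnly!size_t() == cast(size_t) data.ptr;
}

void main()
{
    writeln(sharedWithoutCopy()); // true
}
```

Passing a plain mutable `int[]` to `spawn` is rejected at compile time, because it would alias unshared mutable data across threads.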

> The whole purpose and advantage of multi-threading, over multi-processing, is (mutable) shared state. And the elimination of
> costs of serialisation and the narrow bandwidth of IPC in the forking concurrency model. Attempting to emulate that model
> using threading gives few of its advantages, all of its disadvantages, and throws away all of the advantages of threading.
> It is a complete and utter waste of time and effort.

I can agree with that.
April 29, 2008
Me Here wrote:
> Walter Bright wrote:
> 
>> Me Here wrote:
>>> Just as the answer to the occasional hit-and-run death is not banning cars, so fixing unintentional aliasing in threaded applications does not lie in forcing all character arrays to be immutable.
>>
>> D does not force all character arrays to be immutable. You can use mutable ones by declaring them as:
>>
>>     char[]
>>
>> Reference types all come in 3 flavors: mutable, read-only-view-of (i.e. const) and invariant.
> 
> Well no, but having the string libraries only accept and return invariant strings amounts to much the same thing.

I agreed with Janet's proposal to create a parallel set of routines that worked on mutable strings.


> I'm disappointed that's the only point from my post worthy of reaction :(

It appeared to me to be based on the assumption that D forced all character arrays to be invariant.
April 29, 2008
Robert Fraser wrote:
> The important part is new String(offset + beginIndex, endIndex - beginIndex, value) which does indeed do a "slice" of sorts (that is, it returns a string with the same char array backing it with a new offset and length). No copying of data is done.

Sun has it right. GNU Classpath has it wrong and copies the data every time.
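D array slices behave like the Sun implementation described above. A slice is a pointer plus a length into the existing data, so taking a substring copies nothing. With immutable strings that sharing is also safe, because neither side can change the characters. A minimal check:

```d
import std.stdio : writeln;

void main()
{
    string s = "hello, world";
    string sub = s[7 .. 12];            // "world": new ptr/length, same memory
    writeln(sub);
    writeln(sub.ptr == s.ptr + 7);      // true: no characters were copied
    writeln(sub.idup.ptr == s.ptr + 7); // false: .idup makes a real copy
}
```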
April 29, 2008
Walter Bright wrote:

> 
> > I'm disappointed that's the only point from my post worthy of reaction :(
> 
> It appeared to me to be based on the assumption that D forced all character arrays to be invariant.

Well no. It also went on to counter the idea that we're all going to come around to your way of thinking on this in short order,
and to attempt to dispel the idea that the provision of immutable strings, without doing the same for all the other datatypes, is going to fix anything major.

The exact same problems you describe for character arrays exist for int arrays and uint arrays and... hashes of every flavour. Fixing one, if fixing them is what this does, without also fixing all the others, just moves the goalposts (a little).

If a piece of code needs to know that the subject of a reference (string, int array, hash, whatever), isn't going to change, it is (and should be) *its responsibility* to ensure that--by taking a private copy.

Burdening all code with the costs of immutability just in case someone is vulnerable to its mutation, *and* is too lazy to take a copy,
seems like making everyone wear condoms in case someone might have sex. And doing it for just one type of array, when they all suffer
from the same problem, doesn't seem likely to address the problem of unwanted pregnancies.

> I agreed with Janet's proposal to create a parallel set of routines that worked on mutable strings.

Sure. Sometime soon we will have a mutable string capable library again, and then we'll see how beneficial immutable strings really are on the basis of how many people make use of them.

But that doesn't address the issue of the salience of the reasoning for having them in the first place. Or the costs of using them in terms of stack fragmentation, additional GC runs, destruction of cache coherency, etc. etc. etc.
-- 

April 29, 2008
Me Here wrote:
> If a piece of code needs to know that the subject of a reference
> (string, int array, hash, whatever), isn't going to change, it is
> (and should be) *its responsibility* to ensure that--by taking a
> private copy.

There are two ways of doing it. One is COW, where those who make the change make the copy. The other way doesn't have a name, but it's making a copy "just in case" someone else might mutate it. I think you're proposing the latter. Invariant strings are a way of enforcing COW, rather than relying on documentation.

There's no doubt you can make JIC work successfully. I've used it myself for decades. But I always find myself expending effort trying to optimize away those copies, and so find it more productive to go the other way and use COW.

While I am comfortable using COW with mutable strings, the many many discussions of it in this forum made it clear that most would like to have some compiler help with it. Invariant strings fit the bill nicely.
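A minimal sketch contrasting the two approaches on one operation, written in current D2 syntax (`immutable`/`string` in place of `invariant`). Both function names are hypothetical, and case conversion is ASCII-only to keep the example short. The JIC version always copies. The COW version copies only if it actually changes something, and otherwise returns the caller's string itself, which is safe only because that string is immutable:

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

// "Just in case": always returns a fresh copy, even if nothing changes.
string toUpperJIC(string s)
{
    char[] copy = s.dup;
    foreach (ref c; copy)
        if (c >= 'a' && c <= 'z')
            c = cast(char)(c - ('a' - 'A'));
    return assumeUnique(copy); // copy is not referenced anywhere else
}

// Copy-on-write: copies only when a character actually has to change.
string toUpperCOW(string s)
{
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] copy = s.dup;
            foreach (ref d; copy[i .. $])
                if (d >= 'a' && d <= 'z')
                    d = cast(char)(d - ('a' - 'A'));
            return assumeUnique(copy);
        }
    }
    return s; // no change needed: share the immutable original
}

void main()
{
    string a = "ALREADY UPPER";
    writeln(toUpperCOW(a) is a); // true: no copy made
    writeln(toUpperJIC(a) is a); // false: copied anyway
    writeln(toUpperCOW("mixed Case"));
}
```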
April 29, 2008
Walter Bright wrote:

>There are two ways of doing it. One is COW, where those who make the change make the copy. The other way doesn't have a name, but it's making a copy "just in case" someone else might mutate it. I think you're proposing the latter. Invariant strings are a way of enforcing COW, rather than relying on documentation.
>
>There's no doubt you can make JIC work successfully. I've used it myself for decades. But I always find myself expending effort trying to optimize away those copies, and so find it more productive to go the other way and use COW.
>
>While I am comfortable using COW with mutable strings, the many many discussions of it in this forum made it clear that most would like to have some compiler help with it. Invariant strings fit the bill nicely.

Okay Walter,

This will be my last word on the subject. When I posted the headpost of this thread, I had no idea what I was getting into.
I've since taken the time to catch up on some of the history, along with that of the Phobos/Tango debate. See below.

As I see it, both mechanisms are "just in case". The difference is that with invariants and COW, everyone who /doesn't/ need immutability has
to copy so that the one person who does need it, if they indeed exist at all which we have no way of knowing, doesn't have to copy.

The other way, the one person who knows they need immutability has to copy, and everyone else simply ignores the issue.

If you're given a reference and you need it not to change, take a copy and hide it away. Then it cannot.
If you're given a reference and you don't care if it changes (or you want to be apprised of any changes), use it; keep it or throw it away.
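A minimal sketch of the receiver-side approach just described, in current D2 syntax. `Config` is a hypothetical class that needs its data not to change, so it takes its own private copy. Callers pass ordinary mutable buffers and keep reusing them:

```d
import std.stdio : writeln;

// Hypothetical receiver: it is the party that needs the value to stay
// fixed, so it makes the copy (.idup) itself.
class Config
{
    private string name_;
    this(const(char)[] name) { name_ = name.idup; }
    string name() const { return name_; }
}

void main()
{
    char[] buf = "prod".dup;
    auto cfg = new Config(buf);
    buf[] = 'x';       // the caller overwrites its own buffer freely
    writeln(cfg.name); // still "prod"
}
```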

Expecting everyone else to take extra precautions, always, "just in case", so that you don't have to take precautions even when you know
you need to, seems the height of selfishness.

STM (from elsewhere)

>>This whole thing of invariance and concurrency seems to be aimed at
>>enabling the use of COW.

>Wouldn't that be more of a copy-swap thing? And isn't STM copy-swap at
>its core?

I'm not sure that I follow the question in context, or the meaning of "copy-swap".

STM is an alternative to locking for concurrency control. Essentially, each reader of known (marked) shared state gets a copy of the state, and an internal copy is also kept.
If that reader later attempts to write back to the shared state, the state's current value is compared against the internal copy taken when it was read.
If they differ, the code attempting the write is rolled back to the read point and given the updated value (and another internal copy is taken).
Lather, rinse, repeat until the copy and current values are the same, then commit the change and continue.
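For a single shared word, the read, compare, retry cycle above can be sketched with `core.atomic` from D's runtime. This is only the optimistic commit loop at the heart of the idea, not a full STM: there is no transaction log spanning several variables and no rollback of side effects. `counter`, `addOptimistic` and `runWorkers` are illustrative names:

```d
import core.atomic : atomicLoad, atomicStore, cas;
import core.thread : Thread;
import std.stdio : writeln;

shared int counter;

// Read the current value, compute the new one privately, then commit
// only if nobody changed the value in the meantime; otherwise retry.
void addOptimistic(int delta)
{
    int seen, next;
    do
    {
        seen = atomicLoad(counter);       // "read" and remember what we saw
        next = seen + delta;              // work on a private value
    } while (!cas(&counter, seen, next)); // commit fails if counter moved on
}

void worker()
{
    foreach (i; 0 .. 10_000)
        addOptimistic(1);
}

// Resets the counter, runs n workers to completion, returns the total.
int runWorkers(int n)
{
    atomicStore(counter, 0);
    Thread[] threads;
    foreach (i; 0 .. n)
    {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return atomicLoad(counter);
}

void main()
{
    writeln(runWorkers(4)); // 40000: no increment is lost
}
```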

Fairly expensive, and only works for code that can be rolled back (i.e. referentially transparent code).
Useless for anything that interacts with the outside world. Eg. writes to the screen, or a file, or the file system,
or reads from a non-rewindable source like a port or socket or the terminal.
Efficient if you live in a referentially transparent world--all data exists at compile time; no interaction with the outside world.
Next to useless otherwise. You still need locking or some other mechanism to deal with external state.

If that describes copy-swap then yes. Else no :)

Phobos vs. Tango

I definitely don't want the dead weight of pointless OO wrappers or deeply nested hierarchies. Nor the "everything must be OO" philosophy.

Once I regain access to std.string for my char[]s, (and a simple, expectation conformant rand() function :), I'll be happy.

Till then, I'll get outta yer hair and go back to trying to process my 140GB of data using D1.
(
Which is a shame because I really like some of the language changes for D2. The extension to foreach for processing files looks cool.
I'd also vote for the convergence of for/foreach if that was possible without moving away from a context-free grammar,
I haven't had occasion to explore the laziness facilities yet, but they sound cool.
Ditto the templating.
)

Despite our difference on the issue above, please add my goodwill and plaudits to your trophy box for your vision and provision of D.

Cheers, b.

-- 

April 29, 2008
"Bill Baxter" <dnewsgroup@billbaxter.com> wrote in message news:fv3612$sgu$1@digitalmars.com...
> That stuff like this compiles and seems to work is why we really need to make at least one alternative version of cast.  One would be for relatively safe run-of-the-mill casts, like casting float to int, or casting Object to some class (and checking for null), and the other category would be for dangerous, big-red-flag kinds of things like the above.  Using the run-of-the-mill cast in the above situation would not be allowed.

That request has been on the "unofficial wish list" since the beginning, and I still agree with it.

Maybe cast() should be parsed as a template. Then the compiler could require more "!"s as the risk increases:

SomeClass sc = cast(SomeClass)some_obj;   //OK
int i = cast!(int)some_float;    //might not fit
SomeClass sc = cast!!(SomeClass)void_ptr;  //unsafe
char[] mutstring = cast!!!!!!!!(char[])toUpper("...");  //wtf are you doing!
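For comparison with the wish list above, current D already separates some of these cases. It does not use the `cast!` syntax proposed here, which is hypothetical. A class downcast is checked at run time and yields `null` when the object is not of the target type. For numbers, `std.conv.to` performs a range-checked conversion, while `cast` performs no range check. `Base` and `Derived` are illustrative:

```d
import std.conv : ConvOverflowException, to;
import std.exception : assertThrown;
import std.stdio : writeln;

class Base {}
class Derived : Base {}

void main()
{
    // Class downcast: checked at run time, null on mismatch.
    Base b = new Base;
    Derived d = cast(Derived) b;
    writeln(d is null); // true

    // Numeric narrowing: to!int rejects out-of-range values.
    double big = 1e20;
    assertThrown!ConvOverflowException(to!int(big));
    writeln(to!int(3.0)); // 3
}
```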

L.


April 29, 2008
"Walter Bright" <newshound1@digitalmars.com> wrote in message news:48169E90.6050700@digitalmars.com...
> Me Here wrote:
>> Janice Caron wrote:
>>> Functions don't overload on return value.
>> They don't? Why not? Seems like a pretty obvious step to me.
>
> Type inference in D is done "bottom up". Doing overloading based on function return type is "top down". Trying to get both schemes to coexist is a hard problem.

But a function's result can be overloaded using "out", so why can't it be overloaded using the return value?

Can't the compiler treat a return value as an implicit out argument?
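A small sketch of why the two cases behave differently under the bottom-up scheme described earlier in the thread. The `parse` overloads are hypothetical. With an `out` parameter, the variable's type is already known before the call, so ordinary argument-based overload resolution applies. A bare return value, as in `auto x = parse("42");`, offers no such type to resolve against:

```d
import std.conv : to;
import std.stdio : writeln;

// Overloads that differ only in the type of an out parameter: the
// argument's declared type selects the overload.
void parse(string s, out int result)    { result = to!int(s); }
void parse(string s, out double result) { result = to!double(s); }

void main()
{
    int i;
    double d;
    parse("42", i);  // picks the int overload
    parse("2.5", d); // picks the double overload
    writeln(i, " ", d);
}
```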

L.