Thread overview
I just read that D uses the COW principle for Strings
Sep 10, 2003
Helium
Sep 10, 2003
Ilya Minkov
Sep 11, 2003
Philippe Mori
Sep 11, 2003
Helmut Leitner
Sep 11, 2003
Philippe Mori
Sep 11, 2003
Ilya Minkov
Sep 12, 2003
Philippe Mori
September 10, 2003
As said in the topic, D seems to use COW. In single-threaded applications it can really speed things up. But in the multithreaded world we have today, it can really slow things down.

I'm new to D, and I don't know if it even supports multithreading. If not, forget this post. If it does, you should really think about COW, because it's a speedup that isn't.


September 10, 2003
Helium wrote:
> As said in the topic, D seems to use COW. In single-threaded applications
> it can really speed things up. But in the multithreaded world we have today, it
> can really slow things down.

Why would it actually slow things down? And how does it depend on threads? The only concern I'm aware of is old strings that are left to the garbage collector -- which happens to lots of other stuff anyway. The GC is currently somewhat spartan with respect to threads and could use some improvement, but that will change someday.

> I'm new to D, and I don't know if it even supports multithreading. If not,
> forget this post. If it does, you should really think about COW, because it's a
> speedup that isn't.

Sure, it supports threads. See the Phobos pages in the spec.

Welcome to the community, and be sure to read further. ;)

-eye

September 11, 2003
"Ilya Minkov" <minkov@cs.tum.edu> wrote in message news:bjnk9u$2joq$1@digitaldaemon.com...
> Helium wrote:
> > As said in the topic, D seems to use COW. In single-threaded applications
> > it can really speed things up. But in the multithreaded world we have today, it
> > can really slow things down.
>
> Why would it actually slow things down? And how does it depend on threads? The only concern I'm aware of is old strings that are left to the garbage collector -- which happens to lots of other stuff anyway. The GC is currently somewhat spartan with respect to threads and could use some improvement, but that will change someday.
>

It is well known in C++ that COW string implementations are either slower
or only marginally faster in typical multi-threaded applications... and they
are not worth the increased complexity and bugs...

The problem is essentially that it is hard to have a string class in C++ that
is both thread-safe and efficient... Even though the client needs to use
some critical sections (or mutexes) for safe access, the library must implement
a thread-safe ref-count, as it is not possible for the user to do this
(cleanly).

Even the latest STL shipped with Microsoft Visual C++ does not use COW
anymore for those reasons, and I'm sure they are not alone in having done that.
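
To make the thread-safe ref-count point concrete, here is a minimal C++ sketch of a COW string whose shared counter is a std::atomic. The names CowString, Rep, shares_with and cow_demo are invented for this illustration, and this is not claimed to be how any particular standard library does it. The point to notice is that every copy, every destruction and every write touches the shared counter atomically, even when only one thread ever uses the string.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal copy-on-write string, for illustration only.
class CowString {
    // Shared representation: the text plus an atomic owner count.
    struct Rep {
        std::atomic<std::size_t> refs{1};
        std::string text;
        explicit Rep(std::string t) : text(std::move(t)) {}
    };
    Rep* rep_;

    // Drop one reference; fetch_sub returns the previous value,
    // so 1 means this was the last owner.
    void release() {
        if (rep_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete rep_;
        }
    }

public:
    explicit CowString(std::string t) : rep_(new Rep(std::move(t))) {}

    // Copying shares the representation: one atomic increment, no text copy.
    CowString(const CowString& other) : rep_(other.rep_) {
        rep_->refs.fetch_add(1, std::memory_order_relaxed);
    }

    CowString& operator=(const CowString& other) {
        if (rep_ != other.rep_) {
            other.rep_->refs.fetch_add(1, std::memory_order_relaxed);
            release();
            rep_ = other.rep_;
        }
        return *this;
    }

    ~CowString() { release(); }

    const std::string& str() const { return rep_->text; }
    bool shares_with(const CowString& o) const { return rep_ == o.rep_; }

    // Every write must first check the counter and unshare if needed.
    void append(char c) {
        if (rep_->refs.load(std::memory_order_acquire) != 1) {
            Rep* fresh = new Rep(rep_->text);  // the actual "copy on write"
            release();
            rep_ = fresh;
        }
        rep_->text.push_back(c);
    }
};

// Small self-check of the behaviour described above.
bool cow_demo() {
    CowString a("hello");
    CowString b = a;                        // shared, no text copied yet
    bool shared_before = a.shares_with(b);
    a.append('!');                          // a is shared, so it copies first
    return shared_before && !a.shares_with(b)
        && a.str() == "hello!" && b.str() == "hello";
}
```

Calling cow_demo() exercises one share and one unshare; appending in a loop would repeat the atomic check on every call.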


September 11, 2003

Philippe Mori wrote:
>
> "Ilya Minkov" <minkov@cs.tum.edu> wrote in message news:bjnk9u$2joq$1@digitaldaemon.com...
> > Helium wrote:
> > > As said in the topic, D seems to use COW. In single-threaded applications
> > > it can really speed things up. But in the multithreaded world we have today, it
> > > can really slow things down.
> >
> > Why would it actually slow things down? And how does it depend on threads? The only concern I'm aware of is old strings that are left to the garbage collector -- which happens to lots of other stuff anyway. The GC is currently somewhat spartan with respect to threads and could use some improvement, but that will change someday.
> >
>
> It is well known in C++ that COW string implementations are either slower
> or only marginally faster in typical multi-threaded applications... and they
> are not worth the increased complexity and bugs...
>
> The problem is essentially that it is hard to have a string class in C++ that
> is both thread-safe and efficient... Even though the client needs to use
> some critical sections (or mutexes) for safe access, the library must implement
> a thread-safe ref-count, as it is not possible for the user to do this
> (cleanly).
>
> Even the latest STL shipped with Microsoft Visual C++ does not use COW
> anymore for those reasons, and I'm sure they are not alone in having done that.

D uses a different garbage collection method that is not based on reference
counting. While it may have other disadvantages it should be robust in
a multi-threaded system.

-- 
Helmut Leitner    leitner@hls.via.at
Graz, Austria   www.hls-software.com
September 11, 2003
> >
> > It is well known in C++ that COW string implementations are either slower
> > or only marginally faster in typical multi-threaded applications... and they
> > are not worth the increased complexity and bugs...
> >
> > The problem is essentially that it is hard to have a string class in C++ that
> > is both thread-safe and efficient... Even though the client needs to use
> > some critical sections (or mutexes) for safe access, the library must implement
> > a thread-safe ref-count, as it is not possible for the user to do this
> > (cleanly).
> >
> > Even the latest STL shipped with Microsoft Visual C++ does not use COW
> > anymore for those reasons, and I'm sure they are not alone in having done that.
>
> D uses a different garbage collection method that is not based on reference
> counting. While it may have other disadvantages it should be robust in
> a multi-threaded system.
>

But then, does copying a string make a real copy or just take another reference to it?

What will happen in D with the following example?

string a = "hello";
string b;
b = a;
a = "goodbye";

In C++, a would be "goodbye" and b "hello". If the last line is
removed, we have one copy of the string if COW is used and
two otherwise.

So if you want a single copy of the actual text in D when you do something like the above, you need COW, and this has almost nothing to do with GC.

OTOH, if copies are references to the same string, then it would be faster, but you would have to ask explicitly for a copy if you want one:

b = a.clone();

And if we always make a copy, then it would be the same as in C++ without COW.
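
The C++ outcome described above can be checked with std::string, which has value semantics regardless of whether a given implementation shares buffers internally. A minimal sketch follows; value_semantics_demo is a name made up for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: runs the four-line example using std::string.
bool value_semantics_demo() {
    std::string a = "hello";
    std::string b;
    b = a;            // b becomes a logical copy of "hello"
    a = "goodbye";    // reassigning a must not affect b
    return a == "goodbye" && b == "hello";
}
```

How many physical copies of the text exist after b = a is an implementation detail that this check deliberately does not observe.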


September 11, 2003
Philippe Mori wrote:

> But then, does copying a string make a real copy or just take another
> reference to it?
>
> What will happen in D with the following example?
> 
> string a = "hello";
> string b;
> b = a;
> a = "goodbye";

Given that you use char[] instead of string, because we do not have a string class:

1. a is created and is a slice of a constant "hello";
2. b becomes a slice of a - they point to the same data which is constant;
3. a becomes a slice of a constant "goodbye".

Reminder: a slice is a structure holding the start address and the length of an array.
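
The three steps above can be modelled in a few lines of C++. This is only a sketch of the idea of a slice as a pointer plus a length; Slice, kHello, kGoodbye and slice_demo are hypothetical names, and the code does not claim to match D's actual array layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a slice: a start address plus a length.
struct Slice {
    const char* ptr;
    std::size_t length;
};

// Constant data standing in for the two string literals.
static const char kHello[] = "hello";
static const char kGoodbye[] = "goodbye";

bool slice_demo() {
    Slice a{kHello, sizeof(kHello) - 1};       // 1. a is a slice of the constant "hello"
    Slice b = a;                               // 2. b copies pointer and length, not the text
    bool shared = (a.ptr == b.ptr);
    a = Slice{kGoodbye, sizeof(kGoodbye) - 1}; // 3. a now refers to the constant "goodbye"
    // b still refers to "hello"; no character data was copied at any step.
    return shared && b.ptr == kHello && b.length == 5
        && a.ptr == kGoodbye && a.length == 7;
}
```

Assignment here only rebinds the pointer and length, which is why no reference count is involved in this particular example.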

There are obviously cases which will make them point to actually allocated arrays. You should then simply make sure that you overwrite nothing, and let the GC pick up your leftovers.

> OTOH, if copies are references to the same string, then it would be faster,
> but you would have to ask explicitly for a copy if you want one

So it is.

> b = a.clone();

I think it should read "b = a.dup;"

> And if we always make a copy, then it would be the same as in C++
> without COW.

No, you don't want to always make copies, just on writes...
Another difference is that C++ can use its destruction rules instead of a GC.

Sure, you can ensure C++ follows similar behaviour by using the const qualifier. That's how we did this in Delphi.

Hey! This leads me to a new idea:
A function should automatically duplicate an array at the beginning if the array is "in" qualified and could be written to within the function. Const-ness is implicit, but it doesn't affect how the function is interfaced, which is determined by the "in" qualifier, so the conventions are not broken.

Sometimes, it may be desirable that the array is not duplicated, and writes go back to the original array. In this case, it should be qualified "inout"!

This regulation might be expanded to other things like Objects...

-eye.

September 12, 2003
> >
> > What will happen in D with the following example?
> >
> > string a = "hello";
> > string b;
> > b = a;
> > a = "goodbye";
>
> Given that you use char[] instead of string, because we do not have a string class:
>
> 1. a is created and is a slice of a constant "hello";
> 2. b becomes a slice of a - they point to the same data which is constant;
> 3. a becomes a slice of a constant "goodbye".
>
> Reminder: a slice is a structure holding the start address and the length of an array.
>
>

My sample was not well chosen... I'm relatively new to D and I took one of the simplest examples I could imagine... What would happen if a = "goodbye"; were replaced by a call to a function that modifies the content of the string, like:

a ~= " world";    // append at end if I remember well

Since a and b were shared before that, a copy must be made at that time... and to know that they are shared, we need a reference count (since otherwise we would need to ask the GC whether the data is used more than once, which would be very slow).

Thus COW must be used and we face the same problem as in C++ where the performance will degrade a lot if we do a lot of modifications like appending one char at a time...

From what I understand, in C++ the problem comes from the fact that we must lock very often, and in some implementations slow thread synchronisation is or must be used... and it is not easy to provide an implementation that works correctly with only locked arithmetic and comparison operations...