Thread overview: Strings in D
Matthias Becker, Oct 29, 2003
Lars Ivar Igesund, Oct 29, 2003
Matthew Wilson, Oct 29, 2003
Ilya Minkov, Oct 29, 2003
October 29, 2003
On www.digitalmars.com/d/cppstrings.html you can read the following:

(quote)
C++ strings, as implemented by STLport, are by value and are 0-terminated. [The
latter is an implementation choice, but STLport seems to be the most popular
implementation.] This, coupled with no garbage collection, has some
consequences. First of all, any string created must make its own copy of the
string data. The 'owner' of the string data must be kept track of, because when
the owner is deleted all references become invalid. If one tries to avoid the
dangling reference problem by treating strings as value types, there will be a
lot of overhead of memory allocation, data copying, and memory deallocation.
Next, the 0-termination implies that strings cannot refer to other strings.
String data in the data segment, stack, etc., cannot be referred to.

D strings are reference types, and the memory is garbage collected. This means
that only references need to be copied, not the string data. D strings can refer
to data in the static data segment, data on the stack, data inside other
strings, objects, file buffers, etc. There's no need to keep track of the
'owner' of the string data.

The obvious question is if multiple D strings refer to the same string data, what happens if the data is modified? All the references will now point to the modified data. This can have its own consequences, which can be avoided if the copy-on-write convention is followed. All copy-on-write means is that if a string is written to, an actual copy of the string data is made first.

The result of D strings being reference only and garbage collected is that code
that does a lot of string manipulating, such as an lzw compressor, can be a lot
more efficient in terms of both memory consumption and speed.
(/quote)


Sorry, but this text is a bit stupid. It seems like you assume C++ coders are stupid. C++ has references. If you pass a string to a function, you pass it by reference, of course.

void foo (const std::string & the_string)
{ ... }

The problem of missing garbage collection is solved by smart pointers; the most common one is boost::shared_ptr. And I know no good C++ coder who doesn't use Boost (www.boost.org), so please compare D with C++ plus Boost, because anything else is not pragmatic.

And about your copy-on-write "optimization": read the following (it's only the third part; you can find the other articles on the same site): http://www.gotw.ca/gotw/045.htm


October 29, 2003
"Matthias Becker" <Matthias_member@pathlink.com> wrote in message news:bnobm4$542>

> The problem of missing garbage collection is solved by smart pointers; the
> most common one is boost::shared_ptr. And I know no good C++ coder who
> doesn't use Boost (www.boost.org), so please compare D with C++ plus Boost,
> because anything else is not pragmatic.

I know several, and they are among the best.

Lars Ivar Igesund


October 29, 2003
> And I know no good C++ coder who doesn't use Boost

Are you kidding? Are we in a world where there's only one way to do things? Isn't that Java?


-- 
Matthew Wilson

STLSoft moderator and C++ monomaniac       (http://www.stlsoft.org)
Contributing editor, C/C++ Users Journal
(www.synesis.com.au/articles.html#columns)

"But if less is more, think how much more more will be!" -- Dr Frazier Crane




October 29, 2003
Matthias Becker wrote:

> Sorry, but this text is a bit stupid. It seems like you assume C++ coders are
> stupid. C++ has references. If you pass a string to a function, you pass it by
> reference, of course.

True, since it is usually evident whether you modify a string or not. The cases where it is not certain are really not worth worrying about.

It's almost the same in D, except that there you can decide at run time whether or not to copy a string.

> void foo (const std::string & the_string)
> { ... }
> 
> The problem of missing garbage collection is solved by smart pointers; the
> most common one is boost::shared_ptr. And I know no good C++ coder who
> doesn't use Boost (www.boost.org), so please compare D with C++ plus Boost,
> because anything else is not pragmatic.

Some C++ programmers rely more on reference-counted smart pointers, others on global garbage collection, depending on prior experience and the project at hand. The current D GC is no better than Boehm's C++ GC, but it has the potential to become up to two orders of magnitude faster, and thus less obtrusive.

> And about your copy-on-write "optimization": read the following (it's only the
> third part; you can find the other articles on the same site):
> http://www.gotw.ca/gotw/045.htm

This article is about "smart" string implementations, which *force* COW on strings. They check the reference count on each operation, so in a multithreaded program every access to the count must be atomic. In D, however, strings are plain "dumb" garbage-collected slices. They don't really use COW; it is only a convention that all library functions return a copy of a string if they modify it, instead of changing the existing one. If the old string is no longer used, it will eventually be collected by the GC. No count is ever maintained or checked, so there is no such interference. It is equivalent to smart use of a copying string implementation in C++, and only marginally slower because of the GC, which is unavoidable since D doesn't support guaranteed scope-based destruction as C++ does.

-eye