August 01, 2006
Andrei Khropov wrote:
> Dawid Ciężarkiewicz wrote:
> 
> 
>>I'd rather wait until the const/immutability problem in D is resolved. Don't
>>forget that an additional "option" has a runtime cost. There are some
>>const/immutability proposals that could help by providing compile-time
>>information to deal with your proposal.
> 
> 
> I agree. Adding an additional parameter doesn't seem like a good idea: it
> raises the question of whether the default behavior will be to copy or not,
> and it introduces the possibility of subtle errors when the flag is
> mistakenly omitted.
> 


"to CoW or not to CoW ~ that is the question ..."

"to err is human; to moo is bovine"

August 02, 2006
Dawid Ciężarkiewicz wrote on 2006-08-01:
> Lionello Lunesu wrote:
>
>> 
>> "Dave" <Dave_member@pathlink.com> wrote in message news:ealack$bjg$1@digitaldaemon.com...
>>>
>>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>>
>>> For example, toupper and tolower?
>>>
>>> How many times have we seen something like this:
>>>
>>> str = toupper(str); // or equivalent in another language.
>> 
>> str being a UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems quite possible to me that some uppercase Unicode characters are larger than their lowercase versions, possibly crossing a UTF-8 byte-count border. But there are other string functions that don't have this problem.
>
> This _is_ a problem.

http://www.unicode.org/reports/tr21/

from ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt
# The data supports both implementations that require simple case
# foldings (where string lengths don't change), and implementations
# that allow full case folding (where string lengths may grow).

Simple case folding keeps the code point count constant; the UTF-8 fragment count, however, is a problem. Currently (CaseFolding.txt 5.0.0, 2006-03-03, 08:22:43 GMT) there are 9 + 2 cases where the fragment count changes:

# 017F; C; 0073; # LATIN SMALL LETTER LONG S
# 023A; C; 2C65; # LATIN CAPITAL LETTER A WITH STROKE
# 023E; C; 2C66; # LATIN CAPITAL LETTER T WITH DIAGONAL STROKE
# 1FBE; C; 03B9; # GREEK PROSGEGRAMMENI
# 2126; C; 03C9; # OHM SIGN
# 212A; C; 006B; # KELVIN SIGN
# 212B; C; 00E5; # ANGSTROM SIGN
# 2C62; C; 026B; # LATIN CAPITAL LETTER L WITH MIDDLE TILDE
# 2C64; C; 027D; # LATIN CAPITAL LETTER R WITH TAIL

Only used for Turkic languages (tr, az):
# 0049; T; 0131; # LATIN CAPITAL LETTER I
# 0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
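
The byte counts behind the list above can be checked mechanically. This is a minimal sketch in present-day D (D2), not the 2006 D1 dialect used in this thread; `utf8Delta` is a hypothetical helper name, and it relies on `std.utf.codeLength` to report how many UTF-8 code units a code point needs.

```d
import std.utf : codeLength;

/// Hypothetical helper: change in UTF-8 code units ("fragments") when the
/// code point `from` is case-folded to `to`.
/// Positive means the encoded form grows, negative means it shrinks.
int utf8Delta(dchar from, dchar to)
{
    return cast(int) codeLength!char(to) - cast(int) codeLength!char(from);
}
```

For example, `utf8Delta('\u212A', 'k')` is -2 (KELVIN SIGN needs three bytes, `k` needs one), while `utf8Delta('\u023A', '\u2C65')` is 1, which is the case that rules out a strictly in-place conversion.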

Thomas


August 02, 2006
On Mon, 31 Jul 2006 11:18:40 -0500, Dave <Dave_member@pathlink.com> wrote:
> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>
> For example, toupper and tolower?
>
> How many times have we seen something like this:
>
> str = toupper(str); // or equivalent in another language.

I think it's the right idea, but I think it's simply a variation of the idea that the array itself needs a flag to tell functions whether they have to copy, or can modify in place. A 'readonly' flag, as mentioned here in other threads.

I'd prefer the flag was internal to the array so that my function signatures were simpler and less cluttered by things not directly related to the function.

That said, your idea can be implemented right now. The internal array flag requires Walter to agree and change D's arrays.

Regan
August 03, 2006
Dave wrote:
> 
> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
> 
> For example, toupper and tolower?
> 
> How many times have we seen something like this:
> 
> str = toupper(str); // or equivalent in another language.

Why not:

    str = toupper(str);     // in-place
    str = toupper(str.dup); // COW

or alternately:

    char[] toupper(char[] src, char[] dst = null);

where dst is an optional destination argument.
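
A rough sketch of how the optional-destination variant might behave, written in present-day D (D2) rather than 2006 D1. The name `toUpperAscii` and the ASCII-only restriction are assumptions made for the example; restricting to ASCII sidesteps the UTF-8 length changes discussed earlier in the thread.

```d
/// Hypothetical ASCII-only uppercase with an optional destination buffer.
/// If `dst` is null or too small, a new buffer is allocated.
/// Passing the same array as `src` and `dst` converts in place.
char[] toUpperAscii(const(char)[] src, char[] dst = null)
{
    if (dst.length < src.length)
        dst = new char[src.length];
    foreach (i, c; src)
        dst[i] = (c >= 'a' && c <= 'z') ? cast(char)(c - ('a' - 'A')) : c;
    return dst[0 .. src.length];
}
```

With this shape, `toUpperAscii(buf, buf)` modifies `buf` in place, while `toUpperAscii(buf)` leaves `buf` alone and returns a fresh array.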


Sean
August 03, 2006
Sean Kelly wrote:
> Dave wrote:
>>
>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>
>> For example, toupper and tolower?
>>
>> How many times have we seen something like this:
>>
>> str = toupper(str); // or equivalent in another language.
> 
> Why not:
> 
>     str = toupper(str);     // in-place
>     str = toupper(str.dup); // COW
> 

That's how I think things should be, but it might break a lot of code now <g>

> or alternately:
> 
>     char[] toupper(char[] src, char[] dst = null);
> 
> where dst is an optional destination argument.
> 
> 
> Sean
August 03, 2006
> Why not:
> 
>     str = toupper(str);     // in-place
>     str = toupper(str.dup); // COW

This is not copy on write. That is simply 'always copy', and this performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.
August 03, 2006
Reiner Pope wrote:
>> Why not:
>>
>>     str = toupper(str);     // in-place
>>     str = toupper(str.dup); // COW
> 
> This is not copy on write. That is simply 'always copy', and this 

But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.

> performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.

That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer.

What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example).

Ultimately, most data that is modified is then used in its modified form for the rest of the program's "lifetime", and the original data can be obtained again from its source (e.g. by re-reading it from disk) if needed, instead of having to keep copies around.

I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.

- Dave
August 03, 2006
Dave wrote:
> Reiner Pope wrote:
>>> Why not:
>>>
>>>     str = toupper(str);     // in-place
>>>     str = toupper(str.dup); // COW
>>
>> This is not copy on write. That is simply 'always copy', and this 
> 
> But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.
> 
>> performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.
> 
> That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer.
> 
> What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example).
> 
> Ultimately, most data that is modified is then used in its modified form for the rest of the program's "lifetime", and the original data can be obtained again from its source (e.g. by re-reading it from disk) if needed, instead of having to keep copies around.
> 
> I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
> 
> - Dave

While I'm not convinced that CoW is such a bad situation, I agree with you that it is not perfect. However, a proper solution would need to make use of two facts:
 - the caller knows best whether the array may be edited in-place
 - whether the string should be modified in-place is often not known at compile time.
Together these require passing a bool that indicates whether the array should be copied on write or not, which is just what you suggest.
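
A minimal sketch of the flag-passing variant, in present-day D (D2). The name `toUpperFlag`, the ASCII-only handling, and the default of `false` are assumptions for illustration; the original proposal says only that each function's default would be whatever it does now. For brevity the non-in-place branch here always copies instead of doing a true copy-on-write check.

```d
/// Hypothetical ASCII-only uppercase with a run-time "in place" flag.
/// inPlace == false: the argument is left untouched and a copy is returned.
/// inPlace == true:  the argument itself is modified and returned.
char[] toUpperFlag(char[] s, bool inPlace = false)
{
    char[] r = inPlace ? s : s.dup;
    foreach (ref c; r)
        if (c >= 'a' && c <= 'z')
            c -= 'a' - 'A';
    return r;
}
```

The hazard raised earlier in the thread shows up directly: `toUpperFlag(a)` and `toUpperFlag(a, true)` both compile, so omitting the flag by mistake silently changes which array is written to.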

However, to support this with the nicest code, it would be best for it to be both compiler-checked and language-supported. Of course, this is just advertising for the rocheck type modifier I'm proposing in YACP. A further benefit of language support is that, when a call is inlined and the read-onlyness is known at compile time, the CoW check may be optimized away.

Cheers,

Reiner
August 03, 2006
Dave wrote:
> Reiner Pope wrote:
>>> Why not:
>>>
>>>     str = toupper(str);     // in-place
>>>     str = toupper(str.dup); // COW

What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:

foreach(e; arr.reverse) { ... }
// OOPS, arr is now reversed

.dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.
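
The distinction can be sketched as a pair of functions, here in present-day D (D2). `reverseInPlace` and `reversed` are hypothetical names, not Phobos functions, and the `int[]` element type is chosen only to keep the example short.

```d
/// In-place: returns void, so it cannot be used as the aggregate of a
/// foreach or inside a larger expression by mistake.
void reverseInPlace(int[] a)
{
    foreach (i; 0 .. a.length / 2)
    {
        immutable tmp = a[i];
        a[i] = a[$ - 1 - i];
        a[$ - 1 - i] = tmp;
    }
}

/// Functional: leaves the argument untouched and returns a reversed copy
/// built in a single pass, with no separate .dup followed by a swap loop.
int[] reversed(const(int)[] a)
{
    auto r = new int[a.length];
    foreach (i, x; a)
        r[$ - 1 - i] = x;
    return r;
}
```

With these signatures, `foreach (e; reversed(arr)) { ... }` leaves `arr` alone, and `foreach (e; reverseInPlace(arr))` is rejected by the compiler because a void expression cannot be iterated.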

>> This is not copy on write. That is simply 'always copy', and this 
> 
> But presumably the user would only do the dup if they didn't want to modify str, so CoW would basically go away as a design pattern.
> 
>> performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.
> 
> That's what I'm questioning ultimately. The caller knows best if the object that _they created_ should be modified or copied and they can do that best before a call to a modifying function. No matter if that happens to be the developer of another lib. function or an application programmer.
> 
> What's more, CoW for arrays is inconsistent with how other reference objects are treated (class objects are really not made for CoW - there's not even a rudimentary copy ctor provided by the language. Same with AA's, which don't have a .dup for example).

> 
> Ultimately, most data that is modified is then used in its modified form for the rest of the program's "lifetime", and the original data can be obtained again from its source (e.g. by re-reading it from disk) if needed, instead of having to keep copies around.

> 
> I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.

Consider the following (just made up) case-insensitive multi-file word count application:

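import std.stdio;
import std.file;
import std.string;

void main(char[][] args) {
        // maps each lowercased word to its number of occurrences
        int[char[]] wc;
        foreach(filename; args[1..$]) {
                // read the whole file into one array
                char[] data = cast(char[]) read(filename);
                // each word is a slice of data; tolower copies only on write
                foreach(word; data.split())
                        wc[tolower(word)]++;
        }
        writefln("num words: ",wc.length);
}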

If you ran this program on the full collection of 18,000 Gutenberg books, you would inevitably run out of memory. Why would that happen when a standard English dictionary only occupies a couple of megabytes?

Without knowing the intricate details of D and Phobos, I bet you would have no way of knowing that you got killed by the cow. :)
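
One reading of the problem is that a copy-on-write tolower hands back the original slice whenever a word is already lowercase, so the associative array key still points into the file buffer, and the garbage collector must keep the whole buffer alive. The sketch below, in present-day D (D2), illustrates that aliasing; `toLowerCow` and `aliases` are hypothetical ASCII-only helpers written for this example, not the Phobos implementation.

```d
/// Hypothetical ASCII-only copy-on-write lowercase: the input slice is
/// returned unchanged unless an uppercase letter forces a copy.
const(char)[] toLowerCow(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'A' && c <= 'Z')
        {
            char[] r = s.dup;           // first difference: copy once
            foreach (ref ch; r[i .. $])
                if (ch >= 'A' && ch <= 'Z')
                    ch += 'a' - 'A';
            return r;
        }
    }
    return s;                           // nothing to change: no copy
}

/// True if `part` points into the memory occupied by `whole`.
bool aliases(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr && part.ptr < whole.ptr + whole.length;
}
```

Given `char[] data = "Moo said the cow".dup;`, the call `toLowerCow(data[4 .. 8])` returns a slice for which `aliases(result, data)` is true, while `toLowerCow(data[0 .. 3])` returns an independent copy. Storing `result.idup` as the key instead would let the file buffer be collected.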

/Oskar
August 03, 2006
Reiner Pope wrote:
>> Why not:
>>
>>     str = toupper(str);     // in-place
>>     str = toupper(str.dup); // COW
> 
> This is not copy on write. That is simply 'always copy', and this performs worse than COW (which in turn performs worse than in-place, if in-place is possible). Walter has also said earlier that, with COW, it should be the responsibility of the writer to ensure the copy, not the caller.

To do true COW, toupper would have to test every element against its uppercase equivalent--the first diff would cause a copy to occur.  For mutating algorithms such as this, I think it makes more sense for them to always change the data in place if possible and to document them as such.


Sean