COW vs. in-place. (page 4) - D Programming Language Discussion Forum

August 03, 2006

Re: COW vs. in-place.

Posted by Sean Kelly
in reply to Oskar Linde

Oskar Linde wrote:
> Dave wrote:
>> Reiner Pope wrote:
>>>> Why not:
>>>>
>>>>     str = toupper(str);     // in-place
>>>>     str = toupper(str.dup); // COW
> 
> What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:
> 
> foreach(e; arr.reverse) { ... }
> // OOPS, arr is now reversed

I like returning the mutated value so the function call can be embedded in other code.  And arr.reverse is already a built-in mutating function, according to the spec.

> .dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.

Why?


Sean

August 03, 2006

Re: COW vs. in-place.

Posted by Oskar Linde
in reply to Sean Kelly

Sean Kelly wrote:
> Oskar Linde wrote:
>> Dave wrote:
>>> Reiner Pope wrote:
>>>>> Why not:
>>>>>
>>>>>     str = toupper(str);     // in-place
>>>>>     str = toupper(str.dup); // COW
>>
>> What is the advantage of redundantly assigning the result of an in-place function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:
>>
>> foreach(e; arr.reverse) { ... }
>> // OOPS, arr is now reversed
> 
> I like returning the mutated value so the function call can be embedded in other code.  

I have already seen the above foreach error in others' D code.
I believe it is good library design to clearly mark functions with side-effects. Giving them a void return type will prevent any mistake of the following kind (assume toupper is in-place modifying as well as returning):

func(toupper(mystring));
func(arr.reverse);

where the side effect was unintended.
Should these also be errors?

arr2 = arr1.reverse;

toupper(mystring) ~ mystring;
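
To make the concern concrete, here is a minimal sketch in present-day D (D2), not the 2006 Phobos API. The helper name `upperInPlace` is made up for illustration and handles ASCII only. It shows that when an in-place function also returns its argument, the "result" and the original are the same memory:

```d
import std.ascii : toUpper;

// Hypothetical in-place helper that also returns its argument,
// i.e. the style being questioned above. ASCII only.
char[] upperInPlace(char[] s)
{
    foreach (ref c; s)
        c = toUpper(c);
    return s; // same memory as the argument, not a copy
}

unittest
{
    char[] name = "dave".dup;
    char[] shout = upperInPlace(name); // reads like a pure expression...
    assert(shout is name);             // ...but both slices alias one buffer
    assert(name == "DAVE");            // so the caller's data changed too
}
```

With a void return type, the second line of the unittest would not compile, which is the point being argued.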

> And arr.reverse is already a built-in mutating function, according to the spec.

Yes. I find that unfortunate and inconsistent with how Phobos is designed. Luckily, arr.sort and arr.reverse are not callable as arr.sort() and arr.reverse(), so they really don't look like functions.

>> .dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.
> 
> Why?

What I meant was that .dup + inplace will never be more efficient than a copying algorithm. In-place algorithms are often more complicated. If you want a copy anyway, it is more efficient to use a copying algorithm. As an example, consider stable sorting, where efficient copying algorithms are trivial.
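
A rough sketch of the difference, written in present-day D (D2) with made-up function names and ASCII-only conversion. The first route writes the output buffer twice (once in `.dup`, once in the loop); the second writes each output character once. No timing claim is made here, only the shape of the two approaches:

```d
import std.array : uninitializedArray;
import std.ascii : toUpper;

// Route 1: duplicate, then run an in-place pass over the copy.
char[] upperViaDup(const(char)[] s)
{
    char[] copy = s.dup;
    foreach (ref c; copy)
        c = toUpper(c);
    return copy;
}

// Route 2: a copying algorithm. Each converted character goes straight
// into a fresh, uninitialized buffer.
char[] upperCopy(const(char)[] s)
{
    auto result = uninitializedArray!(char[])(s.length);
    foreach (i, c; s)
        result[i] = toUpper(c);
    return result;
}

unittest
{
    const(char)[] src = "hello";
    assert(upperViaDup(src) == "HELLO");
    assert(upperCopy(src) == "HELLO");
    assert(src == "hello"); // neither route touches the input
}
```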

Re: Library design

I would like to see both copying and in-place versions of algorithms where it makes sense, but only one behavior should be default. That default should be consistent throughout the standard library and preferably be recommended in an official style guide for third party libraries to follow.

I see two valid designs:

1. in-place default, copying algorithms specially named
-------------------------------------------------------

Design:
void toUpper(char[] str); // in-place
char[] toUpperCopy(char[] str); // copy

Pros:
* in-place is often more efficient and therefore default.
* many functions are imperative verbs, and as such one expects them to be modifying
* Similar to how the C++ STL is designed
Cons:
* many functions can not be expressed in-place (example: UTF-8 toUpper)

2. copying default, in-place versions specially named
-----------------------------------------------------

Design:
void toUpperInPlace(char[] str); // in-place
char[] toUpper(char[] str); // copy

Pros:
* copying is safer, and is therefore a better default
* in-place is an optimization and would stand out as such
* default is functional (no-side effects), side effects stand out
* people used to functional style programming would not find any surprises
* all functions can be defined as copying functions
* this is how many popular languages are designed (Ruby, Python, PHP, all "functional" languages, etc.)
Cons:
* could confuse people, lead to silent errors:
toupper(str); // doesn't change str
cos(x); // doesn't change x ;)

For the record, I am in favor of number 2 and that would have biased the arguments above.
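
Design 2 could look roughly like this in present-day D (D2). This is only a sketch under assumptions: the names `upper` and `upperInPlace` are invented to avoid clashing with real Phobos symbols, and the conversion is ASCII-only. It also shows the "silent error" listed under Cons:

```d
import std.ascii : toUpper;

// In-place variant: specially named, void return.
void upperInPlace(char[] str)
{
    foreach (ref c; str)
        c = toUpper(c);
}

// Copying default: never modifies its argument.
char[] upper(const(char)[] str)
{
    char[] copy = str.dup;
    upperInPlace(copy);
    return copy;
}

unittest
{
    char[] s = "abc".dup;

    upper(s);               // compiles, but the result is thrown away
    assert(s == "abc");     // s is unchanged: the silent error

    s = upper(s);           // the intended use of the copying default
    assert(s == "ABC");

    char[] t = "xyz".dup;
    upperInPlace(t);        // the side effect is visible in the name
    assert(t == "XYZ");
}
```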

/Oskar

August 03, 2006

Re: COW vs. in-place.

Posted by Oskar Linde
in reply to Dave

Dave wrote:
> 
> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.

There are at least three ways an array algorithm can operate:
- in-place
- copying
- CoW

In this case, CoW would mean a function that makes a copy in all cases except when the return value would be identical to the argument. As such, it is semantically very close to the copying version.

It would make more sense to have separate in-place and copying functions, and add a possible runtime CoW-flag to the copying function.

I don't think a runtime flag for CoW vs. in-place makes much sense when the compile-time semantics are different.

An efficient implementation of a copying algorithm would also often be quite different from an in-place version, which argues for separate functions.
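
A minimal sketch of the CoW behaviour described above, in present-day D (D2). The name `upperCow` is hypothetical, the conversion is ASCII-only, and the `const` return type is a D2 feature that did not exist in 2006; it is used here so the caller cannot write through a result that may alias the input:

```d
import std.ascii : isLower, toUpper;

// Copy-on-write: allocate only if something actually has to change.
const(char)[] upperCow(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (isLower(c))
        {
            // First character that must change: copy once,
            // then finish the conversion inside the copy.
            char[] copy = s.dup;
            foreach (ref d; copy[i .. $])
                d = toUpper(d);
            return copy;
        }
    }
    return s; // nothing to change: hand back the caller's own slice
}

unittest
{
    const(char)[] already = "ABC";
    assert(upperCow(already) is already); // no copy was made

    const(char)[] mixed = "aBc";
    auto r = upperCow(mixed);
    assert(r == "ABC");
    assert(r !is mixed);                  // a fresh copy
    assert(mixed == "aBc");               // original intact
}
```

Seen from the caller, the only observable difference from the plain copying version is whether the result is identical (`is`) to the argument.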

/Oskar

August 03, 2006

Re: COW vs. in-place.

Posted by Dave
in reply to Oskar Linde

Oskar Linde wrote:
> 
> 1. in-place default, copying algorithms specially named
> -------------------------------------------------------
> 
> Design:
> void toUpper(char[] str); // in-place
> char[] toUpperCopy(char[] str); // copy
> 
> Pros:
> * in-place is often more efficient and therefore default.
> * many functions are imperative verbs, and as such one expects them to be modifying
> * Similar to how the C++ STL is designed
> Cons:
> * many functions can not be expressed in-place (example: UTF-8 toUpper)
> 

Hmmm - Is the current implementation of std.string.toupper wrong then?

(If you removed the if(!changed) {...} blocks [where the CoW is milked] you would effectively have an in-place implementation).

> 
> 2. copying default, in-place versions specially named
> -----------------------------------------------------
> 
> Design:
> void toUpperInPlace(char[] str); // in-place
> char[] toUpper(char[] str); // copy
> 
> Pros:
> * copying is safer, and is therefore a better default

Only if the coder expects that to be the default, *and* they most often need the original data intact later in the program.

And that safety is not much of an advantage when your code is three-legged dog slow and eats up resources that could be used by other processes :) Walking to work may be safer than going 70 MPH on the freeway, but it would take me a week and I'd starve.

> * in-place is an optimization and would stand out as such

It's only considered an 'optimization' right now because it's different from the default (CoW).

> * default is functional (no-side effects), side effects stand out
> * people used to functional style programming would not find any
> surprises
> * all functions can be defined as copying functions
> * how many popular languages are designed (Ruby, Python, php, all "functional" languages, etc...)

Yes, but all of these are languages where performance is not an imperative (excepting some of the functional languages perhaps). Plus think of all the time and effort that have been spent on GC's because of this design choice :)

> Cons:
> * could confuse people, lead to silent errors:
> toupper(str); // doesn't change str
> cos(x); // doesn't change x ;)
> 
> For the record, I am in favor of number 2 and that would have biased the arguments above.

Likewise I'm in favor of #1 ;)

> 
> /Oskar

August 03, 2006

Re: COW vs. in-place.

Posted by renox
in reply to Dave

Dave wrote:
> 
> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
> 
> For example, toupper and tolower?
> 
> How many times have we seen something like this:
> 
> str = toupper(str); // or equivalent in another language.

In Ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a.

Something like this would be nice; the hard part is choosing a naming convention that people will actually follow.

functionXIP (eXecute In Place), functionWSE (With Side Effect)?
Sigh, it's hard to achieve something as simple and elegant as '!': caution, this function modifies the object!

In the absence of a proper naming suffix, an optional parameter could be used, yes.

Regards,
Renaud Hebert

> 
> Thanks,
> 
> - Dave

August 03, 2006

Re: COW vs. in-place.

Posted by Reiner Pope
in reply to Oskar Linde

Oskar Linde wrote:
> Dave wrote:
>>
>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
> 
> There are at least three ways an array algorithm can operate:
> - in-place
> - copying
> - CoW

To the caller, however, there are only two situations (in an ideal world with adequate const protection*):
 - modifies my copy (in-place)
 - doesn't modify my copy

As long as the function sticks to what it promises, then it should be free to implement it in the fastest/easiest way possible.

*I know that there is a difference at the moment: with CoW, you have to be careful about modifying the returned value, because it might also be your original, in which case you would be modifying both. However, this is where const protection helps, especially the runtime flag included in rocheck.

> It would make more sense to have separate in-place and copying functions, and add a possible runtime CoW-flag to the copying function.
> 
When would you ever want the copying function instead of the CoW function? Most of the time, the overhead of keeping track of CoW is minimal, and in the situations where CoW requires no copying, it has a huge advantage. The only situation where choosing copying makes sense is if you have determined that the CoW bookkeeping costs too much. In that case, however, you probably wouldn't want to pass the flag at runtime, but rather select it at compile time.

> I don't think a runtime flag for CoW vs in-place does make much sense when the compile time semantics are different.
> 
> An efficient implementation of a copying algorithm would also often be quite different from an in-place version, speaking for separate functions.

There's a simple solution to this:

// If the implementations for in-place and copying are substantially different, then wrap them like this
rocheck T[] sort(rocheck T[] array)
{
    if (array.isMutable())
        return inPlaceSort(array.ensureWritable());
    else
        return copyingSort(array);
}

// If there is no real difference, put them together in the one function
rocheck dchar[] toupper(rocheck dchar[] array)
{
    // Do some stuff and call ensureWritable() when required, which manages whether copying is necessary behind the scenes
}

The point behind the runtime flag is that the required checking can be made to be low overhead, with O(1) cost, whereas unnecessary copying has O(n) cost.
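
Since `rocheck` is a proposed language feature and not something a compiler accepts, the idea can only be approximated in library code. The following is a sketch under that assumption, in present-day D (D2): the struct `Checked`, its `mutable` flag and `sortChecked` are all hypothetical names. The flag test is O(1); the copy happens only when the view is read-only:

```d
import std.algorithm.sorting : sort;

// Hypothetical stand-in for a rocheck array: a slice plus a runtime
// flag saying whether the holder may mutate it.
struct Checked(T)
{
    T[] data;
    bool mutable;

    // O(1) if already writable; one O(n) copy otherwise.
    T[] ensureWritable()
    {
        if (!mutable)
        {
            data = data.dup;
            mutable = true;
        }
        return data;
    }
}

// One entry point: sorts in place when allowed, sorts a copy when not.
Checked!int sortChecked(Checked!int a)
{
    sort(a.ensureWritable());
    return a;
}

unittest
{
    int[] mine = [3, 1, 2];

    auto fromReadOnly = sortChecked(Checked!int(mine, false));
    assert(fromReadOnly.data == [1, 2, 3]);
    assert(mine == [3, 1, 2]);        // original untouched, a copy was sorted

    auto fromWritable = sortChecked(Checked!int(mine, true));
    assert(fromWritable.data is mine); // no copy
    assert(mine == [1, 2, 3]);         // sorted in place
}
```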

Cheers,

Reiner

August 03, 2006

Re: COW vs. in-place.

Posted by Dave
in reply to Oskar Linde

Oskar Linde wrote:
> Dave wrote:
>> Reiner Pope wrote:
>>>> Why not:
>>>>
>>>>     str = toupper(str);     // in-place
>>>>     str = toupper(str.dup); // COW
>
> What is the advantage of redundantly assigning the result of an in-place

No advantage - the poster was just using the example from the OP. What the OP's example shows is that, the way it is now (CoW), the coder often ends up assigning the result back to the original string reference, in which case the .dup inside toupper is a total waste.

    writefln(toupper(str));             // in-place

    char[] st2 = cast(char[])file.read("somedata");
    writefln("Uppercase string: ", toupper(st2.dup)); // dup only if needed
    writefln("Original string:  ", st2);

> function to itself? In my opinion, all in-place functions should have a void return type to avoid common mistakes such as:
>

    writefln(toupper(str));             // function chain

Many of C's string functions do this too.

> foreach(e; arr.reverse) { ... }
> // OOPS, arr is now reversed
>
> .dup followed by calling an in-place function is certainly ok, but in those cases, an ordinary functional (non-in-place) function would have been more efficient.
>

If the programmer needs to keep a copy of the original, the way toupper/tolower/etc. are done now is more efficient only in the case where the data was not modified.

My argument is that most often when data is modified at some point in a program, it is because the rest of the program needs the modified version and not a copy of the original (so defensive .dups won't be done anyhow).

>>
>> I think CoW for arrays was a mistake -- it is most often unnecessary, will cause D to repeat many of Java's performance woes for the average user, and as I mentioned is inconsistent as well. It's a lose-lose-lose.
>
> Consider the following (just made up) case insensitive multi-file word count application:
>
> import std.stdio;
> import std.file;
> import std.string;
>
> void main(char[][] args) {
>         int[char[]] wc;
>         foreach(filename; args[1..$]) {
>                 char[] data = cast(char[]) read(filename);
>                 foreach(word; data.split())
>                         wc[tolower(word)]++;
>         }
>         writefln("num words: ",wc.length);
> }
>
> If you ran this program on the full collection of 18000 Gutenberg books, you would inevitably run out of memory. Why would you do that when a standard English dictionary only occupies a couple of megabytes?
>
> Without knowing the intricate details of D and Phobos, I bet you would have no way of knowing that you got killed by the cow. :)
>

Exactly my point and great example. It's that kind of stuff that is really tough on a newbie trying to get the most out of a high-performance language.
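
For anyone wondering where the memory goes, the likely mechanism is slice aliasing: a CoW tolower hands back its argument whenever the word is already lower case, and that argument is a slice of the file buffer, so storing it as a key keeps the whole buffer reachable. A small sketch in present-day D (D2) that shows the aliasing itself; `pointsInto` is a made-up helper and the short string stands in for a file:

```d
import std.array : split;

// True if `part` starts inside the memory occupied by `whole`.
bool pointsInto(const(char)[] part, const(char)[] whole)
{
    return part.ptr >= whole.ptr && part.ptr < whole.ptr + whole.length;
}

unittest
{
    char[] data = "the quick brown fox".dup; // stands in for a whole file
    auto words = data.split();

    // Each word is a window into `data`, not a separate allocation.
    assert(pointsInto(words[0], data));

    // An explicit copy is independent of the big buffer.
    auto key = words[0].idup;
    assert(!pointsInto(key, data));
}
```

Copying the key before storing it costs a small allocation per distinct word but lets the file buffer be collected.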

IMHO, it's not too big of a leap for a beginner to suspect that data will be modified when they pass a byref argument into a function like toupper. If 'in-place' is clearly documented then I don't see a problem.

- Dave

> /Oskar

August 03, 2006

Re: COW vs. in-place.

Posted by Kirk McDonald
in reply to renox

renox wrote:
> Dave wrote:
> 
>>
>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>
>> For example, toupper and tolower?
>>
>> How many times have we seen something like this:
>>
>> str = toupper(str); // or equivalent in another language.
> 
> 
> In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a.
> 
> Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed..
> 
> functionXIP (eXecute In Place), functionWSD (With Side Effect)?
> Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object!
> 
> In the absence of proper naming termination, an optional parameter could be used yes.
> 

What about:

void   toupper(char[] s);  // Modifies s in-place
char[] asupper(char[] s);  // COW function

Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

August 03, 2006

Re: COW vs. in-place.

Posted by Oskar Linde
in reply to Kirk McDonald

Kirk McDonald wrote:
> renox wrote:
>> Dave wrote:
>>
>>>
>>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>>
>>> For example, toupper and tolower?
>>>
>>> How many times have we seen something like this:
>>>
>>> str = toupper(str); // or equivalent in another language.
>>
>>
>> In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a.
>>
>> Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed..
>>
>> functionXIP (eXecute In Place), functionWSD (With Side Effect)?
>> Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object!
>>
>> In the absence of proper naming termination, an optional parameter could be used yes.
>>
> 
> What about:
> 
> void   toupper(char[] s);  // Modifies s in-place
> char[] asupper(char[] s);  // COW function
> 
> Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.

It doesn't really apply to functions that are verbs, like capitalize, sort and map.

For those one option is: capitalized, sorted and mapped for COW versions.
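
As an illustration of the verb / past-participle pairing, here is a sketch in present-day D (D2). `sort` is the real in-place function from `std.algorithm.sorting`; `sorted` is a hypothetical wrapper written for this example, not a Phobos function:

```d
import std.algorithm.sorting : sort;

// Hypothetical copying counterpart to the in-place `sort`.
T[] sorted(T)(const(T)[] items)
{
    T[] copy = items.dup;
    sort(copy);
    return copy;
}

unittest
{
    int[] a = [3, 1, 2];

    int[] b = sorted(a);    // participle: returns a new array
    assert(b == [1, 2, 3]);
    assert(a == [3, 1, 2]); // a is untouched

    sort(a);                // verb: modifies a
    assert(a == [1, 2, 3]);
}
```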

/Oskar

August 03, 2006

Re: COW vs. in-place.

Posted by Tom S
in reply to Oskar Linde

Oskar Linde wrote:
> Kirk McDonald wrote:
>> renox wrote:
>>> Dave wrote:
>>>
>>>>
>>>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>>>
>>>> For example, toupper and tolower?
>>>>
>>>> How many times have we seen something like this:
>>>>
>>>> str = toupper(str); // or equivalent in another language.
>>>
>>>
>>> In ruby, they have this nice convention that a.function() leaves a unchanged and a.function!() modifies a.
>>>
>>> Something like this would be nice, the hard part is choosing the correct naming convention so that it is followed..
>>>
>>> functionXIP (eXecute In Place), functionWSD (With Side Effect)?
>>> Sigh, hard to achieve something as simple and elegant as '!' : caution this function modifies the object!
>>>
>>> In the absence of proper naming termination, an optional parameter could be used yes.
>>>
>>
>> What about:
>>
>> void   toupper(char[] s);  // Modifies s in-place
>> char[] asupper(char[] s);  // COW function
>>
>> Of course, this convention would only apply to functions named "tosomething", but I bet most/all of the functions for which an "in-place" operation makes sense are named that.
> 
> It doesn't really apply to functions that are verbs, like capitalize, sort and map.
> 
> For those one option is: capitalized, sorted and mapped for COW versions.

I know we aren't supposed to like pointers, but it could also work the following way:

void   toupper(char[]* s);  // modifies *s in-place
char[] toupper(char[] s);   // moo

then by writing:

toupper(&foo);

you'd make it pretty clear that foo is to be modified. Internally, the in-place version could immediately call something like
void toupper_inPlace(inout char[] s);
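
A sketch of how the two overloads could sit side by side, in present-day D (D2), where the `inout` parameter class shown above is spelled `ref`. The name `upper` is made up to avoid clashing with Phobos, and the conversion is ASCII-only:

```d
import std.ascii : toUpper;

// In-place overload: the caller has to write &s, so the mutation
// is visible at the call site.
void upper(char[]* s)
{
    foreach (ref c; *s)
        c = toUpper(c);
}

// Copying overload: leaves its argument alone.
char[] upper(const(char)[] s)
{
    char[] copy = s.dup;
    upper(&copy); // reuse the in-place overload on the private copy
    return copy;
}

unittest
{
    char[] foo = "moo".dup;

    char[] bar = upper(foo); // copying overload
    assert(bar == "MOO");
    assert(foo == "moo");

    upper(&foo);             // in-place overload
    assert(foo == "MOO");
}
```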


--
Tomasz Stachowiak

Copyright © 1999-2021 by the D Language Foundation