COW vs. in-place. (page 2) - D Programming Language Discussion Forum

July 31, 2006

Re: COW vs. in-place.

Posted by Kirk McDonald
in reply to Derek

Derek wrote:
> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
> 
> 
>>Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
> 
> 
> void toUpper_inplace(char[] x)
> {
>  . . .
> }
> 
> char[] toUpper(char[] x)
> {
>    char[] y = x.dup;
>    toUpper_inplace(y);
>    return y;
> }
> 

I've got one better. Say we have a whole bunch of inplace string functions, like the one above and this one:

void toLower_inplace(char[] x) {
    // ...
}

and others. Then we can:

char[] cow_func(alias fn)(char[] x) {
    char[] y = x.dup;
    fn(y);
    return y;
}

alias cow_func!(toUpper_inplace) toUpper;
alias cow_func!(toLower_inplace) toLower;

Etc. Obviously, you'd have to provide a different template for each function footprint (signature), but the string library has a lot of repeated footprints.
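
A minimal sketch of the same wrapper idea in current D2 syntax, assuming ASCII-only helpers. The names toUpperInplace, toLowerInplace, copying, toUpperCopy and toLowerCopy are hypothetical and not part of Phobos. Note that this wrapper always copies, so it gives the non-destructive behaviour rather than true copy-on-write:

```d
import std.ascii : toLower, toUpper;

// In-place helpers (ASCII letters only; other bytes are left untouched).
void toUpperInplace(char[] x)
{
    foreach (ref c; x)
        c = cast(char) toUpper(c);
}

void toLowerInplace(char[] x)
{
    foreach (ref c; x)
        c = cast(char) toLower(c);
}

// Generic wrapper: duplicate the input, then apply the in-place function.
char[] copying(alias fn)(const(char)[] x)
{
    char[] y = x.dup;
    fn(y);
    return y;
}

alias toUpperCopy = copying!toUpperInplace;
alias toLowerCopy = copying!toLowerInplace;

unittest
{
    char[] original = "Hello".dup;
    char[] upper = toUpperCopy(original);
    assert(upper == "HELLO");
    assert(original == "Hello"); // the caller's buffer is untouched
}
```

One template covers every in-place function with the footprint void(char[]); a function with a different footprint would need its own wrapper template.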

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

July 31, 2006

Re: COW vs. in-place.

Posted by Dave
in reply to Kirk McDonald

Kirk McDonald wrote:
> Derek wrote:
>> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
>>
>>
>>> Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
>>
>>
>> void toUpper_inplace(char[] x)
>> {
>>  . . .
>> }
>>
>> char[] toUpper(char[] x)
>> {
>>    char[] y = x.dup;
>>    toUpper_inplace(y);
>>    return y;
>> }
>>

With this one, you're always dup'ing instead of dup'ing only when needed (the current Phobos version is actually more efficient).

> 
> I've got one better. Say we have a whole bunch of inplace string functions, like the one above and this one:
> 
> void toLower_inplace(char[] x) {
>     // ...
> }
> 
> and others. Then we can:
> 
> char[] cow_func(alias fn)(char[] x) {
>     char[] y = x.dup;
>     fn(y);
>     return y;
> }
> 
> alias cow_func!(toUpper_inplace) toUpper;
> alias cow_func!(toLower_inplace) toLower;
> 
> Etc. Obviously, you'd have to provide a different template for each function footprint, but the string library has a lot of repeated footprints.
> 

I think to maximize code re-use you'd have to build the "COW or not to COW" logic into the "base" function. And if you did that you'd have to live with a little more function call overhead (passing a bool or small enum around) in order to avoid the defensive copying like in cow_func above.

I'm wondering: if Phobos had been built that way (making it the 'D way' of doing things), would all the concerns about GC performance and "const" have been so acute over the last year or so? (Hindsight is always closer to 20-20, of course.)

The problem w/ all the dup'ing is when you put something like this in a tight loop you get sloooowwwww code:

import std.file, std.string, std.stdio;

void main()
{
  char[][] formatted;
  char[][] text = split(cast(char[])read("largefile.txt"), ".");
  foreach(char[] sentence; text)
  {
    formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
  }
  //...
  foreach(char[] sentence; formatted)
  {
    writefln(sentence);
  }
}

None of those functions (except for read()) would really have to do much allocating because the input file for all intents and purposes is read-only here (it won't get implicitly modified even if COW isn't used).

- Dave

August 01, 2006

Re: COW vs. in-place.

Posted by Derek Parnell
in reply to Dave

On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:

> Kirk McDonald wrote:
>> Derek wrote:
>>> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
>>>
>>>
>>>> Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
>>>
>>>
>>> void toUpper_inplace(char[] x)
>>> {
>>>  . . .
>>> }
>>>
>>> char[] toUpper(char[] x)
>>> {
>>>    char[] y = x.dup;
>>>    toUpper_inplace(y);
>>>    return y;
>>> }
>>>
> 
> With this one, you're always dup'ing instead of .dup'ing only when needed (the current one is actually more efficient).

I'm getting confused about what you are after now, sorry.

It seems that you are wanting a CoW version, an InPlace version, and a non-Destructive version of each function and let the compiler and/or the author choose the best one for the job at hand.

The example above gave the InPlace and non-destructive versions, and the current version is CoW.
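
To make the three variants concrete, here is a rough D2 sketch for ASCII uppercasing. It assumes a current D2 compiler and ASCII-only input; the names upperInPlace, upperCopy and upperCow are made up for illustration and are not Phobos functions:

```d
import std.ascii : isLower, toUpper;

// InPlace: overwrites the caller's buffer, never allocates.
void upperInPlace(char[] s)
{
    foreach (ref c; s)
        c = cast(char) toUpper(c);
}

// Non-destructive: always returns a fresh copy.
char[] upperCopy(const(char)[] s)
{
    char[] r = s.dup;
    upperInPlace(r);
    return r;
}

// CoW: returns the input slice itself when nothing needs to change,
// and allocates only once the first lowercase letter is found.
const(char)[] upperCow(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (isLower(c))
        {
            char[] r = s.dup;
            upperInPlace(r[i .. $]);
            return r;
        }
    }
    return s;
}

unittest
{
    const(char)[] already = "ABC";
    assert(upperCow(already) is already); // same slice, no allocation
    assert(upperCow("aBc") == "ABC");     // copied because a change was needed
    assert(upperCopy("ABC") == "ABC");    // equal contents, but a new array
}
```

The CoW variant is the only one whose allocation behaviour depends on the data, which is why it cannot simply be built as a thin wrapper around the in-place one.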

...

> The problem w/ all the dup'ing is when you put something like this in a tight loop you get sloooowwwww code:

Not if the author has a choice ...

import std.file, std.string, std.stdio;

void main()
{
   char[][] formatted;
   char[][] text = split(cast(char[])read("largefile.txt"), ".");
   foreach(char[] sentence; text)
   {
     strip_IP(sentence);
     tolower_IP(sentence);
     capitalize_IP(sentence);
     formatted ~= sentence ~ ".\r\n";
   }
   //...
   foreach(char[] sentence; formatted)
   {
     writefln(sentence);
   }
}


-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocrity!"
1/08/2006 11:18:40 AM

August 01, 2006

Re: COW vs. in-place.

Posted by Dave
in reply to Derek Parnell

Derek Parnell wrote:
> On Mon, 31 Jul 2006 18:01:14 -0500, Dave wrote:
> 
>> Kirk McDonald wrote:
>>> Derek wrote:
>>>> On Mon, 31 Jul 2006 16:40:54 -0500, Dave wrote:
>>>>
>>>>
>>>>> Not a bad idea... The main prob. would be that there would be a lot of duplication of code.
>>>>
>>>> void toUpper_inplace(char[] x)
>>>> {
>>>>  . . .
>>>> }
>>>>
>>>> char[] toUpper(char[] x)
>>>> {
>>>>    char[] y = x.dup;
>>>>    toUpper_inplace(y);
>>>>    return y;
>>>> }
>>>>
>> With this one, you're always dup'ing instead of .dup'ing only when needed (the current one is actually more efficient).
> 
> I'm getting confused about what you are after now, sorry. 
> 
> It seems that you are wanting a CoW version, an InPlace version, and a
> non-Destructive version of each function and let the compiler and/or the
> author choose the best one for the job at hand.
> 
> The example above gave the InPlace and non-destructive versions and the
> current version is CoW. 
> 
> ...
> 
>> The problem w/ all the dup'ing is when you put something like this in a tight loop you get sloooowwwww code:
> 
> Not if the author has a choice ...
> 
> import std.file, std.string, std.stdio;
> 
> void main()
> {
>    char[][] formatted;
>    char[][] text = split(cast(char[])read("largefile.txt"), ".");
>    foreach(char[] sentence; text)
>    {
>      strip_IP(sentence);
>      tolower_IP(sentence);
>      capitalize_IP(sentence);
>      formatted ~= sentence ~ ".\r\n";
>    }
>    //...
>    foreach(char[] sentence; formatted)
>    {
>      writefln(sentence);
>    }
> }
> 
> 

Sorry, I think some of that got lost in the thread...

I'm asking if it would make sense to change the current functions so COW is optional. That way current code wouldn't be broken but we'd have the choice.

For example, the current tolower w/ the changes added (denoted by **):

//** char[] tolower(char[] s)
char[] tolower(char[] s, bool cow = true)
//**
{
    int changed;
    int i;
    char[] r = s;

    changed = 0;
    for (i = 0; i < s.length; i++)
    {
        auto c = s[i];
        if ('A' <= c && c <= 'Z')
        {
            //**if (!changed)
            if (cow && !changed)
            //**
            {   r = s.dup;
                changed = 1;
            }
            r[i] = c + (cast(char)'a' - 'A');
        }
        else if (c >= 0x7F)
        {
            foreach (size_t j, dchar dc; s[i .. length])
            {
                //**if (!changed)
                if (cow && !changed)
                //**
                {
                    if (!std.uni.isUniUpper(dc))
                        continue;

                    r = s[0 .. i + j].dup;
                    changed = 1;
                }
                dc = std.uni.toUniLower(dc);
                std.utf.encode(r, dc);
            }
            break;
        }
    }
    return r;
}

So the sample code would become:

import std.file, std.string, std.stdio;

void main()
{
  char[][] formatted;
  char[][] text = split(cast(char[])read("largefile.txt"), ".");
  foreach(char[] sentence; text)
  {
    formatted ~= capitalize(tolower(strip(sentence, false), false), false) ~ ".\r\n";
  }
  //...
  foreach(char[] sentence; formatted)
  {
    writefln(sentence);
  }
}

Then I suggested either making the cow parameter default to false, or wondered how things would have worked out if the original data owner became responsible for their own dups:

void main()
{
  char[][] formatted;
  char[] original = cast(char[])read("largefile.txt").dup; //**
  char[][] text = split(original, ".");
  foreach(char[] sentence; text)
  {
    formatted ~= capitalize(tolower(strip(sentence))) ~ ".\r\n";
  }
  //...
  foreach(char[] sentence; formatted)
  {
    writefln(sentence);
  }
  //** The 'original' (duplicated, unmodified) data is used again here
}

If everything was done inplace in Phobos, then it would become 2nd nature for the owner to dup when needed. And the user wouldn't need to rely on the hope that the library developer didn't make a mistake and forget to COW when they were supposed to.

Thanks,

- Dave

August 01, 2006

Re: COW vs. in-place.

Posted by Kirk McDonald
in reply to Dave

Dave wrote:
> Sorry, I think some of that got lost in the thread...
> 
> I'm asking if it would make sense to change the current functions so COW is optional. That way current code wouldn't be broken but we'd have the choice.
> 

Using a function parameter as you suggest is fine and all (it helps in code re-use, as your example ably shows), but I find calling, e.g., tolower_inplace clearer than some strange 'false' parameter at the end of the argument list. If we make the 'cow' parameter default to 'true', we might also provide a wrapper:

char[] inplace_wrap(alias fn)(char[] s) {
    return fn(s, false);
}

alias inplace_wrap!(tolower) tolower_inplace;
alias inplace_wrap!(toupper) toupper_inplace;
// &c, &c

(I like this method of function wrapping, can you tell?) Or we could just as easily default cow to 'false' and have the wrapper be 'cow_wrap' instead. (It would also be easy enough to provide both.)

> If everything was done inplace in Phobos, then it would become 2nd nature for the owner to dup when needed. And the user wouldn't need to rely on the hope that the library developer didn't make a mistake and forget to COW when they were supposed to.

I sure hope the library makes this an important, documented part of its interface.

-- 
Kirk McDonald
Pyd: Wrapping Python with D
http://dsource.org/projects/pyd/wiki

August 01, 2006

Re: COW vs. in-place.

Posted by Reiner Pope
in reply to Dave

Dave wrote:
> None of the const/immutability ideas will take care of having to "copy on write"; they were all more-or-less just ways of enforcing COW so there wouldn't be mistakes.

Argh, that's what all of my proposals are about. See:
rocheck in 'YACP -- Yet Another Const Proposal' on digitalmars.D
'constness for arrays' by xs0 on digitalmars.D
'what's wrong with just a runtime-checked const'? on digitalmars.D.learn

These all explore a way to make array functions work optimally in all cases. The rocheck proposal (the most recent one) would look as follows:

rocheck char[] toupper(rocheck char[] input)
{   foreach (i, c; input)
    {   if (islower(c))
        {   char[] temp = input.ensureWritable; // ensureWritable checks whether it is mutable and copies if not
            temp[i] = chartoupper(c);
            input = temp; // if we did indeed duplicate, then make sure we now use the duplicated one
        }
    }
    return input;
}

// Another alternative: faster, but more code
rocheck char[] toupper(rocheck char[] input)
{   foreach (i, c; input)
    {   if (islower(c))
        {   char[] temp = input.ensureWritable;
            foreach (inout c2; temp[i..$])
            {   if (islower(c2)) c2 = chartoupper(c2);
            }
            return temp;
        }
    }
    return input;
}

// Now look what we can do:
char[] foo = "hello".dup;
foo = toupper(foo).ensureWritable;
// ensureWritable is a no-op here, because there is never a const reference. It's only there to satisfy the compiler's const checking
readonly char[] bar = baz.getName();
foo = toupper(bar).ensureWritable;
// if toupper modifies, then it will dup it (since bar is readonly). If not, then ensureWritable will dup it. Either way, we get exactly one duplication, which is as required.
readonly char[] asdf = CIP1(CIP2(CIP3(bar)));
// CIP1, 2 and 3 are rocheck functions like toupper above. If none of them modify, then no duplication takes place. If one or more of them do, then only one duplication takes place.

Having it integrated into the language is more powerful, because it actually works with const checking and makes the syntax cleaner. Consider how you would get the same efficiency with the last statement using the CIP enum when just modifying the library:


CIP1, CIP2 and CIP3 would all need signatures as follows:
char[] CIP1(char[] input, inout CIP cipness) {...}

It would be inout so that you can tell it about the input, and it can tell you about the output. If you don't know the ownership of the output, you will get unnecessary dups. Here is how you would emulate the last line of the rocheck sample code:

CIP temp = CIP.COW;
char[] bar; // We mustn't modify this
bar = CIP1(bar, temp); // bar *might* be modifiable inplace, but only temp knows
bar = CIP2(bar, temp);
bar = CIP3(bar, temp);
// We still don't know whether bar is the original, unmodifiable one, or not. However, temp can tell us.

This code is much more verbose than a version built into the language.
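
For comparison, a library-only version of that ownership tracking might look roughly like the following D2 sketch. The CIP enum is never defined in this thread, so the Ownership enum and the functions writable, upperTracked and underscoreTracked are assumptions made up for illustration; the functions handle ASCII only:

```d
import std.ascii : isLower, toUpper;

// Hypothetical stand-in for the CIP flag: does the caller own the buffer yet?
enum Ownership { borrowed, owned }

// Returns a buffer that may be written to, copying at most once per chain.
private char[] writable(char[] s, ref Ownership own)
{
    if (own == Ownership.borrowed)
    {
        s = s.dup;
        own = Ownership.owned;
    }
    return s;
}

// Uppercases ASCII letters; copies only if a change is needed and the
// buffer is still borrowed.
char[] upperTracked(char[] s, ref Ownership own)
{
    foreach (i, c; s)
    {
        if (isLower(c))
        {
            char[] w = writable(s, own);
            foreach (ref c2; w[i .. $])
                c2 = cast(char) toUpper(c2);
            return w;
        }
    }
    return s;
}

// Replaces spaces with underscores under the same ownership rules.
char[] underscoreTracked(char[] s, ref Ownership own)
{
    foreach (i, c; s)
    {
        if (c == ' ')
        {
            char[] w = writable(s, own);
            foreach (ref c2; w[i .. $])
                if (c2 == ' ') c2 = '_';
            return w;
        }
    }
    return s;
}

unittest
{
    char[] original = "hello world".dup; // pretend we must not modify this
    auto own = Ownership.borrowed;

    char[] r = upperTracked(original, own);  // first change: one dup
    char[] r2 = underscoreTracked(r, own);   // already owned: edited in place

    assert(r2 == "HELLO_WORLD");
    assert(r2 is r);                         // no second allocation
    assert(original == "hello world");
    assert(own == Ownership.owned);
}
```

The flag has to be threaded through every call by hand, and nothing stops a caller from passing the wrong value, which is the verbosity and safety gap a language-level feature would close.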

Cheers,

Reiner

August 01, 2006

Re: COW vs. in-place.

Posted by Lionello Lunesu
in reply to Dave

"Dave" <Dave_member@pathlink.com> wrote in message news:ealack$bjg$1@digitaldaemon.com...
>
> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>
> For example, toupper and tolower?
>
> How many times have we seen something like this:
>
> str = toupper(str); // or equivalent in another language.

str being a UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems to me that it's quite possible that some uppercase Unicode characters are larger than their lowercase versions, possibly crossing a UTF-8 byte-count boundary. But there are other string functions that don't have this problem.
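
A small D2 check of that concern, as a sketch. The helper utf8Bytes is a made-up name; the case pairs in the comments are taken from the Unicode character data as far as I know (U+0250 maps to U+2C6F, and dotless i U+0131 maps to plain I), so treat them as an assumption to verify against the tables you actually use:

```d
import std.utf : codeLength;

// Number of UTF-8 code units (bytes) needed to encode one code point.
size_t utf8Bytes(dchar c)
{
    return codeLength!char(c);
}

unittest
{
    // Uppercase form is LARGER than the lowercase one: 2 bytes become 3.
    assert(utf8Bytes('\u0250') == 2); // LATIN SMALL LETTER TURNED A
    assert(utf8Bytes('\u2C6F') == 3); // LATIN CAPITAL LETTER TURNED A

    // Uppercase form is SMALLER than the lowercase one: 2 bytes become 1.
    assert(utf8Bytes('\u0131') == 2); // LATIN SMALL LETTER DOTLESS I
    assert(utf8Bytes('I') == 1);      // LATIN CAPITAL LETTER I
}
```

Either direction breaks a strict byte-for-byte in-place rewrite, so an in-place case conversion can only be guaranteed for ASCII or for inputs known not to change length.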

In either case, a standard library should simply provide two functions, one in-place and the other COW. In many cases, the COW function could use the in-place one, eliminating duplicate code. For example, in my own lib I use .ToUpper() for the in-place version and .UpperCase() for the COW one.

L.

August 01, 2006

Re: COW vs. in-place.

Posted by Dawid Ciężarkiewicz
in reply to Dave

Dave wrote:
>> Maybe just writing a new module (std.strinplace) that does what you want and then sending it to Walter or the D discussion group would be good. I guess with the new import improvements, names could stay as they were, and people interested in this speedup would statically import this module and use fully qualified names (FQN) where they want such behavior.
> 
> Not a bad idea... The main prob. would be that there would be a lot of duplication of code.

Well. IMO not so much. There are not so many essential functions operating on strings and they don't change too often.

August 01, 2006

Re: COW vs. in-place.

Posted by Dawid Ciężarkiewicz
in reply to Lionello Lunesu

Lionello Lunesu wrote:

> 
> "Dave" <Dave_member@pathlink.com> wrote in message news:ealack$bjg$1@digitaldaemon.com...
>>
>> What if selected functions in phobos were modified to take an optional parameter that specified COW or in-place? The default for each would be whatever they do now.
>>
>> For example, toupper and tolower?
>>
>> How many times have we seen something like this:
>>
>> str = toupper(str); // or equivalent in another language.
> 
> str being a UTF-8 string, I don't think you can guarantee that it CAN be made uppercase in-place. It seems to me that it's quite possible that some uppercase Unicode characters are larger than their lowercase versions, possibly crossing a UTF-8 byte-count boundary. But there are other string functions that don't have this problem.

This _is_ a problem.

> In either case, a standard library should simply provide two functions, one in-place and the other COW. In many cases, the COW function could use the in-place one, eliminating duplicate code. For example, in my own lib I use .ToUpper() for the in-place version and .UpperCase() for the COW one.

Well thought out.

August 01, 2006

Re: COW vs. in-place.

Posted by Andrei Khropov
in reply to Dawid Ciężarkiewicz

Dawid Ciężarkiewicz wrote:

> I'd rather wait until the const/immutability problem in D is resolved. Don't forget that an additional "option" has a runtime cost. There are some const/immutability proposals that could help by providing compile-time information to deal with your proposal.

I agree. Adding an additional parameter doesn't seem to be a good idea. It raises the question of whether the default behavior should be to copy or not, and it introduces the possibility of subtle errors when the flag is mistakenly omitted.

-- 
AKhropov

Copyright © 1999-2021 by the D Language Foundation