January 30, 2008 Why string alias is invariant ?

Really, why? You can't assign dynamically created data to a variable of type string. You can't pass dynamically created data into a function receiving string! If you wanted efficiency, if you wanted to guarantee that anybody receiving string can receive a char literal without duplication, make it const then. Invariant casts to const(char)[] implicitly, and so does char[]. And even then, I'd prefer plain old char[] for string and foo(in string s) for a function that guarantees not to change it.

Right now I'm having trouble calling std.file.listdir() just because it receives string.

P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it. The const(char)[] or const(char[]) must only be used if the function is not sure whether the returned result is unique.

SnakE

January 30, 2008 Re: Why string alias is invariant ?

Posted in reply to Sergey Gromov

Sergey Gromov wrote:
> Really, why? You can't assign dynamically created data to a variable of type string. You can't pass dynamically created data into a function receiving string! If you wanted efficiency, if you wanted to guarantee that anybody receiving string can receive a char literal without duplication, make it const then. Invariant casts to const(char)[] implicitly, and so does char[]. And even then, I'd prefer plain old char[] for string and foo(in string s) for a function that guarantees not to change it.
I think the hope is that 'string' will place us in a position to get functional programming-type optimizations "for free" once the language and compiler are in a position to provide them. Such optimizations typically aren't possible with "const char[]" because the compiler must assume that the string data could change unexpectedly. In essence, I believe that 'invariant' is intended for optimization while 'const' is intended for documenting/enforcing API contracts.
Sean
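
For readers following along, here is a minimal sketch of the distinction Sean describes. It is written for a current D compiler, where the 2008 keyword `invariant` is spelled `immutable` (so `string` is `immutable(char)[]`). The function name `seenThroughConstView` is made up for illustration.

```d
// Returns what a const view shows after the underlying buffer
// has been changed through a mutable alias.
// (seenThroughConstView is a hypothetical name, not a library function.)
string seenThroughConstView()
{
    char[] buf = "hello".dup;    // mutable data that this code owns
    const(char)[] view = buf;    // const view of the very same memory
    buf[0] = 'J';                // legal: the write goes through the mutable alias
    return view.idup;            // snapshot of what the const view sees now
}

void main()
{
    // const only promises "no writes through this reference",
    // so the view observed the change.
    assert(seenThroughConstView() == "Jello");

    // With string (immutable data) no reference may write, so a compiler
    // or a callee can rely on the contents staying the same.
    string s = "hello";
    static assert(!__traits(compiles, s[0] = 'J'));
}
```

The second half is the property that would, in principle, make the optimizations Sean mentions possible. Whether a given compiler actually performs them is a separate question.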

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Sean Kelly

Sean Kelly Wrote:
> Sergey Gromov wrote:
> > Really, why? You can't assign dynamically created data to a variable of type string. You can't pass dynamically created data into a function receiving string! If you wanted efficiency, if you wanted to guarantee that anybody receiving string can receive a char literal without duplication, make it const then. Invariant casts to const(char)[] implicitly, and so does char[]. And even then, I'd prefer plain old char[] for string and foo(in string s) for a function that guarantees not to change it.
>
> I think the hope is that 'string' will place us in a position to get functional programming-type optimizations "for free" once the language and compiler are in a position to provide them. Such optimizations typically aren't possible with "const char[]" because the compiler must assume that the string data could change unexpectedly. In essence, I believe that 'invariant' is intended for optimization while 'const' is intended for documenting/enforcing API contracts.
I don't quite get it. If 'const' is "for documenting/enforcing API
contracts", then "string[] listdir(string)" is an API bug. If it were
"char[][] listdir(in char[])", then the compiler wouldn't be able
to "get functional programming-type optimizations" within
the 'listdir()'. Overall, the use of invariant 'string' is limited
to unmodified string literals and their substrings, because for
other data the behaviour is undefined, by specification.
SnakE
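
To make the two signatures being compared concrete, here is a small sketch for a current D compiler. `lengthOfString` and `lengthOfConst` are hypothetical stand-ins with the same parameter types as `listdir(string)` and `listdir(in char[])`. They are not the real library functions.

```d
// Hypothetical stand-ins; only the parameter types matter here.
size_t lengthOfString(string path)   { return path.length; }
size_t lengthOfConst(in char[] path) { return path.length; }

void main()
{
    char[] built   = "dir/file".dup;  // dynamically created, mutable
    string literal = "dir/file";      // immutable literal

    // A mutable array does not convert implicitly to string...
    static assert(!__traits(compiles, lengthOfString(built)));

    // ...while a const parameter accepts both kinds of argument.
    assert(lengthOfConst(built) == 8);
    assert(lengthOfConst(literal) == 8);
    assert(lengthOfString(literal) == 8);
}
```

This is the calling-side friction the post complains about: the `string` parameter rejects `built` unless the caller copies or casts it.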

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Sergey Gromov

> Overall, the use of invariant 'string' is limited
> to unmodified string literals and their substrings, because for
> other data the behaviour is undefined, by specification.
You need to duplicate your mutable strings:
char[] mutablestring = "hello".dup();
string invariantstring = mutablestring.idup();
Where I think the second line is a shorthand for
string invariantstring = cast(string) mutablestring.dup();
and the cast to invariant is valid since you guarantee to give no one else access to the newly dupped data.
Christian Kamm
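
A short sketch of the two spellings above, for a current D compiler (where `string` is `immutable(char)[]`). `frozenCopy` is a made-up helper name. The point is that both forms create a fresh copy, so later writes to the mutable original do not show through.

```d
// Hypothetical helper: returns an independent immutable copy.
string frozenCopy(char[] source)
{
    return source.idup;
}

void main()
{
    char[] mutablestring = "hello".dup;

    string viaIdup = frozenCopy(mutablestring);
    // The longhand form: copy first, then cast the private copy.
    string viaCast = cast(string) mutablestring.dup;

    mutablestring[0] = 'J';           // change the original afterwards

    assert(mutablestring == "Jello");
    assert(viaIdup == "hello");       // neither copy was affected
    assert(viaCast == "hello");
}
```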

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Sergey Gromov

On Jan 30, 2008 11:03 PM, Sergey Gromov <snake.scaly@gmail.com> wrote:
> You can't assign dynamically created data to a variable of type string.

Correct, and this is a good default. There are, of course, times when the default is not what you want. Others have suggested "idup" as the solution, but that means an extra copying pass. Fortunately, however, there is a solution which does /exactly/ what you want, with no superfluous copying. It is this:

import std.contracts;
char[] buffer = whatever;
string s = assumeUnique(buffer);

> You can't pass dynamically created data into a function receiving string!

Again, correct. This is also a good thing. It allows copy-on-write to work correctly. For example, consider a function which lowercases a string. Since string is invariant, that means that when lowercasing something that is /already/ lowercase, the function is free to return the original string. If string were merely const (as opposed to invariant), it would have to make a copy every time. Again, assumeUnique is your friend. (But don't lie to the compiler or things will go badly wrong!)

> P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it.

Well, consider again the example of lowercasing a string to see why that is not so. If I return the original string (not a copy of it), then you are /not/ free to modify it, because there might be other pointers to that data. So you must first copy it (using dup) and then you can modify the copy. This means that the copy need be done /only when it is required/, instead of every single time you call the function - so yes, it is an improvement in efficiency.
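
The copy-on-write lowercasing described above can be sketched roughly as follows. This is an illustration for a current D compiler, not the library's tolower: `lowered` is a hypothetical function, it handles ASCII only, and it imports `assumeUnique` from `std.exception`, where it lives today (the post refers to the 2008 module `std.contracts`).

```d
import std.ascii : isUpper, toLower;
import std.exception : assumeUnique;

// Hypothetical ASCII-only lowercasing with copy-on-write semantics.
string lowered(string s)
{
    foreach (i, c; s)
    {
        if (isUpper(c))
        {
            // First uppercase letter found: only now pay for a copy.
            char[] copy = s.dup;
            foreach (ref ch; copy[i .. $])
                ch = cast(char) toLower(ch);
            // The copy is private to this function, so the promise holds.
            return assumeUnique(copy);
        }
    }
    // Nothing to change: hand back the original, which is safe to share
    // because nobody can modify immutable data.
    return s;
}

void main()
{
    string already = "already lower";
    assert(lowered(already) is already);   // same memory, no allocation
    assert(lowered("MiXed Case") == "mixed case");
}
```

With a `const(char)[]` parameter the early `return s` would not be safe in general, since the caller might still hold a mutable reference to the same data.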

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Janice Caron

Janice Caron wrote:
> On Jan 30, 2008 11:03 PM, Sergey Gromov <snake.scaly@gmail.com> wrote:
>
>> P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it.
>
> Well, consider again the example of lowercasing a string to see why that is not so. If I return the original string (not a copy of it), then you are /not/ free to modify it, because there might be other pointers to that data. So you must first copy it (using dup) and then you can modify the copy. This means that the copy need be done /only when it is required/, instead of every single time you call the function - so yes, it is an improvement in efficiency.
I'd say that it is an improvement in safety rather than efficiency because the model assumes the string may be shared and thus enforces copy on write. But consider something like this:
char[] data = cast(char[]) read( "myfile.txt" );
char[][] lines = splitlines( data );
foreach( line; lines )
{
    writefln( tolower( line.idup ) );
}
In this routine, the programmer knows he is the sole owner of the data and simply wants to print the contents of a file in lower case line-by-line. And to do so the contents of data must be duplicated, which causes GC churn and may slow the app considerably. I suppose one possible fix for this sample app would be to convert the entire file to lowercase in one shot, thus incurring only one copy, but in a real app it may not be apparent or even possible to modify the algorithm in such a way.
Sean

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Sean Kelly

On 1/31/08, Sean Kelly <sean@f4.ca> wrote:
> And to do so the contents of data must be duplicated,
The problem there is the idup. Replace it with
foreach( line; lines )
{
    writefln( tolower( assumeUnique(line)) );
}
and the duplication goes away.

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Janice Caron

Janice Caron wrote:
> On 1/31/08, Sean Kelly <sean@f4.ca> wrote:
>> And to do so the contents of data must be duplicated,
>
> The problem there is the idup. Replace it with
>
> foreach( line; lines )
> {
> writefln( tolower( assumeUnique(line)) );
> }
>
> and the duplication goes away.
Seems that assumeUnique is going to be so common, it deserves a shorter name.

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Don Clugston

On 1/31/08, Don Clugston <dac@nospam.com.au> wrote:
> Seems that assumeUnique is going to be so common, it deserves a shorter name.
Oddly enough,
cast(string)s
is shorter than
assumeUnique(s)
and achieves the same effect. :) I think someone somewhere must have decided that explicit casts are to be avoided. That said, assumeUnique works for things that aren't strings, too. It's basically equivalent to cast(invariant).
assumeUnique does have a side effect though, which is that its parameter must be an lvalue, which assumeUnique nulls. So you couldn't do
string data = assumeUnique( read( "myfile.txt" ));
even if you wanted to. The "officially correct" way would be
char[] temp = cast(char[]) read( "myfile.txt" );
string data = assumeUnique(temp);
/* temp is now null */
But that's way too much typing for me, so I'd just write:
string data = cast(string) read("myfile.txt");
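
A sketch of the "officially correct" pattern and the nulling side effect, for a current D compiler. `readAll` is a hypothetical stand-in for `read("myfile.txt")` so the example needs no file, and `assumeUnique` is imported from `std.exception`, its present-day home.

```d
import std.exception : assumeUnique;

// Hypothetical stand-in for read(): returns freshly allocated bytes.
void[] readAll()
{
    return "file contents".dup;
}

string loadText()
{
    char[] temp = cast(char[]) readAll();
    string data = assumeUnique(temp);  // no copy is made
    assert(temp is null);              // the mutable reference was nulled
    return data;
}

void main()
{
    assert(loadText() == "file contents");

    // The shorter spelling from the post; same data, but nothing is
    // nulled, so it is entirely up to the programmer to keep the promise.
    string viaCast = cast(string) readAll();
    assert(viaCast == "file contents");
}
```

Nulling the argument is what makes it harder to keep writing through the old reference by accident, which is the one thing the plain cast does not give you.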

January 31, 2008 Re: Why string alias is invariant ?

Posted in reply to Janice Caron

Well, thanks for the explanation, all this really makes sense. Though there are many benefits and drawbacks to the different approaches, which I can't keep track of.

Janice Caron Wrote:
> On Jan 30, 2008 11:03 PM, Sergey Gromov <snake.scaly@gmail.com> wrote:
> > You can't pass dynamically created data into a function receiving string!
>
> Again, correct. This is also a good thing. It allows copy-on-write to work correctly. For example, consider a function which lowercases a string. Since string is invariant, that means that when lowercasing something that is /already/ lowercase, the function is free to return the original string.

Yes, this works for string tokenizers for instance. But it doesn't quite work for lowercasing. The already-lowercase case is generally rare and requires an additional pass to check, so you only benefit from this if you optimize for memory. And if you optimize for speed, you'd want lowercasing in place because you're obviously in the middle of processing data, and the chances that this data is immutable are slim. Even if so, there's always .dup for you.

And of course there's no reason to have 'string' parameters in functions like read(), listdir() etc. because their result never contains parts of the arguments.

> > P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it.
>
> Well, consider again the example of lowercasing a string to see why that is not so.

But I'm talking about always returning a unique copy of data. What about that listdir()? The returned names are obviously unique because they're received from the file system. If you make them invariant, and I want to uppercase them, I can't do that in place because I'm not sure if they're actually unique. If you make them mutable but I need them as is, I still cannot count on them not to change because who knows where else this data is used.

Maybe a contract should be added for the standard library: if a function returns a mutable array, it guarantees that this array can be modified without side effects and therefore can be safely assumed unique.

SnakE
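
What the proposed contract would let a caller do can be sketched like this, for a current D compiler. `listNames` is a hypothetical function standing in for a listdir() that returns mutable names. The uniqueness guarantee is an assumption stated in its comment, exactly as the post suggests it would be a documented contract rather than something the type system checks.

```d
import std.ascii : toUpper;
import std.exception : assumeUnique;

// Hypothetical library function. Assumed contract: every returned
// array is freshly allocated and referenced by nobody else.
char[][] listNames()
{
    return ["readme.txt".dup, "main.d".dup];
}

void main()
{
    char[][] names = listNames();

    // Caller wants to modify: do it in place, no copy needed.
    foreach (ref ch; names[0])
        ch = cast(char) toUpper(ch);
    assert(names[0] == "README.TXT");

    // Caller wants the data frozen as is: relying on the contract,
    // convert without copying.
    string frozen = assumeUnique(names[1]);
    assert(frozen == "main.d");
    assert(names[1] is null);  // the mutable reference is gone
}
```

Under this arrangement the caller decides whether a copy is ever made, which is the flexibility the post is asking for.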
Copyright © 1999-2021 by the D Language Foundation