January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | Sean Kelly Wrote:
> Janice Caron wrote:
> > On Jan 30, 2008 11:03 PM, Sergey Gromov <snake.scaly@gmail.com> wrote:
> >
> >> P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it.
> >
> > Well, consider again the example of lowercasing a string to see why
> > that is not so. If I return the original string (not a copy of it),
> > then you are /not/ free to modify it, because there might be other
> > pointers to that data. So you must first copy it (using dup) and then
> > modify the copy. This means that the copy need be done /only
> > when it is required/, instead of every single time you call the
> > function - so yes, it is an improvement in efficiency.
>
> I'd say that it is an improvement in safety rather than efficiency because the model assumes the string may be shared and thus enforces copy on write. But consider something like this:
>
> char[] data = cast(char[]) read( "myfile.txt" );
> char[][] lines = splitlines( data );
>
> foreach( line; lines )
> {
>     writefln( tolower( line.idup ) );
> }
>
> In this routine, the programmer knows he is the sole owner of the data and simply wants to print the contents of a file in lower case line-by-line. And to do so the contents of data must be duplicated, which causes GC churn and may slow the app considerably.
If I were doing this in C by loading an entire file in memory, I would
lowercase it in-place which is both faster and takes less memory.
D should support such efficient solutions.
SnakE
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | On 1/31/08, Sergey Gromov <snake.scaly@gmail.com> wrote:
> If I were doing this in C by loading an entire file in memory, I would
> lowercase it in-place which is both faster and takes less memory.
> D should support such efficient solutions.
It does.
char[] data = cast(char[]) read( "myfile.txt" );
inPlaceLower(data);
Now all you have to do is write the function inPlaceLower(). :)
Presumably, what you really mean is that functions like inPlaceLower() should be in Phobos. I won't argue with that, because I agree with you. But the D language does not prevent you from modifying in place - it only prevents you from modifying invariant data in place.
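A minimal sketch of what such a function could look like, assuming only ASCII letters need to change. `inPlaceLower` here is a hypothetical helper, not a Phobos function, and the sketch deliberately leaves every non-ASCII byte alone so that no UTF-8 sequence changes length:

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): lowercases ASCII letters in place.
/// Bytes outside 'A'..'Z' are left untouched, so multi-byte UTF-8 sequences
/// keep their length and the buffer never has to grow.
void inPlaceLower(char[] s)
{
    foreach (ref c; s)
    {
        if (c >= 'A' && c <= 'Z')
            c = cast(char)(c + ('a' - 'A'));
    }
}

void main()
{
    char[] data = "Hello, WORLD".dup; // mutable copy that we own
    inPlaceLower(data);
    writeln(data);
}
```

Because the parameter is `char[]`, the compiler rejects a call with an invariant string, which is the restriction described above: only invariant data is protected from in-place modification.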
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | On 1/31/08, Sergey Gromov <snake.scaly@gmail.com> wrote:
> Maybe a contract should be added for standard library that if a function returns a mutable array, it guarantees that this array can be modified without side effects and therefore can be safely assumed unique.
Yes, D has a problem here, in that it has no way to express uniqueness. (That's true of other languages too, of course). If a string is immutable, then it doesn't /have/ to be unique (but if it is, it's safe to cast it to invariant), but if a string is mutable then it /must/ be. That's a problem, because one simply cannot declare
unique(char)[] array;
(...although it would be interesting to try...) So D just makes it your problem.
One problem area where this sort of thing arises is concatenation. If
you concatenate char arrays, then /regardless/ of the constancy of the
inputs, the result is guaranteed to be unique (because, by
specification, concatenation always makes a copy). If we had a
"unique" type-constructor, one could make the result type unique(T)[].
But we don't, so D makes the (arbitrary) choice of invariant(T)[].
Also, it is currently a syntax error to concatenate a char[] with an
invariant(char)[], even though both will implicitly cast to
const(char)[].
So it's not a perfect system, but it is better than what we had before, and (I would argue) better than C++. But remember that D2.x is experimental. We're here to play with it. If we don't like it, things could change. Our experiences are important here in shaping the future.
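Without a `unique` type constructor, the usual workaround is to build the data in a private mutable buffer and then promise the compiler by hand that nobody else holds a reference. The sketch below assumes a later D2 compiler, where `invariant` is spelled `immutable` and `assumeUnique` is imported from `std.exception`; `makeGreeting` is a made-up name for illustration:

```d
import std.exception : assumeUnique;
import std.stdio : writeln;

/// Hypothetical helper: builds a string in a private mutable buffer and
/// hands it out as immutable without a second copy.
string makeGreeting(const(char)[] name)
{
    char[] buf = "hello, ".dup; // freshly allocated, no other references
    buf ~= name;                // appending copies `name` into our buffer
    // Safe only because `buf` is the sole reference to this memory.
    return assumeUnique(buf);
}

void main()
{
    char[] who = "world".dup;
    string s = makeGreeting(who);
    who[0] = 'W'; // mutating the input does not affect the result
    writeln(s);
}
```

The uniqueness guarantee lives entirely in the programmer's head here; the type system does not check it, which is the gap the post describes.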
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | "Janice Caron" wrote
> On 1/31/08, Sergey Gromov <snake.scaly@gmail.com> wrote:
>> Maybe a contract should be added for standard library that if a function returns a mutable array, it guarantees that this array can be modified without side effects and therefore can be safely assumed unique.
>
> One problem area where this sort of thing arises is concatenation. If
> you concatenate char arrays, then /regardless/ of the constancy of the
> inputs, the result is guaranteed to be unique (because, by
> specification, concatenation always makes a copy). If we had a
> "unique" type-constructor, one could make the result type unique(T)[].
> But we don't, so D makes the (arbitrary) choice of invariant(T)[].
> Also, it is currently a syntax error to concatenate a char[] with an
> invariant(char)[], even though both will implicitly cast to
> const(char)[].
I actually filed an enhancement request for that.
http://d.puremagic.com/issues/show_bug.cgi?id=1654
-Steve
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron wrote:
> On 1/31/08, Sean Kelly <sean@f4.ca> wrote:
>> And to do so the contents of data must be duplicated,
>
> The problem there is the idup. Replace it with
>
> foreach( line; lines )
> {
>     writefln( tolower( assumeUnique(line)) );
> }
>
> and the duplication goes away.
It does? I didn't think tolower would modify a string in place.
Sean
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sergey Gromov | Sergey Gromov wrote:
> Sean Kelly Wrote:
>> Janice Caron wrote:
>>> On Jan 30, 2008 11:03 PM, Sergey Gromov <snake.scaly@gmail.com> wrote:
>>>
>>>> P.S. Much thought must be put into choosing a return type for a library function. Because if it returns a unique copy of data it must be char[] so that I'm free to modify it.
>>> Well, consider again the example of lowercasing a string to see why
>>> that is not so. If I return the original string (not a copy of it),
>>> then you are /not/ free to modify it, because there might be other
>>> pointers to that data. So you must first copy it (using dup) and then
>>> modify the copy. This means that the copy need be done /only
>>> when it is required/, instead of every single time you call the
>>> function - so yes, it is an improvement in efficiency.
>> I'd say that it is an improvement in safety rather than efficiency because the model assumes the string may be shared and thus enforces copy on write. But consider something like this:
>>
>> char[] data = cast(char[]) read( "myfile.txt" );
>> char[][] lines = splitlines( data );
>>
>> foreach( line; lines )
>> {
>>     writefln( tolower( line.idup ) );
>> }
>>
>> In this routine, the programmer knows he is the sole owner of the data and simply wants to print the contents of a file in lower case line-by-line. And to do so the contents of data must be duplicated, which causes GC churn and may slow the app considerably.
>
> If I were doing this in C by loading an entire file in memory, I would
> lowercase it in-place which is both faster and takes less memory.
> D should support such efficient solutions.
The D language does. It simply isn't a feature provided by Phobos.
Sean
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Sean Kelly | On 1/31/08, Sean Kelly <sean@f4.ca> wrote:
> > and the duplication goes away.
>
> It does? I didn't think tolower would modify a string in place.
OK - /one/ of the duplications goes away.
It's not possible, even in principle, to lowercase a char[] in place, because a char[] by definition is an array of UTF-8 code units, /not/ an array of characters. Lowercasing a character may result in the length of its UTF-8 sequence changing. If the length increases, you're screwed.
You can lowercase a dchar[] in place, but not a char[].
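To make the length argument concrete, here is a small sketch that counts UTF-8 code units for one upper/lower pair. The pair U+023A / U+2C65 is taken from the Unicode character database as an illustration; `utf8Units` is a hypothetical wrapper around `std.utf.codeLength`, and the code points are hard-coded rather than produced by a case-mapping call:

```d
import std.stdio : writefln;
import std.utf : codeLength;

/// Number of UTF-8 code units needed to encode one code point.
size_t utf8Units(dchar c)
{
    return codeLength!char(c);
}

void main()
{
    // Case pair from the Unicode character database:
    // U+023A (uppercase) lowercases to U+2C65.
    dchar upper = '\u023A';
    dchar lower = '\u2C65';
    writefln("U+%04X: %s code units", cast(uint) upper, utf8Units(upper));
    writefln("U+%04X: %s code units", cast(uint) lower, utf8Units(lower));
}
```

The uppercase form fits in two code units and the lowercase form needs three, so a `char[]` holding the former cannot be lowercased without growing. As a `dchar[]`, each side is a single element.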
| |||
January 31, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron wrote:
> On 1/31/08, Sean Kelly <sean@f4.ca> wrote:
>
>>> and the duplication goes away.
>> It does? I didn't think tolower would modify a string in place.
>
> OK - /one/ of the duplications goes away.
>
> It's not possible, even in principle, to lowercase a char[] in place, because a char[] by definition is an array of UTF-8 code units, /not/ an array of characters. Lowercasing a character may result in the length of its UTF-8 sequence changing. If the length increases, you're screwed.
>
> You can lowercase a dchar[] in place, but not a char[].
Ah, good point.
Sean
| |||
February 01, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron wrote:
> It's not possible, even in principle, to lowercase a char[] in place,
> because a char[] by definition is an array of UTF-8 code units, /not/
> an array of characters. Lowercasing a character may result in the
> length of its UTF-8 sequence changing. If the length increases, you're
> screwed.
>
> You can lowercase a dchar[] in place, but not a char[].
I'm not sure if that's true[2]. However, I *am* sure it's *not* true for uppercasing. Some code points expand to 2 or 3 codepoints when uppercased. One common case is U+00DF "ß", LATIN SMALL LETTER SHARP S, which expands to "SS" (two characters) when uppercased[1]. Another example from the Unicode standard, U+0390, GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS apparently expands to three codepoints.
[1] Interestingly though, the UTF-8 (aka char[]) representation is the same length :P.
[2] The relevant section[3] of the Unicode standard says "Case mappings may produce strings of different lengths than the original." but proceeds to only give examples for uppercasing.
[3] Section 5.18, see http://www.unicode.org/versions/Unicode5.0.0/ch05.pdf#G21180
| |||
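The ß example and footnote [1] can be checked by comparing array lengths directly. This sketch hard-codes both strings rather than calling any case-mapping routine, so it only illustrates the storage sizes involved:

```d
import std.stdio : writeln;

void main()
{
    // The same text viewed as UTF-8 (char) and UTF-32 (dchar) code units.
    string  lower8  = "\u00DF";  // "ß"
    string  upper8  = "SS";
    dstring lower32 = "\u00DF"d;
    dstring upper32 = "SS"d;

    writeln(lower8.length, " -> ", upper8.length);   // UTF-8 code units
    writeln(lower32.length, " -> ", upper32.length); // UTF-32 code units
}
```

In UTF-8 both forms occupy two code units, while in UTF-32 the uppercase form needs one more element than the original, so an in-place uppercase of a `dchar[]` would not fit.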
February 01, 2008 Re: Why string alias is invariant ? | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Janice Caron | Janice Caron Wrote:
> On 1/31/08, Sergey Gromov <snake.scaly@gmail.com> wrote:
> > Maybe a contract should be added for standard library that if a function returns a mutable array, it guarantees that this array can be modified without side effects and therefore can be safely assumed unique.
>
> Yes, D has a problem here, in that it has no way to express uniqueness. (That's true of other languages too, of course).
That's not true. Linear types and their derivatives solve this problem elegantly and have been known at least since 1990. They have interesting properties. The data can be simultaneously "constant" (referentially transparent) and unique, and still destructive updates are possible in situ. The Clean language is an example of this, it has uniqueness types[1]. But I'm afraid the current implementations of D aren't really well suited for these kinds of "advanced" type systems in the near future...
[1] http://www.cs.ru.nl/~clean/download/papers/1996/bare96-uniclosed.pdf
> If a
> string is immutable, then it doesn't /have/ to be unique (but if it
> is, it's safe to cast it to invariant), but if a string is mutable
> then it /must/ be. That's a problem, because one simply cannot declare
>
> unique(char)[] array;
>
> (...although it would be interesting to try...) So D just makes it your problem.
>
> One problem area where this sort of thing arises is concatenation. If
> you concatenate char arrays, then /regardless/ of the constancy of the
> inputs, the result is guaranteed to be unique (because, by
> specification, concatenation always makes a copy). If we had a
> "unique" type-constructor, one could make the result type unique(T)[].
> But we don't, so D makes the (arbitrary) choice of invariant(T)[].
> Also, it is currently a syntax error to concatenate a char[] with an
> invariant(char)[], even though both will implicitly cast to
> const(char)[].
>
> So it's not a perfect system, but it is better than what we had before, and (I would argue) better than C++. But remember that D2.x is experimental. We're here to play with it. If we don't like it, things could change. Our experiences are important here in shaping the future.
| |||
Copyright © 1999-2021 by the D Language Foundation