March 06, 2009
Andrei Alexandrescu Wrote:

> Steven Schveighoffer wrote:
> > I think what Burton is saying is that by anointing immutable(char)[] as the type "string," you are essentially sending a message to developers that all strings should be immutable, and all *string parameters* should be declared immutable.  What this does is force developers who want to deal in const or mutable chars to do lots of duplication or casting, which either makes your code dog slow or makes your code break const.
> > 
> > Evidence is how (at least in previous releases) anything in Phobos that took an argument that was a utf8 string of characters used the parameter type "string", making it very difficult to use when you don't have string types.  If you want to find a substring in a string, it makes no sense that you first have to make the argument invariant.  A substring function isn't saving a pointer to that data.
> > 
> > I think the complaint is simply that string is defined as immutable(char) [] and therefore is promoted as *the only* string type to use.  Using other forms (such as in char[] or const(char)[] or char[]) doesn't look like the argument is a string, when the word "string" is already taken to mean something else.
> > 
> > -Steve
> 
> I see. Phobos is being changed to accept in char[] instead of string wherever applicable. As far as what the default "string" ought to be, immutable(char)[] is the safest of the three so I think it should be that.

So far this discussion has no examples.  Let me toss one out as a dart board to see what folks think:

char[] s = get_some_data();  // accesses a large buffer that i don't want to copy
size_t pos = my_find(s, "blah");
s[pos] = 'c';

string s2 = "another string";
size_t pos2 = my_find(s2, "blah");
another_func(s2[pos2 .. pos2+4]);

size_t my_find(string haystack, string needle)
{
  // Perform some kind of search that is read-only
}

void another_func(string s) {}

---

It seems like string is the obvious parameter type to use for my_find, but it doesn't work because we're passing mutable data in.  Instead we have to use const(char)[], which is less intuitive and uglier.

Or did I miss the thrust of the argument?
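To make the dart board concrete, here is a minimal sketch of my_find declared with const(char)[] parameters. The search body and the "not found returns haystack.length" convention are assumptions for illustration, not anything Phobos defines.

```d
// Hypothetical read-only search: returns the index of the first match,
// or haystack.length when needle does not occur.
size_t my_find(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return haystack.length;
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        if (haystack[i .. i + needle.length] == needle)
            return i;
    }
    return haystack.length;
}

unittest
{
    char[] s = "some blah data".dup;    // mutable buffer, searched without copying
    size_t pos = my_find(s, "blah");
    assert(pos == 5);
    s[pos] = 'c';                       // the caller's data is still mutable
    assert(s == "some clah data");

    string s2 = "another blah string"; // immutable data goes through the same function
    size_t pos2 = my_find(s2, "blah");
    assert(s2[pos2 .. pos2 + 4] == "blah");
}
```

Both call sites compile because char[] and string each convert implicitly to const(char)[]; with string parameters the first call would be rejected.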


March 07, 2009
Andrei Alexandrescu Wrote:

> This is a known problem for which we will provide a solution.

If you have something which works everywhere please tell us because we've been trying to find one for a long time, but as far as I know there is no solution. The best I've ever gotten to is:

   // "A" means that the constness of the return type depends upon the constness of the argument. There are dozens of ways to specify the same thing.
   const (A) mstring match (const (A) mstring text, RE expression);

But setting aside whether that helps or hinders self-documentation, that's far from the only place at which you put mutable data through a const section that you need to modify later. What if the function were instead:

   struct REMatch
   {
      string match; /// The matched string.
      size_t offset; /// Offset within the string where the match occurs.
      string [] groups; /// Matched groups.

      this (string text);
   }

What am I going to do about this now without using templates? If you define a special syntax to make this work, then I can give you something even further which won't.
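
For readers following along today: present-day D has an inout qualifier that behaves much like the const (A) notation in the match signature above. The sketch below is only an illustration; firstMatch is a made-up name and a literal word stands in for the RE argument. inout applies to function signatures, so on its own it does not answer the REMatch question.

```d
// Hypothetical matcher: returns the slice of text equal to word, or null.
// The qualifier of the result follows the qualifier of the text argument.
inout(char)[] firstMatch(inout(char)[] text, const(char)[] word)
{
    if (word.length > text.length)
        return null;
    foreach (i; 0 .. text.length - word.length + 1)
    {
        if (text[i .. i + word.length] == word)
            return text[i .. i + word.length];
    }
    return null;
}

unittest
{
    char[] m = "find blah here".dup;
    char[] hit = firstMatch(m, "blah");   // mutable in, mutable out
    hit[0] = 'c';                         // writes through to the original buffer
    assert(m == "find clah here");

    string s = "find blah here";
    string hit2 = firstMatch(s, "blah");  // immutable in, immutable out
    assert(hit2 == "blah");
}
```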
March 07, 2009
Burton Radons wrote:
> Andrei Alexandrescu Wrote:
> 
>> This is a known problem for which we will provide a solution.
> 
> If you have something which works everywhere please tell us because
> we've been trying to find one for a long time, but as far as I know
> there is no solution. The best I've ever gotten to is:
> 
> // "A" means that the constness of the return type depends upon the
> // constness of the argument. There are dozens of ways to specify the
> // same thing.
> const (A) mstring match (const (A) mstring text, RE expression);
> 
> But setting aside whether that helps or hinders self-documentation,
> that's far from the only place at which you put mutable data through
> a const section that you need to modify later. What if the function
> were instead:
> 
> struct REMatch
> {
>    string match; /// The matched string.
>    size_t offset; /// Offset within the string where the match occurs.
>    string [] groups; /// Matched groups.
> 
>    this (string text);
> }
> 
> What am I going to do about this now without using templates? If you
> define a special syntax to make this work, then I can give you
> something even further which won't.

The problem is you set up artificially constrained rules, i.e. "without using templates". You can't use the same struct to store mutable types and non-mutable types mixed with always-mutable types, and for good reasons. No type system will allow 100% of the correct programs to run. Why the fuss? Use a gorram template and call it a day.
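
As a rough sketch of what "use a template" could look like for REMatch: the struct is parameterised on the character type, so one definition covers mutable, const and immutable input. matchLiteral is a made-up stand-in for a real regex engine, and the field layout simply mirrors the struct quoted above.

```d
// Hypothetical result type, parameterised on the character type.
struct REMatch(Char)
{
    Char[] match;      // the matched text, a slice of the input
    size_t offset;     // where the match starts in the input
    Char[][] groups;   // matched groups, also slices of the input
}

// Hypothetical helper: "matches" the first occurrence of a literal word.
REMatch!Char matchLiteral(Char)(Char[] text, const(char)[] word)
{
    REMatch!Char result;
    if (word.length > text.length)
        return result;
    foreach (i; 0 .. text.length - word.length + 1)
    {
        if (text[i .. i + word.length] == word)
        {
            result.match = text[i .. i + word.length];
            result.offset = i;
            result.groups = [result.match];
            return result;
        }
    }
    return result;
}

unittest
{
    char[] buf = "find blah here".dup;
    auto m = matchLiteral(buf, "blah");   // REMatch!char
    m.match[0] = 'c';                     // caller may edit through the result
    assert(buf == "find clah here");

    string s = "find blah here";
    auto m2 = matchLiteral(s, "blah");    // REMatch!(immutable(char))
    static assert(is(typeof(m2.match) == string));
    assert(m2.offset == 5);
}
```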

Andrei
March 07, 2009
On Fri, 06 Mar 2009 14:56:04 -0800, Andrei Alexandrescu wrote:

> Steven Schveighoffer wrote:
>> I think what Burton is saying is that by anointing immutable(char)[] as the type "string," you are essentially sending a message to developers that all strings should be immutable, and all *string parameters* should be declared immutable.

> Phobos is being changed to accept in char[] instead of string
> wherever applicable. As far as what the default "string" ought to be,
> immutable(char)[] is the safest of the three so I think it should be that.

I vaguely remember someone suggesting that "string" be the alias for immutable character arrays and "text" be the alias for mutable character arrays. For some people, it might be easier to relate the word "text" as being something that can be edited in-place.

I'm not advocating or rejecting this ... just trying to recall the original poster's suggestion.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
Andrei Alexandrescu Wrote:

> Burton Radons wrote:
> > Andrei Alexandrescu Wrote:
> > 
> >> This is a known problem for which we will provide a solution.
> > 
> > If you have something which works everywhere please tell us because we've been trying to find one for a long time, but as far as I know there is no solution. The best I've ever gotten to is:
> > 
> > // "A" means that the constness of the return type depends upon the
> > // constness of the argument. There are dozens of ways to specify the
> > // same thing.
> > const (A) mstring match (const (A) mstring text, RE expression);
> > 
> > But setting aside whether that helps or hinders self-documentation, that's far from the only place at which you put mutable data through a const section that you need to modify later. What if the function were instead:
> > 
> > struct REMatch
> > {
> >    string match; /// The matched string.
> >    size_t offset; /// Offset within the string where the match occurs.
> >    string [] groups; /// Matched groups.
> > 
> >    this (string text);
> > }
> > 
> > What am I going to do about this now without using templates? If you define a special syntax to make this work, then I can give you something even further which won't.
> 
> The problem is you set up artificially constrained rules, i.e. "without using templates". You can't use the same struct to store mutable types and non-mutable types mixed with always-mutable types, and for good reasons. No type system will allow 100% of the correct programs to run. Why the fuss? Use a gorram template and call it a day.

Ah, so you don't have a solution.

What you'll have instead are programs which are developed one way but then come to an impasse where they need to cast off const but can't. So after some cursing, the programmer starts wasting his time modifying all of his code to be templated, because avoiding casting off const has exactly the same viral progression as adding in const (I did it in C++ the last time we thought const might be able to make code better-optimised).

This goes fine, until he comes up to a library which hasn't gone through the same process, so it has a normal interface. Or he comes up to an interface itself, which will not normally be templatable. He doesn't have the code, so he can't change the library. What does he do then?

There are four options I can see. One, he could make a copy of the supposedly const data that's been trapped by the library, which won't always work. Two, he could take the pointer and length from the slice and figure out where in his mutable data the slice lies, which might be impossible depending upon pointer arithmetic restrictions. Three, he could reimplement the library functionality himself, which may be impossible. Four, he could move on to a language which doesn't make his job hard just so that it can sometimes add numbers faster.

This is actually reminding me of C++ the more I think about it. C++'s bad features are so bad because they're far-reaching but they couldn't be consistently applied. So if you read the specification, you'll find twenty or so caveats that try to make the feature work, when the proper thing to have done was to realise that a feature which doesn't naturally fit in a language shouldn't be in that language. Yet here we have a feature that's not just C++, but it's C++^2.

Multiple keywords. The threat that abuse will eventually lead to code being compiled incorrectly, coupled with forcing abuse on common code (I stress that any data which is marked as invariant at any point but is not actually invariant will cause optimisation issues if it's given any weight whatsoever). At least three different ways to define a const. "const int *foo ()" and "int *foo () const" equivalency. A syntax which makes declarations hard to read. And coming, you say, is enforced const-correctness, just to make things extra awful.
March 07, 2009
On Fri, 06 Mar 2009 18:25:13 -0500, jerry quinn wrote:

> So far this discussion has no examples.  Let me toss one out as a dart board to see what folks think:
> 
> char[] s = get_some_data();  // accesses a large buffer that i don't want to copy
> size_t pos = my_find(s, "blah");
> s[pos] = 'c';
> 
> string s2 = "another string";
> size_t pos2 = my_find(s2, "blah");
> another_func(s2[pos2 .. pos2+4]);
> 
> size_t my_find(string haystack, string needle)
> {
>   // Perform some kind of search that is read-only
> }
> 
> void another_func(string s) {}
> 
> ---
> 
> It seems like string is the obvious parameter type to use for my_find, but it doesn't work because we're passing mutable data in.  Instead we have to use const(char)[], which is less intuitive and uglier.
> 
> Or did I miss the thrust of the argument?

Most of this discussion seems to assume that there are only two types of data - immutable and mutable, but there is a third type - "potentially mutable".

invariant(char)[] --> immutable
                  --> Once set, it cannot be changed by anything.
const(char)[]     --> potentially mutable
                  --> The compiler ensures that the routine that declares
                      this will not change it, but it can be changed by
                      other routines.
char[]            --> mutable
                  --> Anything can change this.


The alias "string" only refers to immutable stuff. Can we come up with aliases for "potentially mutable" and "mutable" too?

I would argue that the parameter signature for my_find() needs to use the "const(char)[]" type because that means that the coder and compiler say that this function won't change the input but we don't particularly care if the input is immutable, or mutable by something else.
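
A small sketch of those three categories in current D spelling (immutable instead of invariant). It only checks which conversions the compiler accepts implicitly and shows that a const view can still see changes made by the owner; the variable names are made up.

```d
unittest
{
    // Mutable and immutable both convert to const without a cast...
    static assert(is(char[] : const(char)[]));
    static assert(is(immutable(char)[] : const(char)[]));
    // ...but const converts to neither of them.
    static assert(!is(const(char)[] : char[]));
    static assert(!is(const(char)[] : immutable(char)[]));

    char[] buf = "abc".dup;        // mutable: anything holding buf can change it
    string fixed = "abc";          // immutable: nothing can change it
    const(char)[] view = buf;      // "potentially mutable": no writes via view
    const(char)[] view2 = fixed;

    buf[0] = 'x';                  // the owner can still write
    assert(view == "xbc");         // and the const view observes the change
    assert(view2 == "abc");
}
```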

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
March 07, 2009
Burton Radons wrote:
> Andrei Alexandrescu Wrote:
>> The problem is you set up artificially constrained rules, i.e.
>> "without using templates". You can't use the same struct to store
>> mutable types and non-mutable types mixed with always-mutable
>> types, and for good reasons. No type system will allow 100% of the
>> correct programs to run. Why the fuss. Use a gorram template and
>> call it a day.
> 
> Ah, so you don't have a solution.

I do have a solution, the problem is sometimes no amount of convincing will do any good. Arguments focusing on niche cases can be formulated against every single restriction of a type system. Const and immutable are supposed to express realities about data. Sometimes said realities undergo dialectics that are difficult to express. The system is not perfect, and cannot be made perfect within the constraints at hand (e.g. without putting undue complexity on the programmer). If you're bent from the get-go against const and immutable, there is no good scenario that is good enough, no bad scenario that's infrequent and avoidable enough, and no way to purport a gainful dialog.

Andrei
March 07, 2009
Derek Parnell wrote:
> I would argue that the parameter signature for my_find() needs to use the
> "const(char)[]" type because that means that the coder and compiler say
> that this function won't change the input but we don't particularly care if
> the input is immutable, or mutable by something else.

If you want it to be callable by all three variations on character arrays, then yes.
March 07, 2009
When we first got into what to do with strings and const/immutable/mutable, I was definitely in the camp that strings should be mutable char[], or at worst const(char)[]. The thing is, Andrei pointed out to me, languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is:

"Nobody notices they are immutable, they just work."

So what is it about immutability that makes strings "just work" in a natural and intuitive manner? The insight is that it enables strings, which are reference types, to behave exactly as if they were value types.

After all, it never occurs to anyone to think that the integer 123 could be a "mutable" integer and perhaps be 133 the next time you look at it. If you put 123 into a variable, it stays 123. It's immutable. People intuitively expect strings to behave the same way. Only C programmers expect that once they assign a string to a variable, that string may change in place.
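
A short sketch of that difference, with made-up variable names: two mutable arrays that share memory can surprise each other, while two strings that share memory cannot.

```d
unittest
{
    // Mutable arrays alias: a write through one name shows up under the other.
    char[] a = "hello".dup;
    char[] b = a;
    b[0] = 'j';
    assert(a == "jello");

    // Immutable strings cannot be written in place, so sharing is invisible.
    string s = "hello";
    string t = s;
    static assert(!__traits(compiles, t[0] = 'j'));
    t = "j" ~ t[1 .. $];   // builds a new string; s is untouched
    assert(s == "hello");
    assert(t == "jello");
}
```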

C has it backwards by making strings mutable, and it's one of the main reasons why dealing with strings in C is such a gigantic pain. But as a longtime C programmer, I was so used to it that I didn't notice what a pain it was until I started using other languages where string manipulation was a breeze.

The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with a mutable buffer and, when finished, convert it to immutable and 'publish' it to the rest of the program. Mutable char[] arrays should only exist as temporaries. This is exactly the opposite of the way one does it in C, but if you do it this way, you'll find you never need to defensively dup the string "just in case" and things just seem to naturally work out.
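
A minimal sketch of that build-then-publish pattern, assuming Phobos's std.exception.assumeUnique for the final conversion; joinWords is a made-up example function.

```d
import std.exception : assumeUnique;

// Hypothetical builder: assembles text in a private mutable buffer and
// hands it out as an immutable string once it is complete.
string joinWords(string[] words)
{
    char[] buf;                    // mutable temporary, never escapes as char[]
    foreach (i, w; words)
    {
        if (i > 0)
            buf ~= ' ';
        buf ~= w;
    }
    // buf is the only reference to this memory, so it can be published
    // as immutable without copying. assumeUnique also resets buf to null.
    return assumeUnique(buf);
}

unittest
{
    string s = joinWords(["build", "then", "publish"]);
    assert(s == "build then publish");
}
```

assumeUnique is a promise by the programmer, not something the compiler checks; if other references to the buffer might exist, buf.idup makes a copy instead and is the safe fallback.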

I tend to agree that if you try to do strings the C way in D2, you'll probably find it to be a frustrating experience.
March 07, 2009
Walter Bright wrote:
> When we first got into what to do with strings and const/immutable/mutable, I was definitely in the camp that strings should be mutable char[], or at worst const(char)[]. The thing is, Andrei pointed out to me, languages that are considered very good at dealing with strings (like Perl) use immutable strings. The fascinating thing about strings in such languages is:
> 
> "Nobody notices they are immutable, they just work."
> 
> So what is it about immutability that makes strings "just work" in a natural and intuitive manner? The insight is that it enables strings, which are reference types, to behave exactly as if they were value types.
> 
> After all, it never occurs to anyone to think that the integer 123 could be a "mutable" integer and perhaps be 133 the next time you look at it. If you put 123 into a variable, it stays 123. It's immutable. People intuitively expect strings to behave the same way. Only C programmers expect that once they assign a string to a variable, that string may change in place.
> 
> C has it backwards by making strings mutable, and it's one of the main reasons why dealing with strings in C is such a gigantic pain. But as a longtime C programmer, I was so used to it that I didn't notice what a pain it was until I started using other languages where string manipulation was a breeze.
> 
> The way to do strings in D is to have them be immutable. If you are building a string by manipulating its parts, start with a mutable buffer and, when finished, convert it to immutable and 'publish' it to the rest of the program. Mutable char[] arrays should only exist as temporaries. This is exactly the opposite of the way one does it in C, but if you do it this way, you'll find you never need to defensively dup the string "just in case" and things just seem to naturally work out.
> 
> I tend to agree that if you try to do strings the C way in D2, you'll probably find it to be a frustrating experience.

That hit the spot.

Andrei