December 23, 2011
On 12/23/2011 02:46 PM, Ali Çehreli wrote:
> On 12/23/2011 12:36 PM, Mr. Anonymous wrote:
>  > you generate an array of random numbers, and one of them appears to
>  > be an address of an allocated array. This array won't free even if not
>  > used anymore.

OK, I misread what you said. I thought you were filling the array with random numbers. You are right.

Ali
December 23, 2011
On Friday, December 23, 2011 09:47:35 Ali Çehreli wrote:
> - To be more useful, function parameters should not insist on immutable data, yet we type string all over the place.

That depends. If they're going to have to idup the data anyway, then it's better to require that the argument be immutable so that that cost is clear. The worst is taking const and then iduping, because then you're forced to idup strings which didn't need to be iduped.

And in general, operating on strings is more efficient than operating on mutable character arrays, because you can slice strings with impunity, whereas you often have to dup or idup mutable arrays in order to avoid altering the original data. The area where immutability becomes problematic is when you actually want to directly mutate a string - but that's generally a rather iffy thing to do with UTF-8 anyway, since you have to deal with the varying length of the code points within the string.

That being said, an increasing number of functions in Phobos are templated on string type so that you can use whatever string type that you want with them. And there is a push (at least with toString) to add the ability to put the result of a string function into an existing string of some variety (be it using a delegate or an output range). So, you'll be forced to use string less, but the reality of the matter is that in the general case you should probably be using string anyway (there are, of course, always exceptions).

> - To be more useful, functions should not insist on the mutability of the data that they return.
>
> The following function makes a new string:
> 
> char[] endWithDot(const(char)[] s)
> {
>      return s ~ '.';
> }
> 
>      char[] s;
>      s ~= "hello";
>      auto a = endWithDot(s);
> 
> It is good that the parameter is const(char) so that I could pass the
> mutable s to it.
> 
> But the orthogonal problem of the type of the return is troubling. The result is clearly mutable yet it can't be returned as such:
> 
> Error: cannot implicitly convert expression (s ~ '.') of type
> const(char)[] to char[]
> 
> We've talked about this before. There is nothing in the language that makes me say "the returned object is unique; you can cast it to mutable or immutable freely."

In general, D doesn't have features where the programmer says that something is okay. It's too interested in making guarantees for that. Either it can guarantee something, or you force it with a cast. I can't think of even one feature where you say that _you_ guarantee that something is okay. Casting is your only option.

That being said, the language is improving in what it can guarantee and in what it can do thanks to those guarantees. For instance, if you have a pure function and the compiler can guarantee that the return value doesn't reference anything in the arguments passed in, then the return value is implicitly convertible to whatever const-ness you want.

If you want to be making such guarantees yourself, then what you typically have to do is templatize the function and take advantage of static if and D's compile-time reflection capabilities. Phobos does this quite a bit to improve performance and avoid having to duplicate data.
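A minimal sketch of that technique, assuming a function that has to keep an immutable copy of its argument (`storeName` is a hypothetical name, not a Phobos function): `static if` checks at compile time whether the argument is already immutable, so a `string` is kept as is and only mutable or const input pays for an `idup`.

```d
import std.stdio : writeln;

// Hypothetical example: needs an immutable copy of its argument
// (e.g. to keep it for later) but accepts any char array.
string storeName(S)(S name)
    if (is(S : const(char)[]))
{
    static if (is(S : string))
        return name;        // already immutable: keep it, no copy
    else
        return name.idup;   // mutable or const: make one immutable copy
}

void main()
{
    string s = "alice";
    assert(storeName(s) is s);     // same slice, nothing was copied

    char[] m = "bob".dup;
    string copy = storeName(m);
    m[0] = 'B';                    // mutating the original...
    assert(copy == "bob");         // ...does not affect the stored copy
    writeln(copy);
}
```

The constraint limits the sketch to char arrays; wchar and dchar input would need a different return type.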

Your particular example is quite easily fixed though. The issue is that the string which was passed in is typed as const(char)[], and the expression s ~ '.' naturally results in the same type. But it's quite clear that the resulting string could be of any constness, since it's a new string. So, just tell it what constness to have by casting it.

- Jonathan M Davis
December 23, 2011
On Friday, December 23, 2011 20:19:28 Mr. Anonymous wrote:
> I saw that std.string functions use assumeUnique from std.exception. As for your example, it probably should be:
> 
> char[] endWithDot(const(char)[] s)
> {
>      return s.dup ~ '.';
> }

No, that allocates _two_ strings - one from dup and one as the result of the concatenation. It should either be

auto retval = s.dup;
retval ~= '.';
return retval;

or

return cast(char[])(s ~ '.');

The problem is that because s is const(char)[], the result of the concatenation is that type. But it's guaranteed to be a new string, so the cast is fine. It's arguably better to use the first version though, since it doesn't require a cast.
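Mr. Anonymous mentioned `std.exception.assumeUnique`. A minimal sketch of the same function when the caller wants a `string` back, assuming that return type: `dup` makes the only copy, and `assumeUnique` documents the promise that nothing else refers to it. Like the cast, it is an unchecked promise by the programmer, not something the compiler verifies.

```d
import std.exception : assumeUnique;

// Builds a new string; assumeUnique replaces an explicit cast to immutable.
string endWithDot(const(char)[] s)
{
    char[] result = s.dup;       // the only reference to this new buffer
    result ~= '.';
    return assumeUnique(result); // promise: nothing else refers to result
}

void main()
{
    char[] m = "hello".dup;
    string r = endWithDot(m);
    m[0] = 'H';                  // the original is independent of the result
    assert(r == "hello.");
}
```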

- Jonathan M Davis
December 23, 2011
On Friday, December 23, 2011 14:51:06 bearophile wrote:
> And sometimes inout helps.

Yeah, good point. I keep forgetting about inout, since it didn't work properly before. So, the best way to implement Ali's function would be

inout(char)[] endWithDot(inout(char)[] s)
{
     return s ~ '.';
}
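A short usage sketch (the assertions are illustrative): the same function returns `char[]`, `const(char)[]` or `string` depending on what it is given.

```d
// inout ties the mutability of the result to that of the argument.
inout(char)[] endWithDot(inout(char)[] s)
{
    return s ~ '.';   // the expression has type inout(char)[]
}

void main()
{
    char[] m = endWithDot("mutable".dup);   // char[] in, char[] out
    m[0] = 'M';                             // the result is writable

    const(char)[] c = "const";
    const(char)[] rc = endWithDot(c);       // const in, const out

    string i = endWithDot("immutable");     // string in, string out

    static assert(is(typeof(endWithDot(c)) == const(char)[]));
    static assert(is(typeof(endWithDot("x")) == string));
    assert(m == "Mutable." && rc == "const." && i == "immutable.");
}
```

With a `const(char)[]` argument the result is still const, even though the returned array is freshly allocated; that limitation of inout comes up later in this thread.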

- Jonathan M Davis
December 23, 2011
On Friday, December 23, 2011 14:51:21 Ali Çehreli wrote:
> On 12/23/2011 11:51 AM, bearophile wrote:
>  > Ali:
>  >> There is nothing in the language that makes me say "the returned
>  >> object is unique; you can cast it to mutable or immutable freely."<
>  >
>  > The return value of strongly pure functions is implicitly castable to
>  > immutable.
> 
> Is that working yet? The commented-out lines below don't compile with 2.057:
> 
> void main()
> {
>      char[] s = "hello".dup;
> 
>      char[]            am  = endWithDot(s);
>      const(char)[]     ac  = endWithDot(s);
>      const(char[])     acc = endWithDot(s);
>      // immutable(char)[] ai  = endWithDot(s);
>      // immutable(char[]) aii = endWithDot(s);
> }
> 
> pure char[] endWithDot(const(char)[] s)
> {
>      char[] result = s.dup;
>      result ~= '.';
>      return result;
> }

Well, that's not strongly pure - only weakly pure - so if the optimization is only for strongly pure functions, then that won't work. I know that it _could_ be done with weakly pure functions as well (such as your example here), but I'm not exactly sure what it does right now. The feature is new, so it doesn't yet work in all of the cases that it should, and it's not entirely clear exactly how far it will go. IIRC, Daniel Murphy and Steven were discussing it a while back, and it clearly didn't do as much as it could, and it wasn't entirely clear that it ever would because of the increased complications involved. However, it will almost certainly work in more cases in the future than it does now as the feature is improved.
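For comparison, a sketch assuming the parameter is changed to `string`: with only immutable parameters the function is strongly pure in the sense used here, so its `char[]` result cannot refer to anything the caller could mutate, and all of the initializations from Ali's example should compile. Whether the `const(char)[]` version is accepted as well depends on how far the compiler's analysis goes.

```d
// Strongly pure: the only parameter is immutable, so the returned
// char[] must be fresh memory that nothing else can modify.
pure char[] endWithDot(string s)
{
    char[] result = s.dup;
    result ~= '.';
    return result;
}

void main()
{
    string s = "hello";

    char[]            am  = endWithDot(s);
    const(char)[]     ac  = endWithDot(s);
    immutable(char)[] ai  = endWithDot(s);  // implicit conversion to immutable
    immutable(char[]) aii = endWithDot(s);

    am[0] = 'H';
    assert(am == "Hello." && ai == "hello." && aii == "hello.");
}
```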

- Jonathan M Davis
December 24, 2011
Jonathan M Davis:

> The feature is new, so it doesn't yet work in all of the cases that it should, and it's not entirely clear exactly how far it will go. IIRC, Daniel Murphy and Steven were discussing it a while back,

I have very recently opened another thread about it, but unfortunately it didn't attract a lot of attention: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=153041


> and it wasn't entirely clear that it ever would because of the increased complications involved.

Yeah.

--------------------

Ali:

>      return s ~ '.';
> 
> as the type of the result is const(char)[]. I insist that it too should be castable to any mutable or immutable type.

Add a request in Bugzilla if it's not already present. I don't know how complex it would be to implement in the compiler.

Bye,
bearophile
December 24, 2011
On 12/23/2011 03:16 PM, Jonathan M Davis wrote:
> On Friday, December 23, 2011 09:47:35 Ali Çehreli wrote:
>> - To be more useful, function parameters should not insist on immutable
>> data, yet we type string all over the place.
>
> That depends. If they're going to have to idup the data anyway, then it's
> better to require that the argument be immutable so that that cost is clear.
>
> The worst is taking const and then iduping, because then you're forced to idup
> strings which didn't need to be iduped.

That would be leaking an implementation detail to the user. Besides, it doesn't solve the problem if the user is in the middle:

void user(const(char)[] p)
{
    writeln(endWithDot(p));
}

The user function itself would be forced to take immutable, but this time for a different reason: not because of a copy optimization of its own, but because it has to pass endWithDot()'s copy optimization on to its own caller.

immutable would have to be leaked all the way up just because a low level function decided to make a copy!
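A small sketch of that leak, assuming `endWithDot()` is declared to take `string`: the middle layer `user()` accepts `const(char)[]`, so it cannot forward its argument directly and must pay for an `idup` of its own. In this sketch `user()` returns the string instead of printing it, so that the result can be checked.

```d
import std.stdio : writeln;

// Low-level function that insists on immutable input.
string endWithDot(string s)
{
    return s ~ '.';
}

// Middle layer: takes const so that callers can pass any char array...
string user(const(char)[] p)
{
    // ...but endWithDot(p) does not compile: const(char)[] is not string.
    static assert(!__traits(compiles, endWithDot(p)));
    return endWithDot(p.idup);   // forced to make an immutable copy first
}

void main()
{
    char[] buf = "hello".dup;
    writeln(user(buf));
}
```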

Perhaps the guideline should then be: everybody should take immutable references, so that leaking immutable through all the layers is not a problem when some low-level function decides to make a copy.

> And in general, operating on strings is more efficient than mutable character
> arrays, because you can slice them with impunity, whereas you often have to
> dup or idup mutable arrays in order to avoid altering the original data.

Agreed. But immutable on the parameter list is an insistence: The function insists that the data be immutable. Why? Because it is going to store it for later use? Perhaps share it between threads? It is understandable when there is such a legitimate reason. Then the caller would see the reason too: "oh, takes immutable; that means my data may be used later as is."

> That being said, an increasing number of functions in Phobos are templated on
> string type so that you can use whatever string type that you want with them.
> And there is a push (at least with toString) to add the ability to put the
> result of a string function into an existing string of some variety (be it
> using a delegate or an output range). So, you'll be forced to use string less,

Good. Ranges are more and more becoming "thinking in D." Perhaps we should be talking about a range that appends a dot at the end of the existing elements.
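One way to express such a range with existing Phobos building blocks, as a sketch: `std.range.chain` followed by `std.range.only`. The helper name `withDot` is hypothetical. The result is lazy, allocates nothing, and accepts char arrays of any mutability. The dot is passed as a `dchar` because Phobos ranges iterate narrow strings as `dchar` code points, which keeps both element types the same.

```d
import std.algorithm.comparison : equal;
import std.range : chain, only;

// Hypothetical helper: a lazy view of s followed by '.'.
// No allocation, and no requirement on the mutability of s.
auto withDot(S)(S s)
{
    return chain(s, only(dchar('.')));
}

void main()
{
    char[] m = "mutable".dup;
    const(char)[] c = "const";
    string i = "immutable";

    assert(withDot(m).equal("mutable."));
    assert(withDot(c).equal("const."));
    assert(withDot(i).equal("immutable."));
}
```

Nothing is copied until the caller decides to materialize the range into an array of its choice.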

> but the reality of the matter is that in the general case you should probably
> be using string anyway (there are, of course, always exceptions).

I am looking for simple guidelines when designing functions. It is simple in C++: take data by reference to const if you are not going to modify it. (It is questionable whether small structs should be passed by value instead, but that's beside the point.)

In C++, passing by reference to const works because the function accepts any type of mutability and a copy is avoided because it's a reference.

In D, immutable is not more const than const (which was my initial assumption); it is an additional requirement: give me data that should never change. My point is that this requirement makes sense only in rare cases. Why would a function like endWithDot() insist on how mutable the user's data is?
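A small sketch of that difference (the function names are hypothetical): a `const(char)[]` parameter plays the role of C++'s reference to const and accepts mutable, const and immutable arguments, while a `string` parameter rejects the first two.

```d
// const parameter: the D counterpart of C++'s "reference to const".
size_t countDots(const(char)[] s)
{
    size_t n;
    foreach (ch; s)
        if (ch == '.')
            ++n;
    return n;
}

// immutable parameter: an extra requirement on the caller.
size_t countDotsImmutable(string s)
{
    return countDots(s);   // string converts to const(char)[] implicitly
}

void main()
{
    char[] m = "a.b".dup;
    const(char)[] c = "a.b.c";
    string i = "a.b.c.d";

    // The const version accepts every mutability...
    assert(countDots(m) == 1 && countDots(c) == 2 && countDots(i) == 3);

    // ...while the immutable version rejects mutable and const data.
    static assert(!__traits(compiles, countDotsImmutable(m)));
    static assert(!__traits(compiles, countDotsImmutable(c)));
    assert(countDotsImmutable(i) == 3);
}
```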

>> - To be more useful, functions should not insist on the mutability of
>> the data that they return.
>>
>> The following function makes a new string:
>>
>> char[] endWithDot(const(char)[] s)
>> {
>>       return s ~ '.';
>> }
>>
>>       char[] s;
>>       s ~= "hello";
>>       auto a = endWithDot(s);
>>
>> It is good that the parameter is const(char) so that I could pass the
>> mutable s to it.
>>
>> But the orthogonal problem of the type of the return is troubling. The
>> result is clearly mutable yet it can't be returned as such:
>>
>> Error: cannot implicitly convert expression (s ~ '.') of type
>> const(char)[] to char[]
>>
>> We've talked about this before. There is nothing in the language that
>> makes me say "the returned object is unique; you can cast it to mutable
>> or immutable freely."
>
> In general, D doesn't have features where the programmer says that something
> is okay. It's too interested in making guarantees for that. Either it can
> guarantee something, or you force it with a cast. I can't think of even one
> feature where you say that _you_ guarantee that something is okay. Casting is
> your only option. [...]

I know. I used the wrong words. Yes, the compiler should see what I see: the returned object is unique and can be elevated to any mutability level.

> Your particular example is quite easily fixed though. The issue is that the
> string which was passed in is typed as const(char)[], and the expression s ~
> '.' naturally results in the same type. But it's quite clear that the
> resulting string could be of any constness, since it's a new string. So, just
> tell it what constness to have by casting it.

That's the other side of the problem: Why would the function dictate how the caller should treat this piece of data? The function should not arbitrarily put const or immutable on the data. That would be making it less useful. The data is mutable anyway.

inout doesn't solve this problem, as it is a connection between the mutability of the parameter(s) and that of the result. In the case of functions like endWithDot(), the mutability of the result has nothing to do with that of the parameters.

As you say, maybe the situation will get better in D, and functions will simply return char[] and the compiler will convert it to the required mutability automatically.

I remembered that I had shown UniqueMutable when we discussed this issue last time:

import std.stdio;
import std.exception;

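struct UniqueMutable(T)
{
    T data;         // the sole reference to the wrapped data
    bool is_used;   // set once the data has been handed out

    // Takes ownership: the caller's reference is nulled so that the
    // wrapper holds the only remaining reference.
    this(ref T data)
    {
        this.is_used = false;
        this.data = data;
        data = null;
    }

    T as_mutable()
    {
        return as_impl!(T)();
    }

    immutable(T) as_immutable()
    {
        return as_impl!(immutable(T))();
    }

    // Hands out the data exactly once, cast to the requested mutability;
    // a second call throws (via enforce) because uniqueness is gone.
    private ConvT as_impl(ConvT)()
    {
        enforce(!is_used);
        ConvT result = cast(ConvT)(data);
        data = null;
        is_used = true;
        return result;
    }
}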

UniqueMutable!T unique_mutable(T)(ref T data)
{
    return UniqueMutable!T(data);
}

UniqueMutable!(char[]) foo()
{
    char[] result = "hello".dup;
    result ~= " world";
    return unique_mutable(result);
}

void main()
{
    char[] mutable_result = foo().as_mutable;
    mutable_result[0] = 'H';
    string immutable_result = foo().as_immutable;
}

>
> - Jonathan M Davis

Ali

December 24, 2011
The core problem for a number of these situations is how types are handled with regards to expressions. In an expression such as

char[] arr = s ~ '.';

the type of the value being assigned is determined _before_ the assignment is done. So, even though in theory the compiler could make it work, it doesn't, because by the time it's looking at the type being assigned to, it's too late. There would need to be a fundamental change in how the language functions in order to fix issues like this.

pure can do it when it can, not because it's able to look at the return type and change the type of the expression accordingly, but because it has guarantees which let it know that the return value could be converted to any level of constness and still be valid. The types used in the expressions internally are generally irrelevant.

So, while I completely agree that it would be an improvement if the compiler did a better job with implicit conversion when it could theoretically be done, I'm not sure how much of that we're actually going to end up seeing, simply because of how the language and type system work in terms of the order of evaluation.

- Jonathan M Davis
December 24, 2011
On 12/24/2011 02:02 AM, Jonathan M Davis wrote:
> The core problem for a number of these situations is how types are handled
> with regards to expressions. In an expression such as
>
> char[] arr = s ~ '.';
>
> the type of the value being assigned is determined _before_ the assignment is
> done. So, even though in theory the compiler could make it work, it doesn't,
> because by the time it's looking at the type being assigned to, it's too late.
> There would need to be a fundamental change in how the language functions in
> order to fix issues like this.

Examples of resolved issues like this:

int[] foo()pure;
immutable(int)[] x = foo;


>
> pure can do it when it can not because it's able to look at what the return
> type is and changing the result of the expression accordingly but because it
> has guarantees which make it so that it knows that the return value could be
> converted to any level of constness and still be valid. The types used in the
> expressions internally are generally irrelevant.
>
> So, while I completely agree that it would be an improvement if the compiler
> did a better job with implicit conversion when it could theoretically be done,
> I'm not sure how much of that we're actually going to end up seeing simply
> because of how the language and type system works in terms of the order of
> evaluation.
>
> - Jonathan M Davis

I don't think this is very hard to get working.
December 24, 2011
On 23.12.2011 22:51, bearophile wrote:
>> ++a[] works, but a[]++ doesn't.
> Already known compiler bug.

Is that a joke? Array expressions exist in D for performance reasons: they generate code that is 2x to 100x faster, without relying on any compiler optimizations. Here is a link to one of these epic comments (even 100x more epic because it uses '%' instead of 'x###'):
https://github.com/D-Programming-Language/druntime/blob/master/src/rt/arraybyte.d#L1127

But `a[]++` would have to store a copy of `a`, increment the elements, and return the stored copy. That is a hidden GC allocation. We already have silent allocations in closures, but here a _really large_ piece of data could be allocated. Yes, this allocation can sometimes be optimized out, but not always.
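A sketch of what `a[]++` would have to do, written with a hypothetical helper: the in-place array operation `a[] += 1` needs no allocation, but returning the old values requires a full copy.

```d
// Hypothetical helper showing what a postfix a[]++ would have to do:
// remember the old contents (a hidden allocation), then update in place.
int[] postIncrement(int[] a)
{
    int[] old = a.dup;   // the copy that a[]++ would need to return
    a[] += 1;            // array operation: updates the elements in place
    return old;
}

void main()
{
    int[] a = [1, 2, 3];
    a[] += 1;                       // in place, no allocation
    assert(a == [2, 3, 4]);

    int[] before = postIncrement(a);
    assert(before == [2, 3, 4]);    // the old values live in a new array
    assert(a == [3, 4, 5]);
}
```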

IMHO, D should not have `a[]++` operator.