June 24, 2015
On Wednesday, 24 June 2015 at 13:45:27 UTC, John Colvin wrote:
> On Wednesday, 24 June 2015 at 13:40:28 UTC, Vladimir Panteleev wrote:
>> BTW, as for eager/mutating functions, they are almost all verbs:
>>
>> [...]
>
> Which is sensible. A range is an object that performs an action, an eager function is the action itself.

Yep, just another data point.
June 24, 2015
On Tuesday, 23 June 2015 at 22:51:08 UTC, Vladimir Panteleev wrote:
> On Tuesday, 23 June 2015 at 22:45:10 UTC, Vladimir Panteleev wrote:
>
> Proposed new name: withExtension

I feel this fails the litmus test you established before: "These functions have the same functionality, but one of them is eager, and the other is lazy. Can you guess which is which?"

I don't think I'd interpret these two names as having the same functionality in the first place.  I'd probably learn their equivalence completely by accident and only remember it by rote.

-Wyatt
June 24, 2015
On Wednesday, 24 June 2015 at 13:43:04 UTC, Jacob Carlborg wrote:
> Can't we update isSomeString to detect this use case?

We _could_, but that would be a disaster. The whole point of isSomeString is to test whether something is exactly a string. The code used by a function that's overloaded specifically on strings needs to operate on strings. If isSomeString suddenly accepted something which implicitly converted to a string rather than a string, then most functions which used isSomeString in their template constraints would fail to compile when used with such a range. In general, implicit conversions in template constraints are incredibly dangerous - especially when alias this is involved - because unless the conversion is actually done, the type in question won't act exactly like whatever it converts to, and when a function is given something that implicitly converts, it will either not compile or it will have incorrect behavior.

A while back, it was temporarily changed so that isSomeString, isIntegral, etc. accepted implicit conversions, but that was reverted fairly quickly, precisely because it's so dangerous. Templated functions should only be dealing with implicit conversions when they force the conversion.
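
To make the failure mode concrete, here is a minimal sketch. Wrapped and codeUnits are made-up names for illustration only; isSomeString is the real std.traits template.

```d
import std.traits : isSomeString;

// Hypothetical wrapper type: converts implicitly to string via alias this.
struct Wrapped
{
    string payload;
    alias payload this;
}

// Hypothetical function constrained on "exactly a string".
size_t codeUnits(S)(S s) if (isSomeString!S)
{
    return s.length;
}

static assert(isSomeString!string);
static assert(is(Wrapped : string));    // it converts to string...
static assert(!isSomeString!Wrapped);   // ...but it is not a string
static assert(!__traits(compiles, codeUnits(Wrapped("abc"))));

// Build with: dmd -unittest -main
unittest
{
    auto w = Wrapped("abc");
    string s = w;                 // the caller forces the conversion
    assert(codeUnits(s) == 3);
}
```

The constraint rejects Wrapped outright, and the call only works once the conversion has actually been done.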

We could choose to write overloads for Phobos functions which accepted ranges that implicitly converted to string, explicitly convert them to string, and then call the string overload, but then they'd always allocate, whereas maybe the overload which operated on non-strings would have been better, because it wouldn't have required any allocation. So really, what we need is either to change things so that strings are ranges of their code unit type rather than dchar, or to use byCodeUnit and friends a lot more. I believe that Walter has been trying to do that with the lazy versions of functions, but any of them which result in ranges of dchar are no longer going to be strings, and even those that use byCodeUnit won't be able to take advantage of overloads for strings or arrays anymore, because they won't be strings.
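
A small sketch of that difference, assuming a Phobos that still autodecodes. CodeUnits is just a local alias for this example; the rest are existing Phobos names.

```d
import std.range.primitives : ElementType, walkLength;
import std.traits : isSomeString, Unqual;
import std.utf : byCodeUnit;

// With autodecoding, a string viewed as a range yields dchar.
static assert(is(ElementType!string == dchar));

// byCodeUnit yields code units (char) instead, with no decoding...
alias CodeUnits = typeof("abc".byCodeUnit);
static assert(is(Unqual!(ElementType!CodeUnits) == char));

// ...but the result is a wrapper range, so overloads constrained
// on isSomeString no longer match it.
static assert(!isSomeString!CodeUnits);

// Build with: dmd -unittest -main
unittest
{
    string s = "h\u00E9llo";               // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);                 // length in code units
    assert(s.walkLength == 5);             // range view: decoded code points
    assert(s.byCodeUnit.walkLength == 6);  // range view: code units
}
```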

Part of what we need to do is go through Phobos and make it so that the various range-based functions which operate on strings operate on ranges of char just as well, then byCodeUnit and its ilk can be optimized appropriately. But they're not going to work with the current overloads which take strings, because they are neither arrays nor strings.

- Jonathan M Davis
June 24, 2015
On Wednesday, 24 June 2015 at 13:35:06 UTC, Vladimir Panteleev wrote:
> I think someone suggested lowerCased and upperCased somewhere, I think they are fine too. There is some precedent (transposed and indexed).

Err, that someone was me. I thought my initial suggestion was asLowerCase / asUpperCase.
June 24, 2015
On Wednesday, 24 June 2015 at 13:50:09 UTC, Vladimir Panteleev wrote:
> - I think the implementation is better done through composition

Perhaps, though I was thinking of this as being just a temporary step for migration until it is deprecated - it helps a lot of code continue to just work, but at the cost of a silent allocation which we would want to avoid.

If it needed composition on the usage point, it'd defeat the point of minimizing code breakage.

> - On the performance side, one point is that this grows the size of the struct by two machine words (string's .ptr and .length). This type is likely to be passed by value through function parameters, too.

Aye, that's a compromise again - it could just allocate a new string in that eager method, but since it is implicit, that could easily waste a lot more time than the extra cached string.

Ideally though, a year from now, that'd be deprecated and removed in favor of having the user migrate to explicit allocation on their end, becoming fully lazy. (That's what I really want to do: make phobos all lazy and make the allocation a user-level decision, I'm just trying to offer a migration path with less breakage.)



So my plan is:

1) Change to lazy with the implicit conversion in a version(future_phobos) else {} block. Tell everyone this conversion sucks and they want to get away from it. Use -version=future_phobos to see what code is likely to break when they recompile so they can start handling it.

2) Next release, put deprecated("use .array or lazy instead") (or maybe not .array as it can yield dchar when you want char, but whatever actually works right) on that method.

3) A year later, move this code into a version(D_067_compatible) block so the old eager behavior is now opt-in. So now, the default build no longer has the alias this, eager method, nor the string cache member.

4) Eventually, kill that version too to clean up the code.



That should be a reasonable migration path, with minimal code breakage, ample warning, and a really easy fix that the compiler tells you about to update your code.
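
A rough sketch of what step 1 might look like. Everything here is an assumption for illustration: the name LowerCaseResult, the ASCII-only lowering, and the exact layout are not from any actual Phobos patch. Only the version identifier future_phobos comes from the plan above.

```d
import std.ascii : toLower;

// Hypothetical lazy result type (ASCII-only to keep the sketch short).
struct LowerCaseResult
{
    private string source;

    // Lazy range interface over code units.
    @property bool empty() const { return source.length == 0; }
    @property char front() const { return toLower(source[0]); }
    void popFront() { source = source[1 .. $]; }

    version (future_phobos)
    {
        // Fully lazy: no alias this, no eager method, no cache member.
    }
    else
    {
        private string cache; // the extra .ptr and .length

        // Step 2 of the plan would put a deprecated("...") attribute here.
        @property string eager()
        {
            if (cache is null)
            {
                auto buf = new char[source.length];
                foreach (i, c; source)
                    buf[i] = toLower(c);
                cache = buf.idup;
            }
            return cache;
        }

        alias eager this; // the silent allocation on implicit conversion
    }
}

// Build with: dmd -unittest -main
unittest
{
    auto r = LowerCaseResult("HeLLo");
    assert(r.front == 'h');          // lazy use works in both builds

    version (future_phobos) {}
    else
    {
        string s = r;                // old-style code still compiles
        assert(s == "hello");
    }
}
```

Compiling with -version=future_phobos makes the string s = r line an error in user code, which is how people would find what they need to update. Note that this sketch caches on first conversion only, so a real implementation would have to decide what happens if the range is advanced afterwards.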
June 24, 2015
On Wednesday, 24 June 2015 at 14:29:34 UTC, Adam D. Ruppe wrote:
> If it needed composition on the usage point, it'd defeat the point of minimizing code breakage.

Not at the call site, but in the function (i.e. the function defines a voldemort struct, constructs one, wraps that into the helper that adds implicit string conversion, and returns that).
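
Something like the following sketch, where WithStringConversion and lowered are hypothetical names and the lowering is ASCII-only for brevity:

```d
import std.ascii : toLower;
import std.range.primitives : isInputRange, empty, front, popFront;

// Hypothetical helper: forwards the range API of any input range and
// adds an implicit (allocating) conversion to string.
struct WithStringConversion(R) if (isInputRange!R)
{
    R inner;

    @property bool empty() { return inner.empty; }
    @property auto front() { return inner.front; }
    void popFront() { inner.popFront(); }

    @property string asString()
    {
        string result;
        foreach (c; inner)   // foreach iterates over a copy of inner
            result ~= c;
        return result;
    }

    alias asString this;
}

// Hypothetical lazy function: defines a Voldemort struct and wraps it.
auto lowered(string s)
{
    static struct Result
    {
        string s;
        @property bool empty() const { return s.length == 0; }
        @property char front() const { return toLower(s[0]); }
        void popFront() { s = s[1 .. $]; }
    }
    return WithStringConversion!Result(Result(s));
}

// Build with: dmd -unittest -main
unittest
{
    auto r = lowered("ABC");
    assert(r.front == 'a');   // usable as a lazy range
    string s = r;             // call site unchanged for old code
    assert(s == "abc");
}
```

This version does not cache, so every implicit conversion allocates; a cache could be added inside the helper without touching the individual functions.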

> Ideally though, a year from now, that'd be deprecated and removed in favor of having the user migrate to explicit allocation on their end, becoming fully lazy.

As I understand, we're not doing that any more.
June 24, 2015
On Wed, Jun 24, 2015 at 11:28:46AM +0000, Jonathan M Davis via Digitalmars-d wrote:
> On Wednesday, 24 June 2015 at 11:12:27 UTC, John Chapman wrote:
> >On Wednesday, 24 June 2015 at 01:04:01 UTC, Adam D. Ruppe wrote:
> >>The code breakage is minimal
> >
> >Won't this break isSomeString? Phobos uses this everywhere.
> 
> It won't break isSomeString. isSomeString will continue to work the same.  What it will mean is that the result of toLower won't pass isSomeString anymore, and if you pass it to a range-based function which has an overload for strings, it won't match it and will be treated the same as a range like FilterResult and not get the string optimizations. If you want it to actually be a string, then you'll need to use to!string on it (even std.array.array wouldn't work, since that would convert it to dchar[], not string).
> 
> So, that could be a reason why this isn't a great idea, but it once again highlights why having autodecoding is a bad idea, and it shows that as we increase how much we're doing with functions which return lazy ranges, the cost of having autodecoding will only increase, because we'll be dealing with strings directly less and less.
[...]

Yet another nail in the coffin of autodecoding. I really wish we had toughed it out earlier and begun phasing it out. It's kinda late for that now... but maybe it might still be worth it?


T

-- 
I am Ohm of Borg. Resistance is voltage over current.
June 24, 2015
On Wednesday, 24 June 2015 at 15:40:54 UTC, H. S. Teoh wrote:
> Yet another nail in the coffin of autodecoding. I really wish we had toughed it out earlier and begun phasing it out. It's kinda late for that now... but maybe it might still be worth it?

As I understand, Andrei's opinion still is that auto-decoding was the better choice, so I think that's extremely unlikely to happen.
June 24, 2015
On Wednesday, 24 June 2015 at 14:31:42 UTC, Vladimir Panteleev wrote:
> Not at the call site, but in the function (i.e. the function defines a voldemort struct, constructs one, wraps that into the helper that adds implicit string conversion, and returns that).

Oh yeah, we could do that.
June 24, 2015
On 6/24/15 9:50 AM, Vladimir Panteleev wrote:
> On Wednesday, 24 June 2015 at 01:04:01 UTC, Adam D. Ruppe wrote:
>> We disagreed on this on irc, but I ask you to consider the following
>> which limits the code breakage a lot more than my first proposal in chat:
>>
>> [...]
>
> Some thoughts:
>
> - I think the implementation is better done through composition (i.e. a
> function that takes any range, and returns a type that works like that
> range but also allows implicit conversion to string). Not sure how
> feasible this is, maybe multiple alias this will help.
>
> - On the performance side, one point is that this grows the size of the
> struct by two machine words (string's .ptr and .length). This type is
> likely to be passed by value through function parameters, too.
>
> - Another perf. issue is that this introduces additional cost every time
> the implicit conversion to string is done (you need to at least check if
> the string value has been calculated).

I think these points are not quite as bad as you think:

1. Any code that was written to use the string version is currently passing a string for the extension. Only provide a cached conversion to string if that is the case (i.e. you have 2 strings passed in), and we'll be fine for existing code. Store the cached string into one of the two stored ranges. Don't even bother allowing conversion from non-string ranges to a string, just don't compile. This solves the space problem.

2. Code that is using the string version can call array, or can specifically say "string x = ..." instead of "auto x = ..." to avoid extra checks if they want to squeeze out that little test instruction. I don't see why we should care about an extra check for code that is deprecated or undesirable, and I don't see how the extra check is that bad either.

I really like Adam's idea. A lot.

-Steve