March 07, 2014 Major performance problem with std.array.front()
In the "Lots of low hanging fruit in Phobos" thread, the issue came up of the automatic encoding and decoding of char ranges.
Throughout D's history, there have been regular, repeated proposals to redesign D's view of char[] to pretend it is not UTF-8 but UTF-32, i.e. so that D would automatically generate code to decode and encode on every attempt to index char[]. I have strongly objected to these proposals on the grounds that:
1. It is a MAJOR performance problem to do this.
2. Very, very few manipulations of strings ever actually need decoded values.
3. D is a systems/native programming language, and systems/native programming languages must not hide the underlying representation (I make similar arguments about proposals to make ints issue errors on overflow, etc.).
4. Users should choose when decode/encode happens, not the language.
I have been successful at heading these off. But one slipped by me. See this in std.array:
    @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
    {
        assert(a.length, "Attempting to fetch the front of an empty array of " ~ T.stringof);
        size_t i = 0;
        return decode(a, i);
    }
What that means is that if I implement an algorithm that accepts, as input, an InputRange of chars, it will ALWAYS try to decode it. This means that even:
    from.copy(to)
will decode 'from' and then re-encode it for 'to'. And it will do it SILENTLY. The user won't notice, and he'll just assume that D performance sux. Even if he does notice, his options to make his code run faster are poor.
If the user wants decoding, it should be explicit, as in:
    from.decode.copy(encode!to)
The USER should decide where and when the decoding goes. 'decode' should be just another algorithm.
(Yes, I know that std.algorithm.copy() has some specializations to take care of this. But these specializations would have to be written for EVERY algorithm, which is thoroughly unreasonable. Furthermore, copy()'s specializations only apply if BOTH source and destination are arrays. If just one is, the decode/encode penalty applies.)
Is there any hope of fixing this?
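A minimal sketch of the behavior described above, assuming present-day Phobos, where the range primitives live in std.range.primitives and std.utf provides byCodeUnit (neither detail comes from the post, and the sample string is illustrative). It shows that front() on a string yields a dchar, so any generic range algorithm sees decoded code points, while wrapping the string in byCodeUnit lets the caller opt out of decoding.

```d
import std.range.primitives : ElementType, front, walkLength;
import std.traits : Unqual;
import std.utf : byCodeUnit;

void main()
{
    string s = "h\u00E9llo";   // U+00E9 occupies two UTF-8 code units

    // Range primitives auto-decode narrow strings: each element is a dchar.
    static assert(is(ElementType!string == dchar));
    static assert(is(typeof(s.front) == dchar));
    assert(s.length == 6);       // code units stored in the array
    assert(s.walkLength == 5);   // code points seen through the range API

    // byCodeUnit opts out of decoding: elements are plain UTF-8 code units.
    static assert(is(Unqual!(ElementType!(typeof(s.byCodeUnit))) == char));
    assert(s.byCodeUnit.walkLength == 6);
}
```

The mismatch between length (6) and walkLength (5) is the decode work the post objects to: it happens on every range traversal unless the caller explicitly asks for code units.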
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
Walter Bright:
> systems/native programming languages must not hide the underlying representation (I make similar arguments about proposals to make ints issue errors on overflow, etc.).
But it's good to have in Phobos a compiler-intrinsics-based efficient overflow detection on a user-defined struct type that behaves like built-in ints in all other aspects.
> Is there any hope of fixing this?
I don't think we can change that in D2. You can change it in D3.
Bye,
bearophile
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to bearophile
On 3/6/2014 6:54 PM, bearophile wrote:
> Walter Bright:
>
>> systems/native programming languages must not hide the underlying
>> representation (I make similar arguments about proposals to make ints issue
>> errors on overflow, etc.).
>
> But it's good to have in Phobos a compiler-intrinsics-based efficient overflow
> detection on a user-defined struct type that behaves like built-in ints in all
> other aspects.
Yes, so that the user selects it, rather than having it wired in everywhere and the user has to figure out how to defeat it.
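To illustrate the opt-in approach discussed here, a small sketch of a user-defined integer wrapper. The name CheckedInt is hypothetical, and the sketch assumes druntime's core.checkedint.adds, which returns the wrapped sum and reports overflow through a ref bool flag. Only code that chooses the wrapper pays for the check; plain int keeps its wrapping behavior.

```d
import core.checkedint : adds;

/// Hypothetical opt-in wrapper: only code that uses CheckedInt pays for the check.
struct CheckedInt
{
    int value;

    // Only "+" is sketched; other operators would follow the same pattern.
    CheckedInt opBinary(string op : "+")(CheckedInt rhs) const
    {
        bool overflow = false;
        immutable r = adds(value, rhs.value, overflow);
        if (overflow)
            throw new Exception("integer overflow in +");
        return CheckedInt(r);
    }
}

void main()
{
    import std.exception : assertThrown;

    auto a = CheckedInt(int.max - 1);
    assert((a + CheckedInt(1)).value == int.max);   // fits, no exception
    assertThrown(a + CheckedInt(2));                // overflow is reported

    int raw = int.max;
    raw += 1;                 // built-in int wraps silently, as D defines it
    assert(raw == int.min);
}
```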
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to bearophile
On 3/6/2014 6:54 PM, bearophile wrote:
> Walter Bright:
>> Is there any hope of fixing this?
>
> I don't think we can change that in D2. You can change it in D3.
You use ranges a lot. Would it break any of your code?
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
On 3/6/2014 6:37 PM, Walter Bright wrote:
> Is there any hope of fixing this?
Is there any way we can provide an upgrade path for this? Silent breakage is terrible. Any ideas?
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
Walter Bright:
> You use ranges a lot. Would it break any of your code?
I need to try the changes to be sure. But the magnitude of this change is so large that I guess some code will surely break.
One advantage of your change is that this code will work:
auto s = "hello".dup;
s.sort();
Bye,
bearophile
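A sketch of the example above under present-day Phobos (assuming the module names std.algorithm.sorting, std.range.primitives and std.string). Because auto-decoding makes char[] a range of dchar without random access, sort() does not accept it; reinterpreting the same memory as bytes with std.string.representation is one explicit workaround.

```d
import std.algorithm.sorting : sort;
import std.range.primitives : isRandomAccessRange;
import std.string : representation;

void main()
{
    auto s = "hello".dup;   // char[]

    // With auto-decoding, char[] is not a random-access range,
    // so sort() rejects it at compile time.
    static assert(!isRandomAccessRange!(char[]));
    static assert(!__traits(compiles, s.sort()));

    // representation gives a ubyte[] view of the same memory; sorting the
    // bytes sorts these ASCII characters in place.
    s.representation.sort();
    assert(s == "ehllo");
}
```

Sorting bytes is only meaningful for ASCII text; on multibyte UTF-8 it would scramble the encoding. That is a choice the caller makes explicitly, rather than one the library makes for every char[].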
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
Walter Bright:
>> But it's good to have in Phobos a compiler-intrinsics-based efficient overflow
>> detection on a user-defined struct type that behaves like built-in ints in all
>> other aspects.
>
> Yes, so that the user selects it, rather than having it wired in everywhere and the user has to figure out how to defeat it.
I don't think people have ever suggested that.
In a recent discussion you seemed to be against the idea of special compiler support for that user-defined type.
Bye,
bearophile
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
On Thu, Mar 06, 2014 at 06:59:36PM -0800, Walter Bright wrote:
> On 3/6/2014 6:54 PM, bearophile wrote:
> >Walter Bright:
> >>Is there any hope of fixing this?
> >
> >I don't think we can change that in D2. You can change it in D3.
>
> You use ranges a lot. Would it break any of your code?
Whoa. You're not serious about changing this now, are you? Because even though I would support such a change, you have to realize the magnitude of code breakage that will happen. A lot of code that iterates over narrow strings will break, and worse yet, it will break *silently*. Calling count() on a narrow string will not return the expected value, for example. And existing code that iterates over narrow strings expecting dchars to come out of it will suddenly and silently switch to char, and may go unnoticed until somebody runs the program with a multibyte character in the input.
This is a very high-risk change IMO.
You're welcome to create a (temporary) Phobos fork that reverts narrow string auto-decoding, of course, and people can try it out to see how much actual breakage happens. If you really want to push for this, that might be the safest way to test the waters before committing to such a major change. Silent breakage is not easy to test for, unfortunately. :(
T
--
Truth, Sir, is a cow which will give [skeptics] no more milk, and so they are gone to milk the bull. -- Sam. Johnson
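A sketch of the silent count() breakage described in this reply, assuming present-day Phobos (std.algorithm.searching.count and std.utf.byCodeUnit). Here byCodeUnit stands in for a hypothetical non-decoding front(): the same call compiles in both cases but gives a different answer for a non-ASCII needle.

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit;

void main()
{
    string s = "caf\u00E9 cr\u00E8me";
    dchar eAcute = '\u00E9';

    // Auto-decoded: count() compares whole code points and finds the one U+00E9.
    assert(s.count(eAcute) == 1);

    // Code units: U+00E9 is stored as 0xC3 0xA9, and neither unit equals
    // U+00E9, so the same call silently reports zero.
    assert(s.byCodeUnit.count(eAcute) == 0);

    // ASCII needles behave identically either way.
    assert(s.count('c') == 2);
    assert(s.byCodeUnit.count('c') == 2);
}
```

Nothing fails to compile and nothing throws. The result simply changes once multibyte input appears, which is why this kind of breakage is hard to test for.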
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to Walter Bright
On 3/6/2014 7:06 PM, Walter Bright wrote:
> On 3/6/2014 6:37 PM, Walter Bright wrote:
>> Is there any hope of fixing this?
>
> Is there any way we can provide an upgrade path for this? Silent breakage is
> terrible. Any ideas?
Ok, I have a plan. Each step will be separated by at least one version:
1. implement decode() as an algorithm for string types, so one can write:
string s;
s.decode.algorithm...
suggest that people start doing that instead of:
s.algorithm...
2. Emit warning when people use std.array.front(s) with strings.
3. Deprecate std.array.front for strings.
4. Error for std.array.front for strings.
5. Implement new std.array.front for strings that doesn't decode.
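Step 1 of the plan makes decoding an explicit stage in the pipeline. A rough approximation of that style in present-day Phobos, assuming std.utf.byDchar as the explicit decoder and std.utf.byCodeUnit as the explicit non-decoding view (the plan's own decode() name is not used here):

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "h\u00E9llo";

    // Opt in to decoding: the pipeline sees code points (dchar).
    auto decoded = s.byDchar.filter!(c => c != 'l');
    assert(decoded.equal("h\u00E9o"d));

    // Opt out: the pipeline sees raw UTF-8 code units. Dropping an ASCII
    // character never needs a decoded value, so no decoding happens.
    auto raw = s.byCodeUnit.filter!(c => c != 'l');
    assert(raw.equal("h\u00E9o".byCodeUnit));
}
```

In both pipelines the caller, not front(), decides whether decoding happens, which is the symmetry the plan is after.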
March 07, 2014 Re: Major performance problem with std.array.front()
Posted in reply to bearophile
On 3/6/2014 7:22 PM, bearophile wrote:
> One advantage of your change is that this code will work:
>
> auto s = "hello".dup;
> s.sort();
Yes, I hadn't thought of that.
The auto-decoding front() introduces all kinds of asymmetry in how ranges work, and asymmetry is bad as it negatively impacts composability.
Copyright © 1999-2021 by the D Language Foundation