Major performance problem with std.array.front() (page 6)

On Fri, Mar 07, 2014 at 05:24:59PM +0000, Vladimir Panteleev wrote:
> On Friday, 7 March 2014 at 03:52:42 UTC, Walter Bright wrote:
> >Ok, I have a plan. Each step will be separated by at least one version:
> >
> >1. implement decode() as an algorithm for string types, so one can
> >write:
> >
> >    string s;
> >    s.decode.algorithm...
> >
> >suggest that people start doing that instead of:
> >
> >    s.algorithm...
>
> I think .decode should be something more explicit (byCodePoint OSLT), just so it's clear that it's not magical and does not solve all problems.

+1. I think "byCodePoint" is far more self-documenting and less misleading than "decode".

    string s;
    s.byCodePoint.algorithm...

I'm already starting to like it.

T

-- 
It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?
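
A minimal sketch of the distinction being named here, under stated assumptions: the `.decode` / `.byCodePoint` adaptors on strings are only a proposal in this thread, so the example substitutes `std.utf.byCodeUnit` and `std.utf.byDchar` from present-day Phobos (as far as I know, both were added after this discussion). The helper names `countCodeUnits` and `countCodePoints` are hypothetical.

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

/// Counts UTF-8 code units; nothing is decoded.
size_t countCodeUnits(string s)
{
    return s.byCodeUnit.walkLength;
}

/// Counts code points; each element is decoded to a dchar.
size_t countCodePoints(string s)
{
    return s.byDchar.walkLength;
}

void main()
{
    // U+00E9 (é) takes two code units in UTF-8, '!' takes one.
    string s = "\u00E9!";
    assert(countCodeUnits(s) == 3);
    assert(countCodePoints(s) == 2);
}
```

Either way, the call site states which interpretation of the string is meant, which is the property the posters are asking for.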

On Fri, Mar 07, 2014 at 05:08:02PM +0000, Dicebot wrote:
> On Friday, 7 March 2014 at 17:04:30 UTC, Vladimir Panteleev wrote:
> >I think that if we are to draw a line somewhere on what to support and not, the decision should not be embedded as deep into the language. Ideally, it would be clearly visible in the code that you are counting code points.
>
> Well if you consider really breaking changes, simply prohibiting plain random access to char[] and forcing to use either .raw or .decode is one thing I'd love to see (with .byGrapheme as library cherry on top)

I don't understand what advantage this would bring.

T

-- 
Frank disagreement binds closer than feigned agreement.

I only hope it won't break my code. It mainly deals with string / character processing, and our project in D is now almost ready for takeoff (at least for a beta flight). It deals with characters like "é"; the input is not English. Hope the landing will be soft!

On Friday, 7 March 2014 at 17:39:41 UTC, H. S. Teoh wrote:
>> Well if you consider really breaking changes, simply prohibiting
>> plain random access to char[] and forcing to use either .raw or
>> .decode is one thing I'd love to see (with .byGrapheme as library
>> cherry on top)
>
> I don't understand what advantage this would bring.

Making sure that whatever interpretation is chosen by the programmer it is actually a conscious choice and he does not hold any false illusions.

On 3/7/14, 1:56 AM, Dmitry Olshansky wrote:
> 07-Mar-2014 07:22, bearophile wrote:
>> Walter Bright:
>>
>>> You use ranges a lot. Would it break any of your code?
>>
>> I need to try the changes to be sure. But the magnitude of this change
>> is so large that I guess some code will surely break.
>>
>> One advantage of your change is that this code will work:
>>
>> auto s = "hello".dup;
>> s.sort();
>
> Which it shouldn't unless there is an ascii type or some such.

Correct. This is a win, not a failure, of the current approach. To sort the bytes in "hello" write:

    s.representation.sort();

which is indicative to the human and technically correct.

Andrei
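
The one-liner above can be expanded into a small runnable sketch. It assumes `std.string.representation` and `std.algorithm.sort` as they exist in Phobos; the wrapper name `sortCodeUnits` is hypothetical and only there to make the example testable.

```d
import std.algorithm : sort;
import std.string : representation;

/// Sorts the UTF-8 code units of `s` in place and returns the same slice.
char[] sortCodeUnits(char[] s)
{
    // representation views the char[] as ubyte[] over the same memory,
    // so sorting the bytes reorders the original array.
    s.representation.sort();
    return s;
}

void main()
{
    auto s = "hello".dup;
    assert(sortCodeUnits(s) == "ehllo");
}
```

This is only meaningful for ASCII input: sorting the code units of text with multi-byte characters would scramble their sequences and leave invalid UTF-8 behind, which is presumably why the byte-level intent has to be spelled out.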

On 3/7/14, 9:24 AM, Vladimir Panteleev wrote:
>> 5. Implement new std.array.front for strings that doesn't decode.
>
> Until then, how will people use strings with algorithms when they mean
> to use them per-byte? A .raw property which casts to ubyte[]?

There's no "until then".

A current ".representation" property already exists that casts all string types appropriately.

Andrei
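
To make "casts all string types appropriately" concrete, here is a short sketch of what `std.string.representation` returns for each of the three built-in string types, as I understand the Phobos documentation. Nothing here is new API; the variable names are illustrative only.

```d
import std.string : representation;

void main()
{
    string  s = "abc";
    wstring w = "abc"w;
    dstring d = "abc"d;

    // Each string type maps to an array of the unsigned integer type
    // with the same width as its code unit; immutability is preserved.
    static assert(is(typeof(s.representation) == immutable(ubyte)[]));
    static assert(is(typeof(w.representation) == immutable(ushort)[]));
    static assert(is(typeof(d.representation) == immutable(uint)[]));

    // The result is a view of the same code units, not a copy.
    immutable(ubyte)[] expected = [0x61, 0x62, 0x63];
    assert(s.representation == expected);
    assert(s.representation.ptr is cast(immutable(ubyte)*) s.ptr);
}
```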

On 2014-03-07 14:47:26 +0000, "Kagamin" <spam@here.lot> said:
> On Friday, 7 March 2014 at 13:40:31 UTC, Michel Fortin wrote:
>> if you want to parse XML then you'll need to work with code points (in theory, in practice you might still want direct access to code units for performance reasons)
>
> AFAIK, xml control characters are all ascii, and what's between them you can slice or dup without consideration, so code units should be more than enough.

If you don't fully check for well-formedness (as XML parsers ought to do according to the XML spec) then sure you can limit yourself to ASCII. You'll let through illegal characters in element and attribute names though.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca
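
A rough sketch of the code-unit approach under discussion, not a real XML parser: because every byte of a multi-byte UTF-8 sequence is 0x80 or above, an ASCII delimiter such as '<' or '>' can be searched for at the byte level and the text between delimiters sliced without decoding. The function name `firstTagName` is hypothetical, and, as the reply points out, nothing here checks that the sliced name contains only characters the XML spec allows; that check would need code points.

```d
import std.algorithm : countUntil;
import std.string : representation;

/// Returns the text between the first '<' and the following '>',
/// or null if no such pair exists. No UTF-8 decoding is performed.
string firstTagName(string xml)
{
    auto bytes = xml.representation;

    auto open = bytes.countUntil(cast(ubyte) '<');
    if (open < 0)
        return null;

    auto close = bytes[open + 1 .. $].countUntil(cast(ubyte) '>');
    if (close < 0)
        return null;

    // Slicing at ASCII delimiters always lands on code point boundaries.
    return xml[open + 1 .. open + 1 + close];
}

void main()
{
    string xml = "<\u00E9l\u00E9ment>caf\u00E9</\u00E9l\u00E9ment>";
    assert(firstTagName(xml) == "\u00E9l\u00E9ment");
    assert(firstTagName("no tags here") is null);
}
```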

On 3/7/2014 5:56 AM, Adam D. Ruppe wrote:
> On Friday, 7 March 2014 at 04:19:16 UTC, Walter Bright wrote:
>> I'd rather fix the compiler's codegen than add a pragma.
>
> The codegen isn't broken, the current this pointer behavior is needed for full
> compatibility with the C ABI. It would be opt in to an ABI tweak that the caller
> needs to be aware of rather than a traditional optimization where the outside
> world would never know.

Oh, I see what you mean. But I think it does generate the same code, if you use it the same way. There is no 'get' function for ints; you aren't using it the same way.

On 3/7/2014 7:24 AM, Adam D. Ruppe wrote:
> But you can't inline asm function,

I intend to fix that for dmd, but haven't had the time.

> and checking the overflow flag needs asm. (or a compiler intrinsic.)

For that, I was thinking of having the compiler recognize one of the common coding patterns for detecting overflow, and then generating efficient overflow checks. Then documenting the pattern as being specially detected. This means the code will still be successful for compilers that don't detect the pattern, and no language changes would be required.
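
The post does not say which coding pattern would be recognized, so the following is only an assumed illustration of one common portable idiom: widen to a larger type, add, and range-check the result. The name `addOverflows` is hypothetical. The cross-check uses `core.checkedint.adds` from druntime, which, as far as I know, was added after this thread.

```d
import core.checkedint : adds;

/// Portable overflow test for int addition: do the sum in 64 bits,
/// then check whether it still fits in 32 bits.
bool addOverflows(int a, int b)
{
    immutable long wide = cast(long) a + b;
    return wide < int.min || wide > int.max;
}

void main()
{
    assert(!addOverflows(1, 2));
    assert(addOverflows(int.max, 1));
    assert(addOverflows(int.min, -1));

    // Cross-check one case against druntime's checked add,
    // which sets the flag when the result does not fit.
    bool overflow = false;
    const sum = adds(int.max, 1, overflow);
    assert(overflow);
}
```

Code written this way gives the same answer whether or not a compiler turns the check into a test of the hardware overflow flag, which is the portability property the post is after.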

March 07, 2014

Re: Major performance problem with std.array.front()

Posted by Dmitry Olshansky
in reply to Andrei Alexandrescu

07-Mar-2014 23:11, Andrei Alexandrescu wrote:
> On 3/7/14, 9:24 AM, Vladimir Panteleev wrote:
>>> 5. Implement new std.array.front for strings that doesn't decode.
>>
>> Until then, how will people use strings with algorithms when they mean
>> to use them per-byte? A .raw property which casts to ubyte[]?
>
> There's no "until then".
>
> A current ".representation" property already exists that casts all
> string types appropriately.

There is, however, a big glaring failure: std.algorithm is specialized for char[] and wchar[], but not for any RandomAccessRange!char or RandomAccessRange!wchar.

So if I get a custom slice type (e.g. a ring buffer), I'm out of luck: I get neither the "auto-magic dchar range" nor the special code in std.algorithm that works with chars as code units.

If there is a way to exploit the duality of an RA range of code units also being ("is a") a BD range of code points, we have certainly failed at making it work (first of all by doing a horrible job at genericity, as mentioned).

-- 
Dmitry Olshansky
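
A sketch of the gap described in this post, checked against present-day Phobos traits rather than the 2014 library. `CharWindow` is a hypothetical stand-in for a custom slice type; a real ring buffer would wrap its indices, but the type-level behaviour shown here would be the same.

```d
import std.range : ElementType, isRandomAccessRange, walkLength;
import std.traits : isNarrowString;

/// Hypothetical user-defined random-access range over UTF-8 code units.
struct CharWindow
{
    const(char)[] data;

    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    @property char back() const { return data[$ - 1]; }
    void popFront() { data = data[1 .. $]; }
    void popBack() { data = data[0 .. $ - 1]; }
    @property CharWindow save() { return this; }
    char opIndex(size_t i) const { return data[i]; }
    @property size_t length() const { return data.length; }
}

// The custom type is a random-access range of char ...
static assert(isRandomAccessRange!CharWindow);
static assert(is(ElementType!CharWindow == char));
// ... but it is not a narrow string, so string special cases skip it.
static assert(!isNarrowString!CharWindow);

// The built-in string gets the opposite treatment: decoded elements
// and no random access as far as the range traits are concerned.
static assert(is(ElementType!string == dchar));
static assert(!isRandomAccessRange!string);

void main()
{
    // Same bytes, different element counts: two code units, one code point.
    assert(CharWindow("\u00E9").walkLength == 2);
    assert("\u00E9".walkLength == 1);
}
```

Generic code therefore sees the same UTF-8 data as code units in one container and as code points in the other, with no shared way to ask for either view.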
