Investigation: downsides of being generic and correct (page 3)

On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
> 1. In general, if you want to operate on ASCII, and you want your code to be
> fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, that's not
> going to work in this case, because the function is in std.string, but maybe
> that's a reason for some std.string functions to have ubyte overloads which
> are ASCII-specific.

I was thinking exactly about that. The only thing I want advice on: is it better to add those overloads to std.string, or is a separate module better from the point of view of self-documentation?

> 2. We actually discussed removing all of the pattern stuff completely and
> replacing it with regexes.

Is it kind of pre-approved? I am willing to add this to my TODO list together with the needed benchmarks, but I had some doubts that std.string depending on std.regex would be tolerated.

> 3. While some functions in Phobos are well-optimized, there are plenty of them
> which aren't. They do the job, but no one has taken the time to optimize their
> implementations. This should be fixed, but again, it requires that someone
> spends the time to do the optimizations, and while that has been done for some
> functions, it definitely hasn't been done for all. And if python is faster than
> D at something, odds are that either the code in question is poorly written or
> that whatever Phobos functions it's using haven't been properly optimized yet.

I understand that. What I tried to bring attention to is how big a difference it may make for someone who just picks random functions and writes some simple code. It is very tempting to just say "Phobos (D) sucks" and not get into details. In other words, I consider it more of an informational/marketing issue than a technical one.

> - Jonathan M Davis

Thanks for your response, it was really helpful.

May 17, 2013

Re: Investigation: downsides of being generic and correct

Posted by Jonathan M Davis
in reply to Dicebot

On Friday, May 17, 2013 11:15:24 Dicebot wrote:
> On Thursday, 16 May 2013 at 19:15:57 UTC, Jonathan M Davis wrote:
> > 1. In general, if you want to operate on ASCII, and you want your code to be
> > fast, use immutable(ubyte)[], not immutable(char)[]. Obviously, that's not
> > going to work in this case, because the function is in std.string, but maybe
> > that's a reason for some std.string functions to have ubyte overloads which
> > are ASCII-specific.
> 
> I was thinking exactly about that. The only thing I want advice on: is it better to add those overloads to std.string, or is a separate module better from the point of view of self-documentation?

I'm not sure. My first inclination would be to simply put them as overloads in the same module, but that probably merits some discussion. And while I think that having ubyte overloads for strings for ASCII is something that we should at least explore, it probably merits some discussion as well, as we haven't really done a lot with handling ASCII outside of std.ascii at this point (which currently only operates on characters, not strings). My first inclination is to handle ASCII where necessary by accepting arrays of ubytes, but others here may have other ideas about that (which may or may not be better).

As a side note, we might want to consider having a function called assumeASCII which casts from string to immutable(ubyte)[] (similar to assumeUnique). I think that might have been suggested before, but even if it has, we've never actually added it.
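A minimal sketch of what such a helper could look like is given here. The name assumeASCII comes from the post, but the signature and the debug-time check are assumptions made for illustration; nothing like this exists in Phobos. It builds on std.string.representation, which already returns the code units of a string as immutable(ubyte)[] without copying.

```d
import std.string : representation;

// Hypothetical helper, not part of Phobos: reinterprets a string as raw
// bytes on the caller's promise that it contains only ASCII.
immutable(ubyte)[] assumeASCII(string s)
{
    // representation() is the existing std.string function that exposes
    // the underlying code units without copying or decoding.
    immutable(ubyte)[] bytes = s.representation;

    // Sanity check; like any assert, it is compiled out with -release.
    foreach (b; bytes)
        assert(b < 0x80, "assumeASCII: input contains a non-ASCII byte");

    return bytes;
}

void main()
{
    auto bytes = assumeASCII("hello");
    assert(bytes.length == 5);
    assert(bytes[0] == 'h');
}
```

Like assumeUnique, the point of such a function would be to document a promise made by the caller rather than to perform a conversion.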

> > 2. We actually discussed removing all of the pattern stuff completely and
> > replacing it with regexes.
> 
> Is it kind of pre-approved? I am willing to add this to my TODO list together with the needed benchmarks, but I had some doubts that std.string depending on std.regex would be tolerated.

AFAIK, there would be no problem with doing so. Maybe Dmitry would have something to say about it, since he's the regex guru, but IIRC, the last time it was discussed, it was pretty clear that we wanted those functions to be using std.regex instead of patterns. So, if you did the work and did it at the appropriate quality level, I expect that it would be merged in. And we might or might not deprecate the pattern functions at that point (that was originally my intention and is why I never fixed their names, but we're not deprecating much now, so I don't know if we'll want to in this case).
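For illustration, here is roughly how a "remove these characters" operation of the kind the pattern functions provide could be expressed with std.regex. This is only a sketch: the wrapper name removeMatching is hypothetical, and it takes a full regex string rather than a std.string-style pattern.

```d
import std.regex : regex, replaceAll;

// Hypothetical wrapper: deletes every match of the given regex from s.
string removeMatching(string s, string pattern)
{
    return replaceAll(s, regex(pattern), "");
}

void main()
{
    // The character class [lo] plays the role of the pattern string "lo".
    assert(removeMatching("hello world", "[lo]") == "he wrd");
}
```

Compiling the regex on every call is wasteful; a real replacement would presumably cache it or accept a precompiled regex, which is one of the things the benchmarks mentioned above would need to cover.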

> I understand that. What I tried to bring attention to is how big a difference it may make for someone who just picks random functions and writes some simple code. It is very tempting to just say "Phobos (D) sucks" and not get into details. In other words, I consider it more of an informational/marketing issue than a technical one.

We need to do more to optimize Phobos, but given our stance of correctness by default, we're kind of stuck with string functions taking a performance hit in a number of common cases simply due to the necessary decoding of code points. We can do better at making them fast, and reduce problems like this, but ultimately, if you want fast ASCII-only operations, you almost certainly need to operate on something like ubyte[] rather than string, and that requires educating people. It's one of the costs of trying to be both correct and performant.
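To make the decoding cost concrete, the sketch below counts an ASCII character in two ways: by iterating over code points, which decodes UTF-8 on every step, and by scanning the raw bytes. The function names are made up for this example. The byte version is only valid when the needle is ASCII, since ASCII bytes never occur inside a multi-byte UTF-8 sequence.

```d
import std.range : walkLength;
import std.string : representation;

// Decodes each code point before comparing (the "correct by default" path).
size_t countDecoded(string s, dchar needle)
{
    size_t n;
    foreach (dchar c; s) // asking for dchar makes foreach decode UTF-8
        if (c == needle)
            ++n;
    return n;
}

// Compares raw bytes; no decoding takes place.
size_t countBytes(const(ubyte)[] s, ubyte needle)
{
    size_t n;
    foreach (b; s)
        if (b == needle)
            ++n;
    return n;
}

void main()
{
    string text = "caf\u00E9 au lait";

    assert(text.length == 13);     // code units (bytes)
    assert(text.walkLength == 12); // code points, found by decoding

    assert(countDecoded(text, 'a') == 3);
    assert(countBytes(text.representation, cast(ubyte) 'a') == 3);
}
```

Both functions give the same answer here, but only the first has to decode each character along the way.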

- Jonathan M Davis

On Thursday, May 16, 2013 12:54:35 Walter Bright wrote:
> On 5/16/2013 12:15 PM, Jonathan M Davis wrote:
> > And if python is faster than D at something, odds are that either the
> > code in question is poorly written or that whatever Phobos functions
> > it's using haven't been properly optimized yet.
> 
> We should also be aware that while Python code itself is slow, its library functions are heavily optimized C code. So, if the benchmark consists of calling a Python library function, it'll run as fast as any optimized C code.

I keep forgetting about that. That's a good thing to keep in mind when comparing performance - though part of me thinks that it says very poor things about your language if you have to write your code in other languages in order to make it fast enough (even if it were only the standard library where that happened).

- Jonathan M Davis

On Friday, 17 May 2013 at 08:28:38 UTC, Jacob Carlborg wrote:
> On 2013-05-16 21:54, Walter Bright wrote:
> 
> > We should also be aware that while Python code itself is slow, its
> > library functions are heavily optimized C code. So, if the benchmark
> > consists of calling a Python library function, it'll run as fast as any
> > optimized C code.
> 
> But someone using Python won't care about that. Most of them will think they just use Python and have no idea there's optimized C code under the hood.

I'm not sure how we can respond to that. If naive D code has to be significantly faster than optimised C for people to not go "D sucks, it's only as fast as python" then we're pretty much doomed by people's stupidity.

On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
> If naive D code has to be significantly faster than optimised C for people to not go "D sucks, it's only as fast as python" then we're pretty much doomed by people's stupidity.

No. The whole benefit of D is lost if you have to tweak everything in complex ways to get it to run fast. It means we failed at designing a nice API. Devs don't have years to spend on every existing language to find out whether it is good or not and figure out all the subtleties.

On Friday, 17 May 2013 at 11:26:27 UTC, deadalnix wrote:
> On Friday, 17 May 2013 at 10:09:11 UTC, John Colvin wrote:
> > If naive D code has to be significantly faster than optimised C for people to not go "D sucks, it's only as fast as python" then we're pretty much doomed by people's stupidity.
> 
> No. The whole benefit of D is lost if you have to tweak everything in complex ways to get it to run fast.

Define fast. In some cases, if a naive call to a generic Phobos function is as fast as an equivalent Python library function, then I'd say that's pretty good. Those Python library functions are often impressively fast.

On 05/17/2013 11:41 AM, Jonathan M Davis wrote:
> We need to do more to optimize Phobos, but given our stance of correctness by
> default, we're kind of stuck with string functions taking a performance hit in
> a number of common cases simply due to the necessary decoding of code points.
> We can do better at making them fast, and reduce problems like this, but
> ultimately, if you want fast ASCII-only operations, you almost certainly need
> to operate on something like ubyte[] rather than string, and that requires
> educating people. It's one of the costs of trying to be both correct and
> performant.

At least I'm now educated on this :")

// Samuel
