November 28, 2010
Re: String compare performance
== Quote from Jonathan M Davis (jmdavisProg@gmx.com)'s article
> On Saturday 27 November 2010 22:13:39 Austin Hastings wrote:
> > On 11/27/2010 11:53 AM, bearophile wrote:
> > > While translating a common Python script to D, I have found something
> > > interesting, so I have reduced it to a little benchmark.
> > >
> > > Below there is the reduced Python2 code (it uses Psyco), and a little
> > > program to generate some test data. The timing of the first D2 version
> > > is not good compared to the Python-Psyco program (the generation of the
> > > *300 array is a quick thing), so I have created two more D2 versions to
> > > show myself that D2 wasn't broken :-)
> > >
> > > The reduced code looks like a synthetic benchmark, but it has about the
> > > same performance profile as a 60-line Python script (the original
> > > code was using xrange(0,len(...)-3,3)  instead of xrange(len(...)-3),
> > > but the situation doesn't change much).
> >
> > It's not clear to me if the point of your post is "how do I make this go
> > faster?" or "Waa! D underperforms Python by default".
> What I think is pretty clear from this is that at some point, some work needs to
> be done to optimize array comparisons when memcmp can be used instead of having
> to call the individual opEquals() of each element. It's not the sort of thing
> that's likely to be a priority, but it really should be done at some point.
> - Jonathan M Davis

Just food for thought: GDC uses memcmp for the string comparisons in the first
implementation:

if (codon == "TAG" || codon == "TGA" || codon == "TAA")

And it is still the slowest case of the lot.

Regards
November 28, 2010
Re: String compare performance
Iain Buclaw:

> GDC uses memcmp when using string comparisons in the first implementation.
> 
> if (codon == "TAG" || codon == "TGA" || codon == "TAA")
> 
> And it is still the slowest case of the lot.

The asm of the first version of the function compiled with gdc uses "repz cmpsb". And while I don't remember precise timings for it, I think it was about two times faster than the first version compiled with dmd (on Windows).

Regarding the idea of using memcmp, I have added a comment to issue 5282.
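
For illustration only (this is not code from the thread or from issue 5282): a minimal sketch of what a memcmp-based equality test for char slices could look like in D. The helper name sameBytes is hypothetical; memcmp is the C function exposed through core.stdc.string.

```d
import core.stdc.string : memcmp;

/// Hypothetical helper (not part of druntime or Phobos):
/// byte-wise equality of two char slices via C's memcmp.
bool sameBytes(const(char)[] a, const(char)[] b)
{
    // Slices of different length can never be equal; checking first
    // also keeps memcmp from reading past the shorter slice.
    if (a.length != b.length)
        return false;
    // Skip the call for empty slices, whose pointers may be null.
    return a.length == 0 || memcmp(a.ptr, b.ptr, a.length) == 0;
}

void main()
{
    string codon = "TGA";
    assert(sameBytes(codon, "TGA"));
    assert(!sameBytes(codon, "TAG"));
    assert(!sameBytes(codon, "TG"));
}
```

For three-byte strings the call overhead may well dominate the comparison itself, which would be consistent with the timings reported above.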

Bye,
bearophile
November 28, 2010
Re: String compare performance
On Sat, 27 Nov 2010 22:41:45 -0800
Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> > What I think is pretty clear from this is that at some point, some work
> > needs to be done to optimize array comparisons when memcmp can be used
> > instead of having to call the individual opEquals() of each element. It's
> > not the sort of thing that's likely to be a priority, but it really should
> > be done at some point.  
> 
> Enhancement Request: http://d.puremagic.com/issues/show_bug.cgi?id=5282

Yes, but other comments seem to show memcmp only doubles the speed. That would still leave us twice as slow as Python ;-)

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
November 28, 2010
Re: String compare performance
spir:

> Yes, but other comments seem to show memcmp only doubles the speed. That would still leave us twice as slow as Python ;-)

The solution I suggest for this problem is: when DMD knows at compile time the length of one of the two strings being compared, and that length is small (say, < 6), it may replace string equality code like this:
(codon == "TAG")
With code like:
(codon.length == 3 && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'G')
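
Written out by hand, the rewrite might look like the sketch below. It is only an illustration of the idea, not something the compiler is known to emit; the function names isTAG and isStopCodon are hypothetical, and isStopCodon simply applies the same rewrite to the three literals from the benchmark's first version.

```d
/// Hand-unrolled equivalent of (codon == "TAG").
bool isTAG(const(char)[] codon)
{
    return codon.length == 3
        && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'G';
}

/// The same rewrite applied to
/// (codon == "TAG" || codon == "TGA" || codon == "TAA"),
/// with the shared length test done once.
bool isStopCodon(const(char)[] c)
{
    if (c.length != 3)
        return false;
    return (c[0] == 'T' && c[1] == 'A' && c[2] == 'G')
        || (c[0] == 'T' && c[1] == 'G' && c[2] == 'A')
        || (c[0] == 'T' && c[1] == 'A' && c[2] == 'A');
}

void main()
{
    assert(isTAG("TAG"));
    assert(!isTAG("TAGA"));

    // The unrolled version must agree with the plain == version.
    foreach (s; ["TAG", "TGA", "TAA", "TGG", "ATG", "TA", ""])
        assert(isStopCodon(s) == (s == "TAG" || s == "TGA" || s == "TAA"));
}
```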

Bye,
bearophile
November 29, 2010
Re: String compare performance
On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:

> Stewart Gordon Wrote:
> 
>> On 27/11/2010 23:04, Kagamin wrote:
>>> bearophile Wrote:
>>> 
>>>>> Also, is there a way to bit-compare given memory areas at much
>>>>> higher speed than element per element (I mean for arrays in
>>>>> general)?
>>>> 
>>>> I don't know. I think you can't.
>>> 
>>> You can use memcmp, though only for utf-8 strings.
>> 
>> Only for utf-8 strings?  Why's that?  I would've thought memcmp to be
>> type agnostic.
>> 
>> Stewart.
> 
> D community is amazing cult of premature optimization fans. Any one of 
> you heard of canonically equivalent sequences? The integrated Unicode 
> support is a clusterfuck. Please do compare ASCII strings with memcmp, 
> but no Unicode. Where did the original poster pull this problem from, 
> his ass? "My system runs 100,000,000,000 instructions per second, but 
> this comparison of 4 letter strings uses 5 cycles too much! This is the 
> only problem on the way to world domination with my $500 Microsoft Word 
> clone!". No wait, the problems are completely imaginatory.

Comparing Unicode UTF-* strings using memcmp is fine as long as what 
you want to know is whether the code points are the same. If your point 
was that per-code-point comparisons aren't the right way to compare 
Unicode strings (in most situations), then I support this view too. 
Though if that's what you wanted to say, you could have made your point 
clearer.
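
To make the distinction concrete, here is a small sketch. It assumes std.uni.normalize from present-day Phobos, which as far as I know was added after this thread was written. The two strings are canonically equivalent spellings of "é": a code-point (or byte-wise) comparison reports them as different, while a comparison after normalization to NFC reports them as equal.

```d
import std.uni : normalize, NFC;

void main()
{
    string precomposed = "\u00E9";   // é as a single code point
    string decomposed  = "e\u0301";  // e followed by a combining acute accent

    // == on strings compares code units, exactly as memcmp would,
    // so the two spellings are reported as different.
    assert(precomposed != decomposed);

    // After normalizing both sides to NFC they compare equal.
    assert(normalize!NFC(precomposed) == normalize!NFC(decomposed));
}
```

Neither answer is wrong in itself; which comparison is appropriate depends on whether the program cares about identical encodings or about equivalent text.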


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
December 08, 2010
Re: String compare performance
On 29/11/2010 02:11, Michel Fortin wrote:
> On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:
>
>> Stewart Gordon Wrote:
>>
>>> On 27/11/2010 23:04, Kagamin wrote:
>>>> bearophile Wrote:
>>>>
>>>>>> Also, is there a way to bit-compare given memory areas at much
>>>>>> higher speed than element per element (I mean for arrays in
>>>>>> general)?
>>>>>
>>>>> I don't know. I think you can't.
>>>>
>>>> You can use memcmp, though only for utf-8 strings.
>>>
>>> Only for utf-8 strings? Why's that? I would've thought memcmp to be
>>> type agnostic.
>>>
>>> Stewart.
>>
>> D community is amazing cult of premature optimization fans. Any one of
>> you heard of canonically equivalent sequences? The integrated Unicode
>> support is a clusterfuck. Please do compare ASCII strings with memcmp,
>> but no Unicode. Where did the original poster pull this problem from,
>> his ass? "My system runs 100,000,000,000 instructions per second, but
>> this comparison of 4 letter strings uses 5 cycles too much! This is
>> the only problem on the way to world domination with my $500 Microsoft
>> Word clone!". No wait, the problems are completely imaginatory.
>
> Comparing Unicode UTF-* strings using memcmp is fine as long as what you
> want to know is whether the code points are the same. If your point was
> that per-code-point comparisons aren't the right way to compare Unicode
> strings (in most situations), then I support this view too. Though if
> that's what you wanted to say, you could have made your point clearer.
>
>

Why are people still replying to nameless trolls? There have been several 
cases of that in recent threads. :/

-- 
Bruno Medeiros - Software Engineer
December 08, 2010
Re: String compare performance
Bruno Medeiros Wrote:

> On 29/11/2010 02:11, Michel Fortin wrote:
> > On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:
> >
> >> Stewart Gordon Wrote:
> >>
> >>> On 27/11/2010 23:04, Kagamin wrote:
> >>>> bearophile Wrote:
> >>>>
> >>>>>> Also, is there a way to bit-compare given memory areas at much
> >>>>>> higher speed than element per element (I mean for arrays in
> >>>>>> general)?
> >>>>>
> >>>>> I don't know. I think you can't.
> >>>>
> >>>> You can use memcmp, though only for utf-8 strings.
> >>>
> >>> Only for utf-8 strings? Why's that? I would've thought memcmp to be
> >>> type agnostic.
> >>>
> >>> Stewart.
> >>
> >> D community is amazing cult of premature optimization fans. Any one of
> >> you heard of canonically equivalent sequences? The integrated Unicode
> >> support is a clusterfuck. Please do compare ASCII strings with memcmp,
> >> but no Unicode. Where did the original poster pull this problem from,
> >> his ass? "My system runs 100,000,000,000 instructions per second, but
> >> this comparison of 4 letter strings uses 5 cycles too much! This is
> >> the only problem on the way to world domination with my $500 Microsoft
> >> Word clone!". No wait, the problems are completely imaginatory.
> >
> > Comparing Unicode UTF-* strings using memcmp is fine as long as what you
> > want to know is whether the code points are the same. If your point was
> > that per-code-point comparisons aren't the right way to compare Unicode
> > strings (in most situations), then I support this view too. Though if
> > that's what you wanted to say, you could have made your point clearer.
> >
> >
> 
> Why are people still replying to nameless trolls? There have been several 
> cases of that in recent threads. :/

Trololol. Maybe they're a bit dumb, my brother. If they some day become smarter, they'll stop using D. They see how much shit it is.

I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!
December 09, 2010
Re: String compare performance
On 08/12/2010 15:02, Brüno Mediocre wrote:
> Bruno Medeiros Wrote:
>>
>> Why are people still replying to nameless trolls? There have been several
>> cases of that in recent threads. :/
>
> Trololol. Maybe they're a bit dumb, my brother. If they some day become smarter, they'll stop using D. They see how much shit it is.
>
> I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!

Lol, "Brüno Mediocre", well thought, that's actually funny.

-- 
Brüno Mediocre - Software Engineer
December 09, 2010
Re: String compare performance
Bruno Medeiros wrote:
> On 08/12/2010 15:02, Brüno Mediocre wrote:
>> Bruno Medeiros Wrote:
>>>
>>> Why are people still replying to nameless trolls? There have been several
>>> cases of that in recent threads. :/
>>
>> Trololol. Maybe they're a bit dumb, my brother. If they some day 
>> become smarter, they'll stop using D. They see how much shit it is.
>>
>> I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!
> 
> Lol, "Brüno Mediocre", well thought, that's actually funny.
> 

Nah, I think it's a rather mediocre joke.