String compare performance (page 2) - D Programming Language Discussion Forum

November 28, 2010

Re: String compare performance

Posted by Iain Buclaw
in reply to Jonathan M Davis

== Quote from Jonathan M Davis (jmdavisProg@gmx.com)'s article
> On Saturday 27 November 2010 22:13:39 Austin Hastings wrote:
> > On 11/27/2010 11:53 AM, bearophile wrote:
> > > While translating a common Python script to D, I have found something interesting, so I have reduced it to a little benchmark.
> > >
> > > Below there is the reduced Python2 code (it uses Psyco), and a little program to generate some test data. The timing of the first D2 version is not good compared to the Python-Psyco program (the generation of the *300 array is a quick thing), so I have created two more D2 versions to show myself that D2 wasn't broken :-)
> > >
> > > The reduced code looks like a synthetic benchmark, but it has about the
> > > same performance profile as a 60-line Python script (the original
> > > code was using xrange(0,len(...)-3,3) instead of xrange(len(...)-3),
> > > but the situation doesn't change much).
> >
> > It's not clear to me if the point of your post is "how do I make this go faster?" or "Waa! D underperforms Python by default".
> What I think is pretty clear from this is that at some point, some work needs to
> be done to optimize array comparisons when memcmp can be used instead of having
> to call the individual opEquals() of each element. It's not the sort of thing
> that's likely to be a priority, but it really should be done at some point.
> - Jonathan M Davis

Just food for thought: GDC already uses memcmp for the string comparisons in the first implementation,

if (codon == "TAG" || codon == "TGA" || codon == "TAA")

And it is still the slowest case of the lot.
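For reference, a memcmp-based lowering of `codon == "TAG"` amounts to roughly the following D sketch. The names `eqMemcmp` and `isStopCodon` are hypothetical, and this is not GDC's actual generated code: it only shows the length check plus one memcmp call per literal. For three-byte strings the per-call overhead may well dominate the comparison itself, which would be consistent with this version still being the slowest, though that is an assumption and was not measured in this thread.

```d
import core.stdc.string : memcmp;

/// Hypothetical helper: length check first, then one memcmp over the raw bytes.
/// Roughly what a memcmp-based `a == b` on char arrays boils down to.
bool eqMemcmp(const(char)[] a, const(char)[] b)
{
    if (a.length != b.length)
        return false;
    // Empty slices may have a null .ptr, so don't hand them to memcmp.
    return a.length == 0 || memcmp(a.ptr, b.ptr, a.length) == 0;
}

/// The stop-codon test from the first implementation, written with the helper.
bool isStopCodon(const(char)[] codon)
{
    return eqMemcmp(codon, "TAG") || eqMemcmp(codon, "TGA")
        || eqMemcmp(codon, "TAA");
}

unittest
{
    assert(isStopCodon("TAA"));
    assert(!isStopCodon("TAC"));
    assert(!isStopCodon("TA"));
}
```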

Regards

November 28, 2010

Re: String compare performance

Posted by bearophile
in reply to Iain Buclaw

Iain Buclaw:

> GDC uses memcmp when using string comparisons in the first implementation.
> 
> if (codon == "TAG" || codon == "TGA" || codon == "TAA")
> 
> And it is still the slowest case of the lot.

The asm that gdc generates for the first version of the function uses "repz cmpsb". I don't remember the precise timings, but I think it was about twice as fast as the first version compiled with dmd (on Windows).

Regarding the idea of using memcmp, I have added a comment to issue 5282.

Bye,
bearophile

November 28, 2010

Re: String compare performance

Posted by spir

On Sat, 27 Nov 2010 22:41:45 -0800
Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> > What I think is pretty clear from this is that at some point, some work needs to be done to optimize array comparisons when memcmp can be used instead of having to call the individual opEquals() of each element. It's not the sort of thing that's likely to be a priority, but it really should be done at some point.
> 
> Enhancement Request: http://d.puremagic.com/issues/show_bug.cgi?id=5282

Yes, but other comments seem to show that memcmp only doubles the speed. That would still leave us twice as slow as Python ;-)
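On the "when memcmp can be used" condition quoted above: a bitwise comparison only matches `==` when element equality is plain bitwise equality. The following D sketch (`bitwiseEqual` is a hypothetical name) shows it agreeing with `==` for `int[]`, but not for `double[]`, where `0.0 == -0.0` holds despite different bits and NaN never equals itself despite identical bits. Types with a user-defined `opEquals` are the other obvious case where memcmp cannot simply be substituted.

```d
import core.stdc.string : memcmp;

/// Hypothetical helper: compares two arrays byte for byte. Only a valid
/// substitute for `a == b` when the element type's equality is bitwise.
bool bitwiseEqual(T)(const(T)[] a, const(T)[] b)
{
    if (a.length != b.length)
        return false;
    return a.length == 0 || memcmp(a.ptr, b.ptr, a.length * T.sizeof) == 0;
}

unittest
{
    // Integers: bitwise and element-wise comparison agree.
    int[] x = [1, 2, 3];
    int[] y = [1, 2, 3];
    assert(bitwiseEqual(x, y) && x == y);

    // Floating point: 0.0 == -0.0 numerically, but the sign bit differs.
    double[] pos = [0.0];
    double[] neg = [-0.0];
    assert(pos == neg);
    assert(!bitwiseEqual(pos, neg));

    // Floating point: NaN != NaN, but the two bit patterns are identical.
    double[] nan1 = [double.nan];
    double[] nan2 = [double.nan];
    assert(nan1 != nan2);
    assert(bitwiseEqual(nan1, nan2));
}
```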

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

November 28, 2010

Re: String compare performance

Posted by bearophile
in reply to spir

spir:

> Yes, but other comments seem to show that memcmp only doubles the speed. That would still leave us twice as slow as Python ;-)

The solution I suggest for this problem: when DMD knows at compile time the length of one of the two strings being compared, and that length is small (say, < 6), it could replace string equality code like this:
(codon == "TAG")
With code like:
(codon.length == 3 && codon[0] == 'T' && codon[1] == 'A' && codon[2] == 'G')
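The rewrite described here can also be done by hand in present-day D, using a template that takes the literal as a compile-time parameter. In this sketch (`eqLit` and `isStopCodon` are hypothetical names), `static foreach` expands into a length check followed by one character comparison per position, the same shape as the expression above. Note that `static foreach` only arrived in DMD 2.076 (2017), well after this thread, so this is a library-side workaround rather than the compiler change being proposed.

```d
/// Hypothetical helper: compares `s` against a string known at compile time.
/// `static foreach` unrolls the loop, so for lit = "TAG" this is effectively
/// s.length == 3 && s[0] == 'T' && s[1] == 'A' && s[2] == 'G'.
bool eqLit(string lit)(const(char)[] s)
{
    if (s.length != lit.length)
        return false;
    static foreach (i; 0 .. lit.length)
    {
        // `i` and `lit[i]` are compile-time constants here.
        if (s[i] != lit[i])
            return false;
    }
    return true;
}

/// The stop-codon test written with the unrolled comparison.
bool isStopCodon(const(char)[] codon)
{
    return eqLit!"TAG"(codon) || eqLit!"TGA"(codon) || eqLit!"TAA"(codon);
}

unittest
{
    assert(isStopCodon("TGA"));
    assert(!isStopCodon("TGG"));
    assert(!isStopCodon("TAGA"));
}
```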

Bye,
bearophile

November 29, 2010

Re: String compare performance

Posted by Michel Fortin
in reply to bearophobic

On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:

> Stewart Gordon Wrote:
> 
>> On 27/11/2010 23:04, Kagamin wrote:
>>> bearophile Wrote:
>>> 
>>>>> Also, is there a way to bit-compare given memory areas at much
>>>>> higher speed than element per element (I mean for arrays in
>>>>> general)?
>>>> 
>>>> I don't know. I think you can't.
>>> 
>>> You can use memcmp, though only for utf-8 strings.
>> 
>> Only for utf-8 strings?  Why's that?  I would've thought memcmp to be
>> type agnostic.
>> 
>> Stewart.
> 
> The D community is an amazing cult of premature optimization fans. Has any of you heard of canonically equivalent sequences? The integrated Unicode support is a clusterfuck. Please do compare ASCII strings with memcmp, but not Unicode. Where did the original poster pull this problem from, his ass? "My system runs 100,000,000,000 instructions per second, but this comparison of 4-letter strings uses 5 cycles too much! This is the only problem on the way to world domination with my $500 Microsoft Word clone!" No wait, the problems are completely imaginary.

Comparing Unicode UTF-* strings using memcmp is fine as long as what you want to know is whether the code points are the same. If your point was that per-code-point comparisons aren't the right way to compare Unicode strings (in most situations), then I support this view too. Though if that's what you wanted to say, you could have made your point clearer.
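To make the distinction concrete: `==` (or memcmp) on two D strings answers whether their UTF-8 code units, and hence their code points, match, while canonically equivalent sequences only compare equal after normalization. This sketch uses `std.uni.normalize`, which was added to Phobos after this thread; `canonicallyEqual` is a hypothetical helper name.

```d
import std.uni : normalize, NFC;

/// Hypothetical helper: true when `a` and `b` are canonically equivalent.
/// Both are brought to Normalization Form C, then compared code unit by code unit.
bool canonicallyEqual(const(char)[] a, const(char)[] b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

unittest
{
    string composed   = "\u00E9";  // 'é' as the single code point U+00E9
    string decomposed = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    // Code-unit comparison (what == and memcmp check) sees two different strings.
    assert(composed != decomposed);

    // After NFC normalization they compare equal.
    assert(canonicallyEqual(composed, decomposed));
}
```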


-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/

December 08, 2010

Re: String compare performance

Posted by Bruno Medeiros
in reply to Michel Fortin

On 29/11/2010 02:11, Michel Fortin wrote:
> On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:
>
>> Stewart Gordon Wrote:
>>
>>> On 27/11/2010 23:04, Kagamin wrote:
>>>> bearophile Wrote:
>>>>
>>>>>> Also, is there a way to bit-compare given memory areas at much
>>>>>> higher speed than element per element (I mean for arrays in
>>>>>> general)?
>>>>>
>>>>> I don't know. I think you can't.
>>>>
>>>> You can use memcmp, though only for utf-8 strings.
>>>
>>> Only for utf-8 strings? Why's that? I would've thought memcmp to be
>>> type agnostic.
>>>
>>> Stewart.
>>
>> The D community is an amazing cult of premature optimization fans. Has
>> any of you heard of canonically equivalent sequences? The integrated
>> Unicode support is a clusterfuck. Please do compare ASCII strings with
>> memcmp, but not Unicode. Where did the original poster pull this problem
>> from, his ass? "My system runs 100,000,000,000 instructions per second,
>> but this comparison of 4-letter strings uses 5 cycles too much! This is
>> the only problem on the way to world domination with my $500 Microsoft
>> Word clone!" No wait, the problems are completely imaginary.
>
> Comparing unicode UTF-* strings using memcmp is fine as long as what you
> want to know is whether the code points are the same. If your point was
> that per-code-point comparisons aren't the right way to compare Unicode
> strings (in most situations), then I support this view too. Though if
> that's what you wanted to say, you could have made your point clearer.
>
>

Why are people still replying to nameless trolls? There have been several cases of that in recent threads. :/

-- 
Bruno Medeiros - Software Engineer

December 08, 2010

Re: String compare performance

Posted by Brüno Mediocre
in reply to Bruno Medeiros

Bruno Medeiros Wrote:

> On 29/11/2010 02:11, Michel Fortin wrote:
> > On 2010-11-28 20:57:38 -0500, bearophobic <notbear@cave.net> said:
> >
> >> Stewart Gordon Wrote:
> >>
> >>> On 27/11/2010 23:04, Kagamin wrote:
> >>>> bearophile Wrote:
> >>>>
> >>>>>> Also, is there a way to bit-compare given memory areas at much higher speed than element per element (I mean for arrays in general)?
> >>>>>
> >>>>> I don't know. I think you can't.
> >>>>
> >>>> You can use memcmp, though only for utf-8 strings.
> >>>
> >>> Only for utf-8 strings? Why's that? I would've thought memcmp to be type agnostic.
> >>>
> >>> Stewart.
> >>
> >> The D community is an amazing cult of premature optimization fans. Has any of you heard of canonically equivalent sequences? The integrated Unicode support is a clusterfuck. Please do compare ASCII strings with memcmp, but not Unicode. Where did the original poster pull this problem from, his ass? "My system runs 100,000,000,000 instructions per second, but this comparison of 4-letter strings uses 5 cycles too much! This is the only problem on the way to world domination with my $500 Microsoft Word clone!" No wait, the problems are completely imaginary.
> >
> > Comparing unicode UTF-* strings using memcmp is fine as long as what you want to know is whether the code points are the same. If your point was that per-code-point comparisons aren't the right way to compare Unicode strings (in most situations), then I support this view too. Though if that's what you wanted to say, you could have made your point clearer.
> >
> >
> 
> Why are people still replying to nameless trolls? There have been several cases of that in recent threads. :/

Trololol. Maybe they're a bit dumb, my brother. If they some day become smarter, they'll stop using D. They see how much shit it is.

I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!

December 09, 2010

Re: String compare performance

Posted by Bruno Medeiros
in reply to Brüno Mediocre

On 08/12/2010 15:02, Brüno Mediocre wrote:
> Bruno Medeiros Wrote:
>>
>> Why are people still replying to nameless trolls? There have been several
>> cases of that in recent threads. :/
>
> Trololol. Maybe they're a bit dumb, my brother. If they some day become smarter, they'll stop using D. They see how much shit it is.
>
> I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!

Lol, "Brüno Mediocre", well thought out, that's actually funny.

-- 
Brüno Mediocre - Software Engineer

December 09, 2010

Re: String compare performance

Posted by Daniel Gibson
in reply to Bruno Medeiros

Bruno Medeiros wrote:
> On 08/12/2010 15:02, Brüno Mediocre wrote:
>> Bruno Medeiros Wrote:
>>>
>>> Why are people still replying to nameless trolls? There have been several
>>> cases of that in recent threads. :/
>>
>> Trololol. Maybe they're a bit dumb, my brother. If they some day become smarter, they'll stop using D. They see how much shit it is.
>>
>> I miss my wife. Oh god.... bring back my life! Bring me my.. sandwich!
> 
> Lol, "Brüno Mediocre", well thought out, that's actually funny.
> 

Nah, I think it's a rather mediocre joke.

Copyright © 1999-2021 by the D Language Foundation