Thread overview
[Issue 9645] New: std.algorithm.splitter on string with char as separator performs badly in certain situations
March 04, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9645

           Summary: std.algorithm.splitter on string with char as
                    separator performs badly in certain situations
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: schveiguy@yahoo.com


--- Comment #0 from Steven Schveighoffer <schveiguy@yahoo.com> 2013-03-04 10:55:33 PST ---
std.algorithm.splitter where the separator is a char can perform worse than the version where the separator is a substring.  It should be at worse equivalent, since you can always degrade to the substring version given the single char separator.  I would expect it to be faster.

In cases where the content to separator ratio is 1 to 1 up to a certain point (did not test for it), then the single-char version performs better than its substring counterpart.  But in cases where the ratio of content to separator is large (arguably the more common case), it does horribly.

A simple test (ratio of 35:1 for content to separator):

import std.datetime;
import std.algorithm;
import std.stdio;

string line = "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbba";

void main()
{
    enum iterations = 1_000_000;
    auto t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(s; splitter(line, 'a'))
        {
            if(i == 0)
                writeln(s);
        }
    }
    writefln("char splitter took %s", Clock.currTime()-t1);

    t1 = Clock.currTime();
    foreach(i; 0..iterations)
    {
        foreach(s; splitter(line, "a"))
        {
            if(i == 0)
                writeln(s);
        }
    }
    writefln("substring splitter took %s", Clock.currTime()-t1);
}

result (macbook pro 2.2GHz core i7):

bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

char splitter took 2 secs, 702 ms, and 188 μs bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

substring splitter took 1 sec, 71 ms, and 736 μs

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
March 04, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9645


bearophile_hugs@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs@eml.cc


--- Comment #1 from bearophile_hugs@eml.cc 2013-03-04 13:37:29 PST ---
See also Issue 8013

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
August 22, 2013
http://d.puremagic.com/issues/show_bug.cgi?id=9645


monarchdodra@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra@gmail.com


--- Comment #2 from monarchdodra@gmail.com 2013-08-22 07:36:21 PDT ---
(In reply to comment #0)
> char splitter took 2 secs, 702 ms, and 188 μs
>
> substring splitter took 1 sec, 71 ms, and 736 μs

Changes where made to improve the situation, However, due to performance bug 10848, it is still slower.

That said, I have a standing performance improvement pull for find:
https://github.com/D-Programming-Language/phobos/pull/1492
fix Issue 10403 - memchr optimization for std.algorithm.find
http://d.puremagic.com/issues/show_bug.cgi?id=10403

With it, I'm getting:
char splitter took 429 ms, 298 μs, and 8 hnsecs
substring splitter took 2 secs, 323 ms, 301 μs, and 7 hnsecs

Gone from twice as slow to five times as fast. Not bad :)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------