March 02, 2013
On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
> This is not a personal attack, please don't take it that way.

Not at all. I'm just confused about this exchange. You mention some correct facts ("MySplitter finished in 6 seconds instead of 10"), then some mistaken suppositions ("a string-specific version could be written"), and derive conclusions from both ("Maybe there is a bad constraint somewhere"). Hard to deal with the mix.

> My
> anecdotal tests with hand writing a custom splitter range to handle the
> OP's program gave me a 40% savings. Whether it's find or not, I'm not
> sure, but there definitely is room for improvement.

I think I understand where the confusion comes from. If you're referring to MySplitter, that's not comparable. It uses this at the core:

for(; i + 1 < source.length; i++)
{
    if(source[i] == '\r' && source[i + 1] == '\n')
    {
        found = true;
        break;
    }
    ...
}

This is not close to the code that would work with arrays. We could specialize things for small arrays, but that hasn't been done yet. My point is it's not comparable with the classic brute force subsequence search.

What is comparable is shown below. indexOf2 defines the straight array-in-array brute force search. With dmd on my machine it runs in some 7 seconds. So does indexOf (this was my point). Then indexOf3 uses a simple trick that the Java implementation does: cache the first character. That reduces the running time to 5 seconds. Then indexOf4 optimizes the loop searching that first character, bringing run time to about 4 seconds. On that machine the Java implementation runs in 3.5 seconds.


Andrei

import std.stdio, std.string, std.datetime;

long indexOf2(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    auto limit = haystack.length - needle.length;
 bigloop:
    for (size_t i; i <= limit; ++i)
    {
        foreach (q; 0 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

long indexOf3(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    immutable c = needle[0];
    auto limit = haystack.length - needle.length;
    assert(limit < uint.max, "not handled");
 bigloop:
    for (uint i; i <= limit; ++i)
    {
        if (haystack.ptr[i] != c) continue;
        foreach (q; 1 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

long indexOf4(string haystack, string needle)
{
    if (!needle.length) return 0;
    if (needle.length > haystack.length) return -1;
    immutable c = needle[0];
    auto limit = haystack.length - needle.length;
    size_t i = 0;
 bigloop:
    for (;;)
    {
        while (i <= limit && haystack[i++] != c) {}
        if (i-- > limit) break;
        foreach (q; 1 .. needle.length)
        {
            if (haystack[i + q] != needle[q]) continue bigloop;
        }
        return i;
    }
    return -1;
}

unittest
{
    assert(indexOf2("hello, world", ",") == 5);
    assert(indexOf2("hello, world", "a") == -1);
    assert(indexOf3("hello, world", ",") == 5);
    assert(indexOf3("hello, world", "a") == -1);
    assert(indexOf4("hello, world", ",") == 5);
    assert(indexOf4("hello, world", "a") == -1);
}

void main(string[] args)
{
  auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345@10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo: <sip:12345@comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345@comm.example.com>;tag=2910497618150713\r\n\r\n";
  auto t1 = Clock.currTime();
  for (int i=0; i<10000000; i++)
  {
    long index1 = 0;
    while (true)
    {
      auto index2 = indexOf(message[index1 .. $], "\r\n");
      if (index2 == -1)
        break;
      auto notused = message[index1 .. index1 + index2];
      if (args.length == 42) writeln(notused);
      index1 += index2 + 2;
    }
  }
  writeln(Clock.currTime()-t1);
}





March 02, 2013
On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I fear there's a misunderstanding somewhere. Splitter and find are already specialized appropriately for strings.

OK, I would then counter that I have created a splitter range that beats its performance, by a lot.  As far as I can tell, I'm not doing anything special.

All I feel that I'm doing differently is instead of using find or indexOf, I'm looking at each char to see if it matches the first in the separator, and if it does, then I'm comparing the remaining characters.  Pretty braindead, probably room for improvement.

Splitter should beat this, or match it at least.  From testing the sample code in this thread, my simple version beats splitter by 40%.  Something isn't right somewhere, the difference should not be that drastic.

-Steve
March 02, 2013
On 3/2/13 5:48 PM, Steven Schveighoffer wrote:
> On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> I fear there's a misunderstanding somewhere. Splitter and find are
>> already specialized appropriately for strings.
>
> OK, I would then counter that I have created a splitter range

Prolly a link would clarify that.

Andrei
March 03, 2013
On Sat, 02 Mar 2013 18:14:28 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 3/2/13 5:48 PM, Steven Schveighoffer wrote:
>> On Sat, 02 Mar 2013 16:16:22 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> I fear there's a misunderstanding somewhere. Splitter and find are
>>> already specialized appropriately for strings.
>>
>> OK, I would then counter that I have created a splitter range
>
> Prolly a link would clarify that.

Forget this, I read your other post.  The MySplitter range is the one I was referring to.

I saw your points and will respond there, but not today.  I don't have time right now to continue the debate.

-Steve
March 03, 2013
On Sat, 2013-03-02 at 12:52 -0800, Walter Bright wrote:
> On 3/2/2013 12:08 PM, H. S. Teoh wrote:
> > On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
> >> On 3/2/2013 11:42 AM, Russel Winder wrote:
> >>> On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
> >>>> On 3/2/2013 7:43 AM, Russel Winder wrote:
> >>>>> For writing interpreters, RPython spanks C.
> >>>>
> >>>> What's RPython doing that makes it faster?
> >>>
> >>> Allowing PyPy to have a good JIT compiler.
> >>
> >> I don't understand. Does that JIT generate faster code than a C compiler would generate?
> >
> > I don't know, but my wild guess is that a JIT optimizes the *right* hotspots based on real-time performance measurements, whereas a lot of C programmers are obsessed with optimizing what they *think* are the hotspots, but which really aren't.
> 
> I meant what the C *compiler* generates.

Yes because the C/C++/D/etc. compilers are attempting to predict the control flow of the program in execution and optimize all cases for all possibilities. JIT's are just focussing on the runtime bottlenecks with the actual data as being used. This allows for more focussed code generation in the actual context. I would suspect that in many cases the generated code is effectively the same but JITs can often do unexpected and faster codes because they have more data to optimize with and less optimization to do.

To say more would require actual comparative data and I suspect no-one on this list will do that.

-- 
Russel. ============================================================================= Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net 41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder


March 03, 2013
03-Mar-2013 12:03, Russel Winder пишет:
> On Sat, 2013-03-02 at 12:52 -0800, Walter Bright wrote:
>> On 3/2/2013 12:08 PM, H. S. Teoh wrote:
>>> On Sat, Mar 02, 2013 at 12:02:08PM -0800, Walter Bright wrote:
>>>> On 3/2/2013 11:42 AM, Russel Winder wrote:
>>>>> On Sat, 2013-03-02 at 11:30 -0800, Walter Bright wrote:
>>>>>> On 3/2/2013 7:43 AM, Russel Winder wrote:
>>>>>>> For writing interpreters, RPython spanks C.
>>>>>>
>>>>>> What's RPython doing that makes it faster?
>>>>>
>>>>> Allowing PyPy to have a good JIT compiler.
>>>>
>>>> I don't understand. Does that JIT generate faster code than a C
>>>> compiler would generate?
>>>
>>> I don't know, but my wild guess is that a JIT optimizes the *right*
>>> hotspots based on real-time performance measurements, whereas a lot of C
>>> programmers are obsessed with optimizing what they *think* are the
>>> hotspots, but which really aren't.
>>
>> I meant what the C *compiler* generates.
>
> Yes because the C/C++/D/etc. compilers are attempting to predict the
> control flow of the program in execution and optimize all cases for all
> possibilities. JIT's are just focussing on the runtime bottlenecks with
> the actual data as being used. This allows for more focussed code
> generation in the actual context. I would suspect that in many cases the
> generated code is effectively the same but JITs can often do unexpected
> and faster codes because they have more data to optimize with and less
> optimization to do.
>

Only one thing missing here is that JITs typically has much less time to do all and any of this.

> To say more would require actual comparative data and I suspect no-one
> on this list will do that.
>
Indeed unlikely.

-- 
Dmitry Olshansky
March 03, 2013
On Sunday, 3 March 2013 at 08:03:53 UTC, Russel Winder wrote:
> Yes because the C/C++/D/etc. compilers are attempting to predict the
> control flow of the program in execution and optimize all cases for all
> possibilities. JIT's are just focussing on the runtime bottlenecks with
> the actual data as being used. This allows for more focussed code
> generation in the actual context. I would suspect that in many cases the
> generated code is effectively the same but JITs can often do unexpected
> and faster codes because they have more data to optimize with and less
> optimization to do.
>

That is theory, but in practice, it doesn't work that well : you have instrument the code to get measure to optimize according to runtime data. Plus you can't monkey patch codegen (it is unspecified how x86 CPU load instructions, and so it isn't guaranteed that the CPU won't see garbage). That cost, plus the cost of the optimization itself will make the whole thing worth it only if the win when you optimize is high.

> To say more would require actual comparative data and I suspect no-one
> on this list will do that.

It isn't worth. As explained above, you can conclude that this is highly dependent of the code you manipulate and the runtime data that is throw at it. Note that I discussed with people from LLVM on how this can be done in statically compiled code. In fact, it can but it is rather complicated and not worth automate for now.
March 03, 2013
On 3/3/2013 12:03 AM, Russel Winder wrote:
> Yes because the C/C++/D/etc. compilers are attempting to predict the
> control flow of the program in execution and optimize all cases for all
> possibilities. JIT's are just focussing on the runtime bottlenecks with
> the actual data as being used. This allows for more focussed code
> generation in the actual context. I would suspect that in many cases the
> generated code is effectively the same but JITs can often do unexpected
> and faster codes because they have more data to optimize with and less
> optimization to do.

I've heard this was the advantage of JITs for the last 15 years. I haven't seen any data to justify it, and I have trouble thinking of any significant improvements to code gen that could be made with runtime data.

That is for JITting statically typed languages. The situation is a bit better for dynamic ones, because the JIT can do a better job than a compiler because most of the time, only one type is used in a particular spot, and the JIT can pick that up and "statically" type it, as opposed to a compiler which cannot.

March 03, 2013
Walter Bright:

> and I have trouble thinking of any significant improvements to code gen that could be made with runtime data.

I've seen that the JavaHotSpot is able to unroll loops with a length known only at run-time. It unrolls them only when the run-time statistics say it's advantageous.

(In theory the same is possible with profile-driven optimization).

Bye,
bearophile
March 03, 2013
On Sunday, 3 March 2013 at 11:57:36 UTC, bearophile wrote:
> Walter Bright:
>
>> and I have trouble thinking of any significant improvements to code gen that could be made with runtime data.
>
> I've seen that the JavaHotSpot is able to unroll loops with a length known only at run-time. It unrolls them only when the run-time statistics say it's advantageous.
>
> (In theory the same is possible with profile-driven optimization).
>
> Bye,
> bearophile

I believe the Hotspot runtime compiler is about the only one to perform runtime optimizations like that. I don't know if the CIL does this kind of things.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17