March 02, 2013
On Saturday, 2 March 2013 at 01:05:35 UTC, cvk012c wrote:
> On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer wrote:
>> On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c <cvk012c@motorolasolutions.com> wrote:
>>
>>> On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
>>>> cvk012c:
>>>>
>>>>> I think that similar Perl and Java scripts will outperform D easily.
>>>>> Thanks Andrei and simendsjo for a quick response though.
>>>>
>>>> Why don't you write a Java version? It takes only a few minutes, and you will have one more data point.
>>>>
>>>
>>> You are right. Why not. But instead of using the Java split() method I used a combination of the indexOf() and substring() methods to do the same job. The reason: the Java split method is implemented with a regular expression, which would be unfair to compare to the D splitter. Again, I created a similar D version of the script and compiled it with all suggested options: -release -O -inline -noboundscheck. This time the D version is more than twice as slow as Java: 8.4 seconds vs 4.
>>> D experts, please, take a look at my code and tell me what is wrong with it.
>>
>> The issue is a combination of the fact that:
>> 1. splitter is designed for any range, not just strings.  Not an excuse really, but a string-specific version could be written that does better (clearly).
>
> In my latest version of the D script I didn't use splitter at all. I used the string-specific indexOf function. Still, the result is very bad. For text-based protocols, such as SIP, the performance of string manipulation functions is very important. Unfortunately, it looks like this is not D's strongest point at this time.

My version is functionally equivalent, and measures 55 - 94 ms (depending on compiler; LDC is best). Your version performed in about 7s on my machine.

A similar optimization, taking advantage of immutability, can be applied to your Python translation and yields measurements of <1 ms.


import std.stdio,std.string,std.datetime;

void main()
{
  auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345@10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo: <sip:12345@comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345@comm.example.com>;tag=2910497618150713\r\n\r\n";
  auto t1 = Clock.currTime();
  for (int i=0; i<10_000_000; i++)
  {
    while (true)
    {
      auto index = indexOf(message, "\r\n");
      if (index == -1)
        break;
      auto notused = message[0..index];
      message = message[index+2..$];
    }
  }
  writeln(Clock.currTime()-t1);
}
March 02, 2013
On Fri, 01 Mar 2013 20:05:34 -0500, cvk012c <cvk012c@motorolasolutions.com> wrote:

> In my latest version of the D script I didn't use splitter at all. I used the string-specific indexOf function. Still, the result is very bad. For text-based protocols, such as SIP, the performance of string manipulation functions is very important. Unfortunately, it looks like this is not D's strongest point at this time.

indexOf uses the same mechanism as splitter to find the separators.  If it doesn't improve anything, I'd say that is where the problem lies (std.algorithm.find).
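
To illustrate what "string-specific" could mean here, a minimal sketch follows. It is not the hand-written version mentioned elsewhere in the thread; `indexOfCrlf` is a hypothetical helper that scans the UTF-8 code units directly instead of going through the generic range machinery. Whether it is actually faster than std.string.indexOf has to be measured per compiler.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): returns the index of the first
// "\r\n" in `s`, or -1 if there is none. It compares raw code units, which is
// safe because '\r' and '\n' are ASCII and never occur inside a multi-byte
// UTF-8 sequence.
ptrdiff_t indexOfCrlf(const(char)[] s)
{
    for (size_t i = 0; i + 1 < s.length; ++i)
    {
        if (s[i] == '\r' && s[i + 1] == '\n')
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    auto msg = "Max-Forwards: 70\r\nCSeq: 1 REGISTER\r\n";
    // Prints the offset of the first line terminator.
    writeln(indexOfCrlf(msg));
}
```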

-Steve
March 02, 2013
On 3/1/2013 5:19 PM, anon123 wrote:
> My version is functionally equivalent,

I don't think so. Take a look at what happens to the message variable. It is never restored to its original string on the next iteration of the loop.
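
A minimal sketch of a loop that does restore the string, for comparison. It is not code from the thread: `countLines` is a hypothetical helper and the message is shortened. The key difference is that the scan works on a local slice, so every pass of the outer loop sees the full message again.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: counts the "\r\n" terminators in `message`.
// Only the local slice `rest` is advanced; the caller's string is untouched.
size_t countLines(string message)
{
    size_t lines = 0;
    auto rest = message;
    while (true)
    {
        auto index = indexOf(rest, "\r\n");
        if (index == -1)
            break;
        ++lines;
        rest = rest[index + 2 .. $];
    }
    return lines;
}

void main()
{
    auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length: 0\r\n\r\n";
    size_t total = 0;
    foreach (i; 0 .. 1_000)
        total += countLines(message); // each iteration scans the whole message
    writeln(total);
}
```

With the version quoted above, only the first of the 10,000,000 iterations does any splitting; all later ones start from an already consumed string and break out immediately.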

March 02, 2013
On Fri, 01 Mar 2013 20:19:19 -0500, anon123 <z@z.z> wrote:

> My version is functionally equivalent, and measures 55 - 94 ms (depending on compiler; LDC is best). Your version performed in about 7s on my machine.

Try printing out message each time through the 10,000,000 iterations

-Steve
March 02, 2013
On Friday, 1 March 2013 at 22:02:02 UTC, Timon Gehr wrote:
> On 03/01/2013 10:28 PM, cvk012c wrote:
>> ...
>>
>> On my hardware with the -inline option it now takes about 15 secs, which is
>> still slower than Python, but with both -inline and -noboundscheck it
>> takes 13 secs and finally beats Python.
>> But I am still kind of disappointed because I expected a much better
>> performance boost and got only 7%. Considering that Python is not the
>> fastest scripting language, I think that similar Perl and Java scripts
>> will outperform D easily.
>
> Never make such statements without doing actual measurements. Furthermore, it is completely meaningless anyway. Performance benchmarks always compare language implementations, not languages.
>
> (Case in point: You get twice the speed by using another compiler backend implementation.)

Still, there is a case to be made for a performance test suite that could be run after (or before) each release of the language, like http://speed.pypy.org.
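
As a rough idea of what one entry in such a suite could look like, here is a small sketch. It assumes a recent Phobos that ships std.datetime.stopwatch; `bestOf` is a hypothetical helper, and the workload is only a stand-in for a real benchmark.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;
import std.string : indexOf;

// Hypothetical helper: runs `work` `runs` times and returns the fastest
// wall-clock time, which is less sensitive to noise than a single run.
Duration bestOf(void delegate() work, size_t runs)
{
    Duration best = Duration.max;
    foreach (_; 0 .. runs)
    {
        auto sw = StopWatch(AutoStart.yes);
        work();
        sw.stop();
        if (sw.peek() < best)
            best = sw.peek();
    }
    return best;
}

void main()
{
    auto message = "Via: SIP/2.0/TLS 10.1.3.114:59788\r\nCall-ID: 2910497622026445\r\n\r\n";
    size_t found = 0;

    auto t = bestOf({
        auto rest = message; // fresh slice on every run
        while (true)
        {
            auto i = indexOf(rest, "\r\n");
            if (i == -1)
                break;
            ++found;
            rest = rest[i + 2 .. $];
        }
    }, 5);

    writefln("best of 5: %s (%s separators seen in total)", t, found);
}
```

Recording such timings per release, per compiler, would make regressions visible without relying on one-off measurements.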
March 02, 2013
On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer wrote:
> Try my hand-written version (elsewhere in thread).  I think it can be done better too (use pointers instead of arrays).
>

That is usually a bad idea as it will fuck up pretty bad the aliasing analysis capabilities of the compiler.
March 02, 2013
On Saturday, 2 March 2013 at 01:10:39 UTC, Walter Bright wrote:
> On 3/1/2013 1:28 PM, cvk012c wrote:
>> But I am still kind of disappointed because I expected a much better performance
>> boost and got only 7%. Considering that Python is not the fastest scripting
>> language, I think that similar Perl and Java scripts will outperform D easily.
>> Thanks Andrei and simendsjo for a quick response though.
>
> Python's splitter, which you are measuring, isn't written in Python. It is written in C. You're actually comparing a bit of C code with a bit of D code.

This.

If you wrote the splitter in pure python you would see an enormous performance gap between it and the D version.

Having said that, maybe there are even more improvements we can make to improve speed in that part of Phobos, but considering we're already on a par with some quite mature and optimised C, there really isn't a problem.
March 02, 2013
deadalnix:

> That is usually a bad idea as it will fuck up pretty bad the aliasing analysis capabilities of the compiler.

With LDC I've seen that using raw pointers sometimes gives a little extra performance, but with DMD I've sometimes seen lower performance.
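
For reference, this is the kind of pointer-based variant being discussed, as a minimal sketch. `indexOfCrlfPtr` is a hypothetical name, not a Phobos function, and no claim is made here about which compiler it helps or hurts; that has to be measured.

```d
import std.stdio : writeln;

// Hypothetical helper: finds the first "\r\n" by walking a raw pointer
// instead of indexing the slice. Returns -1 if there is no match.
ptrdiff_t indexOfCrlfPtr(const(char)[] s)
{
    if (s.length < 2)
        return -1;

    const(char)* p = s.ptr;
    const(char)* last = s.ptr + s.length - 1; // last position where p[1] is valid is last - 1
    for (; p < last; ++p)
    {
        if (p[0] == '\r' && p[1] == '\n')
            return p - s.ptr;
    }
    return -1;
}

void main()
{
    writeln(indexOfCrlfPtr("CSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS\r\n"));
}
```

Note that the pointer arithmetic bypasses bounds checking entirely, so the length guard at the top is what keeps the loop inside the array.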

Bye,
bearophile
March 02, 2013
On 3/1/13 8:05 PM, cvk012c wrote:
> In my latest version of the D script I didn't use splitter at all. I used
> the string-specific indexOf function. Still, the result is very bad. For
> text-based protocols, such as SIP, the performance of string manipulation
> functions is very important. Unfortunately, it looks like this is not D's
> strongest point at this time.

That conclusion would be hasty if not missing the whole point. You essentially measured the speed of one loop in various translators implementing various languages. Java code doing straight computation is on a par with C speed, no two ways about that. Python code using library primitives ain't no slouch either. Performance tuning in these languages becomes more difficult in larger applications where data layout, allocation, and indirect function calls start to dominate.

The claim of systems languages being fast materializes only when you need to optimize data layout and nonstandard, custom algorithms. At that point systems languages give you additional options, whereas with high-level languages optimization becomes a very difficult proposition.


Andrei
March 02, 2013
On Sat, 2013-03-02 at 10:33 -0500, Andrei Alexandrescu wrote: […]
> That conclusion would be hasty if not missing the whole point. You essentially measured the speed of one loop in various translators implementing various languages. Java code doing straight computation is on a par with C speed, no two ways about that. Python code using library primitives ain't no slouch either. Performance tuning in these languages becomes more difficult in larger applications where data layout, allocation, and indirect function calls start to dominate.
[…]

Interestingly, there isn't only one Python implementation. There is only one language but there is CPython, PyPy, Jython, IronPython, to mention but 4.

On computationally intensive code, PyPy (Python execution environment in
RPython) is generally 10 to 30 times faster than CPython (Python
execution environment written in C).

C is a (reasonably) well known and widely used language, generally thought to produce fast code. RPython is a restricted subset of Python that is statically compiled. For writing interpreters, RPython spanks C. PyPy is not the only language implementation that uses RPython for its interpreter. C's days in this game are seriously numbered.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder