March 01, 2013
On Fri, 01 Mar 2013 16:28:08 -0500, cvk012c <cvk012c@motorolasolutions.com> wrote:

> On my hardware, with the -inline option it now takes about 15 secs, which is still slower than Python, but with both -inline and -noboundscheck it takes 13 secs and finally beats Python.
> But I am still kind of disappointed, because I expected a much better performance boost and got only 7%. Considering that Python is not the fastest scripting language, I think that similar Perl and Java scripts will outperform D easily.
> Thanks Andrei and simendsjo for a quick response though.

Phobos kind of refuses to treat strings like arrays of characters; it insists on decoding them.

With DMD and a hand-written splitter, it takes 6 seconds instead of 10 on my system (64-bit macosx).

struct MySplitter
{
    private string s;
    private string source;
    this(string src)
    {
        source = src;
        popFront();
    }

    @property string front()
    {
        return s;
    }

    @property bool empty()
    {
        return s.ptr == null;
    }

    void popFront()
    {
        s = source;
        if(!source.length)
        {
            source = null;
        }
        else
        {
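            // Scan for the first "\r\n" pair.
            size_t i = 0;
            bool found = false;
            for(; i + 1 < source.length; i++)
            {
                if(source[i] == '\r' && source[i + 1] == '\n')
                {
                    found = true;
                    break;
                }
            }
            if(found)
            {
                s = source[0..i];
                source = source[i + 2..$];
            }
            else
            {
                // No separator left: s already holds the whole remainder
                // (slicing to i here would drop its last character).
                source = null;
            }
        }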
    }
}

I'm sure splitter could be optimized to do the same thing I'm doing.

Probably can reduce that a bit using pointers instead of strings.
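
For illustration, the pointer idea could look roughly like the sketch below. It is not code from this thread: eachPiece is a hypothetical helper name, and it assumes the same "\r\n" separator as the splitter above. It walks the string with raw pointers and hands each piece to a callback, and it has not been benchmarked here.

```d
// Hypothetical helper, not part of Phobos: calls sink for every
// "\r\n"-separated piece of src, scanning with raw pointers
// instead of indexing into the array.
void eachPiece(string src, scope void delegate(string) sink)
{
    auto p = src.ptr;            // current scan position
    auto start = p;              // start of the current piece
    auto end = p + src.length;   // one past the last code unit

    while (end - p >= 2)
    {
        if (p[0] == '\r' && p[1] == '\n')
        {
            sink(start[0 .. p - start]);
            p += 2;              // skip the separator
            start = p;
        }
        else
            ++p;
    }
    // Whatever remains after the last separator (possibly empty).
    sink(start[0 .. end - start]);
}

void main()
{
    string[] pieces;
    eachPiece("a\r\nbc\r\n", (string piece) { pieces ~= piece; });
    assert(pieces == ["a", "bc", ""]);
}
```

Like std.algorithm's splitter, this sketch reports a trailing empty piece when the input ends with a separator.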

-Steve
March 01, 2013
On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
> Phobos kind of refuses to treat strings like arrays of characters, it
> insists on decoding.

There's no decoding in find and splitter as far as I remember.

Andrei
March 01, 2013
On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
>> Phobos kind of refuses to treat strings like arrays of characters, it
>> insists on decoding.
>
> There's no decoding in find and splitter as far as I remember.

Looking at splitter, it uses skipOver to skip over the separator, and that seems to call R.front and R.popFront.  Actually, it calls it for both the source string and the separator.

Maybe fixing skipOver would fix the problem?  Or don't call skipOver at all, since find isn't doing decoding, why do decoding when skipping the separator?
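
To make the decoding point concrete, here is a minimal sketch (written against a current Phobos, so module names are an assumption relative to the 2013 release). It shows that the range primitives on a string work in decoded dchar units, while indexing and std.utf.byCodeUnit work in raw code units.

```d
import std.range.primitives : ElementType, popFront;
import std.utf : byCodeUnit;

void main()
{
    // U+00E9 takes two UTF-8 code units, so this string has length 4.
    string s = "\u00E9\r\n";
    assert(s.length == 4);

    // As a range, a string yields decoded code points...
    static assert(is(ElementType!string == dchar));
    // ...while indexing yields raw code units.
    static assert(is(typeof(s[0]) == immutable(char)));

    // popFront decodes and skips the whole two-unit sequence.
    auto t = s;
    t.popFront();
    assert(t.length == 2);

    // byCodeUnit gives a range view that does not decode.
    static assert(is(ElementType!(typeof(s.byCodeUnit)) == immutable(char)));
}
```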

I still feel we need a specialized string type...

-Steve
March 01, 2013
On 3/1/13 5:47 PM, Steven Schveighoffer wrote:
> On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
>>> Phobos kind of refuses to treat strings like arrays of characters, it
>>> insists on decoding.
>>
>> There's no decoding in find and splitter as far as I remember.
>
> Looking at splitter, it uses skipOver to skip over the separator, and
> that seems to call R.front and R.popFront. Actually, it calls it for
> both the source string and the separator.

You may be looking at the wrong splitter overload.

Andrei
March 01, 2013
On Fri, 01 Mar 2013 18:07:45 -0500, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> On 3/1/13 5:47 PM, Steven Schveighoffer wrote:
>> On Fri, 01 Mar 2013 17:35:04 -0500, Andrei Alexandrescu
>> <SeeWebsiteForEmail@erdani.org> wrote:
>>
>>> On 3/1/13 5:31 PM, Steven Schveighoffer wrote:
>>>> Phobos kind of refuses to treat strings like arrays of characters, it
>>>> insists on decoding.
>>>
>>> There's no decoding in find and splitter as far as I remember.
>>
>> Looking at splitter, it uses skipOver to skip over the separator, and
>> that seems to call R.front and R.popFront. Actually, it calls it for
>> both the source string and the separator.
>
> You may be looking at the wrong splitter overload.

Yes, I was.  Very difficult to tell with the way template constraints are written!

So it's just pure heuristics.  Not hard to see how that would affect a program that loops 10 million times.

We may be able to create a string-specific version of splitter that would take advantage of the representation.
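
One way such a version might be approximated today, as a sketch only: view the string as raw bytes with std.string.representation and split those, so no code path can decode. The sample message and variable names are made up for illustration, and whether this is actually faster would need measuring.

```d
import std.algorithm.iteration : splitter;
import std.string : representation;

void main()
{
    string message = "To: a\r\nFrom: b\r\n\r\n";

    // immutable(ubyte)[] views of the same memory; nothing is copied.
    auto bytes = message.representation;
    auto sep = "\r\n".representation;

    string[] lines;
    foreach (piece; bytes.splitter(sep))
    {
        // Each piece is a slice of message's bytes, so casting it
        // back to string is safe as long as message was valid UTF-8.
        if (piece.length)
            lines ~= cast(string) piece;
    }
    assert(lines == ["To: a", "From: b"]);
}
```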

-Steve
March 01, 2013
On Fri, 01 Mar 2013 18:22:54 -0500, Steven Schveighoffer <schveiguy@yahoo.com> wrote:

> So it's just pure heuristics.  Not hard to see how that would affect a 10 million cycle program.
>
> We may be able to create a string-specific version of splitter that would take advantage of the representation.

Just found a disturbing artifact: splitter(message, '\n') is more than twice as slow as splitter(message, "\n").
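
A comparison like this could be reproduced with something along the lines of the sketch below. The input string and round count are arbitrary placeholders, std.datetime.stopwatch is the current Phobos module (an assumption relative to 2013), and the timings will differ by compiler, flags, and Phobos version, so no particular ratio is implied.

```d
import std.algorithm.iteration : splitter;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : walkLength;
import std.stdio : writefln;

void main()
{
    string message = "a\nbb\nccc";   // placeholder input
    enum rounds = 100_000;           // placeholder round count
    size_t total;                    // accumulated so the work is not dead code

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. rounds)
        total += message.splitter('\n').walkLength;   // char separator
    auto charTime = sw.peek;

    sw.reset();                      // keeps running, restarts from zero
    foreach (_; 0 .. rounds)
        total += message.splitter("\n").walkLength;   // string separator
    auto stringTime = sw.peek;

    writefln("char: %s, string: %s, pieces: %s", charTime, stringTime, total);
}
```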

-Steve
March 02, 2013
On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
> cvk012c:
>
>> I think that similar Perl and Java scripts will outperform D easily.
>> Thanks Andrei and simendsjo for a quick response though.
>
> Why don't you write a Java version? It takes only few minutes, and you will have one more data point.
>

You are right, why not. But instead of Java's split() method, I used a combination of indexOf() and substring() to do the same job. The reason: Java's split() is implemented with regular expressions, so comparing it to D's splitter would be unfair. Again, I created a similar D version of the script and compiled it with all the suggested options (-release -O -inline -noboundscheck), and this time the D version is more than twice as slow as Java: 8.4 seconds vs. 4.
D experts, please take a look at my code and tell me what is wrong with it.

Java version:

public class Test
{
  public static void main(String[] args)
  {
    String message = "REGISTER sip:comm.example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345@10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo: <sip:12345@comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345@comm.example.com>;tag=2910497618150713\r\n\r\n";
    long t1 = System.currentTimeMillis();
    for (int i=0; i<10000000; i++)
    {
      int index1 = 0;
      while (true)
      {
        int index2 = message.indexOf("\r\n", index1);
        if (index2 == -1)
          break;
        String notused = message.substring(index1, index2);
        index1 = index2+2;
      }
    }
    System.out.println(System.currentTimeMillis()-t1);
  }
}



D version:

import std.stdio,std.string,std.datetime;

void main()
{
  auto message = "REGISTER sip:comm.example.com SIP/2.0\r\nContent-Length: 0\r\nContact: <sip:12345@10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo: <sip:12345@comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia: SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID: 2910497622026445\r\nFrom: <sip:12345@comm.example.com>;tag=2910497618150713\r\n\r\n";
  auto t1 = Clock.currTime();
  for (int i=0; i<10000000; i++)
  {
    auto substring = message;
    while (true)
    {
      auto index = indexOf(substring, "\r\n");
      if (index == -1)
        break;
      auto notused = substring[0..index];
      substring = substring[index+2..$];
    }
  }
  writeln(Clock.currTime()-t1);
}
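
A variant of the same loop, as a hedged sketch: splitOnce is a hypothetical helper name, the message is shortened, and the round count is reduced. It differs from the version above in two ways: it times with the monotonic clock (core.time.MonoTime) instead of the wall clock, and it returns a value derived from every slice so the optimizer cannot discard the work.

```d
import core.time : MonoTime;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: splits message on "\r\n" with indexOf and returns
// the combined length of all pieces, so the result is actually used.
size_t splitOnce(string message)
{
    size_t consumed;
    auto rest = message;
    while (true)
    {
        auto index = rest.indexOf("\r\n");
        if (index == -1)
            break;
        consumed += index;              // length of rest[0 .. index]
        rest = rest[index + 2 .. $];    // continue after the separator
    }
    return consumed;
}

void main()
{
    auto message = "A: 1\r\nB: 22\r\n\r\n";   // shortened stand-in message
    auto start = MonoTime.currTime;
    size_t total;
    foreach (i; 0 .. 1_000_000)
        total += splitOnce(message);
    writeln(MonoTime.currTime - start);
    writeln(total);
}
```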
March 02, 2013
On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c <cvk012c@motorolasolutions.com> wrote:

> On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
>> cvk012c:
>>
>>> I think that similar Perl and Java scripts will outperform D easily.
>>> Thanks Andrei and simendsjo for a quick response though.
>>
>> Why don't you write a Java version? It takes only few minutes, and you will have one more data point.
>>
>
> You are right, why not. But instead of Java's split() method, I used a combination of indexOf() and substring() to do the same job. The reason: Java's split() is implemented with regular expressions, so comparing it to D's splitter would be unfair. Again, I created a similar D version of the script and compiled it with all the suggested options (-release -O -inline -noboundscheck), and this time the D version is more than twice as slow as Java: 8.4 seconds vs. 4.
> D experts, please take a look at my code and tell me what is wrong with it.

Try my hand-written version (elsewhere in thread).  I think it can be done better too (use pointers instead of arrays).

The issue is a combination of several things:
1. splitter is designed for any range, not just strings.  Not an excuse really, but a string-specific version could be written that does better (clearly).
2. dmd is not always the best optimizer.  I saw one other person who said using a different D compiler resulted in a quicker time.
3. Any time you are looping 10 million times, small, otherwise insignificant differences will be magnified.

Note one other thing -- Be VERY wary of test cases that are fully-visible at compile time.  Very smart compilers are known to reduce your code to something that isn't realistic.  I remember seeing a similar comparison not too long ago where one wondered why g++ was so much faster (0.09 seconds or something) than D (4 or more seconds).  Turns out, the g++ compiler optimized out his ENTIRE program :).

You may want to read in the "message" string at runtime to avoid such issues.
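
A minimal sketch of that suggestion, assuming the message is stored in a file whose path is passed on the command line (loadMessage and the usage text are hypothetical names, not from this thread):

```d
import std.file : readText;
import std.stdio : writeln;

// Hypothetical helper: loads the benchmark input from a file at run time,
// so its contents are not visible to the compiler.
string loadMessage(string path)
{
    return readText(path);
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: bench <file containing the message>");
        return;
    }
    auto message = loadMessage(args[1]);
    // The benchmark loop would go here, operating on message.
    writeln(message.length, " bytes loaded");
}
```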

-Steve
March 02, 2013
On Saturday, 2 March 2013 at 00:47:02 UTC, Steven Schveighoffer wrote:
> On Fri, 01 Mar 2013 19:19:35 -0500, cvk012c <cvk012c@motorolasolutions.com> wrote:
>
>> On Friday, 1 March 2013 at 21:52:13 UTC, bearophile wrote:
>>> cvk012c:
>>>
>>>> I think that similar Perl and Java scripts will outperform D easily.
>>>> Thanks Andrei and simendsjo for a quick response though.
>>>
>>> Why don't you write a Java version? It takes only few minutes, and you will have one more data point.
>>>
>>
>> You are right, why not. But instead of Java's split() method, I used a combination of indexOf() and substring() to do the same job. The reason: Java's split() is implemented with regular expressions, so comparing it to D's splitter would be unfair. Again, I created a similar D version of the script and compiled it with all the suggested options (-release -O -inline -noboundscheck), and this time the D version is more than twice as slow as Java: 8.4 seconds vs. 4.
>> D experts, please take a look at my code and tell me what is wrong with it.
>
> The issue is a combination of the fact that:
> 1. splitter is designed for any range, not just strings.  Not an excuse really, but a string-specific version could be written that does better (clearly).

In the latest version of my D script I didn't use splitter at all; I used the string-specific indexOf function. The result is still very bad. For text-based protocols such as SIP, the performance of string-manipulation functions is very important. Unfortunately, it looks like that is not D's strongest point at this time.
March 02, 2013
On 3/1/2013 1:28 PM, cvk012c wrote:
> But I am still kind of disappointed, because I expected a much better performance
> boost and got only 7%. Considering that Python is not the fastest scripting
> language, I think that similar Perl and Java scripts will outperform D easily.
> Thanks Andrei and simendsjo for a quick response though.

Python's splitter, which you are measuring, isn't written in Python. It is written in C. You're actually comparing a bit of C code with a bit of D code.