March 29, 2007
Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]

As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.

The results indicate:

1) On linux, the fastest variation of the revised phobos code runs 40% slower than the generic tango.io equivalent. On the other hand, the new phobos code seems a bit faster than perl.

2) On Win32, similar testing shows tango.io to be more than six times faster than the improved phobos code. Tweaking the tango.io library a little makes it over eight times faster than the phobos equivalent [3].

3) On Win32, generic tango.io is more than twice as efficient as the fastest C version identified. It's also notably faster than MinGW 'cat', which apparently performs various under-the-cover optimizations.

4) By making some further optimizations in the phobos client-code using setvbuf() and fputs(), the improved phobos version can be sped up significantly; at that point tango.io is only three times faster than phobos on Win32. These adjustments require knowledge of how to tweak the underlying C library; thus, they may belong to the group of C++ tweaks which Walter quibbled with last week. The setvbuf() tweaks make no noticeable difference on linux, though the fputs() improvements are accounted for in #1 (above).


Note that tango.io is not explicitly optimized for this behaviour. While some quick hacks to the library have been shown to make it around 20% faster than the generic package (for this specific test), the efficiency benefits are apparently derived from the approach more than anything else. With some changes to a core tango.io module, similar performance multipliers could presumably be exhibited on linux platforms also. That is: tango.io is relatively sedate on linux, compared to its Win32 variation.

FWIW: if some of those "Language Shootout" tests are IO-bound, perhaps tango.io might help? Can't imagine they'd apply that as a "language" test, but stranger things have happened before.


Here's the tango.io client (same as last week):

-------------
import tango.io.Console;

void main()
{
  char[] content;

  while (Cin.nextLine (content, true))
         Cout (content);
}
------------


and here's the fastest phobos equivalent. Removing the setvbuf() code makes it consume around twice as much time on Win32. Note that this version is faster than the equivalent code posted last week, though obviously more specialized and verbose:

------------
import std.stdio;
import std.cstream;

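void main() {
    // Line buffer; the return value of readln() is used as the line length.
    char[] buf = new char[1000];
    size_t len;
    const size_t BUFSIZE = 2 * 1024;

    // Fully buffer stdin and stdout (the setvbuf() tweak discussed above).
    setvbuf(stdin, null, _IOFBF, BUFSIZE);
    setvbuf(stdout, null, _IOFBF, BUFSIZE);

    while ((len = readln(buf)) != 0) {
        // Leave room for the terminating 0 that fputs() requires.
        assert(len < 1000);
        buf[len] = '\0';
        fputs(buf.ptr, stdout);
    }
}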
------------
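
For comparison, here is a minimal C sketch of the same setvbuf()/fputs() idea. It is not the C version that was timed above; the helper names (use_full_buffering, copy_lines) and the LINE_CAP size are assumptions made for illustration, and BUFSIZE simply mirrors the value in the phobos client.

```c
#include <assert.h>
#include <stdio.h>

enum { BUFSIZE = 2 * 1024, LINE_CAP = 1000 };

/* Hypothetical helper: request full buffering on a stream.
   Per the C standard this must happen before any other I/O on it.
   Returns 0 on success, nonzero on failure (as setvbuf does). */
int use_full_buffering(FILE *stream)
{
    assert(stream != NULL);
    return setvbuf(stream, NULL, _IOFBF, BUFSIZE);
}

/* Hypothetical helper: copy `in` to `out` one fgets() chunk at a time.
   fgets() already 0-terminates, so fputs() can be used directly.
   Returns the number of chunks copied, or -1 on error. */
long copy_lines(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    char line[LINE_CAP];
    long chunks = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (fputs(line, out) == EOF)
            return -1;
        ++chunks;
    }
    return ferror(in) ? -1 : chunks;
}
```

A cat-style main() would call use_full_buffering(stdin) and use_full_buffering(stdout) first, then copy_lines(stdin, stdout). Lines longer than LINE_CAP - 1 characters are copied in several chunks rather than rejected.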


[1] Timing measurements can be supplied to those interested.

[2] The recent changes within phobos apparently stemmed from Andrei piping large text files through his code, and this "benchmark" is a reflection of that process.

[3] That ~20% optimization has been removed from the generic package at this time, since we feel it doesn't contribute very much to the overall IO picture. It can be restored if people find that necessary, and there is no change to client code.
March 29, 2007
kris wrote:
> Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]
> 
> As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.
[snip]

On my machine, Tango does 4.3 seconds and the following phobos program (with Walter's readln) does 5.4 seconds:

#!/usr/bin/env rundmd
import std.stdio;

void main() {
  char[] line;
  while (readln(line)) {
    write(line);
  }
}

where write is a function that isn't yet in phobos, of the following implementation:

size_t write(char[] s) {
  return fwrite(s.ptr, 1, s.length, stdout);
}

Also, the Tango version has a bug. Running Tango's cat without any pipes does not read lines from the console and output them one by one, as it should; instead, it reads many lines and buffers them internally, echoing them only after the user has pressed end-of-file (^D on Linux), or possibly after the user has entered a large amount of data (I didn't have the patience). The system cat program and the phobos implementation correctly process each line as it was entered.

This bug should be fixed for the programs to be comparable. After that, it would help to give numbers comparing all of tango, phobos, and cat, with the perl baseline.


Andrei
March 29, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
> 
>> Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]
>>
>> As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.
> 
> [snip]
> 
> On my machine, Tango does 4.3 seconds and the following phobos program (with Walter's readln) does 5.4 seconds:

On Win32, the difference is very much larger. As noted before, several times faster. Those benefits will likely translate to linux going forward.

> 
> #!/usr/bin/env rundmd
> import std.stdio;
> 
> void main() {
>   char[] line;
>   while (readln(line)) {
>     write(line);
>   }
> }
> 
> where write is a function that isn't yet in phobos, of the following implementation:
> 
> size_t write(char[] s) {
>   return fwrite(s.ptr, 1, s.length, stdout);
> }

Wondered where that had gone


> 
> Also, the Tango version has a bug. Running Tango's cat without any pipes does not read lines from the console and output them one by one, as it should; instead, it reads many lines and buffers them internally, echoing them only after the user has pressed end-of-file (^D on Linux), or possibly after the user has entered a large amount of data (I didn't have the patience). The system cat program and the phobos implementation correctly process each line as it was entered.

If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour, and (b) there has been no ticket issued for it

Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead

March 29, 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]
>>>
>>> As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.
>>
>> [snip]
>>
>> On my machine, Tango does 4.3 seconds and the following phobos program (with Walter's readln) does 5.4 seconds:
> 
> On Win32, the difference is very much larger. As noted before, several times faster. Those benefits will likely translate to linux going forward.

If I understand things correctly, it looks like the hope is to derive more speed from further dropping phobos and C I/O compatibility, a path that I personally don't consider attractive.

Also, the fact that the tango version is "more than twice as efficient as the fastest C version identified" suggests a problem with the testing method or with the C code. Are they comparable? If you genuinely have a method to push bits through two times faster than the fastest C can do, you may as well go ahead and patent it. Your method would speed up many programs, since many use C's I/O and are I/O bound. It's huge news. I'm not even kidding. But I doubt that that's the case.

>> Also, the Tango version has a bug. Running Tango's cat without any pipes does not read lines from the console and output them one by one, as it should; instead, it reads many lines and buffers them internally, echoing them only after the user has pressed end-of-file (^D on Linux), or possibly after the user has entered a large amount of data (I didn't have the patience). The system cat program and the phobos implementation correctly process each line as it was entered.
> 
> If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour

What is absolutely clear is that the current version has a bug. It can't read a line from the user and write it back. There cannot be any question that that's a problem.

>, and (b) there has been no ticket issued for it
> 
> Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead

I was actually pointing out a larger issue: incompatibility with phobos' I/O and C I/O. Tango's version is now faster (thank God we got past the \n issue and bummer it's not the default parameter of nextLine) but it is incompatible with both phobos' and C's stdio. (It's possible that the extra speed is derived from skipping C's stdio and using read and write directly.) Probably you could reimplement phobos and bundle it with Tango to give the users the option to link phobos code with Tango code properly, but still C stdio compatibility is lost, and phobos code has access to it.
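
To make the parenthetical above concrete: a minimal sketch, assuming a POSIX system, of a copy loop that uses read() and write() directly and never touches C's stdio buffers. This is not Tango's implementation (which is not shown in this thread); the helper name copy_fd and the 64 KB buffer size are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>   /* read(), write(), ssize_t (POSIX) */

/* Hypothetical helper: copy everything from descriptor `in` to
   descriptor `out` with read(2)/write(2), bypassing stdio entirely.
   Returns the total number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    assert(in >= 0 && out >= 0);
    char buf[64 * 1024];
    long total = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {               /* write() may accept only part */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Because the data never passes through a FILE buffer, output produced this way is not ordered with respect to printf() or fputs() calls made elsewhere in the same process unless those streams are flushed first, which is the interleaving concern raised here.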


Andrei
March 29, 2007
kris wrote:
> On Win32, the difference is very much larger. As noted before, several times faster.

I suspect that much of the slowness difference is from using C's fputs, along with the need to append a 0 to use fputs.

std.stdio.readln will also automatically convert to char[] if the stream is in wide character mode (as will all the phobos stdio functions). This test is inlined and fast under Windows, but is a function call under Linux which will hurt performance significantly.

> If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour, and (b) there has been no ticket issued for it
> 
> Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead

Using isatty() to switch between line and block buffered I/O access is routine when using C's stdio, and in fact is relied upon in DMC's internal implementation of buffering. It's been this way for 25 years, every C stdio implementation I've heard of uses it, and I've never heard a complaint about it.
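
As an illustration of the isatty() switch described above, here is a minimal sketch assuming a POSIX system (on Win32 the corresponding call is _isatty() from <io.h>). It is not DMC's internal code; the helper name pick_buffering is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>   /* isatty() (POSIX); fileno() is declared in <stdio.h> */

/* Hypothetical helper: choose a buffering mode for `stream`.
   A terminal gets line buffering so each line appears as it is written;
   a file or pipe gets full (block) buffering for throughput.
   Call it before any other I/O on the stream.
   Returns the mode chosen (_IOLBF or _IOFBF), or -1 if setvbuf fails. */
int pick_buffering(FILE *stream)
{
    assert(stream != NULL);
    int mode = isatty(fileno(stream)) ? _IOLBF : _IOFBF;
    if (setvbuf(stream, NULL, mode, BUFSIZ) != 0)
        return -1;
    return mode;
}
```

Calling pick_buffering(stdout) at the top of main() gives line-at-a-time echo when run interactively and block buffering when the output is redirected.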
March 29, 2007
Andrei Alexandrescu (See Website For Email) wrote:
> kris wrote:
> 
>> Andrei Alexandrescu (See Website For Email) wrote:
>>
>>> kris wrote:
>>>
>>>> Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]
>>>>
>>>> As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.
>>>
>>>
>>> [snip]
>>>
>>> On my machine, Tango does 4.3 seconds and the following phobos program (with Walter's readln) does 5.4 seconds:
>>
>>
>> On Win32, the difference is very much larger. As noted before, several times faster. Those benefits will likely translate to linux going forward.
> 
> 
> If I understand things correctly, it looks like the hope is to derive more speed from further dropping phobos and C I/O compatibility, a path that I personally don't consider attractive.

Nope. That's not the case at all. The expectation (or 'hope', if you like) is that we can make the linux version operate more like the Win32 version

> 
> Also, the fact that the tango version is "more than twice as efficient as the fastest C version identified" suggests a problem with the testing method or with the C code. Are they comparable? If you genuinely have a method to push bits through two times faster than the fastest C can do, you may as well go ahead and patent it. Your method would speed up many programs, since many use C's I/O and are I/O bound. It's huge news.

That's good for D then?

There's no reason why C could not take the same approach; yet, one might imagine, the IO strategies exposed and the wide variety of special cases may 'discourage' the implementation of a more efficient approach? That's just pure speculation on my part, and I'm quite positive the C version could be sped up notably if one reimplemented a bunch of things.

> I'm not even kidding. But I doubt that that's the case.

You're most welcome to your doubts, Andrei. However, just because "C does it that way" doesn't mean it is, or ever was, the "best" approach


> 
>>> Also, the Tango version has a bug. Running Tango's cat without any pipes does not read lines from the console and output them one by one, as it should; instead, it reads many lines and buffers them internally, echoing them only after the user has pressed end-of-file (^D on Linux), or possibly after the user has entered a large amount of data (I didn't have the patience). The system cat program and the phobos implementation correctly process each line as it was entered.
>>
>>
>> If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour
> 
> 
> What is absolutely clear is that the current version has a bug. It can't read a line from the user and write it back. There cannot be any question that that's a problem.

Only with the way that you've written your program. In the general case, that is not true at all. But please do submit that bug-report :)


> 
>> , and (b) there has been no ticket issued for it
>>
>> Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead
> 
> 
> I was actually pointing out a larger issue: incompatibility with phobos' I/O and C I/O. Tango's version is now faster (thank God we got past the \n issue and bummer it's not the default parameter of nextLine) but it is incompatible with both phobos' and C's stdio. (It's possible that the extra speed is derived from skipping C's stdio and using read and write directly.) Probably you could reimplement phobos and bundle it with Tango to give the users the option to link phobos code with Tango code properly, but still C stdio compatibility is lost, and phobos code has access to it.

The issue you raise here is that of interleaved and shared access to global entities, such as the console, where some incompatibility between tango.io and C IO is exhibited.

If you really dig into it, you'll perhaps conclude that (a) the number of real-world scenarios where this would truly become an issue is vanishingly small, and (b) the vast (certainly on Win32) performance improvement is worth that tradeoff. Even then, it is certainly possible to intercept C IO functions and route them to tango.io equivalents instead.

It has been said before, but is probably worth repeating:

- Tango is not a phobos clone. Nor is it explicitly designed to be compatible with phobos; sometimes it is worthwhile taking a different approach. Turns out that phobos can be run alongside tango in many situations.

- Tango is for D programmers; not C programmers.

- Tango, as a rule, is intended to be flexible, modular, efficient and practical. The goal is to provide D with an exceptional library, and we reserve the right to break a few eggs along the way ;)
March 29, 2007
Walter Bright wrote:
> kris wrote:
> 
>> On Win32, the difference is very much larger. As noted before, several times faster.
> 
> 
> I suspect that much of the slowness difference is from using C's fputs, along with the need to append a 0 to use fputs.

Okay. Oh, seemingly dout.write() has some io-synch problems when used in this manner?


> std.stdio.readln will also automatically convert to char[] if the stream is in wide character mode (as will all the phobos stdio functions). This test is inlined and fast under Windows, but is a function call under Linux which will hurt performance significantly.

Well, phobos is running as fast as perl under linux, so perhaps it doesn't seem to be much of an issue there? Under Win32, tango.io seems to leave everything else in the dust.

Seems kinda obvious why that is, when you look at what the "benchmark" is really testing? To me, it's likely spending most of its time constructing each line, so it's really not an IO test per se? Tango takes an alternate approach to such tasks, which would explain why it is so fast under Win32. What surprises us is that tango.io is almost sedate on linux by comparison. I can't explain that right now, but suspect it may have something to do with file locks, or something :)


>> If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour, and (b) there has been no ticket issued for it
>>
>> Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead
> 
> 
> Using isatty() to switch between line and block buffered I/O access is routine when using C's stdio, and in fact is relied upon in DMC's internal implementation of buffering. It's been this way for 25 years, every C stdio implementation I've heard of uses it, and I've never heard a complaint about it.

That's useful input ... thanks. It was noted that a program used both as a console process and a child process might behave differently, since flush would be automatic on the console yet not always so for the child (with redirected handles)?
March 29, 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:

> - Tango is for D programmers; not C programmers.

D programmers sometimes like to call 3rd party code written in other languages, and pretty much any interop in D has to happen via C compatibility.  E.g. pyD.  So I'm guessing if my D code calls on some Python code that prints to the console that somewhere down the line that eventually ends up on C's stdout.  I could be wrong, but at least that's why I *think* Andrei and Walter keep saying that C compatibility is important.

Andrei -- by "compatibility" does that mean if I rebind stdin/stdout to something different that both D and C's output go to the new place?  Or is it still necessary to rebind them individually?  I did this once for some legacy code in C++, and found that I had to rebind 3 things: the C streams, the C++ old-style streams from <iostream.h> (was under MSVC 6), and the new-style C++ streams from <iostream>.  And then I had to do the interleaving myself, which didn't really work (because all the streams were just writing to output buffers individually).  If what you're talking about with compatibility would avoid that kind of mess, that would certainly be a good thing.

--bb
March 29, 2007
kris wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> kris wrote:
>>
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>
>>>> kris wrote:
>>>>
>>>>> Last week there was a series of posts regarding some optimized code within phobos streams. A question posed was: without those same optimizations, would tango.io be slower than the improved phobos? [1]
>>>>>
>>>>> As these new phobos IO functions are now available, Andrei's "benchmark" [2] was run on both Win32 and linux to see where tango.io could use some improvement.
>>>>
>>>>
>>>> [snip]
>>>>
>>>> On my machine, Tango does 4.3 seconds and the following phobos program (with Walter's readln) does 5.4 seconds:
>>>
>>>
>>> On Win32, the difference is very much larger. As noted before, several times faster. Those benefits will likely translate to linux going forward.
>>
>>
>> If I understand things correctly, it looks like the hope is to derive more speed from further dropping phobos and C I/O compatibility, a path that I personally don't consider attractive.
> 
> Nope. That's not the case at all. The expectation (or 'hope', if you like) is that we can make the linux version operate more like the Win32 version
> 
>>
>> Also, the fact that the tango version is "more than twice as efficient as the fastest C version identified" suggests a problem with the testing method or with the C code. Are they comparable? If you genuinely have a method to push bits through two times faster than the fastest C can do, you may as well go ahead and patent it. Your method would speed up many programs, since many use C's I/O and are I/O bound. It's huge news.
> 
> That's good for D then?
> 
> There's no reason why C could not take the same approach; yet, one might imagine, the IO strategies exposed and the wide variety of special cases may 'discourage' the implementation of a more efficient approach? That's just pure speculation on my part, and I'm quite positive the C version could be sped up notably if one reimplemented a bunch of things.
> 
>> I'm not even kidding. But I doubt that that's the case.
> 
> You're most welcome to your doubts, Andrei. However, just because "C does it that way" doesn't mean it is, or ever was, the "best" approach

I think we're not on the same page here. What I'm saying is that, unless you cut a deal with Microsoft to provide you with a secret D I/O API that nobody knows about, all fast APIs in existence come with a C interface. It's very hard to contest that. Probably you are referring to the C stdio, and I'm in agreement with that. Of course there's a variety of means to be faster than stdio on any given platform, at various compatibility costs. It's known how to do that. "Hot water has been invented."

>>>> Also, the Tango version has a bug. Running Tango's cat without any pipes does not read lines from the console and output them one by one, as it should; instead, it reads many lines and buffers them internally, echoing them only after the user has pressed end-of-file (^D on Linux), or possibly after the user has entered a large amount of data (I didn't have the patience). The system cat program and the phobos implementation correctly process each line as it was entered.
>>>
>>>
>>> If you mean something that you've written, that could presumably be rectified by adding the isatty() test Walter had mentioned before. That has not been added to tango.io since (a) it would likely make programs behave differently depending on whether they were redirected or not. It's not yet clear whether that is an appropriate specialization, as default behaviour
>>
>>
>> What is absolutely clear is that the current version has a bug. It can't read a line from the user and write it back. There cannot be any question that that's a problem.
> 
> Only with the way that you've written your program. In the general case, that is not true at all. But please do submit that bug-report :)

This is the fourth time we need to discuss this. Why do I need to _argue_ that this is a bug, I don't understand.

Let me spell it again: Cin.nextLine is incorrect. It cannot be used (without possibly some extra incantations I don't know about) to implement a program that does this:

$ ./test.d
Please enter your name: Moe
Hello, Moe!
$ _
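
For reference, the session above can be sketched with C's stdio as follows. This only illustrates the expected interactive behaviour and says nothing about how Cin.nextLine works internally; the helper name greet and the 128-byte name buffer are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prompt on `out`, read one line from `in`,
   and greet the user. Returns 0 on success, -1 if no line was read. */
int greet(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    char name[128];

    fputs("Please enter your name: ", out);
    fflush(out);                        /* show the prompt before blocking on input */

    if (fgets(name, sizeof name, in) == NULL)
        return -1;
    name[strcspn(name, "\r\n")] = '\0'; /* strip the line terminator */

    fprintf(out, "Hello, %s!\n", name);
    return 0;
}
```

The point is that the read returns as soon as one line is available, so the greeting can be printed without waiting for end-of-file.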

I don't have an account on the Tango site, and in a fraction of the time it would take me to create one, you can submit the bug report.

>>> , and (b) there has been no ticket issued for it
>>>
>>> Again, please submit a ticket so we don't forget about that detail. We'd be interested to hear if folk think the "isatty() test" should be default behaviour, or would perhaps lead to corner-case issues instead
>>
>>
>> I was actually pointing out a larger issue: incompatibility with phobos' I/O and C I/O. Tango's version is now faster (thank God we got past the \n issue and bummer it's not the default parameter of nextLine) but it is incompatible with both phobos' and C's stdio. (It's possible that the extra speed is derived from skipping C's stdio and using read and write directly.) Probably you could reimplement phobos and bundle it with Tango to give the users the option to link phobos code with Tango code properly, but still C stdio compatibility is lost, and phobos code has access to it.
> 
> The issue you raise here is that of interleaved and shared access to global entities, such as the console, where some incompatibility between tango.io and C IO is exhibited.
> 
> If you really dig into it, you'll perhaps conclude that (a) the number of real-world scenarios where this would truly become an issue is vanishingly small, and (b) the vast (certainly on Win32) performance improvement is worth that tradeoff. Even then, it is certainly possible to intercept C IO functions and route them to tango.io equivalents instead.

What Win32 primitives does tango use?

> It has been said before, but is probably worth repeating:
> 
> - Tango is not a phobos clone. Nor is it explicitly designed to be compatible with phobos; sometimes it is worthwhile taking a different approach. Turns out that phobos can be run alongside tango in many situations.
> 
> - Tango is for D programmers; not C programmers.
> 
> - Tango, as a rule, is intended to be flexible, modular, efficient and practical. The goal is to provide D with an exceptional library, and we reserve the right to break a few eggs along the way ;)

Sounds great.


Andrei
March 29, 2007
> It has been said before, but is probably worth repeating:
> 
> - Tango is not a phobos clone. Nor is it explicitly designed to be compatible with phobos; sometimes it is worthwhile taking a different approach. Turns out that phobos can be run alongside tango in many situations.
> 
> - Tango is for D programmers; not C programmers.
> 
> - Tango, as a rule, is intended to be flexible, modular, efficient and practical. The goal is to provide D with an exceptional library, and we reserve the right to break a few eggs along the way ;)


Totally agree!