August 01, 2013
On 7/31/13 4:17 PM, bearophile wrote:
> Walter Bright:
>
>> Ironically, the component program from the article I wrote:
>> ...
>> is 2x faster than the Haskell version:
>
> Benchmarking code written in two different languages is tricky, there
> are so many sources of mistakes, even if you know well both languages.

I measured the timings. It was in a discussion between Walter, myself, and a common friend. That friend said Haskell will do a lot better on the same task. I measured by piping 1,000,000 lines of real log data through the tested programs.

His first version was:

import Data.List main = interact $ unlines . sort . lines

This took 51 seconds. The friend got a bit miffed complaining I only measured to ridicule his code (but this is hardly the first time hard numbers offended someone) and went to ask on Haskell fora on how to make the code faster. His second version was:

import Data.List import qualified Data.ByteString.Lazy.Char8 as L
main = L.interact $ L.unlines . sort . L.lines

This version took 7 seconds. A debug version of the D code took 3 seconds.

> But I accept your timing.

This is most gracious considering the crass statement with which you opened.

> And I say that it's good :-) We should aim to
> be better than the Intel Labs Haskell Research Compiler (HRC) :-)

Is that a lot better than ghc?


Andrei
August 01, 2013
On Wed, Jul 31, 2013 at 11:52:35PM +0000, Justin Whear wrote:
> On Thu, 01 Aug 2013 00:23:52 +0200, bearophile wrote:
> > 
> > The situation should be improved for D/dmd/Phobos, otherwise such D component programming remains partially a dream, or a toy.
> > 
> > Bye,
> > bearophile
> 
> I disagree with your "toy" assessment.  I've been using this chaining, component style for a while now and have really enjoyed the clarity it's brought to my code.  I hadn't realized how bug-prone non-trivial loops tend to be until I started writing this way and avoided them entirely.
[...]

One of the more influential courses I took in college was on Jackson
Structured Programming. It identified two sources of programming
complexity (i.e., where bugs are most likely to occur): (1) mismatches
between the structure of the program and the structure of the data
(e.g., you're reading an input file that has a preamble, body, and
epilogue, but your code has a single loop over lines in the file); (2)
writing loop invariants (or equivalently, loop conditions).

Most non-trivial loops in imperative code have both, which makes them doubly prone to bugs. In the example I gave above, the mismatch between the code structure (a single loop) and the file structure (three sequential sections) often prompts people to add boolean flags, state variables, and the like, in order to resolve the conflict between the two structures. Such ad hoc structure resolutions are a breeding ground for bugs, and often lead to complicated loop conditions, which invite even more bugs.

In contrast, if you structure your code according to the structure of the input (i.e., one loop for processing the preamble, one loop for processing the body, one loop for processing the epilogue), it becomes considerably less complex, easier to read (and write!), and far less bug prone. Your loop conditions become simpler, and thus easier to reason about and leave less room for bugs to hide.

But to be able to process the input in this way requires that you encapsulate your input so that it can be processed by 3 different loops. Once you go down that road, you start to arrive at the concept of input ranges... then you abstract away the three loops into three components, and behold, component style programming!

In fact, with component style programming, you can also address another aspect of (1): when you need to simultaneously process two data structures whose structures don't match. For example, if you want to lay out a yearly calendar using writeln, the month/day cells must be output in a radically different order than the logical foreach(m;1..12) { foreach(day;1..31) } structure). Writing this code in the traditional imperative style produces a mass of spaghettii code: either you have bizarre loops with convoluted loop conditions for generating the dates in the order you want to print them, or you have to fill out some kind of grid structure in a complicated order so that you can generate the dates in order.

Using ranges, though, this becomes considerably more tractable: you can have an input range of dates in chronological order, two output ranges corresponding to chunking by week / month, which feed into a third output range that buffers the generated cells and prints them once enough has been generated to fill a row of output. By separating out these non-corresponding structures into separate components, you greatly simplify the code within each component and thus reduce the number of bugs (e.g. it's far easier to ensure you never put more than 7 days in a week, since the weekly output range is all in one place, as opposed to sprinkled everywhere across multiple nested loops in the imperative style calendar code). The code that glues these components together is also separated out and becomes easier to understand and debug: you simply read from the input range of dates, write to the two output ranges, and check if they are full (this isn't part of the range API but added so for this particular example); if the weekly range is full, start a new week; if the monthly range is full, start a new month. Then the final output range takes care of when to actually produce output -- you just write stuff to it and don't worry about it in the glue code.

OK, this isn't really a good example of the linear pipeline style code we're talking about, but it does show how using ranges as components can untangle very complicated code into simple, tractable parts that are readable and easy to debug.


T

-- 
If you compete with slaves, you become a slave. -- Norbert Wiener
August 01, 2013
On 7/31/2013 5:46 PM, H. S. Teoh wrote:
> [...]

Thank you for an excellent and concise summary of what component programming is all about!
August 01, 2013
Walter Bright:

> Speed is only one measure of utility.

I agree, I program often in Python, and it can be very useful, despite being sometimes not fast at all.

But as Haskell folks sometimes say, a modern language should try to allow a high level style of coding while still keeping a "good enough" efficiency.

------------------------

Justin Whear:

> I hadn't realized how bug-prone non-trivial loops tend to be until I started writing this way and avoided them entirely.

I agree.


> Thus far, I don't think I've rewritten anything out of the component programming style, so while probably not optimal, it's been more than good enough.

Take a look at this thread in D.learn:

http://forum.dlang.org/thread/mailman.304.1375190212.22075.digitalmars-d-learn@puremagic.com

------------------------

Andrei Alexandrescu:

> Is that a lot better than ghc?

According to this article it seems better, but I have no direct experience of it:

http://www.leafpetersen.com/leaf/publications/hs2013/haskell-gap.pdf

Bye,
bearophile
August 01, 2013
The one thing that confused me at first when I read Walter's article was that I thought he was talking about the *other* component programming, a method commonly used by game developers to avoid deep class hierarchies.

http://gameprogrammingpatterns.com/component.html
August 01, 2013
On Wednesday, 31 July 2013 at 22:23:54 UTC, bearophile wrote:
> Justin Whear:
>
>> If anything, component programming is just functional programming + templates and some nice syntactic sugar.
>> And a healthy dose of pure awesome.
>
> What D calls "component programming" is very nice and good, but in D it's almost a joke.
>
> Currently this code inlines nothing (the allocations, the difference and the product):
>
>
> import std.numeric: dotProduct;
> int main() {
>     enum N = 50;
>     auto a = new int[N];
>     auto b = new int[N];
>     auto c = new int[N];
>     c[] = a[] - b[];
>     int result = dotProduct(c, c);
>     return result;
> }
>
>
> If you write it in component-style (using doubles here):
>
>
> import std.math;
> import std.algorithm, std.range;
>
> int main() {
>     enum N = 50;
>     alias T = double;
>     auto a = new T[N];
>     auto b = new T[N];
>
>     return cast(int)zip(a, b)
>            .map!(p => (p[0] - p[1]) ^^ 2)
>            .reduce!q{a + b};
> }
>
>
> The situation gets much worse, you see many functions in the binary, that even LDC2 often not able to inline. The GHC Haskell compiler turns similar "components" code in efficient SIMD asm (that uses packed doubles, like double2), it inlines everything, merges the loops, produces a small amount of asm output, and there is no "c" intermediate array. In GHC "component programming" is mature (and Intel is developing an Haskell compiler that is even more optimizing), while in D/dmd/Phobos this stuff is just started. GHC has twenty+ years of head start on this and it shows.
>
> The situation should be improved for D/dmd/Phobos, otherwise such D component programming remains partially a dream, or a toy.
>
> Bye,
> bearophile

I was honestly thinking whether I should reply to this rant or not... Obviously I picked the first. - Component programming, as you probably know yourself already, is not about making super-fast, super-optimized applications, but about making it easy both to write the code and to understand the code, as well as making it easy to combine components (algorithms mostly) and get the result quickly, where by "quickly" I think about time I need to write the code.

If you really want a super-optimized solution you will in most cases write the piece in question in C. Well, that is at least what my experience tells me. Luckily, I do business applications most of the time, so performance is rarely an issue. CONVENIENCE is! In other words, I shamelessly admit, I only care about the time I have to spend coding in order to implement something that is of value to the business.
August 01, 2013
On Thursday, 1 August 2013 at 00:47:43 UTC, H. S. Teoh wrote:

> Most non-trivial loops in imperative code have both, which makes them
> doubly prone to bugs. In the example I gave above, the mismatch between
> the code structure (a single loop) and the file structure (three
> sequential sections) often prompts people to add boolean flags, state
> variables, and the like, in order to resolve the conflict between the
> two structures. Such ad hoc structure resolutions are a breeding ground
> for bugs, and often lead to complicated loop conditions, which invite
> even more bugs.
>
>
> T

I agree, and to be honest, loops have given me more than one headache. It's so easy to lose track of what is going on where and why. And if you have ever had the pleasure of adding to or debugging code that handles three or more different issues in one loop, then you will know how mind boggling loops can be.

Your example is very good (you should write an article about it) and similar examples occur in web development all the time (creating tables, lists etc). I once wrote an event calendar for a homepage and _partly_ disentagled the loop for simplicity's sake. I say "partly" because it is still a bit "loopy". And I guess this is what component programming is all about, disentangling code.

The only difficulty I have is the opposition to OOP. I don't really see how the two concepts are mutually exclusive. OOP can benefit from component programming and vice versa.

Component programming is a good choice for web programming where loops abound. I'm tired of the infinite loops (pardon the pun again) in JavaScript and the like. Sure there's gotta be a better way.
August 01, 2013
On Thursday, 1 August 2013 at 00:47:43 UTC, H. S. Teoh wrote:
> On Wed, Jul 31, 2013 at 11:52:35PM +0000, Justin Whear wrote:
>> On Thu, 01 Aug 2013 00:23:52 +0200, bearophile wrote:
>> > 
>> > The situation should be improved for D/dmd/Phobos, otherwise such D
>> > component programming remains partially a dream, or a toy.
>> > 
>> > Bye,
>> > bearophile
>> 
>> I disagree with your "toy" assessment.  I've been using this chaining,
>> component style for a while now and have really enjoyed the clarity
>> it's brought to my code.  I hadn't realized how bug-prone non-trivial
>> loops tend to be until I started writing this way and avoided them
>> entirely.
> [...]
>
> One of the more influential courses I took in college was on Jackson
> Structured Programming. It identified two sources of programming
> complexity (i.e., where bugs are most likely to occur): (1) mismatches
> between the structure of the program and the structure of the data
> (e.g., you're reading an input file that has a preamble, body, and
> epilogue, but your code has a single loop over lines in the file); (2)
> writing loop invariants (or equivalently, loop conditions).
>
> Most non-trivial loops in imperative code have both, which makes them
> doubly prone to bugs. In the example I gave above, the mismatch between
> the code structure (a single loop) and the file structure (three
> sequential sections) often prompts people to add boolean flags, state
> variables, and the like, in order to resolve the conflict between the
> two structures. Such ad hoc structure resolutions are a breeding ground
> for bugs, and often lead to complicated loop conditions, which invite
> even more bugs.
>
> In contrast, if you structure your code according to the structure of
> the input (i.e., one loop for processing the preamble, one loop for
> processing the body, one loop for processing the epilogue), it becomes
> considerably less complex, easier to read (and write!), and far less bug
> prone. Your loop conditions become simpler, and thus easier to reason
> about and leave less room for bugs to hide.
>
> But to be able to process the input in this way requires that you
> encapsulate your input so that it can be processed by 3 different loops.
> Once you go down that road, you start to arrive at the concept of input
> ranges... then you abstract away the three loops into three components,
> and behold, component style programming!
>
> In fact, with component style programming, you can also address another
> aspect of (1): when you need to simultaneously process two data
> structures whose structures don't match. For example, if you want to lay
> out a yearly calendar using writeln, the month/day cells must be output
> in a radically different order than the logical foreach(m;1..12) {
> foreach(day;1..31) } structure). Writing this code in the traditional
> imperative style produces a mass of spaghettii code: either you have
> bizarre loops with convoluted loop conditions for generating the dates
> in the order you want to print them, or you have to fill out some kind
> of grid structure in a complicated order so that you can generate the
> dates in order.
>
> Using ranges, though, this becomes considerably more tractable: you can
> have an input range of dates in chronological order, two output ranges
> corresponding to chunking by week / month, which feed into a third
> output range that buffers the generated cells and prints them once
> enough has been generated to fill a row of output. By separating out
> these non-corresponding structures into separate components, you greatly
> simplify the code within each component and thus reduce the number of
> bugs (e.g. it's far easier to ensure you never put more than 7 days in a
> week, since the weekly output range is all in one place, as opposed to
> sprinkled everywhere across multiple nested loops in the imperative
> style calendar code). The code that glues these components together is
> also separated out and becomes easier to understand and debug: you
> simply read from the input range of dates, write to the two output
> ranges, and check if they are full (this isn't part of the range API but
> added so for this particular example); if the weekly range is full,
> start a new week; if the monthly range is full, start a new month. Then
> the final output range takes care of when to actually produce output --
> you just write stuff to it and don't worry about it in the glue code.
>
> OK, this isn't really a good example of the linear pipeline style code
> we're talking about, but it does show how using ranges as components can
> untangle very complicated code into simple, tractable parts that are
> readable and easy to debug.
>
>
> T

Add in some code examples and that could make a nice article.
August 01, 2013
On 08/01/2013 03:40 AM, bearophile wrote:
> Take a look at this thread in D.learn:
> 
> http://forum.dlang.org/thread/mailman.304.1375190212.22075.digitalmars-d-learn@puremagic.com

Yea, this was a frustration. :-(  It was really nice to be able to write simple, clean, elegant code using D -- it was sad to discover that while this was great for a prototype, the performance gap was far too large to make it a viable long-term solution.

Most of the issues seem to centre around GC, so there might be some low-hanging fruit there for performance improvements.

August 01, 2013
On 7/31/13 6:40 PM, bearophile wrote:
> According to this article it seems better, but I have no direct
> experience of it:
>
> http://www.leafpetersen.com/leaf/publications/hs2013/haskell-gap.pdf

ERROR 404 - PAGE NOT FOUND

Andrei