Jump to page: 1 2
Thread overview
Code layout for range-intensive D code
Jun 09, 2012
bearophile
Jun 09, 2012
Denis Shelomovskij
Jun 10, 2012
Nick Sabalausky
Jun 10, 2012
Dmitry Olshansky
Jun 10, 2012
ixid
Jun 10, 2012
Dmitry Olshansky
Jun 10, 2012
Peter Alexander
Jun 10, 2012
Jonathan M Davis
Jun 11, 2012
ixid
Jun 11, 2012
Jonathan M Davis
Jun 11, 2012
ixid
Jun 11, 2012
Philippe Sigaud
Jun 11, 2012
Jonathan M Davis
Jun 11, 2012
bearophile
June 09, 2012
The introduction of UFCS in D offers new ways to format D code, especially when your code uses many high order functions. What is a good layout of the D code in such situations? I have tried several alternative layouts, and in the end I found to appreciate a layout similar to the one used in F# code. Below I show a kind of  extreme example :-)

A textual matrix of bits like this is the input of a little nonogram puzzle:

0 1 1 1 1 0
1 0 0 1 1 1
1 0 1 1 1 1
1 1 1 1 1 1
0 1 1 1 1 0


A program has to produce an output like this, in the first part of the output it looks at the columns and counts the lengths of the groups of "1", and in the second part of the output it does the same on the rows:

3
1 2
1 3
5
5
3

4
1 3
1 4
6
4


This is a possible solution program:


import std.stdio, std.algorithm, std.string, std.range, std.conv;

void main() {

    auto t = "table.txt"
             .File()
             .byLine()
             .map!(r => r.removechars("^01".dup))()
             .array();

    const transposed = t[0]
                       .length
                       .iota()
                       .map!(i => t.transversal(i).array())()
                       .array();

    (t ~ [(char[]).init] ~ transposed)
    .map!(r => r
               .group()
               .filter!(p => p[0] == '1')()
               .map!(p => p[1].text())()
               .join(" ")
         )()
    .join("\n")
    .writeln();
}


(Note: the second argument of removechars is "^01".dup because removechars is a bit stupid, it requires the same type argument on both arguments, and the 'r' given by byLine() is a char[]. Here the code performs the string->char[] conversion many times because the typical inputs for this program are small enough, otherwise it's a premature optimization.)


As you see you have to break the lines, because the processing chains often become too much long for single lines.
At first I have put the dots at the end of the lines, but later I have found that putting the dots at their start is better, it reminds me we are inside a processing chain still.
Putting a single operation on each line (instead of two or three) helps readability, allowing a bit of single-line nesting like in ".map!(i => t.transversal(i).array())()".
And putting the dots and first part aligned vertically helps the eye find what chain we are in. In the last part of the program you see a nested chain too, inside a map.

I think this code layout is similar to the one used with F# pipe operators. In F# code that layout is a kind of standard, I see it used by most F# programmers. Maybe some D programmers will want to use this kind of layout for such kind of higher-order-function-heavy code.

I have found that breaking the chains and giving variable names to the intermediate parts of those processing chains doesn't help the readability a lot, and the names for those intermediate temporary variables tend to be dull and repetitive. On the other hand putting just one processing step on each row gives space for a short comment on each row, where you thik you need it:

auto t = File("table.txt", "r")                 // comment #1
         .byLine()                              // comment #2
         .map!(r => r.removechars("^01".dup))() // comment #3
         .array();                              // comment #4


In practice I think comment #3 is the only useful here, barely.

Bye,
bearophile
June 09, 2012
09.06.2012 14:43, bearophile пишет:
> The introduction of UFCS in D offers new ways to format D code,
> especially when your code uses many high order functions.

I have to mention that one shouldn't write range-intensive D code for now. It's too risky to use high level functions in D because it can result in any kind of memory corruption because of at least "Issue 7965 - Invalid outer function scope pointer in some cases":
http://d.puremagic.com/issues/show_bug.cgi?id=7965

Not sure about other such issues but at least that one makes high level functions almost unusable because of a danger.

-- 
Денис В. Шеломовский
Denis V. Shelomovskij


June 10, 2012
"bearophile" <bearophileHUGS@lycos.com> wrote in message news:lkplokawtyisvwhvfsew@forum.dlang.org...
> The introduction of UFCS in D offers new ways to format D code, especially when your code uses many high order functions. What is a good layout of the D code in such situations? I have tried several alternative layouts, and in the end I found to appreciate a layout similar to the one used in F# code. Below I show a kind of  extreme example :-)
>
[...]
>
>     auto t = "table.txt"
>              .File()
>              .byLine()
>              .map!(r => r.removechars("^01".dup))()
>              .array();
>
>     const transposed = t[0]
>                        .length
>                        .iota()
>                        .map!(i => t.transversal(i).array())()
>                        .array();
>
>     (t ~ [(char[]).init] ~ transposed)
>     .map!(r => r
>                .group()
>                .filter!(p => p[0] == '1')()
>                .map!(p => p[1].text())()
>                .join(" ")
>          )()
>     .join("\n")
>     .writeln();
> }
>
>

That's basically what I do, except I always indent by exactly one tab.


June 10, 2012
On 09.06.2012 14:43, bearophile wrote:
[snip]
> import std.stdio, std.algorithm, std.string, std.range, std.conv;
>
> void main() {
>
> auto t = "table.txt"
> .File()
> .byLine()
> .map!(r => r.removechars("^01".dup))()
> .array();
>
> const transposed = t[0]
> .length
> .iota()

//hopefully I'm not alone in that this:
iota(t[0].length)
	.map!(...)
	...
is so much more readable.



-- 
Dmitry Olshansky
June 10, 2012
Having to append .array() all the time is rather annoying. I can't help but feel that there's a better solution than this. Are lazy Result methods really the default way of doing things? I'd rather have eager versions.

June 10, 2012
On 11.06.2012 1:49, ixid wrote:
> Having to append .array() all the time is rather annoying. I can't help
> but feel that there's a better solution than this. Are lazy Result
> methods really the default way of doing things? I'd rather have eager
> versions.
>

And then try to get lazy version back. Now that would be harder then appending .array() wouldn't it? ;)

-- 
Dmitry Olshansky
June 10, 2012
On Sunday, 10 June 2012 at 21:49:14 UTC, ixid wrote:
> Having to append .array() all the time is rather annoying. I can't help but feel that there's a better solution than this. Are lazy Result methods really the default way of doing things? I'd rather have eager versions.

Problem with eager versions is that you end up doing multiple memory allocations and passes when combining range manipulators. e.g.

r.take(n).filter!(blah).map!(whatever).array()

Only allocates once, and is only one linear pass over the memory.

If they were all eager, 'take' would create a new array, then 'filter' would pass over that and create a new array, then 'map' would pass over that and create yet another array. Not very efficient.


June 10, 2012
On Sunday, June 10, 2012 23:49:12 ixid wrote:
> Having to append .array() all the time is rather annoying. I can't help but feel that there's a better solution than this. Are lazy Result methods really the default way of doing things? I'd rather have eager versions.

Eager versions are only good if you don't want to pass the result to another function. And since Phobos (and plenty of user code) uses ranges all over the place, you'd end up allocating memory all over the place, whereas right now, you only allocate it when you explicitly call a function which allocates memory to hold the data - e.g. array. Something like

foreach(e; map!"to!string(a)"(filter!"a <= 50"(arr, 42)))
{
    doSomething(e);
    if(someCondition)
        break;
}

would end up allocating  for both the result of filter and map, when it doesn't actually need to allocate for _either_ of them. Right now, it can process them lazily, only filtering and mapping for the elements that actually get processed. By using lazy ranges, you avoid both unnecessary allocations and avoid having to process all of the elements in a range if you don't need to.

On top of all of that, if a function doesn't return a lazy range, you _can't_ use it with infinite ranges, so if range-based functions all eagerly operated on ranges, infinite ranges would be useless.

- Jonathan M Davis
June 11, 2012
Thank you. May I ask though, is the argument against automatically appending .array() when a single or chain of lazy functions are used to set a variable or set of variables just syntactic salt against accidentally doing it eagerly?


June 11, 2012
On Monday, June 11, 2012 02:11:24 ixid wrote:
> Thank you. May I ask though, is the argument against automatically appending .array() when a single or chain of lazy functions are used to set a variable or set of variables just syntactic salt against accidentally doing it eagerly?

D doesn't do much of anything like that automatically, Aside from the fact tha foreach supports the ranges, _nothing_ in the compiler supports them. They're entirely a library artifact. So, the compiler isn't going to do _anything_ to them automatically. And besides, it's not necessarily the case that you want the result of a range-based function to be an array. What if I want to keep the result of map on the stack?

auto a = map!"to!string(a)"(arr);

I could pass it to multiple functions later without having to allocate anything. Forcing it to be an array would stop that. And it would be downright nasty for that to happen when dealing with containers, because a number of containers require that you pass them the _exact_ range type that you give them (e.g. remove does this). Converting those ranges to arrays just because you assigned them to a variable would be a _big_ problem.

And as for infinite ranges again, if they automatically were converted to arrays when assigned to variables, then you couldn't have variables for them at all, because they _have_ to be eager. Take a random number generator for instance. If all ranges were automatically converted to arrays when assigned to variables, then you couldn't do

auto generator = rndGen();

You'd be stuck in an infinite loop if you tried.

Having to use std.array.array to explicitly convert a range to an array really doesn't cost you much, and trying to do it automatically would be incredibly error-prone and costly. And really, the more that you use rang-based functions, the less that you need to convert ranges to arrays.

- Jonathan M Davis
« First   ‹ Prev
1 2