May 08, 2014
On 5/8/2014 12:22 PM, Nick Sabalausky wrote:
> On 5/8/2014 2:46 PM, Walter Bright wrote:
>>
>> But to make a lazy version from an eager one means reimplementing it.
>>
>
> Or yield()-ing inside the eager one's sink.

The data is still supplied to it.

May 08, 2014
On Thursday, 8 May 2014 at 18:21:32 UTC, H. S. Teoh via
Digitalmars-d wrote:
> I've thought about input ranges vs. output ranges for a bit.  I think it
> doesn't make sense for functions that process data to take an output
> range: output ranges are data sinks, and should only be used for the
> endpoint of a data processing pipeline. Since the string function
> doesn't know whether or not it's the last in a pipeline (only the
> calling code can know this), it should return an input range. If the
> user code wants to put the result into an output range, then it should
> simply use std.algorithm.copy.

I agree with H. S. Teoh. Indeed, I was thinking of trying to
create an alternative version of std.format which returned an
InputRange, instead of taking an OutputRange. The benefit of this
becomes obvious when you want to use std.format() in your own
ranges, which currently impairs laziness, pipelining, avoiding
memory allocations, etc.
May 08, 2014
On 5/8/14, 11:27 AM, Walter Bright wrote:
> On 5/8/2014 10:46 AM, Andrei Alexandrescu wrote:
>> A discussion is building around
>> https://github.com/D-Programming-Language/phobos/pull/2149, which is a
>> nice
>> initiative by Walter to allow Phobos users to avoid or control memory
>> allocation.
>
> The setExtension() function is itself not very important, but what is
> important is an example for how to put together ranges.
>
> Some design goals:
>
> 1. purity, @safe, nothrow, @nogc
> 2. composability
> 3. have them work in a consistent way, so there's less for a user to learn

4. Rule of least surprise

Some may be surprised that auto c = setExtension(a, b) doesn't actually just do it. So changing a and/or b before using c would yield surprising result for someone coming from a background of eager-string languages.


Andrei

May 08, 2014
On 5/8/2014 2:20 PM, Andrei Alexandrescu wrote:
> Some may be surprised that auto c = setExtension(a, b) doesn't actually just do
> it. So changing a and/or b before using c would yield surprising result for
> someone coming from a background of eager-string languages.

It's true that when I first encountered C#'s LINQ, I was surprised that it was lazy.

It's also true that most of std.algorithm is lazy. Apart from coming up with a new naming convention (and renaming algorithms in Phobos), I don't see any obvious solution to what's lazy and what's not.

One possibility is to informally (i.e. in the documentation rather than the core language spec) call something an 'algorithm' if it is lazy and 'function' if it is eager.

May 08, 2014
On 5/8/14, 1:41 PM, "Luís Marques" <luis@luismarques.eu>" wrote:
> On Thursday, 8 May 2014 at 18:21:32 UTC, H. S. Teoh via
> Digitalmars-d wrote:
>> I've thought about input ranges vs. output ranges for a bit.  I think it
>> doesn't make sense for functions that process data to take an output
>> range: output ranges are data sinks, and should only be used for the
>> endpoint of a data processing pipeline. Since the string function
>> doesn't know whether or not it's the last in a pipeline (only the
>> calling code can know this), it should return an input range. If the
>> user code wants to put the result into an output range, then it should
>> simply use std.algorithm.copy.
>
> I agree with H. S. Teoh. Indeed, I was thinking of trying to
> create an alternative version of std.format which returned an
> InputRange, instead of taking an OutputRange. The benefit of this
> becomes obvious when you want to use std.format() in your own
> ranges, which currently impairs laziness, pipelining, avoiding
> memory allocations, etc.

Interesting. So then the range returned by format() will save everything passed to it, which means...

int fun(int[] a)
{
   auto before = format("Before: %s", a);
   foreach (ref e; a) ++e;
   auto after = format("After: %s", a);
   writeln(before, "\n--->\n", after);
}

*ouch*

Andrei

May 08, 2014
On Thu, May 08, 2014 at 02:38:18PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> On 5/8/14, 1:41 PM, "Luís Marques" <luis@luismarques.eu>" wrote:
> >On Thursday, 8 May 2014 at 18:21:32 UTC, H. S. Teoh via Digitalmars-d wrote:
> >>I've thought about input ranges vs. output ranges for a bit.  I think it doesn't make sense for functions that process data to take an output range: output ranges are data sinks, and should only be used for the endpoint of a data processing pipeline. Since the string function doesn't know whether or not it's the last in a pipeline (only the calling code can know this), it should return an input range. If the user code wants to put the result into an output range, then it should simply use std.algorithm.copy.
> >
> >I agree with H. S. Teoh. Indeed, I was thinking of trying to create an alternative version of std.format which returned an InputRange, instead of taking an OutputRange. The benefit of this becomes obvious when you want to use std.format() in your own ranges, which currently impairs laziness, pipelining, avoiding memory allocations, etc.
> 
> Interesting. So then the range returned by format() will save
> everything passed to it, which means...
> 
> int fun(int[] a)
> {
>    auto before = format("Before: %s", a);
>    foreach (ref e; a) ++e;
>    auto after = format("After: %s", a);
>    writeln(before, "\n--->\n", after);
> }
> 
> *ouch*
[...]

Yeah, I'm not so sure it's a good idea for std.format to be lazy. But there are valid use cases for both, so now I'm wondering if we should adopt Walter's idea of using a naming convention to distinguish between algorithms (lazy) and functions (eager).


T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan
May 08, 2014
On Thu, May 08, 2014 at 02:54:32PM -0700, H. S. Teoh via Digitalmars-d wrote:
> On Thu, May 08, 2014 at 02:38:18PM -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> > On 5/8/14, 1:41 PM, "Luís Marques" <luis@luismarques.eu>" wrote:
[...]
> > >I agree with H. S. Teoh. Indeed, I was thinking of trying to create an alternative version of std.format which returned an InputRange, instead of taking an OutputRange. The benefit of this becomes obvious when you want to use std.format() in your own ranges, which currently impairs laziness, pipelining, avoiding memory allocations, etc.
> > 
> > Interesting. So then the range returned by format() will save
> > everything passed to it, which means...
> > 
> > int fun(int[] a)
> > {
> >    auto before = format("Before: %s", a);
> >    foreach (ref e; a) ++e;
> >    auto after = format("After: %s", a);
> >    writeln(before, "\n--->\n", after);
> > }
> > 
> > *ouch*
> [...]
> 
> Yeah, I'm not so sure it's a good idea for std.format to be lazy. But there are valid use cases for both, so now I'm wondering if we should adopt Walter's idea of using a naming convention to distinguish between algorithms (lazy) and functions (eager).
[...]

On second thoughts, perhaps std.format *should* be lazy, or at least, the core implementation should be lazy, and std.format should be an eager wrapper around it (using std.algorithm.copy to copy the result of the underlying formatting engine to the output range).

The problem is, the current design of std.format is heavily dependent on output ranges (e.g., user-defined types that define a toString that takes a delegate as an output range), and changing this now is going to cause large-scale breakage of existing code. I don't see any clean way of continuing to support toString(delegate) without introducing yet more hidden allocations (to buffer the output of toString, for example), if we switch std.format to a lazy implementation.

The only way I can think of to do this is to have language-level support for yield(), then you can hook things up this way:

	module std.format;
	auto format(A...)(string fmt, A args) {
		struct Result {
			...
			@property auto front() {
				// assume typeof(args[i]) = user-defined type
				args[i].toString((const(char)[] data) {
					// This switches to a different
					// fiber than the one running
					// toString.
					yield(data);
				});
				...
			}
			...
		}
		return Result(...);
	}

	module usercode;
	struct T {
		void toString(scope void delegate(const(char)[]) sink)
		{
			...
			// Each of these calls will be yielded by the
			// sink, so they will "block" until the outer
			// caller has processes the new data.
			sink("abc");
			sink("def");
			...
		}
	}

This requires heavy-duty yield() support in the language, though. I don't know how likely such a thing will make it into the language...


T

-- 
Winners never quit, quitters never win. But those who never quit AND never win are idiots.
May 08, 2014
On Thursday, 8 May 2014 at 21:38:12 UTC, Andrei Alexandrescu
wrote:
> int fun(int[] a)
> {
>    auto before = format("Before: %s", a);
>    foreach (ref e; a) ++e;
>    auto after = format("After: %s", a);
>    writeln(before, "\n--->\n", after);
> }
>
> *ouch*

You might think that's bad, but there's nothing in that example
that is specific to formatting; that kind of behavior can happen
with any lazy range taking input by ref. (I guess that's one
drawback of providing functional programming without
immutability?) To be clear, I'm not saying that the idea was to
make the existing format function lazy, break existing code and
surprise people with that change in semantics. format() can be an
eager wrapper for the new lazy functionality. And
eagerness/laziness should be part of the specification IMO, even
more than asymptotic complexity should be part of the STL spec,
since (bug > performance bug).
May 08, 2014
On 5/8/14, 10:46 AM, Andrei Alexandrescu wrote:
[snip]

Here's a list of algorithms in std.algorithm that would need to be looked at if we want to handle ranges of char and wchar properly (at least versions with predicates). When I say "looked at" I mean "modified appropriately to handle ranges of char and wchar in addition to strings".

========================

Searching: all  any  canFind  count  countUntil  commonPrefix  endsWith  find  findAdjacent  findAmong  findSkip  findSplit  findSplitAfter findSplitBefore  minCount  minPos  mismatch  skipOver  startsWith  until

Comparison: cmp  equal  levenshteinDistance  levenshteinDistanceAndPath  mismatch

Iteration: filter  filterBidirectional  group  map  reduce  splitter  uniq

Sorting: completeSort  isPartitioned  isSorted  makeIndex nextPermutation  nextEvenPermutation  partialSort  partition  partition3  schwartzSort  sort  topN  topNCopy

Set operations: (probably none, because set ops are rare on strings)

Mutation: bringToFront  copy  fill  initializeAll  moveAll  moveSome remove  reverse  strip  stripLeft  stripRight  swapRanges  uninitializedFill

========================

Of these, only a fraction (maybe 60-70%) are actually used with strings. Still that leaves a significant amount of work to do if we want std.algorithm to work smoothly with ranges of char/wchar. Here "smoothly" means "if it compiles and runs it won't do something UTF-goofy".



Andrei

May 08, 2014
On 5/8/14, 3:33 PM, Andrei Alexandrescu wrote:
> Of these, only a fraction (maybe 60-70%) are actually used with strings.
> Still that leaves a significant amount of work to do if we want
> std.algorithm to work smoothly with ranges of char/wchar. Here
> "smoothly" means "if it compiles and runs it won't do something UTF-goofy".

The larger point here (I forgot) is that user code that's written in the style of std.algorithm would need similar adjustments in order to work with ranges of char/wchar.

Andrie