View mode: basic / threaded / horizontal-split · Log in · Help
October 26, 2012
Another day in the ordeal of cartesianProduct
http://d.puremagic.com/issues/show_bug.cgi?id=8900

:-(

(The code there is called cartesianProd but it's the reduced code, so it
doesn't really compute the cartesian product. But that's where it's
from.)

So far, the outstanding blockers for cartesianProduct are:
1) Compiler bug which causes unittest failure:

	std/range.d(4629): Error: variable lower used before set
	std/range.d(4630): Error: variable upper used before set

(Jonathan had a pull request with a Phobos workaround for this, which I
_think_ is already merged, but the autotester is still failing at this
point. :-/)

2) Issue 8542 (crosstalk between template instantiations)

3) And now, issue 8900 (zip fails to compile with repeat(char[]))

So there's still no joy for cartesianProduct. :-(

I'm getting a bit frustrated with the Phobos bugs related to ranges and
std.algorithm. I think we need to increase the number of unittests. And
by that I mean, GREATLY increase the number of unittests. Most of the
current tests are merely sanity tests for the most common usage
patterns, most basic types, or tests added as a result of fixed bugs.

This is inadequate.

We need to actively unittest corner cases, rare combinations, unusual
usages, etc.. Torture test various combinations of range constructs.
Algorithms. Nested range constructs. Nested algorithms. Deliberately
cook up nasty tests that try their best to break the code by using
unusual parameters, unusual range-like objects, strange data, etc.. Go
beyond the simple cases to test non-trivial things.  We need unittests
that pass unusual structs and objects into the range constructs and
algorithms, and make sure they actually work as we have been _assuming_
they should.

I have a feeling there are a LOT of bugs lurking in there behind
overlooked corner cases, off by 1 errors, and other such careless slips,
as well as code that only works for basic types like arrays, which
starts breaking when you hand it something non-trivial.  All these
issues must be weeded out and prevented from slipping back in.

Here's a start:

- Create a set of structs/classes (inside a version(unittest) block)
 that are input, forward, bidirectional, output, etc., ranges, that are
 NOT merely arrays.

- There should be some easy way, perhaps using std.random, of creating
 non-trivial instances of these things.  These should be put in a
 separate place, perhaps outside the std/ subdirectory, where they can
 be imported into unittest blocks by std.range, std.algorithm, whatever
 else that needs extensive testing.

- Use these ranges as input for testing range constructs and algorithms.

- For best results, use a compile-time loop to loop over a given
 combination of these range types, and run them through the same set of
 tests. This will improve the currently spotty test coverage.  Perhaps
 provide some templated functions that, given a set of range types
 (from the above structs/classes) and a set of functions, run through
 all combinations of them to make sure they all work. (We run unittests
 separately anyway, we aren't afraid of long-running tests.)


T

-- 
"How are you doing?" "Doing what?"
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On Friday, 26 October 2012 at 22:43:44 UTC, H. S. Teoh wrote:
> I'm getting a bit frustrated with the Phobos bugs related to 
> ranges and
> std.algorithm. I think we need to increase the number of 
> unittests. And
> by that I mean, GREATLY increase the number of unittests. Most 
> of the
> current tests are merely sanity tests for the most common usage
> patterns, most basic types, or tests added as a result of fixed 
> bugs.
>
> This is inadequate.

Unit tests are not the solution to this. There are too many range 
combinations and types to effectively unit test everything. What 
we need is proper semantic support for ranges (and other 
conceptual classes of types). For example, this should not 
compile:

void foo(Range)(Range r) if (isForwardRange!Range)
{
    r.popBack();
}

But it will, because of the way templates work in D -- nothing is 
checked until you try to use it.

Imagine I could compile non-template code like this:

void foo(int x)
{
    writeln(x.name);
}

Imagine you had to try to call it before the static type error 
was caught. This is not a way to write robust code, especially 
with the combinatorial explosion of range types that exist.

Template constraints are nice, but not enough. We really need 
something *like* the proposed C++ concepts, or just something 
analogous to interfaces. I should be able to write something like 
this:

void foo(ForwardRange Range)(Range r)
{
    r.popBack();
}

And *immediately* get a type error without having to instantiate 
it with a variety of different range types. This is the only way 
I can see these kinds of problems going away. Unit testing does 
not scale with exponential use cases.
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
Peter Alexander:

> void foo(ForwardRange Range)(Range r)
> {
>     r.popBack();
> }
>
> And *immediately* get a type error without having to 
> instantiate it with a variety of different range types.

It's not an easy problem. To solve it the Rust language uses 
typeclasses adapted from Haskell. But unlike the usual Haskell 
compilers Rust doesn't use global type inferencing and it keeps 
the C++-style monomorphization, so at run-time its generic 
programming is as efficient as C++ generic programming.

I also hope Rust designers will take a look at this (Efficient 
Dynamic Dispatch without Virtual Function Tables. The SmallEiffel 
Compiler), an alternative to virtual tables. Maybe the D 
front-end could do the same on request with a compilation switch 
(with no changes in D language):

http://smarteiffel.loria.fr/papers/oopsla97.pdf

Bye,
bearophile
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On Saturday, 27 October 2012 at 11:41:07 UTC, bearophile wrote:
> Peter Alexander:
>
>> void foo(ForwardRange Range)(Range r)
>> {
>>    r.popBack();
>> }
>>
>> And *immediately* get a type error without having to 
>> instantiate it with a variety of different range types.
>
> It's not an easy problem. To solve it the Rust language uses 
> typeclasses adapted from Haskell. But unlike the usual Haskell 
> compilers Rust doesn't use global type inferencing and it keeps 
> the C++-style monomorphization, so at run-time its generic 
> programming is as efficient as C++ generic programming.

Yeah, it's certainly not going to be easy. It's unfortunate that 
D adopted the whole C++ style "glorified macros" approach to 
templates -- it makes it very difficult to reason about (or 
automate reasoning about) the semantics of your code.

Retrofitting some sort of structure to templates will be a 
Herculean task, but I think it has to happen. It is clear to me 
that the development process we use now (write the template, try 
a few instantiations, pray) is unsustainable beyond simple 
templates.
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On 10/26/12 6:45 PM, H. S. Teoh wrote:
> http://d.puremagic.com/issues/show_bug.cgi?id=8900
>
> :-(
>
> (The code there is called cartesianProd but it's the reduced code, so it
> doesn't really compute the cartesian product. But that's where it's
> from.)
>
> So far, the outstanding blockers for cartesianProduct are:
> 1) Compiler bug which causes unittest failure:
>
> 	std/range.d(4629): Error: variable lower used before set
> 	std/range.d(4630): Error: variable upper used before set
>
> (Jonathan had a pull request with a Phobos workaround for this, which I
> _think_ is already merged, but the autotester is still failing at this
> point. :-/)
>
> 2) Issue 8542 (crosstalk between template instantiations)
>
> 3) And now, issue 8900 (zip fails to compile with repeat(char[]))
>
> So there's still no joy for cartesianProduct. :-(

Often when there are many issues caused by a specific artifact, there 
are only a couple of compiler bugs manifesting themselves in various 
unpredictable ways.

> I'm getting a bit frustrated with the Phobos bugs related to ranges and
> std.algorithm. I think we need to increase the number of unittests. And
> by that I mean, GREATLY increase the number of unittests.

I don't think that's the best way. Walter and I have had a lasting 
disagreement about this as he firmly believes unittests are the way to 
make sure the compiler works well. In my opinion the psychological 
effect has been negative because this outlook has caused 
incompletely-defined features to get implemented (in a latent 
expectation that unittests will cause any issues to be detected).


Andrei
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On 10/27/12 6:20 AM, Peter Alexander wrote:
> On Friday, 26 October 2012 at 22:43:44 UTC, H. S. Teoh wrote:
>> I'm getting a bit frustrated with the Phobos bugs related to ranges and
>> std.algorithm. I think we need to increase the number of unittests. And
>> by that I mean, GREATLY increase the number of unittests. Most of the
>> current tests are merely sanity tests for the most common usage
>> patterns, most basic types, or tests added as a result of fixed bugs.
>>
>> This is inadequate.
>
> Unit tests are not the solution to this. There are too many range
> combinations and types to effectively unit test everything.

Agreed.

> What we need
> is proper semantic support for ranges (and other conceptual classes of
> types). For example, this should not compile:
>
> void foo(Range)(Range r) if (isForwardRange!Range)
> {
> r.popBack();
> }

No, this has nothing to do with the problem at hand.


Andrei
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On 10/27/12 8:23 AM, Peter Alexander wrote:
> Yeah, it's certainly not going to be easy. It's unfortunate that D
> adopted the whole C++ style "glorified macros" approach to templates --
> it makes it very difficult to reason about (or automate reasoning about)
> the semantics of your code.

I think D has reached a sweet spot with restricted templates.

> Retrofitting some sort of structure to templates will be a Herculean
> task, but I think it has to happen. It is clear to me that the
> development process we use now (write the template, try a few
> instantiations, pray) is unsustainable beyond simple templates.

It's not clear to me at all. The mechanism works very well and is more 
expressive than alternatives used by other languages.


Andrei
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
On Sat, Oct 27, 2012 at 09:06:11AM -0400, Andrei Alexandrescu wrote:
> On 10/27/12 8:23 AM, Peter Alexander wrote:
> >Yeah, it's certainly not going to be easy. It's unfortunate that D
> >adopted the whole C++ style "glorified macros" approach to templates
> >-- it makes it very difficult to reason about (or automate reasoning
> >about) the semantics of your code.
> 
> I think D has reached a sweet spot with restricted templates.
> 
> >Retrofitting some sort of structure to templates will be a Herculean
> >task, but I think it has to happen. It is clear to me that the
> >development process we use now (write the template, try a few
> >instantiations, pray) is unsustainable beyond simple templates.
> 
> It's not clear to me at all. The mechanism works very well and is more
> expressive than alternatives used by other languages.
[...]

I think the complaint is that the template constraints enforce ducktype
requirements on the *caller*, but not on the template code itself.
That's what Peter's example was about: the signature constraint declares
that the template takes an input range, yet the template body makes use
of non-input range properties of the argument.

I think there is some merit to being able to declare concepts; for
example:

	// This concept matches any type that has fields that satisfy
	// what's specified inside the body.
	concept InputRange(T) {
		bool empty;	// this matches @property bool empty() too
		T front;
		void popFront();
	}

	auto myRangeBasedFunc(InputRange r) {
		r.popBack();	// compile-time error: InputRange does
				// not define .popBack
	}

This way, *both* the user of the template and the template writer are
kept "honest" (i.e., they both have to conform to the requirements of an
input range). The compiler can thus statically check the template for
correctness *before* it ever gets instantiated.

Not only so, the compiler will be able to generate a meaningful error
message when the templates fail to match -- it can tell the user "the
template didn't match 'cos struct X that you tried to pass to it isn't
an InputRange, nor a ForwardRange, ... etc.".

Currently, if the template writer screws up, the compiler doesn't know
any better, and only when the user actually tries to pass something that
the template body can't handle, does the error manifest itself. Which
seems a bit late -- it should be the template writer who first notices
the problem (like if the compiler can statically verify the template
body and complain loudly before the code ever gets to the user). Plus,
the error message is usually a long cryptic sequence of instantiation
failures containing all sorts of internal Phobos types that the user has
no idea about. The compiler can hardly do much better, because it
doesn't have enough information to make a meaningful message.


T

-- 
Старый друг лучше новых двух.
October 27, 2012
Re: Another day in the ordeal of cartesianProduct
Am Sat, 27 Oct 2012 12:20:51 +0200
schrieb "Peter Alexander" <peter.alexander.au@gmail.com>:

> On Friday, 26 October 2012 at 22:43:44 UTC, H. S. Teoh wrote:
> > I'm getting a bit frustrated with the Phobos bugs related to 
> > ranges and
> > std.algorithm. I think we need to increase the number of 
> > unittests. And
> > by that I mean, GREATLY increase the number of unittests. Most 
> > of the
> > current tests are merely sanity tests for the most common usage
> > patterns, most basic types, or tests added as a result of fixed 
> > bugs.
> >
> > This is inadequate.
> 
> Unit tests are not the solution to this. There are too many range 
> combinations and types to effectively unit test everything.


Yes and no. We can't test all possible combinations, but we should try
to at least test all code paths:
------------------
void test(R)(R range)
{
   static if(isInputRange!R)
       r.popFront();
   else if(isForwardRange!R)
       r.saev();
}

unittest
{
   //Should test at least with InputRange & ForwardRange
}
-------------------

We sometimes even have templates that are not used in phobos at all
(and don't have unittests). Those are really evil as they can break
silently if some typo is introduced.
October 28, 2012
Re: Another day in the ordeal of cartesianProduct
On Saturday, 27 October 2012 at 13:06:09 UTC, Andrei Alexandrescu 
wrote:
> On 10/27/12 8:23 AM, Peter Alexander wrote:
>> Retrofitting some sort of structure to templates will be a 
>> Herculean
>> task, but I think it has to happen. It is clear to me that the
>> development process we use now (write the template, try a few
>> instantiations, pray) is unsustainable beyond simple templates.
>
> It's not clear to me at all. The mechanism works very well and 
> is more expressive than alternatives used by other languages.

I'm not sure I can agree it works well.

For example, here's what happened with bug 8900 mentioned in the 
OP:

std.range.zip creates a Zip object, which has a Tuple member. 
Tuple has a toString function, which calls formatElement, which 
calls formatValue, which calls formatRange, which (when there's a 
range of characters) has a code path for right-aligning the 
range. To right-align the range it needs to call walkLength.

The problem arises when you zip an infinite range of characters 
e.g. repeat('a').

Before pull request 880, walkLength accepted infinite ranges and 
just returned size_t.max. Pull request 880 added a constraint to 
walkLength to stop it accepting infinite ranges. Suddenly, you 
cannot zip a range of infinite chars because of a seemingly 
unrelated change.

This would have been caught if there was a unit test, but there 
wasn't, and as Dijkstra says, "testing can be a very effective 
way to show the presence of bugs, but it is hopelessly inadequate 
for showing their absence." There are probably other places that 
are broken, and many changes in the future will just introduce 
more bugs without tests.

Maybe we have different standards for "working well", but to me 
at least, this isn't what "working well" looks like.

Working well in this case would look like this:

- The person that put together pull request 880 would add the 
template constraint to walkLength.
- On the next compile he would get this error: "formatRange 
potentially calls walkLength with an infinite range." (or 
something along those lines).
- The person fixes formatRange, and all is well.

No need for unit tests, it's all caught as soon as possible 
without need for instantiation.
« First   ‹ Prev
1 2 3
Top | Discussion index | About this forum | D home