December 17, 2013
On Tue, Dec 17, 2013 at 10:13:43AM -0800, Andrei Alexandrescu wrote:
> On 12/17/13 8:26 AM, H. S. Teoh wrote:
> >Maybe this should be a community effort. Let each of us come up with a new iota, and we can compare and incorporate each other's ideas to produce the best implementation. How's that?
> 
> I think the deal with "a in iota(b, c)" gets odd quite fast if e.g. those are floating-point numbers and the questions shows up whether a must be exactly a value obtained by iterating from b to c.
[...]

Sorry, I was talking about improving iota support for non-builtin types that support ++ or +/*.

I don't agree that "a in iota(b,c)" is a good idea; it looks cute but hides a potential performance hit if not properly implemented. Iota is for iteration, not for general representation of numerical ranges. And yes, when floating-point values are involved, it quickly becomes quite nasty.  So I vote against it in favor of between(). Specifically, this variant among those proposed:

	bool between(string bounds="[]", T, U, V)(T val, U lower, V upper);

	unittest
	{
		assert(1.between(0u, 2.0f));	// should work with mixed types
		assert(3.14.between(3u, BigInt(4)); // and library types
	}


T

-- 
Stop staring at me like that! It's offens... no, you'll hurt your eyes!
December 17, 2013
17-Dec-2013 22:16, Andrei Alexandrescu пишет:
> On 12/17/13 9:17 AM, Andrej Mitrovic wrote:
>> On 12/17/13, Byron <byron.heads@gmail.com> wrote:
>>> I don't know why we can't do this instead:
>>>
>>> if (foo in ["alpha", "beta", "delta"] ) {
>>
>> You can come pretty close with:
>>
>> if (foo in A["alpha", "beta", "delta"] )
>
> Wouldn't
>
> if (foo in LiteralSet("alpha", "beta", "delta"))
>
> be a ton faster?
>

Or even:
if (foo in LiteralSet!("alpha", "beta", "delta"))

>
> Andrei
>


-- 
Dmitry Olshansky
December 17, 2013
On Tuesday, 17 December 2013 at 18:16:59 UTC, Andrei Alexandrescu wrote:

> if (foo in LiteralSet("alpha", "beta", "delta"))

if "_" referred to anything undefined then you could do:

if (foo in ["alpha":_,"beta":_,"delta":_])

Side-note: LLVM actually uses a value "undef" that it uses for reasoning, so that

(x & undef) == 0
(x | undef) == -1

Kind of cute.

Or even better:

if (foo in ["alpha":-),"beta":-),"delta":-)])
December 17, 2013
On Tuesday, 17 December 2013 at 18:15:29 UTC, Andrei Alexandrescu wrote:
> On 12/17/13 8:43 AM, Byron wrote:
>> I don't know why we can do this instead:
>>
>> if(foo in ["alpha", "beta", "delta"] ) {
>>
>> }
>
> It's a good idea.
>
>> basically have an opIn operator  x in y -> y.opIn(x)
>> U* opIn(T, U)(T key)
>> can define in runtime for arrays, and allow custom ones to be defines
>
> No, that should only work for literal arrays.

Any major reason why? Right now in reminds of + operator in java, only language devs can decide how its used.

>
>> and
>>
>> if(1 < x <= 10) {
>>
>> }
>>
>> I remember this being shot down before...
>
> "Don't change semantics of C code".

How does this change semantics of C code anymore then in, is, ~ does? Its okay if this has been beaten to death and you don't want to comment more on it.

>
>> Both of these are easier to read, and are more natural.  They also cover
>> more cases.
>
> No - "among" could take a custom predicate.

Is that not just a special case of reduce/filter/find then?

>
>
> Andrei


December 17, 2013
On Tuesday, 17 December 2013 at 18:06:27 UTC, Andrei Alexandrescu wrote:
> That's problematic. "Between" is a preposition. Naming a function as a preposition is fine as long as a verb is implied (e.g. "a in b" really means "a _is_ in b", or "a.between(b, c)" really means "a _is_ between b and c" etc).

I see... interesting. But this doesn't suggest that the concept is bad, just the name.

> "x in between(2, 7)" is cute but just that - it's a lucky strike that relies on word ordering in a particular phrase and is unlikely to work in many other places.

I agree it's cute and just lucky. I'm not attached to the name, but I'd like some sort of name that evokes the purpose like that does (as an example of something I wouldn't like reading, `x in iota(2, 7)` ...)

> Reifying "between" to the status of object is weird. One constructs a "between" object and then what are its primitives? How can one even talk about it? "Yeah I have a between here and I copy it to another between"...

To be honest, I'm just the kind of person to come up with very weird ideas, so it's not surprising people might find my idea weird. It doesn't necessarily have to be called "between" but some sort of object (being able to contain the state is important) could contain the concept of a range of values ("inclusive lowerbound", "exclusive upperbound", support things like "opIn" or an opCall to test a value for membership). It'd also be needed for it to have a simple way to get the smallest acceptable type for the range of values the "between" object could represent. So a for a Between!(uint, int) that would be a uint, and a Between!(int, uint) that would be a long, and so on. Obviously some things _don't_ have acceptable types, such as a Between!(long, ulong) (no integral type currently can actually hold all of those values).

Something like this, like I showed, could be used to pass to other functions like std.random.uniform which request a range to generate. Or you should be able to pass it to something like std.algorithm.find, std.algorithm.count, etc (predicates that take one parameter).

On Tuesday, 17 December 2013 at 01:02:28 UTC, Chris Cain wrote:
> but its main value would probably come improved readability and reduced code duplication

Actually, I thought about this a bit more and its value might be
greater than previously thought... what about the issues people
have with unsigned vs signed comparisons? (consider, someone in this very topic corrected your proposed code because of this very issue):

---
int a = -1;
uint b = 0;
assert(a < b); // oops, fails.
---

This isn't a new revelation or anything, and the solution, of
course, is to do more complex tests like `assert(a < 0 || a <
b);` but the compiler doing those sorts of things automatically
is questionable. Instead, covering the use-cases with
functions/objects like `between` where template magic can insert
these extra tests automatically might be a viable strategy.

And as another example of something falling prey to this "unexpected" behavior, look no further than the example I've already given: std.random.uniform ...

---
writeln(uniform(-1, 1u));
// std.random.uniform(): invalid bounding interval [-1, 1)
// Wat?
foreach(_; 0..5) {
    writeln(uniform(-2, uint.max));
    // Oops! Always prints out 4294967294
    // as if the bounding interval was [-2u, -1u)
}
---

https://d.puremagic.com/issues/show_bug.cgi?id=11758


These types of things are apparently very common issues. Having an object encapsulating the behavior needed to check this stuff and using it is preferable to reimplementing the checks each time (and potentially failing each time in subtle ways).
December 17, 2013
On 12/17/13 1:29 PM, Chris Cain wrote:
> It
> doesn't necessarily have to be called "between" but some sort of object
> (being able to contain the state is important) could contain the concept
> of a range of values ("inclusive lowerbound", "exclusive upperbound",
> support things like "opIn" or an opCall to test a value for membership).

Interval arithmetic (http://en.wikipedia.org/wiki/Interval_arithmetic) comes to mind. Like bounds-checked numbers, units, and probabilities, it's one of those bread-and-butter types that nobody got around to implement for Phobos yet.

In an interval arithmetic approach numbers would compare actually equal, i.e. 10 == IntervalInt(5, 100) would be true.


Andrei

December 17, 2013
On Tue, Dec 17, 2013 at 10:29:22PM +0100, Chris Cain wrote:
> On Tuesday, 17 December 2013 at 18:06:27 UTC, Andrei Alexandrescu wrote:
> >That's problematic. "Between" is a preposition. Naming a function as a preposition is fine as long as a verb is implied (e.g. "a in b" really means "a _is_ in b", or "a.between(b, c)" really means "a _is_ between b and c" etc).
> 
> I see... interesting. But this doesn't suggest that the concept is bad, just the name.
> 
> >"x in between(2, 7)" is cute but just that - it's a lucky strike that relies on word ordering in a particular phrase and is unlikely to work in many other places.
> 
> I agree it's cute and just lucky. I'm not attached to the name, but I'd like some sort of name that evokes the purpose like that does (as an example of something I wouldn't like reading, `x in iota(2, 7)` ...)
> 
> >Reifying "between" to the status of object is weird. One constructs a "between" object and then what are its primitives? How can one even talk about it? "Yeah I have a between here and I copy it to another between"...
> 
> To be honest, I'm just the kind of person to come up with very weird ideas, so it's not surprising people might find my idea weird. It doesn't necessarily have to be called "between" but some sort of object (being able to contain the state is important) could contain the concept of a range of values ("inclusive lowerbound", "exclusive upperbound", support things like "opIn" or an opCall to test a value for membership).

Ah, I see what you're getting at now. I think this idea has a merit on its own; I'm just not sure if it is useful as an actual intermediate data *type*.

But, putting that aside, I think the concept does serve its purpose. It's a pity that the word 'range' already has an assigned meaning in D, because otherwise that would be the best name in this case (i.e., range in the mathematical sense of being a contiguous subset of, say, the number line). So, for the lack of a better name, let's tentatively call it "Bounds" (as in, the set of elements bounded by upper and lower bounds), which may be defined, at least conceptually, as:

	struct Bounds(string shape="[]", T,U)
		if (is(T.init < U.init))
	{
		T lower;
		U upper;

		this(T lowerBound, U upperBound)
		{
			lower = lowerBound;
			upper = upperBound;
		}

		bool opBinaryRight(string op)(V val)
			if (op == "in" &&
				is(T.init < V.init) &&
				is(V.init < U.init))
		{
			static if (shape[0] == '[')
				static if (shape[1] == ']')
					return lower <= val <= upper;
				else static if (shape[1] == ')')
					return lower <= val < upper;
				else static assert(0);
			else static if (shape[0] == '(')
				static if (shape[1] == ']')
					return lower < val <= upper;
				else static if (shape[1] == ')')
					return lower < val < upper;
				else static assert(0);
			else static assert(0);
		}
	}

	// And we might as well throw in a convenience function for
	// constructing these things:
	auto bounds(string shape="[]", T,U)(T lower, U upper)
	{
		return Bounds!(shape,T,U)(lower, upper);
	}

	// So here's how you'd use it:
	unittest
	{
		int x = 123;
		assert(x in bounds(122, 124));

		// Mixed types are possible
		ulong y = ulong.max;
		float z = -z.max;
		assert(100 in bounds(y, z));
	}

	// If we want maximum efficiency, and the bounds are statically
	// known, we could also introduce a compile-time variant of
	// Bounds:
	struct CtBounds(T lower, U upper, string shape="[]", T, U)
		if (is(t < u))
	{
		// This struct cannot be instantiated at runtime.
		@disabled this();

		bool opBinaryRight(string op, V)(V val) if (op == "in")
		{
			... // same as above, except that now .lower and
			    // .upper are known at compile-time.
		}
	}

	unittest
	{
		// Now we can check bounds at compile time.
		static assert(1 in CtBounds!(0.0f, 2u));
	}

Of course, we can add opApply to these structs, then you'd be able to write:

	foreach (x; bounds(0, 100)) { ... }


> It'd also be needed for it to have a simple way to get the smallest acceptable type for the range of values the "between" object could represent. So a for a Between!(uint, int) that would be a uint, and a Between!(int, uint) that would be a long, and so on. Obviously some things _don't_ have acceptable types, such as a Between!(long, ulong) (no integral type currently can actually hold all of those values).

There's nothing wrong with Bounds!(long,ulong); it just won't have an opApply method, that's all. :) It can be conveniently static-if'd out in that case. It can still represent number ranges beyond the current range of built-in types, like [long.min, ulong.max], and you can test for membership with various types. This allows you to test variables of different types, like ints and uints, so the ability to represent such a range is still useful.


> Something like this, like I showed, could be used to pass to other functions like std.random.uniform which request a range to generate. Or you should be able to pass it to something like std.algorithm.find, std.algorithm.count, etc (predicates that take one parameter).

While you *could* implement the input range API for the Bounds struct for this purpose, it's probably better to define special overloads for find and count, since you really don't want to waste time iterating over elements instead of just directly computing the narrowed Bounds struct or subtracting the lower bound from the upper, respectively. For example:

	auto find(T, U, V)(Bounds!(T,U) bounds, V val)
	{
		if (val in bounds)
			return Bounds!(T,U)(val, bounds.upper);
		else
			return Bounds!(T,U)(0, -1); // empty bounds
	}

	size_t count(T, U)(Bounds!(T,U) bounds)
	{
		return bounds.upper - bounds.lower;
	}

I.e., O(1) complexity instead of O(upper-lower).


T

-- 
Question authority. Don't ask why, just do it.
December 17, 2013
On Tue, Dec 17, 2013 at 01:57:53PM -0800, Andrei Alexandrescu wrote:
> On 12/17/13 1:29 PM, Chris Cain wrote:
> >It doesn't necessarily have to be called "between" but some sort of object (being able to contain the state is important) could contain the concept of a range of values ("inclusive lowerbound", "exclusive upperbound", support things like "opIn" or an opCall to test a value for membership).
> 
> Interval arithmetic (http://en.wikipedia.org/wiki/Interval_arithmetic) comes to mind. Like bounds-checked numbers, units, and probabilities, it's one of those bread-and-butter types that nobody got around to implement for Phobos yet.
> 
> In an interval arithmetic approach numbers would compare actually equal, i.e. 10 == IntervalInt(5, 100) would be true.
[...]

Ah! "Interval" is the word I was looking for. :)

Ideally, Intervals should implement opApply, 'in', .find, .count, etc., and perhaps even interval intersection (but not union since that's not closed).

I don't like the idea of "IntervalInt"; the base type should be a template parameter:

	struct Interval(T) { ... }

In my previous post I considered allowing non-homogenous upper/lower bound types, but in interest of keeping things less messy, it may be better to just allow only a single type (then you wouldn't need kooky static-if's around things like opApply, etc.).

Intervals with discrete base types can probably implement the range APIs for maximum usability with std.algorithm, etc..


T

-- 
Let's eat some disquits while we format the biskettes.
December 18, 2013
On Tuesday, 17 December 2013 at 22:14:41 UTC, H. S. Teoh wrote:
> Ah, I see what you're getting at now. I think this idea has a merit on
> its own; I'm just not sure if it is useful as an actual intermediate
> data *type*.

The use over a function would be

1. Contain all of the complexity that working with intervals on (in this case) integers. It's been shown enough times that the straight-forward way of dealing with this is error-prone.
2. Maintain performance characteristics as much as possible. Without an object, a function doing this sort of thing would have to revalidate the bounds each time or, worse, NOT validate the bounds at all (with in contracts, we can validate each time because release code will take the contracts out, but it's still potentially an issue). With an object we can cache any type of validations and/or assertions needed and, potentially, improve performance in some cases.
3. Allow for existing functions to specialize when an interval is given, when appropriate.

> But, putting that aside, I think the concept does serve its purpose.
> It's a pity that the word 'range' already has an assigned meaning in D,
> because otherwise that would be the best name in this case (i.e., range
> in the mathematical sense of being a contiguous subset of, say, the
> number line). So, for the lack of a better name, let's tentatively call
> it "Bounds" (as in, the set of elements bounded by upper and lower
> bounds), which may be defined, at least conceptually, as:

Just to step up your idea to something a bit closer to complete (still have not thrown it into a compiler or anything yet):
http://www.dpaste.dzfl.pl/19c80ff9

(And I really like the idea of a CtInterval, but haven't done anything with it so I've excluded it in the above paste)

>> It'd also be needed for it to have a simple way to get the smallest
>> acceptable type for the range of values the "between" object could
>> represent. So a for a Between!(uint, int) that would be a uint, and a
>> Between!(int, uint) that would be a long, and so on. Obviously some
>> things _don't_ have acceptable types, such as a Between!(long, ulong)
>> (no integral type currently can actually hold all of those values).
>
> There's nothing wrong with Bounds!(long,ulong); it just won't have an
> opApply method, that's all. :) It can be conveniently static-if'd out in
> that case. It can still represent number ranges beyond the current range
> of built-in types, like [long.min, ulong.max], and you can test for
> membership with various types. This allows you to test variables of
> different types, like ints and uints, so the ability to represent such a
> range is still useful.

Well, I'm not suggesting that the interval not be allowed... but for things that use that interval, they may produce some sort of output. If they're using the interval to output, then they'll need to know what data type the output needs to be. It'd be convenient if some standard function existed to accomplish that task in a standard way.

The example I'm using for this is if std.random.uniform took in an interval, what would its output be? Obviously, it couldn't output something in [long.min, ulong.max], but it's possible it could spit out an answer in [byte.min, ubyte.max] since a short could contain all of those values.

>> Something like this, like I showed, could be used to pass to other
>> functions like std.random.uniform which request a range to generate.
>> Or you should be able to pass it to something like std.algorithm.find,
>> std.algorithm.count, etc (predicates that take one parameter).
>
> While you *could* implement the input range API for the Bounds struct
> for this purpose, it's probably better to define special overloads for
> find and count, since you really don't want to waste time iterating over
> elements instead of just directly computing the narrowed Bounds struct
> or subtracting the lower bound from the upper, respectively. For
> example:

Sorry, confusion based on using the word "range" again. When I said range, I meant bounds/interval in this case. Functions that request some sort of interval or bounds should use interval instead of trying to do anything on its own (since the "do your own thing" is increasingly being found to be errorprone).

So, something like this should work:

    unittest
    {
        import std.algorithm;
        assert(
            find!"a in b"([5, 6, 2, 9], interval(1, 4))
                == [2, 9]);
        // uses std.algorithm.find

        assert(
            count!"a in b"([5, 6, 1, 3, 9, 7, 2], interval(1,3))
                == 3);
        // uses std.algorithm.count

        import std.random;
        foreach(_; 0..10000)
            assert(uniform(interval(1,5)) in interval(1,5));
        // Nice assertion, right?
    }

It might also be useful in some circumstances to be able to know how many values are in the interval (sort of like a "length" or "size") but if you have an interval of [long.min, ulong.max] ... well, you know the problem.

Considering what Andrei said, we might could expand this concept to support the interval arithmetic. We'd also need to be able to support intervals like (-oo, oo), (-oo, x], [x, oo) ... where the membership test returns true, <=x, and >=x respectively (while taking care of the issues that exist with signed/unsigned comparisons, obviously). That said, not all functions will want to handle those types of intervals (std.random.uniform, for instance).
December 18, 2013
On Tuesday, 17 December 2013 at 22:23:47 UTC, H. S. Teoh wrote:
> In my previous post I considered allowing non-homogenous upper/lower
> bound types, but in interest of keeping things less messy, it may be
> better to just allow only a single type (then you wouldn't need kooky
> static-if's around things like opApply, etc.).

I disagree. The fact that it is so messy is precisely why it needs to handle it. Doing the messy/hard things in the standard library means that users don't have to do it. It's even more important since it's so error prone and the "obvious" way to do it is strictly wrong.