October 21, 2013
On Monday, 21 October 2013 at 09:40:13 UTC, Regan Heath wrote:
> I disagree.  Exceptions should never be used for flow control, so the rule is to throw on exceptional occurrences ONLY, not on something that will ALWAYS eventually happen.

For such a function it is an exceptional situation. For precise reading, a different API (i.e. a different function) is required anyway.
October 21, 2013
On Mon, Oct 21, 2013 at 11:53:44AM +0100, Regan Heath wrote:
> On Fri, 18 Oct 2013 18:38:12 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
[...]
> >Conceptually speaking, an array is a sequence of values of non-negative length. An array with non-zero length contains at least one element, and is therefore non-empty, whereas an array with zero length is empty. Same thing goes with a slice. A slice is a view into zero or more array elements. A slice with zero length is empty, and a slice with non-zero length contains at least one element.
> 
> This describes the empty/not empty distinction.
> 
> >There's nowhere in this conceptual scheme for such a thing as a "null array" that's distinct from an empty array.
> 
> And this is the problem/complaint.  You cannot represent specified/not specified, you can only represent empty/not empty.
> 
> I agree you cannot logically have an existing array that is somehow a "null array" and distinct/different from an empty array, but that's not what I want/am asking for.  I want to use an array 'reference' to represent that the array is non existent, has not been set, has not been defined, etc.  This is what null is for.

The thing is, D slices are value types even though the elements they point to are pointed to by reference. If you treat slices (slices themselves, that is, not the elements they refer to) as value types, then the problem goes away. If you want to have a *reference* to a slice, then you simply write T[]* and then it becomes nullable as expected.
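
A minimal sketch of the T[]* idea, assuming a hypothetical helper named describe (not part of any library): a null pointer stands for "not specified", while a pointer to a zero-length slice stands for "specified, but empty".

```d
// Hypothetical helper, for illustration only.
// A null pointer means "no array was given at all";
// a non-null pointer to an empty slice means "an array was given, and it is empty".
string describe(int[]* arr)
{
    if (arr is null)
        return "not specified";
    if ((*arr).length == 0)
        return "specified, empty";
    return "specified, non-empty";
}

void main()
{
    int[] empty;
    int[] data = [1, 2, 3];

    assert(describe(null) == "not specified");
    assert(describe(&empty) == "specified, empty");
    assert(describe(&data) == "specified, non-empty");
}
```

The slice itself is only ever inspected through .length here; the nullable part is the pointer.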

I do agree that the current situation is confusing, though, mainly because you can write `if (arr is null)`, which then makes you think of it as a reference type. I think that should be prohibited, and slices should be treated as pure value types, and all comparisons should be checked with .length (or .empty if you import std.range).
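
To illustrate the confusion, here is a small sketch (variable names are made up) in which two zero-length slices give different answers to `is null` but the same answer to .length and .empty:

```d
import std.range : empty;

void main()
{
    int[] a;              // default-initialized: null ptr, length 0
    int[] b = [1, 2, 3];
    int[] c = b[0 .. 0];  // zero-length view into b: non-null ptr, length 0

    // Identity looks at ptr and length, so the two empty slices differ.
    assert(a is null);
    assert(c !is null);

    // Treated as values, both are simply empty.
    assert(a.length == 0 && c.length == 0);
    assert(a.empty && c.empty);
    assert(a == c);       // equality compares contents
}
```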


T

-- 
He who is everywhere is nowhere.
October 21, 2013
On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
> On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public@dicebot.lv> wrote:
> 
> >On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
> >>That's bad API design, pure and simple. The function should e.g. return the string including the line terminator, and only return an empty (or null) string upon EOF.
> >
> >I'd say it should throw upon EOF as it is a pretty high-level convenience function.
> 
> I disagree.  Exceptions should never be used for flow control, so the rule is to throw on exceptional occurrences ONLY, not on something that will ALWAYS eventually happen.
[...]

	while (!file.eof) {
		auto line = file.readln(); // never throws
		...
	}


T

-- 
There are two ways to write error-free programs; only the third one works.
October 21, 2013
On Mon, 21 Oct 2013 15:01:04 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Mon, Oct 21, 2013 at 11:53:44AM +0100, Regan Heath wrote:
>> On Fri, 18 Oct 2013 18:38:12 +0100, H. S. Teoh
>> <hsteoh@quickfur.ath.cx> wrote:
> [...]
>> >Conceptually speaking, an array is a sequence of values of
>> >non-negative length. An array with non-zero length contains at least
>> >one element, and is therefore non-empty, whereas an array with zero
>> >length is empty. Same thing goes with a slice. A slice is a view into
>> >zero or more array elements. A slice with zero length is empty, and a
>> >slice with non-zero length contains at least one element.
>>
>> This describes the empty/not empty distinction.
>>
>> >There's nowhere in this conceptual scheme for such a thing as a "null
>> >array" that's distinct from an empty array.
>>
>> And this is the problem/complaint.  You cannot represent specified/not
>> specified, you can only represent empty/not empty.
>>
>> I agree you cannot logically have an existing array that is somehow a
>> "null array" and distinct/different from an empty array, but that's
>> not what I want/am asking for.  I want to use an array 'reference' to
>> represent that the array is non existent, has not been set, has not
>> been defined, etc.  This is what null is for.
>
> The thing is, D slices are value types even though the elements they
> point to are pointed to by reference. If you treat slices (slices
> themselves, that is, not the elements they refer to) as value types,
> then the problem goes away. If you want to have a *reference* to a
> slice, then you simply write T[]* and then it becomes nullable as
> expected.

True, and that's a pointer, and I am comfortable using pointers. However, I worry this will limit the compiler's ability to optimise somehow, and doesn't it make the code immediately un"safe"?

> I do agree that the current situation is confusing, though, mainly
> because you can write `if (arr is null)`, which then makes you think of
> it as a reference type. I think that should be prohibited, and slices
> should be treated as pure value types, and all comparisons should be
> checked with .length (or .empty if you import std.range).

IMO, this would be preferable to the current situation even though I would rather go the other way and have a reference type.  I can see the argument that it would be safer and easier for most users, even though I do not believe I am in that category.

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
October 21, 2013
On Mon, 21 Oct 2013 12:54:56 +0100, Jonathan M Davis <jmdavisProg@gmx.com> wrote:

> On Monday, October 21, 2013 11:58:07 Regan Heath wrote:
>> If what you say is true then slices would and could never be null... If
>> that were the case I would stop complaining and simply "box" them with
>> Nullable when I wanted a reference type.  But, D's strings/slices are some
>> kind of mutant half reference half value type, and that's the underlying
>> problem here.
>
> Yeah, dynamic arrays in D are just plain weird. They're halfway between
> reference types and value types, and it definitely causes confusion, and it
> totally screws with null (which definitely sucks). But they mostly work really
> well the way that they are, and in general, the way that slices work works
> really well. So, I don't know if what we have is ultimately the right design
> or not. I definitely don't like how null works for arrays though.
>
> Given how they work, we probably would have been better off if they couldn't be
> null. The ptr obviously could be null, but the array itself arguably shouldn't
> be able to be null. If we did that, then it would be clear that null wouldn't
> work with arrays, and no one would try. It would still kind of suck, since you
> wouldn't have null, but then at least it would be clear that null wouldn't
> work with arrays instead of having a situation where it kind of does and kind
> of doesn't.

Agreed.  This is preferable to the current situation, even if it's not my personal preferred solution.

R

October 21, 2013
On Mon, 21 Oct 2013 15:02:35 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
>> On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public@dicebot.lv> wrote:
>>
>> >On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
>> >>That's bad API design, pure and simple. The function should e.g.
>> >>return the string including the line terminator, and only return
>> >>an empty (or null) string upon EOF.
>> >
>> >I'd say it should throw upon EOF as it is a pretty high-level
>> >convenience function.
>>
>> I disagree.  Exceptions should never be used for flow control, so the
>> rule is to throw on exceptional occurrences ONLY, not on something
>> that will ALWAYS eventually happen.
> [...]
>
> 	while (!file.eof) {
> 		auto line = file.readln(); // never throws
> 		...
> 	}

For a file this is implementable (without a buffer) but not for a socket or similar source/stream where a read MUST be performed to detect EOF.  So, if you're implementing a line reader over multiple sources, you would need to buffer.  Not the end of the world, but definitely more complicated than just returning a null, no?

R

October 21, 2013
On Mon, Oct 21, 2013 at 04:41:23PM +0100, Regan Heath wrote:
> On Mon, 21 Oct 2013 15:01:04 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> 
> >On Mon, Oct 21, 2013 at 11:53:44AM +0100, Regan Heath wrote:
[...]
> >>I agree you cannot logically have an existing array that is somehow a "null array" and distinct/different from an empty array, but that's not what I want/am asking for.  I want to use an array 'reference' to represent that the array is non existent, has not been set, has not been defined, etc.  This is what null is for.
> >
> >The thing is, D slices are value types even though the elements they point to are pointed to by reference. If you treat slices (slices themselves, that is, not the elements they refer to) as value types, then the problem goes away. If you want to have a *reference* to a slice, then you simply write T[]* and then it becomes nullable as expected.
> 
> True, and that's a pointer, and I am comfortable using pointers. However, I worry this will limit the compiler's ability to optimise somehow, and doesn't it make the code immediately un"safe"?

No, pointers are allowed in @safe. What is not allowed is pointer *arithmetic* and casting pointers into pointers of different types.
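
As a rough sketch (lengthOrZero is a made-up name), a function taking a T[]* can be marked @safe as long as it only null-checks and dereferences the pointer:

```d
// Hypothetical helper: null-checking and dereferencing a pointer
// are both accepted in @safe code.
size_t lengthOrZero(int[]* p) @safe
{
    // Pointer arithmetic such as p + 1 or p++ would be rejected here.
    return p is null ? 0 : (*p).length;
}

// main is deliberately left unannotated (@system), because taking the
// address of a local variable is subject to extra restrictions in @safe code.
void main()
{
    int[] data = [1, 2, 3];
    assert(lengthOrZero(null) == 0);
    assert(lengthOrZero(&data) == 3);
}
```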


> >I do agree that the current situation is confusing, though, mainly because you can write `if (arr is null)`, which then makes you think of it as a reference type. I think that should be prohibited, and slices should be treated as pure value types, and all comparisons should be checked with .length (or .empty if you import std.range).
> 
> IMO, this would be preferable to the current situation even though I would rather go the other way and have a reference type.  I can see the argument that it would be safer and easier for most users, even though I do not believe I am in that category.
[...]

Well, either way would work, though I do prefer treating slices as value types. It's just cleaner conceptually, IMO. But I suppose this is one of those things in which reasonable people may disagree.


T

-- 
Sometimes the best solution to morale problems is just to fire all of the unhappy people. -- despair.com
October 21, 2013
On Mon, Oct 21, 2013 at 04:47:05PM +0100, Regan Heath wrote:
> On Mon, 21 Oct 2013 15:02:35 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:
> 
> >On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
> >>On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public@dicebot.lv> wrote:
> >>
> >>>On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
> >>>>That's bad API design, pure and simple. The function should e.g. return the string including the line terminator, and only return an empty (or null) string upon EOF.
> >>>
> >>>I'd say it should throw upon EOF as it is a pretty high-level convenience function.
> >>
> >>I disagree.  Exceptions should never be used for flow control, so the rule is to throw on exceptional occurrences ONLY, not on something that will ALWAYS eventually happen.
> >[...]
> >
> >	while (!file.eof) {
> >		auto line = file.readln(); // never throws
> >		...
> >	}
> 
> For a file this is implementable (without a buffer) but not for a socket or similar source/stream where a read MUST be performed to detect EOF.  So, if you're implementing a line reader over multiple sources, you would need to buffer.  Not the end of the world, but definitely more complicated than just returning a null, no?
[...]

This is actually a very interesting issue to me, and one which I've thought about a lot in the past. There are two incompatible (albeit with much overlap) approaches here. One is the Unix approach where EOF is unknown until you try to read past the end of a file (socket, etc.), and the other is where EOF is known *before* you perform a read.

Personally, I prefer the second approach as being conceptually cleaner: an input stream should "know" when it doesn't have any more data, so that its EOF state can be queried at any time. Conceptually speaking one shouldn't need to (try to) read from it before realizing there's nothing left.

However, I understand that the Unix approach is easier to implement, in the sense that if you have a network socket, it may be the case that when you attempt to read from it, it is still connected, but before any further data is received, the remote end disconnects. In this case, the OS can't reasonably predict when there will be more incoming data, so you do have to read the socket before finding out that the remote end is going to disconnect without sending anything more.

In terms of API design, though, I still lean towards the approach where EOF is always query-able, because it leads to cleaner code. This can be implemented on Posix by having .eof read a single byte (or whatever unit is expected) and buffering it, and the subsequent readln() takes this buffering into account. This slight complication in implementation is worth achieving the nicer user-facing API, IMO.
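
Here is a rough sketch of that one-unit lookahead, under some assumptions: ReadChar and LookaheadReader are hypothetical names, and the underlying source is modelled as a delegate that only reports end-of-input when a read is attempted (the Unix-style behaviour). It is not how std.stdio is implemented.

```d
import std.array : appender;

// Hypothetical Unix-style source: returns false once nothing is left,
// which is only discovered by attempting the read.
alias ReadChar = bool delegate(out char c);

struct LookaheadReader
{
    private ReadChar source;
    private char pending;     // the single buffered character
    private bool hasPending;  // true if `pending` holds unread data
    private bool atEnd;       // true once the source reported end-of-input

    this(ReadChar source) { this.source = source; }

    // Reads at most one character ahead and buffers it,
    // so EOF can be queried before any readln() call.
    @property bool eof()
    {
        if (!hasPending && !atEnd)
        {
            if (source(pending))
                hasPending = true;
            else
                atEnd = true;
        }
        return !hasPending;
    }

    // Returns the next line including its terminator, consuming the
    // buffered character first; returns an empty string at EOF.
    string readln()
    {
        auto line = appender!string();
        while (!eof)
        {
            line.put(pending);
            hasPending = false;
            if (pending == '\n')
                break;
        }
        return line.data;
    }
}

void main()
{
    string text = "one\ntwo";
    size_t pos = 0;
    auto reader = LookaheadReader((out char c) {
        if (pos >= text.length)
            return false;
        c = text[pos++];
        return true;
    });

    string[] lines;
    while (!reader.eof)
        lines ~= reader.readln();
    assert(lines == ["one\n", "two"]);
}
```

The cost is that .eof may block on a socket until a byte arrives or the peer disconnects, which is the trade-off being discussed.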


T

-- 
I've been around long enough to have seen an endless parade of magic new techniques du jour, most of which purport to remove the necessity of thought about your programming problem.  In the end they wind up contributing one or two pieces to the collective wisdom, and fade away in the rearview mirror. -- Walter Bright
October 22, 2013
On Mon, 21 Oct 2013 17:34:51 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Mon, Oct 21, 2013 at 04:41:23PM +0100, Regan Heath wrote:
>> On Mon, 21 Oct 2013 15:01:04 +0100, H. S. Teoh
>> <hsteoh@quickfur.ath.cx> wrote:
>>
>> >On Mon, Oct 21, 2013 at 11:53:44AM +0100, Regan Heath wrote:
> [...]
>> >>I agree you cannot logically have an existing array that is somehow
>> >>a "null array" and distinct/different from an empty array, but
>> >>that's not what I want/am asking for.  I want to use an array
>> >>'reference' to represent that the array is non existent, has not
>> >>been set, has not been defined, etc.  This is what null is for.
>> >
>> >The thing is, D slices are value types even though the elements they
>> >point to are pointed to by reference. If you treat slices (slices
>> >themselves, that is, not the elements they refer to) as value types,
>> >then the problem goes away. If you want to have a *reference* to a
>> >slice, then you simply write T[]* and then it becomes nullable as
>> >expected.
>>
>> True, and that's a pointer, and I am comfortable using pointers.
>> However, I worry this will limit the compiler's ability to optimise
>> somehow, and doesn't it make the code immediately un"safe"?
>
> No, pointers are allowed in @safe. What is not allowed is pointer
> *arithmetic* and casting pointers into pointers of different types.

Ah, thanks.

>> >I do agree that the current situation is confusing, though, mainly
>> >because you can write `if (arr is null)`, which then makes you think
>> >of it as a reference type. I think that should be prohibited, and
>> >slices should be treated as pure value types, and all comparisons
>> >should be checked with .length (or .empty if you import std.range).
>>
>> IMO, this would be preferable to the current situation even though I
>> would rather go the other way and have a reference type.  I can see
>> the argument that it would be safer and easier for most users, even
>> though I do not believe I am in that category.
> [...]
>
> Well, either way would work, though I do prefer treating slices as value
> types. It's just cleaner conceptually, IMO. But I suppose this is one of
> those things in which reasonable people may disagree.

I agree that conceptually if you slice something, you cannot get a 'null' reference.  So, a null state for slices makes no sense.  However, most people see arrays as slices, slices as arrays - do you?  If so, for arrays the same conceptual argument does not apply.  If not, how do we tell whether we have a slice or an array?  If we can't tell, then we have to check for null with both anyway.

R

October 22, 2013
On Mon, 21 Oct 2013 17:49:43 +0100, H. S. Teoh <hsteoh@quickfur.ath.cx> wrote:

> On Mon, Oct 21, 2013 at 04:47:05PM +0100, Regan Heath wrote:
>> On Mon, 21 Oct 2013 15:02:35 +0100, H. S. Teoh
>> <hsteoh@quickfur.ath.cx> wrote:
>>
>> >On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
>> >>On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public@dicebot.lv> wrote:
>> >>
>> >>>On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
>> >>>>That's bad API design, pure and simple. The function should e.g.
>> >>>>return the string including the line terminator, and only return
>> >>>>an empty (or null) string upon EOF.
>> >>>
>> >>>I'd say it should throw upon EOF as it is a pretty high-level
>> >>>convenience function.
>> >>
>> >>I disagree.  Exceptions should never be used for flow control, so the
>> >>rule is to throw on exceptional occurrences ONLY, not on something
>> >>that will ALWAYS eventually happen.
>> >[...]
>> >
>> >	while (!file.eof) {
>> >		auto line = file.readln(); // never throws
>> >		...
>> >	}
>>
>> For a file this is implementable (without a buffer) but not for a
>> socket or similar source/stream where a read MUST be performed to
>> detect EOF.  So, if you're implementing a line reader over multiple
>> sources, you would need to buffer.  Not the end of the world, but
>> definitely more complicated than just returning a null, no?
> [...]
>
> This is actually a very interesting issue to me, and one which I've
> thought about a lot in the past. There are two incompatible (albeit with
> much overlap) approaches here. One is the Unix approach where EOF is
> unknown until you try to read past the end of a file (socket, etc.), and
> the other is where EOF is known *before* you perform a read.
>
> Personally, I prefer the second approach as being conceptually cleaner:
> an input stream should "know" when it doesn't have any more data, so
> that its EOF state can be queried at any time. Conceptually speaking one
> shouldn't need to (try to) read from it before realizing there's nothing
> left.
>
> However, I understand that the Unix approach is easier to implement, in
> the sense that if you have a network socket, it may be the case that
> when you attempt to read from it, it is still connected, but before any
> further data is received, the remote end disconnects. In this case, the
> OS can't reasonably predict when there will be more incoming data, so
> you do have to read the socket before finding out that the remote end
> is going to disconnect without sending anything more.
>
> In terms of API design, though, I still lean towards the approach where
> EOF is always query-able, because it leads to cleaner code. This can be
> implemented on Posix by having .eof read a single byte (or whatever unit
> is expected) and buffering it, and the subsequent readln() takes this
> buffering into account. This slight complication in implementation is
> worth achieving the nicer user-facing API, IMO.

I don't agree the user-facing API is nicer.  It is more complex both in concept and implementation.

API #1: 1 function, readline(), returns null on EOF.  You call readline() and check the result for null.  The check naturally follows the attempt to read, which is the task you are trying to accomplish.  Simple, straightforward.

API #2: 2 functions, readline() throws on EOF, isEof() checks for EOF.  Your purpose is to read lines, so you call readline(), and it is easy to forget to call isEof().  Coding the example loop above requires you to think about EOF /before/ you read a line; this is not how people think.  This API is therefore more complex and less intuitive, for no gain.

So, having a usable null state allows the simpler, more direct API.  Lack of it requires a more complicated design and a more complicated implementation.
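
To make the comparison concrete, here is a small sketch of both APIs over an in-memory source. Everything in it is hypothetical (the Lines struct, readlineOrThrow), and Nullable!string from std.typecons stands in for the "usable null state", since a plain D string cannot reliably tell null from empty.

```d
import std.typecons : Nullable;

// Hypothetical in-memory line source, for illustration only.
struct Lines
{
    string[] remaining;

    // API #1: one call. A null result means EOF, so an empty line
    // remains distinguishable from "no more lines".
    Nullable!string readline()
    {
        if (remaining.length == 0)
            return Nullable!string.init;
        auto line = remaining[0];
        remaining = remaining[1 .. $];
        return Nullable!string(line);
    }

    // API #2: two calls. The caller has to remember to ask isEof() first,
    // otherwise readlineOrThrow() throws at the end of input.
    bool isEof() const { return remaining.length == 0; }

    string readlineOrThrow()
    {
        if (isEof())
            throw new Exception("read past EOF");
        auto line = remaining[0];
        remaining = remaining[1 .. $];
        return line;
    }
}

void main()
{
    // API #1: read, then check the result.
    auto a = Lines(["first", "", "last"]);
    size_t countA;
    for (auto line = a.readline(); !line.isNull; line = a.readline())
        ++countA;
    assert(countA == 3);  // the empty middle line is counted too

    // API #2: check, then read.
    auto b = Lines(["first", "", "last"]);
    size_t countB;
    while (!b.isEof())
    {
        b.readlineOrThrow();
        ++countB;
    }
    assert(countB == 3);
}
```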

R
