September 11, 2008
Andrei Alexandrescu wrote:
> Sean Kelly wrote:
>> This would be easy to fix by making arrays / slices fatter (by adding a capacity field, for example), but I'm still not convinced that's the right thing to do.  However, it may well be preferable to eliminating appending completely.  The obvious alternatives would be either to resurrect head const (not gonna happen) or to make append always reallocate (not at all ideal).
> 
> I couldn't imagine it put any better. Maybe the time has come to start looking into a good solution for this problem. The way things are now, ~= muddles the clean territory that slices cover.
> 
> Consider we define "unowned" arrays as arrays allocated by new T[n]. They are "unowned" because no entity controls them except the garbage collector, which by definition recycles them only when it's sure you couldn't tell.
> 
> An "owned" array would be something like a scope variable or an up-and-coming BlockArray!(T) with a destructor.
> 
> Slices are beautiful for iterating owned and unowned arrays just as well. You can have the slice refer to any range of any array no problem. Calling a ~ b creates a new, unowned array containing their concatenation. Assigning a = new T[n]; binds a to a fresh unowned array. And so on.
> 
> And all of a sudden we step on a kaka in this beautiful garden. Under very special, undetectable circumstances, a range becomes Hitler, annexes an adjacent range, and starts walking all over it. Sigh.

I'm very glad you share my thoughts on ~=. The current D T[] is a tool that has been stuffed with too many concepts.
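To make the kaka concrete, the situation is roughly this (a sketch; variable names are made up, and whether the in-place append actually happens depends on the allocator, so the stomp is possible rather than guaranteed):

```d
int[] a = new int[10];
int[] s = a[0 .. 5];  // s is a slice into the front of a's block
// With the current ~= semantics, the append may happen in place
// if the underlying GC block has spare room -- here it can write
// straight into a[5], which a still considers its own:
s ~= 42;
```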

Arrays and slices are two fundamentally different concepts that T[] actually manages to capture impressively well, but unfortunately not fully. And it is that last bit that makes the puzzle complicated.

One of the biggest differences between an array and a slice lies in the ownership of the data. And as far as I see it, arrays are conceptually better implemented as reference types, while slices are a natural value type.

So by removing ~= from T[], T[] becomes a pure slice type.

This is all the old T[new] discussion once again, but with the gained insight that instead of T[new] one could just as well use a pure library type.

-- 
Oskar
September 11, 2008
Oskar Linde (and Andrei Alexandrescu):
> So by removing ~= from T[], T[] becomes a pure slice type.

Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code), so if the purpose is just to avoid problems when extending slices, a different solution can be invented.
For example, a third (capacity) field could be added to the dynamic array struct, and the last bit of the capacity field used to tell slices apart from true whole arrays. So at runtime the code knows how to extend/append the array/slice correctly. This slows down the appending itself a little, but it's better than having to use an ugly ArrayBuilder everywhere.
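A sketch of that layout (the struct and field names are invented for illustration; this is not how the runtime actually lays arrays out):

```d
// Hypothetical fat array/slice struct with a capacity tag bit
struct FatArray(T)
{
    size_t length;
    T* ptr;
    size_t capacity;  // lowest bit: 1 = true whole array, 0 = slice;
                      // the usable capacity would then be capacity >> 1
}
```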

Bye,
bearophile
September 11, 2008
bearophile wrote:
> Oskar Linde (and Andrei Alexandrescu):
>> So by removing ~= from T[], T[] becomes a pure slice type.
> 
> Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code), so if the purpose is just to avoid problems when extending slices, a different solution can be invented.
> For example, a third (capacity) field could be added to the dynamic array struct, and the last bit of the capacity field used to tell slices apart from true whole arrays. So at runtime the code knows how to extend/append the array/slice correctly. This slows down the appending itself a little, but it's better than having to use an ugly ArrayBuilder everywhere.

I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info.  And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op.
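Under that convention, the append path might look roughly like this (a sketch assuming a hypothetical fat layout with ptr/length/capacity fields; none of the names here are real runtime code):

```d
struct Fat(T) { T* ptr; size_t length, capacity; }  // assumed layout

void append(T)(ref Fat!T a, T x)
{
    if (a.length < a.capacity)
    {
        // an owned array with spare room: extend in place, no GC query
        a.ptr[a.length++] = x;
    }
    else
    {
        // a slice (capacity == 0) or a full array: reallocate and copy
        auto n = new T[a.length == 0 ? 4 : a.length * 2];
        n[0 .. a.length] = a.ptr[0 .. a.length];
        n[a.length] = x;
        a.ptr = n.ptr;
        a.capacity = n.length;
        ++a.length;
    }
}
```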


Sean
September 11, 2008
Sean Kelly:
> I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info.

Yes, but I meant it would be slower than just adding the capacity field alone, without such an extra bit flag to tell slices apart from arrays.


> And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op.

Oh, right, no need for a separate tag bit then: the value capacity=0 is the tag. Would the D designers like this (small) change in the language? :-)

Bye,
bearophile
September 11, 2008
Sean Kelly wrote:
> bearophile wrote:
>> Oskar Linde (and Andrei Alexandrescu):
>>> So by removing ~= from T[], T[] becomes a pure slice type.
>>
>> Appending to the built-in dynamic arrays is a fundamental operation (I use it hundreds of times in my code), so if the purpose is just to avoid problems when extending slices, a different solution can be invented.

I agree that it is a fundamental operation, and my code contains hundreds of uses too. But the number of uses is actually smaller than I thought. One project of mine has only 157 ~= out of a total of 18,000 lines of code, and the cases are by their nature quite easily identified. Arbitrary code doesn't usually append to arbitrary slices.

>> For example, a third (capacity) field could be added to the dynamic array struct, and the last bit of the capacity field used to tell slices apart from true whole arrays. So at runtime the code knows how to extend/append the array/slice correctly. This slows down the appending itself a little, but it's better than having to use an ugly ArrayBuilder everywhere.
> 
> I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info.  And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op.
> 
> 
> Sean

capacity = the size of the slice won't work, since then you could turn a slice into a resizable array by mistake:

auto s = a[5..7];        // s.capacity == 2
auto t = s;              // t refers to the same two elements
s.length = s.length - 1; // s.length == 1, but capacity is still 2
s ~= x;                  // appends in place, overwriting t[1]

so that basically means that capacity has to be = 0 for slices, and != 0 for resizable arrays.

Without considering whether arrays would gain from having the capacity readily accessible, the advantage of this would be a run-time way to tell the slice apart from the array, at the cost of 50% increased storage. But even though this information would only be accessible at run-time, it is fully deducible at compile time. So you lose all the compile-time gains from separating the two concepts.

-- 
Oskar
September 11, 2008
bearophile <bearophileHUGS@lycos.com> wrote:
> Sean Kelly:
> > I'd think that adding a capacity field should actually speed up append operations, since the GC wouldn't have to be queried to determine this info.
> 
> Yes, but I meant it would be slower than just adding the capacity field alone, without such an extra bit flag to tell slices apart from arrays.
> 
> 
> > And as in another thread, the capacity of all slices should either be zero or the size of the slice, thus forcing a realloc for any append op.
> 
> Oh, right, no need for a separate tag bit then: the value capacity=0 is the tag. Would the D designers like this (small) change in the language? :-)

It just doesn't work.  Arrays are structs passed by value.  If you pass a capacity-carrying array into a function, the function appends, and then you append to your old copy, you overwrite what the function appended.  If you force slice-on-copy semantics, then arrays become elusive, tending to implicitly turn into slices whenever you toss them around, and then to reallocate on append.
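The first failure mode in code form (a sketch; both appends have to land in the same block for the overwrite to happen, so it is possible rather than guaranteed):

```d
void f(int[] a)  // a is a by-value copy of the (hypothetical) fat struct
{
    a ~= 1;      // grows the copy's length; the caller's struct is unchanged
}

int[] b = new int[0];
f(b);            // the copy's bookkeeping is lost on return
b ~= 2;          // b still believes it is empty: the 2 can land on f's 1
```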

When I was reading the D specs on slicing and appending for the first time, I got a strong feeling of hackery.  A slice is a view of another slice, but you cannot count on that.  A slice can be appended to, but you can erase parts of other slices in the process.  I had my share of guessing when I was writing a sort of refactoring tool: I checked carefully whether a slice would survive, and dupped much more than I wish I had.

I'd personally love to have something as simple and natural to use as current built-in arrays but with better defined semantics.  I don't like Andrei's Array!() thingy but it seems better than anything proposed before.
September 12, 2008
Reply to Walter,

> BCS wrote:
> 
>> I was referring to the implementation as visible from the called
>> code's side
>> 
> opApply isn't going to go away, it will still be there as an
> alternative.
> 

Rock on!

Sorry I took so long to reply, I had to lobotomize my PC


September 12, 2008
On Fri, Sep 12, 2008 at 1:24 AM, Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Bill Baxter" wrote
>>> Thus one can implement iterators on top of ranges, but I'd argue that
>>> ranges
>>> are much easier to implement on top of iterators.
>>
>> Ranges are safer and easier to work with in most cases so it's worth it, or so the argument goes.  You don't buy it?
>
> I can define an iterator and it doesn't mean that it makes ranges any less safe.  Just give me the choice, if I think iterators are a better fit, I might choose iterators.  But having to shoehorn ranges into an iterator form so that I do not wince at the ugliness of my code seems like unnecessary baggage.

I think one thing to consider is what it will take to make a new container support and "play nice" with the regime proposed.  This touches on Andrei's point about being hard pressed to think of generic algorithms to run on an HMM, too.

The first question is who do you want to "play nice" with?  If you're going to be writing functions specifically for that container, then you don't really have to play nice with anyone.  Your container just needs to have the operations necessary to support those functions.

The question of "playing nice" with everyone is not an issue at all until you start wanting to have one algorithm that works for lots of different containers that can do similar sorts of things.

And that's exactly what std.algorithm is for.  Supporting those kinds of operations that apply equally to a lot of different kinds of containers.

So if you agree that ranges are good enough for std.algorithm, then you should agree that a generic iterator concept is not really necessary, since the places left where you really need an iterator are those places where a generic algorithm is not really useful.  If they were generic algorithms, then they would be in std.algorithm.

The next thing that's worth thinking about is how much you have to work to play nice with std.algorithm.  The easier it is to implement the interface expected by std.algorithm, the better.

So for one thing that means that you really want the std.algorithm concepts to nest, for one to build on the next.  That way if you implement the most generic level, then you've automatically implemented all the more restricted levels.  This leads us to want names that make sense all the way up the hierarchy.  Like isEmpty().  It pretty much makes sense no matter which direction or how many degrees of freedom you have.  That's better than something like atEnd() for that reason.  It is unbiased.  You wouldn't want to have to provide an "atEnd" to work with forward ranges, and then an "isEmpty" to work with random access, even though they mean the same thing.

So the levels of iterators and naming of their parts should nest as much as possible.  Which seems to be pretty much the case with the current proposal.  I think .value will be a better name for "the current thing" than .left.  (Using the operator * may be better still.)  But other than that, the direction the naming is taking here on the NG seems good.  I say .value in part because if users like you implement their own iterator types, then .value is a reasonable name for getting the thing referred to.  So you could then write a function that takes a range or your iterator and uses the distinguished value referred to.  In that sense * would be even better because it would let you pass in a pointer too.
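In that spirit, a minimal forward range over a slice might look like this under the names being floated (isEmpty/value/next are all still up in the air at this point, so this is a sketch, not an actual Phobos interface):

```d
// Sketch of the naming under discussion, not a real library type
struct SliceRange(T)
{
    T[] data;
    bool isEmpty() { return data.length == 0; }
    ref T value() { return data[0]; }    // ".left" and "*" were the other candidates
    void next() { data = data[1 .. $]; } // advance by shrinking the view
}
```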


So in the end, really what I'm saying is that I think you are right.
Iterators are useful sometimes and it would be nice to design ranges
in such a way that the range terminology makes sense for iterators
too.  An iterator would probably support the .next property just like
a range, for instance.  That's a good name that will work with either.
Maybe it's worth codifying what the iterator concepts should be even
if std.algorithm won't use them.

> But I want to be able to
> construct ranges from pointers.

If iterators are up to you then you will be able to do this.  But std.algorithm will only care about the ranges you construct, not the iterators.

> I want to save pointers.  I want to use
> pointers to refer to elements in a collection.  I want to use pointers to
> move one-at-a-time along a node-based container.  I don't want to 'emulate'
> pointers using ranges.  I don't want the library to resist me doing what I
> find natural.

I don't think it will as long as you provide those my-iterator-to-range functions.
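Such a my-iterator-to-range function could be a very small adapter, something like this (a sketch; none of these names exist anywhere):

```d
// Hypothetical adapter wrapping a pointer pair as a range
struct PtrRange(T)
{
    T* b, e;
    bool isEmpty() { return b == e; }
    ref T value() { return *b; }
    void next() { ++b; }
}

// Build a range from two pointers into the same array/container node run
PtrRange!T toRange(T)(T* begin, T* end)
{
    return PtrRange!T(begin, end);
}
```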

> Anyways, I'm going to leave the discussion, I think I've said all I can about my views.  I'm not really good at explaining things anyways.  But I will update dcollections with what I think is the best compromise.  Then I might have a better understanding of how ranges fit into a collection package.  The good news is I don't have to worry about the language not providing iterators, everything is going to be library based, so we can easily try out both ways and see which is easier to use.

I think your comments have made a valuable addition to the conversation, and have at least helped me get my thoughts together. So thanks!  I'll be interested to see how the work on your lib turns out.

--bb
September 12, 2008
"Bill Baxter" wrote
> I think one thing to consider is what it will take to make a new container support and "play nice" with the regime proposed.  This touches on Andrei's point about being hard pressed to think of generic algorithms to run on an HMM, too.
>
> The first question is who do you want to "play nice" with?  If you're going to be writing functions specifically for that container, then you don't really have to play nice with anyone.  Your container just needs to have the operations necessary to support those functions.

Bill, thanks so much for explaining it like this, I really agree with what you say.  My concern is that iterator is going to become a 'bad word' and considered a flawed design.

But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects.  If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations.

Cheers!

-Steve


September 12, 2008
Steven Schveighoffer <schveiguy@yahoo.com> wrote:
> "Bill Baxter" wrote
> > I think one thing to consider is what it will take to make a new container support and "play nice" with the regime proposed.  This touches on Andrei's point about being hard pressed to think of generic algorithms to run on an HMM, too.
> >
> > The first question is who do you want to "play nice" with?  If you're going to be writing functions specifically for that container, then you don't really have to play nice with anyone.  Your container just needs to have the operations necessary to support those functions.
> 
> Bill, thanks so much for explaining it like this, I really agree with what you say.  My concern is that iterator is going to become a 'bad word' and considered a flawed design.
> 
> But you are right, there is no need for iterators to be allowed for std.algorithm, I totally agree with that, I just assumed Andrei meant iterators would be discouraged for everything, including general use as pointers into container objects.  If that is not the case, then I wholeheartedly agree that algorithms should be restricted to ranges, and iterators should be used only in container operations.

If you ask me, I think iterators, AKA pointers into containers, should be discouraged in SafeD.  If you don't care about SafeD you may use whatever you like.  Most library interfaces want to be SafeD to make the user's life easier, but few care about the library internals as long as they work.