May 15, 2010
Re: complement to $
KennyTM~:
> auto a = new OrderedDict!(int, string);
> a[-3] = "negative three";
> a[-1] = "negative one";
> a[0] = "zero";
> a[3] = "three";
> a[4] = "four";
> assert(a[0] == "zero");
> return a[0..4]; // which slice should it return?

D's slicing and indexing syntax can't express everything Python's can. In Python you can store the last index in a variable:

last_index = -1
a = ['a', 'b', 'c', 'd']
assert a[last_index] == 'd'

In D you represent the last index as $-1, but you can't store that in a variable.
If you introduce a symbol like ^ to represent the start, you can't store it in a variable.

Another example is slice bounds: you can omit them or, equivalently, pass None:

>>> a = ['a', 'b', 'c', 'd']
>>> assert a[None : 2] == ['a', 'b']
>>> assert a[ : 2] == ['a', 'b']
>>> assert a[0 : 2] == ['a', 'b']
>>> idx = None
>>> assert a[idx : 2] == ['a', 'b']
>>> assert a[2 : None] == ['c', 'd']
>>> assert a[2 : ] == ['c', 'd']
>>> assert a[2 : len(a)] == ['c', 'd']

You can store a None in a Python variable, so you can use it to represent the empty start or end of a slice. 
But currently D indexes are a size_t, so they can't represent a null.
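
An illustrative sketch of the point above (not code from the original post): Python's built-in slice object lets open-ended bounds be stored in a variable and reused on sequences of different lengths. The names head and tail are made up for the example.

```python
# Store "everything before index 2" and "everything from index 2 on" as values.
head = slice(None, 2)   # equivalent to [:2]
tail = slice(2, None)   # equivalent to [2:]

a = ['a', 'b', 'c', 'd']
b = [1.5, 2.5, 3.5]

assert a[head] == ['a', 'b']
assert a[tail] == ['c', 'd']
# The same stored bounds adapt to a list of a different length.
assert b[tail] == [3.5]
```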

Bye,
bearophile
May 15, 2010
Re: complement to $
bearophile wrote:
> KennyTM~:
>> auto a = new OrderedDict!(int, string);
>> a[-3] = "negative three";
>> a[-1] = "negative one";
>> a[0] = "zero";
>> a[3] = "three";
>> a[4] = "four";
>> assert(a[0] == "zero");
>> return a[0..4]; // which slice should it return?
> 
> D slicing syntax and indexing isn't able to represent what you can in Python, where you can store the last index in a variable:
> 
> last_index = -1
> a = ['a', 'b', 'c', 'd']
> assert a[last_index] == 'd'
> 
> In D you represent the last index as $-1, but you can't store that in a variable.

Sure you can:

    last_index = a.length - 1;
May 15, 2010
Re: complement to $
Walter Bright:

> Sure you can:
>      last_index = a.length - 1;

Probably my post was nearly useless, because it doesn't help the development of D much, so you can ignore most of it.
The only part of it that may be meaningful is this: the Python designers seem to have thought that a way to specify the _generic_ idea of the start of a slice is useful (with a syntax like a[:end] or a[None:end]), so in theory an equivalent syntax (like a[^..end]) could be added to D. I have no idea whether this would be commonly useful in D programs. Note that in this thread I have not said whether I like or dislike the 'complement to $' feature.

Regarding your specific answer here, storing a.length-1 in a variable doesn't represent the idea of "last item". Here is another example to show better what I meant:

last_index = -1
a = ['a', 'b', 'c']
b = [1.5, 2.5, 3.5, 4.5]
assert a[last_index] == 'c'
assert b[last_index] == 4.5

I am not asking for this in D; I don't think there are simple ways to add it, and I don't need it often in Python either. I am just saying that the semantics of negative indexes in Python are a superset of D's.

Bye,
bearophile
May 15, 2010
Re: complement to $
On Sat, 15 May 2010 23:46:12 +0200, Walter Bright  
<newshound1@digitalmars.com> wrote:

> bearophile wrote:
>> KennyTM~:
>>> auto a = new OrderedDict!(int, string);
>>> a[-3] = "negative three";
>>> a[-1] = "negative one";
>>> a[0] = "zero";
>>> a[3] = "three";
>>> a[4] = "four";
>>> assert(a[0] == "zero");
>>> return a[0..4]; // which slice should it return?
>>
>> D slicing syntax and indexing isn't able to represent what you can in
>> Python, where you can store the last index in a variable:
>>
>> last_index = -1
>> a = ['a', 'b', 'c', 'd']
>> assert a[last_index] == 'd'
>>
>> In D you represent the last index as $-1, but you can't store that in
>> a variable.
>
> Sure you can:
>
>      last_index = a.length - 1;

Ah, but if you then change the length of a, last_index is no longer
correct. Now, if we had a special index type...

enum slice_base {
  START,
  END
}

struct index {
  ptrdiff_t pos;
  slice_base base;

  // Operator overloads here, returns typeof( this ) if +/-
  // integral, ptrdiff_t if subtracted from
  // typeof( this ).
}

immutable $ = index( 0, slice_base.END );
immutable ^ = index( 0, slice_base.START );

auto last_index = $ - 1;
auto third_index = ^ + 2;

These would then stay valid no matter what you
did to the container.
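
The index type above is pseudocode ($ and ^ are not legal D identifiers). A minimal Python sketch of the same idea follows, under stated assumptions: the names SliceBase, Index, DOLLAR, CARET and the resolve method are hypothetical, and index-minus-index subtraction is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class SliceBase(Enum):
    START = 0
    END = 1


@dataclass(frozen=True)
class Index:
    pos: int          # offset relative to the base
    base: SliceBase   # which end of the container the offset counts from

    def __add__(self, n: int) -> "Index":
        return Index(self.pos + n, self.base)

    def __sub__(self, n: int) -> "Index":
        return Index(self.pos - n, self.base)

    def resolve(self, length: int) -> int:
        """Turn the relative index into an absolute one for a given length."""
        origin = 0 if self.base is SliceBase.START else length
        return origin + self.pos


DOLLAR = Index(0, SliceBase.END)    # stands in for $
CARET = Index(0, SliceBase.START)   # stands in for the proposed ^

last_index = DOLLAR - 1
third_index = CARET + 2

a = ['a', 'b', 'c', 'd']
assert a[last_index.resolve(len(a))] == 'd'
a.append('e')                                   # the container grows...
assert a[last_index.resolve(len(a))] == 'e'     # ...and the index still means "last"
assert a[third_index.resolve(len(a))] == 'c'
```

The relative index is only resolved at the moment of use, which is why it survives changes to the container's length.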

-- 
Simen
May 16, 2010
Re: complement to $
Steven Schveighoffer wrote:
> Currently, D supports the special symbol $ to mean the end of a 
> container/range.
> 
> However, there is no analogous symbol to mean "beginning of a 
> container/range".  For arrays, there is none necessary, 0 is always the 
> first element.  But not all containers are arrays.
> 
> I'm running into a dilemma for dcollections, I have found a way to make 
> all containers support fast slicing (basically by imposing some 
> limitations), and I would like to support *both* beginning and end symbols.
> 
> Currently, you can slice something in dcollections via:
> 
> coll[coll.begin..coll.end];
> 
> I could replace that end with $, but what can I replace coll.begin 
> with?  0 doesn't make sense for things like linked lists, maps, sets, 
> basically anything that's not an array.
> 
> One thing that's nice about opDollar is I can make it return coll.end, 
> so I control the type.  With 0, I have no choice, I must take a uint, 
> which means I have to check to make sure it's always zero, and throw an 
> exception otherwise.
> 
> Would it make sense to have an equivalent symbol for the beginning of a 
> container/range?
> 
> In regex, ^ matches beginning of the line, $ matches end of the line -- 
> would there be any parsing ambiguity there?  I know ^ is a binary op, 
> and $ means nothing anywhere else, so the two are not exactly 
> equivalent.  I'm not very experienced on parsing ambiguities, but things 
> like ~ can be unambiguous as binary and unary ops, so maybe it is possible.
> 
> So how does this look:  coll[^..$];
> 
> Thoughts? other ideas?
> 
> -Steve

If we were to have something like this (and I'm quite unconvinced that 
it is desirable), I'd suggest something beginning with $, eg $begin.
But, it seems to me that the slicing syntax assumes that the slicing 
index can be mapped to the natural numbers. I think in cases where 
that's not true, slicing syntax just shouldn't be used.
May 16, 2010
Re: complement to $
I am not sure that it is necessary to have a symbol for the beginning ($
represents the length, not the end, right?).

Anyway, instead of $begin and $end, I would rather have: $$ and $ (or
vice-versa).

Thoughts?
May 17, 2010
Re: complement to $
bearophile Wrote:

> Another example are range bounds, you can omit them, or exactly the same, they can be None:
> 
> a = ['a', 'b', 'c', 'd']
> >>> assert a[None : 2] == ['a', 'b']
> >>> assert a[ : 2] == ['a', 'b']
> >>> assert a[0 : 2] == ['a', 'b']
> >>> idx = None
> >>> assert a[idx : 2] == ['a', 'b']
> >>> assert a[2 : None] == ['c', 'd']
> >>> assert a[2 : ] == ['c', 'd']
> >>> assert a[2 : len(a)] == ['c', 'd']

half of a slice: a[None+len(a) : None]
May 17, 2010
Re: complement to $
Kagamin:
> half of a slice: a[None+len(a) : None]

Python is strongly typed, so you can't add None to an int.
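
A quick check of that claim, as an illustrative sketch rather than part of the original post (the variable names are made up):

```python
a = ['a', 'b', 'c', 'd']

try:
    a[None + len(a) : None]   # None + int is evaluated before slicing
except TypeError:
    raised = True
else:
    raised = False

assert raised

# None is accepted only as a bound in its own right, not in arithmetic.
idx = None
assert a[idx : 2] == ['a', 'b']
```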

Bye,
bearophile
May 17, 2010
Re: complement to $
On Sat, 15 May 2010 01:07:50 -0400, Walter Bright  
<newshound1@digitalmars.com> wrote:

> Steven Schveighoffer wrote:
>> On Fri, 14 May 2010 13:33:57 -0400, Walter Bright  
>> <newshound1@digitalmars.com> wrote:
>>
>>> Steven Schveighoffer wrote:
>>>> So how does this look:  coll[^..$];
>>>
>>> nooooooooo <g>
>>
>> Do you have specific objections, or does it just look horrendous to
>> you :)  Would another symbol be acceptable?
>
>
> The problem is D already has a lot of syntax. More syntax just makes the  
> language more burdensome after a certain point, even if in isolation  
> it's a good idea.
>

In a lot of cases, this is somewhat true.  On the other hand, though,
shortcut syntaxes like this are not as bad.  What I mean by shortcut is
that 1) it's a shortcut for an existing syntax (e.g. $ is short for
coll.length), and 2) it doesn't hurt readability, or even improves it.

A good example of shortcut syntax is the recent inout changes.  At first,  
the objection was "we already have too much const", but when you look at
the result, it is *less* const because you don't have to worry about the  
three cases, only one.

The burden for such shortcuts is usually on readers of such code, not  
writers.  But a small lesson from the docs is all that is needed.  Any new  
developer will already be looking up $ when they encounter it, if you put  
^ right there with it, it's not so bad.  Once you understand the meanings,  
it reads just as smoothly  (and I'd say even smoother) as the alternative  
syntax.

I'll also say that I'm not in love with ^, it's just a suggestion.  I'd  
not be upset if something else were to be used.  But 0 cannot be it.

> One particular problem regex has is that few can remember its syntax  
> unless they use it every day.

I don't use it every day, in fact, I almost always have to look up syntax  
if I want to get fancy.

But I always remember several things:

1. [^abc] means none of these
2. . means any character
3. * means 0 or more of the previous element and + means 1 or more of
the previous element
4. ^ and $ mean beginning and end of line.  I usually have to look up  
which one means which :)
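
For reference, the four points above can be checked with Python's re module. This is an illustrative sketch with made-up sample strings, not something from the original post:

```python
import re

# 1. [^abc] matches any character except a, b, or c.
none_of = re.findall(r"[^abc]", "abcde")
assert none_of == ["d", "e"]

# 2. . matches any single character (other than a newline, by default).
any_char = re.fullmatch(r"a.c", "axc")
assert any_char is not None

# 3. * allows zero repetitions of the previous element, + needs at least one.
star_zero = re.fullmatch(r"ab*", "a")
plus_zero = re.fullmatch(r"ab+", "a")
assert star_zero is not None
assert plus_zero is None

# 4. ^ anchors at the beginning of a line, $ at the end (with re.MULTILINE).
lines = "foo bar\nbar foo"
at_start = re.findall(r"^foo", lines, flags=re.MULTILINE)
at_end = re.findall(r"foo$", lines, flags=re.MULTILINE)
assert at_start == ["foo"]
assert at_end == ["foo"]
```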

Point 4 may suggest a special error message if someone does coll[^-1] or
coll[$..^]

>>>> Thoughts? other ideas?
>>>
>>> I'd just go with accepting the literal 0. Let's see how far that goes  
>>> first.
>>
>> I thought of a counter case:
>>
>> auto tm = new TreeMap!(int, uint);
>> tm[-1] = 5;
>> tm[1] = 6;
>>
>> What does tm[0..$] mean?  What about tm[0]?  If it is analogous to
>> "beginning of collection" then it doesn't make any sense for a
>> container with a key of numeric type.
>>
>> Actually any map type where the indexes don't *always* start at zero
>> are a problem.
>
> I'd question the design of a map type that has the start at something  
> other than 0.

Then I guess you question the AA design?  Or STL's std::map?  Or Java's  
TreeMap and HashMap?  Or dcollections' map types?

I don't think you meant this.  The whole *point* of a map is to have  
arbitrary indexes, requiring them to start at 0 would defeat the whole  
purpose.
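
The TreeMap counter case quoted above can be sketched in Python. This is only an illustration: a plain dict plus sorted() stands in for a sorted map, and the variable names are made up.

```python
tm = {-1: 5, 1: 6}
keys = sorted(tm)          # a sorted map would iterate in this order

# The first element has key -1, so 0 is not "the beginning".
first_key = keys[0]
assert first_key == -1
assert 0 not in tm         # tm[0] would be a missing key, not the first element

# Reading 0 as a key bound silently skips the entry at -1 ...
from_zero = [k for k in keys if k >= 0]
assert from_zero == [1]

# ... while reading it as "beginning of collection" would include everything.
from_begin = keys[:]
assert from_begin == [-1, 1]
```

The two readings give different results, which is the ambiguity a literal 0 introduces for containers with numeric keys.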

-Steve
May 17, 2010
Re: complement to $
On Sun, 16 May 2010 02:24:55 -0400, Don <nospam@nospam.com> wrote:

> Steven Schveighoffer wrote:
>> Currently, D supports the special symbol $ to mean the end of a
>> container/range.
>>
>> However, there is no analogous symbol to mean "beginning of a
>> container/range".  For arrays, there is none necessary, 0 is always the
>> first element.  But not all containers are arrays.
>>
>> I'm running into a dilemma for dcollections, I have found a way to
>> make all containers support fast slicing (basically by imposing some
>> limitations), and I would like to support *both* beginning and end
>> symbols.
>>
>> Currently, you can slice something in dcollections via:
>>
>> coll[coll.begin..coll.end];
>>
>> I could replace that end with $, but what can I replace coll.begin
>> with?  0 doesn't make sense for things like linked lists, maps, sets,
>> basically anything that's not an array.
>>
>> One thing that's nice about opDollar is I can make it return coll.end,
>> so I control the type.  With 0, I have no choice, I must take a uint,
>> which means I have to check to make sure it's always zero, and throw an
>> exception otherwise.
>>
>> Would it make sense to have an equivalent symbol for the beginning of
>> a container/range?
>>
>> In regex, ^ matches beginning of the line, $ matches end of the line
>> -- would there be any parsing ambiguity there?  I know ^ is a binary
>> op, and $ means nothing anywhere else, so the two are not exactly
>> equivalent.  I'm not very experienced on parsing ambiguities, but
>> things like ~ can be unambiguous as binary and unary ops, so maybe it
>> is possible.
>>
>> So how does this look:  coll[^..$];
>>
>> Thoughts? other ideas?
>>
>> -Steve
>
> If we were to have something like this (and I'm quite unconvinced that  
> it is desirable), I'd suggest something beginning with $, eg $begin.

This would be better than nothing.

> But, it seems to me that the slicing syntax assumes that the slicing  
> index can be mapped to the natural numbers. I think in cases where  
> that's not true, slicing syntax just shouldn't be used.

Slicing implies order, that is for sure.  But mapping to natural numbers
may be too strict.  I look at slicing in a different way.  Hopefully you  
can follow my train of thought.

dcollections, as a D2 lib, should support ranges, I think that makes the  
most sense.  All containers in dcollections are classes, so they can't  
also be ranges (my belief is that a reference-type based range is too  
awkward to be useful).  The basic operation to get a range from a  
container is to get all the elements as a range (a struct with the range  
interface).

So what if I want a subrange?  Well, I can pick off the ends of the range  
until I get the right elements as the end points.  But if it's possible,  
why not allow slicing as a better means of doing this?  However, slicing  
should be a fast operation.  Slicing quickly isn't always feasible; for
example, LinkList must walk through the list until it finds the right
element, so that's an O(n) operation.  So my thought was to allow slicing,
but with the index being a cursor (i.e. pointer) to the elements you want  
to be the end points.

Well, if we are to follow array convention, and want to try to enforce
memory safety, we should verify those end points make sense, we don't want
to return an invalid slice.  In some cases, verifying the end points are
in the correct order is slow, O(n) again.  But, you always have reasonably  
quick access to the first and last elements of a container, and you *know*  
their order relative to any other element in the container.

So in dcollections, I support slicing on all collections based on two  
cursors, and in all collections, if you make the first cursor the  
beginning cursor, or the second cursor the end cursor, it will work.  In  
some cases, I support slicing on arbitrary cursors, where I can quickly  
determine validity of the cursors.  The only two cases which allow this  
are the ArrayList, which is array based, and the Tree classes (TreeMap,  
TreeSet, TreeMultiset), where determining validity is at most an O(lg N)
operation.
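
A minimal Python sketch of the rule described above, for a linked list only. This is not dcollections code: the class LinkList here, its begin/end properties, and the slice method are hypothetical stand-ins, cursors are modelled as node references (None for the end), and the result is materialized as a list purely for illustration.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkList:
    """Hypothetical singly linked list; a cursor is a Node (None means end)."""

    def __init__(self, values):
        self.head = None
        prev = None
        for v in values:
            node = Node(v)
            if prev is None:
                self.head = node
            else:
                prev.next = node
            prev = node

    @property
    def begin(self):
        return self.head

    @property
    def end(self):
        return None  # one-past-the-last cursor

    def slice(self, first, last):
        # The relative order of two cursors is known without walking the
        # list only when one of them is the begin or the end cursor.
        if first is not self.begin and last is not self.end:
            raise ValueError("cannot verify cursor order in O(1)")
        out = []
        node = first
        while node is not last:
            out.append(node.value)
            node = node.next
        return out


lst = LinkList([10, 20, 30, 40])
mid = lst.begin.next.next                 # cursor to the element 30

front = lst.slice(lst.begin, mid)         # begin..cursor is always allowed
back = lst.slice(mid, lst.end)            # cursor..end is always allowed

try:
    lst.slice(lst.begin.next, mid)        # two arbitrary cursors
except ValueError:
    rejected = True
else:
    rejected = False
```

The sketch does not check that a cursor actually belongs to the list; a real container would need that as well.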

Essentially, I see slicing as a way to create a subrange of a container,  
where the order of the two end points can be quickly verified.

auto dict = new TreeMap!(string, string); // TreeMap is sorted

...

auto firstHalf = dict["A".."M"];

(You say that slicing using anything besides natural numbers shouldn't be  
used.  You don't see any value in the above?)

But "A" may not be the first element, there could be strings that are less
than it (for example, strings that start with a digit), such is the way
with arbitrary maps.  So a better way to get the first half may be:

auto firstHalf = dict[dict.begin.."M"];

What does the second half look like?

auto secondHalf = dict["M"..dict.end];

Well, if we are to follow array convention, the second half can be  
shortcutted like this:

auto secondHalf = dict["M"..$];

Which looks and reads rather nicely.  But there is no equivalent "begin"  
shortcut because $ was invented for arrays, which always have a way to  
access the first element -- 0.  Arbitrary maps have no such index.  So  
although it's not necessary, a shortcut for begin would also be nice.
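
To make the key-based slicing above concrete, here is a rough Python sketch. It is not how dcollections works: SortedMap, BEGIN and END are hypothetical names, a sorted key list with bisect stands in for a tree, and the sample keys are invented.

```python
import bisect

BEGIN = object()   # hypothetical stand-in for a "beginning" symbol
END = object()     # hypothetical stand-in for $


class SortedMap:
    """Toy sorted map supporting m[lo_key:hi_key] with open-ended sentinels."""

    def __init__(self, items):
        self._items = dict(items)
        self._keys = sorted(self._items)

    def _position(self, bound, open_value):
        # A sentinel (or an omitted bound) maps straight to one end.
        if bound is BEGIN or bound is END or bound is None:
            return open_value
        return bisect.bisect_left(self._keys, bound)   # O(lg N) lookup

    def __getitem__(self, key):
        if not isinstance(key, slice):
            return self._items[key]
        lo = self._position(key.start, 0)
        hi = self._position(key.stop, len(self._keys))
        if lo > hi:
            raise ValueError("slice end points are out of order")
        return [(k, self._items[k]) for k in self._keys[lo:hi]]


d = SortedMap({"2nd": 0, "Apple": 1, "Lemon": 2, "Mango": 3, "Zebra": 4})

# "A" is not necessarily the first key: "2nd" sorts before it.
assert d["A":"M"] == [("Apple", 1), ("Lemon", 2)]
assert d[BEGIN:"M"] == [("2nd", 0), ("Apple", 1), ("Lemon", 2)]
assert d["M":END] == [("Mango", 3), ("Zebra", 4)]
```

The sentinel gives the "begin" end the same treatment $ gives the other end, without requiring any particular key to exist.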

Anyway, that's what led me to propose we have some kind of shortcut.  If
nothing else, at least I hope you now see where I'm coming from, and  
hopefully you can see that slicing is useful in cases other than natural  
number indexes.

-Steve