Jump to page: 1 2
Thread overview
Suggestion: dynamic array operations
Sep 06, 2006
Nikita Kalaganov
Sep 06, 2006
Gregor Richards
Sep 06, 2006
Steve Horne
Sep 06, 2006
Gregor Richards
Sep 06, 2006
Gregor Richards
Sep 06, 2006
Sean Kelly
Sep 07, 2006
Steve Horne
Sep 07, 2006
nobody
Sep 08, 2006
Steve Horne
Sep 08, 2006
nobody
Sep 08, 2006
Steve Horne
Sep 07, 2006
Sean Kelly
Sep 06, 2006
Steve Horne
September 06, 2006
1. Pop from array on front/back

Syntax:
	ARRAY1 = ARRAY2 <<< INTEGER
	ARRAY1 <<<= INTEGER

	ARRAY1 = ARRAY2 >>> INTEGER
	ARRAY1 >>>= INTEGER

Example:
	int[] a;
	// let a = [1,2,3,4,5]
	int[] b;

	b = a <<< 4; // b = [1,2,3,4] a = [5]
	b = a >>> 4; // b = [2,3,4,5] a = [1]

	a <<<= 3; // a = [1,2,3]
	a >>>= 1; // a = [3]

2. Array rotation

Syntax:
	ARRAY1 = ARRAY2 ral INTEGER
	ARRAY1 ral= INTEGER

	ARRAY1 = ARRAY2 rar INTEGER
	ARRAY1 rar= INTEGER

Example:
	int[] a;
	// let a = [1,2,3,4,5]

	b = a ral 2; // b = [3,4,5,1,2]
	b = a rar 2; // b = [4,5,1,2,3]

	a ral= 2; // a = [3,4,5,1,2]
	a rar= 4; // a = [4,5,1,2,3]

3. Shift array left/right

Syntax:
	ARRAY1 = ARRAY2 << {ARRAY3 | INTEGER}
	ARRAY1 <<= {ARRAY2 | INTEGER}

	ARRAY1 = ARRAY2 >> {ARRAY3 | INTEGER}
	ARRAY1 >>= {ARRAY2 | INTEGER}

Example ('di' means default initializer):
	int[] a;
	// let a = [1,2,3,4,5]
	int[] b;
	// let b = [6,7];
	int[] c;

	c = a << 3;	// c = [4,5,di,di,di]
	c = a << b;	// c = [3,4,5,6,7]

	c = a >> 3;	// c = [di,di,di,1,2]
	c = a >> b;	// c = [6,7,1,2,3]

	a <<= 2;	// a = [3,4,5,di,di]
	a <<= b;	// a = [5,di,di,6,7]
	a >>= 2;	// a = [di,di,5,di,0]
	a >>= b;	// a = [6,7,di,di,5]

------------

Together with concatenation, these operations give us in-built standard containers like queue and deque (goodbye templates!). Also, they ease implementation of various algorithms with matrix operations.

P.S. Hmm, maybe it's a good idea to add rotation operations for _unsigned_ integers ?

Syntax & example:
	uint a;

	a = 0x000ff000;
	a rol= 4; // a = 0x00ff0000
	a ror= 20; // a = 0xf000000f
September 06, 2006
Nikita Kalaganov wrote:
> 
> 1. Pop from array on front/back
> 
> Syntax:
>     ARRAY1 = ARRAY2 <<< INTEGER
>     ARRAY1 <<<= INTEGER
> 
>     ARRAY1 = ARRAY2 >>> INTEGER
>     ARRAY1 >>>= INTEGER
> 
> Example:
>     int[] a;
>     // let a = [1,2,3,4,5]
>     int[] b;
> 
>     b = a <<< 4; // b = [1,2,3,4] a = [5]
>     b = a >>> 4; // b = [2,3,4,5] a = [1]
> 
>     a <<<= 3; // a = [1,2,3]
>     a >>>= 1; // a = [3]
> 
> 2. Array rotation
> 
> Syntax:
>     ARRAY1 = ARRAY2 ral INTEGER
>     ARRAY1 ral= INTEGER
> 
>     ARRAY1 = ARRAY2 rar INTEGER
>     ARRAY1 rar= INTEGER
> 
> Example:
>     int[] a;
>     // let a = [1,2,3,4,5]
> 
>     b = a ral 2; // b = [3,4,5,1,2]
>     b = a rar 2; // b = [4,5,1,2,3]
> 
>     a ral= 2; // a = [3,4,5,1,2]
>     a rar= 4; // a = [4,5,1,2,3]
> 
> 3. Shift array left/right
> 
> Syntax:
>     ARRAY1 = ARRAY2 << {ARRAY3 | INTEGER}
>     ARRAY1 <<= {ARRAY2 | INTEGER}
> 
>     ARRAY1 = ARRAY2 >> {ARRAY3 | INTEGER}
>     ARRAY1 >>= {ARRAY2 | INTEGER}
> 
> Example ('di' means default initializer):
>     int[] a;
>     // let a = [1,2,3,4,5]
>     int[] b;
>     // let b = [6,7];
>     int[] c;
> 
>     c = a << 3;    // c = [4,5,di,di,di]
>     c = a << b;    // c = [3,4,5,6,7]
> 
>     c = a >> 3;    // c = [di,di,di,1,2]
>     c = a >> b;    // c = [6,7,1,2,3]
> 
>     a <<= 2;    // a = [3,4,5,di,di]
>     a <<= b;    // a = [5,di,di,6,7]
>     a >>= 2;    // a = [di,di,5,di,0]
>     a >>= b;    // a = [6,7,di,di,5]
> 
> ------------
> 
> Together with concatenation, these operations give us in-built standard containers like queue and deque (goodbye templates!). Also, they ease implementation of various algorithms with matrix operations.
> 
> P.S. Hmm, maybe it's a good idea to add rotation operations for _unsigned_ integers ?
> 
> Syntax & example:
>     uint a;
> 
>     a = 0x000ff000;
>     a rol= 4; // a = 0x00ff0000
>     a ror= 20; // a = 0xf000000f

Arrays are arrays, not lists.  You can abuse them into stacks, queues and the ilk, but it's an inefficient use of memory.  There's a reason that stacks, queues and such are not usually implemented with arrays, in the same way that constant strings are not generally implemented as linear linked lists of single characters.

 - Gregor Richards
September 06, 2006
Nikita Kalaganov wrote:
> 
> 1. Pop from array on front/back
> 
> Syntax:
>     ARRAY1 = ARRAY2 <<< INTEGER
>     ARRAY1 <<<= INTEGER
> 
>     ARRAY1 = ARRAY2 >>> INTEGER
>     ARRAY1 >>>= INTEGER
> 
> Example:
>     int[] a;
>     // let a = [1,2,3,4,5]
>     int[] b;
> 
>     b = a <<< 4; // b = [1,2,3,4] a = [5]
>     b = a >>> 4; // b = [2,3,4,5] a = [1]
> 
>     a <<<= 3; // a = [1,2,3]
>     a >>>= 1; // a = [3]

T[] pull (T) (T[] src, size_t len) {
  T[] result ;

  len    = len >= src.length ? src.length : len + 1;
  result = src[0 .. len];
  src    = src[len .. src.length];

  return result;
}

T[] rpull (T) (T[] src, size_t len) {
  T[] result ;

  len    = len >= src.length ? src.length : len + 1;
  result = src[src.length - len .. src.length];
  src    = src[0 .. len];

  return result;
}

int[] a ;
int[] b ;
a = array!(int)(1, 2, 3, 4, 5);

b = a.pull(4); // a == [5], b == [1, 2, 3, 4]

b = a.rpull(4); // a == [1], b == [2, 3, 4, 5]


This is precisely why I say we need an array utils module in Phobos.  I have one in progress in my Cashew project, but it is incomplete and could use updating now that D's import system has been improved.

-- Chris Nicholson-Sauls
September 06, 2006
On Wed, 06 Sep 2006 20:04:29 +0400, "Nikita Kalaganov" <sapfirnik@rambler.ru> wrote:

>
>1. Pop from array on front/back
>
>Syntax:
...
>	ARRAY1 = ARRAY2 >>> INTEGER
>	ARRAY1 >>>= INTEGER


I think additional built-in methods for array types would be the best solution for your problems.

'Push' and 'pop' are as clear as it gets. I don't find the C++ STL push_back and push_front very intuitive - which end is the front? - but, since I mentally lay out my subscripts the same as my cartesian co-ordinates, push_left and pop_left work for me. And rotate methods for arrays strike me as a very rare requirement, but what the hell ;-)


As for your operators, for one thing, the ones above already exist. D has both signed and unsigned bitshift right operators. Don't ask me why - I don't see the problem with shifting consistently with the datatype, as C and C++ do.

Mind you, operators can be overloaded. I may be in a minority around here, given that D has chosen a different route for I/O, but personally I like the way that C++ overloads the << operator as 'stream insertion'. And I have no problem with the idea of extending that so a 'stream' is any sequence of items. On that basis, why not...

  array1 << value-to-push;
  array2 >> var-to-pop-into;

Well, there is that little problem I have. While I like C++ stream insertion, stream extraction is something else. It reads wrong.

Also, this goes against the flow of what D already defines. Pushing a new item can already be done with...

  array1 ~= value-to-push;

D already defines all those wierd relative operators that handle NAN for floats, so maybe there's room for taking '!' as 'opposite-of' rather than just 'not'...

  array1 !~= var-to-pop;

But a major issue I have with that is that the item that is assigned is the right hand argument, not the usual left hand argument that is traditionally assigned by all C/C++/D/etc assignment operators.

Which goes back to the problem I have with stream extraction. If it was written as...

  var-to-pop-into << array1;

or...

  var-to-pop-into <<= array1;

I would have less of a problem. And, as with multi-assignment expressions, I would have less of a problem with...

  var1 << var2 << var3 << array1;

It reads better, to my eyes. It suggests items flowing out of the array until the required number have flowed out, into the specified slots. Beads sliding down string or something like that. The C++ stream extraction form...

  array1 >> var1 >> var2 >> var3;

Seems to suggest the same flow-like system, but the implied order is the opposite of what actually happens.

But using the same '<<' or '<<=' or whatever for both means that it is down to overload resolution to decide whether to insert or extract. If the left argument is a 'stream', insert. If the right argument is a 'stream', extract. But if a 'stream' is just taken as referring to any sequential data, what about arrays of arrays? What does the following mean?

  varname << stringvar;

Is it inserting a string into an array of strings? Is it extracting a character from a string?  For that matter, is it concatenating two strings?

Overloading can resolve it. If you are inserting or extracting, the types are different. Only concatenation has both arguments the same type. But it isn't just the compiler that has to resolve the overloading.

Basically, you have an operator with three clearly different meanings (not including the bitshifting). Operator overloading was invented to allow operators with a consistent meaning to work with new types - for example, supporting '+' on new numeric types (perhaps decimal floats, rationals) that the language doesn't have built in. It just seems a bit too cryptic. Though you could claim that the meaning of '<<' was something to do with flowing or shifting in general.

Also, I don't think deque operations and rotations are done often enough to justify an operator. For example, those matrix algorithms of yours should be written once and packaged in a library. Methods aren't that much of a hardship.

But then, I already think the D relative operators are a bit nuts. You'd have to be using floating point and caring about NAN a lot to get used to them. Short of that, any gains from clutter clearing are going to be lost referring back to the manuals. But there are obviously people around who see it differently.

-- 
Remove 'wants' and 'nospam' from e-mail.
September 06, 2006
On Wed, 06 Sep 2006 10:51:32 -0700, Gregor Richards <Richards@codu.org> wrote:

>Arrays are arrays, not lists.  You can abuse them into stacks, queues and the ilk, but it's an inefficient use of memory.

Sorry for pedanting, but it's an inefficient use of CPU cycles, not memory. Arrays are more memory efficient than linked lists - no per-item pointers, and only a one-off malloc overhead.

Even for time, it depends on how big the stack/deque will get. For small sizes arrays are a good choice. Realloc problems are minimised, time to shift the items up/down is trivial, and there are cache benefits to keeping the items in adjacent memory locations.

For large deques, what you really want is a list of small arrays. A trade-off between the costs and benefits of both lists and arrays. IIRC this is pretty much what std::deque does in C++. But even then, pushing and popping on the small arrays is needed to implement the full deque logic.

And of course, that large deque class could overload those operators to allow the same syntax.

I just think methods are a better choice than operators for this.

-- 
Remove 'wants' and 'nospam' from e-mail.
September 06, 2006
Steve Horne wrote:
> On Wed, 06 Sep 2006 10:51:32 -0700, Gregor Richards
> <Richards@codu.org> wrote:
> 
>> Arrays are arrays, not lists.  You can abuse them into stacks, queues and the ilk, but it's an inefficient use of memory.
> 
> Sorry for pedanting, but it's an inefficient use of CPU cycles, not
> memory. Arrays are more memory efficient than linked lists - no
> per-item pointers, and only a one-off malloc overhead.
> 
> Even for time, it depends on how big the stack/deque will get. For
> small sizes arrays are a good choice. Realloc problems are minimised,
> time to shift the items up/down is trivial, and there are cache
> benefits to keeping the items in adjacent memory locations.
> 
> For large deques, what you really want is a list of small arrays. A
> trade-off between the costs and benefits of both lists and arrays.
> IIRC this is pretty much what std::deque does in C++. But even then,
> pushing and popping on the small arrays is needed to implement the
> full deque logic.
> 
> And of course, that large deque class could overload those operators
> to allow the same syntax.
> 
> I just think methods are a better choice than operators for this.
> 

To pop an element off the front of an array requires either losing that chunk of memory or moving the entire rest of the array back such that that memory is still used.  Pushing an element onto either side of an array involves expanding the memory used by that array, which often involves duplication.  Using an array as a stack or queue is a waste of MEMORY, not CPU cycles.

 - Gregor Richards
September 06, 2006
Gregor Richards wrote:
> Steve Horne wrote:
>> On Wed, 06 Sep 2006 10:51:32 -0700, Gregor Richards
>> <Richards@codu.org> wrote:
>>
>>> Arrays are arrays, not lists.  You can abuse them into stacks, queues and the ilk, but it's an inefficient use of memory.
>>
>> Sorry for pedanting, but it's an inefficient use of CPU cycles, not
>> memory. Arrays are more memory efficient than linked lists - no
>> per-item pointers, and only a one-off malloc overhead.
>>
>> Even for time, it depends on how big the stack/deque will get. For
>> small sizes arrays are a good choice. Realloc problems are minimised,
>> time to shift the items up/down is trivial, and there are cache
>> benefits to keeping the items in adjacent memory locations.
>>
>> For large deques, what you really want is a list of small arrays. A
>> trade-off between the costs and benefits of both lists and arrays.
>> IIRC this is pretty much what std::deque does in C++. But even then,
>> pushing and popping on the small arrays is needed to implement the
>> full deque logic.
>>
>> And of course, that large deque class could overload those operators
>> to allow the same syntax.
>>
>> I just think methods are a better choice than operators for this.
>>
> 
> To pop an element off the front of an array requires either losing that chunk of memory or moving the entire rest of the array back such that that memory is still used.  Pushing an element onto either side of an array involves expanding the memory used by that array, which often involves duplication.  Using an array as a stack or queue is a waste of MEMORY, not CPU cycles.
> 
>  - Gregor Richards

Erm, totally misread everything.  Wowza.

Yeah, that'd be CPU cycles :P

 - Gregor Richards
September 06, 2006
Gregor Richards wrote:
> Gregor Richards wrote:
>> Steve Horne wrote:
>>> On Wed, 06 Sep 2006 10:51:32 -0700, Gregor Richards
>>> <Richards@codu.org> wrote:
>>>
>>>> Arrays are arrays, not lists.  You can abuse them into stacks, queues and the ilk, but it's an inefficient use of memory.
>>>
>>> Sorry for pedanting, but it's an inefficient use of CPU cycles, not
>>> memory. Arrays are more memory efficient than linked lists - no
>>> per-item pointers, and only a one-off malloc overhead.
>>>
>>> Even for time, it depends on how big the stack/deque will get. For
>>> small sizes arrays are a good choice. Realloc problems are minimised,
>>> time to shift the items up/down is trivial, and there are cache
>>> benefits to keeping the items in adjacent memory locations.
>>>
>>> For large deques, what you really want is a list of small arrays. A
>>> trade-off between the costs and benefits of both lists and arrays.
>>> IIRC this is pretty much what std::deque does in C++. But even then,
>>> pushing and popping on the small arrays is needed to implement the
>>> full deque logic.
>>>
>>> And of course, that large deque class could overload those operators
>>> to allow the same syntax.
>>>
>>> I just think methods are a better choice than operators for this.
>>>
>>
>> To pop an element off the front of an array requires either losing that chunk of memory or moving the entire rest of the array back such that that memory is still used.  Pushing an element onto either side of an array involves expanding the memory used by that array, which often involves duplication.  Using an array as a stack or queue is a waste of MEMORY, not CPU cycles.
>>
>>  - Gregor Richards
> 
> Erm, totally misread everything.  Wowza.
> 
> Yeah, that'd be CPU cycles :P

Not necessarily.  Appending to an array that doubles its size on reallocations is an amortized O(1) operation.  And since arrays display far better locality than linked-lists, the potentially far superior cache performance can result in a noticeable performance increase over linked-list implementations of some data structures.  Stacks are relatively trivial to implement in arrays (push/pop from the back), and even queues can be made fairly efficient (if there is a soft upper bound on their size) by using a circular array.  Heaps and binary trees do fairly well in arrays also, since insertions and deletions do not require moving data.  That isn't to say that arrays are always the best choice for implementing such data structures, but I think it's a much grayer area than you're suggesting.  Particularly when real-world issues such as cache misses come into play.


Sean
September 07, 2006
On Wed, 06 Sep 2006 15:43:42 -0700, Sean Kelly <sean@f4.ca> wrote:

>Gregor Richards wrote:
>> Erm, totally misread everything.  Wowza.

No problem.

To be honest, I regretted posting. I have a habit of being too abrupt and causing offense. Plus I have a lot of time on my hands ATM, and getting bored and writing reams on silly stuff can easily end up looking like criticism. Sorry if thats what happened.

>Not necessarily.  Appending to an array that doubles its size on reallocations is an amortized O(1) operation.

That reminds me of an article in Dr. Dobbs, back when I subscribed. Got a few good ideas from that. It might be time to get their latest CD again, if I can afford it ;-)

Array size doubling can sound like a terrible waste of memory, or at least it did to me when I first read about it. But then again, efficiency with data structures is all about trade-offs.

I have a thing about multiway trees (B trees, B+ trees, etc). In memory, not just on disk. Compared with binary trees, they have better locality and cause less heap fragmentation for the same reasons that arrays do, though not to the same extreme. But if you like multiway trees, it is absurd to complain about the memory waste when doing array size doubling. Both guarantee that the memory allocated is at least 50% used, barring very very small datasets, but multiway trees have per-node overheads. And (for B+-style trees) branch nodes that don't hold data items at all.

Data structures are kind of a horses for courses thing. These days, you can get away with a lot while using whatevers to hand. Bung a few thousand items in any container and you won't have much to worry about. Custom allocators can fix a lot of problems even when they do show up. Even so, those choices always have costs as well as benefits, as the dataset gets bigger, and those costs can sometimes be a big problem.

I was surprised to see hash tables built into D. I had that one in my 'scripting language' stereotype. But it's probably a good thing.

But I still think there should be alternative containers in the library (assuming they're not there anyway and I've just missed them). I know the term 'perfect hash', but the term is precisely that - not 'perfect container'! There are things that hashtables can't do (or at least, can't do very efficiently for large datasets) like in-order iterating or picking out an ordered subrange.

-- 
Remove 'wants' and 'nospam' from e-mail.
September 07, 2006
Steve Horne wrote:
> On Wed, 06 Sep 2006 15:43:42 -0700, Sean Kelly <sean@f4.ca> wrote:
> 
>> Gregor Richards wrote:
>>> Erm, totally misread everything.  Wowza.
> 
> No problem.
> 
> To be honest, I regretted posting. I have a habit of being too abrupt
> and causing offense. Plus I have a lot of time on my hands ATM, and
> getting bored and writing reams on silly stuff can easily end up
> looking like criticism. Sorry if thats what happened.

...

> But I still think there should be alternative containers in the
> library (assuming they're not there anyway and I've just missed them).
> I know the term 'perfect hash', but the term is precisely that - not
> 'perfect container'! There are things that hashtables can't do (or at
> least, can't do very efficiently for large datasets) like in-order
> iterating or picking out an ordered subrange.

So what sort of silly stuff would you write about possible benefits derived from having a list data structure built into the language?
« First   ‹ Prev
1 2