June 01, 2013
On 31/05/13 23:58, Steven Schveighoffer wrote:
> On Fri, 31 May 2013 00:48:47 -0400, Peter Williams
>>
>> That makes programming much easier, doesn't it.  I'll just avoid it by
>> using:
>>
>> a = a ~ b;
>>
>> instead of:
>>
>> a ~= b;
>
> If you care nothing for performance, this certainly is a way to go.
>
>> where I think it might be an issue or is that broken too?
>
> This is a conservative "always reallocate" methodology, it should work
> just like you allocated a new array to hold a and b.

That's what I assumed.  I'm still getting used to the idea that "a <op>= b" isn't just a shorthand for "a = a <op> b".

>
> If a is frequently large, and b is frequently small, you will kill your
> performance vs. a ~= b.

I doubt that I'll be doing it often enough (i.e. only when I think that it's an issue) for it to matter.  The only time I have two variables for the same array is when I pass one to a function as a parameter, and if I'm intending to modify it in the function I'll pass it by reference so that there are no gotchas.

I do like the idea that "~=" is generally cheap as it potentially makes building lists easy (is there any need for the singly linked list in D?) and I may modify some of my code.  I've been allocating arrays using "new array[size]" where I know that "size" will be the max needed but that it may be too much (i.e. throwing away duplicates), inserting into the array and then adjusting "length" to whatever I used.  In the case where it's highly likely that the whole array will fit in a page I might as well allocate an empty array and use "~=".  NB there's only one copy of the array.
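To be concrete, what I've been doing is roughly this (maxNeeded, input and wanted() are just placeholders for my real code):

    auto buf = new int[](maxNeeded);   // over-allocate to the known upper bound
    size_t used = 0;
    foreach (x; input)
        if (wanted(x))                 // e.g. skipping duplicates
            buf[used++] = x;
    buf.length = used;                 // trim to what was actually filled

whereas the append version would just be:

    int[] buf;
    foreach (x; input)
        if (wanted(x))
            buf ~= x;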

Peter


June 03, 2013
On Fri, 31 May 2013 21:04:47 -0400, Peter Williams <pwil3058@bigpond.net.au> wrote:

> On 31/05/13 23:58, Steven Schveighoffer wrote:
>> On Fri, 31 May 2013 00:48:47 -0400, Peter Williams
>>>
>>> That makes programming much easier, doesn't it.  I'll just avoid it by
>>> using:
>>>
>>> a = a ~ b;
>>>
>>> instead of:
>>>
>>> a ~= b;
>> This is a conservative "always reallocate" methodology, it should work
>> just like you allocated a new array to hold a and b.
>
> That's what I assumed.  I'm still getting used to the idea that "a <op>= b" isn't just a shorthand for "a = a <op> b".

It is if you don't care about where it lands :)  I understand that in some cases, it's important or desirable to dictate whether extension-in-place is used or not, but in most cases, you don't care, you just want to change an array to contain more data.
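Roughly (untested):

int[] a = [1, 2, 3];
auto p = a.ptr;
a ~= 4;      // may extend in place, so a.ptr might still equal p
a = a ~ 5;   // concatenation always allocates, so a.ptr no longer equals p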

>> If a is frequently large, and b is frequently small, you will kill your
>> performance vs. a ~= b.
>
> I doubt that I'll be doing it often enough (i.e. only when I think that it's an issue) for it to matter.  The only time I have two variables for the same array is when I pass one to a function as a parameter, and if I'm intending to modify it in the function I'll pass it by reference so that there are no gotchas.

The only real gotchas come if you append *and then* modify the original data.  If you append *after* modifying the original data, or don't modify the original data, then there is no issue.  If you only ever have one reference to the array data, there is no issue.

There really are very few cases where appending can cause curious behavior.  For the most part, it's undetected.
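The risky ordering looks like this:

int[] a = [1, 2, 3];
int[] b = a;   // two slices over the same data
b ~= 4;        // append through one of them...
a[0] = 99;     // ...then modify the original data
// whether b[0] is now 99 depends on whether the append reallocated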

> I do like the idea that "~=" is generally cheap as it potentially makes building lists easy (is there any need for the singly linked list in D?) and I may modify some of my code.  I've been allocating arrays using "new array[size]" where I know that "size" will be the max needed but that it may be too much (i.e. throwing away duplicates), inserting into the array and then adjusting "length" to whatever I used.  In the case where it's highly likely that the whole array will fit in a page I might as well allocate an empty array and use "~=".  NB there's only one copy of the array.

You can .reserve the space that you need ahead of time.  Then appending will always deterministically go into the reserved block, and won't reallocate.  This should be relatively quick.  It's not as quick as pre-allocating the entire array and then writing the data directly -- you still need calls into the runtime for appending.
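Something like:

int[] a;
a.reserve(10_000);              // one call into the runtime up front
assert(a.capacity >= 10_000);
foreach (i; 0 .. 10_000)
    a ~= i;                     // these all land in the reserved block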

The appending feature of D arrays/slices is intended to be "good enough" for most usages, not horrendously slow, but also not super-optimized for specific purposes.

And yes, we still need linked lists, arrays are good for appending, but not inserting :)

-Steve
June 04, 2013
On 04/06/13 00:57, Steven Schveighoffer wrote:
> On Fri, 31 May 2013 21:04:47 -0400, Peter Williams
> <pwil3058@bigpond.net.au> wrote:
>> I do like the idea that "~=" is generally cheap as it potentially
>> makes building lists easy (is there any need for the singly linked
>> list in D?) and I may modify some of my code.  I've been allocating
>> arrays using "new array[size]" where I know that "size" will be the
>> max needed but that it may be too much (i.e. throwing away duplicates)
>> inserting into the array and then adjusting "length" to whatever I
>> used.  In the case where it's highly likely that the whole array will
>> fit in a page I might as well allocate an empty array and use "~=".
>> NB there's only one copy of the array.
>
> You can .reserve the space that you need ahead of time.  Then appending
> will always deterministically go into the reserved block, and won't
> reallocate.  This should be relatively quick.  It's not as quick as
> pre-allocating the entire array and then writing the data directly --
> you still need calls into the runtime for appending.

That's great news.  When I tried to implement what I described above it didn't go as well as planned (because I'd misunderstood how much gets allocated) and I was thinking that what's needed is a way to tell the compiler how much to allocate at the start.  And now you tell me there is a way.

This is one of the things I like about learning D.  Almost every time I say to myself "I wish there was an easier way to do this" it turns out that there is :-).  Some of them are easier to discover than others, though.

>
> The appending feature of D arrays/slices is intended to be "good enough"
> for most usages, not horrendously slow, but also not super-optimized for
> specific purposes.
>
> And yes, we still need linked lists, arrays are good for appending, but
> not inserting :)

I use a = a[0..i] ~ v ~ a[i..$] for insertion into a sorted array as I'm willing to pay the cost of allocation for the convenience of array notation.  One advantage is that finding i is O(log(a.length)) instead of O(a.length()).  I also reasoned that the compiler can use memcopy() (or whatever its name is) to do the reallocation and therefore it should be fairly quick.
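For the record, finding i is just a binary search.  Something along these lines (using std.range.assumeSorted) is what I mean, assuming I've got the incantation right:

    import std.range : assumeSorted;

    auto i = a.assumeSorted.lowerBound(v).length;   // binary search, O(log a.length)
    a = a[0 .. i] ~ v ~ a[i .. $];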

I also do a = a[0..i] ~ a[i + 1..$] to remove an item but am starting to suspect this isn't as good an idea as for the insert.  Maybe something like:

auto tmp = a[i + 1..$];
a.length = i;
a ~= tmp;

would be more efficient?

Peter


June 04, 2013
On Mon, 03 Jun 2013 20:06:14 -0400, Peter Williams
<pwil3058@bigpond.net.au> wrote:

> On 04/06/13 00:57, Steven Schveighoffer wrote:
>> On Fri, 31 May 2013 21:04:47 -0400, Peter Williams
>> <pwil3058@bigpond.net.au> wrote:
>>> I do like the idea that "~=" is generally cheap as it potentially
>>> makes building lists easy (is there any need for the singly linked
>>> list in D?) and I may modify some of my code.  I've been allocating
>>> arrays using "new array[size]" where I know that "size" will be the
>>> max needed but that it may be too much (i.e. throwing away duplicates)
>>> inserting into the array and then adjusting "length" to whatever I
>>> used.  In the case where it's highly likely that the whole array will
>>> fit in a page I might as well allocate an empty array and use "~=".
>>> NB there's only one copy of the array.
>>
>> You can .reserve the space that you need ahead of time.  Then appending
>> will always deterministically go into the reserved block, and won't
>> reallocate.  This should be relatively quick.  It's not as quick as
>> pre-allocating the entire array and then writing the data directly --
>> you still need calls into the runtime for appending.
>
> That's great news.  When I tried to implement what I described above it didn't go as well as planned (because I'd misunderstood how much gets allocated) and I was thinking that what's needed is a way to tell the compiler how much to allocate at the start.  And now you tell me there is a way.
>
> This is one of the things I like about learning D.  Almost every time I say to myself "I wish there was an easier way to do this" it turns out that there is :-).  Some of them are easier to discover than others, though.

I added the capacity, reserve, and assumeSafeAppend array methods when I updated the append code; they've been there for a while now, since the beginning of 2010.  It was cited in the article mentioned earlier too.  You might want to re-read that part (it's near the end).

>> The appending feature of D arrays/slices is intended to be "good enough"
>> for most usages, not horrendously slow, but also not super-optimized for
>> specific purposes.
>>
>> And yes, we still need linked lists, arrays are good for appending, but
>> not inserting :)
>
> I use a = a[0..i] ~ v ~ a[i..$] for insertion into a sorted array as I'm willing to pay the cost of allocation for the convenience of array notation.  One advantage is that finding i is O(log(a.length)) instead of O(a.length()).  I also reasoned that the compiler can use memcopy() (or whatever its name is) to do the reallocation and therefore it should be fairly quick.

ugh.  Sorry, the performance miser in me must object :)

There are better ways to do this, using range operations.  I'm pretty sure you could do it with std.algorithm.copy.

> I also do a = a[0..i] ~ a[i + 1..$] to remove an item but am starting to suspect this isn't as good an idea as for the insert.  Maybe something like:
>
> auto tmp = a[i + 1..$];
> a.length = i;
> a ~= tmp;
>
> would be more efficient?

No, because that will also reallocate, just like your original.  This one is REALLY easy to get right, because it can be done in place (assuming a is not const/immutable).

You can even do:

import core.stdc.string : memmove;

memmove(&a[i], &a[i + 1], (a.length - i - 1) * a[0].sizeof); // byte count, not element count
a.length--;

For some reason, D doesn't support overlapping moves, otherwise, this would work:

a[i..$-1] = a[i + 1..$];
a.length--;

I think std.algorithm.remove can do the same thing in one line.
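Something like this, I believe (untested):

import std.algorithm : remove;

a = a.remove(i);   // shifts the tail left in place and returns the shortened slice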

-Steve
June 04, 2013
On 04/06/13 11:56, Steven Schveighoffer wrote:
> On Mon, 03 Jun 2013 20:06:14 -0400, Peter Williams
>> That's great news.  When I tried to implement what I described above
>> it didn't go as well as planned (because I'd misunderstood how much
>> gets allocated) and I was thinking that what's needed is a way to tell
>> the compiler how much to allocate at the start.  And now you tell me
>> there is a way.
>>
>> This is one of the things I like about learning D.  Almost every time
>> I say to myself "I wish there was an easier way to do this" it turns
>> out that there is :-).  Some of them are easier to discover than
>> others, though.
>
> I added the capacity, reserve, and assumeSafeAppend array methods when I
> updated the append code; they've been there for a while now, since the
> beginning of 2010.  It was cited in the article mentioned earlier too.
> You might want to re-read that part (it's near the end).

I'm finding that the mind tends to skip (even potentially useful) bits when reading about computer languages that are similar to ones that I'm familiar with.  I've had the same thing happen with Andrei's book, i.e. when I discover something new I say to myself "Why didn't Andrei mention this?" only to discover (on rereading the pertinent section) that he did, but it didn't register.  I've decided to reread his book cover to cover.

>
>>> The appending feature of D arrays/slices is intended to be "good enough"
>>> for most usages, not horrendously slow, but also not super-optimized for
>>> specific purposes.
>>>
>>> And yes, we still need linked lists, arrays are good for appending, but
>>> not inserting :)
>>
>> I use a = a[0..i] ~ v ~ a[i..$] for insertion into a sorted array as
>> I'm willing to pay the cost of allocation for the convenience of array
>> notation.  One advantage is that finding i is O(log(a.length)) instead
>> of O(a.length()).  I also reasoned that the compiler can use memcopy()
>> (or whatever its name is) to do the reallocation and therefore it
>> should be fairly quick.
>
> ugh.  Sorry, the performance miser in me must object :)
>
> There are better ways to do this, using range operations.  I'm pretty
> sure you could do it with std.algorithm.copy.
>
>> I also do a = a[0..i] ~ a[i + 1..$] to remove an item but am starting
>> to suspect this isn't as good an idea as for the insert.  Maybe
>> something like:
>>
>> auto tmp = a[i + 1..$];
>> a.length = i;
>> a ~= tmp;
>>
>> would be more efficient?
>
> No, because that will also reallocate,

Wouldn't the a.length = i prevent that?

> just like your original.  This
> one is REALLY easy to get right, because it can be done in place
> (assuming a is not const/immutable).
>
> You can even do:
>
> memmove(&a[i], &a[i + 1], (a.length - i - 1) * a[0].sizeof);
> a.length--;
>
> For some reason, D doesn't support overlapping moves,

Probably because there are some instances where it would be a disaster, and explaining all the cases where you can and can't becomes too difficult, so it's just easier to say no to all cases.

> otherwise, this
> would work:
>
> a[i..$-1] = a[i + 1..$];
> a.length--;
>
> I think std.algorithm.remove can do the same thing in one line.

As I said somewhere else, there's a need for a book on Phobos.

Thanks
Peter
June 04, 2013
On Mon, 03 Jun 2013 23:38:12 -0400, Peter Williams <pwil3058@bigpond.net.au> wrote:

> On 04/06/13 11:56, Steven Schveighoffer wrote:
>> On Mon, 03 Jun 2013 20:06:14 -0400, Peter Williams

>>> auto tmp = a[i + 1..$];
>>> a.length = i;
>>> a ~= tmp;
>>>
>>> would be more efficient?
>>
>> No, because that will also reallocate,
>
> Wouldn't the a.length = i prevent that?

No.  The runtime specifically will reallocate in this case.

It does not know that tmp is never going to be used again, nor does it know that there aren't any other references to the data past i.  It MUST reallocate to avoid stomping (in fact, prior to my changes it DID overwrite).

Case in point, if it didn't reallocate, the above would be copying tmp over itself, offset by one!

The runtime expects that when it is appending data, it is doing so into space that is currently unused.  You can use assumeSafeAppend to tell it "the data after this is unused", but then it better be unused :)  In your case, you are using it (via tmp), so that doesn't work.
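Where assumeSafeAppend does work is when you really are done with the tail, e.g.:

int[] a = [1, 2, 3, 4, 5];
a.length = 3;           // shrink; the runtime still assumes a[3 .. 5] may be referenced elsewhere
a.assumeSafeAppend();   // promise that nothing else refers to the old tail
a ~= 42;                // now this can go in place instead of reallocating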

>> For some reason, D doesn't support overlapping moves,
>
> Probably because there are some instances where it would be a disaster, and explaining all the cases where you can and can't becomes too difficult, so it's just easier to say no to all cases.

No, memmove is valid C code and supports overlapping writes.  It's not as optimized as memcpy, which does not allow overlap.

But I don't know exactly what the difference is.  A simple check at the beginning can determine which mode to use; memmove should devolve to memcpy if there is no overlap.  And in some cases, the compiler specifically knows whether there is overlap at compile time.

-Steve
June 04, 2013
On 04/06/13 11:56, Steven Schveighoffer wrote:
> On Mon, 03 Jun 2013 20:06:14 -0400, Peter Williams
> <pwil3058@bigpond.net.au> wrote:
>> I use a = a[0..i] ~ v ~ a[i..$] for insertion into a sorted array as
>> I'm willing to pay the cost of allocation for the convenience of array
>> notation.  One advantage is that finding i is O(log(a.length)) instead
>> of O(a.length()).  I also reasoned that the compiler can use memcopy()
>> (or whatever its name is) to do the reallocation and therefore it
>> should be fairly quick.
>
> ugh.  Sorry, the performance miser in me must object :)

I worry about correctness first and then come back and worry about efficiency.  That way I have tests in place to make sure I don't break things during tweaking.  :-)

>
> There are better ways to do this, using range operations.  I'm pretty
> sure you could do it with std.algorithm.copy.

To insert "newElement" into array "a" at index "i" where "i <= a.length", I've come up with:

    import std.algorithm : copy;
    import std.range : retro;

    a ~= newElement;
    if (a.length > 1 && i < a.length - 1) {
        copy(retro(a[i .. $ - 1]), retro(a[i + 1 .. $]));
        a[i] = newElement;
    }

which works (i.e. it is correct in that it passes my unit tests).  I'm assuming allocation will only occur if it's required for the first line?

Peter
PS It's not as pretty as the original.
PPS For remove, "a = a.remove(index);" does the job.

June 04, 2013
On 04/06/13 15:25, Peter Williams wrote:
> On 04/06/13 11:56, Steven Schveighoffer wrote:
>> On Mon, 03 Jun 2013 20:06:14 -0400, Peter Williams
>> <pwil3058@bigpond.net.au> wrote:
>>> I use a = a[0..i] ~ v ~ a[i..$] for insertion into a sorted array as
>>> I'm willing to pay the cost of allocation for the convenience of array
>>> notation.  One advantage is that finding i is O(log(a.length)) instead
>>> of O(a.length()).  I also reasoned that the compiler can use memcopy()
>>> (or whatever its name is) to do the reallocation and therefore it
>>> should be fairly quick.
>>
>> ugh.  Sorry, the performance miser in me must object :)
>
> I worry about correctness first and then come back and worry about
> efficiency.  That way I have tests in place to make sure I don't break
> things during tweaking.  :-)
>
>>
>> There are better ways to do this, using range operations.  I'm pretty
>> sure you could do it with std.algorithm.copy.
>
> To insert "newElement" into array "a" at index "i" where "i <= a.length",
> I've come up with:
>
>      a ~= newElement;
>      if (a.length > 1 && i < a.length - 1) {
>          copy(retro(a[i .. $ - 1]), retro(a[i + 1 .. $]));
>          a[i] = newElement;
>      }
>
> which works (i.e. it is correct in that it passes my unit tests).  I'm
> assuming allocation will only occur if it's required for the first line?
>
> Peter
> PS It's not as pretty as the original.
> PPS For remove, "a = a.remove(index);" does the job.

I've since discovered std.array.insertInPlace(), which does the job in a single call.
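That is:

    import std.array : insertInPlace;

    a.insertInPlace(i, newElement);   // shifts the tail over and writes newElement at index i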

I found it interesting that std.algorithm has a remove but no insert and std.array has an insert but no remove.

Peter

June 11, 2013
"Michel Fortin" <michel.fortin@michelf.ca> wrote in message news:ko5dl5$b2v$1@digitalmars.com...
> On 2013-05-29 16:02:58 +0000, "Daniel Murphy" <yebblies@nospamgmail.com> said:
>
>> Introduce *C (syntax not important) to give you the raw class type, much like the raw function type.  You can then apply const directly to this type, and an appropriate suffix gets you back to the reference.
>
> Are you sure you're not starting from the wrong assumption? There's no such thing as a "raw class type" in the compiler that is separate from a "class reference".
>

I know, this would be an addition.

>> This should reduce the compiler changes required, as I recall much of the complexity was due to changing the meaning of the existing type.
>
> To implement what you want you'd have to separate the current class type into two types, which would change pretty much everything related to classes in the compiler.
>

Yes, my hope is that this is less disruptive than changing 'everything related to modifiers' as your patch did.

> My technique for "const(Object)ref" was to minimize those changes. What I ended up adding is a way for a type to have head modifiers different from its regular modifiers (but only classes are allowed to have different head modifiers). Then I went on a hunt for places checking modifiers in the compiler (code such as `c->mod`) and convert them to use head modifiers instead (`c->head()->mod`) where it made sense. It took some time, but it's not really difficult once you figure it out.
>

I am familiar with your implementation.

>> This would also play better with template argument deduction, as there was no clear way to define it when ref was optional. The inconsistent handling of arrays and pointers has since been fixed (eg const(T*) matching const(U*), U becomes const(T)* and the same for arrays) so there is a clear pattern to follow.
>
> What was problematic for template argument deduction was the lack of a coherent example of how it should work -- which as you said has been fixed since -- not the optionality of ref.
>

This is not true.

What does const(C) matching (T : const(U), U) give?

If we follow what arrays and pointers do, it removes one level of const,
giving:
const(C) : const(U)
U == const(C)ref

If you then match with (T : const(U)ref, U)

You get the 'ref' suffix removed, giving const(C), then one level of const
removed, giving:
const(C)ref

This is the problem I hit with template deduction.  It is possible to define the matching differently for classes... but the problem comes from the 'ref' suffix being optional.

My observation is that class head types behave similarly to function types - you can point to them but cannot instantiate them.  My hope is that most of the class stuff will stay in TypeClass, with the new reference type leveraging the existing default treatment of TypeNext-derived types.

I guess we'll see if I ever get some time to implement this.


June 12, 2013
On 2013-06-11 23:24:56 +0000, "Daniel Murphy" <yebblies@nospamgmail.com> said:

> What does const(C) matching (T : const(U), U) give?
> 
> If we follow what arrays and pointers do, it removes one level of const,
> giving:
> const(C) : const(U)
> U == const(C)ref

Exactly. That works.

> If you then match with (T : const(U)ref, U)
> 
> You get the 'ref' suffix removed, giving const(C), then one level of const
> removed, giving:
> const(C)ref
> 
> This is the problem I hit with template deduction.  It is possible to define
> the matching differently for classes... but the problem comes from the 'ref'
> suffix being optional.

If you can manage to patch DMD as you suggest, then it'll be theoretically more sound, and there's a good chance the resulting code in the compiler (at the semantic level at least) will be cleaner than what I did, so I'm all for it.

I fail to see how getting a "non-reference" type for the class (through U in this template) would be useful, though. You can't use that type directly; all you can do is add a 'ref' after it.

My fear is that you'll just move some weird behaviour from the semantic to the syntactic level. You'll have a true reference type that'll be implicitly there but optional at the same time. Well, maybe. That's just a feeling I have. By all means, give it a try so we know how it fares.

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca/