December 25, 2009
bearophile Wrote:

> Steven Schveighoffer:
> 
> >The new code beats the old runtime in both tests<
> 
> Are you testing Phobos or Tango? Because I think Tango may have a performance bug there that's missing in Phobos (another even worse performance bug that I think is present in Tango is on the isIn_r of built in associative arrays).

D2, so phobos.

-Steve
December 25, 2009
Leandro Lucarella Wrote:

> Steven Schveighoffer, el 24 de diciembre a las 12:59 me escribiste:
> > In fact, semantics are simpler.  You can completely eliminate the explanation about stomping in the spec.
> 
> What about the cache? You have to explain to the users that there is
> a possibility that their appends works blazing fast, except... well,
> sometimes if you fill the cache they will suck. As gruzone said, find this
> kind of weird behaviours will be very hard.

I don't think you do.  We don't explain currently in the spec that if you append to 2 arrays in lockstep, performance drops off a cliff, I don't see why we should have to explain rare corner cases for the new code either.

BTW, after you start appending to several arrays in lockstep, the bottleneck seems to shift away from the append function and to something else -- no amount of cache helps.  I think its because the array pages are all interleaving and so cannot be extended.

> > No example?
> 
> No, I said that I think arrays/slices as they are have better performance (when you know exactly what you're doing and pre-allocate and do that kind of tricks). Unless you are asking for some other kind of examples?

I meant *any* kind of gain -- performance or otherwise.  If you have no examples where separated slices perform better or are not as awkward as combined slices/arrays, then I don't see why you'd want them.  What I'm looking for is:

This code sucks with combined arrays/slices:

....

Now, look how good it is with split arrays/slices:

....


> > You think splitting the features into 2 different types with several overlapping features is easier to explain?
> 
> Yes. And I think they have overlapping features just like any other
> 2 containers, hashes and arrays have overlapping features, but they are
> completely different beasts. Same goes for arrays and slices.
> 
> PHP have arrays and hashes combined in one type. You think that's a good idea because arrays and hashes have overlapping features?

I hate to say this, and it has nothing to do with this argument, but I'm actually pretty impressed with php's arrays.  The rest of php, not so much.  They perform like hashes, but are ordered and sortable.  I tried to find out how they are implemented, but the code is so buried in the zend engine that I didn't know where to look.

> I really don't,
> and I do think arrays and slices are 2 completely different beasts that
> needs to be separated in the language. I think that's the natural good
> thing to do, not an optimization. I think D is broken by combining this
> 2 types into one.

They are only different because you say they are.  To me they are no different.  In fact, I don't even think we need to distinguish between arrays and slices, they should just all be arrays.  The result of slicing an array is -- another array.  I don't see the issue with that.

> You don't think that (and it looks like Andrei and
> Walter doesn't think that either) so there's no way we would agree on this
> topic.

Probably not.

> > I think that is why Andrei nixed the T[new] idea, because when he tried to explain it in TDPL, it was very unnatural.
> 
> Well, I don't know what Andrei tried to do, but I think Python does pretty well with it, for example.

I don't have any experience with python, so I can't really say anything about it.

> > I think the whole basis of your argument is invalid -- the current array/slice combination type is easier to use and explain *and* performs better.
> 
> I can say exactly the same about you. That's why I said I don't argue anymore about this (grrr! I don't know why I ended up doing it in this thread). This is a matter of taste, you think array/slice combined are easier to use and I don't.

I'd love to see an example where you think this is the case.  If it's simply a matter of taste, then you have no argument.  If that's the case, it's just a personal choice, and at that point it's up to Walter.

> About performance, they only spot where
> splitting arrays and slices would perform worse than the current
> slice/arrays combined is in that you add an extra indirection for arrays
> because of the reference semantics (but you need to pass around a word
> less than with slices, so I don't know how the real impact of that would
> be).

If you were to make arrays a true reference type, I don't think you would have an allocated block of metadata pointing to a separate allocated block of data, I'd expect all to be in one block.  In that case, you don't have 2 levels of indirection.

I don't necessarily think that reference based arrays would perform much worse than the current scheme for normal usages, but the performance problem I see is when you want to append to a slice, you not only have the runtime reallocating when it may not need to (if the slice is at the end of an allocated block), but you have to spell out that you're changing the slice to an array by changing the type *before* appending.  Then it's on you to make sure the reallocation is necessary.  e.g. if you are appending one slice to another, and the slice being appended is 0 length at runtime, in order to avoid a costly reallocation, you have to check for this at runtime.

I just don't see why you think it's easier.

-Steve
December 25, 2009
Steven Schveighoffer wrote:
>> Also, if you use GC.malloc directly, and the user tries to append to slices to it, your code may break. GC.malloc doesn't seem to pad the memory block with a length field.
> 
> Yes, this is a possible problem.  However, using GC.malloc directly and then treating the result as a normal array is probably extremely rare.  At least it is not valid for safe D.  It probably should be explained as a danger in GC.malloc, but I don't think this will adversely affect most code.  There will be functions that should obviate the need for calling GC.malloc to allocate an array (specifically, allocating uninitialized data).

Maybe the whole "used size of memory block" (aka array length) thing should be built directly into the GC. Leaving GC.malloc this way would be a bit "shaky", and asks only for strange Heisenbugs.

>> I must say that I find your argumentation strange: didn't you say adding an additional field to the slice struct is too much overhead?
> 
> Overhead when passing around, not overhead for storage.  For example, pushing 12 bytes onto the stack instead of 8 when calling a function with a slice.  If you want safe appends, you need to store the allocation length somewhere, there's no way around that.

But would it really matter outside of microbrenchmarks?

In cases where it _really_ matters, even slices with two members may be overhead, and programmers will split the slice into separate variables anyway as they optimize their code.

> developing stuff for Tango (Tango uses slices to get every ounce of performance!), I'm convinced that as-fast-as-possible slice semantics for passing around data is essential.

I've often seen Tango code mess with pointers directly, where slices could have been used for clearer, more readable code. I assume this was to get more performance.
December 25, 2009
Steven Schveighoffer, el 25 de diciembre a las 06:07 me escribiste:
> > PHP have arrays and hashes combined in one type. You think that's a good idea because arrays and hashes have overlapping features?
> 
> I hate to say this, and it has nothing to do with this argument, but I'm actually pretty impressed with php's arrays. The rest of php, not so much.

OK, now is clearer than ever that we will never agree on this =)

> They perform like hashes, but are ordered and sortable.  I tried to find out how they are implemented, but the code is so buried in the zend engine that I didn't know where to look.

BTW, they are not "ordered", they preserve the "order of appending" (like
lists or, well, arrays):

$ php -a
Interactive mode enabled
<?
$a = array(1,2);
$a[9] = 9;
$a[5] = 5;
foreach ($a as $k => $v)
	printf("%s: %s\n", $k, $v);
0: 1
1: 2
9: 9
5: 5

I think they are a plain hash with a next pointer for the iteration, I don't think they have much more magic than that.

> > > I think the whole basis of your argument is invalid -- the current array/slice combination type is easier to use and explain *and* performs better.
> > 
> > I can say exactly the same about you. That's why I said I don't argue anymore about this (grrr! I don't know why I ended up doing it in this thread). This is a matter of taste, you think array/slice combined are easier to use and I don't.
> 
> I'd love to see an example where you think this is the case.  If it's simply a matter of taste, then you have no argument.  If that's the case, it's just a personal choice, and at that point it's up to Walter.

Exactly. That's what I'm saying about 3 mails ago. I just think it's easier to think in terms of arrays and slices separately (I know it is for me). And besides that, you can have simpler implementations, without making them very tightly coupled with the GC, but you already said too that for you is not a concern the runtime complexity.

> I just don't see why you think it's easier.

It's just how my mind work. I just like simpler concepts separated than a type that fit it all a la PHP. Maybe Python has something to do with it because it works like this. I learned Python after PHP and it seems much cleaner for me.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Le pedí que me enseñe a usar el mouse
Pero solo quiere hablarme del Bauhaus
Le pregunté si era chorra o rockera
Me dijo "Gertrude Stein era re-tortillera"
Me hizo mucho mal la cumbiera intelectual
1 2 3
Next ›   Last »