View mode: basic / threaded / horizontal-split · Log in · Help
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Denis Koroskin wrote:
>> You can reuse memory even if it comprehends more complex patterns
>> than one single allocation.
>> 
>> Andrei
> 
> I *highly* doubt it is worth the trouble. Most likely, this container
> won't be lock-free and scalable anymore. Performance will also
> degrade dramatically.

How did you get to "most likely"? There's so many factors in the 
equation. What is the alternative? Why wouldn't the code be lock-free? 
Why would it not be scalable?

> Besides, the more I think about thread-local/shared separation that
> is going to come to D2, the more I see that there is absolutely no
> need to make Phobos containers thread-safe. Most containers will be
> thread-local and won't need any synchronization at all.

I agree. But you also don't want to preclude using a shared container 
either.


Andrei
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Sat, 02 May 2009 18:17:16 +0400, Denis Koroskin <2korden@gmail.com> wrote:

> On Sat, 02 May 2009 18:08:30 +0400, Andrei Alexandrescu  
> <SeeWebsiteForEmail@erdani.org> wrote:
>
>> Robert Jacques wrote:
>>> On Sat, 02 May 2009 03:35:51 -0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail@erdani.org> wrote:
>>>
>>>> Robert Jacques wrote:
>>>>>> No, repeated allocations are trivial to eliminate. I didn't even
>>>>>> bother to explain that in the article. The loop must only refill the
>>>>>> allocated object from the object that needs to be replaced.
>>>>>  putting the test in the pseudo code for this would've help my
>>>>> understanding.
>>>>>  if(copy is null)
>>>>>     copy = s.deepDup;
>>>>> else
>>>>>     s.deepDupTo(copy);
>>>>
>>>> copy = cast(T*) GC.malloc(T.sizeof);
>>>> do {
>>>>      overwrite(s, copy);
>>>>      copy.modify;
>>>> } while (!cas(copy, s));
>>>  I'm confused. The article talks about copying the entire data
>>> structure, not just a node/etc. And trees, etc tend to have variable
>>> sizes, etc.
>>
>> You can reuse memory even if it comprehends more complex patterns than
>> one single allocation.
>>
>> Andrei
>
> I *highly* doubt it is worth the trouble. Most likely, this container  
> won't be lock-free and scalable anymore. Performance will also degrade  
> dramatically.
>
> Besides, the more I think about thread-local/shared separation that is  
> going to come to D2, the more I see that there is absolutely no need to  
> make Phobos containers thread-safe. Most containers will be thread-local  
> and won't need any synchronization at all.

I meant, most essential ones (Array, Set, Map, HashMap, SList, DList, IntrusiveSList, IntrusiveDList etc).
Lock-free containers are needed, too, but I believe they should be designed after thread-unsafe ones.
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Robert Jacques wrote:
> I do scientific computing. Generally, I find it breaks down into two 
> parts: things under 4x4, for which value types are probably better, and 
> everything else, for which value types are to be avoided like the 
> plague. I'll often work with 100's mb of data with algorithms that take 
> minutes to hours to complete. So an unexpected copy is both hard to find 
> (am I slow/crashing because of my algorithm, or because of a typo?) and 
> rather harmful, because its big.

I don't buy this. Undue copying is an issue that manifests itself 
locally, reproducibly, and debuggably. Contrast with long-distance 
coupling which is bound to hard to debug. You change a matrix here, and 
all of a sudden a matrix miles away has been messed up. Also, efficiency 
can be fixed with COW, whereas there is nothing you can do to fix the 
coupling aside from relentless and patient user education.

Walter gave me a good argument (little did he know he was making a point 
destroying his.) Consider the progress we made when replacing char[] 
with string. Why? Because with char[] long-distance dependencies crop up 
easy and fast. With string you know there's never going to be a 
long-distance dependency. Why? Because unlike char[], content 
immutability makes string as good as a value.

I remember the nightmare. I'd define a little structure:

struct Sentence
{
    uint id;
    char[] data;
}

Above my desk I have a big red bulb along with an audible alarm. As soon 
as I add the member "data", the bulb and the alarm go off. Sentence is 
now an invalid struct - I need to add at least constructor and a 
postblit. In the constructor I need to  call .dup on the incoming data, 
and in the postblit I need to do something similar (or something more 
complicated if I want to be efficient). This is a clear example of code 
that is short and natural, yet does precisely the wrong thing. This is 
simply a ton of trouble, as experience with C++ has shown.

I'm not even getting into calling functions that take a char[] and 
keeping fingers crossed ("I hope they won't mess with it") or .dup-ing 
prior to the call to eliminate any doubt (even though the function may 
anyway call .dup internally). string has marked huge progress towards 
people considering D seriously.

> But I've generally worked on making 
> something else fast so more data can be crunched, etc. Actual prototype 
> work (for array/matrix based stuff at least) is often done in Matlab, 
> which I think uses COW under-the-hood to provide value semantics. So I 
> think anyone turning to D to do scientific computing will know reference 
> semantics, since they'd already be familiar with them from C/C++, etc 
> (Fortran?). Although successfully attracting algorithm prototypes from 
> Matlab/python/mathmatica/R/etc is probably bigger issue than just the 
> container types, growing the pie was why the Wii won the last console wars.

Fortran uses pass by reference, but sort of gets away with it by 
assuming and promoting no aliasing throughout. Any two named values in 
Fortran can be assumed to refer to distinct memory. Also unless I am 
wrong A = B in Fortran does the right thing (copies B's content into A). 
Please confirm/infirm.

For all I know, Matlab does the closest to "the real thing". Also, C++ 
numeric/scientific libraries invariably use value semantics in 
conjunction with expression templates meant to effect loop fusion. Why? 
Because value semantics is the right thing and C++ is able to express 
it. I should note, however, that Perl Data Language uses reference 
semantics (http://tinyurl.com/derlrh).

There's also a definite stench when one realizes that

a = b;

on one side, and

a = b * 1;

or

a = b + 0;

on the other, do completely different things.

So what we're looking at is: languages that had the option chose value 
semantics. Languages that didn't, well, they did what they could.

I started rather neutral in this discussion but the more time goes by, 
the more things tilt towards value semantics.


Andrei
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On 2009-05-02 10:18:41 -0400, Andrei Alexandrescu 
<SeeWebsiteForEmail@erdani.org> said:

>> Yes, when you have an a=b anywhere you've got to pay attention and
>> make sure you didn't mean a=b.dup.
> 
> That is terrible. Anywhere == a portion of the code far, far away.

It's funny that you mention ==, since anywhere you write = you also got 
to pay attention that you didn't really mean ==.

-- 
Michel Fortin
michel.fortin@michelf.com
http://michelf.com/
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Sat, 02 May 2009 06:23:46 -0400, Michel Fortin  
<michel.fortin@michelf.com> wrote:

> On 2009-05-01 13:44:58 -0400, "Robert Jacques" <sandford@jhu.edu> said:
>
>> I think you might of misunderstood the concept of a unique/mobile type.  
>> If  you take a slice or a reference of an unique, you null the source.  
>> Which  makes using unique T logically invalid as a container.
>
> Take note: I'm not expecting it to copy automatically, "unique T[]"  
> isn't a value-type container. I'm solving the problem of slices  
> detaching from one another when you grow them, which I don't think is a  
> problem addressed by value-type containers either. If you grow the slice  
> or container, previously-taken slices of it may or may not be left  
> dangling. Allowing grow only on "unique T[]" makes sure there are no  
> slice to be invalidated when you grow.
>
> Also, generally type systems featuring unique also have that "lent" flag  
> allowing you to pass the unique reference to functions which are  
> required to not keep it anywhere after the function returns. So you  
> could pass lent slices of your unique T[] to functions without breaking  
> uniqueness of the reference.

By the way, the canonical system with a unique/mobile type (CSP) doesn't  
have a lent flag. It lets you circumvent uniqueness instead. Scope/lent's  
definitely better :)

Hmm... lent functions are a fairly restrictive subset, if you consider  
unique T[] to be long lived. However, in the more common string builder  
style of programming, it works just fine. (Though people will still want a  
capacity field). So maybe an array type with unique semantics and a  
capacity field is what's warranted (either language or library type).

Would you also allow T[] ~=? i.e. x~=y; => x = x~y; which is less  
efficient but still valid.

>> Also, shared unique  T can't be converted to shared T. This  
>> incompatibility is the whole point  of having a unique type in the  
>> first place, and negates your concept of  using unique T as containers.
>
> Why couldn't it? You just lose (forever) the uniqueness of the reference  
> when you copy it to a non-unique one.

The whole point of unique is that it lacks any sharing protection  
mechanisms. Instead the object is protected by the fact that only one  
thread has access to it at a time. You could design the system so that  
uniques could be casted to local objects (I don't think it's a good idea,  
but you could), but casting them to shared objects is a recipe for  
disaster, due to races, etc.

>> And generally, unique T implies shared  storage, and can not be used  
>> for thread-local arrays.
>
> Generally, but not necessarily. Any problem for unique be applicable to  
> thread-local arrays?

Nope, the implication comes from the standard use case of cheaply passing  
data between threads.
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
About the issue of matrices as reference or value semantics.

As a program I think references are more efficient, they are used everywhere in many languages like Python/Numpy, Sage, Ruby, etc, and I don't like the idea of putting "ref" everywhere. So I like references more. Also all objects in D are reference-based, so a D programmer is already very used to the concept of reference-based data (the problems in D1 don't come from the fact that arrays are references, but from the fact that their pointer/length is passed by value, so for example if you change the length inside a function, such change can't be seen outside. So by default for dynamic arrays I'd like a full reference semantic).

On the other hand the value semantics is the most natural. Python uses reference semantics as an optimization mean, and not because references are more natural for programming newbies.
To prove this, every day in the python newsgroup people don't understand why:
if the following builds a list (array) of 10 integers:

>>> v = [0] * 10
>>> v
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> v[1] = 5
>>> v
[0, 5, 0, 0, 0, 0, 0, 0, 0, 0]

The following doesn't build a 3*4 matrix:
>>> m = [[0] * 3] * 4
>>> m[1][2] = 6
>>> m
[[0, 0, 6], [0, 0, 6], [0, 0, 6], [0, 0, 6]]

The answer comes from the value semantics of lists in python, plus immutability of natural numbers.
Such really common errors in newbies show that reference semantics isn't natural.
A future Python-like language that wants to be higher-level and less error-prone and more natural than Python will probably use value semantics everywhere (with COW, etc).

And I agree with Andrei Alexandrescu that reference semantics may produce bugs far away from their originating point.

The future of programming is more toward immutable values and functional programming, like the immutable strings of D2 (and some very intelligent people have invented clever purely functional data structures that minimize data copying).
In a purely functional language there's no point in giving the programmer a way to tell references apart from values. They can be seen as the same thing.

Bye,
bearophile
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
Andrei Alexandrescu:
> Consider the progress we made when replacing char[] 
> with string. Why? Because with char[] long-distance dependencies crop up 
> easy and fast.

For big arrays people will put "ref" everywhere, so those bugs will not decrease.

Bye,
bearophile
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Sat, 02 May 2009 10:58:08 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> Robert Jacques wrote:
>> I do scientific computing. Generally, I find it breaks down into two  
>> parts: things under 4x4, for which value types are probably better, and  
>> everything else, for which value types are to be avoided like the  
>> plague. I'll often work with 100's mb of data with algorithms that take  
>> minutes to hours to complete. So an unexpected copy is both hard to  
>> find (am I slow/crashing because of my algorithm, or because of a  
>> typo?) and rather harmful, because its big.
>
> I don't buy this. Undue copying is an issue that manifests itself  
> locally, reproducibly, and debuggably. Contrast with long-distance  
> coupling which is bound to hard to debug. You change a matrix here, and  
> all of a sudden a matrix miles away has been messed up. Also, efficiency  
> can be fixed with COW, whereas there is nothing you can do to fix the  
> coupling aside from relentless and patient user education.
>
> Walter gave me a good argument (little did he know he was making a point  
> destroying his.) Consider the progress we made when replacing char[]  
> with string. Why? Because with char[] long-distance dependencies crop up  
> easy and fast. With string you know there's never going to be a  
> long-distance dependency. Why? Because unlike char[], content  
> immutability makes string as good as a value.
>
> I remember the nightmare. I'd define a little structure:
>
> struct Sentence
> {
>      uint id;
>      char[] data;
> }
>
> Above my desk I have a big red bulb along with an audible alarm. As soon  
> as I add the member "data", the bulb and the alarm go off. Sentence is  
> now an invalid struct - I need to add at least constructor and a  
> postblit. In the constructor I need to  call .dup on the incoming data,  
> and in the postblit I need to do something similar (or something more  
> complicated if I want to be efficient). This is a clear example of code  
> that is short and natural, yet does precisely the wrong thing. This is  
> simply a ton of trouble, as experience with C++ has shown.
>
> I'm not even getting into calling functions that take a char[] and  
> keeping fingers crossed ("I hope they won't mess with it") or .dup-ing  
> prior to the call to eliminate any doubt (even though the function may  
> anyway call .dup internally). string has marked huge progress towards  
> people considering D seriously.

Andrei, you're perfectly right about strings. They've been a god send. But  
strings are small, cheap to copy and tend to roam a lot. Scientific  
computing (at least the kind I do) is very local, and uses large datasets  
which are expensive to copy. I load data and set up the problem  
parameters, then pass through a series of functions and write it out the  
answer. When I first learned Matlab, my prof gave me a great rule of  
thumb: if it's more than 10 lines long, you're probably doing it wrong. My  
code bases tend to be small and focused, and use either my own code or  
libraries I understand since they're logically pure (Ah, math functions (I  
don't mess with rounding)).

And while finding bugs is relatively easy (go small, personal code bases)  
tracking down a performance issue if you don't even know it exists is a  
lot harder/slower.

>> But I've generally worked on making something else fast so more data  
>> can be crunched, etc. Actual prototype work (for array/matrix based  
>> stuff at least) is often done in Matlab, which I think uses COW  
>> under-the-hood to provide value semantics. So I think anyone turning to  
>> D to do scientific computing will know reference semantics, since  
>> they'd already be familiar with them from C/C++, etc (Fortran?).  
>> Although successfully attracting algorithm prototypes from  
>> Matlab/python/mathmatica/R/etc is probably bigger issue than just the  
>> container types, growing the pie was why the Wii won the last console  
>> wars.
>
> Fortran uses pass by reference, but sort of gets away with it by  
> assuming and promoting no aliasing throughout. Any two named values in  
> Fortran can be assumed to refer to distinct memory. Also unless I am  
> wrong A = B in Fortran does the right thing (copies B's content into A).  
> Please confirm/infirm.
>
> For all I know, Matlab does the closest to "the real thing". Also, C++  
> numeric/scientific libraries invariably use value semantics in  
> conjunction with expression templates meant to effect loop fusion. Why?  
> Because value semantics is the right thing and C++ is able to express  
> it. I should note, however, that Perl Data Language uses reference  
> semantics (http://tinyurl.com/derlrh).

Actually, all the big, high performance libraries I know use BLAS/Linpack  
(i.e. reference semantics). Though there are a good number of middle range  
libraries use expression templates.

Hmm... just went googling and found some neat work on mixing expression  
templates with BLAS APIs, including using the GPU. Having you're cake and  
eating it too is nice.

Though personally I like the current array op syntax, which is reference  
based and is just enough to make me aware of doing expensive operations,  
without being annoying:

a[] = b[] + c[];


> There's also a definite stench when one realizes that
>
> a = b;
>
> on one side, and
>
> a = b * 1;
>
> or
>
> a = b + 0;
>
> on the other, do completely different things.

Again, I'd use the []= operator

a   = b + 0; // Error
a[] = b + 0; // Okay

So I don't see this as an issue.

> So what we're looking at is: languages that had the option chose value  
> semantics. Languages that didn't, well, they did what they could.

Well, that's one thing I love about D. We have a value assignment  
operator: []= in addition to a reference semantics assignment operator =  
(well, for reference types at least). I actually use it to do format  
copy/conversions in my own array class.

> I started rather neutral in this discussion but the more time goes by,  
> the more things tilt towards value semantics.

Well, for me, the more this goes on the more I get interested in trying  
value semantics. But if I'm honest, I'd just go on using array slices and  
forget about the array containers. And the lack of good, fast lock-free  
value semantics containers (even the concept of shared data without  
reference semantics) is a border-line deal breaker, depending on who you  
are (It's bad PR regardless).

Also, in a value semantics world, refs are third class citizens, but in a  
reference semantic world, value semantics get their own assignment  
operator ( []= ), and by convention, their own property ( .dup )
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Sat, 02 May 2009 10:18:41 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:
> Bill Baxter wrote:
>> On Fri, May 1, 2009 at 6:25 PM, Andrei Alexandrescu
>>>  Matrix a, b, c; ... c = a; a += b;
>>>  Does the last operation mutate c as well?
>>  I said "assignments like a = b are rare" and you put one of those in  
>> your example.
>
> You said it and I don't buy it one femtosecond. Code does need to copy
> values around. It's a fundamental operation!

Andrei, he said that explicit assignments/copies of arrays/containers are  
rare in numeric computing, which I agree with. Just because it's a  
fundamental operation doesn't mean it gets used much is his (or I guess  
Numply's actually) specific applications. Also, disregarding/disbelieving  
a use case is not helpful to this discussion.

>> Yes, when you have an a=b anywhere you've got to pay attention and
>> make sure you didn't mean a=b.dup.
>
> That is terrible. Anywhere == a portion of the code far, far away.

No, he was talking about a local decision. i.e. Did I or didn't I mean to  
make a logical copy here?
May 02, 2009
Re: RFC: naming for FrontTransversal and Transversal ranges
On Sat, 02 May 2009 10:23:13 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail@erdani.org> wrote:

> Denis Koroskin wrote:
>>> You can reuse memory even if it comprehends more complex patterns
>>> than one single allocation.
>>>  Andrei
>>  I *highly* doubt it is worth the trouble. Most likely, this container
>> won't be lock-free and scalable anymore. Performance will also
>> degrade dramatically.
>
> How did you get to "most likely"? There's so many factors in the  
> equation. What is the alternative? Why wouldn't the code be lock-free?  
> Why would it not be scalable?

Because it dies a horrible death under contention. It's lock-free, but it  
isn't concurrent. Worse, the inner CAS loop is actually doing a lot of  
work, resulting in a lots of wasted CPU cycles. So as long as the write  
rarely, read many paradigm holds, it works. But try this with a stack or a  
queue and a good locking algorithm will beat you every time. Also, there's  
an issue with stale values for readers, but that's a separate thing.

>> Besides, the more I think about thread-local/shared separation that
>> is going to come to D2, the more I see that there is absolutely no
>> need to make Phobos containers thread-safe. Most containers will be
>> thread-local and won't need any synchronization at all.
>
> I agree. But you also don't want to preclude using a shared container  
> either.

I think Phobos containers should come in two forms: shared and local. i.e.
shared Queue!T sq;   /// and
       Queue!T  q;
They don't need to be there at day one, but I think if they aren't able to  
be supported (or slow as molasses), well end of with two container sets  
and lots of code duplication.
4 5 6 7 8 9 10
Top | Discussion index | About this forum | D home