June 02, 2013
On 6/2/13 9:59 AM, Manu wrote:
> I've never said that virtuals are bad. The key function of a class is
> polymorphism.
> But the reality is that in non-tool or container/foundational classes
> (which are typically write-once, use-lots; you don't tend to write these
> daily), a typical class will have a couple of virtuals, and a whole
> bunch of properties.

I've argued that if no dispatch is needed, just make those free functions. _Everything_ in a class is supposed to be overridable, unless inherited and explicitly "final"ized. It's sort of a historical accident that things got the way they are. But in D we know better because we have the module-level privacy model and UFCS. So we should break clean from history.
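
For illustration, a minimal sketch of that style (the Circle type and its functions here are invented, not from any real code):

import std.math : PI;
import std.stdio;

class Circle
{
    private double _radius;

    this(double r) { _radius = r; }

    // The genuinely polymorphic operation stays in the class.
    void draw() { writeln("circle of radius ", _radius); }
}

// No dispatch needed, so these live at module level. Module-level
// privacy lets them read _radius, and UFCS lets callers write
// c.radius and c.area as if they were members -- with no vtable entry.
@property double radius(Circle c) { return c._radius; }

double area(Circle c) { return PI * c._radius ^^ 2; }

void main()
{
    auto c = new Circle(2.0);
    writeln(c.radius, " ", c.area);
}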


Andrei
June 02, 2013
On 6/2/13 12:16 PM, John Colvin wrote:
> A lot of HPC scientific code is, at best, horribly fragile.

Reminds me of that internal joke at www.llnl.gov. "If we make a mistake, millions of people will live." True story.

Andrei
June 02, 2013
On 06/02/2013 05:53 PM, Roy Obena wrote:
> You're making this up. I'm sure they do a lot of data-driven tests or simulations that make most errors detectable. They may not be savvy programmers, and their programs may not be error-free, but boat-loads of errors? C'mon.

I don't think he's making this up.  I would not want to make any assumptions about any particular institutions, and I would like to believe that institutions with large-scale, long-term computational projects have better practices, but I think that most people in maths and physics research have very limited experience of good code design and testing practices (working with D, and its unit-testing framework and contract guarantees, has certainly been an eye-opener for me).

Generally speaking it's true that in maths and physics there are often either theoretical calculations or empirical data points for you to compare your computational results to, so you can usually confirm that your program is doing what it's supposed to, but not always.

There may not be boat-loads of errors in terms of output, but I bet there are boat-loads of loopholes that will result in insane mistakes if the input were to step outside the use-cases or parameter constraints the researcher has considered.  I don't exclude my own code from that criticism, and the reason it's tolerated is because it's often the quickest way to get to working code; you know that you have to constrain parameters in such-and-such a way for it to work, and you know that you _will_ constrain yourself accordingly.  But of course it's easy to shoot yourself in the foot when you start tweaking things. D's assert() and enforce() functions, and contract checks, are very useful here.
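
For instance, a hypothetical sketch (the routine and its parameter constraints are invented for illustration):

import std.exception : enforce;

// Toy integrator: only valid for step sizes in (0, 1] -- the kind of
// constraint that otherwise lives only in the researcher's head.
double integrate(double delegate(double) f, double a, double b,
                 double step)
in
{
    assert(a < b, "integration bounds must satisfy a < b");
}
body
{
    // enforce() still fires in release builds, unlike assert().
    enforce(step > 0 && step <= 1, "step size must be in (0, 1]");
    double sum = 0;
    for (double x = a; x < b; x += step)
        sum += f(x) * step;
    return sum;
}

unittest
{
    import std.math : approxEqual;
    assert(approxEqual(integrate(x => x, 0.0, 1.0, 1e-4), 0.5));
}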
June 02, 2013
On 06/01/2013 08:22 PM, Walter Bright wrote:
> On 5/30/2013 7:56 PM, Andrei Alexandrescu wrote:
>> On 5/30/13 9:26 PM, finalpatch wrote:
>>> https://dl.dropboxusercontent.com/u/974356/raytracer.d
>>> https://dl.dropboxusercontent.com/u/974356/raytracer.cpp
>>
>> Manu's gonna love this one: make all methods final.
>
> I have another suggestion. class Sphere and class Ray should be structs.
> Neither class uses polymorphism in any way, so there's no reason to make
> them classes with virtual functions.
>
I was about to write that too, but when the goal is to write a scene graph, polymorphism is what you want, i.e. Sphere is just one kind of object and there are different lights. Ray already is a struct.
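
Roughly the split being suggested (a sketch -- names guessed from the discussion, not the actual code behind the links):

// Ray needs no polymorphism, so a plain value type suffices.
struct Ray
{
    float[3] origin;
    float[3] dir;
}

// The scene graph holds many kinds of objects, so intersect() is the
// virtual operation that justifies Sphere being a class at all.
abstract class SceneObject
{
    abstract float intersect(ref const Ray r) const;
}

class Sphere : SceneObject
{
    float[3] center;
    float radius;

    // final on the leaf override lets the compiler devirtualize
    // calls made through a Sphere reference.
    override final float intersect(ref const Ray r) const
    {
        // ... actual ray-sphere test elided ...
        return -1;
    }
}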
June 02, 2013
On 05/31/2013 06:29 AM, Juan Manuel Cabo wrote:
> I just shaved 1.2 seconds trying with dmd by changing the dot function from:

Yep, using -profile on the original code shows that 33% of the time is spent in Sphere.intersect. And the asm for the dot product is horrible.
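
The exact change is cut off in the quote above, but the classic rewrite for a 3-component dot product is to drop the array-op temporary and spell the components out (a hypothetical reconstruction, not Juan's actual diff):

// Before (roughly): the array op avoids heap allocation, but it goes
// through an intermediate static array and a loop that dmd doesn't
// collapse into straight-line code.
float dotSlow(const float[3] a, const float[3] b)
{
    float[3] tmp;
    tmp[] = a[] * b[];
    return tmp[0] + tmp[1] + tmp[2];
}

// After: three multiplies and two adds, trivially inlinable.
float dot(const float[3] a, const float[3] b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}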
June 02, 2013
On 6/2/2013 5:29 AM, Paulo Pinto wrote:
> There was an office there that had the sentence "You can program Fortran in any
> language" on the door. :)

I think that joke is older than me!

June 02, 2013
On 6/2/2013 4:10 AM, Iain Buclaw wrote:
> -O3 is covered in the GDC case (turns on -finline-functions).

The reason -inline is a separate switch is for debugging purposes (to set breakpoints on the function that would have been inlined), and also to generate profile statistics.

June 02, 2013
On 06/02/2013 04:34 PM, Manu wrote:
> Well this is another classic point actually. I've been asked by my friends at
> Cambridge to give their code a once-over for them on many occasions, and while I
> may not understand exactly what their code does, I can often spot boat-loads of
> simple functional errors. Like basic programming bugs; out-by-ones, pointer
> logic fails, clear lack of understanding of floating point, or logical structure
> that will clearly lead to incorrect/unexpected edge cases.
> And it blows my mind that they then run this code on their big sets of data,
> write some big analysis/conclusions, and present this statistical data in some
> journal somewhere, and are generally accepted as an authority and taken seriously!
> 
> *brain asplode*

Yes, I can imagine.  I've seen more than enough researcher-written code that made my own brain explode, and I don't consider myself in any way expert in program design.  You have to hope that there were sufficient checks against empirical or theoretical results, so that at least any error was minimized ...

What bothers me more than the "trust" issues about the code is that very often, the code is never made available for review.  It's astonishing how timid journals and funding organizations are about trying to resolve this.

> I can tell you I usually offer more in the way of fixing basic logical errors
> than actually making it run faster ;)
> And they don't come to me with everything, just the occasional thing that they
> have a hunch should probably be faster than it is.

I certainly went through a phase of extreme speed-related paranoia in my C programming past, and I think it's a common trait.  Speed is the thing you worry about because it's the most observable problem.  And of course C tends to bias you towards daft micro-optimization rather than basic things like getting the algorithms right.

> I hope my experience there isn't too common, but I actually get the feeling it's
> more common that you'd like to think!
> This is a crowd I'd actually love to promote D to! But the tools they need
> aren't all there yet...

I think it's probably very common, exacerbated by the fact that most researchers (not just in maths and physics) are amateur, self-taught programmers with limited time to study the art of programming for itself, and limited opportunities for formal training, who are under great pressure to produce research results quickly and continuously.

I do what I can to promote D to colleagues, and I've had one or two people come up to me spontaneously and ask about it (because they've seen my name on the mailing lists), so I think the interest is growing and it will get there.  90% of the worries I have with it concern 3rd-party libraries -- obviously C/C++ gives you access to a much wider range of stuff without having to write bindings (which to me at least, is a scary prospect, not so much the technical side as the hassle and time requirement of having to write and maintain them).  The other 10% of the worries are about potential issues with the standard library that might result in incorrect results (actually, I'm pretty confident here really, but there are a number of known issues in std.random which I make sure to work around).

> Yeah, this is an interesting point. These friends of mine all write C code, not
> even C++.
> Why is that?
> I guess it's promoted, because they're supposed to be into the whole 'HPC'
> thing, but C is really not a good language for doing maths!

Well, I can't speak for proper HPC because it's not my field (I work in complexity science, which does involve a lot of intensive computation but has developed somewhat independently of traditional HPC fields).  However, my guess would be that it's a mix of what people are first trained in, together with a measure of conservatism and "What's the lowest common denominator?".  I also don't think I can stress enough how true it is that mathematicians, physicists and other researchers tend to be trained to program _in C_, or in C++, or in FORTRAN, rather than _how to program_.

Speed concerns might be a factor, as C++ offers you rather more ways to shoot yourself in the foot speed-wise than C -- there might be some prejudice about C being the one to use for really heavy-duty computation, though I imagine that says more about the programmer's skill than the reality of the languages.

In my own field the norm these days seems to be a hybrid of C/C++ and Python -- the former for the high-intensity stuff, the latter for convenience or to have a friendly surface from which to call the high-intensity routines, although libraries like NumPy seem to be challenging the C dominance for some intense calculations -- I'm seeing an increasing number of Python libraries being written and used.

That said, I don't think language lock-in is unique to mathematicians and physicists.  Many of the computer scientists I've worked with have been wedded to Java with a strength that is astonishing given that you'd expect them to be trained well enough to really appreciate the variety of choices available.  (In my experience, mathematicians and physicists tend to be far more comfortable with the command line and in programming without an IDE.  That may of course explain some of the errors, too:-)

> I see stuff like this:
> float ***cubicMatrix = (float***)malloc(sizeof(float**)*depth);
> for(int z=0; z<depth; z++)
> {
>   cubicMatrix[z] = (float**)malloc(sizeof(float*)*height);
>   for(int y=0; y<height; y++)
>   {
>     cubicMatrix[z][y] = (float*)malloc(sizeof(float)*width);
>   }
> }
> 
> Seriously, float***. Each 1d row is an individual allocation!
> And then somewhere later on they want to iterate a column rather than a row, and
> get confused about the pointer arithmetic (well, maybe not precisely that, but
> you get the idea).

Oh yes, I've written code like that. :-P

I can only say that it's the way that I was shown how to create matrices in C. I can't remember the context; possibly I read it in a book, or possibly it was by browsing other code, possibly it was in lecture notes.

That said, it's the _obvious_ way to create a matrix (if you think of a matrix as being an entity whose values are accessed by indices [x][y][z]), and if you're not trained in program design, the obvious way tends to be the thing you pick, and it seems to work, so ...

I mean, I guess (I've never had call to do it, so never looked into the detail) the way to _really_ build an effective matrix design is to have a single array and wrap it with functions that translate x, y, z indices to appropriate array locations.  But you wouldn't think of that as a novice programmer, and unless someone teaches you, the multi-level alloc solution probably seems to work well enough that you never think to question it.  (For the avoidance of doubt: I'm at least experienced enough to have questioned it before this email exchange:-)
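
In D, that single-allocation design is only a few lines (a sketch, with the index mapping made explicit):

// One contiguous allocation; opIndex maps (z, y, x) to a flat offset.
struct Matrix3D
{
    private float[] data;
    private size_t depth, height, width;

    this(size_t d, size_t h, size_t w)
    {
        depth = d; height = h; width = w;
        data = new float[d * h * w];
    }

    // Row-major layout: stepping x walks memory contiguously, and
    // stepping a "column" is just a strided walk over the same block,
    // with no per-row pointers to get confused about.
    ref float opIndex(size_t z, size_t y, size_t x)
    in { assert(z < depth && y < height && x < width); }
    body { return data[(z * height + y) * width + x]; }
}

unittest
{
    auto m = Matrix3D(4, 3, 2);
    m[1, 2, 0] = 3.14f;
    assert(m[1, 2, 0] == 3.14f);
}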

That kind of "works well enough" probably explains 99% of the programming faults made by researchers, together with the fact that they very rarely encounter anyone with the experience to question their approach -- and of course, their own sense of the problems within their code can (as you've experienced) be very different from the problems an experienced developer will focus on.

I have to say that it'd be very tempting to try and organize an annual event (and maybe also an online space) where researchers using computation are brought together with genuinely expert developers for lectures, brainstorming and collaboration, with the specific aim of getting away from these kinds of habitual errors.
June 02, 2013
On 06/02/2013 06:44 PM, Andrei Alexandrescu wrote:
> On 6/2/13 12:16 PM, John Colvin wrote:
>> A lot of HPC scientific code is, at best, horribly fragile.
> 
> Reminds me of that internal joke at www.llnl.gov. "If we make a mistake, millions of people will live." True story.

A story I heard from a lecturer of mine -- one of the interesting little factors of the Cold War was how Soviet Russia managed to keep parity with NATO in terms of missile guidance and other computer-related military technologies.  The US had far superior hardware, so it was both a concern and a mystery.  What came out after the end of the Cold War was quite impressive -- Soviet scientists had realized very well that they couldn't compete on the hardware front and so had focused a very intense effort on really, really efficient algorithms that squeezed every drop of performance out of the hardware they had available, far greater performance than Western computer scientists had ever imagined possible.
June 02, 2013
On Sunday, June 02, 2013 12:37:38 Andrei Alexandrescu wrote:
> On 6/2/13 9:59 AM, Manu wrote:
> > I've never said that virtuals are bad. The key function of a class is
> > polymorphism.
> > But the reality is that in non-tool or container/foundational classes
> > (which are typically write-once, use-lots; you don't tend to write these
> > daily), a typical class will have a couple of virtuals, and a whole
> > bunch of properties.
> 
> I've argued that if no dispatch is needed, just make those free
> functions. _Everything_ in a class is supposed to be overridable,
> unless inherited and explicitly "final"ized. It's sort of a
> historical accident that things got the way they are. But in D we
> know better because we have the module-level privacy model and UFCS.
> So we should break clean from history.

There are four problems with that:

1. Very few programmers think that way. The normal thing in almost every OO language is to put all of the functions on the class, so pretty much no one is going to make them free functions. Do you really expect people to put properties (what would have been getters and setters in other languages) outside the class? It's not going to happen. And by default, everyone takes a performance hit as a result (folks like Manu and Don care about that performance hit more than many of us, but it's still there).

2. The class' functions are no longer encapsulated inside the class. For some programmers, this is a big deal. They want all of the class' functionality on the class where it's easy to find. Having UFCS makes using free functions less of a problem, but many programmers will absolutely hate the idea of putting their non-virtual functions outside of the class, so they won't do it, and they (and everyone using their code) will end up with virtual functions when the functions shouldn't be virtual.

3. In many cases, putting a function outside of the class is a royal pain. This is particularly true with templated classes. If you have

class C(T)
{
   private T _var;
}

and you want a property to give you var, you end up with something ugly like

@property auto var(U)(U this_)
    if(is(U V == C!W, W))  // matches only when U is some instantiation C!W
{
    return this_._var;
}

Do you really expect many programmers to be able to pull off complicated is expressions like that? _I_ have to look it up every time I use it. Sure, using free functions might work in simple cases, but it starts falling apart when you have to deal with stuff like templated types.

4. It causes more ambiguities and compilation errors. If a function is on the class, it always wins. If it's a free function, then you potentially have ambiguities due to overload sets. Making sure that the function takes the exact type significantly reduces the problem, but someone else could easily create a function with the same signature which now conflicts with the one which is effectively supposed to be a member function.
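
A contrived illustration of point 4 (all names invented):

// widget.d
module widget;
struct Widget { int id; }

// a.d
module a;
import widget;
int process(Widget w) { return 1; }

// b.d
module b;
import widget;
int process(Widget w) { return 2; }

// app.d
module app;
import a, b, widget;

void test(Widget w)
{
    // Error: a.process and b.process form conflicting overload sets,
    // so this call won't compile. Had process been a member function
    // of Widget, it would simply have won.
    // auto x = w.process();
}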


I also don't see what trying to turn all of these member functions into free functions to use with UFCS buys us. Given that we have virtual by default, it's one way to avoid the function being virtual, but it's just simpler to mark it final if that's what you're trying to do. I think that the viewpoint that every function in a class is supposed to be overridable is demonstrably false. How many functions in your average class get overridden? How often does final get used (or virtual not get used in the case of C++)? I think that it's quite clear that a lot of programmers don't want all of their class functions to be virtual.

It seems to me that your argument is based purely on the idea that the main reason to use a class is polymorphism, but that's not the only reason to use a class, and just because you have a class doesn't mean that you want everything it does to be polymorphic. And you're saying that everything that isn't polymorphic doesn't belong in a class just because the main purpose of a class is polymorphism. I don't buy that at all. If the average programmer agreed with you, they would have been using free functions all along, but the vast majority of them put all of the functions on the class, and I don't think that UFCS is going to change that at all.

- Jonathan M Davis