December 27, 2006
"Jeff Nowakowski" <jeff@dilacero.org> wrote in message news:emum7l$21kg$1@digitaldaemon.com...
> Walter Bright wrote:
>> I think the problem is that Java is just lacking in some needed features - like a full set of basic types, simple aggregates, out parameters, etc. The alternatives are computationally expensive, and so the optimizer has a lot of work to do to reverse engineer them and figure out that all the programmer was doing was a workaround for a POD stack aggregate.
>
> Could you expand on the above items?  What basic types are missing? Does a "simple aggregate" refer to structs/tuples?  What does POD stand for, and what is a POD stack aggregate?  Could you show a code example in D that performs badly in Java?  Pardon my ignorance.
>
> -Jeff

Unsigned types are missing in Java.  POD means plain old data.  Walter is talking about D's structs which are instantiated on the stack instead of the heap.  Currently, Java has no way of allocating an aggregate data structure on the stack, although I hear that Sun is working on it.
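
To make the stack-versus-heap point concrete, here is a minimal sketch of what the Java side looks like. Java has no struct, so even a two-field aggregate has to be a class whose instances are separate heap objects; a common workaround is to flatten the fields into parallel primitive arrays. The names `AggregateDemo`, `Point`, `sumObjects` and `sumFlattened` are invented for this example and come from nowhere in the thread.

```java
// Hypothetical example: a "plain old data" aggregate in Java has to be a class.
class AggregateDemo {
    static final class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Array of references: each element is a separately allocated object.
    static double sumObjects(Point[] pts) {
        double total = 0;
        for (Point p : pts) {
            total += p.x + p.y;
        }
        return total;
    }

    // Workaround: keep the fields in parallel primitive arrays, so there are
    // no per-point objects at all.
    static double sumFlattened(double[] xs, double[] ys) {
        double total = 0;
        for (int i = 0; i < xs.length; i++) {
            total += xs[i] + ys[i];
        }
        return total;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1, 2), new Point(3, 4) };
        double[] xs = { 1, 3 };
        double[] ys = { 2, 4 };
        System.out.println(sumObjects(pts));        // 10.0
        System.out.println(sumFlattened(xs, ys));   // 10.0
    }
}
```

The flattened version is the kind of hand-written workaround Walter describes: it gives up the readable `Point` type to avoid the per-object cost.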

-Craig


December 27, 2006
Jeff Nowakowski wrote:
> Walter Bright wrote:
>> I think the problem is that Java is just lacking in some needed features - like a full set of basic types, simple aggregates, out parameters, etc. The alternatives are computationally expensive, and so the optimizer has a lot of work to do to reverse engineer them and figure out that all the programmer was doing was a workaround for a POD stack aggregate.
> 
> Could you expand on the above items?

I'm not Walter, but I can answer most of these:

> What basic types are missing?

The unsigned ones.
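
A minimal sketch of the usual workaround, assuming plain masking into a wider signed type. The class and method names are made up for illustration.

```java
// Hypothetical helper: Java's byte and int are always signed, so an
// "unsigned" value has to be widened and masked by hand.
class UnsignedDemo {
    // Interpret the 8 bits of a signed byte as a value in 0..255.
    static int toUnsignedByte(byte b) {
        return b & 0xFF;
    }

    // Interpret the 32 bits of a signed int as a value in 0..2^32-1.
    static long toUnsignedInt(int i) {
        return i & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;                    // stored as -16
        System.out.println(raw);                   // -16
        System.out.println(toUnsignedByte(raw));   // 240
        System.out.println(toUnsignedInt(-1));     // 4294967295
    }
}
```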

> Does a "simple aggregate" refer to structs/tuples?

Structs, probably.

> What does POD stand for, 

Plain Old Data, e.g. ints, chars, floats, C-style structs (that contain only POD members).
Classes aren't POD types (in D and Java) because they have virtual functions, which require a vtable, disqualifying them.

> and what is a POD stack aggregate?

A POD aggregate stored on the stack if possible, i.e. a C or D struct (with only POD members).

> Could you show a code example in D that performs badly in Java? Pardon my ignorance.

Sorry, can't help there. I haven't used Java in a couple of years...
December 27, 2006
Jeff Nowakowski wrote:
> Walter Bright wrote:
>> I think the problem is that Java is just lacking in some needed features - like a full set of basic types, simple aggregates, out parameters, etc. The alternatives are computationally expensive, and so the optimizer has a lot of work to do to reverse engineer them and figure out that all the programmer was doing was a workaround for a POD stack aggregate.
> 
> Could you expand on the above items?  What basic types are missing? Does a "simple aggregate" refer to structs/tuples?  What does POD stand for, and what is a POD stack aggregate?

POD = Plain Old Data.  In D, a struct is a POD stack aggregate.  And I'm not sure what Walter meant about basic types, but I've been occasionally annoyed at the lack of unsigned primitives in Java.  The out parameter issue (or a lack of tuples) is another one.  It's really not uncommon to want to return/alter two values in a function call, particularly in the case of recursion.  That said, it's obviously possible to re-engineer a design to account for this, but the elegant approach often seems to sacrifice some degree of efficiency.
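
As a rough illustration of the two-return-values case, here is a sketch of a recursive function that needs to hand back both a minimum and a maximum. Without out parameters or tuples, the pair needs its own small class. `MinMaxDemo`, `MinMax` and `minMax` are hypothetical names chosen for this example, and the sketch assumes a non-empty range.

```java
// Hypothetical example: returning two values from one recursive call in Java.
class MinMaxDemo {
    // Java has no out parameters or tuples, so the pair gets its own class,
    // and each call creates a new MinMax object.
    static final class MinMax {
        final int min;
        final int max;
        MinMax(int min, int max) { this.min = min; this.max = max; }
    }

    // Finds the smallest and largest element of a[lo..hi) by divide and
    // conquer. Assumes hi - lo >= 1.
    static MinMax minMax(int[] a, int lo, int hi) {
        if (hi - lo == 1) {
            return new MinMax(a[lo], a[lo]);
        }
        int mid = lo + (hi - lo) / 2;
        MinMax left = minMax(a, lo, mid);
        MinMax right = minMax(a, mid, hi);
        return new MinMax(Math.min(left.min, right.min),
                          Math.max(left.max, right.max));
    }

    public static void main(String[] args) {
        int[] data = { 7, -3, 12, 0, 5 };
        MinMax r = minMax(data, 0, data.length);
        System.out.println(r.min + " " + r.max);   // -3 12
    }
}
```

The code stays readable, but every level of the recursion builds a throwaway object just to carry two ints, which is the efficiency trade-off mentioned above.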


Sean
December 27, 2006
Craig Black wrote:

> 
> Unsigned types are missing in Java.  POD means plain old data.  Walter is talking about D's structs which are instantiated on the stack instead of the heap.  Currently, Java has no way of allocating an aggregate data structure on the stack, although I hear that Sun is working on it.
> 
> -Craig 

It's called escape analysis. You can read a bit about it in this article by Brian Goetz: http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html. And it has been implemented for the first time in the latest version of Java, 1.6.
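
A minimal sketch of the kind of code escape analysis targets, with invented names (`EscapeDemo`, `Vec2`, `distanceSquared`). Whether a particular JVM actually avoids the heap allocation here depends on the JIT and its settings, so treat that part as an assumption rather than a guarantee.

```java
// Hypothetical example of an allocation that never escapes its method.
class EscapeDemo {
    static final class Vec2 {
        final double x;
        final double y;
        Vec2(double x, double y) { this.x = x; this.y = y; }
    }

    // The temporary Vec2 is only used inside this method: it is not returned,
    // stored in a field, or passed to other code. That makes it a candidate
    // for escape analysis; the JIT may or may not remove the allocation.
    static double distanceSquared(double x1, double y1, double x2, double y2) {
        Vec2 delta = new Vec2(x2 - x1, y2 - y1);
        return delta.x * delta.x + delta.y * delta.y;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(0, 0, 3, 4));   // 25.0
    }
}
```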

December 27, 2006
Waldemar wrote:
> Actually, the reality is Java is plenty slow in many applications. There is JNI
> for a reason.  Never mind cases where Java is not even considered (fast servers,
> OS internals, communication, graphics, drivers, embedded, etc, etc.)  As soon as
> Java reaches C/C++ speed, C++ will disappear.  Not to worry, won't happen any
> time soon.
> 
> Having said that, there is always a danger that Sun develops "low level Java" with
> performance truly matching C++.   If that happens D might as well close the shop.
>  Same thing with C#.  MS can definitely do it.  At the moment they go with this
> "safer" C/C++ with many custom libs and features.  But you can feel it's just one
> more step, so watch out.

Last year, I developed a pretty sophisticated machine learning algorithm for my company. The training phase of the algorithm needs to read 5 or 6 gigabytes of data, create incremental indices of commonly occurring features in many different categories, conduct a statistical analysis of all categorical feature histograms, and then plot each of the indices in n-dimensional vector space to create agglomerative category clusters.

There's lots of I/O, and plenty of math involved.

I wrote the code in Java. Thanks to the language's consistent, orthogonal semantics, the code is very easy to read and understand. With only a little profiling and a few tiny optimization tweaks, I was very satisfied with performance.

Then, because this algorithm needed to be deployed to heterogeneous environments, a colleague of mine ported my code to C++. He did a straight transliteration of the code, preserving the same semantics from the Java to C++.

When we timed both implementations, we discovered that mine was 40 percent faster. Several of the C++ developers on my team were completely  incredulous, and they made it their personal quest to optimize the C++ version so that it was the performance winner.

They eventually caught up to, and surpassed, the performance of the Java code. But only after introducing a bunch of ugly optimizations, making the code much more difficult to follow. Of course, the types of optimizations that they performed are unavailable in Java.

The moral of the story is that the JVM is fast. Very fast. And straightforward implementations of algorithms are often faster in Java than in C++. But C++ provides a broader suite of possible micro-optimizations, so it's possible to bend over backwards in order to write *really* fast code. But, when writing equivalent code in C++ and Java, it's tough to say which will have better performance characteristics.

To echo what others have said before me, you will *never* win over any Java programmers to D by emphasizing performance.

The lack of a VM requirement, on the other hand, is very compelling.

--benji
December 27, 2006
Benji Smith wrote:
> Then, because this algorithm needed to be deployed to heterogeneous environments, a colleague of mine ported my code to C++. He did a straight transliteration of the code, preserving the same semantics from the Java to C++.

What did he do about the memory allocation? Use a gc in C++?
December 27, 2006
Walter Bright wrote:
> I haven't worked with Java for over 10 years now. But in March I attended "Java Performance Myths" by Brian Goetz, who is very knowledgeable about getting the most out of Java. He indicated that Java needed another 10 years before it would be able to consistently match C toe to toe. And this is happening despite years of massive investment in Java by a lot of very smart people, just to approach what relatively simple C compilers can do. This to me indicates there's a fundamental problem with Java.


In a head-to-head C vs. D comparison on the computer-go mailing list, it was reported that D was slower than C by a factor of 1.5.  That's close enough for me to consider D sufficiently fast.  I don't know how that compares to Java.

http://www.mail-archive.com/computer-go@computer-go.org/msg00663.html
December 28, 2006
Benji Smith wrote:

> 
> Then, because this algorithm needed to be deployed to heterogeneous environments, a colleague of mine ported my code to C++. He did a straight transliteration of the code, preserving the same semantics from the Java to C++.

Does that mean that wherever you did "new Foo" he did a "new Foo" also?


> When we timed both implementations, we discovered that mine was 40 percent faster. Several of the C++ developers on my team were completely  incredulous, and they made it their personal quest to optimize the C++ version so that it was the performance winner.
> 
> They eventually caught up to, and surpassed, the performance of the Java code. 

Any idea by how much the C++ surpassed the Java in the end?  Was it about the same margin (~40%) or significantly more or less?  It's a big difference between 10x the Java performance vs say only 5% faster.

> To echo what others have said before me, you will *never* win over any Java programmers to D by emphasizing performance.

Probably not, if we're talking about someone who has Java Programmer with a capital P embossed on their business card.  But I hear that it's pretty common these days for schools to teach Java as the main programming language to students.  If that's true, then it seems reasonable that there is a category of people using Java simply because it's what they were taught, and they might be interested in a language with syntax not too far from Java which, at least in benchmarks, significantly outperforms Java:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=dlang&lang2=java
(and which doesn't require a VM)

--bb
December 28, 2006
Waldemar wrote:
> == Quote from Mike Parker (aldacron71@yahoo.com)'s article
>> Bill Baxter wrote:
>>> I wholeheartedly agree with Waldemar, though, that the things that are
>>> going to sway C/C++ folks are different from what's going to sway
>>> Java/C# folks.  D's library and development tools are still rather
>>> anemic, so most likely that would send most Java/C# folks running.  On
>>> the other hand, if they find they need to deliver an app that works
>>> stand-alone, independent of a 100MB runtime environment, or one which
>>> runs at native speed, then D is probably the closest thing they're going
>>> to find to their beloved Java/C# that can do the job.
>>>
>> Being a long time Java programmer, I strongly disagree with you and
>> Waldemar both. Speed is only an issue for people who don't use Java, or
>> for those who don't really know how to properly write software with it.
>> Most Java programmers I know, myself included, call the "Java is slow"
>> mantra a myth. Java *used to* be slow, true. Today, it's possible to
>> code a clunky app in Java if you don't know what you are doing. But the
>> reality is that it's plenty fast in the general case. Plus, Java
> 
> Actually, the reality is Java is plenty slow in many applications. There is JNI
> for a reason.  Never mind cases where Java is not even considered (fast servers,
> OS internals, communication, graphics, drivers, embedded, etc, etc.)  As soon as
> Java reaches C/C++ speed, C++ will disappear.  Not to worry, won't happen any
> time soon.
> 
> Having said that, there is always a danger that Sun develops "low level Java" with
> performance truly matching C++.   If that happens D might as well close the shop.
>  Same thing with C#.  MS can definitely do it.  At the moment they go with this
> "safer" C/C++ with many custom libs and features.  But you can feel it's just one
> more step, so watch out.
> 

D is a great all-around language, but in order to get the benefits of a language like D ("low level Java"), you have to trade some safety and 'hand-holding'. The typical Sun and MS customer is not interested in paying Sun or MS for Java / .NET tools and support only to turn around and have to pay for tools / support for another 'lower level' language with a 'buggy' (because it's newer) port of the same library written for a native compiler.

I seriously doubt Sun or MS would invest in something like that when the market has been moving the other way for so long. For one thing there's the bottom line -- Sun makes its profit from hardware and services [like tuning Java applications] and MS makes much of its profit from new licenses for bloated software sold to run on newer, ever faster hardware.

Neither would really stand to 'profit' from an easy to use, high performance, natively compiled language when there's C/C++ for those jobs already. MS has recently gone to great expense to "retrofit" C++ for .NET, not the other way around. For a large company to produce a new language, go through the growing pains of developing a new language like D, and then retrofit the Java or .NET libs to take advantage of the new language would be an enormous cost. Even with a huge amount of support, C# for example didn't take off overnight.

I forget the name of it, but Sun came up with a language / runtime intended for numerics. It is anything but small, lightweight and single-CPU efficient. It looked like more of a resource pig than even Hotspot. Its purpose was really to sell Sun hardware and services, no doubt.

I see it more likely that once D takes off, MS, Sun and others may then develop their own compilers/runtimes/libs. for D because they need the bundle to make their systems and tools the most appealing to a wide range of customers.
December 28, 2006
Walter Bright wrote:
> Benji Smith wrote:
>> Then, because this algorithm needed to be deployed to heterogeneous environments, a colleague of mine ported my code to C++. He did a straight transliteration of the code, preserving the same semantics from the Java to C++.
> 
> What did he do about the memory allocation? Use a gc in C++?

No, he explicitly deleted all of the objects that he newed.