August 26, 2008
Benji suggested run-time inheritance, and at least from a design
perspective I like some of his thoughts.
I've got a few questions, though:

a) People here said that a virtual call will make it slow. How slow, exactly? How much of an overhead is it on modern hardware, considering also that this is an area where hardware manufacturers spend time on optimizations?

b) Can't a string class use implicit casts and maybe some sugar to
pretend to be a regular array, in such a way as to avoid that virtual call
and still be useful?
You can already do array.func(...) instead of func(array, ...), so this
could be used with a string class that implicitly converts to a char array.

c) compile-time interfaces? (aka concepts)
August 26, 2008
superdan wrote:
> for example look at this fn from std.string.
> 
> int cmp(C1, C2)(in C1[] s1, in C2[] s2);
> 
> so it looks like cmp accepts arrays of any character type. that is
> cool but the [] limits the thing to builtin arrays. the correct sig
> is
> 
> int cmp(S1, S2)(in S1 s1, in S2 s2) if (isSortaString!(S1) &&
> isSortaString!(S2));
> 
> correct?


Yes. It's just that template constraints came along later than std.string.cmp :-)
August 26, 2008
Yigal Chripun wrote:
> a) People here said that a virtual call will make it slow. How slow,
> exactly? How much of an overhead is it on modern hardware, considering
> also that this is an area where hardware manufacturers spend time on
> optimizations?

Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.
August 26, 2008
superdan wrote:
 >> For starters, with strings implemented as character arrays, writing
 >> library code that accepts and operates on strings is a bit of a pain in
 >> the neck, since you always have to write templates and template code is
 >> slightly less readable than non-template code. You can't distribute your
 >> code as a DLL or a shared object, because the template instantiations
 >> won't be included (unless you create wrapper functions with explicit
 >> template instantiations, bloating your code size, but more importantly
 >> tripling the number of functions in your API).
 >
 > so u mean with a class the encoding char/wchar/dchar won't be an
 > issue anymore. that would be hidden behind the wraps. cool.
 >
 > problem is that means there's an indirection cost for every character
 > access. oops. so then apps that decided to use some particular encoding
 > consistently must pay a price for stuff they don't use.

So, I was thinking about the actual costs involved with the String class and CharSequence interface design that I'd like to see (and that exists in languages like Java and C#).

There's the cost of the class wrapper itself, the cost of internally representing and converting between encodings, the cost of routing all method calls through an interface vtable. Characters, if always represented using two bytes, would consume twice the memory. And returning characters from method-calls has got to be slower than accessing them directly from arrays. Right?
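To make the indirection being costed concrete, here's a minimal Java sketch (the interface and class names are hypothetical, a simplified stand-in for CharSequence): character access is routed through a small interface rather than a raw array, so every read is an interface dispatch.

```java
// Hypothetical sketch: routing char access through an interface
// instead of indexing a raw char[] directly.
interface CharSeq {
    int length();
    char charAt(int i); // every access is an interface dispatch
}

final class ArrayString implements CharSeq {
    private final char[] data;
    ArrayString(char[] data) { this.data = data; }
    public int length() { return data.length; }
    public char charAt(int i) { return data[i]; }
}

public class Demo {
    // The consumer only sees the interface, never the array.
    static int sum(CharSeq s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) total += s.charAt(i);
        return total;
    }

    public static void main(String[] args) {
        CharSeq s = new ArrayString("abc".toCharArray());
        System.out.println(sum(s)); // prints 294 (97 + 98 + 99)
    }
}
```

Whether the dispatch actually costs anything is exactly the question: a JIT like HotSpot will often devirtualize a call site like this when only one implementation is loaded.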

So I wrote some tests, in Java and in D/Tango.

The source code files are attached. Both of the tests perform a common set of string operations (searching, splitting, concatenating, and character-iterating). I tried to make the functionality as identical as possible, though I wasn't sure which technique to use for splitting text in Tango, so I used both the "Util.split" and "Util.delimit" functions.

I ran both tests using a 5MB text file, "The Complete Works of William Shakespeare", from the Project Gutenberg website:

http://www.gutenberg.org/dirs/etext94/shaks12.txt

You can grab it for yourself, or you can just run the code against your favorite large text file.

I compiled and ran the Java code on the 1.6.0_06 JDK, with the "-server" flag. The D code was compiled with DMD 1.034 and Tango 0.99.7, using the "-O -release -inline" flags.

My test machine is an AMD Turion 64 X2 dual-core laptop, with 2GB of RAM and running WinXP SP3.

I ran the tests eight times each, using fine-resolution timers. These are the median results:

LOADING THE FILE INTO A STRING:   D/Tango wins, by 428%
    D/Tango: 0.02960 seconds
    Java:    0.12675 seconds

ITERATING OVER CHARS IN A STRING:   Java wins, by 280%
    D/Tango:  0.10093 seconds
    Java:     0.03599 seconds

SEARCHING FOR A SUBSTRING:   D/Tango wins, by 218%
    D/Tango:  0.02251 seconds
    Java:     0.04915 seconds

SEARCH & REPLACE INTO A NEW STRING:   D/Tango wins, by 226%
    D/Tango:  0.17685 seconds
    Java:     0.39996 seconds

SPLIT A STRING ON WHITESPACE:
       Java wins, by 681% (against tango.text.Util.delimit())
       Java wins, by 313% (against tango.text.Util.split())
    D/Tango (delimit): 8.28195 seconds
    D/Tango (split):   3.80465 seconds
    Java (split):      1.21477 seconds

CONCATENATING STRINGS:   Java wins, by 884%
    D/Tango (array concat, no pre-alloc):  4.07929 seconds
    Java (StringBuilder, no pre-alloc):    0.46150 seconds

SORT STRINGS (CASE-INSENSITIVE):   D/Tango wins, by 226%
    D/Tango:  1.62227 seconds
    Java:     3.66389 seconds

It looks like D mostly falls down when it has to allocate a lot of memory, even if it's just allocating slices. The D performance for string splitting really surprised me.
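For reference, the Java side of a split test is presumably just something like String.split, and in a 1.6-era JDK each token it returns is a small String sharing the original backing char[] (substring didn't copy until Java 7u6), so splitting allocates little beyond the object headers. A minimal sketch:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String text = "the quick  brown\tfox";
        // Split on runs of whitespace; allocates one String per token,
        // each (in 1.6-era JDKs) a view over the original char[].
        String[] words = text.split("\\s+");
        System.out.println(words.length); // prints 4
    }
}
```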

I was interested to see, though, that Java was so much faster at iterating through the characters in a string, especially since I used the charAt(i) method of the CharSequence interface, rather than directly iterating through a char[] array, or even calling the charAt method on the String instance itself.

And yet, character iteration in Java is almost 3 times as fast as in D.

Down with premature optimization! Design the best interfaces possible, to enable the most pleasant and flexible programing idioms. The performance problems can be solved. :-P

--benji


August 26, 2008
superdan wrote:
> Benji Smith Wrote:
>> A char[] is actually an array of UTF-8 encoded octets, where each character may consume one or more consecutive elements of the array. Retrieving the str.length property may or may not tell you how many characters are in the string. And pretty much any code that tries to iterate character-by-character through the array elements is fundamentally broken.
> 
> try this:
> 
> foreach (dchar c; str)
> {
>     process c
> }

Cool. I had no idea that was possible. I was doing this:

  void myFunction(T)(T[] array) {
    foreach (T c; array) {
      doStuff(c);
    }
  }
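For what it's worth, the Java analogue of D's `foreach (dchar c; str)` is iterating by code point rather than by char: Java strings are UTF-16, so a character outside the BMP spans two char units (a surrogate pair), just as a non-ASCII character spans multiple bytes in a D char[]. A sketch:

```java
public class CodePoints {
    // Count characters the way D's foreach(dchar) would:
    // decode one whole character per step, however many units it occupies.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // decodes a surrogate pair if present
            i += Character.charCount(cp);  // advance by 1 or 2 chars
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "ab\uD835\uDC00"; // "ab" + MATHEMATICAL BOLD CAPITAL A (U+1D400)
        System.out.println(s.length());         // prints 4 (UTF-16 units)
        System.out.println(countCodePoints(s)); // prints 3 (characters)
    }
}
```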

--benji
August 26, 2008
Walter Bright wrote:
> Benji Smith wrote:
>> You can't distribute your code as a DLL or a shared object, because the template instantiations won't be included (unless you create wrapper functions with explicit template instantiations, bloating your code size, but more importantly tripling the number of functions in your API).
> 
> Is the problem you're referring to the fact that there are 3 character types?

Basically, yeah.

With three different character types, two different array types (static & dynamic), and, in D2, const, invariant, and mutable variants (soon joined by shared and unshared), the number of ways of representing a "string" in the type system is overwhelming.

This afternoon, I was writing some string-processing code that I intend to distribute in a library, and I couldn't help but think to myself, "This code is probably broken for anything but the most squeaky-clean ASCII text."

I don't mind that there are different character types, or that there are different character encodings. But I want to deal with those issues in exactly *one* place: in my string constructor (and, very rarely, during IO). But 99% of the time, I want to just think of the object as a String, with all the ugly details abstracted away.

>> Another good low-hanging argument is that strings are frequently used as keys in associative arrays. Every insertion and retrieval in an associative array requires a hashcode computation. And since D strings are just dumb arrays, they have no way of memoizing their hashcodes.
> 
> True, but I've written a lot of string processing programs (compilers are just one example of such). This has never been an issue, because the AA itself memoizes the hash, and from then on the dictionary handle is used.
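Java's String does this memoization itself, for what it's worth: hashCode() computes the hash lazily on first call and caches it in a field. A sketch of the same trick for a hypothetical key class, using String's own 31*h + c polynomial:

```java
// Sketch of lazy hash memoization, in the style of java.lang.String.
final class CachedKey {
    private final char[] data;
    private int hash; // 0 means "not computed yet" (String uses the same sentinel)

    CachedKey(char[] data) { this.data = data; }

    public int hashCode() {
        int h = hash;
        if (h == 0) {                          // compute at most once
            for (char c : data) h = 31 * h + c;
            hash = h;                          // memoize for later lookups
        }
        return h;
    }
}

public class HashDemo {
    public static void main(String[] args) {
        CachedKey k = new CachedKey("abc".toCharArray());
        // Same polynomial as String, so the values agree:
        System.out.println(k.hashCode() == "abc".hashCode()); // prints true
    }
}
```

Walter's point is that the AA can do this caching on its side instead, so the key type doesn't have to.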

Cool. The hashcode-memoization thing was really just a catalyst to get me thinking. It's really at the periphery of my concerns with Strings.

>> We've already observed that D assoc arrays are less performant than even Python maps, so the extra cost of lookup operations is unwelcome.
> 
> Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Ah. Good point. Thanks for clarifying. I didn't remember all the follow-up details.

>> Most importantly, the contract between the regex engine and its consumers would provide a well-defined interface for processing text, regardless of the source or representation of that text.
> 
> I think a better solution is for regexp to accept an Iterator as its source. That doesn't require polymorphic behavior via inheritance, it can do polymorphism by value (which is what templates do).

That's a great idea.

I should clarify that my referring to an "interface" was in the informal sense. (Though I think actual interfaces would be a reasonable solution.) But any sort of contract between text-data-structures and text-processing-routines would fit the bill nicely.
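Java's regex package took exactly this route, incidentally: Pattern.matcher accepts any CharSequence, not just String, so the engine runs against any text source implementing that small contract. For example, matching directly against a CharBuffer:

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

public class RegexSource {
    public static void main(String[] args) {
        // Any CharSequence works as regex input, not just String.
        CharBuffer buf = CharBuffer.wrap("one two three".toCharArray());
        boolean found = Pattern.compile("t\\w+").matcher(buf).find();
        System.out.println(found); // prints true ("two" matches)
    }
}
```

The difference from Walter's suggestion is that this is interface polymorphism rather than polymorphism by value; a template taking an iterator would resolve the calls at compile time instead.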

>> But then again, I haven't used any of the const functionality in D2, so I can't actually comment on relative usability of compiler-enforced immutability versus interface-enforced immutability.
> 
>  From my own experience, I didn't 'get' invariant strings until I'd used them for a while.

I actually kind of think I'm on the other side of the issue.

I've been primarily a Java programmer (8 years) and secondarily a C# programmer (3 years), so immutable Strings are the only thing I've ever used. Lots of the other JDK classes are like that, too.

So, from my perspective, it seems like the ideal, low-impact way of enforcing immutability is to have the classes enforce it on themselves. I've never felt the need for compiler-enforced const semantics in any of the work I've done.
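A sketch of what "the classes enforce it on themselves" means in Java terms (the type here is hypothetical, but java.lang.String works the same way): a final class with private final fields, no setters, and "mutating" operations that return a new instance.

```java
// Hypothetical immutable type in the java.lang.String style:
// final class, private final fields, no setters.
final class Name {
    private final String value;

    Name(String value) { this.value = value; }

    String value() { return value; }

    // A "mutation" hands back a fresh object; the original is untouched.
    Name uppercased() { return new Name(value.toUpperCase()); }
}

public class ImmutableDemo {
    public static void main(String[] args) {
        Name n = new Name("benji");
        Name u = n.uppercased();
        System.out.println(n.value()); // prints benji (unchanged)
        System.out.println(u.value()); // prints BENJI
    }
}
```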

Thanks for your replies! I always appreciate hearing from you.

--benji
August 26, 2008
"Walter Bright" <newshound1@digitalmars.com> wrote in message news:g90iia$2jc4$3@digitalmars.com...
> Yigal Chripun wrote:
>> a) People here said that a virtual call will make it slow. How slow, exactly? How much of an overhead is it on modern hardware, considering also that this is an area where hardware manufacturers spend time on optimizations?
>
> Virtual function calls have been a problem for hardware optimization. Direct function calls can be speculatively executed, but not virtual ones, because the hardware cannot predict where it will go. This means virtual calls can be much slower than direct function calls.

Modern x86 branch prediction treats indirect calls the same as conditional branches: they get a slot in the branch target buffer, so they do get speculatively executed. And if correctly predicted, an indirect call is only a couple of cycles more costly than a direct call.

See the thread "Feature Request: nontrivial functions and vtable optimizations" about 2 weeks ago.

I cited the technical docs and a few doubters ran benchmarks, which proved that virtual methods are not as evil as many people think. In fact they are no more evil than a conditional branch.

August 26, 2008
superdan wrote:
>> class LinkedList(T)
>> {
>> 	Node!(T) head;
>>
>> 	/** Gets the ith item of the list. Throws: SIGSEGV if i >= length of the list. Time complexity: O(N) for a list of length N. This operation is provided for completeness and not recommended for frequent use in large lists. */
>> 	T opIndex(int i)
>> 	{
>> 		auto current = head;
>> 		while (i--)
>> 		{
>> 			current = current.next;
>> 		}
>> 		return current.value;
>> 	}
>> }
> 
> oopsies. houston we got a problem here. problem is all that pluggable sort business works only if it can count on a constant time opIndex. why? because sort has right in its spec that it takes o(n log n) time. if u pass LinkedList to it you obtain a nonsensical design that compiles but runs afoul of the spec. because with that opIndex sort will run in quadratic time and no amount of commentin' is gonna save it from that.

WRONG!
Those sorting algorithms are correct. Their runtime is now O(n^2 log n) for this linked list.

> this design is a stillborn.
> 
> what needs done is to allow different kinds of containers implement different interfaces. in fact a better way to factor things is via iterators, as stl has shown.

You didn't ask for an O(1) opIndex on a linked list. You asked for a correct opIndex on a linked list. Any sorting algorithm you name that would work on an array would also work on this linked list. Admittedly, insertion sort would be much faster than qsort, but if your library provides that, you, knowing that you are using a linked list, would choose the insertion sort algorithm.
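The quadratic cost being conceded here is easy to see by instrumenting an O(n) get, mirroring the opIndex above (class names are hypothetical): reading all n elements by index walks 0 + 1 + ... + (n-1) links, and an index-based sort multiplies that by its own comparison count.

```java
// Minimal singly linked list with an O(n) get, mirroring the D opIndex above.
class IntList {
    static final class Node {
        int value; Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }
    Node head;
    long hops; // instrumented: counts link traversals

    void push(int v) { head = new Node(v, head); }

    int get(int i) {
        Node cur = head;
        while (i-- > 0) { cur = cur.next; hops++; }
        return cur.value;
    }
}

public class IndexCost {
    public static void main(String[] args) {
        IntList list = new IntList();
        int n = 100;
        for (int i = 0; i < n; i++) list.push(i);
        for (int i = 0; i < n; i++) list.get(i); // one partial scan per index
        // 0 + 1 + ... + 99 = 4950 traversals just to read each element once:
        System.out.println(list.hops); // prints 4950
    }
}
```

This is also why java.util.Collections.sort dumps a LinkedList into an array first: correct through the same interface, but it refuses to pay the indexing penalty.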

>>> how do you make a hashtable, an array, and a linked list obey the same interface? guess hashtable has stuff that others don't support?
>> You have an interface for collections (you can add, remove, and get the length, maybe a couple other things).
> 
> this is an incomplete response. what do you add for a vector!(T)? A T. what do you add for a hash!(T, U)? you tell me. and you tell me how you make that signature consistent across vector and hash.

interface List(T) : Collection!(T) {}

class Vector(T) : List!(T) {}

class HashMap(T, U) : Collection!(KeyValuePair!(T, U)) {}

Have a look at C#'s collection classes. They solved this problem.
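Java's collections factor the same way as the C# hierarchy sketched above, and they answer the "what do you add to a hash?" question directly: a map is viewable as a collection of key/value pairs via entrySet().

```java
import java.util.*;

public class CollectionsDemo {
    public static void main(String[] args) {
        // List<T> and Map<K,V> both bottom out in Collection.
        List<String> names = new ArrayList<String>();
        names.add("a");

        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("a", 30);

        // What do you "add" to a hash? A key/value pair:
        Collection<Map.Entry<String, Integer>> pairs = ages.entrySet();
        System.out.println(names.size() + " " + pairs.size()); // prints 1 1
    }
}
```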

>> You have an interface for lists (they're collections, and you can index them).
> 
> wrong. don't ever mention a linear-time indexing operator in an interview. you will fail it right then. you can always define linear-time indexing as a named function. but never masquerade it as an index operator.

If you create a linked list with O(1) indexing, that might suffice to get you a PhD. If you claim that you can do so in an interview, you should be required to show proof; and should you fail to do so, you will probably be shown the door.

Even if you did prove it in the interview, they would probably consider you overqualified, unless the company's focus was data structures.

>> Then you can use all the collection-oriented stuff with lists, and you can do special list-type things with them if you want.
>>
>>> these are serious questions. not in jest not rhetorical and not trick.
>> Yes, but they're solved problems.
> 
> apparently not since you failed at'em.

You claimed that a problem that has an inefficient solution has no solution at all, and then you berate me for providing that inefficient solution. Why?

>>>> Someone else comes along and writes a library of algorithms. The algorithms can operate on any container that implements the necessary operations.
>>> hm. things are starting to screech a bit. but let's see your answers to your questions above.
>>>
>>>> When someone clever comes along and writes a new sorting algorithm, I can plug my new container class right into it, and get the algorithm for free. Likewise for the guy with the clever new collection class.
>>> things ain't that simple.
>> Collection-oriented library code will care sufficiently about performance that this mix-and-match stuff is not feasible.
> 
> what's that supposed to mean? you sayin' stl don't exist?

No, I'm saying that for efficiency, you need to know about the internals of a data structure to implement a number of collection-oriented algorithms. Just like the linked list example.

>> Almost anything else doesn't care enough to take only an AssociativeArrayHashSet and not a TreeSet or a LinkedList or a primitive array.
> 
> wrong for the reasons above.

They expose very similar interfaces. You might care about which you choose because of the efficiency of various operations, but most of your code won't care which type it gets; it would still be correct.

Well, sets have the property that no element appears in them twice, so that is sometimes an algorithmic consideration.

>>> saw this flick "the devil wears prada", an ok movie but one funny remark stayed with me. "you are in desperate need of chanel." 
>>>
>>> i'll paraphrase. "you are in desperate need of stl." you need to learn stl and then you'll figure why you can't plug a new sorting algorithm into a container. you need more guarantees. and you need iterators.
>>>
>>>> We don't bat an eye at the idea of containers & algorithms connecting to one another using a reciprocal set of interfaces.
>>> i do. it's completely wrong. you need iterators that broker between containers and algos. and iterators must give complexity guarantees.
>> I don't. If I'm just going to iterate through the items of a collection, I only care about the opApply. If I need to index stuff, I don't care if I get a primitive array or Bob Dole's brand-new ArrayList class.
> 
> you too are in desperate need for stl.

You are in desperate need of System.Collections.Generic. Or tango.util.container.
August 26, 2008
Benji Smith:
> It looks like D mostly falls down when it has to allocate a lot of memory, even if it's just allocating slices. The D performance for string splitting really surprised me.

String splitting requires a lot of work from the GC. The GC of HotSpot is light-years ahead of the current D GC. You can see this by measuring just the time the D GC takes to deallocate a large array of split substrings.
I have posted several benchmarks here about this topic.


> I was interested to see, though, that Java was so much faster at iterating through the characters in a string, since I used the charAt(i) method of the CharSequence interface, rather than directly iterating through a char[] array, or even calling the charAt method on the String instance.

HotSpot is able to inline a lot of virtual methods too; D currently can't do such things.


> Down with premature optimization! Design the best interfaces possible, to enable the most pleasant and flexible programing idioms. The performance problems can be solved. :-P

They currently can't be solved by the backends of DMD and GDC; only HotSpot (and maybe the .NET JIT on Windows) is able to do that. I don't know whether LLVM will be able to perform some of those optimizations.

Bye,
bearophile
August 26, 2008
Walter Bright:
> > We've already observed that D assoc arrays are less performant than even Python maps, so the extra cost of lookup operations is unwelcome.
> 
> Every one of those benchmarks that purported to show that D AA's were relatively slow turned out to be, on closer examination, D running the garbage collector more often than Python does. It had NOTHING to do with the AA's.

Really? I must have missed those conclusions then, despite reading all the posts on the subject. What solutions do you propose for the problem then? I recall that disabling the GC didn't improve the situation much. So the problem now becomes how to improve the D GC?

On my site I am keeping a gallery of tiny benchmarks where D code (with DMD) is 10 or more times slower than nearly equivalent Python, C, or Java code (I have about 12 programs so far, very different from each other; there's a benchmark regarding associative arrays too). Hopefully it will become useful once people start tuning the D implementations.

Bye,
bearophile