August 26, 2008
Re: Why Strings as Classes?
BCS wrote:
> Reply to Benji,
> 
> 
>> The new JSON parser in the Tango library operates on templated string
>> arrays. If I want to read from a file or a socket, I have to first
>> slurp the whole thing into a character array, even though the
>> character-streaming would be more practical.
>>
> 
> Unless you are only going to parse the start of the file, or are going to 
> be throwing away most of it *while you parse it, not after*, the best way 
> to parse a file is to load it all in one OS system call and then run a 
> slicing parser (like the Tango XML parser) on that.
> One memory allocation and one load or an mmap, and then only the meta 
> structures get allocated later.

There are cases where you might want to parse an XML file that won't fit 
easily in main memory. I think a stream-processing SAX parser would be a 
good addition to (perhaps not a replacement for) the existing one.
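
For illustration, a minimal sketch of what the "slicing" approach might look like in present-day D (this is not Tango's actual parser, and not the D1 of this thread). The whole input sits in one buffer and each token is a slice into it, so only the array of slices is allocated. `sliceTokens` is a hypothetical name.

```d
// Hypothetical helper: split a buffer on whitespace, returning slices
// that point into the original buffer (the text itself is never copied).
const(char)[][] sliceTokens(const(char)[] buf)
{
    const(char)[][] tokens;
    size_t start = 0;
    bool inToken = false;
    foreach (i, c; buf)
    {
        immutable isSpace = (c == ' ' || c == '\n' || c == '\t' || c == '\r');
        if (!isSpace && !inToken) { start = i; inToken = true; }
        else if (isSpace && inToken) { tokens ~= buf[start .. i]; inToken = false; }
    }
    if (inToken) tokens ~= buf[start .. $];
    return tokens;
}

void main()
{
    string text = "alpha beta\ngamma";
    auto toks = sliceTokens(text);
    assert(toks.length == 3);
    assert(toks[1] == "beta");
    // Each token aliases the input: same memory, no copy.
    assert(toks[1].ptr == text.ptr + 6);
}
```

A streaming parser gives up this aliasing: input that arrives in chunks has to be copied or buffered whenever a token straddles a chunk boundary, which is the price of not holding the whole file in memory.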
August 26, 2008
Re: Why Strings as Classes?
Benji Smith Wrote:

> superdan wrote:
> > Benji Smith Wrote:
> > 
> >> BCS wrote:
> >>> Ditto, D is a *systems language*. It's *supposed* to have access to the 
> >>> lowest level representation and build stuff on top of that
> >> But in this "systems language", it's an O(n) operation to get the nth 
> >> character from a string, to slice a string based on character offsets, 
> >> or to determine the number of characters in the string.
> >>
> >> I'd gladly pay the price of a single interface vtable lookup to turn all 
> >> of those into O(1) operations.
> > 
> > dood. i dunno where to start. allow me to answer from multiple angles.
> > 
> > 1. when was the last time looking up one char in a string or computing length was your bottleneck.
> > 
> > 2. you talk as if o(1) happens by magic that d currently disallows.
> > 
> > 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search.
> > 
> > 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.
> 
> Geez, man, you just keep missing the point, over and over again.

relax. believe me i'm tryin', maybe you could put it a better way and meet me in the middle.

> Let me make one point blisteringly clear: I don't give a shit about the 
> data format. You want the fastest strings in the universe, 
> implemented with zero-byte magic beans and burned into the local ROM. 
> Fantastic! I'm completely in favor of it.

so far so good. 

> Presumably, people will be so into those strings that they'll write a 
> shitload of functionality for them. Parsing, searching, sorting, 
> indexing... the mother lode.

cool.

> One day, I come along, and I'd like to perform some text processing. But 
> all of my string data comes from non-magic-beans data sources. I'd like 
> to implement a new kind of string class that supports my data. I'm not 
> going to push my super-slow string class on anybody else, because I know 
> how concerned with performance you are.

i'm in nirvana.

> But check this out... you can have your fast class, and I can have my 
> slow class, and they can both implement the same interface. Like this:
> 
> interface CharSequence {
>    int find(CharSequence needle);
>    int rfind(CharSequence needle);
>    // ...
> }
> 
> class ZeroByteFastMagicString : CharSequence {
>    // ...
> }
> 
> class SuperSlowStoneTabletString : CharSequence {
>    // ...
> }
> 
> Now we can both use the same string functions. Just by implementing an 
> interface, I can use the same text-processing as your 
> hyper-compiler-optimized builtin arrays.

but maestro. the interface call is already what's costing.

> But only if the interface exists.
> 
> And only if library authors write their text-processing code against 
> that interface.
> 
> That's the point.

then there was none. sorry.

> A good API allows multiple implementations to make use of the same 
> algorithms. Application authors can choose their own tradeoffs between 
> speed, memory consumption, and functionality.
> 
> A rigid builtin implementation, with no interface definition, locks 
> everybody into the same choices.

no. this is just wrong. perfectly backwards in fact. a low-level builtin allows unbounded architectures with control over efficiency.
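
To make the trade-off concrete, here is a sketch of the quoted interface implemented on top of a built-in array, in present-day D. The `length` and `opIndex` members are assumptions added so that `find` can inspect an arbitrary needle; they are not in the interface as posted, and `ArrayString` is a hypothetical name.

```d
// Sketch only: `length` and `opIndex` are assumed additions to the
// CharSequence interface quoted above, so find() can inspect any needle.
interface CharSequence
{
    size_t length();
    char opIndex(size_t i);
    int find(CharSequence needle);
}

// Hypothetical wrapper: the storage is still a plain built-in array.
class ArrayString : CharSequence
{
    private const(char)[] data;

    this(const(char)[] data) { this.data = data; }

    size_t length() { return data.length; }
    char opIndex(size_t i) { return data[i]; }

    // Naive search; needle.length and each needle[j] are interface calls.
    int find(CharSequence needle)
    {
        immutable n = needle.length;
        if (n > data.length) return -1;
        foreach (i; 0 .. data.length - n + 1)
        {
            size_t j = 0;
            while (j < n && data[i + j] == needle[j]) ++j;
            if (j == n) return cast(int) i;
        }
        return -1;
    }
}

void main()
{
    CharSequence hay = new ArrayString("hello world");
    assert(hay.find(new ArrayString("world")) == 6);
    assert(hay.find(new ArrayString("xyz")) == -1);
}
```

Both positions show up here: the class is layered over the array without changing the array, and every access to the needle inside the loop goes through the interface rather than being a direct array index.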
August 26, 2008
Re: Why Strings as Classes?
Robert Fraser Wrote:

> Benji Smith wrote:
> > superdan wrote:
> >> Benji Smith Wrote:
> >>
> >>> BCS wrote:
> >>>> Ditto, D is a *systems language*. It's *supposed* to have access to 
> >>>> the lowest level representation and build stuff on top of that
> >>> But in this "systems language", it's an O(n) operation to get the nth 
> >>> character from a string, to slice a string based on character 
> >>> offsets, or to determine the number of characters in the string.
> >>>
> >>> I'd gladly pay the price of a single interface vtable lookup to turn 
> >>> all of those into O(1) operations.
> >>
> >> dood. i dunno where to start. allow me to answer from multiple angles.
> >>
> >> 1. when was the last time looking up one char in a string or computing 
> >> length was your bottleneck.
> >>
> >> 2. you talk as if o(1) happens by magic that d currently disallows.
> >>
> >> 3. maybe i don't want to blow the size of my string by a factor of 4 
> >> if i'm just interested in some occasional character search.
> >>
> >> 4. implement all that nice stuff you wanna. nobody put a gun to yer 
> >> head not to. understand you can't put a gun to my head to pay the price.
> > 
> > Geez, man, you just keep missing the point, over and over again.
> > 
> > Let me make one point blisteringly clear: I don't give a shit about the 
> > data format. You want the fastest strings in the universe, implemented 
> > with zero-byte magic beans and burned into the local ROM. Fantastic! I'm 
> > completely in favor of it.
> > 
> > Presumably, people will be so into those strings that they'll write a 
> > shitload of functionality for them. Parsing, searching, sorting, 
> > indexing... the mother lode.
> > 
> > One day, I come along, and I'd like to perform some text processing. But 
> > all of my string data comes from non-magic-beans data sources. I'd like 
> > to implement a new kind of string class that supports my data. I'm not 
> > going to push my super-slow string class on anybody else, because I know 
> > how concerned with performance you are.
> > 
> > But check this out... you can have your fast class, and I can have my 
> > slow class, and they can both implement the same interface. Like this:
> > 
> > interface CharSequence {
> >   int find(CharSequence needle);
> >   int rfind(CharSequence needle);
> >   // ...
> > }
> > 
> > class ZeroByteFastMagicString : CharSequence {
> >   // ...
> > }
> > 
> > class SuperSlowStoneTabletString : CharSequence {
> >   // ...
> > }
> > 
> > Now we can both use the same string functions. Just by implementing an 
> > interface, I can use the same text-processing as your 
> > hyper-compiler-optimized builtin arrays.
> > 
> > But only if the interface exists.
> > 
> > And only if library authors write their text-processing code against 
> > that interface.
> > 
> > That's the point.
> > 
> > A good API allows multiple implementations to make use of the same 
> > algorithms. Application authors can choose their own tradeoffs between 
> > speed, memory consumption, and functionality.
> > 
> > A rigid builtin implementation, with no interface definition, locks 
> > everybody into the same choices.
> > 
> > --benji
> 
> Superdan is confusing the issues here. The main argument against your 
> proposal (besides backwards compatibility, of course) is that every 
> access would require a virtual call, which can be fairly slow.

i'm not confusin'. mentioned the efficiency thing a number of times, didn't seem to faze him a bit. so i tried some more viewpoints.
August 26, 2008
Re: Why Strings as Classes?
BCS:
> If you must have that sort of interface, pick a different language, 
> because D isn't intended to work that way.

I suggest Benji try C# 3+. Despite all the problems it has, the borg-like nature of such software, etc., it will be used way more than D, and it has all the nice things Benji asks for.

Bye,
bearophile
August 26, 2008
Re: Why Strings as Classes?
superdan wrote:
> relax. believe me i'm tryin', maybe you could put it a better way and meet me in the middle.

Okay. I'll try :)

Think about a collection API.

The container classes are all written to satisfy a few basic primitive 
operations: you can get an item at a particular index, you can iterate 
in sequence (either forward or in reverse). You can insert items into a 
hashtable or retrieve them by key. And so on.

Someone else comes along and writes a library of algorithms. The 
algorithms can operate on any container that implements the necessary 
operations.

When someone clever comes along and writes a new sorting algorithm, I 
can plug my new container class right into it, and get the algorithm for 
free. Likewise for the guy with the clever new collection class.

We don't bat an eye at the idea of containers & algorithms connecting to 
one another using a reciprocal set of interfaces. In most cases, you get 
a performance **benefit** because you can mix and match the container 
and algorithm implementations that most suit your needs. You can design 
your own performance solution, rather than being stuck with a single "low 
level" implementation that might be good for the general case but isn't 
ideal for your problem.

Over in another message BCS said he wants an array index to compile to 3 
ASM ops. Cool, I'm all for it.

I don't know a whole lot about the STL, but my understanding is that 
most C++ compilers are smart enough that they can produce the same ASM 
from an iterator moving over a vector as incrementing a pointer over an 
array.

So the default implementation is damn fast.

But if someone else, with special design constraints, needs to implement 
a custom container template, it's no problem. As long as the container 
provides a function for getting iterators to the container elements, it 
can consume any of the STL algorithms too, even if the performance isn't 
as good as the performance for a vector.

There's no good reason the same technique couldn't provide both speed 
and API flexibility for text processing.

--benji
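
A rough sketch of the containers-and-algorithms idea in present-day D, using compile-time templates instead of runtime interfaces. `indexWhere` and `Countdown` are hypothetical names. The range primitives come from Phobos' `std.range.primitives`, which postdates this thread.

```d
import std.range.primitives : empty, front, popFront, isInputRange;

// Generic algorithm: works on any type offering the input-range primitives.
// It is resolved at compile time, so no virtual calls are involved.
ptrdiff_t indexWhere(alias pred, R)(R r) if (isInputRange!R)
{
    ptrdiff_t i = 0;
    while (!r.empty)
    {
        if (pred(r.front)) return i;
        r.popFront();
        ++i;
    }
    return -1;
}

// Hypothetical custom sequence: counts down n, n-1, ..., 1.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

void main()
{
    int[] arr = [4, 8, 15, 16];
    assert(indexWhere!(x => x > 10)(arr) == 2);           // built-in array
    assert(indexWhere!(x => x == 3)(Countdown(5)) == 2);  // custom type
}
```

The built-in array and the custom struct share one algorithm, and each instantiation is compiled against the concrete type, which is the STL-style arrangement described above.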
August 26, 2008
Re: Why Strings as Classes?
bearophile wrote:
> BCS:
>> If you must have that sort of interface, pick a different language, 
>> because D isn't intended to work that way.
> 
> I suggest Benji try C# 3+. Despite all the problems it has, the borg-like nature of such software, etc., it will be used way more than D, and it has all the nice things Benji asks for.
> 
> Bye,
> bearophile

Yep, I like C# a lot. I think it's very well-designed, with the language 
and libraries dovetailing nicely together.

I'm using D on my current project because I need to distribute libraries 
on both windows and linux, with C-linkage.

And D is a helluva lot more pleasant than C/C++, even if there is a lot 
about D that I find lacking.

--benji
August 26, 2008
Re: Why Strings as Classes?
"superdan" <super@dan.org> wrote in message 
news:g8vh9b$fko$1@digitalmars.com...
> Benji Smith Wrote:
>> No. Of course not. The compiler complains that you can't concatenate a
>> dchar to a char[] array. Even though the "find" functions indicate that
>> the array is truly a collection of dchar elements.
>
> that's a bug in the compiler. report it.

I did, a long time ago. #111 if I'm not mistaken.

L.
August 26, 2008
Re: Why Strings as Classes?
Benji Smith wrote:
> BCS wrote:
>> Reply to Benji,
>>
>>> BCS wrote:
>>>
>>>> Ditto, D is a *systems language*. It's *supposed* to have access to
>>>> the lowest level representation and build stuff on top of that
>>>>
>>> But in this "systems language", it's an O(n) operation to get the nth
>>> character from a string, to slice a string based on character offsets,
>>> or to determine the number of characters in the string.
>>>
>>> I'd gladly pay the price of a single interface vtable lookup to turn
>>> all of those into O(1) operations.
>>>
>>> --benji
>>>
>>
>> Then borrow, buy, steal or build a class that does that /on top of the
>> D arrays/
>>
>> No one has said that this should not be available, just that it should
>> not /replace/ what is available
> 
> The point is that the new string class would be incompatible with the
> *hundreds* of existing functions that process character arrays.
> 
> Why don't strings qualify for polymorphism?

-------------------------------------------
wchar[] foo="text"w;

int indexOf(char[] str,char ch){
   foreach(int idx,char c;str)
       if(c==ch) return idx;
   return -1;
}

void main() {
   assert(indexOf(foo, 'x')==2);
}
-------------------------------------------

If that does compile, it shouldn't.  The best way to get that to work is
to use a template.  Templates can be annoying.  A String class could
simplify the different kinds of String inherent in D.  The String class
would (should) internally know what kind of String it is (wchar, char,
dchar) and know how to mitigate those differences when operations are
called on it.
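
For reference, a sketch of what the template version might look like in present-day D (an assumption, since the thread predates D2; `isSomeChar` is from Phobos' `std.traits`). It compares code units only, so it is reliable for ASCII needles but does not decode multi-unit characters.

```d
import std.traits : isSomeChar;

// Assumed template version of the indexOf above: one definition covers
// char, wchar, and dchar arrays. Compares code units, not decoded characters.
int indexOf(C)(const(C)[] str, C ch) if (isSomeChar!C)
{
    foreach (idx, c; str)
        if (c == ch) return cast(int) idx;
    return -1;
}

void main()
{
    wstring foo = "text"w;
    // C is given explicitly so the char literal converts to the
    // array's code-unit type.
    assert(indexOf!wchar(foo, 'x') == 2);
    assert(indexOf!char("text", 'x') == 2);
    assert(indexOf!dchar("text"d, 't') == 0);
}
```

Each instantiation keeps the array and the needle at the same code-unit width, so the wchar[]/char mismatch in the example above cannot arise.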

@Benji
If you want a String class, why don't you write one?  It's a fairly
simple task, even high-school CS students do it quite routinely in C++
(which is a lot more unwieldy for OOP than D is).

A very successful instance of Strings-as-objects is present in Java.
I'd suggest trying to duplicate that functionality.  Then you could
easily write wrappers on existing libraries to use the new String object.
August 26, 2008
Re: Why Strings as Classes?
On Mon, 25 Aug 2008 20:52:04 -0400, Benji Smith wrote:

> superdan wrote:
>>> But the "small components" are the *interfaces*, not the
>>> implementation details.
>> 
>> quite when i thought i drove a point home... dood we need to talk. you
>> have all core language, primitives, libraries, and app code confused.
> 
> The standard libraries are in a grey area between the language spec
> and application code. There are all sorts of implicit "interfaces"
> exposed by the builtin types (and there's also plenty of core language
> functionality implemented in the standard lib... take the GC, for
> example).
> 
> You act like there's no such thing as an interface for a builtin language
> feature.
> 
> With strings implemented as raw arrays, they take on the array API...
> 
> slicing: broken
> indexing: busted
> iterating: fucked
> length: you guessed it
> 
> I don't think the internals of the string representation should be any
> different. UTF-8 arrays? Fine by me. Just don't make me look at the
> malformed, mis-sliced bytes. Provide an API (yes, implemented in the
> standard lib, but specified by the language spec) that actually makes
> sense for text data.
> 
> (Incidentally, this is the same reason I think the builtin dynamic
> arrays should be classes implementing a standard List interface, and the
> associative arrays should be classes implementing a Map interface. The
> language implementations are nice, but they're not polymorphic, and that
> makes it a pain in the ass to extend them.)
> 
> --benji

On the language spec vs. the standard library: while the GC is implemented in 
the standard library, I do not believe the spec says it has to be (though 
I don't think it is possible otherwise). So the spec could state that 
strings should be implemented your way, but it shouldn't.

On another note, I must say this has been quite a turnaround. There have 
been many posts in the past with people arguing over having a String 
class; I think they have been staying out of this one. But nonetheless it is 
nothing new.
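
For readers who have not hit the slicing/indexing/length complaints quoted above, here is a small sketch in present-day D (Phobos `std.utf`, `std.range`, and `std.conv`, all of which postdate this thread). `codePointCount` is a hypothetical helper.

```d
import std.conv : to;
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : byDchar, validate, UTFException;

// Hypothetical helper: count code points by decoding the UTF-8. O(n).
size_t codePointCount(const(char)[] s)
{
    return s.byDchar.walkLength;
}

void main()
{
    string s = "na\u00EFve";       // U+00EF takes two UTF-8 code units

    assert(s.length == 6);         // O(1), but counts code units (bytes)
    assert(codePointCount(s) == 5);

    // Slicing by code-unit offset can cut a character in half.
    assertThrown!UTFException(validate(s[0 .. 3]));

    // UTF-32 copy: O(1) indexing by code point, 4 bytes per code point.
    dstring d = s.to!dstring;
    assert(d.length == 5);
    assert(d[2] == '\u00EF');
    assert(d.length * dchar.sizeof == 20);
}
```

The last three assertions are the "factor of 4" trade-off raised earlier in the thread: a dchar array makes per-code-point indexing and length O(1), but every code point, ASCII included, costs four bytes.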
August 26, 2008
Re: Why Strings as Classes?
Benji Smith:
> Yep, I like C# a lot. I think it's very well-designed, with the language 
> and libraries dovetailing nicely together.

In the past I have said that C# 3.5/4 has some small ideas that D may enjoy copying. But having a complex, coherent OOP structure from the bottom up probably isn't one of them. You must understand that D is lower level than C#, which means it's designed for people who like to suffer more :-) D is designed mostly for people coming from C and C++, and it must also be fit for procedural/functional use without any OOP.

So D isn't C#, and this means what you ask for isn't a good fit for it. Note that the situation isn't set in stone: some time ago, for example, there was a person who wanted to program in a Python-like way on the .NET platform and was unhappy with C#. He created the Boo language. It's not widespread, and it has a few small design mistakes, but overall it's not a bad language; it's quite usable for its purposes. So you can create a language fit for your purposes... Do you know the Vala language? It looks like C#, but compiles to C... it's probably still in beta stage, but it may be closer to your dream language.

Another approach you may follow is to reinvent just the standard library/runtime of D to make it look more like the C# you like :-) Seen from outside, Tango already seems closer to the Java std lib than Phobos is (but I may be wrong). I like Python, so I am writing a large lib, which no one else uses, that partly aims to make D look like Python :-)

Bye,
bearophile