View mode: basic / threaded / horizontal-split · Log in · Help
August 26, 2008
Re: Why Strings as Classes?
BCS wrote:
> Reply to Benji,
> 
>> BCS wrote:
>>
>>> Ditto, D is a *systems language* It's *supposed* to have access to
>>> the lowest level representation and build stuff on top of that
>>>
>> But in this "systems language", it's a O(n) operation to get the nth
>> character from a string, to slice a string based on character offsets,
>> or to determine the number of characters in the string.
>>
>> I'd gladly pay the price of a single interface vtable lookup to turn
>> all of those into O(1) operations.
>>
>> --benji
>>
> 
> Then borrow, buy, steal or build a class that does that /on top of the D 
> arrays/
> 
> No one has said that this should not be available, just that it should 
> not /replace/ what is available

The point is that the new string class would be incompatible with the 
*hundreds* of existing functions that process character arrays.

Why don't strings qualify for polymorphism?

Am I the only one who thinks the existing tradeoff is a fool's bargain?

--benji
August 26, 2008
Re: Why Strings as Classes?
Benji Smith Wrote:

> BCS wrote:
> > Ditto, D is a *systems language* It's *supposed* to have access to the 
> > lowest level representation and build stuff on top of that
> 
> But in this "systems language", it's a O(n) operation to get the nth 
> character from a string, to slice a string based on character offsets, 
> or to determine the number of characters in the string.
> 
> I'd gladly pay the price of a single interface vtable lookup to turn all 
> of those into O(1) operations.

dood. i dunno where to start. allow me to answer from multiple angles.

1. when was the last time looking up one char in a string or computing length was your bottleneck.

2. you talk as if o(1) happens by magic that d currently disallows.

3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search.

4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.
August 26, 2008
Re: Why Strings as Classes?
Reply to Benji,

> BCS wrote:
> 
>> Reply to Benji,
>> 
>>> BCS wrote:
>>> 
>>>> Ditto, D is a *systems language* It's *supposed* to have access to
>>>> the lowest level representation and build stuff on top of that
>>>> 
>>> But in this "systems language", it's a O(n) operation to get the nth
>>> character from a string, to slice a string based on character
>>> offsets, or to determine the number of characters in the string.
>>> 
>>> I'd gladly pay the price of a single interface vtable lookup to turn
>>> all of those into O(1) operations.
>>> 
>>> --benji
>>> 
>> Then borrow, buy, steal or build a class that does that /on top of
>> the D arrays/
>> 
>> No one has said that this should not be available, just that it
>> should not /replace/ what is available
>> 
> The point is that the new string class would be incompatible with the
> *hundreds* of existing functions that process character arrays.
> 

That is an issue with (and *only* with) "the *hundreds* of existing functions 
that process character arrays".
August 26, 2008
Re: Why Strings as Classes?
superdan wrote:
>> But the "small components" are the *interfaces*, not the implementation 
>> details.
> 
> quite when i thought i drove a point home... dood we need to talk. you have all core language, primitives, libraries, and app code confused.

The standard libraries are in a grey area between language the language 
spec and application code. There are all sorts implicit "interfaces" in 
exposed by the builtin types (and there's also plenty of core language 
functionality implemented in the standard lib... take the GC, for example).

You act there's no such thing as an interface for a builtin language 
feature.

With strings implemented as raw arrays, they take on the array API...

slicing: broken
indexing: busted
iterating: fucked
length: you guessed it

I don't think the internals of the string representation should be any 
different. UTF-8 arrays? Fine by me. Just don't make me look at the 
malformed, mis-sliced bytes. Provide an API (yes, implemented in the 
standard lib, but specified by the language spec) that actually makes 
sense for text data.

(Incidentally, this is the same reason I think the builtin dynamic 
arrays should be classes implementing a standard List interface, and the 
associative arrays should be classes implementing a Map interface. The 
language implementations are nice, but they're not polymorphic, and that 
makes it a pain in the ass to extend them.)

--benji
August 26, 2008
Re: Why Strings as Classes?
Reply to Benji,


> The new JSON parser in the Tango library operates on templated string
> arrays. If I want to read from a file or a socket, I have to first
> slurp the whole thing into a character array, even though the
> character-streaming would be more practical.
> 

Unless you are only going to parse the start of the file or are going to 
be throwing away most of it *while you parse it, not after* The best way 
to parse a file is to load it all in one OS system call and then run a slicing 
parser (like the Tango XML parser) on that. 

One memory allocation and one load or a mmap, and then only the meta structures 
get allocated later.
August 26, 2008
Re: Why Strings as Classes?
superdan wrote:
> but if u have strings like today it's a no-brainer to define a class that does all that stuff. u can then use that class whenever you feel. it would be madness to put that class in the language definition. at best it's a candidate for the stdlib.

Instead, the runtime has to know how to convert between utf8, utf16, and 
utf32. Encodings are not a trivial matter, either.
August 26, 2008
Re: Why Strings as Classes?
Reply to Benji,

> slicing: broken

works as defined

> indexing: busted

works as defined

> iterating: fucked

works as defined and with foreach(dchar) as you want

> length: you guessed it

works as defined

> Provide an API (yes, implemented in the
> standard lib, but specified by the language spec) that actually makes
> sense for text data.

BTW everything in phobos under std.* is /not part of the D language spec/.

> 
> (Incidentally, this is the same reason I think the builtin dynamic
> arrays should be classes implementing a standard List interface, and
> the associative arrays should be classes implementing a Map interface.
> The language implementations are nice, but they're not polymorphic,
> and that makes it a pain in the ass to extend them.)
> 

A system language MUST have arrays that are not classes or anything near 
that thick. If you must have that sort of interface, pick a different language, 
because D isn't intended to work that way.

> --benji
>
August 26, 2008
Re: Why Strings as Classes?
superdan wrote:
> Benji Smith Wrote:
> 
>> BCS wrote:
>>> Ditto, D is a *systems language* It's *supposed* to have access to the 
>>> lowest level representation and build stuff on top of that
>> But in this "systems language", it's a O(n) operation to get the nth 
>> character from a string, to slice a string based on character offsets, 
>> or to determine the number of characters in the string.
>>
>> I'd gladly pay the price of a single interface vtable lookup to turn all 
>> of those into O(1) operations.
> 
> dood. i dunno where to start. allow me to answer from multiple angles.
> 
> 1. when was the last time looking up one char in a string or computing length was your bottleneck.
> 
> 2. you talk as if o(1) happens by magic that d currently disallows.
> 
> 3. maybe i don't want to blow the size of my string by a factor of 4 if i'm just interested in some occasional character search.
> 
> 4. implement all that nice stuff you wanna. nobody put a gun to yer head not to. understand you can't put a gun to my head to pay the price.

Geez, man, you just keep missing the point, over and over again.

Let me make one point, blisteringly clear: I don't give a shit about the 
  data format. You want the fastest strings in the universe, 
implemented with zero-byte magic beans and burned into the local ROM. 
Fantastic! I'm completely in favor of it.

Presumably. people will be so into those strings that they'll write a 
shitload of functionality for them. Parsing, searching, sorting, 
indexing... the motherload.

One day, I come along, and I'd like to perform some text processing. But 
all of my string data comes from non-magic-beans data sources. I'd like 
to implement a new kind of string class that supports my data. I'm not 
going to push my super-slow string class on anybody else, because I know 
how concerned with performance you are.

But check this out... you can have your fast class, and I can have my 
slow class, and they can both implement the same interface. Like this:

interface CharSequence {
  int find(CharSequence needle);
  int rfind(CharSequence needle);
  // ...
}

class ZeroByteFastMagicString : CharSequence {
  // ...
}

class SuperSlowStoneTabletString : CharSequence {
  // ...
}

Now we can both use the same string functions. Just by implementing an 
interface, I can use the same text-processing as your 
hyper-compiler-optimized builtin arrays.

But only if the interface exists.

And only if library authors write their text-processing code against 
that interface.

That's the point.

A good API allows multiple implementations to make use of the same 
algorithms. Application authors can choose their own tradeoffs between 
speed, memory consumption, and functionality.

A rigid builtin implementation, with no interface definition, locks 
everybody into the same choices.

--benji
August 26, 2008
Re: Why Strings as Classes?
Reply to Benji,

> But check this out... you can have your fast class, and I can have my
> slow class, and they can both implement the same interface. Like this:
> 

No, you can't.

The overhead needed to implement that is EXACTLY what we are unwilling to 
use.

I want an indexed array load

x = arr[i];

to be:

-- load arr.ptr into a reg
-- add i to that reg
-- indirect load that into i


3 ASM ops.

If you can get that and what you want, go get a PhD. You will have earned it.
August 26, 2008
Re: Why Strings as Classes?
Benji Smith wrote:
> superdan wrote:
>> Benji Smith Wrote:
>>
>>> BCS wrote:
>>>> Ditto, D is a *systems language* It's *supposed* to have access to 
>>>> the lowest level representation and build stuff on top of that
>>> But in this "systems language", it's a O(n) operation to get the nth 
>>> character from a string, to slice a string based on character 
>>> offsets, or to determine the number of characters in the string.
>>>
>>> I'd gladly pay the price of a single interface vtable lookup to turn 
>>> all of those into O(1) operations.
>>
>> dood. i dunno where to start. allow me to answer from multiple angles.
>>
>> 1. when was the last time looking up one char in a string or computing 
>> length was your bottleneck.
>>
>> 2. you talk as if o(1) happens by magic that d currently disallows.
>>
>> 3. maybe i don't want to blow the size of my string by a factor of 4 
>> if i'm just interested in some occasional character search.
>>
>> 4. implement all that nice stuff you wanna. nobody put a gun to yer 
>> head not to. understand you can't put a gun to my head to pay the price.
> 
> Geez, man, you just keep missing the point, over and over again.
> 
> Let me make one point, blisteringly clear: I don't give a shit about the 
>   data format. You want the fastest strings in the universe, implemented 
> with zero-byte magic beans and burned into the local ROM. Fantastic! I'm 
> completely in favor of it.
> 
> Presumably. people will be so into those strings that they'll write a 
> shitload of functionality for them. Parsing, searching, sorting, 
> indexing... the motherload.
> 
> One day, I come along, and I'd like to perform some text processing. But 
> all of my string data comes from non-magic-beans data sources. I'd like 
> to implement a new kind of string class that supports my data. I'm not 
> going to push my super-slow string class on anybody else, because I know 
> how concerned with performance you are.
> 
> But check this out... you can have your fast class, and I can have my 
> slow class, and they can both implement the same interface. Like this:
> 
> interface CharSequence {
>   int find(CharSequence needle);
>   int rfind(CharSequence needle);
>   // ...
> }
> 
> class ZeroByteFastMagicString : CharSequence {
>   // ...
> }
> 
> class SuperSlowStoneTabletString : CharSequence {
>   // ...
> }
> 
> Now we can both use the same string functions. Just by implementing an 
> interface, I can use the same text-processing as your 
> hyper-compiler-optimized builtin arrays.
> 
> But only if the interface exists.
> 
> And only if library authors write their text-processing code against 
> that interface.
> 
> That's the point.
> 
> A good API allows multiple implementations to make use of the same 
> algorithms. Application authors can choose their own tradeoffs between 
> speed, memory consumption, and functionality.
> 
> A rigid builtin implementation, with no interface definition, locks 
> everybody into the same choices.
> 
> --benji

Superdan is confusing the issues here. The main argument against your 
proposal (besides backwards compatibility, of course) is that every 
access would require a virtual call, which can be fairly slow.
1 2 3 4 5 6
Top | Discussion index | About this forum | D home