Jump to page: 1 221  
Page
Thread overview
Why Strings as Classes?
Aug 25, 2008
Benji Smith
Aug 25, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
Chris R. Miller
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
Robert Fraser
Aug 26, 2008
superdan
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
Christopher Wright
Aug 26, 2008
superdan
Aug 26, 2008
Christopher Wright
Aug 26, 2008
superdan
Aug 26, 2008
superdan
Aug 27, 2008
Chris R. Miller
Aug 27, 2008
superdan
Aug 28, 2008
Chris R. Miller
Aug 28, 2008
Dee Girl
Aug 28, 2008
Chris R. Miller
Aug 28, 2008
Don
Aug 27, 2008
Dee Girl
Aug 26, 2008
Robert Fraser
Aug 26, 2008
Lars Ivar Igesund
Aug 26, 2008
Robert Fraser
Aug 26, 2008
superdan
Aug 26, 2008
Robert Fraser
Aug 26, 2008
superdan
Aug 26, 2008
Denis Koroskin
Aug 26, 2008
Denis Koroskin
Aug 26, 2008
Benji Smith
Aug 26, 2008
Denis Koroskin
Aug 26, 2008
superdan
Aug 26, 2008
Nick Sabalausky
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 27, 2008
superdan
Aug 27, 2008
Nick Sabalausky
Aug 27, 2008
superdan
Aug 27, 2008
Nick Sabalausky
Aug 27, 2008
superdan
Aug 27, 2008
Dee Girl
Aug 27, 2008
Dee Girl
Aug 27, 2008
Bill Baxter
Aug 27, 2008
Dee Girl
Aug 27, 2008
Dee Girl
Aug 27, 2008
bearophile
Aug 27, 2008
superdan
Aug 27, 2008
Benji Smith
Aug 27, 2008
Dee Girl
Aug 27, 2008
Nick Sabalausky
Aug 27, 2008
superdan
Aug 27, 2008
Nick Sabalausky
Aug 27, 2008
Fawzi Mohamed
Aug 28, 2008
Nick Sabalausky
Aug 28, 2008
Nick Sabalausky
Aug 28, 2008
Nick Sabalausky
Aug 28, 2008
Dee Girl
Aug 28, 2008
Nick Sabalausky
Aug 29, 2008
Nick Sabalausky
Aug 28, 2008
Dee Girl
Aug 28, 2008
Fawzi Mohamed
Aug 27, 2008
Derek Parnell
Aug 27, 2008
Dee Girl
Aug 27, 2008
Dee Girl
Aug 28, 2008
Nick Sabalausky
Aug 28, 2008
Don
Aug 28, 2008
Nick Sabalausky
Aug 28, 2008
Don
Aug 28, 2008
Nick Sabalausky
Aug 29, 2008
Don
Aug 30, 2008
Christopher Wright
Aug 28, 2008
Robert Fraser
Aug 30, 2008
Walter Bright
Aug 30, 2008
bearophile
Aug 28, 2008
Dee Girl
Aug 27, 2008
Simen Kjaeraas
Aug 27, 2008
superdan
Aug 27, 2008
Robert Fraser
Aug 27, 2008
superdan
Aug 29, 2008
Sergey Gromov
Aug 26, 2008
Christopher Wright
Aug 26, 2008
superdan
Aug 27, 2008
Christopher Wright
Aug 27, 2008
superdan
Aug 26, 2008
Michel Fortin
Aug 26, 2008
superdan
Aug 26, 2008
Michel Fortin
Aug 26, 2008
Walter Bright
Aug 26, 2008
Walter Bright
Aug 26, 2008
Walter Bright
Aug 26, 2008
Lionello Lunesu
Aug 26, 2008
Benji Smith
Aug 25, 2008
superdan
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
bearophile
Aug 26, 2008
Benji Smith
Aug 26, 2008
bearophile
Aug 26, 2008
Nick Sabalausky
Aug 26, 2008
BCS
Aug 27, 2008
Christopher Wright
Aug 26, 2008
Jesse Phillips
Aug 26, 2008
BCS
Aug 26, 2008
Robert Fraser
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
superdan
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
BCS
Aug 26, 2008
superdan
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
Nick Sabalausky
Aug 27, 2008
superdan
Aug 26, 2008
BCS
Aug 26, 2008
Benji Smith
Aug 26, 2008
superdan
Aug 26, 2008
BCS
Aug 26, 2008
Robert Fraser
Aug 26, 2008
Christopher Wright
Aug 26, 2008
Benji Smith
Aug 26, 2008
bearophile
Aug 26, 2008
JAnderson
Aug 26, 2008
Walter Bright
Aug 26, 2008
Benji Smith
Aug 26, 2008
bearophile
Aug 26, 2008
bearophile
Aug 26, 2008
Walter Bright
Aug 27, 2008
bearophile
Aug 27, 2008
Walter Bright
Re: Prototypes (was: Why Strings as Classes?)
Aug 28, 2008
Fawzi Mohamed
Aug 28, 2008
Manfred_Nowak
Aug 29, 2008
Fawzi Mohamed
Aug 29, 2008
Manfred_Nowak
Re: Prototypes
Aug 29, 2008
Walter Bright
Aug 29, 2008
Manfred_Nowak
Aug 27, 2008
Manfred_Nowak
Aug 27, 2008
bearophile
Benchmark Design (was: Why Strings as Classes?)
Aug 28, 2008
Manfred_Nowak
Aug 28, 2008
bearophile
Aug 28, 2008
Manfred_Nowak
Aug 28, 2008
bearophile
Aug 27, 2008
Fawzi Mohamed
Aug 27, 2008
bearophile
Aug 27, 2008
Fawzi Mohamed
Aug 27, 2008
bearophile
Aug 27, 2008
Fawzi Mohamed
Aug 26, 2008
Yigal Chripun
Aug 26, 2008
Walter Bright
Aug 26, 2008
Jb
Aug 26, 2008
superdan
Aug 26, 2008
superdan
Aug 26, 2008
Jb
Aug 26, 2008
Walter Bright
Aug 26, 2008
bearophile
Aug 26, 2008
Jb
Aug 26, 2008
Jb
Aug 27, 2008
JAnderson
Aug 26, 2008
JAnderson
Aug 26, 2008
Nick Sabalausky
Aug 26, 2008
Jb
Aug 26, 2008
Benji Smith
Aug 27, 2008
Michel Fortin
Re: Why Strings as Classes? [C++ iterator]
Aug 28, 2008
Fawzi Mohamed
Aug 28, 2008
Fawzi Mohamed
Aug 28, 2008
Michel Fortin
Aug 28, 2008
Fawzi Mohamed
Aug 28, 2008
David Wilson
Aug 28, 2008
Fawzi Mohamed
Aug 28, 2008
Denis Koroskin
August 25, 2008
In another thread (about array append performance) I mentioned that Strings ought to be implemented as classes rather than as simple builtin
arrays. Superdan asked why. Here's my response...

I'll start with a few of the softball, easy reasons.

For starters, with strings implemented as character arrays, writing library code that accepts and operates on strings is a bit of a pain in the neck, since you always have to write templates and template code is slightly less readable than non-template code. You can't distribute your code as a DLL or a shared object, because the template instantiations won't be included (unless you create wrapper functions with explicit template instantiations, bloating your code size, but more importantly tripling the number of functions in your API).

Another good low-hanging argument is that strings are frequently used as keys in associative arrays. Every insertion and retrieval in an associative array requires a hashcode computation. And since D strings are just dumb arrays, they have no way of memoizing their hashcodes. We've already observed that D assoc arrays are less performant than even Python maps, so the extra cost of lookup operations is unwelcome.

But much more important than either of those reasons is the lack of polymorphism on character arrays. Arrays can't have subclasses, and they can't implement interfaces.

A good example of what I'm talking about can be seen in the Phobos and Tango regular expression engines. At least the Tango implementation matches against all string types (the Phobos one only works with char[] strings).

But what if I want to consume a 100 MB logfile, counting all lines that match a pattern?

Right now, to use the either regex engine, I have to read the entire logfile into an enormous array before invoking the regex search function.

Instead, what if there was a CharacterStream interface? And what if all the text-handling code in Phobos & Tango was written to consume and return instances of that interface?

A regex engine accepting a CharacterStream interface could process text from string literals, file input streams, socket input streams, database records, etc, etc, etc... without having to pollute the API with a bunch of casts, copies, and conversions. And my logfile processing application would consume only a tiny fraction of the memory needed by the character array implementation.

Most importantly, the contract between the regex engine and its consumers would provide a well-defined interface for processing text, regardless of the source or representation of that text.

Along a similar vein, I've worked on a lot of parsers over the past few years, for domain specific languages and templating engines, and stuff like that. Sometimes it'd be very handy to define a "Token" class that behaves exactly like a String, but with some additional behavior. Ideally, I'd like to implement that Token class as an implementor of the CharacterStream interface, so that it can be passed directly into other text-handling functions.

But, in D, with no polymorphic text handling, I can't do that.

As one final thought... I suspect that mutable/const/invariant string handling would be much more conveniently implemented with a MutableCharacterStream interface (as an extended interface of CharacterStream).

Any function written to accept a CharacterStream would automatically accept a MutableCharacterStream, thanks to interface polymorphism, without any casts, conversions, or copies. And various implementors of the interface could provide buffered implementations operating on in-memory strings, file data, or network data.

Coding against the CharacterStream interface, library authors wouldn't need to worry about const-correctness, since the interface wouldn't provide any mutator methods.

But then again, I haven't used any of the const functionality in D2, so I can't actually comment on relative usability of compiler-enforced immutability versus interface-enforced immutability.

Anyhow, those are some of my thoughts... I think there are a lot of compelling reasons for de-coupling the specification of string handling functionality from the implementation of that functionality, primarily for enabling polymorphic text-processing.

But memoized hashcodes would be cool too :-)

--benji
August 25, 2008
Oh, man, I forgot one other thing... And it's a biggie...

The D _Arrays_ page says that "A string is an array of characters. String literals are just an easy way to write character arrays."

http://digitalmars.com/d/1.0/arrays.html

In my previous post, I also use the "character array" terminology.

Unfortunately, though, it's just not true.

A char[] is actually an array of UTF-8 encoded octets, where each character may consume one or more consecutive elements of the array. Retrieving the str.length property may or may not tell you how many characters are in the string. And pretty much any code that tries to iterate character-by-character through the array elements is fundamentally broken.

Take a look at this code, for example:

------------------------------------------------------------------
import tango.io.Stdout;

void main() {

   // Create a string with UTF-8 content
   char[] str = "mötley crüe";
   Stdout.formatln("full string value: {}", str);

   Stdout.formatln("len: {}", str.length);
   // --> "len: 13" ... but there are only 11 characters!

   Stdout.formatln("2nd char: '{}'", str[1]);
   // --> "2nd char: ''" ... where'd my character go?

   Stdout.formatln("first 3 chars: '{}'", str[0..3]);
   // --> "first 3 chars: 'mö'" ... why only 2?

   char o_umlat = 'ö';
   Stdout.formatln("char value: '{}'", o_umlat);
   // --> "char value: ''" ... where's my char?

}
------------------------------------------------------------------

So you can't actually iterate the the char elements of a char[] without risking that you'll turn your string data into garbage. And you can't trust that the length property tells you how many characters there are. And you can't trust that an index or a slice will return valid data.

Also: take a look at the Phobos string "find" functions:

  int find(char[] s, dchar c);
  int ifind(char[] s, dchar c);
  int rfind(char[] s, dchar c);
  int irfind(char[] s, dchar c);

Huh?

To find a character in a char[] array, you have to use a dchar?

To me, that's like looking for a long within an int[] array.

So.. If a char[] actually consists of dchar elements, does that mean I can append a dchar to a char[] array?

  dchar u_umlat = 'ü';
  char[] newString = "mötley crüe" ~ u_umlat;

No. Of course not. The compiler complains that you can't concatenate a dchar to a char[] array. Even though the "find" functions indicate that the array is truly a collection of dchar elements.

Now, don't get me wrong. I understand why the string is encoded as UTF-8. And I understand that the encoding prevents accurate element iteration, indexing, slicing, and all the other nice array goodies.

The existing D string implementation is exactly what I'd expect to see inside the guts of a string class, because encodings are important and efficiency is important. But those implementation details shouldn't be exposed through a public API.

To claim that D strings are actually usable as character arrays is more than a little spurious, since direct access of the array elements can return fragmented garbage bytes.

If accurate string manipulation is impossible without a set of special-purpose functions, then I'll argue that the implementation is already equivalent to that of a class, but without any of the niceties of encapsulation and polymorphism.

--benji
August 25, 2008
Benji Smith Wrote:

> In another thread (about array append performance) I mentioned that Strings ought to be implemented as classes rather than as simple builtin arrays. Superdan asked why. Here's my response...

well then allow me to retort.

> I'll start with a few of the softball, easy reasons.
> 
> For starters, with strings implemented as character arrays, writing library code that accepts and operates on strings is a bit of a pain in the neck, since you always have to write templates and template code is slightly less readable than non-template code. You can't distribute your code as a DLL or a shared object, because the template instantiations won't be included (unless you create wrapper functions with explicit template instantiations, bloating your code size, but more importantly tripling the number of functions in your API).

so u mean with a class the encoding char/wchar/dchar won't be an issue anymore. that would be hidden behind the wraps. cool.

problem is that means there's an indirection cost for every character access. oops. so then apps that decided to use some particular encoding consistently must pay a price for stuff they don't use.

but if u have strings like today it's a no-brainer to define a class that does all that stuff. u can then use that class whenever you feel. it would be madness to put that class in the language definition. at best it's a candidate for the stdlib.

so that low-hangin' argument of yers ain't that low-hangin' after all. unless u call a hanged deadman low-hangin'.

> Another good low-hanging argument is that strings are frequently used as keys in associative arrays. Every insertion and retrieval in an associative array requires a hashcode computation. And since D strings are just dumb arrays, they have no way of memoizing their hashcodes. We've already observed that D assoc arrays are less performant than even Python maps, so the extra cost of lookup operations is unwelcome.

again you want to build larger components from smaller components. you can build a string with memoized hashcode from a string without memoized hashcode. but you can't build a string without memoized hashcode from a string with memoized hashcode. but wait there's more. the extra field is paid for regardless. so what numbers do you have to back up your assertion that it's worth paying that cost for everything except hashtables.

> But much more important than either of those reasons is the lack of polymorphism on character arrays. Arrays can't have subclasses, and they can't implement interfaces.

that's why you can always define a class that does all those good things.

by the same arg why isn't int a class. the point is you can always create class Int that does what an int does, slower but more flexible. if all you had was class Int you'd be in slowland.

> A good example of what I'm talking about can be seen in the Phobos and Tango regular expression engines. At least the Tango implementation matches against all string types (the Phobos one only works with char[] strings).
> 
> But what if I want to consume a 100 MB logfile, counting all lines that match a pattern?
>
> Right now, to use the either regex engine, I have to read the entire logfile into an enormous array before invoking the regex search function.
> 
> Instead, what if there was a CharacterStream interface? And what if all the text-handling code in Phobos & Tango was written to consume and return instances of that interface?

what exactly is the problem there aside from a library issue.

> A regex engine accepting a CharacterStream interface could process text from string literals, file input streams, socket input streams, database records, etc, etc, etc... without having to pollute the API with a bunch of casts, copies, and conversions. And my logfile processing application would consume only a tiny fraction of the memory needed by the character array implementation.

library problem. or maybe you want to build character stream into the language too.

> Most importantly, the contract between the regex engine and its consumers would provide a well-defined interface for processing text, regardless of the source or representation of that text.
> 
> Along a similar vein, I've worked on a lot of parsers over the past few years, for domain specific languages and templating engines, and stuff like that. Sometimes it'd be very handy to define a "Token" class that behaves exactly like a String, but with some additional behavior. Ideally, I'd like to implement that Token class as an implementor of the CharacterStream interface, so that it can be passed directly into other text-handling functions.
> 
> But, in D, with no polymorphic text handling, I can't do that.

of course you can. you just don't want to for the sake of building a fragile argument.

> As one final thought... I suspect that mutable/const/invariant string handling would be much more conveniently implemented with a MutableCharacterStream interface (as an extended interface of CharacterStream).
> 
> Any function written to accept a CharacterStream would automatically accept a MutableCharacterStream, thanks to interface polymorphism, without any casts, conversions, or copies. And various implementors of the interface could provide buffered implementations operating on in-memory strings, file data, or network data.
> 
> Coding against the CharacterStream interface, library authors wouldn't need to worry about const-correctness, since the interface wouldn't provide any mutator methods.

sounds great. so then go ahead and make the characterstream thingie. the language gives u everything u need to make it clean and fast.

> But then again, I haven't used any of the const functionality in D2, so I can't actually comment on relative usability of compiler-enforced immutability versus interface-enforced immutability.
> 
> Anyhow, those are some of my thoughts... I think there are a lot of compelling reasons for de-coupling the specification of string handling functionality from the implementation of that functionality, primarily for enabling polymorphic text-processing.
> 
> But memoized hashcodes would be cool too :-)

sorry dood each and every argument talks straight against your case. if i had any doubts, you just convinced me that a builtin string class would be a mistake.
August 26, 2008
Benji Smith Wrote:

> Oh, man, I forgot one other thing... And it's a biggie...
> 
> The D _Arrays_ page says that "A string is an array of characters. String literals are just an easy way to write character arrays."
> 
> http://digitalmars.com/d/1.0/arrays.html
> 
> In my previous post, I also use the "character array" terminology.
> 
> Unfortunately, though, it's just not true.
> 
> A char[] is actually an array of UTF-8 encoded octets, where each character may consume one or more consecutive elements of the array. Retrieving the str.length property may or may not tell you how many characters are in the string. And pretty much any code that tries to iterate character-by-character through the array elements is fundamentally broken.

try this:

foreach (dchar c; str)
{
    process c
}

> Take a look at this code, for example:
> 
> ------------------------------------------------------------------
> import tango.io.Stdout;
> 
> void main() {
> 
>     // Create a string with UTF-8 content
>     char[] str = "mötley crüe";
>     Stdout.formatln("full string value: {}", str);
> 
>     Stdout.formatln("len: {}", str.length);
>     // --> "len: 13" ... but there are only 11 characters!
> 
>     Stdout.formatln("2nd char: '{}'", str[1]);
>     // --> "2nd char: ''" ... where'd my character go?
> 
>     Stdout.formatln("first 3 chars: '{}'", str[0..3]);
>     // --> "first 3 chars: 'mö'" ... why only 2?
> 
>     char o_umlat = 'ö';
>     Stdout.formatln("char value: '{}'", o_umlat);
>     // --> "char value: ''" ... where's my char?
> 
> }
> ------------------------------------------------------------------
> 
> So you can't actually iterate the the char elements of a char[] without risking that you'll turn your string data into garbage. And you can't trust that the length property tells you how many characters there are. And you can't trust that an index or a slice will return valid data.

you can iterate with foreach or lib functions. an index or slice won't return valid data indeed, but it couldn't anyway. there's no o(1) indexing into a string unless it's utf32.

> Also: take a look at the Phobos string "find" functions:
> 
>    int find(char[] s, dchar c);
>    int ifind(char[] s, dchar c);
>    int rfind(char[] s, dchar c);
>    int irfind(char[] s, dchar c);
> 
> Huh?
> 
> To find a character in a char[] array, you have to use a dchar?
> 
> To me, that's like looking for a long within an int[] array.

because you're wrong. you look for a dchar which can represent all characters in an array of a given encoding. the comparison is off.

> So.. If a char[] actually consists of dchar elements, does that mean I can append a dchar to a char[] array?
> 
>    dchar u_umlat = 'ü';
>    char[] newString = "mötley crüe" ~ u_umlat;
> 
> No. Of course not. The compiler complains that you can't concatenate a dchar to a char[] array. Even though the "find" functions indicate that the array is truly a collection of dchar elements.

that's a bug in the compiler. report it.

> Now, don't get me wrong. I understand why the string is encoded as UTF-8. And I understand that the encoding prevents accurate element iteration, indexing, slicing, and all the other nice array goodies.

i know you understand. you should also understand

> The existing D string implementation is exactly what I'd expect to see inside the guts of a string class, because encodings are important and efficiency is important. But those implementation details shouldn't be exposed through a public API.

exactly at this point your argument kinda explodes. yes, you should see that stuff inside the guts of a string. which means builtin strings should be just arrays that you build larger stuff from. but wait. that's exactly what happens right now.

> To claim that D strings are actually usable as character arrays is more than a little spurious, since direct access of the array elements can return fragmented garbage bytes.

agreed.

> If accurate string manipulation is impossible without a set of special-purpose functions, then I'll argue that the implementation is already equivalent to that of a class, but without any of the niceties of encapsulation and polymorphism.

and without the disadvantages.
August 26, 2008
Reply to superdan,

> sorry dood each and every argument talks straight against your case.
> if i had any doubts, you just convinced me that a builtin string class
> would be a mistake.
> 

OTOH as standard lib string class...


August 26, 2008
Reply to superdan,

>> The existing D string implementation is exactly what I'd expect to see
>> inside the guts of a string class, because encodings are important and
>> efficiency is important. But those implementation details shouldn't be
>> exposed through a public API.
> 
> exactly at this point your argument kinda explodes. yes, you should
> see that stuff inside the guts of a string. which means builtin
> strings should be just arrays that you build larger stuff from. but
> wait. that's exactly what happens right now.
> 

Ditto, D is a *systems language* It's *supposed* to have access to the lowest level representation and build stuff on top of that


August 26, 2008
Sorry, dan. You're wrong.

superdan wrote:
> again you want to build larger components from smaller components.

Good APIs define public interfaces and hide implementation details, usually providing a default general-purpose implementation while allowing third-parties to define special-purpose implementations to suit their needs.

In D, the text handling is defined by the data format (unicode byte sequences) rather than by the interface, while providing no polymorphic mechanism for alternate implementations.

It's the opposite of good API design.

The regular expression engine accepts character arrays. And that's it. You want to match on a regex pattern, character-by-character, from an input stream? Tough nuts. It's not possible.

The new JSON parser in the Tango library operates on templated string arrays. If I want to read from a file or a socket, I have to first slurp the whole thing into a character array, even though the character-streaming would be more practical.

Parsers, formatters, console IO, socket IO... Anything that provides an iterable sequence of characters ought to comply with an interface facilitating polymorphic text processing. In some cases, there might be a slight memory/speed tradeoff. But in many more cases, the benefit of iterating over the transient characters in a stream would be much faster and require a tiny fraction of the memory of the character array.

There are performance benefits to be found on both sides of the coin.

Anyhow, I totally agree with you when you say that "larger components" should be built from "smaller components".

But the "small components" are the *interfaces*, not the implementation details.

--benji
August 26, 2008
BCS wrote:
> Ditto, D is a *systems language* It's *supposed* to have access to the lowest level representation and build stuff on top of that

But in this "systems language", it's a O(n) operation to get the nth character from a string, to slice a string based on character offsets, or to determine the number of characters in the string.

I'd gladly pay the price of a single interface vtable lookup to turn all of those into O(1) operations.

--benji
August 26, 2008
Benji Smith Wrote:

> Sorry, dan. You're wrong.

well just sayin' it ain't makin' it. hope there's some in the way of a proof henceforth.

> superdan wrote:
> > again you want to build larger components from smaller components.
> 
> Good APIs define public interfaces and hide implementation details, usually providing a default general-purpose implementation while allowing third-parties to define special-purpose implementations to suit their needs.

sure thing. all this gave me a warm fuzzy feelin' right there.

> In D, the text handling is defined by the data format (unicode byte sequences) rather than by the interface, while providing no polymorphic mechanism for alternate implementations.
> 
> It's the opposite of good API design.

wait a minute. you got terms all mixed up. and that's a recurring problem that mashes your argument badly. first you say `in d'. so then i assume there's a problem in the language. but then you go on with what looks like a library thing. sure enuff you end with `api' which is 100% a library thingy.

so if you have any beef say `in phobos' or `in tango'. `in d' must refer to the language proper.

the core lang is supposed to give you the necessary stuff to do that nice api design that gave me half an erection in ur first paragraph. if the language gives me the classes and apis and stuff then things will be slow for everyone and no viagra and no herbal supplement ain't gonna make stuff hard around here.

i think d strings are the cleverest thing yet. not too low level like pointers. not too high level like classes and stuff. just the right size. pun not intended.

> The regular expression engine accepts character arrays. And that's it. You want to match on a regex pattern, character-by-character, from an input stream? Tough nuts. It's not possible.

there are multiple problems with this argument of yours. first it's that you swap horses in the midstream. that, according to the cowboy proverb, is unrecommended. you switch from strings to streams. don't.

about streams. same in perl. it's a classic problem. you must first read as much text as you know could match. then you match. and nobody complains.

what you say is possible. it hasn't been written that way 'cause it's damn hard. motherintercoursin' hard. regexes must backtrack and when they do they need pronto access to the stuff behind. if that's in a file, yeah `tough nuts' eh.

> The new JSON parser in the Tango library operates on templated string arrays. If I want to read from a file or a socket, I have to first slurp the whole thing into a character array, even though the character-streaming would be more practical.

non seqitur. see stuff above.

> Parsers, formatters, console IO, socket IO... Anything that provides an iterable sequence of characters ought to comply with an interface facilitating polymorphic text processing. In some cases, there might be a slight memory/speed tradeoff. But in many more cases, the benefit of iterating over the transient characters in a stream would be much faster and require a tiny fraction of the memory of the character array.

so what u want is a streaming interface and an implementation of it that spans a string. for the mother of god i can't figure what this has to do with the language, and what stops you from writing it.

> There are performance benefits to be found on both sides of the coin.

no. all benefits only of my side of the coin. my side of the coin includes your side of the coin.

> Anyhow, I totally agree with you when you say that "larger components" should be built from "smaller components".

cool.

> But the "small components" are the *interfaces*, not the implementation details.

quite when i thought i drove a point home... dood we need to talk. you have all core language, primitives, libraries, and app code confused.
August 26, 2008
Reply to Benji,

> BCS wrote:
> 
>> Ditto, D is a *systems language* It's *supposed* to have access to
>> the lowest level representation and build stuff on top of that
>> 
> But in this "systems language", it's a O(n) operation to get the nth
> character from a string, to slice a string based on character offsets,
> or to determine the number of characters in the string.
> 
> I'd gladly pay the price of a single interface vtable lookup to turn
> all of those into O(1) operations.
> 
> --benji
> 

Then borrow, buy, steal or build a class that does that /on top of the D arrays/

No one has said that this should not be available, just that it should not /replace/ what is available


« First   ‹ Prev
1 2 3 4 5 6 7 8 9 10 11