October 28, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:clptlo$161f$1@digitaldaemon.com...

> ...[UTF-8 indexing issue that I don't want to waste your time with]...


> As for a single string type, there is no answer for that. Each has
> significant tradeoffs. For a speed oriented language, the choice needs to be
> under the control of the application programmer, not the language.

I agree. I think there should be a standard string class for default use plus a selection of byte array forms (e.g. char[], wchar[], dchar[]) for use anywhere that the programmer determined that their use instead of the default improved the app. The choice would be completely under the control of the programmer. The encoding of the default string class would be up to the implementors to optimize for the platform so that for the great majority of text operations in an app, the default string would work so well that replacing it with one of the byte array forms would be found to have no positive impact on the app. However, anytime the programmer encountered a situation where use of a byte array type improved the app, he could use it.

With this approach, you could have code with the same performance as under the current system, because anytime it was slower you could just use the current system. However, having a good default string as well, used by most apps on most platforms by most people most of the time, would simplify designs, porting, maintenance, programmer productivity, etc.

> The three
> types are readilly convertible into each other.

In fact, all four types would be readily convertible, though by having one that was almost always the best choice, regardless of platform, you would be able to avoid many unnecessary conversions that could easily de-optimize your code as you added libraries and ported your app to other platforms. Also, by matching the implementation of that default to the preferred form of the local OS APIs, conversions between the default string class and the OS API format could probably be compiled down to very lightweight object code on any platform, from the same source code.

> I don't really see the need
> for application programmers to layer on more string types.

I do, and apparently I'm not alone. People seem to mention it a lot in this newsgroup (I've discovered), just as they did for so long with C++. If I want char[] on Linux and wchar[] on Windows and want to avoid the nightmare of maintaining parallel but subtly different code, I need to create my own type using the "alias" feature. The authors of a couple of libraries I'll use will do likewise, but with their own type names and maybe different alias resolution rules. I expect some people will solve it with string classes. Stroustrup took a similar position about the need for programmers to optimize their strings long enough that every C++ library and API created its own string type. He once stated in a meeting I attended that his greatest regret about C++ was waiting so long to have a standard library and that the most requested feature of that library had been a string class. By adding just one more standard string type that would be a good default on every platform, I think you could eliminate the need so many people will feel to create their own and prevent string types from multiplying like bunnies, as happened to C++.
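
To make that concrete, the kind of per-platform alias I'm talking about would look something like this (the name "appstring" is purely illustrative):

version (Windows)
    alias wchar[] appstring;    // matches the UTF-16 "W" APIs on Windows
else
    alias char[]  appstring;    // matches the UTF-8 convention on Linux

appstring greeting = "hello";   // the same source compiles on both platforms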

Performance isn't the only thing programmers want, even from a high performance language. They'd also like to avoid unnecessary complexity, avoid bugs, reuse other people's code, target multiple platforms with mostly the same source, and so on. I think having a single, good default string type could be very helpful for these things without harming performance.

Even so, I realize that my opinion may be based on incorrect assumptions, missing information, faulty logic, selective memory, or peculiar personal preferences, so I may be wrong. If so, though, I'd be curious to know why.


October 28, 2004
Glen Perkins wrote:

>> As for a single string type, there is no answer for that. Each has
>> significant tradeoffs. For a speed oriented language, the choice needs to be under the control of the application programmer, not the language.
> 
> I agree. I think there should be a standard string class for default use plus a selection of byte array forms (e.g. char[], wchar[], dchar[]) for use anywhere that the programmer determined that their use instead of the default improved the app.

I don't have a problem with a standard String *class* present in D,
as long as I don't *have* to use it (and OOP) - like I do in Java...

The beauty of D's string types (char[] and wchar[]) is that they
work for plain old procedural C-style programs too, not just objects?

--anders
October 28, 2004
> I need to create my own type using the "alias" feature. The authors of a couple of libraries I'll use will do likewise, but with their own type names and maybe different alias resolution rules.

Technically an alias introduces a new symbol. It's like a #define. It doesn't actually introduce a new type (see typedef). For example, the following doesn't compile:

alias int foo;
void bar(int y) {}
void bar(foo y) {}

int main() {
  bar(0);
  return 0;
}

compiling results in: "function bar overloads void(int y) and void(int y)
both match argument list for bar"

Redefining an alias is ignored (well, it is very useful for overloading functions but not for basic types). For example:

alias int foo;
alias long foo;
void bar(int y) {printf("int\n");}
void bar(long y) {printf("long\n");}
int main() {
  foo x;
  bar(x);
  return 0;
}

prints "int". So defining multiple aliases for strings or any other type is a pretty harmless thing to do. It should only effect the readability and maintainability of the code.

October 28, 2004
In article <clq9a8$1jkb$1@digitaldaemon.com>, Glen Perkins says...
>
>> I don't really see the need
>> for application programmers to layer on more string types.
>
>I do, and apparently I'm not alone. People seem to mention it a lot in this newsgroup (I've discovered), just as they did for so long with C++. If I want char[] on Linux and wchar[] on Windows and want to avoid the nightmare of maintaining parallel but subtly different code, I need to create my own type using the "alias" feature.

Out of curiosity, why would you want to use different char types internally in the same application depending on platform?  At worst I would think that the i/o might translate to different encodings but the internal code would use some normalized form regardless of platform.

>The authors of a couple of libraries I'll use will do likewise, but with their own type names and maybe different alias resolution rules. I expect some people will solve it with string classes.

They are certainly welcome to, but I'm not sure I see a need for a standard string class.  The built-ins plus support functions should be quite sufficient.

>Stroustrup took a similar position about the need for programmers to optimize their strings long enough that every C++ library and API created its own string type. He once stated in a meeting I attended that his greatest regret about C++ was waiting so long to have a standard library and that the most requested feature of that library had been a string class.

But D has string support while early C++ did not.  In fact the current C++ string type is basically just a vector with some helper functions tacked on, and those functions could just as easily have been implemented separate from the string class (as is becoming popular in these days of generic programming).

>By adding just one more standard string type that would be a good default on every platform, I think you could eliminate the need so many people will feel to create their own and prevent string types from multiplying like bunnies, as happened to C++.

>I think people feel the need for a string class out of familiarity rather than necessity. While dealing with multibyte encodings can be a tad odd at first, foreach and slices make things quite painless.
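
For instance, in a rough sketch like this, foreach decodes the UTF-8 for you and slicing still gives you cheap views of the underlying code units:

char[] s = "smörgåsbord";   // stored as UTF-8; ö and å take two bytes each
int codepoints = 0;
foreach (dchar c; s)        // the loop variable receives whole code points
    codepoints++;
// codepoints is 11 here, while s.length is 13 (raw bytes)
char[] sm = s[0 .. 2];      // slices index the raw bytes ("sm")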

>Even so, I realize that my opinion may be based on incorrect assumptions, missing information, faulty logic, selective memory, or peculiar personal preferences, so I may be wrong. If so, though, I'd be curious to know why.

I haven't seen a good argument *for* a string class yet, but I could certainly be swayed if one were provided.  What is the advantage over the built-ins?  Is this purely a desire to create a standard implementation because we know that people are going to try to roll their own?


Sean


October 28, 2004
> D:\D\src\temp>dmd string2.d -O -release -inline
> d:\d\dmd\bin\..\..\dm\bin\link.exe string2,,,user32+kernel32/noi;
>
> D:\D\src\temp>string2
> compile time 219
> run time 1156
> template 157
>
> I ran both several times, the results above are typical for my system.
>
> Notice:
> 1- the compile time string2.d is slower than string.d
> 2- the template one is faster than the compile time one
>
> I don't understand how either of the above can be true.

That is odd. I got:
 compile time 78
 run time 593
 template 79
so I don't know what could be going on. Maybe try switching around the order
to see if that changes anything? I don't really know.


October 28, 2004
"Glen Perkins" <please.dont@email.com> wrote in message news:clq9a8$1jkb$1@digitaldaemon.com...
> > I don't really see the need
> > for application programmers to layer on more string types.
>
> I do, and apparently I'm not alone. People seem to mention it a lot in this newsgroup (I've discovered), just as they did for so long with C++. If I want char[] on Linux and wchar[] on Windows and want to avoid the nightmare of maintaining parallel but subtly different code, I need to create my own type using the "alias" feature. The authors of a couple of libraries I'll use will do likewise, but with their own type names and maybe different alias resolution rules. I expect some people will solve it with string classes. Stroustrup took a similar position about the need for programmers to optimize their strings long enough that every C++ library and API created its own string type. He once stated in a meeting I attended that his greatest regret about C++ was waiting so long to have a standard library and that the most requested feature of that library had been a string class. By adding just one more standard string type that would be a good default on every platform, I think you could eliminate the need so many people will feel to create their own and prevent string types from multiplying like bunnies, as happened to C++.

C++ needs a string class because core C++ strings are so inadequate. But this is not true for D - core strings are more than up to the job. D core strings can do everything std::string does, and a lot more. D core strings more than cover what java.lang.String does, as well.

Using 'alias' doesn't create a new type. It just renames an existing type. Hence, I don't see much of a collision problem between different code bases that use aliases.

I also just don't see the need to even bother using aliases. Just use char[]. I think the issue comes up repeatedly because people coming from a C++ background are so used to char* being inadequate that it's hard to get comfortable with the idea that char[] really does work <g>.
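
For instance, here's a rough sketch of the everyday operations people reach for std::string for, all on a plain char[]:

char[] s = "hello".dup;        // .dup gives a mutable copy of the literal
s ~= ", world";                // concatenation / append with ~
char[] word = s[0 .. 5];       // slicing, no copy made ("hello")
uint n = s.length;             // length property (12 here)
if (s == "hello, world")       // value comparison is built in
    printf("equal\n");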


October 28, 2004
On Thu, 28 Oct 2004 10:24:22 +0200, Anders F Björklund <afb@algonet.se> wrote:
> Glen Perkins wrote:
>
>>> As for a single string type, there is no answer for that. Each has
>>> significant tradeoffs. For a speed oriented language, the choice needs to be under the control of the application programmer, not the language.
>>
>> I agree. I think there should be a standard string class for default use plus a selection of byte array forms (e.g. char[], wchar[], dchar[]) for use anywhere that the programmer determined that their use instead of the default improved the app.
>
> I don't have a problem with a standard String *class* present in D,
> as long as I don't *have* to use it (and OOP) - like I do in Java...
>
> The beauty of D's string types (char[] and wchar[]) is that they
> work for plain old procedural C-style programs too, not just objects?

So we use a 'struct' instead.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
October 28, 2004
On Thu, 28 Oct 2004 08:35:24 -0400, Ben Hinkle <bhinkle4@juno.com> wrote:
> So defining multiple aliases for strings or any other type is
> a pretty harmless thing to do. It should only affect the readability and
> maintainability of the code.

I'd argue that it's not harmless for the very reasons you just mentioned. Readability and maintainability are important when working on any large-ish project.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
October 28, 2004
On Thu, 28 Oct 2004 14:57:33 +0000 (UTC), Sean Kelly <sean@f4.ca> wrote:
> In article <clq9a8$1jkb$1@digitaldaemon.com>, Glen Perkins says...
>>
>>> I don't really see the need
>>> for application programmers to layer on more string types.
>>
>> I do, and apparently I'm not alone. People seem to mention it a lot in
>> this newsgroup (I've discovered), just as they did for so long with
>> C++. If I want char[] on Linux and wchar[] on Windows and want to
>> avoid the nightmare of maintaining parallel but subtly different code,
>> I need to create my own type using the "alias" feature.
>
> Out of curiosity, why would you want to use different char types internally in
> the same application depending on platform?  At worst I would think that the i/o
> might translate to different encodings but the internal code would use some
> normalized form regardless of platform.

Glen mentioned system API calls. AFAIK Unix variants use 8-bit chars internally, but the later Windows platforms use 16-bit, so if you're doing a lot of system API calls it makes sense to have the string data in the right format. Yes/no?

> I haven't seen a good argument *for* a string class yet, but I could certainly
> be swayed if one were provided.  What is the advantage over the built-ins?

I am hoping to outline some below.

> Is
> this purely a desire to create a standard implementation because we know that
> people are going to try to roll their own?

In part, yes, the result of which would be...

Imagine a future in which a large number of 3rd party libs exist: if each lib uses a different char type, then interfacing between them all will involve conversions, lots of them.

If there were only 1 string type, this problem would not exist.

I realise that some conversions are unavoidable (e.g. converting for I/O), but converting for internal use should be avoided without a very good reason, and I cannot think of any at the moment that I would consider good enough to incur the cost of conversion.
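
To be clear about what that cost looks like, every boundary between libs that picked different char types ends up being roughly this (using std.utf, assuming I've got its function names right):

import std.utf;

char[]  a = "naïve";       // lib A traffics in UTF-8
wchar[] b = toUTF16(a);    // copy/convert just to hand it to lib B
char[]  c = toUTF8(b);     // and copy/convert again to get it back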

Further, say a conscientious library developer understands the above and wants to make his/her lib as compatible as possible. To do so, he/she has to either:
1- write everything 3 times (as is already happening in the std libs), or
2- do conversion internally.

Neither option is particularly good, don't you agree?

Basically I believe conversion should be done at the input and output stages but nowhere in between. The way to achieve that is to have 1 string type used internally, and the way to ensure that is to only give people the choice of 1 string type.

As suggested above that type may differ on each platform.

Perhaps it could/should also differ per application; this could be achieved with a compile-time flag to choose the internal string type. Not a perfect solution, I know, as now we need 3 versions of each library, one for each internal char type.

That's my 2c anyways.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
October 28, 2004
On Thu, 28 Oct 2004 09:39:13 -0700, Walter <newshound@digitalmars.com> wrote:
> "Glen Perkins" <please.dont@email.com> wrote in message
> news:clq9a8$1jkb$1@digitaldaemon.com...
>> > I don't really see the need
>> > for application programmers to layer on more string types.
>>
>> I do, and apparently I'm not alone. People seem to mention it a lot in
>> this newsgroup (I've discovered), just as they did for so long with
>> C++. If I want char[] on Linux and wchar[] on Windows and want to
>> avoid the nightmare of maintaining parallel but subtly different code,
>> I need to create my own type using the "alias" feature. The authors of
>> a couple of libraries I'll use will do likewise, but with their own
>> type names and maybe different alias resolution rules. I expect some
>> people will solve it with string classes. Stroustrup took a similar
>> position about the need for programmers to optimize their strings long
>> enough that every C++ library and API created its own string type. He
>> once stated in a meeting I attended that his greatest regret about C++
>> was waiting so long to have a standard library and that the most
>> requested feature of that library had been a string class. By adding
>> just one more standard string type that would be a good default on
>> every platform, I think you could eliminate the need so many people
>> will feel to create their own and prevent string types from
>> multiplying like bunnies, as happened to C++.
>
> C++ needs a string class because core C++ strings are so inadequate. But
> this is not true for D - core strings are more than up to the job. D core
> strings can do everything std::string does, and a lot more. D core strings
> more than cover what java.lang.String does, as well.
>
> Using 'alias' doesn't create a new type. It just renames an existing type.
> Hence, I don't see much of a collision problem between different code bases
> that use aliases.
>
> I also just don't see the need to even bother using aliases. Just use
> char[]. I think the issue comes up repeatedly because people coming from a
> C++ background are so used to char* being inadequate that it's hard to get
> comfortable with the idea that char[] really does work <g>.

It's not whether it works or not; I agree it works very well.

It's the fact that there are 3 of them: it's possible people will use different ones in their libs, and then my program will have to do internal conversions all over the place.

Conversion should only be done at the input and/or output stages.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/