October 28, 2004
Walter wrote:

> I also just don't see the need to even bother using aliases. Just use
> char[]. I think the issue comes up repeatedly because people coming from a
> C++ background are so used to char* being inadequate that it's hard to get
> comfortable with the idea that char[] really does work <g>.

I just found char[][] a tad confusing, but maybe it grows on you... :-)


Oh well; I can still use a local "string" alias for char[] if I want
to, even if it doesn't make it into the standard D includes. No big deal.

And there probably should be a warning that ".length" only counts
characters for ASCII strings, since for anything else it returns the
number of code units?
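
For example (a quick sketch, assuming writefln from std.stdio):

    import std.stdio;

    void main()
    {
        char[] s = "héllo";     // "é" encodes as two UTF-8 code units
        writefln(s.length);     // prints 6 (code units), not 5 (characters)

        dchar[] d = "héllo";    // one dchar per code point
        writefln(d.length);     // prints 5
    }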

--anders
October 29, 2004
"Regan Heath" <regan@netwin.co.nz> wrote in message news:opsgllx2xp5a2sq9@digitalmars.com...
> It's the fact that there are 3 of them; it's possible people will use different ones in their libs, and then my program will have to do internal conversions all over the place.

I admit it may become a problem, but I don't think it will. More experience will let us know. With C++, there is a big problem because char* and wchar_t* are simply incompatible; one cannot even do conversions between them. One of my beefs with C++ was having to write multiple versions of the same function for the various char types. This isn't necessary in D.
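
For instance, a single char[] version plus transcoding covers the other types (a sketch, assuming toUTF16/toUTF8 from std.utf):

    import std.utf;

    void main()
    {
        char[]  u8   = "some text";
        wchar[] u16  = toUTF16(u8);    // transcode UTF-8 -> UTF-16
        char[]  back = toUTF8(u16);    // ...and back
        // one implementation per function, plus conversions,
        // instead of char/wchar/dchar overloads of everything
    }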


October 29, 2004
"Regan Heath" <regan@netwin.co.nz> wrote in message news:opsgllvqje5a2sq9@digitalmars.com...
> Perhaps it could/should also differ per application; this could be achieved with a compile-time flag to choose the internal string type. Not a perfect solution, I know, as now we need 3 versions of each library, one for each internal char type.

Although some are doing this, I argue it isn't necessary. Just pick one, and use conversions as necessary.


October 29, 2004
On Thu, 28 Oct 2004 17:53:43 -0700, Walter <newshound@digitalmars.com> wrote:
> "Regan Heath" <regan@netwin.co.nz> wrote in message
> news:opsgllvqje5a2sq9@digitalmars.com...
>> Perhaps it could/should also differ per application; this could be
>> achieved with a compile-time flag to choose the internal string type. Not
>> a perfect solution, I know, as now we need 3 versions of each library, one
>> for each internal char type.
>
> Although some are doing this, I argue it isn't necessary. Just pick one, and use conversions as necessary.

Quite frankly, yuck. As I said earlier, it's inefficient to convert internally; you should convert only on input and output.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
October 29, 2004
On Thu, 28 Oct 2004 17:51:04 -0700, Walter <newshound@digitalmars.com> wrote:
> "Regan Heath" <regan@netwin.co.nz> wrote in message
> news:opsgllx2xp5a2sq9@digitalmars.com...
>> It's the fact that there are 3 of them; it's possible people will use
>> different ones in their libs, and then my program will have to do
>> internal conversions all over the place.
>
> I admit it may become a problem, but I don't think it will. More experience
> will let us know. With C++, there is a big problem because char* and
> wchar_t* are simply incompatible; one cannot even do conversions between
> them. One of my beefs with C++ was having to write multiple versions of the
> same function for the various char types. This isn't necessary in D.

Isn't it? Explain std.string then.

Don't people convert between char* and wchar_t* all the time, with functions? How is that really different from using a cast() in D? The syntax and knowing the encoding are the only differences I can see.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
October 29, 2004
Walter wrote:

> I also just don't see the need to even bother using aliases. Just use
> char[].

But you need to use aliases for the following scenario:

Suppose that:
  1. I want to write code for both Windows and Unix.
  2. I don't want to pay any string conversion costs at all.

I assume the way to do this in D is:
  1. Use wchar[] on Windows and make UTF-16 API calls.
  2. Use char[] on Linux and make UTF-8 API calls.
  3. Use an alias to toggle between wchar[] and char[] (see the sketch below).
  4. Use a string library that defines all functions in both wchar[] and char[] versions.

If I just used char[], I would be forced to pay string conversion costs, as Windows ultimately processes all strings in UTF-16.
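
Steps 3 and 4 could look something like this (a sketch; tchar/tstring and the library function names are hypothetical):

    // pick the platform's natural encoding once (names hypothetical)
    version (Windows)
        alias wchar tchar;    // UTF-16, matching the Win32 "W" APIs
    else
        alias char tchar;     // UTF-8, matching Linux APIs

    alias tchar[] tstring;

    // the string library (step 4) then supplies both versions, e.g.:
    //   int find(char[] s, char[] sub);
    //   int find(wchar[] s, wchar[] sub);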


October 29, 2004
"Regan Heath" <regan@netwin.co.nz> wrote in message news:opsglzsouo5a2sq9@digitalmars.com...
> On Thu, 28 Oct 2004 17:51:04 -0700, Walter <newshound@digitalmars.com> wrote:
> > "Regan Heath" <regan@netwin.co.nz> wrote in message news:opsgllx2xp5a2sq9@digitalmars.com...
> >> It's the fact that there are 3 of them; it's possible people will use different ones in their libs, and then my program will have to do internal conversions all over the place.
> >
> > I admit it may become a problem, but I don't think it will. More
> > experience will let us know. With C++, there is a big problem because
> > char* and wchar_t* are simply incompatible; one cannot even do
> > conversions between them. One of my beefs with C++ was having to write
> > multiple versions of the same function for the various char types. This
> > isn't necessary in D.
>
> Isn't it? Explain std.string then.
>
> Don't people convert between char* and wchar_t* all the time, with functions? How is that really different from using a cast() in D? The syntax and knowing the encoding are the only differences I can see.

The conversion doesn't work because it doesn't know about UTF. An attempt is being made to fix this in the latest standards.
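
Worth noting: in D, too, a bare cast() does not transcode; it merely reinterprets the array's memory. The UTF-aware conversion is a library call (a sketch, using std.utf):

    import std.utf;

    void main()
    {
        char[] s = "héllo";       // 6 UTF-8 code units
        wchar[] w = toUTF16(s);   // real transcode: 5 UTF-16 code units

        // cast(wchar[])s would just reinterpret those 6 bytes as
        // 3 wchars -- garbage, not a conversion
    }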


October 29, 2004
"James McComb" <ned@jamesmccomb.id.au> wrote in message news:cls8k1$16r5$1@digitaldaemon.com...
> Walter wrote:
> > I also just don't see the need to even bother using aliases. Just use char[].
>
> But you need to use aliases for the following scenario:
>
> Suppose that:
>    1. I want to write code for both Windows and Unix.
>    2. I don't want to pay any string conversion costs at all.
>
> I assume the way to do this in D is:
>    1. Use wchar[] on Windows and make UTF-16 API calls.
>    2. Use char[] on Linux and make UTF-8 API calls.
>    3. Use an alias to toggle between wchar[] and char[].
>    4. Use a string library that defines all functions in both wchar[]
> and char[] versions.
>
> If I just used char[], I would be forced to pay string conversion costs, as Windows ultimately processes all strings in UTF-16.

True, Win32 processes strings in UTF-16 and Linux in UTF-8. But I'll argue that the string conversion costs are insignificant, because very rarely does one write code that crosses from the app to the OS in a tight loop. In fact, one actively tries to avoid doing that, because crossing the process boundary layer is expensive anyway.

If profiling indicates that the conversion cost is significant, then use an alias, sure. But I'll wager that's very unlikely.
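
For example, an app can stay in char[] throughout and convert right at the API call (a sketch; MessageBoxW is declared by hand here, and toUTF16z is assumed from std.utf):

    import std.utf;

    extern (Windows) int MessageBoxW(void* hwnd, wchar* text,
                                     wchar* caption, uint type);

    void main()
    {
        char[] msg = "hëllo";   // internal strings stay UTF-8
        // one conversion at the OS boundary, outside any tight loop
        MessageBoxW(null, toUTF16z(msg), toUTF16z("note"), 0);
    }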


October 29, 2004
In article <cllf6q$24vg$1@digitaldaemon.com>, not related to Noël says...
>The default should be that everyone uses the Default string, and that only profiling should be used to decide whether some snippets should then be programmed with arrays (or whatever), as a last resort.

I think there is some merit in this guideline, particularly for those new to programming.  But I'm coming around to the perspective that performance problems are like bugs.  If you don't pay attention to bugs during the design phase, you will spend your whole career debugging programs.  Likewise, if performance is the last thing you think about, you will spend all of your career profiling programs with poor performance, trying to overcome slow designs with small optimizations.

If you want a "standard" string type, use "char".  An XML parser needs to look for "<" and ">" a lot, but how often do you -really- need to scan strings for multibyte characters?  Virtually all traditional tokenization and parsing tasks can be done with 8 bit types, because they require searching for delimiters that are themselves 8 bit chars.  I've not seen "U+umlaut" delimited fields ;)
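
(That works because in UTF-8 no byte of a multi-byte sequence is ever below 0x80, so a byte-wise scan for an ASCII delimiter can't false-match. A sketch:)

    // byte-wise scan for an ASCII delimiter; UTF-8 safe, since bytes
    // below 0x80 never occur inside a multi-byte sequence
    int findDelim(char[] s, char delim)
    {
        for (int i = 0; i < s.length; i++)
            if (s[i] == delim)
                return i;
        return -1;    // not found
    }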

My rule of thumb is to use the smallest type that I won't need to convert inside the function, usually char.  If the function needs to iterate over and modify dchar elements, accept that type at the function interface.
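
E.g. (a sketch; the function name is hypothetical):

    // must examine whole characters, so accept dchar[] at the interface
    size_t countUppercase(dchar[] s)
    {
        size_t n;
        for (size_t i = 0; i < s.length; i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                n++;
        return n;
    }

    // findDelim above, by contrast, only scans for an ASCII delimiter,
    // so plain char[] suffices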

Kevin



October 29, 2004
"Walter" <newshound@digitalmars.com> wrote in message news:clr7kg$2mi2$1@digitaldaemon.com...

> [D library authors and others won't be tempted to create their own string classes
> as so many did for C++ because D's core strings are so much better]

This may turn out to be true. If so, you are still left with multiple string types and no obvious default. My concern is that the result will be a lot of unnecessary complexity, with all of its associated real costs, in exchange for little or no real benefit in many cases; and those who are aware of the problem won't even be able to avoid it if they use other people's code. And if it doesn't work out that way, the situation would be even worse, with even more string types and still no default.

> Using 'alias' doesn't create a new type. It just renames an existing type.

You're talking about 'type' from the compiler's perspective, while I'm talking about it from the perspective of people--well, programmers are sort of like people--as in complexity, programmer productivity, porting, debugging, maintenance etc. From that perspective two things with different names have to be managed differently. Though the compiler may (sometimes) not object if you mix them, anyone who works with multiple string type names has more to keep track of and check on and worry about.

Just for grins, here is the sort of thing I've overheard coming from developers at first-rate software companies, who ended up with multiple internal string types aliased by #defines:

"<cubicle #1 wonders out loud>Hmm, this FooLib wants to be passed a foochar, but we're passing it a regular char. Is that okay? Or will it fail with a non-ASCII character? <cubicle #2 offers>Maybe you should test it with an accented e. <cubicle #3>No, wait, isn't upper ASCII still one byte sometimes....? <guy #1 again>Well I don't have a Japanese IME. Does anybody remember what a 'foochar' is on Linux? <guy #3> It's UTF-16 on Windows. Sorry, don't know about Linux...."

> Hence, I don't see much of a collision problem between different code bases
> that use aliases.

Whether or not such a problem exists for the compiler, I don't see how working with multiple string types, even if some of them differ in name only, would not be a complexity problem for *people*.

> I also just don't see the need to even bother using aliases.

That's pretty interesting because if there really is no need for this feature, you could prevent some unnecessary complexity by eliminating the feature.

With no default string type, though, people are essentially told to optimize their string type every time they create a string, which will probably create a demand for a feature like "alias" to create an abstract string type (name) above the implementation level.

> Just use
> char[]. I think the issue comes up repeatedly because people coming from a
> C++ background are so used to char* being inadequate that it's hard to get
> comfortable with the idea that char[] really does work <g>.

<g> Funny, I thought you would say "just use wchar[]". Each one seems about equally likely. ;-)

Again, you may be right about the existing byte array types being good enough to prevent the proliferation of string classes. Even if you are, my concern about the lack of an obvious default resulting in non-trivial complexity costs with no concomitant benefit remains.

But I suppose it's also possible that if you DID add a nice, lightweight string class suitable as a default almost everywhere, the addiction of so many C people to premature optimization could make it unpopular, rendering it not a unifying default but merely a fourth standard string type adding to the complexity.