September 29, 2006 Re: First Impressions
Posted in reply to Lionello Lunesu
Lionello Lunesu wrote:
> Perhaps, using string instead of char[], it's more obvious that it's not zero-terminated. I've seen D examples online that just cast a char[] to char* for use in MessageBox and the like (which worked since they were string constants.)
And probably only for ASCII string constants, at that...
--anders
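A minimal sketch of why that cast only appeared to work, assuming a present-day D2 compiler and Phobos (hence the `.dup` on the literal): D string literals are stored with a trailing zero byte, but an arbitrary char[] or a slice of one is not, so handing its `.ptr` to a C function reads past the end. Phobos's `std.string.toStringz` returns a pointer to zero-terminated data, copying when necessary. C's `strlen` stands in here for MessageBox or any other C API, and the variable names are illustrative only.

```d
import std.string : toStringz;
import core.stdc.string : strlen;

void main()
{
    // A heap-allocated char[] and a slice of it: neither ends in '\0'.
    char[] buf = "hello world".dup;
    char[] word = buf[0 .. 5];

    // A raw cast(char*) would give C the bytes after the slice: " world".
    // (Reading one past the slice is safe here only because buf is longer.)
    assert(word.ptr[word.length] == ' ');

    // toStringz yields a zero-terminated pointer (it copies when it has to).
    const(char)* p = toStringz(word);
    assert(strlen(p) == 5);
}
```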
September 29, 2006 Re: First Impressions
Posted in reply to Walter Bright
On Fri, 29 Sep 2006 01:24:50 -0700, Walter Bright wrote:
> Derek Parnell wrote:
>> On Fri, 29 Sep 2006 10:23:32 +1000, Geoff Carlton wrote:
>>> I was a bit underwhelmed by the syntax of char[].
>>
>> Yes. It isn't very 'nice' for a modern language. Though as you note below a simple alias can help a lot.
>>
>> alias char[] string;
>
> On the other hand, the reasons other languages have strings as classes is because they just don't support arrays very well. C++'s std::string combines the worst of core functionality and libraries, and has the advantages of neither.
>
> An early design goal for D was to upgrade arrays to the point where string classes weren't necessary.
And is it there yet? I mean, given that a string is just a lump of text, is there any text processing operation that cannot be simply done to a char[] item? I can't think of any but maybe somebody else can.
And if a char[] is just as capable as a std::string, then why not have an official alias in Phobos? Will 'alias char[] string' cause anyone any problems?
-- 
Derek Parnell
Melbourne, Australia
"Down with mediocrity!"
September 29, 2006 Re: First Impressions
Posted in reply to Lionello Lunesu
Lionello Lunesu wrote:
> I also ALWAYS create aliases for char[], wchar[], dchar[]... I DO wish they would be included by default in Phobos.
>
> alias char[] string;
> alias wchar[] wstring;
> alias dchar[] dstring;
>
> Perhaps, using string instead of char[], it's more obvious that it's not zero-terminated. I've seen D examples online that just cast a char[] to char* for use in MessageBox and the like (which worked since they were string constants.)
Using char[] as long as you don't know about UTF seems to work pretty well in D. But the moment you realise that we're having potential multibyte characters in what essentially is a ubyte[], you get scared to death, and start to wonder how on earth you haven't yet blown up your hard disk.
You start having nightmares about slicing char arrays at the wrong place, extracting single chars that might not be storable in a char, and all of a sudden you decide to stick with your old language "till things calm down".
The only medicine to this is simply to shut your eyes and keep coding on like you never did realise anything.
It's a little like when you first realised Daddy isn't holding your bike: you instantly fall hurting yourself, instead of realizing that he's probably let go ages ago, and you still haven't fallen, so simply keep going.
---
This doesn't mean I'm happy with this either, but I don't have the energy to conjure up a significantly better solution _and_ fight for it till it gets accepted. (Some things are just too hard to fix, like "bit=bool" was, and now "auto/auto".)
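To make the "slicing char arrays at the wrong place" nightmare concrete, here is a small sketch assuming a present-day D2 compiler and Phobos's `std.utf.validate`: slicing a char[] at an arbitrary byte index can split a multibyte character in two, and the resulting slice is no longer valid UTF-8.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // "né": 'é' (U+00E9) takes two UTF-8 code units, 0xC3 0xA9.
    char[] s = "n\u00E9".dup;
    assert(s.length == 3);
    assert(s[1] == 0xC3);   // s[1] is only the first half of 'é'

    // Slicing at a byte index can cut that character in half...
    char[] cut = s[0 .. 2];
    // ...and the result no longer validates as UTF-8.
    assertThrown!UTFException(validate(cut));

    validate(s);            // the whole string is still fine
}
```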
September 29, 2006 Re: First Impressions
Posted in reply to Derek Parnell
Derek Parnell wrote:
> On Fri, 29 Sep 2006 01:24:50 -0700, Walter Bright wrote:
>>An early design goal for D was to upgrade arrays to the point where string classes weren't necessary.
>
> And is it there yet? I mean, given that a string is just a lump of text
The string you're talking about is not just a lump of text.
More specifically it's a lump of text, irregularly interspersed with short non-ascii ubyte sequences.
The latter being of course the tails of UTF-8 "characters".
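A sketch of that view of a UTF-8 char[], assuming present-day Phobos: `.length` counts code units (bytes), `std.utf.count` counts code points, and every continuation byte (the "tail") has the bit pattern 10xxxxxx. `countTails` is a hypothetical helper written only for this illustration.

```d
import std.utf : count;

/// Hypothetical helper: counts UTF-8 continuation bytes ("tails"),
/// which always have the bit pattern 10xxxxxx.
size_t countTails(const(char)[] s)
{
    size_t tails = 0;
    foreach (char c; s)            // iterates code units, not characters
        if ((c & 0xC0) == 0x80)
            ++tails;
    return tails;
}

void main()
{
    char[] s = "na\u00EFve".dup;   // "naïve"; 'ï' is encoded as 0xC3 0xAF
    assert(s.length == 6);         // code units (bytes)
    assert(count(s) == 5);         // code points
    assert(countTails(s) == 1);    // the single tail byte 0xAF
}
```

For valid UTF-8, each code point is one lead byte plus its tails, so `s.length - countTails(s)` equals `count(s)`.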
September 29, 2006 Re: First Impressions
Posted in reply to Derek Parnell
Derek Parnell wrote:
> On Fri, 29 Sep 2006 01:24:50 -0700, Walter Bright wrote:
>
>
>>Derek Parnell wrote:
>>
>>>On Fri, 29 Sep 2006 10:23:32 +1000, Geoff Carlton wrote:
>>>
>>>>I was a bit underwhelmed by the syntax of char[].
>>>
>>>Yes. It isn't very 'nice' for a modern language. Though as you note below a
>>>simple alias can help a lot.
>>>
>>> alias char[] string;
>>
>>On the other hand, the reasons other languages have strings as classes is because they just don't support arrays very well. C++'s std::string combines the worst of core functionality and libraries, and has the advantages of neither.
>>
>>An early design goal for D was to upgrade arrays to the point where string classes weren't necessary.
>
>
> And is it there yet? I mean, given that a string is just a lump of text, is
> there any text processing operation that cannot be simply done to a char[]
> item? I can't think of any but maybe somebody else can.
>
> And if a char[] is just as capable as a std::string, then why not have an
> official alias in Phobos? Will 'alias char[] string' cause anyone any
> problems?
>
I just quickly want to interject my wish for aliases for the basic string array types.
-DavidM
September 29, 2006 Re: First Impressions
Posted in reply to Anders F Björklund
Anders F Björklund wrote:
> Lionello Lunesu wrote:
>
>> Perhaps, using string instead of char[], it's more obvious that it's not zero-terminated. I've seen D examples online that just cast a char[] to char* for use in MessageBox and the like (which worked since they were string constants.)
>
> And probably only for ASCII string constants, at that...
Right, that too!
char[] somestring = "....";
func( somestring[0] ); // WRONG: somestring[x] is not 1 character!
Using "string" would make it less obvious:
string somestring = ".....";
func( somestring[0] ); // [0] means what?
This goes for iteration as well. DMD will still deduce 'char' as the element type, but at least one's less likely to type foreach(char c;str).
If you want to iterate the UNICODE characters in a string, you'll specify "dchar" as the type and you won't worry about "how come I can use dchar when it's a char[]":
foreach(dchar c; somestring)
func(c); // correct
L.
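A short sketch of the difference described above, assuming a present-day D2 compiler and Phobos: indexing and `foreach (char ...)` see UTF-8 code units, while `foreach (dchar ...)` and `std.utf.decode` produce whole characters. The sample string is arbitrary.

```d
import std.utf : decode;

void main()
{
    char[] s = "\u00FCber".dup;   // "über"

    // Indexing yields a code unit: 'ü' is 0xC3 0xBC, so s[0] is not 'ü'.
    assert(s[0] == 0xC3);

    // foreach with char walks code units; with dchar it decodes characters.
    size_t units, chars;
    foreach (char c; s) ++units;
    foreach (dchar c; s) ++chars;
    assert(units == 5 && chars == 4);

    // decode reads one whole character and advances the index past it.
    size_t i = 0;
    dchar first = decode(s, i);
    assert(first == '\u00FC' && i == 2);
}
```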
September 29, 2006 Re: First Impressions
Posted in reply to Georg Wrede
Georg Wrede wrote:
> Lionello Lunesu wrote:
>
>> I also ALWAYS create aliases for char[], wchar[], dchar[]... I DO wish they would be included by default in Phobos.
>>
>> alias char[] string;
>> alias wchar[] wstring;
>> alias dchar[] dstring;
>>
>> Perhaps, using string instead of char[], it's more obvious that it's not zero-terminated. I've seen D examples online that just cast a char[] to char* for use in MessageBox and the like (which worked since they were string constants.)
>
>
> Using char[] as long as you don't know about UTF seems to work pretty well in D. But the moment you realise that we're having potential multibyte characters in what essentially is a ubyte[], you get scared to death, and start to wonder how on earth you haven't yet blown up your hard disk.
>
> You start having nightmares about slicing char arrays at the wrong place, extracting single chars that might not be storable in a char, and all of a sudden you decide to stick with your old language "till things calm down".
>
> The only medicine to this is simply to shut your eyes and keep coding on like you never did realise anything.
>
> It's a little like when you first realised Daddy isn't holding your bike: you instantly fall hurting yourself, instead of realizing that he's probably let go ages ago, and you still haven't fallen, so simply keep going.
>
> ---
>
> This doesn't mean I'm happy with this either, but I don't have the energy to conjure up a significantly better solution _and_ fight for it till it gets accepted. (Some things are just too hard to fix, like "bit=bool" was, and now "auto/auto".)
haha too true.
I experienced this too as I read this ng. It hasn't been THAT traumatic for me though, since everything seems to work as long as you stick to English. I don't have the resources to even begin thinking about non-English text (e.g. paying people to translate stuff), so I don't lose any sleep over it, at least not yet.
Perhaps there should be a string struct/class that has an undefined underlying type (it could be UTF-8, 16, 32, you dunno really), and you could index it to get the *complete* character at any position in the string. Basically, it is like char[], but it /just works/ in all cases. I'd almost rather have the size of a char be undefined, and just have char[] be the said magic string type. If you want something with a .size of 1, then there is byte/ubyte. There would probably have to be some stuff in the phobos internals to handle such a string in a correct manner.
Going even further... if you could make char[] be such a magic string type, then wchar[] and dchar[] could probably be deprecated - use ushort and uint instead. Then add the following aliases to phobos:
alias ubyte utf8;
alias ushort utf16;
alias uint utf32;
Just a thought. I'm no expert on UTF, but maybe this can start a discussion that will result in the nightmares ending :)
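One way such a type might look, as a hypothetical sketch only: `CharString` and its members are invented for illustration and are not part of Phobos. It stores UTF-8 and decodes on demand, which exposes the catch: with variable-width storage, finding the i-th character means walking from the start, so indexing costs O(i) rather than O(1).

```d
import std.utf : count, decode;

/// Hypothetical type sketching the proposal: indexing returns whole characters.
/// Storage is UTF-8 here; the proposal deliberately leaves it unspecified.
struct CharString
{
    const(char)[] data;

    /// Number of characters (code points), not bytes.
    size_t length() const
    {
        const(char)[] s = data;
        return count(s);
    }

    /// Returns the i-th character. Variable-width storage forces a walk
    /// from the start, so this costs O(i) instead of O(1).
    dchar opIndex(size_t i) const
    {
        assert(i < length, "index out of range");
        const(char)[] s = data;
        size_t pos = 0;
        foreach (_; 0 .. i)
            decode(s, pos);        // skip one whole character
        return decode(s, pos);     // decode the requested one
    }
}

void main()
{
    auto s = CharString("gr\u00FC\u00DFe");   // "grüße"
    assert(s.length == 5);
    assert(s[2] == '\u00FC');
    assert(s[3] == '\u00DF');
}
```

Caching a table of byte offsets, or storing UTF-32, would restore constant-time indexing at the price of extra memory.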
September 29, 2006 Re: First Impressions
Posted in reply to Chad J
Chad J wrote:
> Perhaps there should be a string struct/class that has an undefined underlying type (it could be UTF-8, 16, 32, you dunno really), and you could index it to get the *complete* character at any position in the string. Basically, it is like char[], but it /just works/ in all cases. I'd almost rather have the size of a char be undefined, and just have char[] be the said magic string type. If you want something with a .size of 1, then there is byte/ubyte. There would probably have to be some stuff in the phobos internals to handle such a string in a correct manner.
I have thought about this too.
> Going even further... if you could make char[] be such a magic string type, then wchar[] and dchar[] could probably be deprecated - use ushort and uint instead. Then add the following aliases to phobos:
> alias ubyte utf8;
> alias ushort utf16;
> alias uint utf32;
I completely agree, char should hold a character independently of encoding and NOT a code unit or something else. I think it would be beneficial to D in the long term if chars were done right (meaning that they can store any character). How it is implemented is not important, and I believe performance is not a problem here, so ease of use and correctness would be appreciated.
September 29, 2006 Re: First Impressions
Posted in reply to Johan Granberg
Johan Granberg wrote:
>
>
> I completely agree, char should hold a character independently of encoding and NOT a code unit or something else. I think it would be
> beneficial to D in the long term if chars were done right (meaning that they can store any character). How it is implemented is not important, and I believe performance is not a problem here, so ease of use and correctness would be appreciated.
Why isn't performance a problem?
If you are saying that this won't cause performance hits in run times or memory space, I might be able to buy it, but I'm not yet convinced.
If you are saying that causing a performance hit in run times or memory space is not a problem... in that case I think you are dead wrong and you will not convince me otherwise.
In my opinion, any compiled language should allow fairly direct access to the most efficient practical means of doing something*. If I didn't care about speed and memory I would use some sort of scripting language.
A good set of libs should make most of this moot. Leave the char as is and define a typedef struct or whatever that provides the added functionality that you want.
* OTOH a language should not mandate code to be efficient at the expense of ease of coding.
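A rough illustration of the memory side of that trade-off, assuming present-day Phobos (`std.utf.toUTF32`): converting to UTF-32 does give one array element per character and O(1) indexing by character, but every character then occupies four bytes. The sample word is arbitrary.

```d
import std.utf : toUTF32;

void main()
{
    char[] utf8 = "gr\u00FC\u00DFe".dup;   // "grüße": 5 characters, 7 bytes in UTF-8
    auto utf32 = toUTF32(utf8);            // one element per character

    // Constant-time access to the fourth character...
    assert(utf32[3] == '\u00DF');

    // ...paid for with four bytes per character.
    assert(utf8.length * char.sizeof == 7);
    assert(utf32.length * dchar.sizeof == 20);
}
```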
September 29, 2006 Re: First Impressions
Posted in reply to Derek Parnell
Derek Parnell wrote:
> And is it there yet? I mean, given that a string is just a lump of text, is
> there any text processing operation that cannot be simply done to a char[]
> item? I can't think of any but maybe somebody else can.
I believe it's there. I don't think std::string or java.lang.String have anything over it.
> And if a char[] is just as capable as a std::string, then why not have an
> official alias in Phobos? Will 'alias char[] string' cause anyone any
> problems?
I don't think it'll cause problems, it just seems pointless.
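A quick sketch of everyday string work done directly on a char[], using present-day Phobos functions (`std.string.indexOf`, `std.uni.toUpper`, `std.array.replace`, `std.array.split`). The Phobos of 2006 differed in places, so treat this as a modern illustration rather than what was available when the thread was written.

```d
import std.array : replace, split;
import std.string : indexOf;
import std.uni : toUpper;

void main()
{
    char[] s = "hello, world".dup;

    s ~= "!";                                   // append
    assert(s[0 .. 5] == "hello");               // slicing, no copy
    assert(s.indexOf("world") == 7);            // searching
    assert(toUpper(s) == "HELLO, WORLD!");      // case mapping (new array)
    assert(replace(s, "world", "D") == "hello, D!");

    auto parts = split(s, ", ");                // pieces of s
    assert(parts.length == 2);
    assert(parts[0] == "hello" && parts[1] == "world!");
}
```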
Copyright © 1999-2021 by the D Language Foundation