March 31, 2003 Automatic Safe and Efficient Sz-ing
An idea I had in my sleep, so please forgive me if I've overlooked some huge obvious beastie.

When interfacing a character array (btw, I'm with Mark in thinking we should have a separate string class, but have not amassed my ammunition so am not looking to engage in that debate yet) to a C API expecting a null-terminated string, we have the options of

- not terminating: crash!
- terminating in the array via ~= (char)0;
- using toStringz(), which seems from the implementation to contain most of my sleepytime ideas for an efficient placement of a terminating null. Gah!

Nonetheless, I was wondering whether there was some way of making this call implicit, perhaps in the declaration of the C function. For example, strlen is declared thus

    extern (C)
    {
        int strlen(char *);
    }

Would it be a nice thing to declare it

    extern (C)
    {
        int strlen(char null *);
    }

and the D compiler would insert a call to toStringz() automatically?

Sure, there is an efficiency argument against, but I suspect most such C calls that expect zero-terminated strings (ZTS) have to involve some similar treatment. And really, the null decorator would not mean that "the compiler must call toStringz"; rather, it could mean that "the compiler must ensure that the string is zero-terminated". Hence the compiler would be free to optimise out such a call where it is dealing with a literal, or a static, or something that it has already established is null-terminated.

For example, the code

    void blah(char[] s)
    {
        int len1 = strlen(s);
        int len2 = strlen(s);
    }

could be translated to

    void blah(char[] s)
    {
        char* s_zt = toStringz(s);
        int len1 = strlen(s_zt);
        int len2 = strlen(s_zt);
    }

This would eradicate many of the problems that are likely to bite people interfacing to C code, without in any way adding a cost to "pure" D.

Any takers?

Matthew
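A minimal sketch of the second and third options above, written against present-day D rather than the 2003 compiler. The module paths `core.stdc.string` and `std.string`, the `const(char)[]` parameter type, and the function name `lengthByAppending` are assumptions made for illustration. The `char null *` syntax is only a proposal, so `blah` spells out by hand what the post imagines the compiler inserting.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// Option 2: terminate in the array by appending a zero.
/// The append may reallocate, and the terminator counts toward .length
/// of the local array; the caller's array is left as it was.
size_t lengthByAppending(char[] s)
{
    s ~= '\0';
    return strlen(s.ptr);
}

/// Option 3: convert once with toStringz and reuse the pointer.
/// This is the hand-written form of the translation shown in the post.
size_t blah(const(char)[] s)
{
    const(char)* s_zt = toStringz(s); // single conversion
    size_t len1 = strlen(s_zt);       // first C call
    size_t len2 = strlen(s_zt);       // second C call reuses the same pointer
    assert(len1 == len2);
    return len1;
}
```

Calling `blah("hello")` and `lengthByAppending("hello".dup)` should both report five characters; the difference is only in who pays for the terminator and when.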
March 31, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Matthew Wilson

Hi, Matthew.

> When interfacing a character array (btw, I'm with Mark in thinking we should
> have a separate string class, but have not amassed my ammunition so am not
> looking to engage in that debate yet)
This is a rare occasion when I agree with Mark. The fact that a minimalist like me, and a maximalist like Mark, and a pragmatist like yourself seem to agree is something Walter should consider.
I would want to hold built-in string support to just UTF-8. D could offer some support for the other formats through conversion routines in a standard library. Having a single string format would surely be simpler than supporting them all.
Bill
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Bill Cox

:)

Pragmatist is a lot more of a compliment than what I usually get: pedant.

Yes, the string stuff is highly toxic in C, C++ and (it seems) D. I am also, however, wary of building in support for inefficient (in terms of speed, not size) variable character length encoding schemes.

Is there a reason why UCS-32 (or is that UTF-32 - I need to go and digest all that awful gunk again and get my terminology back up to speed), a la wchar_t, Java, .NET, is not sufficient?

I know that 65536 doesn't cover all the bases of _all_ languages, but it is nevertheless used as a "complete" solution by so many languages, so is it a case of "near enough is good enough"? Dunno, seems Mark's much more of an expert, so hopefully he can enlighten me on that one.

Anyway, Bill, everyone, do you like the "char null *" idea?
- Doesn't introduce another keyword.
- Surely not hard to parse.
- Improves robustness.
- Doesn't add operations that would not have to be done anyway.
- Leaves it all to the compiler's best discretion, so plenty of chances for being _faster_ than leaving it up to the user, which seems to be a theme of D, where achievable.

Sure, fire away, but I think we should have it running for parliament. ;)

Percy the pragmatist
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Matthew Wilson

Correction: meant UCS-/UTF-16, not 32.
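The "65536" question can be made concrete by counting code units. The sketch below assumes present-day D and its `std.utf.codeLength` template; the helper names `utf8Units` and `utf16Units` are hypothetical. It shows that a code point above U+FFFF needs two 16-bit units, so a 16-bit encoding is itself variable-length once such characters appear.

```d
import std.utf : codeLength;

/// Number of 8-bit code units (char) needed to encode one code point.
size_t utf8Units(dchar c)
{
    return codeLength!char(c);
}

/// Number of 16-bit code units (wchar) needed to encode one code point.
size_t utf16Units(dchar c)
{
    return codeLength!wchar(c);
}
```

For example, `utf16Units('A')` is 1, while `utf16Units('\U0001D11E')` (a musical symbol outside the first 65536 code points) is 2, and `utf8Units` of the same character is 4.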
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Matthew Wilson

In article <b6aph7$1dbp$1@digitaldaemon.com>, Matthew Wilson says...
>Anyway, Bill, everyone, do you like the "char null *" idea?
>- Doesn't introduce another keyword.
>- Surely not hard to parse.
>- Improves robustness.
>- Doesn't add operations that would not have to be done anyway.
>- Leaves it all to compiler's best discretion, so plenty of chances for
>being _faster_ than leaving it up to user, which seems to be a theme of D,
>where achievable.

From a user point of view, I like the char null*. The single most common "Help!, I've crashed my simple D program" post on this newsgroup seems to have to do with the terminating null, and how it interacts with character array slicing.

It'd be nice to help clear that one up. I don't know how hard the support would be. It'd have to be pretty hard to amount to more of Walter's time than dealing with the confused D users.

Bill
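The interaction between slicing and the terminating null can be sketched as follows, assuming present-day D (`core.stdc.string.strlen`, `std.string.toStringz`); the function names are hypothetical. A slice is a pointer plus a length with no terminator of its own, so a C routine handed the bare pointer keeps reading past the end of the slice. The example stays well defined only because it slices a string literal, which D stores with a trailing zero; a slice of arbitrary heap data could run off into unrelated memory.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

/// What C sees when handed the bare pointer of a slice.
size_t rawSliceLength()
{
    string text = "hello world";        // literal: stored with a trailing '\0'
    const(char)[] word = text[0 .. 5];  // "hello", sharing text's memory
    return strlen(word.ptr);            // scans on to the literal's terminator
}

/// The same slice passed through toStringz first.
size_t terminatedSliceLength()
{
    string text = "hello world";
    const(char)[] word = text[0 .. 5];
    return strlen(toStringz(word));     // zero-terminated copy of "hello"
}
```

Here `rawSliceLength()` reports 11, the length of the whole literal, while `terminatedSliceLength()` reports the intended 5.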
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Matthew Wilson

Matthew, please post in the other thread if you want me to respond. That's why I started it.

Mark
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Mark Evans

Can't remember which bit of which post applies to which thread. Verbal diarrhoea, I'm afraid.
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Bill Cox

> It'd be nice to help clear that one up. I don't know how hard the support would
> be. It'd have to be pretty hard to amount to more of Walter's time than dealing
> with the confused D users.

Good point. Maybe you've invented a new, and quite definitive, metric for measuring the worth of D changes. :)

Walter?
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Bill Cox

"Bill Cox" <bill@viasic.com> wrote in message news:3E88BE91.6010403@viasic.com...
> I would want to hold built-in string support to just UTF-8. D could offer
> some support for the other formats through conversion routines in a
> standard library. Having a single string format would surely be simpler
> than supporting them all.

That's the direction D is going.
April 01, 2003 Re: Automatic Safe and Efficient Sz-ing
Posted in reply to Bill Cox

Bill Cox wrote:
> In article <b6aph7$1dbp$1@digitaldaemon.com>, Matthew Wilson says...
>
>>Anyway, Bill, everyone, do you like the "char null *" idea?
>>- Doesn't introduce another keyword.
>>- Surely not hard to parse.
>>- Improves robustness.
>>- Doesn't add operations that would not have to be done anyway.
>>- Leaves it all to compiler's best discretion, so plenty of chances for
>>being _faster_ than leaving it up to user, which seems to be a theme of D,
>>where achievable.
>
>
> From a user point of view, I like the char null*. The single most common
> "Help!, I've crashed my simple D program" post on this newsgroup seems to have
> to do with the terminating null, and how it interacts with character array
> slicing.
The problems of newbies are eminently ignorable. It's the problems of people who are indoctrinated that are worth looking into; they're the ones who are going to be running into it in the years following.
About the issue itself, uh... it's a good match for D (as set out at the top of the Phobos page), but it's not a good match for what I want D to be. I don't like referring to C functions directly, because of incompatible signatures, lack of exceptions, weird overloading, and extreme operating system variations in Unices. For example, sometimes errno is a symbol, sometimes it's a macro calling a function. Purifying this variability is the first task of cross-platform work, which I do quite a lot of, and char* is one small factor of the problem.
So altogether there's no win in it for me. toStringz shows up 38 times in the interface library dig, 0 times in the client program dedit. That's the way it should be.
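The division of labour described here, with toStringz confined to an interface layer and never seen by client code, might look like the sketch below. It assumes present-day D (`core.stdc.stdlib.getenv`, `std.string.toStringz` and `fromStringz`); the wrapper name `environmentValue` is hypothetical and is not taken from dig.

```d
import core.stdc.stdlib : getenv;
import std.string : fromStringz, toStringz;

/// Interface-layer wrapper: takes and returns D strings, reports failure
/// with an exception, and keeps the char* handling in one place.
string environmentValue(const(char)[] name)
{
    const(char)* raw = getenv(toStringz(name)); // terminate on the way in
    if (raw is null)
        throw new Exception("environment variable not set: " ~ name.idup);
    return fromStringz(raw).idup;               // copy into a D string on the way out
}
```

Client code then calls `environmentValue("HOME")` with ordinary D strings and handles an exception instead of checking for a null pointer.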