Automatic Safe and Efficient Sz-ing - D Programming Language Discussion Forum


Thread overview

Automatic Safe and Efficient Sz-ing
Mar 31, 2003 Matthew Wilson
Mar 31, 2003 Bill Cox
Apr 01, 2003 Matthew Wilson
Apr 01, 2003 Matthew Wilson
Apr 01, 2003 Bill Cox
Apr 01, 2003 Matthew Wilson
Apr 01, 2003 Burton Radons
Apr 02, 2003 Matthew Wilson
Apr 01, 2003 Mark Evans
Apr 01, 2003 Matthew Wilson
Apr 01, 2003 Walter
May 25, 2003 Walter
May 25, 2003 Mark Evans

March 31, 2003

Automatic Safe and Efficient Sz-ing

Posted by Matthew Wilson

An idea I had in my sleep, so please forgive me if I've overlooked some huge, obvious beastie.

When interfacing a character array (btw, I'm with Mark in thinking we should have a separate string class, but have not amassed my ammunition so am not looking to engage in that debate yet) to a C API expecting a null-terminated string, we have the options of

- not terminating - crash!
- terminating in the array via ~= (char)0;
- using toStringz(), which, judging from its implementation, already contains most of my sleepytime ideas for efficiently placing a terminating null. Gah!
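A minimal sketch of options 2 and 3 in present-day D, assuming the current druntime/Phobos modules `core.stdc.string` (for `strlen`) and `std.string` (for `toStringz`), which postdate this 2003 thread. The helper names `lenByAppending` and `lenByToStringz` are hypothetical. The slice `word` points into a larger buffer, so nothing guarantees a '\0' right after it; that is the crash case of option 1.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Option 2: append the terminator yourself. Done on a copy (.dup) so the
// caller's buffer is left untouched. (Hypothetical helper name.)
size_t lenByAppending(const(char)[] s)
{
    char[] copy = s.dup;
    copy ~= '\0';             // explicit terminator, as with ~= (char)0
    return strlen(copy.ptr);  // copy.ptr now points at a zero-terminated string
}

// Option 3: let toStringz() hand back a zero-terminated pointer.
// (Hypothetical helper name.)
size_t lenByToStringz(const(char)[] s)
{
    return strlen(toStringz(s));
}

unittest
{
    char[] buf = "hello world".dup;
    char[] word = buf[0 .. 5];   // "hello"; the next byte is ' ', not '\0'
    // strlen(word.ptr) would run past the slice until it met some '\0'.
    assert(lenByAppending(word) == 5);
    assert(lenByToStringz(word) == 5);
}
```

With a recent compiler, `dmd -unittest -main -run file.d` runs the assertions.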

Nonetheless, I was wondering whether there was some way of making this call implicit, perhaps in the declaration of the C function.

For example, strlen is declared thus

extern (C)
{
    int strlen(char *);
}

Would it be a nice thing to declare it

extern (C)
{
    int strlen(char null *);
}

and the D compiler would insert a call to toStringz() automatically?

Sure, there is an efficiency argument against it, but I suspect most C calls that expect a ZTS (zero-terminated string) already require some similar treatment.

And really, the null decorator would not mean "the compiler must call toStringz()"; rather, it would mean "the compiler must ensure that the string is zero-terminated". Hence the compiler would be free to optimise out the call when it is dealing with a literal, a static, or a string it has already established is null-terminated. For example, the code

void blah(char[] s)
{
    int    len1    =    strlen(s);
    int    len2    =    strlen(s);
}

could be translated to

void blah(char[] s)
{
    char*     s_zt    =    toStringz(s);
    int        len1    =    strlen(s_zt);
    int        len2    =    strlen(s_zt);
}

This would eradicate many of the problems that are likely to bite people interfacing to C code, without in any way adding a cost to "pure" D.
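D has no `char null *` decorator, so the following only approximates the proposal at library level. It assumes present-day D (`core.stdc.string`, `std.string.toStringz`), and the names `strlenZ` and `blahZ` are hypothetical. `strlenZ` is an overload that guarantees termination before calling into C; `blahZ` is a hand-written form of the translated `blah()`, converting once and reusing the pointer. The last assertion reflects the literal case: D string literals are stored with a trailing '\0' and convert implicitly to `const(char)*`, so no conversion is needed there.

```d
import core.stdc.string : cstrlen = strlen;
import std.string : toStringz;

// Hypothetical stand-in for `int strlen(char null *)`: accepts a D array
// and makes sure it is zero-terminated before the C call.
size_t strlenZ(const(char)[] s)
{
    return cstrlen(toStringz(s));
}

// Hand-written version of the translated blah(): terminate once,
// then reuse the same pointer for every C call.
void blahZ(const(char)[] s, out size_t len1, out size_t len2)
{
    const(char)* s_zt = toStringz(s);
    len1 = cstrlen(s_zt);
    len2 = cstrlen(s_zt);
}

unittest
{
    char[] buf = "hello world".dup;
    assert(strlenZ(buf[6 .. 11]) == 5);   // slice "world"

    size_t a, b;
    blahZ(buf[0 .. 5], a, b);             // slice "hello"
    assert(a == 5 && b == 5);

    // Literals already carry a trailing '\0', so no conversion is needed.
    assert(cstrlen("literal") == 7);
}
```

A library wrapper like this can only convert per call; hoisting the conversion across calls, as in the translation, is the part that would need the compiler's help.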

Any takers?

Matthew

March 31, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Bill Cox
in reply to Matthew Wilson

Hi, Matthew.

> When interfacing a character array (btw, I'm with Mark in thinking we should
> have a separate string class, but have not amassed my ammunition so am not
> looking to engage in that debate yet)

This is a rare occasion when I agree with Mark.  The fact that a minimalist like me, and a maximalist like Mark, and a pragmatist like yourself seem to agree is something Walter should consider.

I would want to hold built-in string support to just UTF-8.  D could offer some support for the other formats through conversion routines in a standard library.  Having a single string format would surely be simpler than supporting them all.

Bill

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Matthew Wilson
in reply to Bill Cox

:)

Pragmatist is a lot more of a compliment than what I usually get: pedant.

Yes, the string stuff is highly toxic in C, C++ and (it seems) D. I am also, however, wary of building in support for inefficient (in terms of speed, not size) variable character length encoding schemes.

Is there a reason why UCS-32 (or is that UTF-32? I need to go and digest all that awful gunk again and get my terminology back up to speed), a la wchar_t, Java and .NET, is not sufficient?

I know that 65536 doesn't cover all the bases of _all_ languages, but it is nevertheless used as a "complete" solution by so many languages; is it a case of "near enough is good enough"? Dunno; Mark seems much more of an expert, so hopefully he can enlighten me on that one.
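A small sketch of what falls outside the 65536, assuming present-day D's `std.utf` (`toUTF16`, `toUTF32`); the helper `codeUnits` is hypothetical. A character outside the Basic Multilingual Plane, such as U+1F600, needs a surrogate pair in UTF-16, so UTF-16 is itself a variable-length encoding; among the UTFs, only UTF-32 uses one code unit per code point.

```d
import std.utf : toUTF16, toUTF32;

// Hypothetical helper: how many code units `s` occupies in
// UTF-8, UTF-16 and UTF-32 respectively.
size_t[3] codeUnits(string s)
{
    return [s.length, toUTF16(s).length, toUTF32(s).length];
}

unittest
{
    auto ascii = codeUnits("a");           // U+0061
    assert(ascii[0] == 1 && ascii[1] == 1 && ascii[2] == 1);

    auto eAcute = codeUnits("\u00E9");     // U+00E9, inside the BMP
    assert(eAcute[0] == 2 && eAcute[1] == 1 && eAcute[2] == 1);

    auto emoji = codeUnits("\U0001F600");  // U+1F600, outside the BMP
    assert(emoji[0] == 4 && emoji[1] == 2 && emoji[2] == 1);
}
```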


Anyway, Bill, everyone, do you like the "char null *" idea?
- Doesn't introduce another keyword.
- Surely not hard to parse.
- Improves robustness.
- Doesn't add operations that would not have to be done anyway.
- Leaves it all to the compiler's best discretion, so there are plenty of chances for being _faster_ than leaving it up to the user, which seems to be a theme of D, where achievable.

Sure, fire away, but I think we should have it running for parliament. ;)

Percy the pragmatist

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Matthew Wilson
in reply to Matthew Wilson

Correction: I meant UCS-2/UTF-16, not 32.

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Bill Cox
in reply to Matthew Wilson

In article <b6aph7$1dbp$1@digitaldaemon.com>, Matthew Wilson says...
>Anyway, Bill, everyone, do you like the "char null *" idea?
>- Doesn't introduce another keyword.
>- Surely not hard to parse.
>- Improves robustness.
>- Doesn't add operations that would not have to be done anyway.
>- Leaves it all to compiler's best discretion, so plenty of chances for
>being _faster_ than leaving it up to user, which seems to be a theme of D,
>where achievable.

From a user point of view, I like the char null*.  The single most common "Help!, I've crashed my simple D program" post on this newsgroup seems to have to do with the terminating null, and how it interacts with character array slicing.

It'd be nice to help clear that one up.  I don't know how hard the support would be.  It'd have to be pretty hard to take up more of Walter's time than dealing with the confused D users.

Bill

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Mark Evans
in reply to Matthew Wilson

Matthew, please post in the other thread if you want me to respond.  That's why I started it.

Mark

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Matthew Wilson
in reply to Mark Evans

Can't remember which bit of which post applies to which thread. Verbal diarrhoea, I'm afraid.

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Matthew Wilson
in reply to Bill Cox

> It'd be nice to help clear that one up.  I don't know how hard the support would
> be.  It'd have to be pretty hard to take up more of Walter's time than dealing
> with the confused D users.

Good point. Maybe you've invented a new, and quite definitive, metric for measuring the worth of D changes. :)

Walter?

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Walter
in reply to Bill Cox

"Bill Cox" <bill@viasic.com> wrote in message news:3E88BE91.6010403@viasic.com...
> I would want to hold built-in string support to just UTF-8.  D could offer some support for the other formats through conversion routines in a standard library.  Having a single string format would surely be simpler than supporting them all.

That's the direction D is going.

April 01, 2003

Re: Automatic Safe and Efficient Sz-ing

Posted by Burton Radons
in reply to Bill Cox

Bill Cox wrote:
> In article <b6aph7$1dbp$1@digitaldaemon.com>, Matthew Wilson says...
> 
>>Anyway, Bill, everyone, do you like the "char null *" idea?
>>- Doesn't introduce another keyword.
>>- Surely not hard to parse.
>>- Improves robustness.
>>- Doesn't add operations that would not have to be done anyway.
>>- Leaves it all to compiler's best discretion, so plenty of chances for
>>being _faster_ than leaving it up to user, which seems to be a theme of D,
>>where achievable.
> 
> 
> From a user point of view, I like the char null*.  The single most common
> "Help!, I've crashed my simple D program" post on this newsgroup seems to have
> to do with the terminating null, and how it interacts with character array
> slicing.

The problems of newbies are eminently ignorable.  It's the problems of people who are indoctrinated that are worth looking into; they're the ones who are going to be running into it in the years following.

About the issue itself, uh... it's a good match for D (as set out at the top of the Phobos page), but it's not a good match for what I want D to be.  I don't like referring to C functions directly, because of incompatible signatures, lack of exceptions, weird overloading, and extreme operating system variations among Unices; for example, sometimes errno is a symbol, sometimes it's a macro calling a function.  Purifying this variability is the first task of cross-platform work, which I do quite a lot of, and char* is one small factor of the problem.

So altogether there's no win in it for me.  toStringz shows up 38 times in the interface library dig, 0 times in the client program dedit. That's the way it should be.
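A minimal sketch of that layering, assuming present-day D's `core.stdc.stdio.fopen` and `std.exception.errnoEnforce`; `openFile` is a hypothetical interface-library function, not taken from dig. The C-facing details (`toStringz`, `char*`, `errno`) stay inside the wrapper, so client code passes D strings and gets an exception on failure.

```d
import core.stdc.stdio : FILE, fopen;
import std.exception : errnoEnforce;
import std.string : toStringz;

// Hypothetical interface-layer wrapper: D strings in, D exception out.
// toStringz appears here, so callers never build a char* themselves.
FILE* openFile(const(char)[] path, const(char)[] mode = "r")
{
    FILE* f = fopen(toStringz(path), toStringz(mode));
    // fopen returns null and sets errno on failure; errnoEnforce turns
    // that into an ErrnoException whose message includes the errno text.
    return errnoEnforce(f, "cannot open " ~ path.idup);
}

unittest
{
    import std.exception : ErrnoException, assertThrown;
    assertThrown!ErrnoException(openFile("/no/such/dir/no-such-file.txt"));
}
```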


Copyright © 1999-2021 by the D Language Foundation