The case for ditching char (page 5) - D Programming Language Discussion Forum

August 25, 2004

Re: The case for ditching char

Posted by antiAlias
in reply to Regan Heath

Sorry to have offended your sensibilities, dude. If "knew" is used in your part of the world then "understood" is used in mine. Too bad for the misunderstanding.

Here's a link to a post from Matthew. Given that it's a reply, I think you can safely count at least two posts that you missed <g>

news:cg69c1$120n$1@digitaldaemon.com



"Regan Heath" <regan@netwin.co.nz> wrote in message news:opsc9uihna5a2sq9@digitalmars.com...
> On Tue, 24 Aug 2004 19:45:07 -0700, antiAlias <fu@bar.com> wrote:
>
> > "Regan Heath" <regan@netwin.co.nz> wrote
> >> Is it?!
> >> I didn't realise that, so this is invalid?
> >>
> >> class A {
> >>    dchar[] toString() {}
> >> }
> >
> > Yes. It most certainly is, Regan. I (incorrectly) assumed you understood
> > that.
>
> Either:
> a. I am overly sensitive/insecure
> b. You didn't realise
> c. You're intentionally trying to belittle me
>
> because ... "understood" is not the right word; "knew" is a better choice. "Understood" implies I knew but didn't understand. That isn't the case (this time).
>
> > Sorry. There have been a number of posts that note this, and its implications.
>
> I must have missed them, or missed the importance of that fact. Strange, given that I read *everything* in all the D NGs on digitalmars.com.
>
> Regan
>
> --
> Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

August 25, 2004

Re: The case for ditching char and wchar (and renaming "dchar" as "char")

Posted by Regan Heath
in reply to antiAlias

On Tue, 24 Aug 2004 22:08:39 -0700, antiAlias <fu@bar.com> wrote:

<snip>

> What you added here seems intended to fan some imaginary flames, or to be
> argumentative purely for the sake of it, rather than to make any cohesive
> point. In fact, four out of the five items you managed to completely
> misconstrue. That may be my failing in terms of language use, so I'll accept the consequences. I will not, however, bite.

I'm sorry to have come across that way. I was simply trying to add my point of view. If I have misunderstood your comments, sorry; I don't get it right all the time (despite what I might think).

Your (miss/understood/guided/ing) friend,
Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

August 25, 2004

Re: The case for ditching char

Posted by Arcane Jill
in reply to Regan Heath

In article <opsc9m9ccg5a2sq9@digitalmars.com>, Regan Heath says...

>Is it?!
>I didn't realise that, so this is invalid?
>
>class A {
>   dchar[] toString() {}
>}

It's not invalid as such, it's just that the return type of an overloaded function has to be "covariant" with the return type of the function it's overloading. So it's a compile error /now/. But if dchar[] and char[] were to be considered mutually covariant then this would magically start to compile.

Arcane Jill
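Strictly speaking, a subclass `toString` *overrides* rather than overloads `Object.toString`, and it is the overriding method whose return type D requires to be covariant with the base method's. A minimal sketch in present-day D, where `Object.toString` returns `string` rather than the 2004 `char[]`; the classes `Animal`, `Dog` and `A` are hypothetical:

```d
// Hypothetical classes illustrating covariant return types in D.
class Animal
{
    Animal clone() { return new Animal; }
}

class Dog : Animal
{
    // Allowed: Dog converts implicitly to Animal, so the return type is covariant.
    override Dog clone() { return new Dog; }
}

class A
{
    // Allowed: same return type as Object.toString.
    override string toString() { return "A"; }

    // Not allowed (left commented out): dstring is not covariant with string,
    // so the compiler would reject this override.
    // override dstring toString() { return "A"d; }
}
```

The commented-out override corresponds to Regan's `dchar[] toString()` example; it would only be accepted if the string types were treated as mutually covariant, as described in the post above.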

August 25, 2004

Re: The case for ditching char

Posted by Regan Heath
in reply to Arcane Jill

On Wed, 25 Aug 2004 05:44:18 +0000 (UTC), Arcane Jill <Arcane_member@pathlink.com> wrote:
> In article <opsc9m9ccg5a2sq9@digitalmars.com>, Regan Heath says...
>
>> Is it?!
>> I didn't realise that, so this is invalid?
>>
>> class A {
>>   dchar[] toString() {}
>> }
>
> It's not invalid as such, it's just that the return type of an overloaded
> function has to be "covariant" with the return type of the function it's
> overloading. So it's a compile error /now/. But if dchar[] and char[] were to be
> considered mutually covariant then this would magically start to compile.

Ahh.. excellent, that is what I was hoping to hear.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

August 25, 2004

Re: The case for ditching char

Posted by Arcane Jill
in reply to Regan Heath

In article <opsc9xu9h35a2sq9@digitalmars.com>, Regan Heath says...

>Ahh.. excellent, that is what I was hoping to hear.

It's not /all/ good news however. Consider these two cases:

(1)
#    class A { wchar[] toString(); }
#    A a = new A();
#    wchar[] s = a.toString();

All hunky dory. No conversions happen. /But/

(2)
#    class A { wchar[] toString(); }
#    Object a = new A();
#    wchar[] s = a.toString();

Now /two/ conversions happen (assuming Object.toString() still returns char[]) -
toUTF8(wchar[]) followed by toUTF16(char[]). Still, that's polymorphism for you.
It is better than the status quo, but not quite as good (IMO) as having wchar[]
be the standard string type.

Arcane Jill
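Jill's case (2) can be sketched in present-day D, where `Object.toString` returns `string` (UTF-8). Since D does not transcode implicitly, the two conversions she describes are written out by hand; the class `A`, its `name` field and the helper `describe` are hypothetical, while `toUTF8` and `toUTF16` are from `std.utf`:

```d
import std.utf : toUTF8, toUTF16;

// Hypothetical class whose natural string representation is UTF-16.
class A
{
    wstring name = "h\u00E9llo"w;

    // Object.toString is declared to return string (UTF-8), so the override
    // has to transcode: conversion #1, UTF-16 -> UTF-8.
    override string toString()
    {
        return toUTF8(name);
    }
}

// A caller that holds only an Object reference and wants UTF-16 gets UTF-8
// back and must transcode again: conversion #2, UTF-8 -> UTF-16.
wstring describe(Object o)
{
    return toUTF16(o.toString());
}
```

`describe(new A())` returns the original UTF-16 text, but only after a round trip through UTF-8. If the standard string type were `wchar[]`, as Jill prefers, neither conversion would be needed for this class.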

August 25, 2004

Re: The case for ditching char

Posted by Carlos Santander B.
in reply to Regan Heath

"Regan Heath" <regan@netwin.co.nz> escribió en el mensaje
news:opsc9un10r5a2sq9@digitalmars.com
| I assumed opCat's parameter would have to be char[], wchar[] or dchar[],
| as would its return value, e.g.
|

I don't see why it has to be only that way. ~ is the concatenation operator, so I could define:

class Set(T)
{
    Set opCat(T newElem) { ... }
}

And expect it to work the way I want it. opCat is not only for strings. And it shouldn't be.

| class B
| {
|    char[] opCat(char[] rhs){}
| }
|
| given implicit transcoding you could then say.
|
| char[]  c;
| wchar[] w;
| dchar[] d;
|
| B b = new B();
|
| char[] p;
|
| p = b ~ c;
| p = b ~ w;
| p = b ~ d;
|
| Regan
|

-----------------------
Carlos Santander Bernal
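Carlos's point that `~` need not involve strings can be sketched in present-day D, which spells the 2004 `opCat` as `opBinary!"~"`. The associative-array representation and the `contains` and `length` members are illustrative assumptions:

```d
// Hypothetical set: `s ~ x` returns a new set holding the elements of s plus x;
// s itself is left unchanged.
class Set(T)
{
    private bool[T] items;

    Set opBinary(string op : "~")(T newElem)
    {
        auto result = new Set;
        foreach (k; items.byKey)
            result.items[k] = true;
        result.items[newElem] = true;
        return result;
    }

    bool contains(T x) const { return (x in items) !is null; }

    size_t length() const { return items.length; }
}
```

Here `s ~ 1 ~ 2 ~ 1` builds a new two-element set and leaves `s` empty; returning a fresh set rather than mutating the left operand mirrors how `~` behaves on arrays.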

August 25, 2004

Re: The case for ditching char

Posted by Arcane Jill
in reply to Carlos Santander B.

This is irrelevant. opCat() does not need to do anything special for D strings to work, whether we go the wchar[] route or the implicit conversion route. For wchar[]s, it already works. If we go for implicit conversion, then the three different kinds of D string would be regarded as covariant by the D compiler, so expressions of the form (wchar[] ~ dchar[]) would be handled by the type promotion system, not by opCat(), just as (float + int) is handled now.

Jill
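For comparison, present-day D has no implicit transcoding between string types, so Regan's mixed concatenation has to convert each operand by hand; under the implicit-conversion proposal the compiler would in effect insert these calls. A sketch; the helper name `concatAll` is hypothetical, while `to` is from `std.conv`:

```d
import std.conv : to;

// Hypothetical helper: concatenate one string of each encoding as UTF-8.
// Each non-UTF-8 operand is transcoded explicitly, because `c ~ w` does
// not compile: char and wchar arrays are unrelated array types.
string concatAll(string c, wstring w, dstring d)
{
    return c ~ to!string(w) ~ to!string(d);
}
```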



August 25, 2004

Re: implicit char[] conversion

Posted by Sean Kelly
in reply to antiAlias

In article <cggtcq$1ird$1@digitaldaemon.com>, antiAlias says...
>
>
>"Regan Heath" <regan@netwin.co.nz> wrote ..
>> > What happens when there's a partial character left undecoded at the end of  'src'?
>> ---------------------------------------
>> How is that even possible?
>
>It happens all the time with streamed input. However, as AJ pointed out, neither you nor Walter are apparently suggesting that the cast() approach be used for anything other than trivial conversions. That is, one would not use this approach with respect to IO streaming. I had the (distinctly wrong) impression this implied-conversion was intended to be a jack-of-all-trades.

My modified version of std.utf is meant to address the streaming issue. Basically, I added versions of encode and decode that accept a delegate as the source or destination hook. Not perfect perhaps, but it does get around the problem of encode/decode wanting to throw an exception if they encounter an invalid sequence.

>If these implicit conversions are put in place, then I respectfully suggest the std.utf functions be replaced with something that avoids fragmenting the heap in the manner they currently do (for non Latin-1); and it's not hard to make them an order-of-magnitude faster, too.

Then by all means do so :)


Sean
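A minimal sketch of the streaming case antiAlias raises, using a delegate as the output hook in the spirit of Sean's description. `decodeChunk` is a hypothetical name and not Sean's actual code; `stride` and `decode` are from present-day `std.utf`, and `decode` still throws `UTFException` on genuinely invalid input. The idea is to decode only complete sequences and hand any trailing partial sequence back to the caller:

```d
import std.utf : decode, stride;

/// Hypothetical streaming helper: decodes every complete UTF-8 sequence in
/// `chunk`, passes each code point to `sink`, and returns the bytes of an
/// incomplete trailing sequence so the caller can prepend them to the next chunk.
const(char)[] decodeChunk(const(char)[] chunk, scope void delegate(dchar) sink)
{
    size_t i = 0;
    while (i < chunk.length)
    {
        // stride() derives the sequence length from the lead byte alone,
        // so it can be called before the rest of the sequence has arrived.
        immutable len = stride(chunk, i);
        if (i + len > chunk.length)
            break;                  // partial character: keep it for later
        sink(decode(chunk, i));     // decode() advances i past the sequence
    }
    return chunk[i .. $];
}
```

A caller reading from a stream prepends the returned leftover bytes to the next chunk, so a character split across two reads is decoded once both halves have arrived, without allocating a new array per character.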

August 25, 2004

Re: implicit char[] conversion

Posted by Sean Kelly
in reply to Sean Kelly

In article <cgidgk$28mg$1@digitaldaemon.com>, Sean Kelly says...
>
>Basically, I added versions of encode and decode that accept as the source or destination hook.

"Accept a delegate."


Sean

August 25, 2004

Re: implicit char[] conversion

Posted by Arcane Jill
in reply to Sean Kelly

In article <cgidgk$28mg$1@digitaldaemon.com>, Sean Kelly says...

>>If these implicit conversions are put in place, then I respectfully suggest the std.utf functions be replaced with something that avoids fragmenting the heap in the manner they currently do (for non Latin-1); and it's not hard to make them an order-of-magnitude faster, too.
>
>Then by all means do so :)
>
>Sean

Some speed-up ideas...

I posted a potentially speedier version of UTF-8 decode here a while back. The basic algorithm I used was this: get the first byte; if it's ASCII, return it; else use it as an index into a lookup table to get the sequence length. There's slightly more to it than that, obviously, but that was the basis. Walter wanted to know if there were any standard tests to check whether a UTF-8 function works correctly. I didn't know of any.
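The table-driven dispatch described above can be sketched as follows. This is not the code Jill posted; the names `utf8SeqLen` and `seqLength` are hypothetical. The table is built at compile time and marks continuation bytes and the never-valid lead bytes 0xC0, 0xC1 and 0xF5 through 0xFF with length 0:

```d
// Hypothetical lookup table: UTF-8 sequence length indexed by lead byte.
// 0 marks a byte that can never start a valid sequence.
immutable ubyte[256] utf8SeqLen = () {
    ubyte[256] t;
    foreach (b; 0 .. 256)
    {
        if (b < 0x80)      t[b] = 1;  // ASCII
        else if (b < 0xC2) t[b] = 0;  // continuation bytes, overlong C0/C1
        else if (b < 0xE0) t[b] = 2;
        else if (b < 0xF0) t[b] = 3;
        else if (b < 0xF5) t[b] = 4;
        else               t[b] = 0;  // F5..FF never appear in valid UTF-8
    }
    return t;
}();

/// Length of the sequence starting at s[0], or 0 for an invalid lead byte.
/// ASCII takes the fast path and never touches the table.
uint seqLength(const(char)[] s)
in (s.length > 0)
{
    immutable b = cast(ubyte) s[0];
    if (b < 0x80)
        return 1;
    return utf8SeqLen[b];
}
```

As Jill says, there is more to it: a checked decoder must also verify each continuation byte and reject the overlong E0/F0 forms, surrogates (ED followed by A0..BF) and values above U+10FFFF (F4 followed by 90..BF).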

The big difficulty with UTF-8 is that of being fully Unicode conformant. This is poorly understood, so people are often tempted to make shortcuts. The std.utf functions take no shortcuts and so are conformant.

The gist is this, however. You can have two different kinds of UTF-8 decode routine: checked or unchecked. A checked function will ensure that the input contains no invalid sequences (non-shortest sequences are always invalid), and will throw an exception (or otherwise report the error) if that's not the case. Checked decoders can be made fully conformant, but the checking can slow you down.

Unchecked decoders, on the other hand, simply /assume/ that the input is valid, and produce garbage if it isn't. Unchecked decoders can be made to go a lot faster, but they are not Unicode conformant ... unless of course you *KNOW* with 100% certainty that the input *IS* valid. (Without this knowledge, your application won't be Unicode conformant, and can actually be a security risk). So, it would be possible to write a fast, unchecked UTF-8 decoder, if you made use of D's Design by Contract. If you validate the string in the function's "in" block, then you can assume valid input in the function body, and thereby go faster (at least in a release build). But watch out for coding errors. The caller *MUST* fulfil that contract, or you have a bug. And you'd still need to have a checked UTF-8 decoder for those cases when you're not sure where the input came from.
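A sketch of the Design by Contract idea: the `in` contract validates the whole input once with `std.utf.validate` (which throws `UTFException` on malformed input), and the body then decodes with no further checks. `decodeUnchecked` is a hypothetical name. Contracts are compiled out with `-release`, which is exactly the trade-off described above: in a release build nothing protects the body from a caller who breaks the contract.

```d
import std.utf : validate;

/// Hypothetical unchecked UTF-8 decoder. The `in` contract validates the
/// input, so the body assumes well-formed UTF-8 and does no checking.
dchar[] decodeUnchecked(const(char)[] s)
in
{
    validate(s);  // throws UTFException on any invalid sequence
}
do
{
    dchar[] result;
    result.reserve(s.length);
    size_t i = 0;
    while (i < s.length)
    {
        immutable uint b = s[i];
        uint c;
        uint len;
        if (b < 0x80)      { c = b;        len = 1; }
        else if (b < 0xE0) { c = b & 0x1F; len = 2; }
        else if (b < 0xF0) { c = b & 0x0F; len = 3; }
        else               { c = b & 0x07; len = 4; }
        // No continuation-byte or length checks: the contract guaranteed them.
        foreach (k; 1 .. len)
            c = (c << 6) | (s[i + k] & 0x3F);
        result ~= cast(dchar) c;
        i += len;
    }
    return result;
}
```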

Being able to distinguish between sequences which have already been validated, and those which have not, can buy you a lot of efficiency. Unfortunately, I don't see how D can take advantage of that. If a D string were a class or a struct, then it could have a class invariant - but D strings are just simple arrays, and constructing invalid UTF-8 arrays is all too easy.

Arcane Jill
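The last point can be sketched as a wrapper type whose constructor is the only way in, so validation happens once and later operations may assume well-formed UTF-8. `ValidUtf8` is a hypothetical name; `validate` and `decode` are from `std.utf`. The invariant re-checks validity in non-release builds; it is easy to state but costs a full scan on every public call, so a real design might omit it. Plain arrays, as noted above, cannot carry such a guarantee.

```d
import std.utf : validate, decode;

/// Hypothetical wrapper: holding a ValidUtf8 means the text was validated.
struct ValidUtf8
{
    private string data;

    invariant
    {
        // Re-checked in non-release builds; compiled out with -release.
        assert(isValid(data), "ValidUtf8 holds malformed UTF-8");
    }

    /// The only way in: throws UTFException if `s` is not valid UTF-8.
    this(string s)
    {
        validate(s);
        data = s;
    }

    /// Counts code points. A real implementation could switch to an
    /// unchecked decoder here; std.utf.decode keeps the sketch short.
    size_t codePoints() const
    {
        string s = data;
        size_t n, i;
        while (i < s.length)
        {
            cast(void) decode(s, i);
            ++n;
        }
        return n;
    }

    private static bool isValid(string s)
    {
        try { validate(s); return true; }
        catch (Exception) { return false; }
    }
}
```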


Copyright © 1999-2021 by the D Language Foundation