Why ElementType!(char[3]) == dchar instead of char? - D Programming Language Discussion Forum


September 01, 2015

Why ElementType!(char[3]) == dchar instead of char?

Posted by drug

http://dpaste.dzfl.pl/4535c5c03126

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by drug
in reply to drug

On 01.09.2015 19:18, drug wrote:
> http://dpaste.dzfl.pl/4535c5c03126

Should I use ForeachType!(char[3]) instead of ElementType?

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by Justin Whear
in reply to drug

On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:

> http://dpaste.dzfl.pl/4535c5c03126

Arrays of char are assumed to be UTF-8 encoded text and a single char is not necessarily sufficient to represent a character.  ElementType identifies the type that you will receive when (for instance) foreaching over the array and D autodecodes the UTF-8 for you.  If you'd like to represent raw bytes use byte[3] or ubyte[3].  If you'd like other encodings, check out std.encoding.
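A minimal compile-time sketch of the code unit versus code point distinction (an illustration, not code from the thread): `.length` counts the `char` code units stored in the array, while `.front` and `walkLength` from `std.range.primitives` decode UTF-8 into `dchar` code points. As Justin's follow-up below notes, a plain `foreach` without an explicit element type does not decode; range primitives do.

```d
import std.range.primitives : front, walkLength;

// 4 code points, 5 UTF-8 code units: U+00E9 is stored as 0xC3 0xA9.
enum string word = "caf\u00E9";

// .length is the number of char code units in the array.
static assert(word.length == 5);

// walkLength iterates the string as a range, decoding as it goes.
static assert(word.walkLength == 4);

// .front decodes the leading code point and returns it as a dchar.
static assert(is(typeof(word.front) == dchar));
static assert("\u00E9".front == '\u00E9');

void main()
{
    import std.stdio : writeln;
    import std.string : representation;

    // Shows the raw code units as ubytes: [99, 97, 102, 195, 169]
    writeln(word.representation);
}
```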

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by Justin Whear
in reply to drug

On Tue, 01 Sep 2015 19:21:44 +0300, drug wrote:

> On 01.09.2015 19:18, drug wrote:
>> http://dpaste.dzfl.pl/4535c5c03126
> 
> Should I use ForeachType!(char[3]) instead of ElementType?

Try std.range.ElementEncodingType
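A small sketch comparing the three traits mentioned so far, checked at compile time on a `char[]` slice. It assumes a reasonably recent Phobos, where `ElementType` and `ElementEncodingType` live in `std.range.primitives` (also reachable through `std.range`) and `ForeachType` lives in `std.traits`.

```d
import std.range.primitives : ElementEncodingType, ElementType;
import std.traits : ForeachType;

// Range view: .front autodecodes UTF-8, so the element is a dchar.
static assert(is(ElementType!(char[]) == dchar));

// Storage view: the code unit actually stored in the array.
static assert(is(ElementEncodingType!(char[]) == char));

// A plain `foreach (c; arr)` without an explicit type yields code units.
static assert(is(ForeachType!(char[]) == char));

// The stored type keeps its qualifier; the decoded type is a plain dchar.
static assert(is(ElementEncodingType!string == immutable(char)));
static assert(is(ElementType!string == dchar));

void main() {}
```

In short: `ElementEncodingType` answers "what is stored", `ElementType` answers "what does `.front` return".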

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by Justin Whear
in reply to Justin Whear

On Tue, 01 Sep 2015 16:25:53 +0000, Justin Whear wrote:

> On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:
> 
>> http://dpaste.dzfl.pl/4535c5c03126
> 
> Arrays of char are assumed to be UTF-8 encoded text and a single char is not necessarily sufficient to represent a character.  ElementType identifies the type that you will receive when (for instance) foreaching over the array and D autodecodes the UTF-8 for you.  If you'd like to represent raw bytes use byte[3] or ubyte[3].  If you'd like other encodings, check out std.encoding.

I should correct this:
 * ForeachType is the element type that will be inferred by a foreach loop
 * ElementType is usually the same as ForeachType but is the type of the value returned by .front

One major distinction is that ElementType is only for ranges while ForeachType will work for iterable non-ranges.
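An associative array is one example of an iterable non-range. A compile-time sketch, assuming current Phobos behavior, where `ElementType` falls back to `void` for a type that has no `.front`:

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : ForeachType;

// An associative array can be iterated with foreach, but it is not a range:
// it has no .front, .popFront or .empty.
static assert(!isInputRange!(int[string]));

// ForeachType reports what `foreach (v; aa)` infers: the value type.
static assert(is(ForeachType!(int[string]) == int));

// ElementType looks for .front; without one it yields void.
static assert(is(ElementType!(int[string]) == void));

// For ordinary non-character arrays the two traits agree.
static assert(is(ForeachType!(int[]) == int));
static assert(is(ElementType!(int[]) == int));

void main() {}
```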

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by drug
in reply to Justin Whear

On 01.09.2015 19:32, Justin Whear wrote:
> On Tue, 01 Sep 2015 16:25:53 +0000, Justin Whear wrote:
>
>> On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:
>>
>>> http://dpaste.dzfl.pl/4535c5c03126
>>
>> Arrays of char are assumed to be UTF-8 encoded text and a single char is
>> not necessarily sufficient to represent a character.  ElementType
>> identifies the type that you will receive when (for instance) foreaching
>> over the array and D autodecodes the UTF-8 for you.  If you'd like to
>> represent raw bytes use byte[3] or ubyte[3].  If you'd like other
>> encodings, check out std.encoding.
>
> I should correct this:
>   * ForeachType is the element type that will be inferred by a foreach loop
>   * ElementType is usually the same as ForeachType but is the type of the
> value returned by .front
>
> One major distinction is that ElementType is only for ranges while
> ForeachType will work for iterable non-ranges.
>
I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by H. S. Teoh
in reply to drug

On Tue, Sep 01, 2015 at 07:40:24PM +0300, drug via Digitalmars-d-learn wrote: [...]
> I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?

In D, char[]/wchar[]/dchar[] are intended to be UTF. If you're dealing with strings encoded with other character sets, you should use ubyte[] (or ushort[], etc.) instead.


T

-- 
EMACS = Extremely Massive And Cumbersome System
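The convention H. S. Teoh describes is also visible in Phobos' own traits. Below is a compile-time sketch using `isSomeString` from `std.traits` and `ElementType`. UTF code-unit arrays are treated as strings and yield `dchar` code points as range elements. `ubyte[]` and `ushort[]` are ordinary integer arrays that are never decoded.

```d
import std.range.primitives : ElementType;
import std.traits : isSomeString;

// char[], wchar[] and dchar[] are treated as UTF-8/16/32 text.
static assert(isSomeString!(char[]));
static assert(isSomeString!(wchar[]));
static assert(isSomeString!(dchar[]));

// As ranges they yield dchar code points (wchar[] is decoded from UTF-16).
static assert(is(ElementType!(wchar[]) == dchar));

// ubyte[] / ushort[] carry no UTF assumptions, so no decoding happens.
static assert(!isSomeString!(ubyte[]));
static assert(!isSomeString!(ushort[]));
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(is(ElementType!(ushort[]) == ushort));

void main() {}
```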

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by Justin Whear
in reply to drug

On Tue, 01 Sep 2015 19:40:24 +0300, drug wrote:

> I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?

Because of D's autodecoding, it can be problematic to assume UTF-8 if other encodings are actually in use.  If, for instance, you try printing a string stored as char[] that is actually Latin-1 encoded and contains characters from the high range (bytes 0x80 to 0xFF), you'll get a runtime UTF-8 decoding exception.  If you don't know ahead of time what the encoding will be, using ubyte[] will be safer.  The other option is to dynamically re-encode strings to UTF-8 as you read them.
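The re-encoding option can be sketched for the Latin-1 case without library support, because every Latin-1 byte value equals its Unicode code point (U+0000 to U+00FF). `latin1ToUtf8` is a hypothetical helper, not a Phobos function. std.encoding offers more general transcoding, which this sketch deliberately does not use.

```d
/// Hypothetical helper: re-encode Latin-1 (ISO-8859-1) bytes as UTF-8.
/// Each Latin-1 byte is the code point with the same value, so it becomes
/// one UTF-8 code unit (below 0x80) or two (0x80..0xFF).
string latin1ToUtf8(const(ubyte)[] bytes) @safe pure
{
    string result;
    foreach (b; bytes)
    {
        if (b < 0x80)
        {
            // ASCII: identical in Latin-1 and UTF-8.
            result ~= cast(char) b;
        }
        else
        {
            // Two-byte UTF-8 form 110xxxxx 10xxxxxx (lead byte 0xC2 or 0xC3).
            result ~= cast(char)(0xC0 | (b >> 6));
            result ~= cast(char)(0x80 | (b & 0x3F));
        }
    }
    return result;
}

void main()
{
    import std.stdio : writeln;
    import std.utf : validate;

    // 0xE9 is U+00E9 in Latin-1 but is not valid UTF-8 on its own.
    immutable ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];
    string text = latin1ToUtf8(latin1);
    validate(text); // throws UTFException if text were not valid UTF-8
    writeln(text);
}
```

Once converted, the data can safely be handled as `string` by the rest of Phobos.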

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by drug
in reply to Justin Whear

My situation is that I don't know which type the user will be using, because I'm writing a library. What's the best way to process char[..] in this case?

September 01, 2015

Re: Why ElementType!(char[3]) == dchar instead of char?

Posted by Jonathan M Davis
in reply to drug

On Tuesday, September 01, 2015 20:05:18 drug via Digitalmars-d-learn wrote:
> My situation is that I don't know which type the user will be using, because I'm writing a library. What's the best way to process char[..] in this case?

char[] should never be anything other than UTF-8. Similarly, wchar[] is UTF-16, and dchar[] is UTF-32. So, if you're getting something other than UTF-8, it should not be char[]. It should be something more like ubyte[]. If you want to operate on it as char[], you should convert it to UTF-8. std.encoding may or may not help with that. But pretty much everything in D - certainly in the standard library - assumes that char, wchar, and dchar are UTF-encoded, and the language spec basically defines them that way. Technically, you _can_ put other encodings in them, but it's just asking for trouble.

- Jonathan M Davis
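One way a library could apply this convention is sketched below. It classifies an array type by its stored code unit rather than by `ElementType`. As a result, `char[3]` is treated as UTF-8 text and not as `dchar`, and data in unknown encodings is expected to arrive as `ubyte`/`byte`. `Kind`, `CodeUnit` and `kindOf` are hypothetical names. They are not part of Phobos or any HDF5 binding.

```d
import std.traits : Unqual;

/// Hypothetical storage categories a D-to-HDF5 type mapper might use.
enum Kind { utf8Text, utf16Text, utf32Text, rawBytes, other }

/// Stored element (code unit) of a static or dynamic array, qualifiers
/// stripped; void for non-array types.
template CodeUnit(T)
{
    static if (is(T == E[n], E, size_t n))
        alias CodeUnit = Unqual!E;
    else static if (is(T == E[], E))
        alias CodeUnit = Unqual!E;
    else
        alias CodeUnit = void;
}

/// char => UTF-8, wchar => UTF-16, dchar => UTF-32; text in any other
/// encoding is expected as ubyte/byte.
Kind kindOf(T)()
{
    alias U = CodeUnit!T;
    static if (is(U == char))
        return Kind.utf8Text;
    else static if (is(U == wchar))
        return Kind.utf16Text;
    else static if (is(U == dchar))
        return Kind.utf32Text;
    else static if (is(U == ubyte) || is(U == byte))
        return Kind.rawBytes;
    else
        return Kind.other;
}

void main()
{
    import std.stdio : writeln;

    writeln(kindOf!(char[3])());   // utf8Text
    writeln(kindOf!string());      // utf8Text
    writeln(kindOf!(ubyte[16])()); // rawBytes
}
```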

Copyright © 1999-2021 by the D Language Foundation