Why ElementType!(char[3]) == dchar instead of char?
September 01, 2015
http://dpaste.dzfl.pl/4535c5c03126
September 01, 2015
On 01.09.2015 19:18, drug wrote:
> http://dpaste.dzfl.pl/4535c5c03126

Should I use ForeachType!(char[3]) instead of ElementType?
September 01, 2015
On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:

> http://dpaste.dzfl.pl/4535c5c03126

Arrays of char are assumed to be UTF-8 encoded text and a single char is not necessarily sufficient to represent a character.  ElementType identifies the type that you will receive when (for instance) foreaching over the array and D autodecodes the UTF-8 for you.  If you'd like to represent raw bytes use byte[3] or ubyte[3].  If you'd like other encodings, check out std.encoding.
September 01, 2015
On Tue, 01 Sep 2015 19:21:44 +0300, drug wrote:

> On 01.09.2015 19:18, drug wrote:
>> http://dpaste.dzfl.pl/4535c5c03126
> 
> Should I use ForeachType!(char[3]) instead of ElementType?

Try std.range.ElementEncodingType
September 01, 2015
On Tue, 01 Sep 2015 16:25:53 +0000, Justin Whear wrote:

> On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:
> 
>> http://dpaste.dzfl.pl/4535c5c03126
> 
> Arrays of char are assumed to be UTF-8 encoded text and a single char is not necessarily sufficient to represent a character.  ElementType identifies the type that you will receive when (for instance) foreaching over the array and D autodecodes the UTF-8 for you.  If you'd like to represent raw bytes use byte[3] or ubyte[3].  If you'd like other encodings, check out std.encoding.

I should correct this:
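 * ForeachType is the element type that will be inferred by a foreach loop
 * ElementType is usually the same as ForeachType, but is the type of the
value returned by .front (for char[] and wchar[], .front autodecodes, so
ElementType is dchar while ForeachType stays char or wchar)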

One major distinction is that ElementType is only for ranges while ForeachType will work for iterable non-ranges.
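
To make the distinction concrete, here is a small sketch that compares the three traits mentioned so far. It uses dynamic arrays rather than the char[3] from the original paste, and it only checks types at compile time; the empty main is there just so the file builds as a program.

```d
import std.range.primitives : ElementType, ElementEncodingType;
import std.traits : ForeachType;

// Range view: .front on a char[] decodes a whole code point, so the
// range element type is dchar.
static assert(is(ElementType!(char[]) == dchar));

// Code-unit view: both of these report the stored element type.
static assert(is(ElementEncodingType!(char[]) == char));
static assert(is(ForeachType!(char[]) == char));

// The same holds for string, with the immutable qualifier preserved.
static assert(is(ElementType!string == dchar));
static assert(is(ElementEncodingType!string == immutable(char)));

// Non-character arrays are not decoded, so all views agree.
static assert(is(ElementType!(ubyte[]) == ubyte));
static assert(is(ElementEncodingType!(ubyte[]) == ubyte));
static assert(is(ForeachType!(ubyte[]) == ubyte));

void main() {}
```

So if the goal is "the type actually stored in the array", ElementEncodingType or ForeachType is the closer match; ElementType answers "what does .front give me".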
September 01, 2015
On 01.09.2015 19:32, Justin Whear wrote:
> On Tue, 01 Sep 2015 16:25:53 +0000, Justin Whear wrote:
>
>> On Tue, 01 Sep 2015 19:18:42 +0300, drug wrote:
>>
>>> http://dpaste.dzfl.pl/4535c5c03126
>>
>> Arrays of char are assumed to be UTF-8 encoded text and a single char is
>> not necessarily sufficient to represent a character.  ElementType
>> identifies the type that you will receive when (for instance) foreaching
>> over the array and D autodecodes the UTF-8 for you.  If you'd like to
>> represent raw bytes use byte[3] or ubyte[3].  If you'd like other
>> encodings, check out std.encoding.
>
> I should correct this:
>   * ForeachType is the element type that will be inferred by a foreach loop
>   * ElementType is usually the same as ForeachType but is the type of the
> value returned by .front
>
> One major distinction is that ElementType is only for ranges while
> ForeachType will work for iterable non-ranges.
>
I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?
September 01, 2015
On Tue, Sep 01, 2015 at 07:40:24PM +0300, drug via Digitalmars-d-learn wrote: [...]
> I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?

In D, char[]/wchar[]/dchar[] are intended to be UTF. If you're dealing with strings encoded with other character sets, you should use ubyte[] (or ushort[], etc.) instead.


T

-- 
EMACS = Extremely Massive And Cumbersome System
September 01, 2015
On Tue, 01 Sep 2015 19:40:24 +0300, drug wrote:

> I'm just trying to automatically convert D types to HDF5 types, so I guess char[..] isn't necessarily some form of UTF-8 encoded text. Or should I treat it as such?

Because of D's autodecoding it can be problematic to assume UTF-8 if other encodings are actually in use.  If, for instance, you try printing a string stored as char[] that is actually Latin-1 encoded and contains characters from the high range, you'll get a runtime UTF-8 decoding exception.  If you don't know ahead of time what the encoding will be, using ubyte[] will be safer.  The other option is to dynamically reencode strings to UTF-8 as you read them.
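
As a rough illustration of both points, the sketch below stores Latin-1 bytes as ubyte, shows that viewing them as char[] fails UTF-8 validation, and then re-encodes them with std.encoding.transcode. The helper name latin1ToUtf8 and the sample bytes are made up for this example, and it assumes the input really is Latin-1.

```d
import std.encoding : Latin1String, transcode;
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: re-encode bytes that are known to be Latin-1
// into a UTF-8 string.
string latin1ToUtf8(immutable(ubyte)[] raw)
{
    string utf8;
    // Reinterpret the bytes as Latin-1 code units, then transcode.
    transcode(cast(Latin1String) raw, utf8);
    return utf8;
}

void main()
{
    // "café" in Latin-1: the final byte 0xE9 is not valid UTF-8 on its own.
    immutable(ubyte)[] raw = [0x63, 0x61, 0x66, 0xE9];

    // Treating the same bytes as char[] breaks the UTF-8 assumption.
    auto asChars = cast(const(char)[]) raw;
    assertThrown!UTFException(validate(asChars));

    // After re-encoding, the text is valid UTF-8 (é takes two code units).
    auto fixed = latin1ToUtf8(raw);
    validate(fixed);
    assert(fixed == "caf\u00E9");
    assert(fixed.length == 5);
}
```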
September 01, 2015
In my case I don't know what type the user will be using, because I'm writing a library. What's the best way to process char[..] in this case?
September 01, 2015
On Tuesday, September 01, 2015 20:05:18 drug via Digitalmars-d-learn wrote:
> In my case I don't know what type the user will be using, because I'm writing a library. What's the best way to process char[..] in this case?

char[] should never be anything other than UTF-8. Similarly, wchar[] is UTF-16, and dchar[] is UTF-32. So, if you're getting something other than UTF-8, it should not be char[]. It should be something more like ubyte[]. If you want to operate on it as char[], you should convert it to UTF-8. std.encoding may or may not help with that. But pretty much everything in D - certainly in the standard library - assumes that char, wchar, and dchar are UTF-encoded, and the language spec basically defines them that way. Technically, you _can_ put other encodings in them, but it's just asking for trouble.

- Jonathan M Davis
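
For the library case, one way to apply this advice is to decide at compile time how an array should be stored, based on its code-unit type. The following is only a sketch: the Storage enum and the storageOf template are hypothetical names invented here and are not part of any HDF5 binding or of Phobos. It assumes character arrays hold UTF text, as described above, and that anything else is raw element data.

```d
import std.traits : ForeachType, Unqual, isArray;

// Hypothetical classification of how an array would be written out.
enum Storage { utf8Text, utf16Text, utf32Text, rawElements }

// Hypothetical trait: picks a Storage kind from the array's element
// type, ignoring const/immutable qualifiers.
template storageOf(T)
if (isArray!T)
{
    // ForeachType gives the code unit (char), not the decoded dchar.
    alias E = Unqual!(ForeachType!T);

    static if (is(E == char))
        enum storageOf = Storage.utf8Text;
    else static if (is(E == wchar))
        enum storageOf = Storage.utf16Text;
    else static if (is(E == dchar))
        enum storageOf = Storage.utf32Text;
    else
        enum storageOf = Storage.rawElements;
}

// Character arrays, static or dynamic, are treated as UTF text.
static assert(storageOf!(char[3]) == Storage.utf8Text);
static assert(storageOf!string == Storage.utf8Text);
static assert(storageOf!(wchar[]) == Storage.utf16Text);
static assert(storageOf!(dchar[]) == Storage.utf32Text);

// Users with non-UTF data pass ubyte[] and get raw storage.
static assert(storageOf!(ubyte[3]) == Storage.rawElements);
static assert(storageOf!(int[]) == Storage.rawElements);

void main() {}
```

With this split, the library can document that char[..] means UTF-8 and point users with other encodings at ubyte[..].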
