char[] dstring = char* cstring ?

Oct 22, 2004

Tyro

Oct 22, 2004

Regan Heath


On Fri, 22 Oct 2004 00:21:36 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:

> How do I convert from a char* (C string) to a char[] (D string)?

The simple answer:

  char* cString = "abc";
  char[] dString;
  dString = cString[0..strlen(cString)].dup;

D allows you to 'slice' a pointer; this gives an array of the same type which refers to the original data. The 'dup' above is required if the char* is freed (or otherwise invalidated) before you're done with the char[]; otherwise you can leave it off.

There is also a more complex answer, because the C string data may be in any number of different encodings. So to convert from the C string to the D string you will want/need to transcode from the source encoding to UTF-8 (the D string encoding).

What encoding is the C string?

If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong; AJ or someone else will likely correct me. :)

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
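For readers on a current D compiler, the slice-and-dup idea above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `copyCString` is hypothetical, `strlen` is taken from `core.stdc.string`, and the pointer is typed `const(char)*` because string literals are immutable in today's D (the 2004 snippet predates that).

```d
import core.stdc.string : strlen;

// Hypothetical helper: copy a NUL-terminated C string into a GC-owned D array.
char[] copyCString(const(char)* cString)
{
    if (cString is null)
        return null;
    // Slicing a pointer gives a view over the same memory;
    // .dup copies it so the result outlives the C buffer.
    return cString[0 .. strlen(cString)].dup;
}

void main()
{
    const(char)* c = "abc".ptr; // D string literals are NUL-terminated
    char[] d = copyCString(c);
    assert(d == "abc");

    d[0] = 'x';                 // modifying the copy...
    assert(c[0] == 'a');        // ...leaves the original untouched
}
```

Leaving off the `.dup` would return a slice that aliases the C buffer, which is only safe while that buffer stays alive and unchanged.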

Regan Heath wrote:

> On Fri, 22 Oct 2004 00:21:36 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:
>
>> How do I convert from a char* (C string) to a char[] (D string)?
>
> The simple answer:
>
>   char* cString = "abc";
>   char[] dString;
>   dString = cString[0..strlen(cString)].dup;
>
> D allows you to 'slice' a pointer; this gives an array of the same type which refers to the original data. The 'dup' above is required if the char* is freed (or otherwise invalidated) before you're done with the char[]; otherwise you can leave it off.
>
> There is also a more complex answer, because the C string data may be in any number of different encodings. So to convert from the C string to the D string you will want/need to transcode from the source encoding to UTF-8 (the D string encoding).
>
> What encoding is the C string?
>
> If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong; AJ or someone else will likely correct me. :)
>
> Regan

Didn't think of that. I used a for loop checking for '\0'. This is great.

Thanks.
Andrew

On Fri, 22 Oct 2004 00:32:26 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:

> Tyro wrote:
>> How do I convert from a char* (C string) to a char[] (D string)?
>>
>> Thanks,
>> Andrew
>
> What I meant by that was: If a function returns a pointer to an array of characters, how do I save the array pointed to in a dynamic array of characters (i.e. char[])?

I've just realised std.string contains this function:

  char[] toString(char *s)
  {
      return s ? s[0 .. strlen(s)] : cast(char[])null;
  }

so you can just go...

  char *foo();
  char[] s;

  s = toString(foo());

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
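As a hedged note for current D: the closest present-day counterpart to the helper quoted above is, to my knowledge, `std.string.fromStringz`, which likewise returns a slice over the C data without copying. The sketch below uses a hypothetical `foo` as a stand-in for a C function that returns a NUL-terminated string; it is not part of the original thread.

```d
import std.string : fromStringz;

// Hypothetical stand-in for a C function returning a NUL-terminated string.
extern (C) const(char)* foo()
{
    return "hello".ptr; // string literals are NUL-terminated
}

void main()
{
    // fromStringz slices up to (not including) the terminating NUL; no copy is made.
    const(char)[] view = fromStringz(foo());
    assert(view == "hello");

    // Take an immutable copy if the C side may free or reuse its buffer.
    string owned = view.idup;
    assert(owned == "hello");
}
```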

October 22, 2004

Re: char[] dstring = char* cstring ?

Posted by Anders F Björklund
in reply to Regan Heath


Regan Heath wrote:

>> How do I convert from a char* (C string) to a char[] (D string)?
[...]
> There is also a more complex answer because the c string data may be in any number of different encodings. So to convert from the c string to the d string you will want/need to encode from the source encoding to UTF-8 (the D string encoding).
> 
> What encoding is the c string?
> 
> If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong AJ or someone else will likely correct me. :)

If it is ASCII, then it works. If it contains any ISO-Latin-1 characters
(>= 0x80), then each of those must be expanded to two UTF-8 code units;
left as-is, they are invalid UTF-8 and will give errors.
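To make the expansion concrete, here is a minimal sketch in D. The function name `latin1ToUtf8` is made up for illustration, and the input is assumed to really be ISO-8859-1, where every byte value maps directly to the Unicode code point of the same number. It also shows `std.utf.validate` rejecting the raw Latin-1 bytes.

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

// Hypothetical helper: expand ISO-8859-1 bytes to UTF-8 code units.
char[] latin1ToUtf8(const(ubyte)[] src)
{
    char[] dst;
    foreach (b; src)
    {
        if (b < 0x80)
        {
            dst ~= cast(char) b;                  // ASCII: one code unit
        }
        else
        {
            dst ~= cast(char)(0xC0 | (b >> 6));   // lead byte
            dst ~= cast(char)(0x80 | (b & 0x3F)); // continuation byte
        }
    }
    return dst;
}

void main()
{
    ubyte[] latin1 = [0x63, 0x61, 0x66, 0xE9];    // "café" in Latin-1

    // Reinterpreting the raw bytes as char[] yields invalid UTF-8.
    assertThrown!UTFException(validate(cast(const(char)[]) latin1));

    char[] utf8 = latin1ToUtf8(latin1);
    validate(utf8);                               // passes
    assert(utf8 == "caf\u00E9");
    assert(utf8.length == 5);                     // 0xE9 became two code units
}
```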

Some people have argued that C functions should be declared as "ubyte*"
instead of "char*" because of this very issue, since the type in D that
matches C's "char" is *not* D's char (which is UTF-8) but "byte".
Most of the time, i.e. with ASCII strings and most C functions, char* works...

D does not have characters. It has Unicode code points, each of which takes
from 1 to 4 UTF-8 code units (char) or one UTF-32 code unit (dchar) to represent.
A code point might not correspond to a "character"; it could be more or less.
One grapheme ("character") could possibly need several code points to encode.

See http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/

The upshot of this is that when you read char[].length (or call C's strlen)
in D, you get the number of bytes. For ASCII, this is the same as the
number of characters in the string. For Unicode (even "Latin" text in UTF-8),
it is *not*! Most of the time, you can use dchar as a "character" in the old sense.
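A small sketch of the difference between bytes, code points and graphemes, written for a current D compiler. The helper names `countCodePoints` and `countGraphemes` are hypothetical; `byDchar`, `byGrapheme` and `walkLength` are Phobos functions that postdate this 2004 thread.

```d
import core.stdc.string : strlen;
import std.range : walkLength;
import std.string : toStringz;
import std.uni : byGrapheme;
import std.utf : byDchar;

// Hypothetical helpers for illustration.
size_t countCodePoints(const(char)[] s) { return s.byDchar.walkLength; }
size_t countGraphemes(const(char)[] s)  { return s.byGrapheme.walkLength; }

void main()
{
    string latin = "caf\u00E9";               // é as a single code point
    assert(latin.length == 5);                // UTF-8 code units (bytes)
    assert(strlen(latin.toStringz) == 5);     // C's strlen also counts bytes
    assert(countCodePoints(latin) == 4);      // what dchar iteration sees

    string combined = "e\u0301";              // 'e' + combining acute accent
    assert(combined.length == 3);             // 1 byte + 2 bytes
    assert(countCodePoints(combined) == 2);   // two code points...
    assert(countGraphemes(combined) == 1);    // ...forming one grapheme
}
```

The second string is the case where even dchar stops matching a "character": one visible letter is spread over two code points.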

std.string mostly handles ASCII only.
--anders

PS. I suggested introducing the "string" and "ustring" aliases:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/11821
> void main(string[] args)
