char[] dstring = char* cstring ?
October 22, 2004
How do I convert from a char* (C string) to a char[] (D string)?

Thanks,
Andrew
October 22, 2004
On Fri, 22 Oct 2004 00:21:36 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:
> How do I convert from a char* (C string) to a char[] (D string)?

The simple answer:

char* cString = "abc";
char[] dString;
dString = cString[0..strlen(cString)].dup;

D allows you to 'slice' a pointer; this gives an array of the same type which refers to the original data (no copy is made). The 'dup' above is only required if the char* may be freed or overwritten before you're done with the char[]; otherwise you can leave it off.
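A minimal sketch of the slice-then-dup idea, assuming a current D2 compiler (where a string literal converts to const(char)* rather than char*). The helper name viewOf is hypothetical and only used for illustration:

```d
import core.stdc.string : strlen;

// Hypothetical helper: view a NUL-terminated C string as a D slice.
// No copy is made; the slice refers to the memory behind the pointer.
const(char)[] viewOf(const(char)* p)
{
    return p is null ? null : p[0 .. strlen(p)];
}

void main()
{
    // D string literals are NUL-terminated, so this is a valid C string.
    const(char)* cString = "abc";

    const(char)[] view = viewOf(cString); // borrows the C memory
    char[] dString = view.dup;            // independent, GC-owned copy

    assert(view.ptr is cString);     // same memory as the pointer
    assert(dString.ptr !is cString); // the copy lives elsewhere
    assert(dString == "abc");
}
```

The borrowed view is only safe while the C side keeps the buffer alive; the copy has no such restriction.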

There is also a more complex answer, because the C string data may be in any number of different encodings. So to convert from the C string to the D string you will want/need to transcode from the source encoding to UTF-8 (the D string encoding).

What encoding is the C string?

If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong; AJ or someone else will likely correct me. :)
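If the encoding is unknown, one way to find out whether the sliced data is usable as a D string is to check that it is valid UTF-8. A rough sketch, assuming a current D2 compiler and Phobos' std.utf.validate; the helper name isValidUtf8 is made up for this example:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if the bytes form well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Pure ASCII is always valid UTF-8.
    assert(isValidUtf8("abc"));

    // 0xE9 is 'é' in Latin-1, but on its own it is not valid UTF-8.
    char[] latin1 = ['c', 'a', 'f', '\xE9'];
    assert(!isValidUtf8(latin1));
}
```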

Regan

October 22, 2004
Tyro wrote:
> How do I convert from a char* (C string) to a char[] (D string)?
> 
> Thanks,
> Andrew

Please ignore. I've figured it out.

Thanks,
Andrew
October 22, 2004
Regan Heath wrote:
> On Fri, 22 Oct 2004 00:21:36 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:
> 
>> How do I convert from a char* (C string) to a char[] (D string)?
> 
> 
> The simple answer:
> 
> char* cString = "abc";
> char[] dString;
> dString = cString[0..strlen(cString)].dup;
> 
> D allows you to 'slice' a pointer; this gives an array of the same type which refers to the original data (no copy is made). The 'dup' above is only required if the char* may be freed or overwritten before you're done with the char[]; otherwise you can leave it off.
> 
> There is also a more complex answer, because the C string data may be in any number of different encodings. So to convert from the C string to the D string you will want/need to transcode from the source encoding to UTF-8 (the D string encoding).
> 
> What encoding is the C string?
> 
> If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong; AJ or someone else will likely correct me. :)
> 
> Regan
> 

Didn't think of that. I used a for loop checking for '\0'.
This is great. Thanks.

Andrew
October 22, 2004
On Fri, 22 Oct 2004 00:32:26 -0400, Tyro <ridimz_at@yahoo.dot.com> wrote:
> Tyro wrote:
>> How do I convert from a char* (C string) to a char[] (D string)?
>>
>> Thanks,
>> Andrew
>
> What I meant by that was: If a function returns a pointer to an array of characters, how do I save the array pointed to in a dynamic array of characters (i.e. char[])?

I've just realised std.string contains this function:

char[] toString(char *s)
{
    return s ? s[0 .. strlen(s)] : cast(char[])null;
}

so you can just go...

char *foo();
char[] s;

s = toString(foo());
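For readers on a current D2 compiler, roughly the same thing can be sketched with Phobos' std.string.fromStringz (borrows the C memory, like the slice above) and std.conv.to!string (makes a copy). The function foo below is a hypothetical stand-in for a C function returning a NUL-terminated string:

```d
import std.conv : to;
import std.string : fromStringz;

// Hypothetical stand-in for a C function that returns a C string.
const(char)* foo()
{
    return "hello".ptr; // string literals are NUL-terminated
}

void main()
{
    // Borrowed view: no copy, valid only while the C buffer lives.
    const(char)[] borrowed = fromStringz(foo());

    // Owned copy: safe to keep after the C side frees its buffer.
    string owned = foo().to!string;

    assert(borrowed == "hello");
    assert(owned == "hello");
    assert(borrowed.ptr is foo());
    assert(owned.ptr !is foo());
}
```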

Regan

October 22, 2004
Regan Heath wrote:

>> How do I convert from a char* (C string) to a char[] (D string)?
[...]
> There is also a more complex answer, because the C string data may be in any number of different encodings. So to convert from the C string to the D string you will want/need to transcode from the source encoding to UTF-8 (the D string encoding).
> 
> What encoding is the C string?
> 
> If you have no idea, then it's likely Latin-1 or just ASCII, in which case the simple answer should work. I could be wrong; AJ or someone else will likely correct me. :)

If it is ASCII, then it works. If it contains any ISO-Latin-1 characters
(>= 0x80), then each of those must be expanded to two UTF-8 code units,
or you will get invalid UTF-8 errors.
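The expansion can be sketched as follows, assuming the input really is ISO-8859-1 and a current D2 compiler. Latin-1 bytes map one-to-one onto the code points U+0000 to U+00FF, so each byte can be treated as a dchar and appended to a string, which encodes it as UTF-8. The helper name latin1ToUtf8 is hypothetical:

```d
// Hypothetical helper: transcode ISO-8859-1 bytes to a UTF-8 D string.
string latin1ToUtf8(const(ubyte)[] latin1)
{
    string result;
    foreach (b; latin1)
    {
        // Appending a dchar to a string encodes it as UTF-8:
        // bytes below 0x80 stay one code unit, the rest become two.
        result ~= cast(dchar) b;
    }
    return result;
}

void main()
{
    // "café" in Latin-1: the final byte 0xE9 is 'é'.
    ubyte[] input = [0x63, 0x61, 0x66, 0xE9];
    string s = latin1ToUtf8(input);

    assert(s == "caf\u00E9");
    assert(s.length == 5); // 4 input bytes became 5 UTF-8 code units
}
```

Data coming from a C function as char* would first need to be reinterpreted as bytes, for example by casting the pointer to const(ubyte)* before slicing.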

Some people have argued that C functions should be declared with "ubyte*"
instead of "char*" because of this very issue, since the type in D that
matches C's "char" is *not* D's char (which is UTF-8) but "byte".
Most of the time, i.e. with ASCII strings and most C functions, char* works.

D does not have characters. It has Unicode code units: a code point takes
from 1 to 4 UTF-8 code units (char), 1 or 2 UTF-16 code units (wchar), or
one UTF-32 code unit (dchar) to represent.
A code point might not represent a "character"; it could be more or less.
One grapheme ("character") could need several code points to encode.
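The different levels can be illustrated with a short sketch, assuming a current D2 compiler with Phobos (std.uni.byGrapheme and std.range.walkLength):

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // One code point outside the BMP (U+1F600) in the three encodings.
    assert("\U0001F600".length == 4);  // four UTF-8 code units (char)
    assert("\U0001F600"w.length == 2); // two UTF-16 code units (wchar)
    assert("\U0001F600"d.length == 1); // one UTF-32 code unit (dchar)

    // "e" followed by U+0301 COMBINING ACUTE ACCENT:
    // one grapheme made of two code points.
    string s = "e\u0301";
    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points (decoded)
    assert(s.byGrapheme.walkLength == 1); // graphemes
}
```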

See http://oss.software.ibm.com/icu/docs/papers/forms_of_unicode/

The upshot is that when you use char[].length (or C's strlen) in D,
you get the number of bytes. For ASCII, this is the same as the number
of characters in the string. For Unicode (even "Latin" text in UTF-8),
it is *not*!
Most of the time, you can use dchar as a "character" in the old sense.
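A small sketch of the difference, assuming a current D2 compiler; std.utf.count returns the number of code points, while .length and strlen both report UTF-8 code units (bytes):

```d
import core.stdc.string : strlen;
import std.utf : count;

void main()
{
    // 'é' is U+00E9, which takes two bytes in UTF-8.
    string s = "caf\u00E9";

    assert(s.length == 5);             // bytes (UTF-8 code units)
    assert(strlen("caf\u00E9") == 5);  // C's strlen also counts bytes
    assert(count(s) == 4);             // code points

    // For pure ASCII all three agree.
    assert("abc".length == 3);
    assert(strlen("abc") == 3);
    assert(count("abc") == 3);
}
```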

std.string mostly handles ASCII only.
--anders

PS. I suggested introducing the "string" and "ustring" aliases:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/11821
> void main(string[] args)