February 17, 2005
change from %FC to ü
Hello,

I get the string %FC from the Apache server and want to change it
to a 'ü'.
writef("char = %s\n", \xfc); gives me the correct "ü".
My idea was to split the %FC to get the "F" and the "C" and build
a new string like '\xfc'.
But that didn't give me the correct 'ü'.


import std.stdio;
private import std.outbuffer;

int main() {
	char a = 'F', b = 'C';
	OutBuffer buf = new OutBuffer;
	byte by = 0x5C;  // 0x5C = '\'
	buf.write(by);
	buf.write("x");
	buf.write(a);
	buf.write(b);
	// prints the literal text \xFC, not 'ü'
	writef("buf = %s\n", buf.toString());
	writef("char = %s\n", \xFC);

	return 0;
}
February 17, 2005
Re: change from %FC to ü
This should do it:

char fromHex( char hex ) {
	if ( hex >= '0' && hex <= '9' ) {
		return cast(char)(hex - '0');
	}
	hex |= 0x20; // -- hex to lowercase
	if ( hex >= 'a' && hex <= 'f' ) {
		return cast(char)(hex - 'a' + 10);
	}
	throw new Exception("not a hex number.");
}

char c = cast(char)(fromHex('F') << 4 | fromHex('c')); // c == 0xFC

Well it could have some bugs because I did not test it. But this would 
be faster ;)
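
For readers who want the whole round trip, here is a minimal sketch of the same idea extended to a full string. It assumes a current D compiler and Phobos, and it assumes the server sends ISO-8859-1 (Latin-1) bytes, so that %FC means U+00FC. The names hexValue and percentDecodeLatin1 are made up for this example and are not part of any library.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

/// Value (0..15) of one hexadecimal digit; throws on anything else.
uint hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw new Exception("not a hex digit");
}

/// Hypothetical helper: decodes %XX escapes in an ASCII input string,
/// treating every escaped byte as a Latin-1 character, and returns UTF-8.
string percentDecodeLatin1(string input)
{
    dchar[] codePoints;
    for (size_t i = 0; i < input.length; ++i)
    {
        if (input[i] == '%' && i + 2 < input.length)
        {
            // Latin-1 byte values map one-to-one onto U+0000..U+00FF.
            codePoints ~= cast(dchar)(hexValue(input[i + 1]) << 4
                                      | hexValue(input[i + 2]));
            i += 2; // skip the two hex digits just consumed
        }
        else
        {
            // Assumption: everything outside an escape is plain ASCII.
            codePoints ~= cast(dchar) input[i];
        }
    }
    return toUTF8(codePoints); // transcode UTF-32 to UTF-8
}

void main()
{
    writeln(percentDecodeLatin1("M%FCnchen")); // München, encoded as UTF-8
}
```

Building the result as code points first and transcoding at the end avoids storing the single byte 0xFC in a char, which on its own is not valid UTF-8.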

nix wrote:
> Hello,
>
> I get the string %FC from the Apache server and want to change it
> to a 'ü'.
> writef("char = %s\n", \xfc); gives me the correct "ü".
> My idea was to split the %FC to get the "F" and the "C" and build
> a new string like '\xfc'.
> But that didn't give me the correct 'ü'.
>
>
> import std.stdio;
> private import std.outbuffer;
>
> int main() {
> 	char a = 'F', b = 'C';
> 	OutBuffer buf = new OutBuffer;
> 	byte by = 0x5C;  // 0x5C = '\'
> 	buf.write(by);
> 	buf.write("x");
> 	buf.write(a);
> 	buf.write(b);
> 	// prints the literal text \xFC, not 'ü'
> 	writef("buf = %s\n", buf.toString());
> 	writef("char = %s\n", \xFC);
>
> 	return 0;
> }
February 17, 2005
Re: change from %FC to ü
Thanks a lot.
I only had to change char to wchar.

Does anybody know why these two do the same thing?

wchar f = 0xfc;
char[] e = \xfc;
writef("f = %s\n", f);
writef("e = %s\n", e);

Is wchar a char with 2 bytes?
How can I cast from wchar to char[]?

In article <cv1sua$2oh1$1@digitaldaemon.com>, Daan Oosterveld says...
>
>This should do it:
>
>char fromHex( char hex ) {
> if ( hex >= '0' && hex <= '9' ) {
>  return cast(char)(hex - '0');
> }
> hex |= 0x20; // -- hex to lowercase
> if ( hex >= 'a' && hex <= 'f' ) {
>  return cast(char)(hex - 'a' + 10);
> }
> throw new Exception("not a hex number.");
>}
>
>char c = cast(char)(fromHex('F') << 4 | fromHex('c')); // c == 0xFC
>
>Well it could have some bugs because I did not test it. But this would
>be faster ;)
February 17, 2005
Re: change from %FC to ü
char is UTF-8 (and by technicality ASCII)
wchar is UTF-16 LE/BE, and yes it is two bytes
dchar is UTF-32 LE/BE, and is four bytes
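
The sizes listed above can be checked directly. The following is only a small sketch, assuming a current D compiler; it also shows that the number of code units needed for 'ü' (U+00FC) differs between the three encodings.

```d
import std.stdio : writeln;

void main()
{
    // Size of one code unit of each character type, in bytes.
    writeln(char.sizeof, " ", wchar.sizeof, " ", dchar.sizeof); // 1 2 4

    // The same character stored in each encoding.
    string  s8  = "\u00FC"; // UTF-8:  two code units (0xC3 0xBC)
    wstring s16 = "\u00FC"; // UTF-16: one code unit  (0x00FC)
    dstring s32 = "\u00FC"; // UTF-32: one code unit  (0x000000FC)
    writeln(s8.length, " ", s16.length, " ", s32.length); // 2 1 1
}
```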

Casting between them is more-or-less transparent.  Any function with
signature:
void foo(wchar[])

will accept a char[], wchar[], or dchar[] as argument.  Problem is,
DMD's implicit cast between string types just changes the byte
boundaries.  If you actually want to translate between encodings, then
import std.utf and use the toUTF8(), toUTF16(), and toUTF32() functions.
So then calling foo() with a char[] would look like:
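
The snippet that belonged here seems to be missing from the archived post. What follows is a guess at what such a call might look like, written for a current D compiler. The body of foo, its const(wchar)[] parameter, its return value, and the variable name narrow are assumptions made so that the example is self-contained.

```d
import std.stdio : writeln;
import std.utf : toUTF16;

// Stand-in for the foo(wchar[]) signature mentioned above. It only
// reports how many UTF-16 code units it received.
size_t foo(const(wchar)[] text)
{
    return text.length;
}

void main()
{
    string narrow = "gr\u00FCn";      // 5 UTF-8 code units, 'ü' takes two
    // foo(narrow);                   // rejected by current compilers
    writeln(foo(toUTF16(narrow)));    // 4: transcoded to UTF-16 first
}
```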

-- Chris S

nix wrote:
> Thanks a lot.
> I only had to change char to wchar.
>
> Does anybody know why these two do the same thing?
>
> wchar f = 0xfc;
> char[] e = \xfc;
> writef("f = %s\n", f);
> writef("e = %s\n", e);
>
> Is wchar a char with 2 bytes?
> How can I cast from wchar to char[]?
February 17, 2005
Re: change from %FC to ü
"Chris Sauls" <ibisbasenji@gmail.com> wrote in message 
news:cv2re4$oce$1@digitaldaemon.com...
> char is UTF-8 (and by technicality ASCII)
> wchar is UTF-16 LE/BE, and yes it is two bytes
> dchar is UTF-32 LE/BE, and is four bytes
>
> Casting between them is more-or-less transparent.  Any function with 
> signature:
> void foo(wchar[])
>
> Will accept a char[], wchar[], or dchar[] as argument.

Really? It errors for me:
test.d(4): function test.foo (wchar[]x) does not match argument types 
(char[])
wchart.d(4): cannot implicitly convert expression y of type char[] to 
wchar[]
February 17, 2005
Re: change from %FC to ü
On Thu, 17 Feb 2005 13:39:03 -0600, Chris Sauls <ibisbasenji@gmail.com>  
wrote:
> char is UTF-8 (and by technicality ASCII)
> wchar is UTF-16 LE/BE, and yes it is two bytes
> dchar is UTF-32 LE/BE, and is four bytes
>
> Casting between them is more-or-less transparent.  Any function with  
> signature:
> void foo(wchar[])
>
> Will accept a char[], wchar[], or dchar[] as argument.  Problem is,  
> DMD's implicit cast between string types just changes the byte  
> boundaries.

Often referred to as 'painting'... which is odd.

I think of it as being similar to a cast from int to uint or vice-versa:
this cast does not modify the data in any way, it simply interprets the
data in a different way.

This is different to a cast from int to float or vice-versa, where the  
data format is actually converted from one to the other.

The program at the end is an example of my observations.

> If you actually want to translate between encodings, then import std.utf  
> and use the toUTF8(), toUTF16(), and toUTF32() functions.

"translate between encodings" == transcode.

I think explicit transcoding of char[] etc. can be compared to explicit
casts from integer types to floating point types: neither is, nor perhaps
should be, implicit (too many side effects, perhaps?), but both need to
convert the data in order to be valid.

If casts between the char types were changed to transcode, it would mean
you couldn't paint a char[] as a wchar[] directly, but you could still
paint using byte[] as an intermediary. To me, this actually makes more
sense. I also don't see it as a particularly large con: painting is
inexpensive, and it's more likely you want to convert than paint in the
case of char[] and friends.

Further, char[] and friends have a specified encoding, so a char[] that is
not in that encoding is invalid. The compiler ensures they're correctly
encoded at compile time, and even at runtime in some cases. It seems to
make sense that it should convert on casts also.

Regan

# import core.stdc.stdio; // printf
#
# void main() {
# 	float fdata;
# 	uint udata;
# 	int data;
# 	
# 	ubyte[] raw; // unsigned, so bytes >= 0x80 print as two hex digits
# 	
# 	data = 5;
# 	
# 	raw = (cast(ubyte*)&data)[0..4];
# 	printf("Value of int: %d\n",data);
# 	printf("Value of bytes in int: ");
# 	foreach(ubyte b; raw)
# 		printf("%02x ",b);
# 	printf("\n\n");
# 	
# 	udata = cast(uint)data; // paint: same bytes, read as unsigned
# 	raw = (cast(ubyte*)&udata)[0..4];
# 	printf("Value of uint: %u\n",udata);
# 	printf("Value of bytes in uint: ");
# 	foreach(ubyte b; raw)
# 		printf("%02x ",b);
# 	printf("\n\n");
# 	
# 	fdata = cast(float)data; // convert: the stored bytes change
# 	raw = (cast(ubyte*)&fdata)[0..4];
# 	printf("Value of float: %f\n",fdata);
# 	printf("Value of bytes in float: ");
# 	foreach(ubyte b; raw)
# 		printf("%02x ",b);
# 	printf("\n\n");
# }
February 17, 2005
Re: change from %FC to ü
On Thu, 17 Feb 2005 16:08:39 -0500, Ben Hinkle wrote:

> "Chris Sauls" <ibisbasenji@gmail.com> wrote in message 
> news:cv2re4$oce$1@digitaldaemon.com...
>> char is UTF-8 (and by technicality ASCII)
>> wchar is UTF-16 LE/BE, and yes it is two bytes
>> dchar is UTF-32 LE/BE, and is four bytes
>>
>> Casting between them is more-or-less transparent.  Any function with 
>> signature:
>> void foo(wchar[])
>>
>> Will accept a char[], wchar[], or dchar[] as argument.
> 
> really? it errors for me:
> test.d(4): function test.foo (wchar[]x) does not match argument types 
> (char[])
> wchart.d(4): cannot implicitly convert expression y of type char[] to 
> wchar[]

If you only have one signature with one of the 'string' forms, char[],
wchar[], or dchar[], then you can simply use it for all string literals.
However, if you attempt to pass a variable with a different data type, you
need to do an explicit conversion.

 For example ..

 void foo(wchar[] x)
 {  . . . }

 dchar[] y;

 foo(y);  // Will fail.

 foo(toUTF16(y)); // works.


You also get errors if you have two or more different signatures and supply
a string literal.

 void foo(char[] x) { . . . }
 void foo(wchar[] x) { . . . }
 void foo(dchar[] x) { . . . }

 foo("abcdef");  // will fail.
 foo(cast(dchar[])"abcdef"); // works


It would be *SO NICE* if we could decorate string literals with the required
storage format. For example ...

   d"abcdef"  // A dchar[] string
   w"abcdef"  // A wchar[] string
   n"abcdef"  // A char[] string (narrow).

I know the syntax above will not actually work as we still need raw string
capabilities, but there must be something easier than constantly typing
'cast(dchar[])'.
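
Current D compilers accept something close to this wish: a postfix character (c, w or d) after the closing quote of a string literal selects the encoding. The sketch below assumes such a compiler; the three foo overloads and their return values are invented purely to show which overload gets picked.

```d
import std.stdio : writeln;

// Hypothetical overloads; each returns the size of its code unit in bytes.
size_t foo(const(char)[] x)  { return 1; } // UTF-8
size_t foo(const(wchar)[] x) { return 2; } // UTF-16
size_t foo(const(dchar)[] x) { return 4; } // UTF-32

void main()
{
    writeln(foo("abcdef"c)); // 1: literal typed as a char string
    writeln(foo("abcdef"w)); // 2: literal typed as a wchar string
    writeln(foo("abcdef"d)); // 4: literal typed as a dchar string
}
```

Because the postfix fixes the literal's type at compile time, no cast and no run-time transcoding is involved.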

-- 
Derek
Melbourne, Australia
18/02/2005 9:52:23 AM
February 17, 2005
Re: change from %FC to ü
On Fri, 18 Feb 2005 10:06:59 +1100, Derek Parnell <derek@psych.ward> wrote:
> If you only have one signature with one of the 'string' forms, char[],
> wchar[], or dchar[], then you can simply use it for all string literals.
> However, if you attempt to pass a variable with a different data type,
> you need to do an explicit conversion.
>
>   For example ..
>
>   void foo(wchar[] x)
>   {  . . . }
>
>   dchar[] y;
>
>   foo(y);  // Will fail.
>
>   foo(toUTF16(y)); // works.

This also 'works' .. not! It compiles, but the output is garbage.

# import std.stdio;
#
# void foo(wchar[] x)
# {
# 	writefln(x);
# }
#
# void main()
# {
# 	char[] a = "test";
# 	foo(cast(wchar[])a);
# }
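
To make the difference visible without relying on console output, here is a small sketch, assuming a current D compiler and Phobos, that compares the cast with std.utf.toUTF16 on the same data. The variable names are chosen for this example only.

```d
import std.utf : toUTF16;

void main()
{
    char[] a = "test".dup;

    // Painting: the same 4 bytes are viewed as 2 UTF-16 code units,
    // so the result no longer spells "test".
    wchar[] painted = cast(wchar[]) a;
    assert(painted.length == 2);

    // Transcoding: every character is re-encoded, giving 4 code units.
    wstring transcoded = toUTF16(a);
    assert(transcoded.length == 4);
    assert(transcoded == "test"w);
}
```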

Can we have explicit casts between types with a specified encoding (the
char types for example) cause transcoding, i.e. make it call toUTFxx?

Please?

Regan
February 18, 2005
Re: change from %FC to ü
On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:

> This also 'works' .. not! It compiles, but the output is garbage.
> 
> # import std.stdio;
> #
> # void foo(wchar[] x)
> # {
> # 	writefln(x);
> # }
> #
> # void main()
> # {
> # 	char[] a = "test";
> # 	foo(cast(wchar[])a);
> # }

You are correct, and I didn't mention this 'technique' because, as you say,
it compiles but does not do what you'd expect.

The confusion is no doubt caused by 'cast' currently working differently
depending on the context.

For instance, when using cast on a real to get a long, it does storage
format conversion. That is, code is generated by the compiler to convert
from an 80-bit IEEE floating point format to a 64-bit signed integer
format.

However, when using cast on character arrays, it is just used to pretend
that something is really something else. So using cast(dchar[]) on
a char[] variable only tells the compiler to treat the bytes in the
char[] variable as if they were already in a dchar[] arrangement.


> Can we have explicit casts between types with a specified encoding (the
> char types for example) cause transcoding, i.e. make it call toUTFxx?
> 
> Please?

Sounds nice, but I suspect that we need to have *both* capabilities
available to the coder. Namely a way to tell the compiler to convert from
one storage format to another, and a way to tell the compiler that even
though the explicit data type is 'FOO' we actually want it to be treated as
if it were really stored in RAM as a 'BAR'.

This gives the coder and the compiler some useful flexibility.
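
As a rough illustration of having both capabilities side by side, the sketch below uses two facilities that exist in current Phobos: std.conv.to for converting between encodings and std.string.representation for looking at the stored code units unchanged. It assumes a current D compiler; whether these match what was envisaged at the time of this thread is not something the thread itself settles.

```d
import std.conv : to;
import std.string : representation;

void main()
{
    string s = "\u00FC"; // 'ü' stored as UTF-8

    // Conversion: the data is re-encoded into UTF-32.
    dstring converted = s.to!dstring;
    assert(converted.length == 1 && converted[0] == 0xFC);

    // Reinterpretation: the same bytes, viewed as plain ubyte values.
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == 2 && raw[0] == 0xC3 && raw[1] == 0xBC);
}
```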

-- 
Derek
Melbourne, Australia
18/02/2005 11:07:09 AM
February 18, 2005
Re: change from %FC to ü
On Fri, 18 Feb 2005 11:26:57 +1100, Derek Parnell <derek@psych.ward> wrote:
> On Fri, 18 Feb 2005 12:47:54 +1300, Regan Heath wrote:
>
> You are correct, and I didn't mention this 'technique' because, as you  
> say,
> it compiles but does not do what you'd expect.
>
> The confusion is no doubt caused by 'cast' currently working differently
> depending on the context.
>
> For instance, when using cast on a real to get a long, it does storage
> format conversion. That is, code is generated by the compiler to convert
> from an 80-bit IEEE floating point format to a 64-bit signed integer
> format.
>
> However, when using cast on character arrays, it is just used to pretend
> that something is really something else. So using cast(dchar[]) on
> a char[] variable only tells the compiler to treat the bytes in the
> char[] variable as if they were already in a dchar[] arrangement.

Yep, see my other post in this thread.

>> Can we have explicit casts between types with a specified encoding (the
>> char types for example) cause transcoding, i.e. make it call toUTFxx
>>
>> Please?
>
> Sounds nice, but I suspect that we need to have *both* capabilities
> available to the coder. Namely a way to tell the compiler to convert from
> one storage format to another, and a way to tell the compiler that even
> though the explicit data type is 'FOO' we actually want it to be treated  
> as
> if it were really stored in RAM as a 'BAR'.
>
> This gives the coder and the compiler some useful flexibility.

Yep, see my other post in this thread.

:)

Regan