YASQ - Proper way to convert byte[] <--> string

Jul 12, 2007

Steve Teale

Jul 12, 2007

Frits van Bommel

Jul 13, 2007


I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings. I need to extract such strings and to place strings in such a buffer. I'm using:

string s = A[n .. m].dup; // n and m from prefixed string length/position
return s;

to get strings, and

byte[] ba = cast(byte[]) s;
A[n .. n+ba.length] = ba[0 .. $].dup;

to put them. Are these a) sensible, b) optimal?

Steve Teale wrote:
> I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings. I need to extract such strings and to place strings in such a buffer. I'm using:
>
> string s = A[n .. m].dup; // n and m from prefixed string length/position
> return s;
>
> to get strings, and

That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).

> byte[] ba = cast(byte[]) s;
> A[n .. n+ba.length] = ba[0 .. $].dup;
>
> to put them. Are these a) sensible, b) optimal?

This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
---
A[n .. n+s.length] = cast(byte[]) s;
---
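
For readers on a current D2 compiler, the two operations discussed above can be sketched as a pair of helpers. This is only an illustration under assumptions: `getString` and `putString` are hypothetical names, the packet is taken to hold UTF-8 text, and D2 needs an explicit cast plus `.idup` where the 2007 code used a plain `.dup`.

```d
import std.utf : validate;

/// Hypothetical helper: copy bytes n .. m out of the packet as a string.
/// n and m are assumed to come from the length-prefixed field.
string getString(const(byte)[] packet, size_t n, size_t m)
{
    // Reinterpret the bytes as UTF-8 code units, then copy them so that
    // later changes to the packet do not affect the returned string.
    string s = (cast(const(char)[]) packet[n .. m]).idup;
    validate(s); // throws a UTFException if the bytes are not valid UTF-8
    return s;
}

/// Hypothetical helper: copy the bytes of s into the packet at offset n.
/// Returns the offset just past the bytes written.
size_t putString(byte[] packet, size_t n, string s)
{
    // s.length counts UTF-8 code units (bytes), so both slices have the
    // same length. Slice assignment copies the elements; no .dup needed.
    packet[n .. n + s.length] = cast(const(byte)[]) s;
    return n + s.length;
}
```

Slice assignment copies element by element into the destination, which is why an extra `.dup` on the right-hand side only adds an allocation.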

Frits van Bommel Wrote:
> Steve Teale wrote:
> > I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings. I need to extract such strings and to place strings in such a buffer. I'm using:
> >
> > string s = A[n .. m].dup; // n and m from prefixed string length/position
> > return s;
> >
> > to get strings, and
>
> That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).
>
> > byte[] ba = cast(byte[]) s;
> > A[n .. n+ba.length] = ba[0 .. $].dup;
> >
> > to put them. Are these a) sensible, b) optimal?
>
> This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
> ---
> A[n .. n+s.length] = cast(byte[]) s;
> ---

Can I use n+s.length? In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.

Steve Teale wrote:
> Frits van Bommel Wrote:
>
>> ---
>> A[n .. n+s.length] = cast(byte[]) s;
>> ---
>
> Can I use n+s.length? In my experimentation i noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.

You noticed wrong...
char[]s in D aren't very special, they're just specific array types that happen to be handled specially by some functions (such as writef*)[1].
The .length is the number of elements, and each element is a fixed size.
A char is just a type representing a byte from UTF-8 text.
---
import std.stdio;

void main() {
	auto s = "\u0100";
	writefln(s);
	writefln(s.length);
	writefln((cast(byte[])s).length);
}
---
Outputs a weird character (an A with a - on top) and two times the number 2.

[1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.
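
The difference between code units and characters can be checked directly. The sketch below assumes a current D2 compiler; `countCodePoints` is a hypothetical helper written for this illustration, not a library function. It relies on `foreach` with a `dchar` loop variable decoding the UTF-8 sequence, as footnote [1] describes.

```d
/// Hypothetical helper: count the code points in a UTF-8 string by
/// letting foreach decode it one dchar at a time.
size_t countCodePoints(const(char)[] s)
{
    size_t count = 0;
    foreach (dchar c; s) // each iteration consumes one whole code point
        ++count;
    return count;
}

/// Hypothetical helper: number of bytes the string occupies in a buffer.
/// This is simply .length, because each char is one UTF-8 code unit.
size_t byteLength(const(char)[] s)
{
    return (cast(const(byte)[]) s).length;
}
```

For `"\u0100"`, both `.length` and `byteLength` give 2, while `countCodePoints` gives 1, so `n + s.length` is the right end offset when writing into the byte buffer.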

July 12, 2007

Re: YASQ - Proper way to convert byte[] <--> string

Posted by Steve Teale
in reply to Frits van Bommel

Frits van Bommel Wrote:

> Steve Teale wrote:
> > Frits van Bommel Wrote:
> > 
> >> ---
> >> A[n .. n+s.length] = cast(byte[]) s;
> >> ---
> > 
> > Can I use n+s.length?  In my experimentation i noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
> 
> You noticed wrong...
> char[]s in D aren't very special, they're just specific array types that
> happen to be handled specially by some functions (such as writef*)[1].
> The .length is the number of elements, and each element is a fixed size.
> A char is just a type representing a byte from UTF-8 text.
> ---
> import std.stdio;
> 
> void main() {
> 	auto s = "\u0100";
> 	writefln(s);
> 	writefln(s.length);
> 	writefln((cast(byte[])s).length);
> }
> ---
> Outputs a weird character (an A with a - on top) and two times the number 2.
> 
> 
> [1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.

You are correct, I had misinterpreted my own test program.

0ffh wrote:
> Frits van Bommel wrote:
>> Outputs a weird character (an A with a - on top) [...]
>
> Hah, Null-A! Reading A.E. van Vogt?

No, never heard of him. I just picked \u0100 because it was a round character code and it happened to be that character...
