Thread overview
YASQ - Proper way to convert byte[] <--> string
Jul 12, 2007
Steve Teale
Jul 12, 2007
Frits van Bommel
Jul 12, 2007
Steve Teale
Jul 12, 2007
Frits van Bommel
Jul 12, 2007
Steve Teale
Jul 13, 2007
0ffh
Jul 13, 2007
Frits van Bommel
July 12, 2007
I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings.  I need to extract such strings and to place strings in such a buffer.  I'm using:

string s = A[n .. m].dup;  // n and m from prefixed string length/position
return s;

to get strings, and

byte[] ba = cast(byte[]) s;
A[n .. n+ba.length] = ba[0 .. $].dup;

to put them.  Are these a) sensible, b) optimal?
July 12, 2007
Steve Teale wrote:
> I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings.  I need to extract such strings and to place strings in such a buffer.  I'm using:
> 
> string s = A[n .. m].dup;  // n and m from prefixed string length/position
> return s;
> 
> to get strings, and

That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).
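To illustrate what the .dup buys you, here is a minimal sketch, assuming a current D2 compiler. The function name `sliceVersusDup` and the sample data are made up for the example and are not from the packet code. A plain slice shares memory with the buffer, so a later write to the buffer shows through it, while a .dup copy is unaffected:

```d
import std.stdio;

// Hypothetical demo, not from the original code: returns the contents of a
// plain slice and of a .dup copy after the underlying buffer is modified.
char[][] sliceVersusDup()
{
    char[] buf = "abcdef".dup;       // mutable buffer
    char[] view = buf[1 .. 4];       // slice: shares memory with buf
    char[] copy = buf[1 .. 4].dup;   // independent copy
    buf[1] = 'X';                    // buffer changes after extraction
    return [view, copy];
}

void main()
{
    auto r = sliceVersusDup();
    writeln(r[0]); // the slice sees the change
    writeln(r[1]); // the copy does not
}
```

So the copy is only wasted work if nothing writes to the buffer while the extracted string is still in use.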

> byte[] ba = cast(byte[]) s;
> A[n .. n+ba.length] = ba[0 .. $].dup;
> 
> to put them.  Are these a) sensible, b) optimal?

This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
---
A[n .. n+s.length] = cast(byte[]) s;
---
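Put together, the two operations might look like the following sketch. It assumes a current D2 compiler, where `string` is immutable and the copy is therefore made with .idup plus a cast; the helper names `getString` and `putString` are hypothetical, and nothing here validates that the bytes are well-formed UTF-8:

```d
import std.stdio;

// Hypothetical helper: n and m are byte offsets into the packet buffer.
string getString(const(byte)[] buf, size_t n, size_t m)
{
    // .idup copies the bytes, so the string stays valid if buf is reused.
    return cast(string) buf[n .. m].idup;
}

// Hypothetical helper: writes the bytes of s into buf starting at offset n.
void putString(byte[] buf, size_t n, string s)
{
    // Slice assignment copies element by element; no intermediate .dup needed.
    buf[n .. n + s.length] = cast(const(byte)[]) s;
}

void main()
{
    auto buf = new byte[16];
    string s = "h\u00E9llo";   // 5 characters, 6 UTF-8 bytes
    putString(buf, 2, s);
    writeln(getString(buf, 2, 2 + s.length));
}
```

The slice on the left-hand side must have exactly the same length as the source, which is why the end index is computed from s.length.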
July 12, 2007
Frits van Bommel Wrote:

> Steve Teale wrote:
> > I have a byte[] A that contains an AJP13 packet, presumably including UTF8 strings.  I need to extract such strings and to place strings in such a buffer.  I'm using:
> > 
> > string s = A[n .. m].dup;  // n and m from prefixed string length/position
> > return s;
> > 
> > to get strings, and
> 
> That should work, and be optimal unless you can be sure the A array doesn't change while you still need the string (in which case the .dup is unnecessary).
> 
> > byte[] ba = cast(byte[]) s;
> > A[n .. n+ba.length] = ba[0 .. $].dup;
> > 
> > to put them.  Are these a) sensible, b) optimal?
> 
> This one should work as well, but isn't optimal; the .dup is unnecessary. This should be equivalent but more efficient:
> ---
> A[n .. n+s.length] = cast(byte[]) s;
> ---

Can I use n+s.length?  In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
July 12, 2007
Steve Teale wrote:
> Frits van Bommel Wrote:
> 
>> ---
>> A[n .. n+s.length] = cast(byte[]) s;
>> ---
> 
> Can I use n+s.length?  In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.

You noticed wrong...
char[]s in D aren't very special, they're just specific array types that happen to be handled specially by some functions (such as writef*)[1]. The .length is the number of elements, and each element is a fixed size. A char is just a type representing a byte from UTF-8 text.
---
import std.stdio;

void main() {
	auto s = "\u0100";
	writefln(s);
	writefln(s.length);
	writefln((cast(byte[])s).length);
}
---
Outputs a weird character (an A with a - on top) and two times the number 2.


[1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.
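The distinction between elements and characters can be made explicit with a small sketch, again assuming a current D2 compiler. The helper `countCodePoints` is a hypothetical name; it relies on the foreach decoding mentioned in the footnote to count characters, while .length keeps counting char elements:

```d
import std.stdio;

// Hypothetical helper: counts code points by letting foreach decode UTF-8.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)   // decodes one code point per iteration
        ++n;
    return n;
}

void main()
{
    string s = "\u0100";
    writeln(s.length);             // char elements (bytes): 2
    writeln(countCodePoints(s));   // code points: 1
}
```

For sizing a byte buffer, s.length is the number that matters, since it is the number of bytes the string occupies.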
July 12, 2007
Frits van Bommel Wrote:

> Steve Teale wrote:
> > Frits van Bommel Wrote:
> > 
> >> ---
> >> A[n .. n+s.length] = cast(byte[]) s;
> >> ---
> > 
> > > Can I use n+s.length?  In my experimentation I noticed that a UTF8 string containing a character using a two-byte representation definitely had an s.length of the number of characters, which was one less than the number of bytes.
> 
> You noticed wrong...
> char[]s in D aren't very special, they're just specific array types that
> happen to be handled specially by some functions (such as writef*)[1].
> The .length is the number of elements, and each element is a fixed size.
> A char is just a type representing a byte from UTF-8 text.
> ---
> import std.stdio;
> 
> void main() {
> 	auto s = "\u0100";
> 	writefln(s);
> 	writefln(s.length);
> 	writefln((cast(byte[])s).length);
> }
> ---
> Outputs a weird character (an A with a - on top) and two times the number 2.
> 
> 
> [1]: and by foreach statements as well; they can automagically extract char/wchar/dchar elements from char[]/wchar[]/dchar[], in any combination.

You are correct, I had misinterpreted my own test program.

July 13, 2007
Frits van Bommel wrote:
> Outputs a weird character (an A with a - on top) [...]

Hah, Null-A! Reading A.E. van Vogt?

Regards, Frank
July 13, 2007
0ffh wrote:
> Frits van Bommel wrote:
>> Outputs a weird character (an A with a - on top) [...]
> 
> Hah, Null-A! Reading A.E. van Vogt?

No, never heard of him. I just picked \u0100 because it was a round character code and it happened to be that character...