Thread overview
String created from buffer has wrong length and strip() result is incorrect
Oct 17, 2014
Lucas Burson
Oct 17, 2014
thedeemon
Oct 17, 2014
thedeemon
Oct 17, 2014
spir
Oct 17, 2014
Lucas Burson
Oct 17, 2014
ketmar
Oct 17, 2014
Lucas Burson
Oct 17, 2014
ketmar
Oct 18, 2014
Lucas Burson
Oct 18, 2014
ketmar
Oct 18, 2014
ketmar
Oct 18, 2014
Lucas Burson
Oct 18, 2014
ketmar
Oct 17, 2014
ketmar
October 17, 2014
When creating a string from a ubyte[], I have an invalid length and string.strip() doesn't strip off all whitespace. I'm new to the language. Is this a compiler issue?


import std.string : strip;
import std.stdio  : writefln;

int main()
{
   const string ATA_STR = " ATA ";

   // this works fine
   {
      ubyte[] buffer = [' ', 'A', 'T', 'A', ' ' ];
      string test = strip(cast(string)(buffer));
      assert(test == strip(ATA_STR));
   }

   // This is where things break
   {
      ubyte[] buff = new ubyte[16];
      buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);

      // read the string back from the buffer, stripping whitespace
      string stringFromBuffer = strip(cast(string)(buff[0..16]));
      // this shows strip() doesn't remove all whitespace
      writefln("StrFromBuff is '%s'; length %d", stringFromBuffer, stringFromBuffer.length);

      // !! FAILS. stringFromBuffer is length 15, not 3.
      assert(stringFromBuffer.length == strip(ATA_STR).length);

   }

   return 0;
}
October 17, 2014
On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:

>    // This is where things breaks
>    {
>       ubyte[] buff = new ubyte[16];
>       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
>
>       // read the string back from the buffer, stripping whitespace
>       string stringFromBuffer = strip(cast(string)(buff[0..16]));
>       // this shows strip() doesn't remove all whitespace
>       writefln("StrFromBuff is '%s'; length %d", stringFromBuffer, stringFromBuffer.length);
>
>       // !! FAILS. stringFromBuffer is length 15, not 3.
>       assert(stringFromBuffer.length == strip(ATA_STR).length);

Unlike C, strings in D are not zero-terminated by default; they are just arrays, i.e. a pointer and a length. You create an array of 16 bytes and cast it to string, so now you have a 16-char string. You fill the first few chars with data from ATA_STR, but the remaining 10 bytes of the array are still part of the string. They were never overwritten, so they still hold the zeroes that `new ubyte[16]` initialized them with. Since this tail of zeroes is not whitespace (tabs, spaces, etc.), 'strip' doesn't remove it.

October 17, 2014
>You fill first few chars with data from
> ATA_STR but the rest 10 bytes of the array are still part of the string

Edit: you fill first 5 chars and have 11 bytes of zeroes in the tail. My counting skill is too bad. ;)
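
For readers who want to check the numbers above, here is a minimal sketch of the same situation. It is not code from the thread; the variable names are illustrative, and `std.string.representation` is used only to get the bytes of the literal without a cast.

```d
import std.algorithm : all;
import std.string : representation, strip;

void main()
{
    string ataStr = " ATA ";

    // `new ubyte[16]` zero-initializes every element.
    ubyte[] buff = new ubyte[16];
    buff[0 .. ataStr.length] = ataStr.representation;

    // The cast reinterprets all 16 bytes; nothing scans for a terminator.
    // (Casting mutable data to string is done here for illustration only.)
    string s = cast(string) buff;
    assert(s.length == 16);

    // 5 bytes came from ataStr, the other 11 are zero.
    assert(buff[ataStr.length .. $].length == 11);
    assert(buff[ataStr.length .. $].all!(b => b == 0));

    // strip removes the leading space, then stops at the trailing '\0'
    // bytes because they are not whitespace: 16 - 1 == 15.
    string stripped = strip(s);
    assert(stripped.length == 15);
    assert(stripped[0 .. 3] == "ATA");
}
```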
October 17, 2014
On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
> On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:
>
>>    // This is where things breaks
>>    {
>>       ubyte[] buff = new ubyte[16];
>>       buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
>>
>>       // read the string back from the buffer, stripping whitespace
>>       string stringFromBuffer = strip(cast(string)(buff[0..16]));
>>       // this shows strip() doesn't remove all whitespace
>>       writefln("StrFromBuff is '%s'; length %d", stringFromBuffer,
>> stringFromBuffer.length);
>>
>>       // !! FAILS. stringFromBuffer is length 15, not 3.
>>       assert(stringFromBuffer.length == strip(ATA_STR).length);
>
> Unlike C, strings in D are not zero-terminated by default, they are just arrays,
> i.e. a pair of pointer and size. You create an array of 16 bytes and cast it to
> string, now you have a 16-chars string. You fill first few chars with data from
> ATA_STR but the rest 10 bytes of the array are still part of the string, not
> initialized with data, so having zeroes. Since this tail of zeroes is not
> whitespace (tabs or spaces etc.) 'strip' doesn't remove it.

Side-note: since your string has those zeroes at the end, strip only removes the space at start (thus, final size=15), instead of at both ends.

d

October 17, 2014
On Friday, 17 October 2014 at 08:31:04 UTC, spir via Digitalmars-d-learn wrote:
> On 17/10/14 09:29, thedeemon via Digitalmars-d-learn wrote:
>> On Friday, 17 October 2014 at 06:29:24 UTC, Lucas Burson wrote:
>>
>>>   // This is where things breaks
>>>   {
>>>      ubyte[] buff = new ubyte[16];
>>>      buff[0..ATA_STR.length] = cast(ubyte[])(ATA_STR);
>>>
>>>      // read the string back from the buffer, stripping whitespace
>>>      string stringFromBuffer = strip(cast(string)(buff[0..16]));
>>>      // this shows strip() doesn't remove all whitespace
>>>      writefln("StrFromBuff is '%s'; length %d", stringFromBuffer,
>>> stringFromBuffer.length);
>>>
>>>      // !! FAILS. stringFromBuffer is length 15, not 3.
>>>      assert(stringFromBuffer.length == strip(ATA_STR).length);
>>
>> Unlike C, strings in D are not zero-terminated by default, they are just arrays,
>> i.e. a pair of pointer and size. You create an array of 16 bytes and cast it to
>> string, now you have a 16-chars string. You fill first few chars with data from
>> ATA_STR but the rest 10 bytes of the array are still part of the string, not
>> initialized with data, so having zeroes. Since this tail of zeroes is not
>> whitespace (tabs or spaces etc.) 'strip' doesn't remove it.
>
> Side-note: since your string has those zeroes at the end, strip only removes the space at start (thus, final size=15), instead of at both ends.
>
> d

Okay, things are becoming clearer. The cast to string is nothing like the C++ string constructor; I made a bad assumption.

So given the below buffer would I use fromStringz (is this in the stdlib?) to cast it from a null-terminated buffer to a good string? Shouldn't the compiler give a warning about casting a buffer to a string without using fromStringz?

Buffer = [ 0x20, 0x41, 0x54, 0x41, 0x20, 0x00, 0x00, ...]?
October 17, 2014
On Fri, 17 Oct 2014 15:24:21 +0000
Lucas Burson via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> So given the below buffer would I use fromStringz (is this in the stdlib?) to cast it from a null-terminated buffer to a good string? Shouldn't the compiler give a warning about casting a buffer to a string without using fromStringz?

if you are really-really sure that your buffer is null-terminated, you can use this trick:

  import std.conv;
  string s = to!string(cast(char*)buff.ptr);

please note, that this is NOT SAFE. you'd better doublecheck that your buffer is not empty and is null-terminated.
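
If the terminator cannot be trusted, one possible alternative is to search for the zero byte within the bounds of the slice instead of following a raw pointer. The sketch below is an assumption-laden illustration, not something proposed in the thread: `textBeforeNull` is a hypothetical helper name, and it simply falls back to the whole slice when no zero byte is found.

```d
import std.algorithm : countUntil;
import std.string : representation, strip;

/// Hypothetical helper (not from the thread): returns the text before the
/// first zero byte, or the whole slice if it contains no zero byte.
string textBeforeNull(const(ubyte)[] buf)
{
    const idx = buf.countUntil(0);   // -1 when there is no zero byte
    const size_t end = idx < 0 ? buf.length : cast(size_t) idx;

    // idup copies, so the result does not alias the mutable buffer.
    return (cast(const(char)[]) buf[0 .. end]).idup;
}

void main()
{
    // Zero-padded buffer, as in the original question.
    ubyte[] buff = new ubyte[16];
    buff[0 .. 5] = " ATA ".representation;
    assert(strip(textBeforeNull(buff)) == "ATA");

    // No terminator at all: the search never reads past the slice.
    ubyte[] full = [' ', 'A', 'B', 'C', ' '];
    assert(strip(textBeforeNull(full)) == "ABC");

    // Empty buffer: yields an empty string instead of dereferencing anything.
    ubyte[] empty;
    assert(textBeforeNull(empty) == "");
}
```

Because the search is limited to `buf.length`, an empty or unterminated buffer is handled without reading out of bounds, which is the failure mode the pointer-based trick has to guard against by hand.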


October 17, 2014
On Fri, 17 Oct 2014 18:30:43 +0300
ketmar via Digitalmars-d-learn <digitalmars-d-learn@puremagic.com>
wrote:

> > Shouldn't the compiler give a warning about casting a buffer to a string without using fromStringz?

nope. such casting is perfectly legal, as D strings can contain embedded '\0's.
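
A small sketch of what "embedded '\0's" means in practice, contrasted with a C-style view of the same memory. It uses `fromStringz` from `std.string` in current Phobos, the function asked about earlier in the thread; the example string is made up for illustration.

```d
import std.string : fromStringz;

void main()
{
    // Four chars, one of which is '\0'; the length is stored, not scanned for.
    string s = "AT\0A";
    assert(s.length == 4);
    assert(s[2] == '\0');

    // A C-style view stops at the first zero byte. String literals are
    // zero-terminated in memory, so passing s.ptr is safe in this case.
    assert(fromStringz(s.ptr) == "AT");
}
```

Note that `fromStringz` takes a pointer and trusts the terminator, so it carries the same caveat as the `to!string` trick when used on an arbitrary buffer.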


October 17, 2014
On Friday, 17 October 2014 at 15:30:52 UTC, ketmar via Digitalmars-d-learn wrote:
> On Fri, 17 Oct 2014 15:24:21 +0000
> Lucas Burson via Digitalmars-d-learn
> <digitalmars-d-learn@puremagic.com> wrote:
>
>> So given the below buffer would I use fromStringz (is this in the stdlib?) to cast it from a null-terminated buffer to a good string? Shouldn't the compiler give a warning about casting a buffer to a string without using fromStringz?
> if you are really-really sure that your buffer is null-terminated, you
> can use this trick:
>
>   import std.conv;
>   string s = to!string(cast(char*)buff.ptr);
>
> please note, that this is NOT SAFE. you'd better doublecheck that your
> buffer is not empty and is null-terminated.

The buffer is populated from a scsi ioctl so it "should" be only ascii and null-terminated but it's a good idea to harden the code a bit.
Thank you for your help!
October 17, 2014
On Fri, 17 Oct 2014 16:08:04 +0000
Lucas Burson via Digitalmars-d-learn
<digitalmars-d-learn@puremagic.com> wrote:

> The buffer is populated from a scsi ioctl so it "should" be only
> ascii and null-terminated but it's a good idea to harden the code
> a bit.
> Thank you for your help!

i developed a habit of making such buffers one byte bigger than necessary and just setting the last byte to 0 before converting. this way it's guaranteed to be 0-terminated.


October 18, 2014
On Friday, 17 October 2014 at 17:40:09 UTC, ketmar via Digitalmars-d-learn wrote:

> i developed a habit of making such buffers one byte bigger than
> necessary and just setting the last byte to 0 before converting. this
> way it's guaranteed to be 0-terminated.

Perfect, great idea. Below is my utility method to pull strings out of a buffer.


import std.conv   : to;
import std.string : strip;

/**
 * Get a string from buffer where the string spans [offset_start, offset_end).
 * Params:
 *    buffer = Buffer with an ASCII string to obtain.
 *    offset_start = Beginning byte offset within the buffer where the string starts.
 *    offset_end = Ending byte offset which is not included in the string.
 * Returns: The text up to the first null byte (if any), with surrounding whitespace stripped.
 */
string bufferGetString(ubyte[] buffer, size_t offset_start, size_t offset_end)
in
{
   assert(buffer !is null);
   assert(offset_start < offset_end);
   assert(offset_end <= buffer.length);
}
do
{
   size_t bufflen = offset_end - offset_start;

   // add one to the length for null-termination
   ubyte[] temp = new ubyte[bufflen+1];
   temp[0..bufflen] = buffer[offset_start..offset_end];
   temp[bufflen] = '\0';

   // to!string on a char* copies everything up to the first null byte
   return strip(to!string(cast(const char*) temp.ptr));
}

unittest
{
   ubyte[] no_null = [' ', 'A', 'B', 'C', ' '];
   assert("ABC" == bufferGetString(no_null, 0, no_null.length));
   assert("ABC" == bufferGetString(no_null, 1, no_null.length-1));
   assert("A" == bufferGetString(no_null, 1, 2));
}