Thread overview
ubyte[] to string with @nogc - different behaviors
Aug 05
Luhrel
Aug 05
Luhrel
Aug 05
Luhrel
Aug 05
Luhrel
August 05
Hi,

I'm currently upgrading Druntime DWARF. To do so, I have to convert some bytes to a string. However, I got different behaviors between the following codes.

Code made for the test, which works:
---
import core.stdc.stdlib;
import std.stdio;

string toString(const(ubyte)* bytes_ptr, size_t len) @nogc
{
    const(ubyte)[] bytes = bytes_ptr[0 .. len];
    char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
    string str = cast(string)str_ptr[0 .. len];
    str = cast(string)bytes[];
    return str;
}

void main()
{
    const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 0x20, 0x67, 0x6f,
        0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21];

    string str = toString(data.ptr, data.length);
    writeln(str);  // Works
}
---

Code that I want to implement in rt.backtrace.dwarf:
---
extern(D) string readString(ref const(ubyte)[] buffer) @nogc nothrow
{
    Array!ubyte bytes;

    ubyte b = buffer.read!ubyte();
    while (b)
    {
        bytes.insertBack(b);
        b = buffer.read!ubyte();
    }

    if (bytes.length())
    {
        import core.stdc.stdio : printf;
        import core.stdc.stdlib;

        char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
        string str = cast(string)str_ptr[0 .. bytes.length()];
        str = cast(string)bytes[];

        printf("String: %s\tLength: %lld\n", str.ptr, bytes.length);  // Works

        return str;
    }
    return null;
}

unittest
{
    const(ubyte)[] data = [0x48, 0x61, 0x76, 0x65, 0x20, 0x61, 0x20, 0x67, 0x6f,
        0x6f, 0x64, 0x20, 0x64, 0x61, 0x79, 0x20, 0x21, 0x00];

    import core.stdc.stdio : printf;

    string res = data.readString();
    printf("String: %s\tLength: %lld\n", res.ptr, res.length);  // Outputs `w�h�U

    assert(res == "Have a good day !");  // Fails
    assert(data == []);

    data = [0x00];
    assert(data.readString == null);
}
---

What am I missing ?
August 05
On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:
> string toString(const(ubyte)* bytes_ptr, size_t len) @nogc
> {
>     const(ubyte)[] bytes = bytes_ptr[0 .. len];
>     char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
>     string str = cast(string)str_ptr[0 .. len];
>     str = cast(string)bytes[];

You just leaked str_ptr there. It gets overwritten by bytes' ptr.
I imagine what you meant to do was

str_ptr[0 .. len] = bytes_ptr[0 .. len];
string str = cast(string)str_ptr[0 .. len];
return str;

or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...)

You mnight be best off just doing plain

return cast(string) bytes_ptr[0 .. len].idup;

as the whole function.




>         char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
>         string str = cast(string)str_ptr[0 .. bytes.length()];
>         str = cast(string)bytes[];

but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.


August 05
On 8/5/20 2:17 PM, Luhrel wrote:
> extern(D) string readString(ref const(ubyte)[] buffer) @nogc nothrow
> {
>      Array!ubyte bytes;

Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.

> 
>      ubyte b = buffer.read!ubyte();
>      while (b)
>      {
>          bytes.insertBack(b);
>          b = buffer.read!ubyte();
>      }
> 
>      if (bytes.length())
>      {
>          import core.stdc.stdio : printf;
>          import core.stdc.stdlib;
> 
>          char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
>          string str = cast(string)str_ptr[0 .. bytes.length()];
>          str = cast(string)bytes[];

Adam is right, you are not copying data here, just reassigning the reference.

Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.

What you likely want is:
str[] = bytes.data[];

This will copy all the data from the Array to the string.

However, I don't really understand the point of this function. What does buffer.read!ubyte do?

If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.

> 
>          printf("String: %s\tLength: %lld\n", str.ptr, bytes.length);  // Works
> 
>          return str;
>      }
>      return null;
> }

At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here.

-Steve
August 05
On Wednesday, 5 August 2020 at 18:25:37 UTC, Adam D. Ruppe wrote:
> On Wednesday, 5 August 2020 at 18:17:15 UTC, Luhrel wrote:
>> string toString(const(ubyte)* bytes_ptr, size_t len) @nogc
>> {
>>     const(ubyte)[] bytes = bytes_ptr[0 .. len];
>>     char* str_ptr = cast(char*)malloc(len * ubyte.sizeof);
>>     string str = cast(string)str_ptr[0 .. len];
>>     str = cast(string)bytes[];
>
> You just leaked str_ptr there. It gets overwritten by bytes' ptr.
> I imagine what you meant to do was
>
> str_ptr[0 .. len] = bytes_ptr[0 .. len];
> string str = cast(string)str_ptr[0 .. len];
> return str;
>
> or something like that instead to copy the data over. (Though note this is still liable to leak since the user has no indication they need to free it...)

Thanks, that's what I was looking for.

> You mnight be best off just doing plain
>
> return cast(string) bytes_ptr[0 .. len].idup;
>
> as the whole function.
>

Unfortunately, this uses the GC.

>
>
>>         char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
>>         string str = cast(string)str_ptr[0 .. bytes.length()];
>>         str = cast(string)bytes[];
>
> but since you again leak the new memory and just refer back to the old, it is liable to be overwritten once the function exists.

Ooooooh I see now.
The different behavior is due to `read!ubyte` which removes the byte from the buffer, that's why I can't simply write `return cast(string)bytes[];`.

August 05
On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer wrote:
> On 8/5/20 2:17 PM, Luhrel wrote:
>> extern(D) string readString(ref const(ubyte)[] buffer) @nogc nothrow
>> {
>>      Array!ubyte bytes;
>
> Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.
>

Oh, okay I see now.

>> 
>>      ubyte b = buffer.read!ubyte();
>>      while (b)
>>      {
>>          bytes.insertBack(b);
>>          b = buffer.read!ubyte();
>>      }
>> 
>>      if (bytes.length())
>>      {
>>          import core.stdc.stdio : printf;
>>          import core.stdc.stdlib;
>> 
>>          char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
>>          string str = cast(string)str_ptr[0 .. bytes.length()];
>>          str = cast(string)bytes[];
>
> Adam is right, you are not copying data here, just reassigning the reference.
>
> Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.

It's a shortcut for bytes[0 .. bytes.length]

>
> What you likely want is:
> str[] = bytes.data[];

Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.

>
> This will copy all the data from the Array to the string.
>
> However, I don't really understand the point of this function. What does buffer.read!ubyte do?
>

It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.

> If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.
>

I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is @nogc, the source of all my problems :D.

DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string.
From the spec: "A string is a sequence of contiguous non-null bytes followed by one null
byte." - Commonly called a C string.

So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is @nogc, it's kinda hard for me (I don't come from C/C++).


>> 
>>          printf("String: %s\tLength: %lld\n", str.ptr, bytes.length);  // Works
>> 
>>          return str;
>>      }
>>      return null;
>> }
>
> At this point the data used by bytes (and therefore pointed at by str) are freed. So you are returning a dangling pointer here.
>
> -Steve
August 05
On 8/5/20 3:43 PM, Luhrel wrote:
> On Wednesday, 5 August 2020 at 18:46:19 UTC, Steven Schveighoffer wrote:
>> On 8/5/20 2:17 PM, Luhrel wrote:
>>> extern(D) string readString(ref const(ubyte)[] buffer) @nogc nothrow
>>> {
>>>      Array!ubyte bytes;
>>
>> Array is a reference counted type -- at the end of this scope, if there are no more references to it, it will *free the memory* being used.
>>
> 
> Oh, okay I see now.
> 
>>>
>>>      ubyte b = buffer.read!ubyte();
>>>      while (b)
>>>      {
>>>          bytes.insertBack(b);
>>>          b = buffer.read!ubyte();
>>>      }
>>>
>>>      if (bytes.length())
>>>      {
>>>          import core.stdc.stdio : printf;
>>>          import core.stdc.stdlib;
>>>
>>>          char* str_ptr = cast(char*)malloc(bytes.length() * ubyte.sizeof);
>>>          string str = cast(string)str_ptr[0 .. bytes.length()];
>>>          str = cast(string)bytes[];
>>
>> Adam is right, you are not copying data here, just reassigning the reference.
>>
>> Not only that, but the expression bytes[] is not just a simple array, it's a range. I don't know what this does, but I wouldn't trust it.
> 
> It's a shortcut for bytes[0 .. bytes.length]

Oh crap. I was thinking this was std.container.array.Array. But it's rt.util.container.array.Array.

This is not reference counted, but does destroy its used memory on destruction.

So it's still the same problem. But the cast is not bad from what I can tell -- bytes[] returns a ubyte[].

If this were std.container.array.Array, it would not have compiled.

>>
>> What you likely want is:
>> str[] = bytes.data[];
> 
> Nope. I use the Array struct from rt.utils.container.array, which doesn't have a `data` property.

Yep, my mistake. Wrong Array type.

But still, you are mallocing a block and throwing it away.

And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data.

I think there is likely a better way to do what you are trying to do.

> 
>>
>> This will copy all the data from the Array to the string.
>>
>> However, I don't really understand the point of this function. What does buffer.read!ubyte do?
>>
> 
> It returns the first ubyte in the `buffer`. Once read, the ubyte is deleted from `buffer`.

OK, yeah I see the implementation in that file.

> 
>> If you have a more descriptive statement on what you want to accomplish, I'm sure it can be done in a simpler way than what you are trying to do here.
>>
> 
> I'm improving rt.backtrace.dwarf, specifically the readLineNumberProgram function, which is @nogc, the source of all my problems :D.
> 
> DWARF v5 have a special DW_LNCT_path tag, which can be in the form DW_FORM_string.
>  From the spec: "A string is a sequence of contiguous non-null bytes followed by one null
> byte." - Commonly called a C string.
> 
> So, I need to read all non-null bytes in the buffer, until I found one. But, as readLineNumberProgram is @nogc, it's kinda hard for me (I don't come from C/C++).

What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements.

I would think:

return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))];

or something like that.

-Steve
August 05
On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:
>
> ...
>
> But still, you are mallocing a block and throwing it away.
>
> And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data.
>
> I think there is likely a better way to do what you are trying to do.
>

Maybe with a ref parameter, but I think this will only move the problem outside the function.

>
>...
>
> What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements.
>
> I would think:
>
> return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))];
>
> or something like that.
>
> -Steve

Will try that.
August 10
On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:
> On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:
>>
>> ...
>>
>> But still, you are mallocing a block and throwing it away.
>>
>> And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data.
>>
>> I think there is likely a better way to do what you are trying to do.
>>
>
> Maybe with a ref parameter, but I think this will only move the problem outside the function.
>
>>
>>...
>>
>> What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements.
>>
>> I would think:
>>
>> return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))];
>>
>> or something like that.
>>
>> -Steve
>
> Will try that.


What am I missing here ??
Prior to reading this post, I would have tried

 ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ;    writeln ("ss ", ss);
.. which outputs (as expected) ss  [35, 36, 37, 38]

And to convert ss to string, I would have tried :
    string sss = cast(string) ss;           writeln ("sss ", sss);
.. which outputs (as expected) sss  #$%&


So, unless I'm missing something, isn't converting ubyte[] to string super simple ?

August 10
On 8/9/20 11:56 PM, Andy Balba wrote:
> On Wednesday, 5 August 2020 at 21:15:10 UTC, Luhrel wrote:
>> On Wednesday, 5 August 2020 at 20:28:22 UTC, Steven Schveighoffer wrote:
>>>
>>> ...
>>>
>>> But still, you are mallocing a block and throwing it away.
>>>
>>> And I realize, this wouldn't have worked anyway, as str is a string, which means you can't overwrite the data.
>>>
>>> I think there is likely a better way to do what you are trying to do.
>>>
>>
>> Maybe with a ref parameter, but I think this will only move the problem outside the function.
>>
>>>
>>> ...
>>>
>>> What is the source of the incoming data? Will it be around by the time you need to use this string? Can you just slice it without copying? I'm not familiar with this code, so I don't know the requirements.
>>>
>>> I would think:
>>>
>>> return cast(const(char)[])buffer[0 .. strnlen(buffer.ptr, buffer.length))];
>>>
>>> or something like that.
>>>
>>
>> Will try that.
> 
> 
> What am I missing here ??
> Prior to reading this post, I would have tried
> 
>   ubyte[4] ss= [0x23, 0x24, 0x25, 0x26] ;    writeln ("ss ", ss);
> ... which outputs (as expected) ss  [35, 36, 37, 38]
> 
> And to convert ss to string, I would have tried :
>      string sss = cast(string) ss;           writeln ("sss ", sss);
> ... which outputs (as expected) sss  #$%&
> 
> 
> So, unless I'm missing something, isn't converting ubyte[] to string super simple ?

There were other problems in this code example, and once you fix the underlying problems, it's no longer a conversion of ubyte[] to string.

But yes, conversion between arrays is pretty easy in D.

-Steve