A use case for fromStringz

Mar 31, 2011

Andrej Mitrovic

There are situations where you have to call a C dispatch function, and pass it a void* and a selector. The selector lets you choose what the C function does, for example an enum constant selector `kGetProductName` could ask the C function to fill a null-terminated string at the location of the void* you've passed in.

One way of doing this is to pass the .ptr field of a static or dynamic char array to the C function, letting it fill the array with a null-terminated string.

But here's the problem: If you try to print out that array in D code with e.g. writefln, it will print out the _entire length_ of the array.

This is a problem because the array could quite likely contain garbage values after the null terminator. In fact, I just ran into that case when interfacing with C.

to!string can convert a null-terminated C string to a D string, with the length matching the location of the null-terminator. But for char arrays, it won't do any checks for null terminators. It only does this if you explicitly pass it a char*.
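A minimal sketch of the difference described above, assuming a current DMD/Phobos toolchain. Converting the whole array keeps everything after the terminator, while converting its `.ptr` stops at the first `'\0'`. The buffer contents are made up for illustration.

```d
import std.conv : to;

void main()
{
    char[8] buf = 'x';          // stand-in for leftover garbage
    buf[0 .. 4] = "abc\0";      // what the C function might write

    string whole = to!string(buf[]);   // all 8 chars, '\0' and garbage included
    string cstr  = to!string(buf.ptr); // treated as a C string: stops at '\0'

    assert(whole.length == 8);
    assert(cstr == "abc");
}
```

Note that the `.ptr` version has no idea how long the buffer is, so it keeps reading if the C side never wrote a terminator.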

So I've come up with a very simple solution:

module fromStringz2;

import std.stdio;
import std.conv;
import std.traits;
import std.string;

enum
{
    kGetProductName = 1
}

// imagine this function is defined in a C DLL
extern(C) void cDispatch(void* payload, int selector)
{
    if (selector == kGetProductName)
    {
        char* val = cast(char*)payload;
        val[0] = 'a';
        val[1] = 'b';
        val[2] = 'c';
        val[3] = '\0';
    }
}

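// Convert a C-filled char array (or a char*) to a D string that
// ends at the first null terminator.
string fromStringz(T)(T value)
{
    static if (isArray!T)
    {
        // treat the array's contents as a C string; this still reads
        // past the end of the array if it contains no '\0'
        return to!string(value.ptr);
    }
    else
    {
        return to!string(value);
    }
}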

string getNameOld()
{
    static char[256] name;
    cDispatch(name.ptr, kGetProductName);
    return to!string(name);
}

string getNameNew()
{
    static char[256] name;
    cDispatch(name.ptr, kGetProductName);
    return fromStringz(name);
}

void main()
{
    // values after [3] could quite likely be garbage
    assert(getNameOld().length == 256);
    assert(getNameNew().length == 3);
}


I admit I didn't take Unicode into account, so it's far from perfect or safe.

In any case, I think it's useful to have such a function, since you generally do not want the part of a C string after the null terminator.
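Later Phobos releases added `std.string.fromStringz` for the pointer case. A short sketch, assuming a Phobos version that has it: the result is a slice of the original buffer rather than a copy, so it should be `.idup`ed before a reused `static` buffer is overwritten, and like `to!string(char*)` it keeps reading past the buffer if no terminator was written.

```d
import std.string : fromStringz;

void main()
{
    char[16] buf = 'x';
    buf[0 .. 4] = "abc\0";

    // a slice of buf (no copy), cut at the first '\0'
    char[] view = fromStringz(buf.ptr);
    assert(view == "abc");
    assert(view.ptr == buf.ptr);

    // take a private copy before buf is reused
    string name = view.idup;
    assert(name == "abc");
}
```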

Actually, this still suffers from a problem when the returned char* doesn't have a null terminator. It really sucks when C code does that, and I've just experienced it. There is a solution, though:

Since we know the length of the D array passed into `fromStringz`, we can do the job of to!string ourselves and check for a null terminator. If one isn't found, we return a string of length 0. Here's an updated version which doesn't suffer from the missing null terminator problem:

string fromStringz(T)(T value)
{
    static if (isArray!T)
    {
        auto slice = value[];       // works for static and dynamic arrays

        if (slice.length == 0)      // also covers a null dynamic array
            return "";

        auto nullPos = slice.indexOf('\0');

        if (nullPos == -1)          // no terminator: assume garbage
            return "";

        return to!string(slice[0 .. nullPos]);
    }
    else
    {
        return to!string(value);
    }
}

On 3/31/11, Jesse Phillips <jessekphillips+D@gmail.com> wrote:
> Why not:
>
> string getNameOld()
> {
>     static char[256] name;
>     cDispatch(name.ptr, kGetProductName);
>     return to!string(name.ptr);
> }

Nice catch! But see my second reply. If a null terminator is missing and we know we're operating on a D array (which has a length), then it could be best to check for a null terminator. If there isn't one, it is highly likely that the array contains garbage.

Andrej Mitrovic Wrote:
> Actually, this still suffers from the problem when the returned char* doesn't have a null terminator. It really sucks when C code does that, and I've just experienced that. There is a solution though:
>
> Since we can detect the length of the D array passed into `fromStringz`, we can do the job of to!string ourselves and check for a null terminator. If one isn't found, we return a string of length 0. Here's an updated version which doesn't suffer from the missing null terminator problem:

I do not know the proper action if the string you receive is garbage. Shouldn't it throw an exception, since it did not receive a string? To me this seems like a validation issue. If the functions you are calling are expected to return improper data, _you_ must validate what you receive, and that includes running it through UTF validation.
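Jesse's suggestion (fail loudly instead of returning an empty string, then validate the UTF) could look roughly like this. `checkedFromBuffer` is a hypothetical helper name, not something from the thread or from Phobos. The terminator scan walks code units by hand so that invalid bytes are never decoded before the check.

```d
import std.exception : enforce, assertThrown;
import std.utf : validate, UTFException;

/// Hypothetical helper: cut a C-filled buffer at its first '\0', but throw
/// instead of guessing when there is no terminator or the text is not UTF-8.
string checkedFromBuffer(const(char)[] buf)
{
    // scan code units by hand so garbage bytes are never decoded
    size_t len = 0;
    while (len < buf.length && buf[len] != '\0')
        ++len;
    enforce(len < buf.length, "C string has no null terminator");

    string s = buf[0 .. len].idup;
    validate(s); // throws UTFException if s is not valid UTF-8
    return s;
}

void main()
{
    char[6] good = "abc\0zz";
    assert(checkedFromBuffer(good[]) == "abc");

    char[3] noTerm = "abc";
    assertThrown(checkedFromBuffer(noTerm[]));

    char[3] badUtf = [cast(char) 0xFF, '\0', 'a'];
    assertThrown!UTFException(checkedFromBuffer(badUtf[]));
}
```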

On 3/31/11 11:18 PM, Andrej Mitrovic wrote:
> Actually, this still suffers from the problem when the returned char*
> doesn't have a null terminator. It really sucks when C code does that,
> and I've just experienced that. There is a solution though:

In those cases, doesn't the function return the length of the filled data or something like that?

> Since we can detect the length of the D array passed into
> `fromStringz`, we can do the job of to!string ourselves and check for
> a null terminator. If one isn't found, we return a string of length 0.
> Here's an updated version which doesn't suffer from the missing null
> terminator problem:
>
> string fromStringz(T)(T value)
> {
>     static if (isArray!T)
>     {
>         if (value is null || value.length == 0)
>         {
>             return "";
>         }
>
>         auto nullPos = value.indexOf("\0");
>
>         if (nullPos == -1)
>             return "";
>
>         return to!string(value[0..nullPos]);
>     }
>     else
>     {
>         return to!string(value);
>     }
> }

--
/Jacob Carlborg
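If the C side did report how many characters it wrote, as Jacob suggests, the D side could slice by that count instead of searching for a terminator. A sketch under that assumption: `cGetProductName` is a hypothetical stand-in defined in D (just like `cDispatch` in the first post), not a real API.

```d
import std.algorithm.comparison : min;

// Hypothetical C-style API: writes at most `cap` chars into `buf` (no
// terminator) and returns how many it wrote. Defined in D as a stand-in.
extern(C) size_t cGetProductName(char* buf, size_t cap)
{
    immutable name = "abc";
    immutable n = min(name.length, cap);
    buf[0 .. n] = name[0 .. n];
    return n;
}

string getName()
{
    static char[256] buf;
    immutable n = cGetProductName(buf.ptr, buf.length);
    // trust the reported count rather than searching for '\0',
    // clamped in case the C side reports more than the buffer holds
    return buf[0 .. min(n, buf.length)].idup;
}

void main()
{
    assert(getName() == "abc");
}
```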

On 4/1/11, Jacob Carlborg <doob@me.com> wrote:
> In those cases, doesn't the function return the length of the filled data or something like that?

I know what you mean. I would expect a C function to do just that, but in this case it does not. It's lame, but I have to deal with it.

Hmm.. now I need a function that converts a wchar* to a wchar[] or wstring. There doesn't seem to be anything in Phobos for this type of conversion. Or maybe I haven't looked hard enough?

I don't know whether this is safe, since I'm not sure how the null terminator is represented in UTF-16, but it does seem to work OK from a few test cases:

wstring fromWStringz(wchar* value)
{
    if (value is null)
        return "";

    auto oldPos = value;

    size_t nullPos;
    while (*value++ != '\0')
    {
        nullPos++;
    }

    if (nullPos == 0)
        return "";

    return to!wstring(oldPos[0 .. nullPos]);
}

I thought we would pay more attention to interfacing with C code. Since D is supposed to work side by side with C, we should have more functions that convert common data types between the two languages.
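A generic variant might cover `char*`, `wchar*` and `dchar*` with one template. On the UTF-16 question: a 0x0000 code unit only ever encodes U+0000 (surrogates live in the 0xD800 to 0xDFFF range), so stopping at the first zero `wchar` is safe. `fromCString` is a hypothetical name; this is a sketch, not a Phobos function.

```d
import std.traits : isSomeChar;

/// Hypothetical generic helper: copy a zero-terminated C string of any
/// character width (char*, wchar*, dchar*) into an immutable D string.
immutable(C)[] fromCString(C)(const(C)* p) if (isSomeChar!C)
{
    if (p is null)
        return null;

    size_t len = 0;
    while (p[len] != 0)   // a zero code unit only ever means U+0000
        ++len;

    return p[0 .. len].idup;
}

void main()
{
    wstring w = "h\u00e9llo\0"w;
    assert(fromCString(w.ptr) == "h\u00e9llo"w);
    assert(fromCString("abc".ptr) == "abc");
    assert(fromCString!wchar(null).length == 0);
}
```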

Microsoft has some of the most ridiculous functions. This one (GetEnvironmentStrings) returns a pointer to a block of null-terminated strings, with no information on how many strings were returned. Each string ends with a null terminator, standard stuff. But only when you find two null terminators in succession do you know that you've reached the end of the entire block.

So from some example code I've seen, people usually create a count variable and increment it for every null terminator in the block until they find a double null terminator. And then they have to loop all over again when constructing a list of strings. Talk about inefficient designs.. There's also a wchar* edition of this function; I don't even want to touch it.

Here's what the example code looks like:

char *l_EnvStr = GetEnvironmentStrings();
char *l_str = l_EnvStr;
int count = 0;

/* first pass: count strings until the empty one that ends the block */
while (*l_str != 0)
{
    while (*l_str != 0)
        l_str++;
    l_str++;        /* skip the terminator */
    count++;
}

/* second pass: walk the block again to print each string */
l_str = l_EnvStr;
for (int i = 0; i < count; i++)
{
    printf("%s\n", l_str);
    while (*l_str != '\0')
        l_str++;
    l_str++;
}

/* free the pointer GetEnvironmentStrings returned, not an advanced copy */
FreeEnvironmentStrings(l_EnvStr);

I wonder.. in all these years.. have they ever thought about using a convention in C where the length is embedded as a 32/64-bit value at the location the pointer points to, followed by the array contents?
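For the GetEnvironmentStrings block itself, the double-null layout can at least be walked in a single pass, collecting each string as it is found. A sketch under these assumptions: `splitDoubleNull` is a hypothetical helper, and a hard-coded block stands in for the real (ANSI) `GetEnvironmentStrings` result so the example runs anywhere; on Windows the real block would still need `FreeEnvironmentStrings`.

```d
/// Hypothetical helper: split a double-null-terminated block such as
/// "A=1\0B=2\0\0" into separate D strings in one pass.
string[] splitDoubleNull(const(char)* block)
{
    string[] result;
    if (block is null)
        return result;

    const(char)* p = block;
    while (*p != '\0')            // an empty string marks the end of the block
    {
        size_t len = 0;
        while (p[len] != '\0')
            ++len;
        result ~= p[0 .. len].idup;
        p += len + 1;             // skip this string and its terminator
    }
    return result;
}

void main()
{
    // stand-in for the block GetEnvironmentStrings would return
    immutable block = "A=1\0B=2\0\0";
    assert(splitDoubleNull(block.ptr) == ["A=1", "B=2"]);
}
```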
I mean something like the following (I'm pseudocoding here, this is not valid C code, and it's 7 AM.):

// allocate memory for the length field + character count
char* mystring = malloc(sizeof(size_t) + sizeof(char)*length);
*(cast(size_t*)mystring) = length;  // embed the length

// call a function expecting a char*
printString(mystring);

void printString(char* string)
{
    size_t length = *(cast(size_t*)string);
    string += sizeof(size_t);  // skip the count to reach the first char

    // now print all chars one by one
    for (size_t i = 0; i < length; i++)
    {
        printChar(*string++);
    }
}

Well, they can always use an extra parameter in a function that has the length, but it seems many people are too lazy to even do that. I guess C programmers just *love* their nulls. :p
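The length-prefix idea can also be written as working D on top of the C allocator. A sketch with hypothetical helper names: unlike the pseudocode, the returned pointer addresses the first character and the `size_t` length sits just before it, so code that only wants the characters can use the pointer as-is.

```d
import core.stdc.stdlib : malloc, free;
import core.stdc.string : memcpy;

/// Hypothetical helper: store a size_t length immediately before the
/// characters and hand out a pointer to the first character.
char* makePrefixed(const(char)[] s)
{
    auto mem = cast(ubyte*) malloc(size_t.sizeof + s.length);
    if (mem is null)
        return null;
    *cast(size_t*) mem = s.length;                     // embed the length
    if (s.length)
        memcpy(mem + size_t.sizeof, s.ptr, s.length);  // copy the characters
    return cast(char*)(mem + size_t.sizeof);
}

/// Read the length back from just before the characters.
const(char)[] viewPrefixed(const(char)* chars)
{
    immutable len = *cast(const(size_t)*)(chars - size_t.sizeof);
    return chars[0 .. len];
}

/// Free the original allocation, which starts at the length field.
void freePrefixed(char* chars)
{
    if (chars !is null)
        free(chars - size_t.sizeof);
}

void main()
{
    char* p = makePrefixed("hello");
    assert(p !is null);
    assert(viewPrefixed(p) == "hello");
    freePrefixed(p);
}
```

COM's `BSTR` uses a similar layout (a 4-byte byte count stored just before the pointed-to characters), while D's own arrays avoid the issue by always carrying the pointer and length together.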
