January 19, 2005
"Walter" <newshound@digitalmars.com> wrote in message news:csjffu$1qtp$1@digitaldaemon.com...
>
> "Ben Hinkle" <Ben_member@pathlink.com> wrote in message news:csj4hq$1cvi$1@digitaldaemon.com...
> > The reason why the length changed is that toStringz looks at one past the length of the string to see if it is 0 and does nothing to the string if it is. But the sample code then changes the byte past the string by touching a completely different variable and so the toStringz result is "corrupted". I have toStringz calls sprinkled through my code when I call C functions and now I'm starting to get nervous about the lifespans of those strings and how to figure out if they are valid or not. Thoughts? Walter, is there a guideline I should follow? The most extreme one that comes to mind is "only call toStringz for strings that get immediately copied".
>
> It's "COW" (Copy On Write) to the rescue. The idea is only modify a string that you know is unique. If you don't know it is unique, make a copy of it before modifying it. After the toStringz(), you're modifying the argument to toStringz() but there's another reference to that string that expects it to not change.

In case you need another example, I can imagine that just calling a function could corrupt a toStringz result. Suppose the char[] is stored on the stack, its last element sits at the very top of the stack, the byte just past it happens to be zero, and the stack grows up in memory. Then calling toStringz (suppose the call is inlined, for simplicity) wouldn't make a copy. But calling another function after that would push a new stack frame, which could write a non-zero byte immediately after the array and corrupt the result of toStringz. I couldn't get this to happen on any machine I have around here, since it depends on the stack architecture and on how function calls work, but the problem is still there for some architectures.

So I have a suggestion. Have toStringz always copy if the array is on the stack. Have it never copy if the array is in the data segment (so literals behave as they do today) and have it check the GC capacity to ask the GC for control over the byte following the array (though the length of the array would be unchanged). To implement this toStringz would probably have to be moved out of std.string and into internal. If it copied everything except literals then I can see keeping it in std.string. Anyhow, I agree with Anders that something should be done.

-Ben


January 19, 2005
Ben Hinkle wrote:

> So I have a suggestion. Have toStringz always copy if the array is on the
> stack. Have it never copy if the array is in the data segment (so literals
> behave as they do today) and have it check the GC capacity to ask the GC for
> control over the byte following the array (though the length of the array
> would be unchanged). To implement this toStringz would probably have to be
> moved out of std.string and into internal. If it copied everything except
> literals then I can see keeping it in std.string. Anyhow, I agree with Anders
> that something should be done.
> 

Would this implementation work?

----------------------------------------------------------------
char* toStringzz(char[] str) {
  str.length++;
  str[length-1] = cast(char)0x00;
  return cast(char*)&str;
}
----------------------------------------------------------------

That is to say, is the array resizing implementation sufficient on its own to determine whether str is dynamic or static and, if it is dynamic, to deal wisely with cases where incrementing the length might be sufficient? Can you break toStringzz in any of the cases that break toStringz?
January 19, 2005
"parabolis" <parabolis@softhome.net> wrote in message news:csmbbh$444$1@digitaldaemon.com...
> Ben Hinkle wrote:
>
> > So I have a suggestion. Have toStringz always copy if the array is on the stack. Have it never copy if the array is in the data segment (so literals behave as they do today) and have it check the GC capacity to ask the GC for control over the byte following the array (though the length of the array would be unchanged). To implement this toStringz would probably have to be moved out of std.string and into internal. If it copied everything except literals then I can see keeping it in std.string. Anyhow, I agree with Anders that something should be done.
> >
>
> Would this implementation work?
>
> ----------------------------------------------------------------
> char* toStringzz(char[] str) {
>    str.length++;
>    str[length-1] = cast(char)0x00;
>    return cast(char*)&str;
> }
> ----------------------------------------------------------------
>
> That is to say, is the array resizing implementation sufficient on its own to determine whether str is dynamic or static and, if it is dynamic, to deal wisely with cases where incrementing the length might be sufficient? Can you break toStringzz in any of the cases that break toStringz?

Nice idea. I think it's on the right track. I've cleaned it up a bit:
char* toStringzz(char[] str) {
    str.length = str.length+1;
    str[length-1] = 0;
    return str.ptr;
}

Also it copies string literals. If there is an easy way to check if something is a string literal we can add that to your code and have a good solution, I think.


January 19, 2005
Ben Hinkle wrote:
> Nice idea. I think it's on the right track. I've cleaned it up a bit:
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     str[length-1] = 0;
>     return str.ptr;
> }
> 
> Also it copies string literals. If there is an easy way to check if something is a string literal we can add that to your code and have a good solution, I think.

Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so that you can leave out the str[length-1] = 0; part?

Thus better:

char* toStringzz(char[] str) {
    str.length = str.length+1;
    return str.ptr;
}

But this actually alters the parameter (is this intended?)

My version would be:

char* toStringz( in char[] str )
{
  char[] new_str;
  new_str.length = str.length + 1;
  new_str[0 .. length-2] = str[0 .. length-1];
  return &new_str[0];
}

Creating a copy of the parameter, thus not changing it as you would think for in-parameters. I checked and it works for string literals, too.
January 19, 2005
"Lukas Pinkowski" <Lukas.Pinkowski@web.de> wrote in message news:csmfl4$a4c$1@digitaldaemon.com...
> Ben Hinkle wrote:
> > Nice idea. I think it's on the right track. I've cleaned it up a bit:
> > char* toStringzz(char[] str) {
> >     str.length = str.length+1;
> >     str[length-1] = 0;
> >     return str.ptr;
> > }
> >
> > Also it copies string literals. If there is an easy way to check if something is a string literal we can add that to your code and have a good solution, I think.
>
> Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so that you can leave out the str[length-1] = 0; part?

the initializer for char is 0xFF.

> Thus better:
>
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     return str.ptr;
> }
>
> But this actually alters the parameter (is this intended?)

An array is a pointer to data and a length. Those are passed by value, so changing the length inside the function does not change the original string passed to it.
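
To make that concrete, here is a small sketch. The name grow is made up for illustration; it resizes its parameter the same way toStringzz does, and the caller's length stays at 3.

```d
// Sketch: the callee receives its own copy of the (pointer, length) pair.
void grow(char[] s)
{
    s.length = s.length + 1;   // changes only the callee's length
    s[s.length - 1] = 0;       // writes the terminator into the grown slice
}

void main()
{
    char[] str = "abc".dup;    // a heap copy we are free to modify
    grow(str);
    assert(str.length == 3);   // the caller's length is untouched
    assert(str == "abc");      // and so are its three characters
}
```

The characters themselves are still shared, so a write to s[0] made inside grow before any reallocation would be visible to the caller.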

> My version would be:
>
> char* toStringz( in char[] str )
> {
>   char[] new_str;
>   new_str.length = str.length + 1;
>   new_str[0 .. length-2] = str[0 .. length-1];
>   return &new_str[0];
> }
>
> Creating a copy of the parameter, thus not changing it as you would think for in-parameters. I checked and it works for string literals, too.

Watch out for the case when new_str.ptr is str.ptr, since I expect the array copy will raise an error if you try to copy overlapping arrays.
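
For comparison, a minimal sketch of an always-copy variant. The name toStringzCopy is hypothetical (it is not a library function), and the const(char)[] parameter assumes a current D compiler; with the compiler of this thread it would be a plain char[]. The copy covers all str.length characters and the terminator is written explicitly, since the initializer for char is 0xFF. Because the destination is freshly allocated, it cannot overlap the source.

```d
// Hypothetical always-copy variant; not part of any library.
char* toStringzCopy(const(char)[] str)
{
    char[] copy = new char[str.length + 1];
    copy[0 .. str.length] = str[];   // copy every character of str
    copy[str.length] = 0;            // explicit terminator
    return copy.ptr;
}

void main()
{
    char* p = toStringzCopy("hi");
    assert(p[0] == 'h' && p[1] == 'i' && p[2] == 0);
}
```

The cost is one allocation per call, including for string literals, which is exactly the trade-off discussed above.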


January 19, 2005
Ben Hinkle wrote:
> There's something about toStringz that has me uncomfortable. Consider this code:

There is something else that you should be uncomfortable about - the domains of C strings and D strings are not the same. The toStringz function is so named because C strings are 'Z'ero (or null) terminated. That implies they cannot contain a null character yet D strings have no such silly limitations. So the toStringz function should probably look like this:

----------------------------------------------------------------
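char* toStringz(char[] dStr) {
  char[] cStr = new char[dStr.length+1];
  foreach(int i, char dChar; dStr) {
    // copy each char; a zero inside dStr is rejected
    if(!(cStr[i] = dChar)) throw new Exception("Null char");
  }
  cStr[dStr.length] = 0;  // the terminator has to be set explicitly
  return cStr.ptr;
}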
----------------------------------------------------------------

Now seems like a great time for plugging the unless/until feature of Perl as being nice in this context allowing:

  unless(cStr[i] = dChar) throw new Exception("Null char");
January 19, 2005
another version:

char* toStringzz(char[] str) {
    str ~= 0;
    return str.ptr;
}


January 19, 2005
"parabolis" <parabolis@softhome.net> wrote in message news:csmiqa$edp$1@digitaldaemon.com...
> Ben Hinkle wrote:
>> There's something about toStringz that has me uncomfortable. Consider this code:
>
> There is something else that you should be uncomfortable about - the domains of C strings and D strings are not the same. The toStringz function is so named because C strings are 'Z'ero (or null) terminated. That implies they cannot contain a null character yet D strings have no such silly limitations.

I hate to disagree, but that doesn't bother me. I don't see anything wrong with ignoring interior zeros. toStringz just makes sure the string is zero-terminated, not that there aren't any internal zeros.
[snip]
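
A small sketch of what the C side sees in that case, assuming a current D compiler where strlen comes from core.stdc.string: the D array keeps all of its characters, while C simply stops at the first zero.

```d
import core.stdc.string : strlen;

void main()
{
    char[] d = "ab\0cd".dup;      // 5 characters, one of them an interior zero
    d ~= '\0';                    // zero-terminate the array
    assert(d.length == 6);        // D still sees every character
    assert(strlen(d.ptr) == 2);   // C stops at the first zero
}
```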


January 20, 2005
Ben Hinkle wrote:
> "parabolis" <parabolis@softhome.net> wrote in message news:csmiqa$edp$1@digitaldaemon.com...
> 
>>Ben Hinkle wrote:
>>
>>>There's something about toStringz that has me uncomfortable. Consider this code:
>>
>>There is something else that you should be uncomfortable about - the domains of C strings and D strings are not the same. The toStringz function is so named because C strings are 'Z'ero (or null) terminated. That implies they cannot contain a null character yet D strings have no such silly limitations.
> 
> 
> I hate to disagree, but that doesn't bother me. I don't see anything wrong with ignoring interior zeros. toStringz just makes sure the string is zero-terminated, not that there aren't any internal zeros.
> [snip]
> 
> 
----------------------------------------------------------------
char* toStringz(char[] dStr, bit ignoreNullsInString = true)
----------------------------------------------------------------
January 20, 2005
In article <csjffu$1qtp$1@digitaldaemon.com>, Walter says...
>
>
>"Ben Hinkle" <Ben_member@pathlink.com> wrote in message news:csj4hq$1cvi$1@digitaldaemon.com...
>> The reason why the length changed is that toStringz looks at one past the length of the string to see if it is 0 and does nothing to the string if it is. But the sample code then changes the byte past the string by touching a completely different variable and so the toStringz result is "corrupted". I have toStringz calls sprinkled through my code when I call C functions and now I'm starting to get nervous about the lifespans of those strings and how to figure out if they are valid or not. Thoughts? Walter, is there a guideline I should follow? The most extreme one that comes to mind is "only call toStringz for strings that get immediately copied".
>
>It's "COW" (Copy On Write) to the rescue. The idea is only modify a string that you know is unique. If you don't know it is unique, make a copy of it before modifying it. After the toStringz(), you're modifying the argument to toStringz() but there's another reference to that string that expects it to not change.
>

OK, one last try. Walter, I can't tell if you still think this counts as COW, so let me boil it down to a question. Given the code
char[1] str;
char* cstr = toStringz(str);
ubyte x = 1;
what is strlen(cstr)?
I claim the answer is compiler dependent: it depends on whether the compiler put the storage for x immediately after str. Sure, running the code doesn't show a problem in practice, due to word alignment etc., but going by the language definition and the definition of toStringz, the strlen is unknown.
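
Here is a sketch that forces the issue. peekToStringz is a made-up stand-in for the peek-past-the-end behaviour described in this thread, not the Phobos source, and the 5-byte buffer is there only so that the byte after the string is one the program owns. I'm assuming a current D compiler, with strlen taken from core.stdc.string.

```d
import core.stdc.string : strlen;

// Hypothetical stand-in: return the original pointer when the byte
// just past the slice is already 0, otherwise make a terminated copy.
char* peekToStringz(char[] s)
{
    char* past = s.ptr + s.length;   // one past the end of the slice
    if (*past == 0)
        return s.ptr;                // no copy made
    char[] copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return copy.ptr;
}

void main()
{
    // A buffer we fully own, so reading one past the slice stays in bounds.
    char[5] buf = ['a', 'b', 'c', '\0', '\0'];
    char[] str = buf[0 .. 3];

    char* cstr = peekToStringz(str);
    assert(cstr == buf.ptr);         // same memory, nothing was copied
    assert(strlen(cstr) == 3);

    buf[3] = 'X';                    // touch the neighbouring byte, not str
    assert(str == "abc");            // the D string is unchanged
    assert(strlen(cstr) == 4);       // the C string has grown by one
}
```

Nothing here writes to str, yet the C string changes, which is why I don't think COW covers this case.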

-Ben