January 19, 2005
Re: toStringz and predictability
"Walter" <newshound@digitalmars.com> wrote in message
news:csjffu$1qtp$1@digitaldaemon.com...
>
> "Ben Hinkle" <Ben_member@pathlink.com> wrote in message
> news:csj4hq$1cvi$1@digitaldaemon.com...
> > The reason why the length changed is that toStringz looks at one past the
> > length of the string to see if it is 0 and does nothing to the string if it
> > is. But the sample code then changes the byte past the string by touching a
> > completely different variable and so the toStringz result is "corrupted". I
> > have toStringz calls sprinkled through my code when I call C functions and
> > now I'm starting to get nervous about the lifespans of those strings and
> > how to figure out if they are valid or not. Thoughts? Walter, is there a
> > guideline I should follow? The most extreme one that comes to mind is "only
> > call toStringz for strings that get immediately copied".
>
> It's "COW" (Copy On Write) to the rescue. The idea is only modify a string
> that you know is unique. If you don't know it is unique, make a copy of it
> before modifying it. After the toStringz(), you're modifying the argument to
> toStringz() but there's another reference to that string that expects it to
> not change.

In case you need another example, I can imagine just the act of calling a
function could corrupt a toStringz result. Suppose the char[] is stored on
the stack, the last element of the array is at the very top of the stack,
the next byte after the stack is zero, and the stack grows up in memory.
Then calling toStringz (suppose the call was inlined, just for simplicity)
wouldn't make a copy, but calling a function after that would push another
stack frame, which could set a non-zero byte immediately following the
array. That would corrupt the result of toStringz. I couldn't get this to
happen on any machine I have around here, since it depends on the stack
architecture and how function calls work, but the problem is still there
for some architectures.

So I have a suggestion. Have toStringz always copy if the array is on the
stack. Have it never copy if the array is in the data segment (so literals
behave as they do today) and have it check the GC capacity to ask the GC for
control over the byte following the array (though the length of the array
would be unchanged). To implement this toStringz would probably have to be
moved out of std.string and into internal. If it copied everything except
literals then I can see keeping it in std.string. Anyhow, I agree with Anders
that something should be done.

-Ben
January 19, 2005
Re: toStringz and predictability
Ben Hinkle wrote:

> So I have a suggestion. Have toStringz always copy if the array is on the
> stack. Have it never copy if the array is in the data segment (so literals
> behave as they do today) and have it check the GC capacity to ask the GC for
> control over the byte following the array (though the length of the array
> would be unchanged). To implement this toStringz would probably have to be
> moved out of std.string and into internal. If it copied everything except
> literals then I can see keeping it in std.string. Anyhow, I agree with Anders
> that something should be done.
> 

Would this implementation work?

----------------------------------------------------------------
char* toStringzz(char[] str) {
  str.length++;
  str[length-1] = cast(char)0x00;
  return cast(char*)&str;
}
----------------------------------------------------------------

That is to say, is the array resizing implementation sufficient to
determine on its own whether str is dynamic or static and, if it is
dynamic, to deal wisely with cases where incrementing length might be
sufficient? Can you break toStringzz in any of the cases that
toStringz breaks?
January 19, 2005
Re: toStringz and predictability
"parabolis" <parabolis@softhome.net> wrote in message
news:csmbbh$444$1@digitaldaemon.com...
> Ben Hinkle wrote:
>
> > So I have a suggestion. Have toStringz always copy if the array is on the
> > stack. Have it never copy if the array is in the data segment (so literals
> > behave as they do today) and have it check the GC capacity to ask the GC
> > for control over the byte following the array (though the length of the
> > array would be unchanged). To implement this toStringz would probably have
> > to be moved out of std.string and into internal. If it copied everything
> > except literals then I can see keeping it in std.string. Anyhow, I agree
> > with Anders that something should be done.
> >
>
> Would this implementation work?
>
> ----------------------------------------------------------------
> char* toStringzz(char[] str) {
>    str.length++;
>    str[length-1] = cast(char)0x00;
>    return cast(char*)&str;
> }
> ----------------------------------------------------------------
>
> That is to say is the array resizing implementation sufficient to
> determine whether str is dynamic or static on its own and if it is
> dynamic deal wisely with cases where incrementing length might be
> sufficient? Can you break toStringzz in any of the cases that
> toStringz breaks?

Nice idea. I think it's on the right track. I've cleaned it up a bit:
char* toStringzz(char[] str) {
   str.length = str.length+1;
   str[length-1] = 0;
   return str.ptr;
}

Also it copies string literals. If there is an easy way to check if
something is a string literal we can add that to your code and have a good
solution, I think.
January 19, 2005
Re: toStringz and predictability
Ben Hinkle wrote:
> Nice idea. I think it's on the right track. I've cleaned it up a bit:
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     str[length-1] = 0;
>     return str.ptr;
> }
> 
> Also it copies string literals. If there is an easy way to check if
> something is a string literal we can add that to your code and have a good
> solution, I think.

Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so
you can leave out the str[length-1] = 0; part?

Thus better:

char* toStringzz(char[] str) {
   str.length = str.length+1;
   return str.ptr;
}

But this actually alters the parameter (is this intended?)

My version would be:

char* toStringz( in char[] str )
{
 char[] new_str;
 new_str.length = str.length + 1;
 new_str[0 .. str.length] = str[];  // copy every character of str
 new_str[str.length] = 0;           // terminate explicitly; char.init is 0xFF
 return &new_str[0];
}

Creating a copy of the parameter, thus not changing it as you would think
for in-parameters. I checked and it works for string literals, too.
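
For comparison, here is a minimal sketch of the same always-copy idea written
in C rather than D, with the D array modelled as a pointer plus a length. The
name copy_to_stringz is hypothetical, and malloc/free stand in for the D
garbage collector, so treat it as an illustration under those assumptions and
not as what std.string does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Always-copy conversion (hypothetical helper, not part of any library):
 * allocate length + 1 bytes, copy the payload, and write the terminator
 * explicitly instead of relying on whatever byte follows the source.
 * Returns NULL if allocation fails. The caller releases the result
 * with free().
 */
char *copy_to_stringz(const char *ptr, size_t length)
{
    assert(ptr != NULL || length == 0);

    char *out = malloc(length + 1);
    if (out == NULL)
        return NULL;
    if (length > 0)
        memcpy(out, ptr, length);
    out[length] = '\0';
    return out;
}
```

Because the result never aliases the source, later writes next to the source
cannot change what the C side sees. The cost is one allocation per call, even
for literals.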
January 19, 2005
Re: toStringz and predictability
"Lukas Pinkowski" <Lukas.Pinkowski@web.de> wrote in message
news:csmfl4$a4c$1@digitaldaemon.com...
> Ben Hinkle wrote:
> > Nice idea. I think it's on the right track. I've cleaned it up a bit:
> > char* toStringzz(char[] str) {
> >     str.length = str.length+1;
> >     str[length-1] = 0;
> >     return str.ptr;
> > }
> >
> > Also it copies string literals. If there is an easy way to check if
> > something is a string literal we can add that to your code and have a
> > good solution, I think.
>
> Hm, doesn't D initialize uninitialized chars to 0 (here str[length-1]), so
> you can leave out the str[length-1] = 0; part?

the initializer for char is 0xFF.

> Thus better:
>
> char* toStringzz(char[] str) {
>     str.length = str.length+1;
>     return str.ptr;
> }
>
> But this actually alters the parameter (is this intended?)

an array is a pointer to data and a length. Those are passed by value, so
changing the length does not change the original string passed to the
function.

> My version would be:
>
> char* toStringz( in char[] str )
> {
>   char[] new_str;
>   new_str.length = str.length + 1;
>   new_str[0 .. str.length] = str[];  // copy every character of str
>   new_str[str.length] = 0;           // terminate explicitly; char.init is 0xFF
>   return &new_str[0];
> }
>
> Creating a copy of the parameter, thus not changing it as you would think
> for in-parameters. I checked and it works for string literals, too.

watch out for the case when new_str.ptr is str.ptr since I expect the array
copy will error if you try to copy overlapping arrays.
January 19, 2005
Re: toStringz and predictability
Ben Hinkle wrote:
> There's something about toStringz that has me uncomfortable. Consider this code:

There is something else that you should be uncomfortable about - the 
domains of C strings and D strings are not the same. The toStringz 
function is so named because C strings are 'Z'ero (or null) 
terminated. That implies they cannot contain a null character yet D 
strings have no such silly limitations. So the toStringz function 
should probably look like this:

----------------------------------------------------------------
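char* toStringz(char[] dStr) {
  char[] cStr = new char[dStr.length+1];
  foreach(int i, char dChar; dStr) {
    // copy each char and reject interior zeros
    if(!(cStr[i] = dChar)) throw new Exception("Null char");
  }
  cStr[dStr.length] = 0;  // write the terminator explicitly
  return cStr.ptr;        // pointer to the data, not to the array itself
}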
----------------------------------------------------------------

Now seems like a great time for plugging the unless/until feature of 
Perl as being nice in this context allowing:

  unless(cStr[i] = dChar) throw new Exception("Null char");
January 19, 2005
Re: toStringz and predictability
another version:

char* toStringzz(char[] str) {
   str ~= 0;
   return str.ptr;
}
January 19, 2005
Re: toStringz and predictability
"parabolis" <parabolis@softhome.net> wrote in message 
news:csmiqa$edp$1@digitaldaemon.com...
> Ben Hinkle wrote:
>> There's something about toStringz that has me uncomfortable. Consider 
>> this code:
>
> There is something else that you should be uncomfortable about - the 
> domains of C strings and D strings are not the same. The toStringz 
> function is so named because C strings are 'Z'ero (or null) terminated. 
> That implies they cannot contain a null character yet D strings have no 
> such silly limitations.

I hate to disagree but... that doesn't bother me. I don't see anything wrong
with ignoring interior zeros. toStringz just makes sure it is
zero-terminated, not that there aren't any internal zeros.
[snip]
January 20, 2005
Re: toStringz and predictability
Ben Hinkle wrote:
> "parabolis" <parabolis@softhome.net> wrote in message 
> news:csmiqa$edp$1@digitaldaemon.com...
> 
>>Ben Hinkle wrote:
>>
>>>There's something about toStringz that has me uncomfortable. Consider 
>>>this code:
>>
>>There is something else that you should be uncomfortable about - the 
>>domains of C strings and D strings are not the same. The toStringz 
>>function is so named because C strings are 'Z'ero (or null) terminated. 
>>That implies they cannot contain a null character yet D strings have no 
>>such silly limitations.
> 
> 
> I hate to disagree but... that doesn't bother me. I don't see anything wrong
> with ignoring interior zeros. toStringz just makes sure it is
> zero-terminated, not that there aren't any internal zeros.
> [snip]
> 
> 
----------------------------------------------------------------
char* toStringz(char[] dStr, bit ignoreNullsInString = true)
----------------------------------------------------------------
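
A rough sketch of how such a switch might behave, written in C with the D
array modelled as a pointer plus a length. The name to_stringz_checked and
the use of a NULL return in place of a D exception are assumptions made for
illustration; only the flag itself comes from the signature above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copying conversion with a switch for interior zeros (hypothetical).
 * ignore_nulls == true : copy the bytes as they are; a C reader will
 *                        simply stop at the first interior zero.
 * ignore_nulls == false: return NULL if any of the first `length`
 *                        bytes is zero.
 * Also returns NULL if allocation fails. The caller frees the result.
 */
char *to_stringz_checked(const char *ptr, size_t length, bool ignore_nulls)
{
    assert(ptr != NULL || length == 0);

    if (!ignore_nulls && length > 0 && memchr(ptr, '\0', length) != NULL)
        return NULL;

    char *out = malloc(length + 1);
    if (out == NULL)
        return NULL;
    if (length > 0)
        memcpy(out, ptr, length);
    out[length] = '\0';
    return out;
}
```

With the flag left at true the function behaves like a plain copying
conversion, so existing callers would see no difference. Strict callers pay
for one extra scan of the input. A real version would want to distinguish
"interior zero" from "out of memory", which this sketch does not.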
January 20, 2005
Re: toStringz and predictability
In article <csjffu$1qtp$1@digitaldaemon.com>, Walter says...
>
>
>"Ben Hinkle" <Ben_member@pathlink.com> wrote in message
>news:csj4hq$1cvi$1@digitaldaemon.com...
>> The reason why the length changed is that toStringz looks at one past the
>> length of the string to see if it is 0 and does nothing to the string if it
>> is. But the sample code then changes the byte past the string by touching a
>> completely different variable and so the toStringz result is "corrupted". I
>> have toStringz calls sprinkled through my code when I call C functions and
>> now I'm starting to get nervous about the lifespans of those strings and how
>> to figure out if they are valid or not. Thoughts? Walter, is there a
>> guideline I should follow? The most extreme one that comes to mind is "only
>> call toStringz for strings that get immediately copied".
>
>It's "COW" (Copy On Write) to the rescue. The idea is only modify a string
>that you know is unique. If you don't know it is unique, make a copy of it
>before modifying it. After the toStringz(), you're modifying the argument to
>toStringz() but there's another reference to that string that expects it to
>not change.
>

ok, one last try. Walter, I can't tell if you still think this counts as COW. So
let me boil it down to a question. Given the code
char[1] str;
char* cstr = toStringz(str);
ubyte x = 1;
what is strlen(cstr)?
I claim the answer is compiler dependent: it depends on whether the compiler
put the storage location for x immediately after str. Sure, running the code
doesn't show a problem, thanks to word alignment etc., but going by the
language definition and the definition of toStringz, the strlen is unknown.

-Ben
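
The layout in question can be simulated in C without relying on where a
compiler happens to place variables. In the sketch below one small buffer
stands in for the adjacent storage of str and x, and peeking_to_stringz is a
hypothetical model of only the no-copy path described in this thread (byte
past the end is already 0, so the original pointer is returned). All names
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a D char[]: a pointer plus a length. */
typedef struct {
    char  *ptr;
    size_t length;
} Slice;

/*
 * Models the no-copy path: the byte just past the slice is already 0,
 * so the original pointer is handed back unchanged. The caller must
 * guarantee that ptr[length] is readable; here it always lies inside
 * a larger buffer, so the peek is well defined.
 */
char *peeking_to_stringz(Slice s)
{
    assert(s.ptr[s.length] == '\0'); /* sketch covers only the no-copy path */
    return s.ptr;
}

/*
 * Reports the strlen seen through the converted pointer before and
 * after an unrelated write to the neighbouring byte.
 */
void neighbour_write_demo(size_t *before, size_t *after)
{
    /* storage[0] plays `char[1] str`, storage[1] plays `ubyte x`, and
       storage[2] is a sentinel that keeps strlen inside the buffer. */
    char storage[3] = { 'a', 0, 0 };
    Slice str = { storage, 1 };

    char *cstr = peeking_to_stringz(str);
    *before = strlen(cstr);  /* terminator is the neighbour's 0 byte */

    storage[1] = 1;          /* the "x = 1" assignment */
    *after = strlen(cstr);   /* terminator has moved to the sentinel */
}
```

When the neighbour really is adjacent, the length seen from C changes
although nothing wrote to str itself. Whether a compiler ever produces that
layout is exactly the open question above.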