Implementation of char[] std.string.toString(char)
August 01, 2005
I recently noticed that char[] std.string.toString(char) in
Phobos (DMD 0.127) is implemented this way:

# char[] toString(char c)
# {
#   char[] result = new char[2];
#   result[0] = c;
#   result[1] = 0;
#   return result[0 .. 1];
# }


Why is it not simply

# char[] toString(char c)
# {
#  char[] result = new char[1];
#  result[0] = c;
#  return result;
# }


Can anyone shed some light on this?

Thanks in advance,
Stefan


August 01, 2005
In article <dckpo7$23vs$1@digitaldaemon.com>, Stefan says...
>
>I recently noticed that char[] std.string.toString(char) in
>Phobos (DMD 0.127) is implemented this way:
>
># char[] toString(char c)
># {
>#   char[] result = new char[2];
>#   result[0] = c;
>#   result[1] = 0;
>#   return result[0 .. 1];
># }
>
>
>Why is it not simply
>
># char[] toString(char c)
># {
>#  char[] result = new char[1];
>#  result[0] = c;
>#  return result;
># }
>
>
>Can anyone shed a light on this?
>
>Thanks in advance,
>Stefan
>
>

At first I thought it was because 'char' and 'int' (an int is 4 bytes long in D) are
implicitly converted to one another as needed. Below is an example of
toString(char) converting both a 'char' and an 'int' without a cast().

# //int2char.d
# private import std.stdio;
#
# char[] toString1(char c)
# {
#     char[] result = new char[2];
#     result[0] = c;
#     result[1] = 0;
#     return result[0 .. 1];
# }
#
# char[] toString2(char c)
# {
#     char[] result = new char[1];
#     result[0] = c;
#     return result;
# }
#
# int main()
# {
#     char c;
#     int  i = 67;
#
#     c = i; // no cast() needed
#     writefln("toString1(c)=\"%s\" toString1(i)=\"%s\"",
#               .toString1(c), .toString1(i));
#     writefln("toString2(c)=\"%s\" toString2(i)=\"%s\"",
#               .toString2(c), .toString2(i));
#     return 0;
# }

C:\dmd>dmd int2char.d
C:\dmd\bin\..\..\dm\bin\link.exe int2char,,,user32+kernel32/noi;

C:\dmd>int2char
toString1(c)="C" toString1(i)="C"
toString2(c)="C" toString2(i)="C"

C:\dmd>

But that's clearly not the case...umm...not sure at this point. Sorry I wasn't more helpful on the matter.

David L.

-------------------------------------------------------------------
"Dare to reach for the Stars...Dare to Dream, Build, and Achieve!"
-------------------------------------------------------------------

MKoD: http://spottedtiger.tripod.com/D_Language/D_Main_XP.html
August 01, 2005
On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:

> I recently noticed that char[] std.string.toString(char) in
> Phobos (DMD 0.127) is implemented this way:
> 
> # char[] toString(char c)
> # {
> #   char[] result = new char[2];
> #   result[0] = c;
> #   result[1] = 0;
> #   return result[0 .. 1];
> # }
> 
> 
> Why is it not simply
> 
> # char[] toString(char c)
> # {
> #  char[] result = new char[1];
> #  result[0] = c;
> #  return result;
> # }
> 
> 
> Can anyone shed a light on this?

I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.

Your alternate routine certainly returns a 1-byte string, but the byte after the string is undetermined.
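
To make the point above concrete, here is a minimal sketch, assuming a current D compiler rather than DMD 0.127. The name charToString is hypothetical and only mirrors the routine quoted in this thread. It shows that the slice has a D length of 1 while a C function such as strlen still finds a terminator right after it.

```d
import core.stdc.string : strlen;

// Hypothetical name; mirrors the Phobos routine quoted in this thread.
char[] charToString(char c)
{
    char[] result = new char[2]; // room for the char plus a terminator
    result[0] = c;
    result[1] = 0;               // zero byte a C function would stop at
    return result[0 .. 1];       // the D-side length is still 1
}

void main()
{
    char[] s = charToString('C');
    assert(s.length == 1);       // D sees a one-char string
    assert(strlen(s.ptr) == 1);  // C sees a terminated one-char string
}
```

With the simpler one-byte version, the strlen call would read whatever happens to follow the allocation, which is the "undetermined" byte mentioned above.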

-- 
Derek Parnell
Melbourne, Australia
1/08/2005 9:47:37 PM
August 01, 2005
In article <gu39ywiarmwp.1vayamiha3tm3.dlg@40tude.net>, Derek Parnell says...
>
>On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:
>
>> I recently noticed that char[] std.string.toString(char) in
>> Phobos (DMD 0.127) is implemented this way:
>> 
>> # char[] toString(char c)
>> # {
>> #   char[] result = new char[2];
>> #   result[0] = c;
>> #   result[1] = 0;
>> #   return result[0 .. 1];
>> # }
>> 
>> 
>> Why is it not simply
>> 
>> # char[] toString(char c)
>> # {
>> #  char[] result = new char[1];
>> #  result[0] = c;
>> #  return result;
>> # }
>> 
>> 
>> Can anyone shed a light on this?
>
>I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.


Hmm, I initially thought the same. But as I understand it, there are a
lot of toString() routines in there that don't zero-terminate (e.g. char[]
toString(uint u)). So, I thought I must have missed something?

Thanks for your reply,
Stefan


>
>Your alternate routine certainly returns a 1-byte string, but the byte after the string is undetermined.
>
>-- 
>Derek Parnell
>Melbourne, Australia
>1/08/2005 9:47:37 PM


August 01, 2005
"Stefan" <Stefan_member@pathlink.com> wrote in message news:dcl59f$2jhr$1@digitaldaemon.com...
> In article <gu39ywiarmwp.1vayamiha3tm3.dlg@40tude.net>, Derek Parnell says...
>>
>>On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:
>>
>>> I recently noticed that char[] std.string.toString(char) in
>>> Phobos (DMD 0.127) is implemented this way:
>>>
>>> # char[] toString(char c)
>>> # {
>>> #   char[] result = new char[2];
>>> #   result[0] = c;
>>> #   result[1] = 0;
>>> #   return result[0 .. 1];
>>> # }
>>>
>>>
>>> Why is it not simply
>>>
>>> # char[] toString(char c)
>>> # {
>>> #  char[] result = new char[1];
>>> #  result[0] = c;
>>> #  return result;
>>> # }
>>>
>>>
>>> Can anyone shed a light on this?
>>
>>I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.
>
>
> Hmm, I initially thought the same. But as I understand it, there are a
> lot of toString() routines in there that don't zero-terminate (e.g. char[]
> toString(uint u)). So, I thought I must have missed something?

Since the GC allocates in blocks of 16 bytes or more, allocating a single byte will actually allocate 16, so it doesn't hurt space-wise to ask for 2. Other functions probably don't know they'll always fit in one block. Note that different GCs might not behave that way.
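
One way to check the block-size claim on your own setup is sketched below, assuming a current D runtime (core.memory did not exist in this form in DMD 0.127). GC.sizeOf reports the size of the GC block that a pointer refers to; the exact numbers depend on the collector in use, so none are promised here.

```d
import core.memory : GC;
import std.stdio : writeln;

void main()
{
    char[] one = new char[1];
    char[] two = new char[2];

    // GC.sizeOf returns the size of the whole block backing the pointer,
    // which may be larger than the number of bytes requested.
    writeln("block for new char[1]: ", GC.sizeOf(one.ptr), " bytes");
    writeln("block for new char[2]: ", GC.sizeOf(two.ptr), " bytes");
}
```

If both lines print the same size, asking for the extra terminator byte costs nothing on that collector.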


August 01, 2005
In article <dclc4k$2qgv$1@digitaldaemon.com>, Ben Hinkle says...
>
>
>"Stefan" <Stefan_member@pathlink.com> wrote in message news:dcl59f$2jhr$1@digitaldaemon.com...
>> In article <gu39ywiarmwp.1vayamiha3tm3.dlg@40tude.net>, Derek Parnell says...
>>>
>>>On Mon, 1 Aug 2005 09:24:23 +0000 (UTC), Stefan wrote:
>>>
>>>> I recently noticed that char[] std.string.toString(char) in
>>>> Phobos (DMD 0.127) is implemented this way:
>>>>
>>>> # char[] toString(char c)
>>>> # {
>>>> #   char[] result = new char[2];
>>>> #   result[0] = c;
>>>> #   result[1] = 0;
>>>> #   return result[0 .. 1];
>>>> # }
>>>>
>>>>
>>>> Why is it not simply
>>>>
>>>> # char[] toString(char c)
>>>> # {
>>>> #  char[] result = new char[1];
>>>> #  result[0] = c;
>>>> #  return result;
>>>> # }
>>>>
>>>>
>>>> Can anyone shed a light on this?
>>>
>>>I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.
>>
>>
>> Hmm, I initially thought the same. But as I understand it, there are a
>> lot of toString() routines in there that don't zero-terminate (e.g. char[]
>> toString(uint u)). So, I thought I must have missed something?
>
>Since the GC allocates in blocks of 16 bytes or more, allocating a single byte will actually allocate 16, so it doesn't hurt space-wise to ask for 2. Other functions probably don't know they'll always fit in one block. Note that different GCs might not behave that way.

Yes, that might explain it. Thanks a lot.

Best regards,
Stefan


August 01, 2005
Derek Parnell wrote:
> I believe it's because Walter is trying to be 'C' friendly. The returned
> 'string' must have a length of 1, because it only holds one char, but it
> must own a 2-byte memory allocation because the byte after the string must
> be zero for potential C usage.

Nearly correct.  toString() is not required to return something that has the "hidden" zero trailing it, but it's useful when it does.  Look at the implementation of toStringz() (convert to zero-terminated string). That will look at the trailing character and see if it just happens to be 0; if so, then it can convert the string without any copying.

Of course, that implementation of toStringz() is controversial, and when you're talking about a string of length 1, the cost of copying is very small.  But I suppose that even that small a copy might kick off a GC sweep, so it's probably not a bad idea that it works the way it does.
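
The "look at the trailing character" trick described above can be sketched roughly as follows. This is an illustration for a current D compiler, not the Phobos source, and the name toStringzPeek is hypothetical. It reads one byte past the end of the slice, which is only safe when that byte belongs to the same allocation; that is the controversial part.

```d
// Hypothetical helper illustrating the peek-past-the-end idea.
const(char)* toStringzPeek(const(char)[] s)
{
    if (s.length == 0)
        return "".ptr;

    // Peek at the byte just past the slice. Assumes it is readable.
    const(char)* past = s.ptr + s.length;
    if (*past == 0)
        return s.ptr; // already terminated, no copy needed

    // Otherwise make a terminated copy.
    char[] copy = new char[s.length + 1];
    copy[0 .. s.length] = s[];
    copy[s.length] = 0;
    return copy.ptr;
}

void main()
{
    // Slice followed by a zero byte, as toString(char) produces it.
    char[] buf = new char[2];
    buf[0] = 'C';
    buf[1] = 0;
    assert(toStringzPeek(buf[0 .. 1]) is buf.ptr); // reused in place

    // Slice followed by a non-zero byte: a copy is made.
    char[] ab = ['a', 'b'];
    const(char)* z = toStringzPeek(ab[0 .. 1]);
    assert(z !is ab.ptr);
    assert(z[0] == 'a' && z[1] == 0);
}
```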
August 01, 2005
In article <dcldo3$2s6r$1@digitaldaemon.com>, Russ Lewis says...
>
>Derek Parnell wrote:
>> I believe it's because Walter is trying to be 'C' friendly. The returned 'string' must have a length of 1, because it only holds one char, but it must own a 2-byte memory allocation because the byte after the string must be zero for potential C usage.
>
>Nearly correct.  toString() is not required to return something that has the "hidden" zero trailing it, but it's useful when it does.  Look at the implementation of toStringz() (convert to zero-terminated string). That will look at the trailing character and see if it just happens to be 0; if so, then it can convert the string without any copying.

In my Phobos source (DMD 0.127) that code is commented out.
The implementation is essentially:

# char* toStringz(char[] string)
# {
#   char[] copy;
#   if (string.length == 0)
#     return "";
#
#   // Need to make a copy
#   copy = new char[string.length + 1];
#   copy[0..string.length] = string;
#   copy[string.length] = 0;
#   return copy;
# }

Or are we talking about different things here?

Best regards,
Stefan


>Of course, that implementation of toStringz() is controversial, and when you're talking about a string of length 1, the cost of copying is very small.  But I suppose that even that small a copy might kick off a GC sweep, so it's probably not a bad idea that it works the way it does.


August 01, 2005
Stefan wrote:
> In article <dcldo3$2s6r$1@digitaldaemon.com>, Russ Lewis says...
> 
>>Derek Parnell wrote:
>>
>>>I believe it's because Walter is trying to be 'C' friendly. The returned
>>>'string' must have a length of 1, because it only holds one char, but it
>>>must own a 2-byte memory allocation because the byte after the string must
>>>be zero for potential C usage.
>>
>>Nearly correct.  toString() is not required to return something that has the "hidden" zero trailing it, but it's useful when it does.  Look at the implementation of toStringz() (convert to zero-terminated string). That will look at the trailing character and see if it just happens to be 0; if so, then it can convert the string without any copying.
> 
> 
> In my Phobos source (DMD 0.127) that code is commented out.

It appears you are right; I guess I missed the change.  Looks to me like it was commented out in version 0.113.  My thought is that, then, the implementation of toString(char) can be simplified.  At least, I don't perceive any reason not to...