Thread overview
Could not believe it!
Mar 16, 2004
Bob W
Mar 17, 2004
Ilya Minkov
Mar 17, 2004
Bob W
Mar 18, 2004
Ilya Minkov
Mar 18, 2004
Bob W
March 16, 2004
/*

Strange Bug:

This program produces erroneous outputs,
if the number of characters joined together
in the printf() function matches those in the
examples shown.


System:
  dmd V0.81 on WinXP


Output:

Look at - this!
Look at           <<<<< joined array contents duplicated

Test2: Look + at  <<<<< correct - unless 1 char removed
this!


Now look at that#!  <<<<< control character inserted

*/

import std.string;

char[][] a;
char[][] b;

int main (char[][] args) {

// This produces an incorrect output:

  a~="Look";  a~="at";
  printf("\n");
  printf(join(a," ") ~ " - this!\n");


// This is fine, but when for example
// 'Test2' is changed to 'Test' the
// output will be incorrect

  printf("\n\n");
  printf("Test2: " ~ join(a," + ") ~ "\n");
  printf("this!\n");


// This puts a control character between
// 'that' and the exclamation mark

  printf("\n\n");
  b~="Now";  b~="look";  b~="at";  b~="that";
  printf(join(b," "));
  printf("!\n");

  return(0);
}



March 17, 2004
Nothing strange. printf is a C function. In C, strings are 0-terminated. In D, only string literals are 0-terminated (for compatibility); generic strings resulting from array operations are not.

See http://www.prowiki.org/wiki4d/wiki.cgi?HowTo/printf

You can use something like toStringz to make a null-terminated string from a D string.
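
A minimal sketch of the toStringz route, written against present-day D (D2) rather than the dmd 0.81 discussed in this thread, so `string` and the import paths are assumptions about your compiler. The helper name `joined` is made up for illustration.

```d
import core.stdc.stdio : printf;
import std.array : join;
import std.string : toStringz;

// Build the text at run time, as the original program does.
string joined(string[] words)
{
    return words.join(" ") ~ " - this!\n";
}

void main()
{
    string s = joined(["Look", "at"]);
    // s is a D slice (pointer + length) with no guaranteed trailing 0,
    // so hand printf a 0-terminated copy, and pass it as an argument
    // to "%s" instead of using it as the format string itself.
    printf("%s", toStringz(s));
}
```

Passing the text as an argument rather than as the format also keeps a stray '%' in the data from being interpreted by printf.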

In general, printf will either get "banned" (from public examples etc., not from the library) or we will put a D-specific version together. We haven't decided yet. Stream I/O from Phobos is strongly recommended over C I/O unless you know exactly what you are doing.

So welcome here and good luck.

-eye



March 17, 2004
Thanks for your reply.

If I understand you correctly the following would happen:


// This works because a literal is used as parameter1

    printf("Simple string 1\n");


// This will still work because 's' points to a literal

    char[] s="Simple string 2\n";
    printf(s);


// This is ok, because both literals are merged during compile-time

    printf("Simple " ~ "string 3\n");


// Just this one is a problem because variable and literal seem
// to be merged at runtime, so a 'genuine' D string is created.
// Furthermore the end of the new string is sitting just before a
// 16-bytes boundary, which prevents eventual zero padding to
// come as a rescue.

    char[] s="Simple ";
    printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?




March 18, 2004
Bob W wrote:
> Thanks for your reply.

You're welcome!

> If I understand you correctly the following would happen:

Everything is right except for:

> // This is ok, because both literals are merged during compile-time
> 
>     printf("Simple " ~ "string 3\n");

With DMD that's true, but I'm not sure the language defines whether it should be so. The current compiler does it this way, but if D gets popular and gets rival compilers, the result there might be an array that is not null-terminated. It is not even defined whether this concatenation happens at compile time or at run time.

> // Just this one is a problem because variable and literal seem
> // to be merged at runtime, so a 'genuine' D string is created.
> // Furthermore the end of the new string is sitting just before a
> // 16-bytes boundary, which prevents eventual zero padding to
> // come as a rescue.
> 
>     char[] s="Simple ";
>     printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?

True. But you cannot rely on zero padding to work either. Adding \0 works, but such a string only makes sense for C functions. In D functions it might cause problems, because when you, e.g., concatenate something else to the end, you get a string with an embedded 0!
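
To make the embedded-0 pitfall concrete, here is a small sketch. It assumes present-day D (D2), not dmd 0.81, and the function name `terminatedThenExtended` is invented for the example.

```d
import core.stdc.string : strlen;

// Appends an explicit terminator for C, then more text, as later D code might.
string terminatedThenExtended(string s)
{
    string forC = s ~ "string 4\n\0";   // usable as a C string as it stands
    return forC ~ "more";               // the 0 now sits in the middle
}

void main()
{
    string t = terminatedThenExtended("Simple ");
    // D counts every character, including the embedded 0 ...
    assert(t.length == "Simple string 4\n".length + 1 + "more".length);
    // ... while C stops at the first 0. This scan is only safe
    // because that embedded 0 is known to be there.
    assert(strlen(t.ptr) == "Simple string 4\n".length);
}
```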

The thing is, when constant strings are emitted, they are padded with zeroes at the end. At runtime, a slice (a slice is a value consisting of a data pointer and a length) into the constant area is assigned to the array variable. So there is a 0 right behind the array bound. As soon as any operation that increases the length is performed, the array data is required to be copied. A new memory area is allocated, and thus the zeroes are lost. In fact, it is a convention to copy on any change except for slicing.
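
The pointer-and-length picture can be checked directly. This is only a sketch under the assumption of present-day D (D2), where string literals carry a terminating 0 and `~` produces a new array; `concatMovesData` is a name made up here.

```d
import core.stdc.stdio : printf;

// True when growing s by concatenation yields storage at a new address.
bool concatMovesData(string s)
{
    string grown = s ~ "string 4\n";
    return grown.ptr !is s.ptr;
}

void main()
{
    string s = "Simple ";   // slice into the literal's storage
    // The byte just past the slice is the literal's 0. Reading it is
    // only valid because s still refers to a string literal.
    assert(s.ptr[s.length] == '\0');

    // Concatenation allocates a fresh array; nothing promises a 0 after it.
    assert(concatMovesData(s));

    // Precision plus pointer prints exactly s.length characters.
    printf("%.*s\n", cast(int) s.length, s.ptr);
}
```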

There are also other funny things which may happen, including:

* You slice into a string literal. The simplest case: you have a string and decrease its length. You printf it, and the output runs not to the string's real end but on to the 0 terminator, i.e. the original length.
* You printf using a format string that contains %s with something after it. That something gets replaced by noise... %s is the wrong format, you should use ... can't remember, see the link.
* Functions can write into the arrays they get as input, but if you change the length, the change is not propagated back to the caller. In other words, the semantics are semi-constant: you have to make sure you either copy an array (array.dup) before changing it, if the change need not be propagated, or use the inout modifier.
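
The last point in the list above, sketched in code. This assumes present-day D (D2), where the by-reference parameter modifier is spelled `ref` rather than `inout`; both function names are hypothetical.

```d
// Writes through the slice reach the caller's data; the length change does not.
void truncateByValue(char[] a)
{
    a[0] = 'X';
    a.length = 2;   // changes only this function's copy of the slice
}

// With ref, the caller's slice itself is updated.
void truncateByRef(ref char[] a)
{
    a.length = 2;
}

void main()
{
    char[] buf = "abcd".dup;   // mutable copy of the literal
    truncateByValue(buf);
    assert(buf == "Xbcd");     // element write visible, length unchanged

    truncateByRef(buf);
    assert(buf == "Xb");       // length change propagated
}
```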

I think this should be in some newbie FAQ; someone please add it if it's not there. It's too late here.

-eye
March 18, 2004
Got the info that my last reply to this thread was misposted. I guess this one should work:



----- Original Message ----- 
From: "Ilya Minkov" <minkov@cs.tum.edu>
Newsgroups: D
Sent: Thursday, 18 March, 2004 01:02
Subject: Re: Could not believe it!


> Bob W wrote:
> > Thanks for your reply.
>
> You're welcome!
>
> > If I understand you correctly the following would happen:
>
> Everything is right except for:
>
> > // This is ok, because both literals are merged during compile-time
> >
> >     printf("Simple " ~ "string 3\n");
>
> With DMD that's true, but I'm not sure the language defines whether it should be so. The current compiler does it this way, but if D gets popular and gets rival compilers, the result there might be an array that is not null-terminated. It is not even defined whether this concatenation happens at compile time or at run time.

I am aware of this, but I just wanted to understand correctly the current phenomena I've experienced.


>
> > // Just this one is a problem because variable and literal seem
> > // to be merged at runtime, so a 'genuine' D string is created.
> > // Furthermore the end of the new string is sitting just before a
> > // 16-bytes boundary, which prevents eventual zero padding to
> > // come as a rescue.
> >
> >     char[] s="Simple ";
> >     printf(s ~ "string 4\n");      // quick fix: add '\0' after '\n' ?
>
> True. But you cannot rely on zero padding to work either. Adding \0 works, but such a string only makes sense for C functions. In D functions it might cause problems, because when you, e.g., concatenate something else to the end, you get a string with an embedded 0!

As long as the string is used just to be displayed, I personally would not worry about a '\0' being added. Otherwise it is a potential pitfall, I agree with that.



> The thing is, when constant strings are emitted, they are padded with zeroes at the end. At runtime, a slice (a slice is a value consisting of a data pointer and a length) into the constant area is assigned to the array variable. So there is a 0 right behind the array bound. As soon as any operation that increases the length is performed, the array data is required to be copied. A new memory area is allocated, and thus the zeroes are lost. In fact, it is a convention to copy on any change except for slicing.
>
> There are also other funny things which may happen, including:
>
> * You slice into a string literal. The simplest case: you have a string
> and decrease its length. You printf it, and the output runs not to its
> real end but on to the 0 terminator, i.e. the original length.
> * You printf using a format string that contains %s with something
> after it. That something gets replaced by noise... %s is the wrong
> format, you should use ... can't remember, see the link.

If you are referring to the '%.*s' crutch, I was quite astonished that
no other way has been found to get printf() to work with
D strings. I know that D is still in the alpha stage, but offering
something like '%t' instead of '%.*s' to handle D-type strings would
help, because printf is something almost everyone will be using during
an initial evaluation of D and beyond. Obscuring one of the most popular
conversion specifiers for printf() probably does not really help in getting
D promoted. Besides, a recent survey has shown that for some reason
C++ is losing market share to good old C, so printf() is here to stay for
the next 100 years or so anyway .....    : )
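
For readers who have not met '%.*s' yet, a small sketch of how it is used with a D slice. It assumes present-day D (D2) and the import path `core.stdc.stdio`; `firstChars` is a helper invented for this example.

```d
import core.stdc.stdio : printf;

// Return the first n characters of text as a slice: no copy, no terminator.
const(char)[] firstChars(const(char)[] text, size_t n)
{
    return text[0 .. n];
}

void main()
{
    auto head = firstChars("Now look at that", 8);   // "Now look"
    // A plain "%s" would read on to the literal's terminating 0 and
    // print the whole original text. "%.*s" takes an int precision and
    // a pointer, so printf stops after head.length characters.
    printf("%.*s!\n", cast(int) head.length, head.ptr);
}
```

The precision argument must be an int, hence the cast from the slice's length.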


> * Functions can write into the arrays they get as input, but if you change the length, the change is not propagated back to the caller. In other words, the semantics are semi-constant: you have to make sure you either copy an array (array.dup) before changing it, if the change need not be propagated, or use the inout modifier.

That is good to know, thanks.



> I think this should be in some newbie FAQ; someone please add it if it's not there. It's too late here.
>
> -eye