Thread overview
Bug in docs and phobos (kill printf)
Jun 25, 2002  Sandor Hojtsy
Jun 25, 2002  Martin M. Pedersen
Jun 26, 2002  Sandor Hojtsy
Jun 26, 2002  Pavel Minayev
Jun 27, 2002  Matthew Wilson
Jun 27, 2002  Pavel Minayev
Jun 27, 2002  Sean L. Palmer
Jun 27, 2002  Pavel Minayev
Jun 27, 2002  Martin M. Pedersen
Jun 27, 2002  OddesE
Jul 10, 2002  Walter

June 25, 2002
Hi,
The D spec says (note the order of the pointer and the dimension):

Memory Model:
A dynamic array consists of:
 0: pointer to array data
 4: array dimension

Interfacing to C:
Although printf is designed to handle 0 terminated strings, not D dynamic
arrays of chars, it turns out that since D dynamic arrays are a length
followed by a pointer to the data, the %.*s format works perfectly

First, the docs on the memory model are buggy: the two passages above
contradict each other on the field order. But here are some more important
conceptual errors:
It seems that originally the pointer came before the dimension (as is
conventional), and later the order was changed to suit the legacy printf.
This shouldn't have caused any trouble in client source code, because that
code should *not* rely on an implementation detail such as the order of
hidden fields inside a built-in type. But now it does. I consider this bad
practice.

OTOH, this usage *does not* conform to the specification of printf. The C standard says:

"As noted above, a field width, or precision, or both, may be indicated by
an asterisk. In this case, an int argument supplies the field width or
precision. The arguments specifying field width, or precision, or both,
shall appear (in that order) before the argument (if any) to be converted."
"If any argument is not the correct type for the corresponding conversion
specification, the behavior is undefined."

Does the standard say that the next 4 bytes on the stack are treated as an integer encoding the precision, and the 4 bytes after them as a pointer? NO! So the example results in undefined behaviour in any compiler strictly conforming to the specification. Yes, I understand that in this particular case it will work, because in your compiler the undefined behaviour always turns out to do the right thing. But from the specification side this is still undefined behaviour, and any other compiler can claim to be fully conformant to the specs and still freeze on this printf. I think you should not encourage code that results in undefined behaviour.
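
For illustration, here is a minimal C sketch of a call that does follow the quoted rule: the precision is passed as its own int argument and the data as its own pointer argument. The struct char_slice type and the format_slice helper are hypothetical names made up for this example, not part of D or phobos.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical C view of a D dynamic array of chars. The field order
   does not matter here, because the code never relies on it. */
struct char_slice {
    size_t length;
    const char *ptr;
};

/* Formats the slice into out using %.*s. The precision is passed as a
   separate int argument and the data as a separate pointer argument,
   in that order, as the quoted text of the C standard requires. */
int format_slice(char *out, size_t out_size, struct char_slice s)
{
    assert(s.length <= INT_MAX);   /* the precision must fit in an int */
    return snprintf(out, out_size, "[%.*s]", (int)s.length, s.ptr);
}
```

Because both values are passed explicitly, a call like this should stay valid even if the in-memory layout of the array changes.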

Now on to the format string of printf. It is a string literal (?const? char[]), implicitly cast to (const char *). All string literals are terminated by a 0, which is stored past-the-end. This is an interesting idea. In short, it means that string literals and string variables with the same contents are not interchangeable as function parameters. It is the same nuisance that occurred in C++, when a fine (std::string) type was introduced, but string literals remained (const char *), or even worse (char *). I was told that D doesn't need to conform to any legacy, so it can be more effective. This is just the opposite. Why don't we have a function:

int dprintf(char [] format_str, ...)

which you can call as:

char [] d = "aa %s cc %d", e = "bb";
dprintf(d, e, 3);

resulting in "aa bb cc 3"?
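
A rough C sketch of the idea, under loud assumptions: slice_snprintf is a hypothetical name, it writes into a buffer instead of stdout, and it only makes the *format* length-aware. String arguments still have to be passed as a precision plus a pointer via %.*s, since vsnprintf knows nothing about D arrays.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats into out, taking the format as a
   length plus pointer instead of a 0-terminated string.
   Returns what vsnprintf returns, or -1 if allocation fails. */
int slice_snprintf(char *out, size_t out_size,
                   size_t fmt_len, const char *fmt, ...)
{
    /* vsnprintf needs a 0-terminated format, so make a terminated copy. */
    char *z = malloc(fmt_len + 1);
    if (z == NULL)
        return -1;
    if (fmt_len > 0)
        memcpy(z, fmt, fmt_len);
    z[fmt_len] = '\0';

    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(out, out_size, z, args);
    va_end(args);

    free(z);
    return n;
}
```

The copy costs an allocation per call; a real implementation would more likely parse the format itself, but that is beyond a sketch.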

I promised bugs in phobos. Well, I am not sure that this is a bug:

stream.d:
void writeLine(char[] s)
{
  writeString(s);
  write(cast(char)"\n"); // <-------
}

What is this supposed to do? Can you cast a string literal into a char?! Is
it done implicitly too? Where is it documented? Is it useful?
Another one:

string.toStringz():
...
p = &string[0] + string.length;

// Peek past end of string[], if it's 0, no conversion necessary.
// Note that the compiler will put a 0 past the end of static
// strings, and the storage allocator will put a 0 past the end
// of newly allocated char[]'s.
if (*p == 0)        // <--------
    return string;

This is undefined behaviour again. Is reading past the end of an allocated memory block valid and guaranteed not to produce a General Protection Fault? If yes, can you please document this interesting property of the memory model?
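
For comparison, a conservative alternative can be sketched in C: always copy and append the terminator, and never look past the end of the source. The name to_stringz_copy is hypothetical, and this is not how phobos does it; it trades an allocation for not depending on what lies beyond the block.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, 0-terminated copy of the len bytes at
   data, or NULL if allocation fails. It only reads data[0..len-1],
   so it never touches memory past the end of the source block.
   The caller releases the result with free(). */
char *to_stringz_copy(const char *data, size_t len)
{
    char *z = malloc(len + 1);
    if (z == NULL)
        return NULL;
    if (len > 0)
        memcpy(z, data, len);
    z[len] = '\0';
    return z;
}
```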

Yours,
Sandor


June 25, 2002
Hi,

"Sandor Hojtsy" <hojtsy@index.hu> wrote in message news:af9bra$16te$1@digitaldaemon.com...
> turns out to be doing the right thing. But from the specification side, this
> is still undefined behaviour, and any other compiler can claim to be fully
> conformant to the specs, and still freeze on this printf. I think you should
> not encourage code that results in undefined behaviour.

Very good points :-)

> I promised bugs in phobos. Well I am not sure that this is a bug:
>   write(cast(char)"\n"); // <-------
>
> What is this supposed to do? Can you cast a string literal into a char?!

You don't have character literals in D, so this is a way to write this kind of thing.

> string.toStringz():
> ...
> p = &string[0] + string.length;
>
> // Peek past end of string[], if it's 0, no conversion necessary.
> // Note that the compiler will put a 0 past the end of static
> // strings, and the storage allocator will put a 0 past the end
> // of newly allocated char[]'s.
> if (*p == 0)        // <--------
>     return string;
>
> This is undefined behaviour again. Is reading past the end of an allocated
> memory block valid and guaranteed not to produce a General Protection Fault?
> If yes, can you please document this interesting property of the memory model?

On some platforms, it certainly could cause a GPF. Perhaps Walter can guarantee that it does not with his Windows implementation, and if that is the case, he is free to use dirty tricks like this. The spec needs to be portable, but the RTL, behind the scenes, does not.

This one reminds me of a bug I once saw under SunOS (pre-Solaris):
printf("%.*s", len, str) internally read (at least) one byte more than
specified by 'len', and thereby caused a GPF. Whether this was a bug in Sun's
printf() or in the user code, I cannot tell (I didn't care, as I had to fix it
anyway). But it shows that one cannot portably rely on peeking a single byte
beyond the string data.
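
One way to sidestep the question entirely is to avoid printf for counted data and write the bytes with an explicit count. The sketch below assumes a hypothetical helper name, write_counted; fwrite itself is standard C.

```c
#include <stdio.h>

/* Writes exactly len bytes from data to stream; returns 0 on success
   and -1 on a short write. fwrite takes an explicit count, so the
   buffer does not need a terminating 0. */
int write_counted(FILE *stream, const char *data, size_t len)
{
    return fwrite(data, 1, len, stream) == len ? 0 : -1;
}
```

A call such as write_counted(stdout, str, len) would then replace the printf("%.*s", len, str) line.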

Regards,
Martin M. Pedersen



June 26, 2002
> >   write(cast(char)"\n"); // <-------
> >
> > What is this supposed to do? Can you cast a string literal into a char?!
>
> You don't have character literals in D, so this is a way to write this kind of
> thing.

Hey, I missed that point: D doesn't have character literals!
Oh no! You have to explicitly cast a string literal to a char type to get a
literal char value?
I have searched the docs, and there is nothing about casting char[] to
char.

So if you go on like:

char [] a = "long long text here";
for(int i = 0; i < a.length; i++)
{
  if(a[i] == "a") // <----
    printf("%c", a[i]);
}

Will the compare in the noted line be a nasty string compare, with constructing a temporary string value from the character in a[i], and then comparing all the (one) characters of the two strings? That would ruin the performance.

Or is this string literal converted into a char literal, just because it is 1 character long and/or compared to a char type?

Does this mean you can also cast int[] to int? And int[][] to int[] to int. Implicit and/or explicit?

Yours,
Sandor


June 26, 2002
On Wed, 26 Jun 2002 16:53:10 +0200 "Sandor Hojtsy" <hojtsy@index.hu> wrote:

> Hey I missed that point: D doesn't have character literals!
> Oh no! You have to explicitly cast a string literal to a char type to get a
> literal char value?

No. It is done implicitly whenever possible. Only when you have two versions of an overloaded function, one taking char[] and another taking just char, do you have to use a cast.

> So if you go on like:
> 
> char [] a = "long long text here";
> for(int i = 0; i < a.length; i++)
> {
>   if(a[i] == "a") // <----
>     printf("%c", a[i]);
> }
> 
> Will the compare in the noted line be a nasty string compare, with constructing a temporary string value from the character in a[i], and then comparing all the (one) characters of the two strings? That would ruin the performance.

No, it'll be a char compare. I'll give another example where it is clearer. You can write:

	char c = getc();
	int n = c - "0";

It'll work. Obviously, "0" is a char here, not a string.

Also, this only applies to string _literals_ - variables, char or not, cannot be cast to arrays, or vice versa.
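
For comparison only: C, which does have character constants, would express the same digit conversion with '0' rather than "0". The function name digit_value is made up for this sketch.

```c
#include <assert.h>

/* Converts a decimal digit character to its numeric value. '0' is a
   character constant of integer type, so this is plain integer
   subtraction; C guarantees that '0'..'9' are consecutive. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```
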
June 27, 2002
Pavel

Do you know what the reasoning behind the prohibition of char literals was?

Matthew


June 27, 2002
On Thu, 27 Jun 2002 11:20:58 +1000 "Matthew Wilson" <matthew@thedjournal.com> wrote:

> Pavel
> 
> Do you know what the reasoning behind the prohibition of char literals was?

The reason was to simplify the language (no need to remember where single quotes are required and where double ones), and to free single quotes for another purpose, I think. As a reminder, in D single-quoted string literals don't support escape characters, so they are good for writing pathnames:

was   "C:\\bla\\bla\\bla"
now   'C:\bla\bla\bla'

June 27, 2002
How do you embed a single quote into a single-quoted string then?

And this difference alone makes people have to remember the difference between a single- and double-quoted string.

I liked C's char literals better.  At least you always knew what you were getting.

Sean


June 27, 2002
On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer@earthlink.net> wrote:

> How do you embed a single quote into a single-quoted string then?

'this is a' \' 'quoted' \' 'string'

> And this difference alone makes people have to remember the difference between a single- and double-quoted string.

But it's quite common. I've seen it in PHP before, and somewhere else,
I just don't remember. Besides, it's just so convenient for pathnames
under Windows. And if you don't like it, you can just always use
double quotes, after all.

> I liked C's char literals better.  At least you always knew what you were getting.

It _might_ seem confusing (it did so to me at first), but it turns out you get used to it rather quickly. BTW, Pascal programmers have been using that kind of thing for more than fifteen years, and I haven't heard anyone complain!

June 27, 2002
Hi,

"Pavel Minayev" <evilone@omen.ru> wrote in message news:CFN374346986391204@news.digitalmars.com...
> But it's quite common. I've seen it in PHP before, and somewhere else,

The UNIX shell does the same thing.

Regards,
Martin M. Pedersen



June 27, 2002
"Pavel Minayev" <evilone@omen.ru> wrote in message news:CFN374346986391204@news.digitalmars.com...
> On Thu, 27 Jun 2002 02:04:31 -0700 "Sean L. Palmer" <seanpalmer@earthlink.net> wrote:
>
> > How do you embed a single quote into a single-quoted string then?
>
> 'this is a' \' 'quoted' \' 'string'
>
> > And this difference alone makes people have to remember the difference between a single- and double-quoted string.
>
> But it's quite common. I've seen it in PHP before, and somewhere else, I just don't remember. Besides, it's just so convenient for pathnames under Windows. And if you don't like it, you can just always use double quotes, after all.
>
> > I liked C's char literals better.  At least you always knew what you were
> > getting.
>
> It _might_ seem confusing (it did so to me at first), but it turns out you get used to it rather quickly. BTW, Pascal programmers have been using that kind of thing for more than fifteen years, and I haven't heard anyone complain!


I started with Pascal before I used C and was
rather surprised that you had to use different
quotes for characters than for strings.
Pascal handles strings a lot better than C,
so I thought that it was like an 'advanced'
feature!  :)

I kinda liked the C style though, but this is
even better. I do a little PHP programming,
and being able to turn escaping on and off at
will is really convenient!


--
Stijn
OddesE_XYZ@hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail


