== for char[] - broken - D Programming Language Discussion Forum

October 11, 2003

== for char[] - broken

Posted by Matthew Wilson

This is something that's going to come up again and again.

I'm just writing some unittests for the registry module, along the lines of

    // (ii) Catch that can throw and be caught by Exception
    {
        char[]  message =   "Test 2";
        int     code    =   3;
        char[]  string  =   "Test 2 (3)";

        try
        {
            throw new Win32Exception("Test 2", code);
        }
        catch(Exception x)
        {
            if(string != x.toString())
            {
                printf( "UnitTest failure for Win32Exception:\n"
                        "  x.toString() [%d;\"%.*s\"] does not equal [%d;\"%.*s\"]\n"
                    ,   x.toString().length, x.toString()
                    ,   string.length, string);
            }
            assert(string == x.toString());
        }
    }

The test fires with


UnitTest failure for Win32Exception:
  x.toString() [30;"Test 2 (3)"] does not equal [10;"Test 2 (3)"]
Error: Assertion Failure registry(180)

In other words, the strings are the same, but the arrays are not. I understand what's going on here, and why the current support for arrays of char is implemented as it is, but this is just going to keep recurring: you can't expect people to use char[] for strings and not apply == and != to them. It's just asking more than human nature can give.

Possible solutions:

1. Always ensure that the length of a char[] represents the C-style length. This would not allow a null terminator (so it'd be pretty hard to make it work, no?), not to mention proscribing the often useful technique of (pre-)allocating beyond the current extent.
2. Implement == and != differently for char[] than for the other arrays. After all, if you're using char[] to hold bytes (not characters), you're ([mod: expletives deleted]) and not availing yourselves of the full power of D's extended (over C and C++) types.
3. Provide a separate string class that works in a "sensible" way.

Since 1 is unworkable, and I'm out of inspiration, let's discuss 2 and 3. Unless I'm in a minority of one again in believing this is another imperfection in the language that's going to be a constant source of problems.

October 11, 2003

Re: == for char[] - broken

Posted by Charles Sanders
in reply to Matthew Wilson

Eeep! I definitely think this deserves top priority. Although I'm a little confused: why is the length of x.toString() 30? And also, wouldn't this problem apply to other arrays of any type?


October 11, 2003

Re: == for char[] - broken

Posted by Matthew Wilson
in reply to Charles Sanders

> Eeep! I definitely think this deserves top priority.

Agreed

> Although I'm a little confused: why is the length of x.toString() 30?

I have no idea. It will be something inside the exception infrastructure that does it.

But the fact is, it is easy to have a "string" lie about its length:

char[] s = "A nice string";
char[] badChar = new char[1];
badChar[0] = 0;

for(int i = 0; i < 10; ++i)
{
  s ~= badChar;
}

printf("%d;%.*s\n", s.length, s);

What do you think that should print? You get

    "23;A nice string" - kind of nasty

> And also, wouldn't this problem apply to other arrays of any type?

No, because other types don't use 0 as a terminal marker, as character strings do.


October 11, 2003

Re: == for char[] - broken

Posted by Walter
in reply to Matthew Wilson

What is happening here is an attempt to use two different representations of strings at once: one with an explicit length, and one with 0 termination. If you are going to use both in the same array, you'll need to set the explicit length properly.

You're also seeing an artifact of printf's "%.*s" format where the * is
taken to be the maximum length, not the minimum length. printf is still a C
function, and quits when it sees a 0 byte. What your string really is is:
    "Test 2 (3)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
and a D printf would print it that way.
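
To make that concrete, here is a minimal sketch, assuming a current D compiler (where string literals are immutable and C's printf needs an explicit precision and pointer). The helper name `showNuls` is hypothetical and introduced only for illustration. It builds a 30-element array like the one in the failing test and prints it both ways.

```d
import core.stdc.stdio : printf;
import std.stdio : writeln;

/// Hypothetical helper: returns a copy of `s` with every 0 byte spelled out as `\0`.
string showNuls(const(char)[] s)
{
    string result;
    foreach (char c; s)
    {
        if (c == '\0')
            result ~= "\\0";
        else
            result ~= c;
    }
    return result;
}

void main()
{
    // "Test 2 (3)" followed by 20 zero bytes: 30 elements in total.
    char[] padded = "Test 2 (3)".dup;
    padded.length = 30;
    padded[10 .. $] = '\0';

    // C printf: the precision (30) is only an upper bound; output stops at the first 0 byte.
    printf("[%d;\"%.*s\"]\n", cast(int) padded.length, cast(int) padded.length, padded.ptr);

    // Printing every element shows what the array really holds.
    writeln(showNuls(padded));

    // The D comparison looks at all 30 elements, so this is not equal to the 10-char text.
    assert(padded != "Test 2 (3)");
}
```

The printf line reproduces the misleading `[30;"Test 2 (3)"]` from the original report, while the second line makes the twenty padding bytes visible.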

Alternatively, what you can do is:
1) Use char[] for a D string.
2) Use char* for a C string.
Which should be clear to anyone examining the code.

More comments embedded.

"Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bm9tdq$26up$1@digitaldaemon.com...
> This is something that's going to come up again and again.
>
> I'm just writing some unittests for the registry module, along the lines of
>
>     // (ii) Catch that can throw and be caught by Exception
>     {
>         char[]  message =   "Test 2";
>         int     code    =   3;
>         char[]  string  =   "Test 2 (3)";
>
>         try
>         {
>             throw new Win32Exception("Test 2", code);
>         }
>         catch(Exception x)
>         {
>             if(string != x.toString())
>             {
>                 printf( "UnitTest failure for Win32Exception:\n"
>                         "  x.toString() [%d;\"%.*s\"] does not equal [%d;\"%.*s\"]\n"
>                     ,   x.toString().length, x.toString()
>                     ,   string.length, string);
>             }
>             assert(string == x.toString());
>         }
>     }
>
> The test fires with
>
>
> UnitTest failure for Win32Exception:
>   x.toString() [30;"Test 2 (3)"] does not equal [10;"Test 2 (3)"]
> Error: Assertion Failure registry(180)
>
> In other words, the strings are the same, but the arrays are not.

No, the strings are not the same. printf just quit when it saw a '\0'. To compare null terminated strings, slice them to set the length properly, or use strcmp().

> I understand what's going on here, and why the current support for arrays of char is implemented as it is, but this is just going to keep recurring: you can't expect people to use char[] for strings and not apply == and != to them. It's just asking more than human nature can give.
>
> Possible solutions:
>
> 1. Always ensure that the length of a char[] represents the C-style length. This would not allow a null terminator (so it'd be pretty hard to make it work, no?),

Not a problem, since slices work!
    string=string[0..strlen(string)];
takes a slice of the existing array; the terminating 0 does not go away. The .length property of an array is *not* the allocated length, it is only guaranteed to be <= the allocated length.

> not to mention proscribing the often useful technique of
> (pre-)allocating beyond the current extent.

Not at all. The .length is not the allocated length of an array. It's the length of a slice of the allocated memory. Don't worry about the allocated length; the gc will manage that for you. Only by explicitly changing the .length property is it possible that the allocated length changes; merely doing a slice is guaranteed not to affect the allocated length. (And it could not, otherwise the semantics and usefulness of slices would completely disintegrate.)
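
The slicing behaviour described here can be checked directly. This is a minimal sketch, assuming a current D compiler; `upToNul` is a hypothetical helper name, and it assumes the buffer really does contain a 0 terminator (strlen would otherwise read past the end).

```d
import core.stdc.string : strlen;

/// Hypothetical helper: a view of `buffer` up to, but not including, the first 0 byte.
/// Assumption: `buffer` contains at least one 0 byte.
inout(char)[] upToNul(inout(char)[] buffer)
{
    return buffer[0 .. strlen(buffer.ptr)];
}

void main()
{
    char[30] storage = '\0';            // zero-filled backing store
    storage[0 .. 10] = "Test 2 (3)";    // 10 meaningful characters

    char[] whole = storage[];           // length 30
    char[] text  = upToNul(whole);      // length 10

    assert(text.length == 10);
    assert(text == "Test 2 (3)");
    assert(text.ptr is whole.ptr);      // same memory: the slice copied nothing
    assert(whole.length == 30);         // the original view is unchanged
    assert(whole[text.length] == '\0'); // the terminator is still there, just past the slice
}
```

Because the slice is only a pointer and a length over the same memory, the terminator survives and the buffer can still be handed to a C function afterwards.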

> 2. Implement == and != differently for char[] than for the other arrays. After all, if you're using char[] to hold bytes (not characters), you're ([mod: expletives deleted]) and not availing yourselves of the full power of D's extended (over C and C++) types.

I think that will cause more confusion. Consistency is worth a great deal.

> 3. Provide a separate string class that works in a "sensible" way.

I can see if someone wants a C string class, but I'd call it Cstrings or ASCIZ strings.

> Since 1 is unworkable, and I'm out of inspiration, let's discuss 2 and 3. Unless I'm in a minority of one again in believing this is another imperfection in the language that's going to be a constant source of problems.

There will always be some details to deal with when trying to use two
different string representations with one format. The solution is to use D
strings throughout the program, only converting to a C string when calling a
C API function using toStringz(), and when receiving a string from a C API
function, immediately convert it to a D string using:
    string = string[0..strlen(string)];
and I think the problems you're having will disappear. Alternatively, use
char[] for D strings, and char* for C strings.
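
As a rough illustration of that boundary discipline, here is a minimal sketch assuming a current D compiler and Phobos. `strcpy` merely stands in for "some C API that fills a buffer", and `fromStringz` from std.string is used as the library equivalent of the strlen slice.

```d
import core.stdc.string : strcpy, strlen;
import std.string : fromStringz, toStringz;

void main()
{
    string greeting = "Test 2";

    // D -> C: toStringz yields a pointer to 0-terminated character data.
    const(char)* cGreeting = greeting.toStringz;
    assert(strlen(cGreeting) == greeting.length);

    // Stand-in for a C API call that writes a 0-terminated string into a buffer.
    char[32] buffer = '\0';
    strcpy(buffer.ptr, cGreeting);

    // C -> D: slice up to the terminator immediately, so the length is right from here on.
    const(char)[] back = fromStringz(buffer.ptr);
    assert(back.length == 6);
    assert(back == greeting);        // D comparison now behaves as expected
    assert(buffer[] != greeting);    // the unsliced 32-element buffer does not compare equal
}
```

The last two assertions show both sides of the thread's problem: the sliced view compares equal, the raw buffer does not.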

I should make a FAQ entry for this. <g>

October 12, 2003

Re: == for char[] - broken

Posted by Matthew Wilson
in reply to Walter

Walter

You've mistaken what I was saying.

Of course I understand that a D string can contain embedded NULLs, and I *assure* you that I grok the difference between a C and a D string.

You're actually explaining the reverse situation. If you look at the example code, you can quite clearly see that the problem is the reverse of what (I think that) you think I'm thinking.

The constructor for a Win32Exception does the following:

    this(char[] message, int error)
    {
        char    sz[24];

        wsprintfA(sz, " (%d)", error);

        m_message = message;
        m_error = error;

        super(message ~ sz);
    }

In the invariant, the code passes "Test 2" and 3 to the ctor, and then checks that the message from the caught exception is "Test 2 (3)".

If it does this by comparing them as D strings, the comparison fails. This is demonstrated by the fact that the assertion fails but printf prints them as equal. As you say, it should print

    "Test 2 (3)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"

The problem is, I've written the Win32Exception ctor, and I've written the test case. In no part of the code have I requested any extra storage, and yet the two strings are not equal! Hence, I cannot trust D strings to be strings, or at least the current implementation of the exception-handling mechanism is broken.

This is the specific case. In the general case, to which I admit much of your response is persuasive, I still think there is a problem, but that is more of a user expectation than a bona fide flaw.

Can you address my specific problem? My workaround has been to use the function string_equal(), but that chews

boolean string_equal(char[] s1, char[] s2)
{
    return 0 == strcmp(toStringz(s1), toStringz(s2));
}

Matthew


October 12, 2003

Re: == for char[] - broken

Posted by Matthew Wilson
in reply to Matthew Wilson

Gah!

Hoisted again.

Does

       super(message ~ sz);

add the whole length of sz? I guess the answer is a sorry yes.
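
A minimal sketch of why the answer is yes, assuming a current D compiler (the buffer is zero-filled explicitly here, and `sz[]` denotes the whole 24-element array as a slice): concatenation uses the operand's .length, not the position of any 0 byte.

```d
void main()
{
    // A 24-element buffer holding " (3)" followed by zero padding.
    char[24] sz = '\0';
    sz[0 .. 4] = " (3)";

    const(char)[] message = "Test 2";

    // Appending the whole buffer appends all 24 elements, padding included.
    const(char)[] whole = message ~ sz[];

    // Appending a slice appends only the 4 characters that were written.
    const(char)[] used = message ~ sz[0 .. 4];

    assert(whole.length == 30);     // 6 + 24, the length seen in the failing test
    assert(used.length == 10);      // 6 + 4
    assert(used == "Test 2 (3)");
    assert(whole != "Test 2 (3)");
}
```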

Well, the specific case is answered, but the general one gathers weight. :(

(And I look like a tool in public once again ...)

October 12, 2003

Re: == for char[] - broken

Posted by Matthew Wilson
in reply to Matthew Wilson

Yep, that was it. Thank goodness we can slice a C array!

    this(char[] message, int error)
    {
        char    sz[24]; // Enough for the three " ()" characters and a 64-bit integer value
        int     cch = wsprintfA(sz, " (%d)", error);

        m_message = message;
        m_error   = error;

        super(message ~ sz[0 .. cch]);
    }

Now it works correctly.
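
For readers on a current D compiler, a rough equivalent of this fix might look like the sketch below. Everything in it is an assumption made for illustration: `CodedException` is a hypothetical stand-in for Win32Exception, C's `snprintf` replaces `wsprintfA`, and the test checks `.msg` rather than `toString()`.

```d
import core.stdc.stdio : snprintf;

/// Hypothetical stand-in for the thread's Win32Exception.
class CodedException : Exception
{
    int error;

    this(string message, int error)
    {
        super(message ~ suffix(error));
        this.error = error;
    }

    /// Formats " (<error>)" and returns only the characters actually written.
    private static string suffix(int error)
    {
        char[24] sz = '\0';
        const int cch = snprintf(sz.ptr, sz.length, " (%d)", error);
        // Formatting succeeded and the result fit in the buffer.
        assert(cch >= 0 && cast(size_t) cch < sz.length);
        return sz[0 .. cch].idup;   // slice first, then copy: no padding travels along
    }
}

void main()
{
    try
    {
        throw new CodedException("Test 2", 3);
    }
    catch (Exception x)
    {
        assert(x.msg.length == 10);
        assert(x.msg == "Test 2 (3)");
    }
}
```

As in the fix above, the key step is slicing the buffer by the count the formatting call returns before it is concatenated.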

I still maintain that this is a nasty waiting to catch the unwary. Maybe the answer is to require that the ~/~= operators take a D array, or a C-array slice, rather than a C array or char*. Is that workable?

"Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bma5vq$2ier$1@digitaldaemon.com...
> Gah!
>
> Hoisted again.
>
> Does
>
>        super(message ~ sz);
>
> add the whole length of sz? I guess the answer is a sorry yes.
>
> Well, the specifc case is answered, but the general one gathers weight. :(
>
> (And I look like a tool in public once again ...)
>
> "Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bma5kj$2hv1$1@digitaldaemon.com...
> > Walter
> >
> > You've mistaken what I was saying.
> >
> > Of course I understand that a D string can contain embedded NULLs, and I *assure* you that I grok the difference between a C and a D string.
> >
> > You're actually explaining the reverse situation. If you look at the
> example
> > code, you can quite clearly see that the problem is the reverse of what
(I
> > think that) you think I'm thinking.
> >
> > The constructor for a Win32Exception does the following:
> >
> >     this(char[] message, int error)
> >     {
> >         char    sz[24];
> >
> >         wsprintfA(sz, " (%d)", error);
> >
> >         m_message = message;
> >         m_error = error;
> >
> >         super(message ~ sz);
> >     }
> >
> > In the invariant, the code passes "Test 2" and 3 to the ctor, and then checks that the message from the caught exception is "Test 2 (3)".
> >
> > If it does this by comparing them as D strings, the comparison fails.
This
> > is demonstrated by the fact that the assertion fails but printf prints
> them
> > as equal. As you say, it should print
> >
> >     "Test 2 (3)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
> >
> > The problem is, I've written the Win32Exception ctor, and I've written
the
> > test case. In no part of the code have I requested any extra storage,
and
> > yet the two strings are not equal! Hence, I cannot trust D strings to be strings, or at least the current implementation of the
exception-handling
> > mechanism is broken.
> >
> > This is the specific case. In the general case, to which I admit much of your response is persuasive, I still think there is a problem, but that
is
> > more of a user expectation than a bona fide flaw.
> >
> > Can you address my specific problem? My workaround has been to use the function string_equal(), but that chews
> >
> > boolean string_equal(char[] s1, char[] s2)
> > {
> >     return 0 == strcmp(toStringz(s1), toStringz(s2));
> > }
> >
> > Matthew

October 12, 2003

Re: == for char[] - broken

Posted by Walter
in reply to Matthew Wilson


You *are* allocating extra storage - the
    char sz[24];
creates a string 24 bytes long. The fix is to rewrite the super constructor
call from:
    super(message ~ sz);
to:
    super(message ~ sz[0..strlen(sz)]);
and it should work fine.
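
The difference between the two calls can be sketched as follows. This is a minimal illustration in present-day D syntax, not the original registry code: the helper names `appendWholeBuffer` and `appendSliced` are hypothetical, and the portable `snprintf` stands in for the Win32-only `wsprintfA`.

```d
import core.stdc.stdio : snprintf;
import core.stdc.string : strlen;

// Hypothetical helper: appends the entire 24-byte buffer, padding included.
auto appendWholeBuffer(string message, int error)
{
    char[24] sz = '\0';                          // zero-filled fixed-size buffer
    snprintf(sz.ptr, sz.length, " (%d)", error);
    return message ~ sz[];                       // all 24 bytes are appended
}

// Hypothetical helper: appends only the bytes before the first 0.
auto appendSliced(string message, int error)
{
    char[24] sz = '\0';
    snprintf(sz.ptr, sz.length, " (%d)", error);
    return message ~ sz[0 .. strlen(sz.ptr)];    // slice stops at the terminator
}

void main()
{
    auto padded  = appendWholeBuffer("Test 2", 3);
    auto trimmed = appendSliced("Test 2", 3);

    assert(padded.length == 30);      // 6 + 24, the length reported in the test
    assert(padded != "Test 2 (3)");   // trailing zeros take part in ==
    assert(trimmed.length == 10);
    assert(trimmed == "Test 2 (3)");
}
```

The slice does not copy or shrink the buffer; it only narrows the view that the concatenation sees.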

"Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bma5kj$2hv1$1@digitaldaemon.com...
> Walter
>
> You've mistaken what I was saying.
>
> Of course I understand that a D string can contain embedded NULLs, and I *assure* you that I grok the difference between a C and a D string.
>
> You're actually explaining the reverse situation. If you look at the example code, you can quite clearly see that the problem is the reverse of what (I think that) you think I'm thinking.
>
> The constructor for a Win32Exception does the following:
>
>     this(char[] message, int error)
>     {
>         char    sz[24];
>
>         wsprintfA(sz, " (%d)", error);
>
>         m_message = message;
>         m_error = error;
>
>         super(message ~ sz);
>     }
>
> In the invariant, the code passes "Test 2" and 3 to the ctor, and then checks that the message from the caught exception is "Test 2 (3)".
>
> If it does this by comparing them as D strings, the comparison fails. This is demonstrated by the fact that the assertion fails but printf prints them as equal. As you say, it should print
>
>     "Test 2 (3)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"
>
> The problem is, I've written the Win32Exception ctor, and I've written the test case. In no part of the code have I requested any extra storage, and yet the two strings are not equal! Hence, I cannot trust D strings to be strings, or at least the current implementation of the exception-handling mechanism is broken.
>
> This is the specific case. In the general case, to which I admit much of your response is persuasive, I still think there is a problem, but that is more of a user expectation than a bona fide flaw.
>
> Can you address my specific problem? My workaround has been to use the function string_equal(), but that chews
>
> boolean string_equal(char[] s1, char[] s2)
> {
>     return 0 == strcmp(toStringz(s1), toStringz(s2));
> }
>
> Matthew

October 12, 2003

Re: == for char[] - broken

Posted by Walter
in reply to Matthew Wilson


"Matthew Wilson" <matthew@stlsoft.org> wrote in message news:bma69s$2ipi$1@digitaldaemon.com...
> I still maintain that this is a nasty waiting to catch the unwary. Maybe the answer is to require that the ~/~= operators take a D array, or a C-array slice, rather than a C array or char*. Is that workable?

I don't know how since there is no distinct C array type in D. I think a better solution is to replace all the C functions that deal with strings with corresponding D functions.
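
As a rough illustration of what a D replacement buys in this case, the message could be built without any fixed-size buffer. The sketch below assumes the present-day Phobos function `std.format.format`, which may not match what the library offered at the time of this thread; `win32Message` is a hypothetical helper name.

```d
import std.format : format;

// Hypothetical helper: builds the exception text as a D string directly,
// so its length is exactly the number of formatted characters.
string win32Message(string message, int error)
{
    return format("%s (%d)", message, error);
}

void main()
{
    auto text = win32Message("Test 2", 3);
    assert(text.length == 10);
    assert(text == "Test 2 (3)");   // no trailing zeros to upset ==
}
```

Because no zero-terminated buffer is involved, there is nothing to slice and the comparison with == behaves as expected.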

October 12, 2003

Re: == for char[] - broken

Posted by Hauke Duden
in reply to Walter


"Walter" <walter@digitalmars.com> wrote in message news:bmapvs$bdj$1@digitaldaemon.com...
> > I still maintain that this is a nasty waiting to catch the unwary. Maybe the answer is to require that the ~/~= operators take a D array, or a C-array slice, rather than a C array or char*. Is that workable?
>
> I don't know how since there is no distinct C array type in D. I think a better solution is to replace all the C functions that deal with strings with corresponding D functions.

That might be possible for functions in the runtime lib, but what about 3rd party C libraries?

As I understand it, one of the major design goals of D is to be able to easily interact with existing C code. And I agree with Matthew that the need for this kind of manual conversion is very error-prone. I think this is definitely a problem that needs to be addressed.

The more I think about it, the more I realize that we'll need a real string class that takes care of such issues. Maybe that class could always add a terminating zero that is not included in the length. That way we maintain compatibility with all the existing string libraries and can still pull off nifty stuff like having embedded zeros if you use your strings only with pure D code.
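
One way such a type might look is sketched below in present-day D. It is only an illustration of the idea, not an existing library class: the name `ZString` and its members `toD`, `toC` and `length` are all hypothetical.

```d
import core.stdc.string : strlen;

// Hypothetical string type: stores one extra '\0' after the payload,
// but never counts it in the reported length.
struct ZString
{
    private char[] buf;   // payload followed by a single terminating zero

    this(const(char)[] s)
    {
        buf = new char[s.length + 1];
        buf[0 .. s.length] = s[];   // copy the payload
        buf[$ - 1] = '\0';          // hidden terminator
    }

    // D view: excludes the terminator; embedded zeros are preserved.
    const(char)[] toD() const { return buf.length ? buf[0 .. $ - 1] : null; }

    // C view: zero-terminated, no copy (null for a default-initialized value).
    const(char)* toC() const { return buf.ptr; }

    size_t length() const { return buf.length ? buf.length - 1 : 0; }
}

void main()
{
    auto s = ZString("Test 2 (3)");
    assert(s.length == 10);
    assert(s.toD() == "Test 2 (3)");
    assert(strlen(s.toC()) == 10);   // usable by C code as-is

    auto e = ZString("a\0b");        // embedded zero survives on the D side
    assert(e.length == 3);
    assert(strlen(e.toC()) == 1);    // C code stops at the first zero
}
```

The last two assertions show the trade-off: embedded zeros work in pure D code, but any C function still sees only the part before the first zero.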

Another thing: what about OS functions? The whole Win32 API expects zero-terminated strings. And we cannot wrap everything into D functions, can we? It gets even worse with COM interfaces. Since these use an object oriented approach with a virtual function table, you cannot replace individual functions with wrappers. You would have to wrap the whole object with all its interfaces - which isn't possible, since you might not know all the interfaces it supports!

Hauke

Copyright © 1999-2021 by the D Language Foundation