Thread overview
[Issue 8384] New: Poor wchar/dchar* to string conversion support
Jul 13, 2012
Vladimir Panteleev
Jul 13, 2012
Jonathan M Davis
Jul 13, 2012
Vladimir Panteleev
Jul 13, 2012
Vladimir Panteleev
Jul 13, 2012
Jonathan M Davis
[Issue 8384] std.conv.to should allow conversion between any pair of string/wstring/dstring/char*/wchar*/dchar*
Jul 13, 2012
Vladimir Panteleev
Jul 13, 2012
Vladimir Panteleev
Jul 13, 2012
klickverbot
Jul 13, 2012
Jonathan M Davis
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Jonathan M Davis
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Jonathan M Davis
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Jonathan M Davis
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Vladimir Panteleev
Aug 15, 2012
Adam D. Ruppe
Aug 15, 2012
Jonathan M Davis
Aug 15, 2012
Vladimir Panteleev
Jan 13, 2013
Andrej Mitrovic
Jan 13, 2013
Andrej Mitrovic
July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384

           Summary: Poor wchar/dchar* to string conversion support
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Phobos
        AssignedTo: nobody@puremagic.com
        ReportedBy: thecybershadow@gmail.com


--- Comment #0 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-07-13 05:23:29 PDT ---
import std.conv;
import std.string;

unittest
{
    static void test(T)(T lp)
    {
        assert(format("%s", lp) == "Hello, world!");
        assert(to!string(lp)    == "Hello, world!");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

wchar* conversion is commonly needed for Windows programming, as UTF-16 is the native encoding for Unicode Windows API functions.
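
For readers who need this today, the conversion can be done by hand. The following is a minimal sketch, assuming the pointer refers to a valid null-terminated UTF-16 string; `fromWStringz` is a hypothetical helper name, not a Phobos function.

```d
import std.utf : toUTF8;

/// Hypothetical helper (not part of Phobos): converts a null-terminated
/// UTF-16 string, such as one returned by a Windows API call, to a D string.
string fromWStringz(const(wchar)* p)
{
    if (p is null)
        return null;

    size_t len = 0;
    while (p[len] != 0)   // scan for the terminating zero code unit
        ++len;

    return toUTF8(p[0 .. len]);   // transcode the UTF-16 slice to UTF-8
}

unittest
{
    // D string literals are null-terminated, so .ptr is safe to scan here.
    assert(fromWStringz("Hello, world!"w.ptr) == "Hello, world!");
}
```

The block can be checked with something like `dmd -unittest -main -run file.d`. Slicing the pointer first keeps the unsafe part (the length scan) in one place and leaves the transcoding to `std.utf`.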

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384


Jonathan M Davis <jmdavisProg@gmx.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmdavisProg@gmx.com


--- Comment #1 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-07-13 12:00:53 PDT ---
So, you expect %s on a pointer to give you the string that it points to? Why? It's a pointer, not a string. It's going to convert the pointer. That works as expected.

to!string should take a null-terminated string and give you a string, and it does that. This code passes:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!string(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}
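
One detail worth keeping in mind when reading the test above: in D, `assert(cond, msg)` treats its second argument as the failure message, so `assert(to!string(lp), "hello world")` only evaluates the conversion result as a condition and never compares it with the literal. A small sketch of the difference, limited to `char*` input; `contentsMatch` is a hypothetical helper name.

```d
import std.conv : to;

/// Hypothetical helper: true only if the C string at p converts to
/// exactly the expected contents.
bool contentsMatch(const(char)* p, string expected)
{
    return to!string(p) == expected;
}

unittest
{
    auto p = "Hello, world!".ptr;   // literals are null-terminated

    // Two-argument form: "hello world" is only the failure message, so
    // this passes whenever the condition (a non-empty result) holds.
    assert(to!string(p).length > 0, "hello world");

    // Comparison form: actually checks the converted contents.
    assert(contentsMatch(p, "Hello, world!"));
    assert(!contentsMatch(p, "hello world"));
}
```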

So, I'd say that as far as your code goes, there's nothing wrong with it. It functions exactly as expected. What _doesn't_ work is this:

import std.conv;
import std.string;

void main()
{
    static void test(T)(T lp)
    {
        assert(to!wstring(lp), "hello world");
        assert(to!dstring(lp), "hello world");
    }

    test("Hello, world!" .ptr);
    test("Hello, world!"w.ptr);
    test("Hello, world!"d.ptr);
}

The code doesn't even compile, giving these errors:

/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(819): Error: incompatible types for ((cast(immutable(dchar)[])_adDupT(&_D12TypeInfo_Aya6__initZ,value[cast(ulong)0..strlen(cast(const(char*))value)])) ? (null)): 'immutable(dchar)[]' and 'string'
/home/jmdavis/dmd2/linux/bin/../../src/phobos/std/conv.d(268): Error: template instance std.conv.toImpl!(immutable(dchar)[],immutable(char)*) error instantiating
q.d(8):        instantiated from here: to!(immutable(char)*)
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(8): Error: template instance std.conv.to!(immutable(dchar)[]).to!(immutable(char)*) error instantiating
q.d(11):        instantiated from here: test!(immutable(char)*)
q.d(11): Error: template instance q.main.test!(immutable(char)*) error instantiating
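
Until a direct conversion exists, one possible workaround for `char*` input is to go through `string` first. This is a sketch under that assumption, not the fix proposed in this report; the helper names are hypothetical, and `wchar*`/`dchar*` sources are not covered.

```d
import std.conv : to;

/// Hypothetical helpers: convert a null-terminated char* to a wider D
/// string in two steps (char* -> string -> wstring/dstring).
wstring cstrToWString(const(char)* p)
{
    return to!wstring(to!string(p));
}

dstring cstrToDString(const(char)* p)
{
    return to!dstring(to!string(p));
}

unittest
{
    assert(cstrToWString("Hello, world!".ptr) == "Hello, world!"w);
    assert(cstrToDString("Hello, world!".ptr) == "Hello, world!"d);
}
```

The intermediate `string` costs one extra allocation, which is usually acceptable at an API boundary.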

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #2 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-07-13 13:36:05 PDT ---
> to!string should take a null-terminated string and give you a string, and it does that. This code passes:

Is it something that was fixed recently (within the last two weeks)? My two-week-old dmd git build and dpaste still print offsets for wchar* and dchar*: http://dpaste.dzfl.pl/26a2b284

> So, you expect %s on a pointer to give you the string that it points to? Why?

I think that, before all else, we should be looking for good reasons why format("%s", foo) and to!string(foo) produce different results. Why should one format the offset and the other do a conversion?

Second, I believe that the principle of least surprise is making this case rather clear: if the programmer tries to print a char*, it's almost certain that they want to print the null-terminated string at the given address, rather than a hexadecimal representation of the address (which is rarely useful to the end-user). Generic code is the only exception I can think of, in which case a cast to void* is in order.
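
The two intents described here can be spelled out explicitly, whatever `%s` does with a bare `char*`. A minimal sketch with hypothetical helper names: convert with `to!string` when the text is wanted, and cast to a `void` pointer when the address is wanted.

```d
import std.conv : to;
import std.format : format;

/// Hypothetical helper: formats the text a C string points to.
string showText(const(char)* p)
{
    return format("%s", to!string(p));
}

/// Hypothetical helper: formats the address itself (value varies per run).
string showAddress(const(char)* p)
{
    return format("%s", cast(const(void)*) p);
}

unittest
{
    auto p = "Hello, world!".ptr;
    assert(showText(p) == "Hello, world!");
    assert(showAddress(p) != "Hello, world!");
}
```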

> What _doesn't_ work is this:

I think this should call the appropriate toUTFx functions from std.utf.

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #3 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-07-13 13:42:17 PDT ---
> I think this should call the appropriate toUTFx functions from std.utf.

Sorry about that, misread your example. I guess, ideally, conversion between any pair of {|w|d}{char*|string} should work.
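
As an illustration of what "any pair" could look like in user code, here is a sketch built from existing Phobos pieces. `fromCharPtr` is a hypothetical helper, not a proposed implementation for std.conv. The pointer-to-string direction scans for the terminator and lets `to` transcode the slice, and the string-to-pointer direction uses `std.utf.toUTFz`.

```d
import std.conv : to;
import std.traits : isPointer, isSomeChar, isSomeString, PointerTarget;
import std.utf : toUTFz;

/// Hypothetical helper: null-terminated pointer of any character width
/// to any D string type.
D fromCharPtr(D, P)(P p)
if (isSomeString!D && isPointer!P && isSomeChar!(PointerTarget!P))
{
    if (p is null)
        return null;

    size_t len = 0;
    while (p[len] != 0)   // find the terminating zero code unit
        ++len;

    return to!D(p[0 .. len]);   // std.conv.to transcodes between widths
}

unittest
{
    assert(fromCharPtr!wstring("Hello, world!".ptr) == "Hello, world!"w);
    assert(fromCharPtr!string("Hello, world!"d.ptr) == "Hello, world!");

    // Opposite direction: D string to null-terminated pointer.
    auto wp = toUTFz!(const(wchar)*)("Hello, world!");
    assert(fromCharPtr!dstring(wp) == "Hello, world!"d);
}
```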

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #4 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-07-13 13:59:09 PDT ---
format and writeln are supposed to behave the same, because they both operate on format strings (they _don't_ currently behave 100% the same, but format's current implementation will be replaced with the new xformat's implementation in a few months - after the "scheduled for deprecation" time period). to!string is an entirely different beast.

std.conv.to is asking for an explicit conversion to string, whereas format and writeln are converting according to the format specifiers, and %s indicates the default string representation of the type. char*, wchar*, and dchar* are pointers - _not_ strings - and should not be treated as strings. Pointers print their address with %s. Making char*, wchar*, and dchar* print themselves as strings would be inconsistent with other pointer types, and operating on char*, wchar*, and dchar* should be discouraged, not encouraged.

to!string is treated differently, because you're asking for an explicit conversion, and we _do_ need to be able to convert null-terminated strings to D strings.

So, while I can see your point, I really don't think that having format or writeln treat char*, wchar*, or dchar* as null-terminated strings is a good idea. We should provide a means of converting them to D strings but not do anything to encourage using them as-is without converting them.

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384


Vladimir Panteleev <thecybershadow@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Poor wchar/dchar* to string |std.conv.to should allow
                   |conversion support          |conversion between any pair
                   |                            |of
                   |                            |string/wstring/dstring/char
                   |                            |*/wchar*/dchar*


--- Comment #5 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-07-13 14:25:36 PDT ---
OK, fair enough.

I've updated the enhancement request's title according to my previous comment.

Test:

-----------------------------------------------------------------------------

import std.conv;

void test1(T)(T lp)
{
    test2!( string)(lp);
    test2!(wstring)(lp);
    test2!(dstring)(lp);
    test2!(  char*)(lp);
    test2!( wchar*)(lp);
    test2!( dchar*)(lp);
}

void test2(D, S)(S lp)
{
    D dest = to!D(lp);
    assert(to!string(dest) == "Hello, world!");
}

unittest
{
    test1("Hello, world!" );
    test1("Hello, world!"w);
    test1("Hello, world!"d);
    test1("Hello, world!" .ptr);
    test1("Hello, world!"w.ptr);
    test1("Hello, world!"d.ptr);
}

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #6 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-07-13 14:31:04 PDT ---
Oh, I forgot about constness.

I guess that raises the number of combinations to (2*3*3)^2 = 324.

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384


klickverbot <code@klickverbot.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code@klickverbot.at


--- Comment #7 from klickverbot <code@klickverbot.at> 2012-07-13 14:37:07 PDT ---
Hooray for using "static" foreach to conveniently enumerate all the cases to test!

July 13, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #8 from Jonathan M Davis <jmdavisProg@gmx.com> 2012-07-13 14:48:31 PDT ---
> Hooray for using "static" foreach to conveniently enumerate all the cases to test!

Yeah. I do that all of the time when I have to test with multiple types (especially with strings), and I always push for string-related tests to do that when I see that someone is looking to submit code to Phobos for a function that takes one or more strings as templated types, and their tests don't do that. It's just one of those things that everyone who writes much in the way of unit tests in D should learn and know about.
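
For readers unfamiliar with the idiom, a minimal sketch of it follows. It uses `AliasSeq` from `std.meta` as found in current Phobos (an assumption: at the time of this thread the equivalent was `TypeTuple` in std.typetuple), and it covers only the string-to-string pairs, which already convert. `countStringPairs` is a hypothetical name.

```d
import std.conv : to;
import std.meta : AliasSeq;

/// Hypothetical test driver: converts between every pair of D string
/// types and returns how many pairs were exercised.
size_t countStringPairs()
{
    size_t n = 0;

    // foreach over a type sequence is unrolled at compile time, so the
    // body is instantiated once per (S, D) pair.
    foreach (S; AliasSeq!(string, wstring, dstring))
    {
        foreach (D; AliasSeq!(string, wstring, dstring))
        {
            S src = "Hello, world!";
            D dst = to!D(src);
            assert(to!string(dst) == "Hello, world!");
            ++n;
        }
    }
    return n;
}

unittest
{
    assert(countStringPairs() == 9);
}
```

Adding the pointer types and the const/immutable variants to the two sequences extends the same loop to the larger matrix discussed above.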

August 15, 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8384



--- Comment #9 from Vladimir Panteleev <thecybershadow@gmail.com> 2012-08-15 13:24:08 PDT ---
Another case of confusion due to format treating C strings as pointers:

http://stackoverflow.com/q/11975353/21501

I still think that the current behavior, regardless of how much it makes sense from a design/consistency/orthogonality/etc. perspective, is simply not useful and fails the principle of least surprise in most expected cases.

I strongly believe that we should either forbid passing char pointers to format/writeln (and force the user to cast to void* or convert to a D string), or print them as C null-terminated strings.
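
The "forbid" option can be approximated in user code with a template constraint. The sketch below is an illustration only, with hypothetical names (`isCharPointer`, `checkedFormat`); it is not how std.format behaves. It rejects character pointers at compile time, so the caller has to convert to a D string or cast to void* first.

```d
import std.format : format;
import std.meta : anySatisfy;
import std.traits : isSomeChar, Unqual;

/// Hypothetical trait: true for char*, wchar*, dchar* and their
/// const/immutable variants.
template isCharPointer(T)
{
    static if (is(Unqual!T == U*, U))
        enum isCharPointer = isSomeChar!U;
    else
        enum isCharPointer = false;
}

/// Hypothetical wrapper around format that refuses character pointers.
string checkedFormat(Args...)(string fmt, Args args)
if (!anySatisfy!(isCharPointer, Args))
{
    return format(fmt, args);
}

unittest
{
    assert(checkedFormat("%s %s", "a", 1) == "a 1");

    // A bare char* no longer compiles; the caller must pick an intent.
    static assert(!__traits(compiles, checkedFormat("%s", "x".ptr)));
    static assert( __traits(compiles, checkedFormat("%s", cast(void*) null)));
}
```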
