[Issue 203] New: std.format.doFormat() pads width incorrectly on Unicode strings
June 17, 2006
http://d.puremagic.com/issues/show_bug.cgi?id=203

           Summary: std.format.doFormat() pads width incorrectly on Unicode
                    strings
           Product: D
           Version: 0.160
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Keywords: wrong-code
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla@digitalmars.com
        ReportedBy: deewiant@gmail.com


import std.string;

void main() {
        assert(format("%8s", "foo")             == "     foo");
        assert(format("%8s", "foobar")          == "  foobar");
        assert(format("%8s", "hello")           == "   hello");
        assert(format("%8s", "h\u00e9ll\u00f4") == "   h\u00e9ll\u00f4");
        // this passes, though it shouldn't:
        // assert(format("%8s", "h\u00e9ll\u00f4") == " h\u00e9ll\u00f4");
}
--

In the above, the last assertion fails.

One would expect the last two strings, having five characters each, to both be padded in the front by three spaces: however, it appears the byte count is being used for determining the length and not the actual character count, and so the last string is padded by only one space.
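
To make the byte-versus-character discrepancy concrete, here is a minimal sketch of the two counts for the strings in the report. It assumes a current D2 compiler and Phobos rather than the DMD 0.160 of the report, and uses std.range.walkLength, which walks a string by decoded code point.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main() {
    string plain    = "hello";
    string accented = "h\u00e9ll\u00f4";

    // .length on a string counts UTF-8 code units (bytes).
    writeln(plain.length);        // 5
    writeln(accented.length);     // 7: U+00E9 and U+00F4 take two bytes each

    // walkLength iterates the string by decoded code point.
    writeln(plain.walkLength);    // 5
    writeln(accented.walkLength); // 5
}
```

With a field width of 8, padding by byte count leaves 8 - 7 = 1 space for the accented string, while padding by code point count leaves 8 - 5 = 3, which matches the behaviour described above.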


-- 

April 29, 2007
http://d.puremagic.com/issues/show_bug.cgi?id=203


thomas-dloop@kuehne.cn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|wrong-code                  |spec
         OS/Version|Windows                     |All




------- Comment #1 from thomas-dloop@kuehne.cn  2007-04-29 02:09 -------
> One would expect the last two strings, having five characters each, to both be padded in the front by three spaces: however, it appears the byte count is being used for determining the length and not the actual character count, and so the last string is padded by only one space.

The only relevant documentation I found is:
> Width
>    Specifies the minimum field width. If the width is a *, the next
>    argument, which must be of type int, is taken as the width. If
>    the width is negative, it is as if the - was given as a Flags
>    character.

"field width" could be interpreted as either "byte length" or "UTF codepoint count".


-- 

June 24, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=203





------- Comment #2 from bugzilla@digitalmars.com  2008-06-24 01:57 -------
I suggest it's codepoint count, as field width is for display purposes.
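
As an illustration of the codepoint-count interpretation, the sketch below right-justifies a string in a field measured in code points. The helper padLeftByCodePoints is hypothetical and is not the actual Phobos change; it assumes a current D2 toolchain and only handles the space-padded, right-justified case.

```d
import std.array : replicate;
import std.range : walkLength;

// Hypothetical helper, not part of Phobos: right-justify `s` in a field of
// at least `width` code points, padding with spaces on the left.
string padLeftByCodePoints(string s, size_t width) {
    immutable n = s.walkLength;      // count code points, not bytes
    if (n >= width)
        return s;                    // width is a minimum; never truncate
    return replicate(" ", width - n) ~ s;
}

void main() {
    assert(padLeftByCodePoints("hello", 8) == "   hello");
    assert(padLeftByCodePoints("h\u00e9ll\u00f4", 8) == "   h\u00e9ll\u00f4");
    assert(padLeftByCodePoints("foobarbaz", 8) == "foobarbaz");
}
```

Note that a code point count is still only an approximation of display width: combining marks and double-width characters can make the two differ.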


-- 

July 10, 2008
http://d.puremagic.com/issues/show_bug.cgi?id=203


bugzilla@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Comment #3 from bugzilla@digitalmars.com  2008-07-09 22:30 -------
Fixed in dmd 1.032 and 2.016.


--