[Issue 5257] New: std.algorithm.count works incorrectly with UTF8 and UTF16 strings
November 22, 2010
http://d.puremagic.com/issues/show_bug.cgi?id=5257

           Summary: std.algorithm.count works incorrectly with UTF8 and
                    UTF16 strings
           Product: D
           Version: D2
          Platform: Other
        OS/Version: Mac OS X
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody@puremagic.com
        ReportedBy: andrei@metalanguage.com


--- Comment #0 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-11-22 10:54:01 PST ---
import std.stdio;
import std.algorithm;

void main() {
  writeln(count!("true")("日本語")); // Three characters.
}

The code prints 9 but should print 3.
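A short sketch of where the 9 comes from. It assumes a present-day D compiler and Phobos, where the range primitives decode narrow strings to dchar. `.length` counts code units, while `walkLength` counts code points. Each of 日, 本 and 語 takes 3 UTF-8 code units, so the UTF-8 string has 9 code units but only 3 code points. In UTF-16 these characters take one code unit each, so a UTF-16 count only diverges for characters outside the BMP, which need surrogate pairs.

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string  s8  = "日本語";   // UTF-8: each of these three characters is 3 code units
    wstring s16 = "日本語"w;  // UTF-16: each is 1 code unit (all are in the BMP)
    dstring s32 = "日本語"d;  // UTF-32: always 1 code unit per code point

    // .length counts code units, not characters.
    writeln(s8.length, " ", s16.length, " ", s32.length);               // 9 3 3

    // walkLength iterates with front/popFront, which decode narrow strings
    // to dchar, so it counts code points.
    writeln(walkLength(s8), " ", walkLength(s16), " ", walkLength(s32)); // 3 3 3

    // UTF-16 also miscounts once a character needs a surrogate pair.
    wstring clef = "\U0001D11E"w; // MUSICAL SYMBOL G CLEF
    writeln(clef.length, " ", walkLength(clef));                         // 2 1
}
```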

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
November 22, 2010


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|nobody@puremagic.com        |andrei@metalanguage.com


--- Comment #1 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-11-22 10:54:48 PST ---
Submitted on behalf of Rainer Deyke.

November 22, 2010


jakobovrum@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakobovrum@gmail.com


--- Comment #2 from jakobovrum@gmail.com 2010-11-22 12:28:48 PST ---
This is almost entirely off-topic, but I don't think such a tiny change deserves its own issue... sorry if it does :(

When this gets fixed, count() will be useful as a generic way to count the number of code points in a UTF-encoded string. But I don't think the interface is very pretty for this simple use case.

As a completely non-breaking change, how about changing:
size_t count(alias pred, Range)(Range r) if (isInputRange!(Range))

to:
size_t count(alias pred = "true", Range)(Range r) if (isInputRange!(Range))

So one could simply do count("日本語")?
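A minimal sketch of the proposed interface. It uses a hypothetical `myCount` so it does not clash with Phobos, and it assumes present-day Phobos, where `front`/`popFront` from `std.range.primitives` decode narrow strings. The range is walked with those primitives rather than a plain `foreach`, which over a `string` would yield `char` code units and reproduce the miscount.

```d
import std.functional : unaryFun;
import std.range.primitives : isInputRange, empty, front, popFront;
import std.stdio : writeln;

// Hypothetical stand-in for count with the proposed default predicate "true".
size_t myCount(alias pred = "true", Range)(Range r)
    if (isInputRange!Range)
{
    size_t result = 0;
    // empty/front/popFront (not foreach) so that narrow strings are walked
    // as decoded dchar code points rather than as char/wchar code units.
    for (; !r.empty; r.popFront())
    {
        if (unaryFun!pred(r.front))
            ++result;
    }
    return result;
}

void main()
{
    writeln(myCount("日本語"));             // 3: default predicate counts every code point
    writeln(myCount!"a == '本'"("日本語")); // 1
    writeln(myCount!"a > 127"("héllo"));    // 1: only 'é' is non-ASCII
}
```

The defaulted alias parameter can sit before `Range` because `Range` is always deduced from the argument, so existing calls such as `myCount!"a > 127"(r)` keep working.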

November 24, 2010


Masahiro Nakagawa <repeatedly@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei@metalanguage.com     |repeatedly@gmail.com


--- Comment #3 from Masahiro Nakagawa <repeatedly@gmail.com> 2010-11-24 07:18:51 PST ---
Created an attachment (id=831)
Patch for this issue.

I wrote a simple patch. It decodes strings of each character type to dchar and passes each decoded code point to the predicate.
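Attachment 831 is not reproduced in this thread. The following is only a sketch in the spirit of its description. It uses a hypothetical helper `countDecoded` that explicitly decodes each code point with `std.utf.decode` before applying the predicate, and it assumes present-day Phobos.

```d
import std.functional : unaryFun;
import std.stdio : writeln;
import std.traits : isSomeString;
import std.utf : decode;

// Hypothetical helper, not the code from attachment 831: it walks a string of
// any width (char, wchar or dchar) one code point at a time.
size_t countDecoded(alias pred, S)(S s)
    if (isSomeString!S)
{
    size_t result = 0;
    size_t i = 0;
    while (i < s.length)
    {
        // decode reads one full code point starting at s[i] and advances i
        // past it; it throws a UTFException on malformed input.
        dchar c = decode(s, i);
        if (unaryFun!pred(c))
            ++result;
    }
    return result;
}

void main()
{
    writeln(countDecoded!"true"("日本語"));      // 3
    writeln(countDecoded!"true"("日本語"w));     // 3
    writeln(countDecoded!"a == '語'"("日本語")); // 1
}
```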

November 25, 2010


Andrei Alexandrescu <andrei@metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


--- Comment #4 from Andrei Alexandrescu <andrei@metalanguage.com> 2010-11-25 14:51:45 PST ---
Thanks, Masahiro. I fixed it by simpler means that don't need special casing.

November 26, 2010



--- Comment #5 from Masahiro Nakagawa <repeatedly@gmail.com> 2010-11-25 21:48:06 PST ---
(In reply to comment #4)
> Thanks, Masahiro. I fixed it by simpler means that don't need special casing.

Good!

Are you going to deprecate std.utf.count?
std.algorithm.count (whose default pred is now "true") and std.utf.count seem to
duplicate each other.
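For comparison, a sketch assuming present-day Phobos. There, `std.algorithm.count` has an overload that takes no predicate at all, and `std.utf.count` still exists, so all three calls below report 3 code points for the example string. `std.utf` is imported under a renamed alias so that its `count` does not collide with the one from `std.algorithm`.

```d
import std.algorithm : count;
import std.stdio : writeln;
import utf = std.utf;

void main()
{
    auto s = "日本語";
    writeln(count!"true"(s)); // 3: explicit always-true predicate, as in the original report
    writeln(count(s));        // 3: predicate-free overload
    writeln(utf.count(s));    // 3: std.utf.count counts code points
}
```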
