Thread overview
Bug in countUntil?
Oct 12, 2012
monarch_dodra
Oct 12, 2012
Jonathan M Davis
Oct 12, 2012
monarch_dodra
Oct 13, 2012
Sönke Ludwig
October 12, 2012
I was looking in countUntil to fix another issue, and I think the string support is broken.

This program:
//----
import std.algorithm;
import std.stdio;

void main()
{
    "日本語".countUntil('本').writeln();
}
//----

Will produce "3".

...

I'd have straight up said it was a bug, but the implementation goes out of its way to special case narrow strings, when the default implementation would have produced the right result anyway. So I was thinking it is somehow by design...?

Am I missing something, or is it just implementation silliness?


October 12, 2012
On Friday, October 12, 2012 21:02:47 monarch_dodra wrote:
> I was looking in countUntil to fix another issue, and I think the string support is broken.
> 
> This program:
> //----
> import std.algorithm;
> import std.stdio;
> 
> void main()
> {
>      "日本語".countUntil('本').writeln();
> }
> //----
> 
> Will produce "3".
> 
> ...
> 
> I'd have straight up said it was a bug, but the implementation goes out of its way to special case narrow strings, when the default implementation would have produced the right result anyway. So I was thinking it is somehow by design...?
> 
> Am I missing something, or is it just implementation silliness?

Many algorithms special case narrow strings for efficiency. However, in this case, it looks just plain wrong. countUntil is supposed to return the number of elements (i.e. code points in this case), but it looks like it's returning the number of code units. So, I'd say that it's definitely wrong. If you want code units, then use std.string.indexOf. countUntil is supposed to return the number of code points.

- Jonathan M Davis
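
To illustrate the distinction drawn above, here is a minimal sketch (not the Phobos implementation) contrasting a code-unit offset with a code-point count for the string from the original post. The helper names `codeUnitOffset` and `codePointCount` are hypothetical and exist only for this example; `std.string.indexOf` and `std.range.walkLength` are the only library calls used.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: offset of `c` in `s`, measured in UTF-8 code units
// (-1 if `c` does not occur).
ptrdiff_t codeUnitOffset(string s, dchar c)
{
    return s.indexOf(c);
}

// Hypothetical helper: number of code points that precede the first `c`
// (-1 if `c` does not occur).
ptrdiff_t codePointCount(string s, dchar c)
{
    immutable offset = s.indexOf(c);
    if (offset < 0)
        return -1;
    // Walking the prefix as a range of dchar counts decoded characters,
    // not bytes.
    return cast(ptrdiff_t) s[0 .. cast(size_t) offset].walkLength;
}

void main()
{
    // Each character in "日本語" occupies three UTF-8 code units.
    writeln(codeUnitOffset("日本語", '本')); // 3
    writeln(codePointCount("日本語", '本')); // 1
}
```

The first value is what the narrow-string special case was reported to return; the second is what counting elements (code points) would give.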
October 12, 2012
On Friday, 12 October 2012 at 19:17:13 UTC, Jonathan M Davis wrote:
> On Friday, October 12, 2012 21:02:47 monarch_dodra wrote:
>> I was looking in countUntil to fix another issue, and I think the
>> string support is broken.
>> 
>> This program:
>> //----
>> import std.algorithm;
>> import std.stdio;
>> 
>> void main()
>> {
>>      "日本語".countUntil('本').writeln();
>> }
>> //----
>> 
>> Will produce "3".
>> 
>> ...
>> 
>> I'd have straight up said it was a bug, but the implementation
>> goes out of its way to special case narrow strings, when the
>> default implementation would have produced the right result
>> anyway. So I was thinking it is somehow by design...?
>> 
>> Am I missing something, or is it just implementation silliness?
>
> Many algorithms special case narrow strings for efficiency. However, in this
> case, it looks just plain wrong. countUntil is supposed to return the number
> of elements (i.e. code points in this case), but it looks like it's returning
> the number of code units. So, I'd say that it's definitely wrong. If you want
> code units, then use std.string.indexOf. countUntil is supposed to return the
> number of code points.
>
> - Jonathan M Davis

Yeah, that's what I thought, but I wanted it double-checked. I'll take care of it then.
October 13, 2012
On 10/12/2012 9:27 PM, monarch_dodra wrote:
> 
> Yeah, that's what I thought, but I wanted it double-checked. I'll take care of it then.

Just wanted to mention that this kind of subtle change in behavior can break a lot of code in non-obvious ways. In any case, the documentation for countUntil, but more importantly for (last)IndexOf, needs to state clearly what it does for narrow strings (the countUntil docs at least imply this by using the term "elements", but an explicit statement can do no harm).