March 22, 2005
I've seen this sort of code ...

  int  FooFind(char[] X, dchar D)
  {
    // foreach decodes the utf-8 in X into one dchar per iteration
    foreach(int i, dchar C; X)
    {
      if (C == D) return i;
    }
    return -1;
  }

Now I understand that the foreach correctly packages up the utf-8 codepoint fragments to form a valid utf32 character, but when the value of 'i' is returned, is it an index into the original utf-8 string or an index into the equivalent utf32 string? I'm pretty sure it's a utf-8 index, and that is a useful thing, as it tells you where in the original string the set of code fragments that make up the character begins. However, it doesn't tell you how many characters into the utf-8 string the searched-for character was found.
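To make the distinction concrete, here is a minimal sketch, assuming a current D compiler (hence const(char)[] and size_t rather than the char[] and int above). The name byteIndexOf is made up for illustration; it is just the routine above under another name.

```d
import std.stdio;

// Hypothetical helper, same logic as FooFind above: returns the offset
// (in utf-8 code units) at which D starts in X, or -1 if it is absent.
int byteIndexOf(const(char)[] X, dchar D)
{
    foreach (size_t i, dchar C; X)
    {
        if (C == D) return cast(int) i;
    }
    return -1;
}

void main()
{
    // "\u00E9" (e with acute accent) takes two utf-8 code units, so 'x'
    // is the third character but starts at offset 3 in the char array.
    writeln(byteIndexOf("a\u00E9x", 'x')); // prints 3
}
```

If the index were a character count, the result here would be 2 rather than 3.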

I wrote this routine below, but I'm not sure if I needed to.

  int  FooFind(dchar[] X, dchar D)
  {
    // X is already utf32, so 'i' counts whole characters
    foreach(int i, dchar C; X)
    {
      if (C == D) return i;
    }
    return -1;
  }



-- 
Derek
Melbourne, Australia
22/03/2005 4:24:50 PM
March 22, 2005
"Derek Parnell" <derek@psych.ward> wrote in message news:1w6s40so7p838.8yz58w5g6l4q.dlg@40tude.net...
> I've seen this sort of code ...
>
>   int  FooFind(char[] X, dchar D)
>   {
>     foreach(int i, dchar C; X)
>     {
>       if (C == D) return i;
>     }
>     return -1;
>   }

The Phobos library routine std.string.find() does the same thing.

> Now I understand that the foreach correctly packages up the utf-8
> codepoint fragments to form a valid utf32 character, but when the value
> of 'i' is returned, is it an index into the original utf-8 string or an
> index into the equivalent utf32 string?

The former.

> I'm pretty sure it's a utf-8 index, and that is
> a useful thing, as it tells you where in the original string the set of
> code fragments that make up the character begins. However, it doesn't
> tell you how many characters into the utf-8 string the searched-for
> character was found.

That's right. You can feed the result into std.utf.toUCSindex() to get the
other index.
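
Roughly like this, as a sketch only. It assumes a current compiler and Phobos, where std.utf.toUCSindex takes the string and a code unit index; the name charIndexOf is made up for the example.

```d
import std.stdio;
import std.utf : toUCSindex;

// Hypothetical helper: finds D in the utf-8 string X and returns its
// position counted in characters (UCS index), or -1 if it is absent.
int charIndexOf(const(char)[] X, dchar D)
{
    foreach (size_t i, dchar C; X)
    {
        // i is the utf-8 offset; convert it to a character count
        if (C == D) return cast(int) toUCSindex(X, i);
    }
    return -1;
}

void main()
{
    // 'x' starts at utf-8 offset 3 but is character number 2 (zero based)
    writeln(charIndexOf("a\u00E9x", 'x')); // prints 2
}
```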

>
> I wrote this routine below, but I'm not sure if I needed to.
>
>   int  FooFind(dchar[] X, dchar D)
>   {
>     foreach(int i, dchar C; X)
>     {
>       if (C == D) return i;
>     }
>     return -1;
>   }

I think this will do what you wish as well (return UCS index):

  int  FooFind(dchar[] X, dchar D)
  {
    int i;    // counts characters seen so far; starts at 0
    foreach(dchar C; X)
    {
      if (C == D) return i;
      i++;
    }
    return -1;
  }