August 31, 2004
Considering the code is not as straightforward as I am used to,
what the character() method is doing is decoding the string one
character at a time, starting at the passed-in index.  The index (i)
is only used to resume where you may have left off.  Ok.  So we have
a little optimization here so that we don't double-decode something...

It seemed a bit odd to me to do the b-a subtraction in the slice
method, but then I realized what you were doing (resuming from the
last point).

Of course this also assumes that someone didn't put in bad data
like:

slice(mystr, 5, 4);

Not to mention you could genericise the functions since they are
identical except for the element type of the array.

I suppose that is why C++ string object is templated (so you can
use wchar instead of char).

The decode method would actually be different though based on the
type.

Ben Hinkle wrote:

> import std.utf;
> 
> size_t character(char[] str, size_t n, size_t i = 0) {
>   while (n--) {
>     decode(str,i);
>   }
>   return i;
> }
> 
> size_t character(wchar[] str, size_t n, size_t i = 0) {
>   while (n--) {
>     decode(str,i);
>   }
>   return i;
> }
> 
> char[] slice(char[] str, size_t a, size_t b) {
>   size_t ai = character(str,a);
>   size_t bi = character(str,b-a,ai);
>   return str[ai .. bi];
> }
> 
> wchar[] slice(wchar[] str, size_t a, size_t b) {
>   size_t ai = character(str,a);
>   size_t bi = character(str,b-a,ai);
>   return str[ai .. bi];
> }
> 
> 
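
To make the walk-through above concrete, here is a minimal sketch of the same two functions with a small worked case. It assumes a current std.utf (the thread predates it) and uses const(char)[] parameters so that string literals can be passed; the sample string is made up for illustration.

```d
import std.utf : decode;

// Skip n characters (code points) starting at code-unit index i and
// return the code-unit index that was reached.
size_t character(const(char)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i); // decode advances i past one whole character
    return i;
}

// Slice by character positions a .. b rather than by byte offsets.
const(char)[] slice(const(char)[] str, size_t a, size_t b)
{
    size_t ai = character(str, a);         // walk a characters from the start
    size_t bi = character(str, b - a, ai); // resume at ai, walk the remaining b-a
    return str[ai .. bi];
}

unittest
{
    // 'é' occupies two bytes in UTF-8, so character 2 starts at byte 3.
    assert(character("héllo", 2) == 3);
    assert(slice("héllo", 1, 3) == "él");
}
```

The second call to character() is where the resume index pays off: the first a characters are decoded only once.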
August 31, 2004
On Tue, 31 Aug 2004 17:10:50 -0400, Berin Loritsch <bloritsch@d-haven.org> wrote:
> Considering the code is not as straightforward as I am used to,
> what the character() method is doing is decoding the string one
> character at a time, starting at the passed-in index.  The index (i)
> is only used to resume where you may have left off.  Ok.  So we have
> a little optimization here so that we don't double-decode something...

Clever optimisation.

> It seemed a bit odd to me to do the b-a subtraction in the slice
> method, but then I realized what you were doing (resuming from the
> last point).

Yeah.. it took me a while too.

> Of course this also assumes that someone didn't put in bad data
> like:
>
> slice(mystr, 5, 4);

A perfect opportunity for DbC, e.g.

char[] slice(char[] str, size_t a, size_t b)
in {
  assert(b > a); // b >= a?
}
body {
  size_t ai = character(str,a);
  size_t bi = character(str,b-a,ai);
  return str[ai .. bi];
}
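
On the "b >= a?" question in the comment: a sketch suggesting that a == b is harmless and simply yields an empty slice, so the weaker check is probably the one to use. It assumes a current compiler, where the `body` keyword is spelled `do`, and const(char)[] parameters so literals can be passed. Without the contract, b - a wraps around for b < a (size_t is unsigned), so the failure would surface somewhere inside the decoding loop rather than at the call site.

```d
import std.utf : decode;

size_t character(const(char)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);
    return i;
}

const(char)[] slice(const(char)[] str, size_t a, size_t b)
in
{
    // a == b is allowed: it selects an empty slice.
    assert(a <= b, "slice: start index is past end index");
}
do
{
    size_t ai = character(str, a);
    size_t bi = character(str, b - a, ai);
    return str[ai .. bi];
}

unittest
{
    assert(slice("héllo", 2, 2) == "");   // empty, not an error
    assert(slice("héllo", 0, 2) == "hé");
}
```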

> Not to mention you could genericise the functions since they are
> identical except for the element type of the array.

Yep.

template character(Type : Type[]) {
  size_t character(Type[] str, size_t n, size_t i = 0) {
    while (n--) {
      decode(str,i);
    }
    return i;
  }
}

template slice(Type : Type[])
{
  Type[] slice(Type[] str, size_t a, size_t b)
  in {
    assert(b > a); // b >= a?
  }
  body {
    size_t ai = character(str,a);
    size_t bi = character(str,b-a,ai);
    return str[ai .. bi];
  }
}

or something like that.
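
For comparison, a sketch of the same idea as function templates in present-day D. This is an assumption about a newer compiler than the one discussed here: it parameterises on the character type (the hypothetical parameter name is Char) rather than on the array type, and lets the compiler deduce it from the argument.

```d
import std.utf : decode;

// One definition covers char, wchar and dchar arrays; std.utf.decode
// supplies the encoding-specific work for each element type.
size_t character(Char)(const(Char)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);
    return i;
}

const(Char)[] slice(Char)(const(Char)[] str, size_t a, size_t b)
in
{
    assert(a <= b);
}
do
{
    size_t ai = character(str, a);
    size_t bi = character(str, b - a, ai);
    return str[ai .. bi];
}

unittest
{
    string  s = "héllo";   // UTF-8
    wstring w = "héllo"w;  // UTF-16
    assert(slice(s, 1, 3) == "él");
    assert(slice(w, 1, 3) == "él"w);
    assert(slice!char(s, 1, 3) == "él"); // explicit instantiation still works
}
```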

> I suppose that is why C++ string object is templated (so you can
> use wchar instead of char).

Probably.

> The decode method would actually be different though based on the
> type.

True.

Regan

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
August 31, 2004
This sort of useful code should go into the standard library; the 'phoenix' (or whatever we call it) library should include this.

September 01, 2004
Nice work! Can I add it to std.string? Or should it go in std.utf?


September 01, 2004
In article <ch24jt$rs0$1@digitaldaemon.com>, Berin Loritsch says...

>I think having something generally useful for internationalization is very important, or we shoot ourselves in the foot ("we want D to succeed, as long as you speak English" does not make sense).  General purpose i18n and l10n is not easy to do by any stretch, but I think it is generally agreed that it would have to be done in libraries.

ICU has the class UnicodeString to encapsulate strings, as well as the abstract class CharacterIterator for iterating over characters, with concrete implementations UCharCharacterIterator and StringCharacterIterator.

It also has a lot more besides. Check out the API guide at http://oss.software.ibm.com/icu/apiref/classes.html.

All of this will be a part of D (yes, via a library) in the not-too-distant
future.


>I just don't think we can rely on D's native (up to now) way of dealing
>with String manipulation.

That's why I'm wrapping ICU as we speak.

Arcane Jill



September 01, 2004
Walter wrote:

> Nice work! Can I add it to std.string? Or should it go in std.utf?

Cool, thanks. I think most people would look in std.string, since the purpose of the operations is to index and slice strings; the encoding is somewhat secondary.
September 01, 2004
In article <opsdmdnblz5a2sq9@digitalmars.com>, Regan Heath says...
>[...]
>
>template slice(Type : Type[])
>{
>   Type[] slice(Type[] str, size_t a, size_t b)
>   in {
>     assert(b > a); // b >= a?
>   }
>   body {
>     size_t ai = character(str,a);
>     size_t bi = character(str,b-a,ai);
>     return str[ai .. bi];
>   }
>}
>
>or something like that.
>
>> I suppose that is why C++ string object is templated (so you can
>> use wchar instead of char).

Nice. Except now you have to add a !(char[]) for every slice operation, since D
doesn't auto detect types :-(

A workaround could be something like:

template slice_template(Type: Type[])
{...}

alias slice_template!(char[]) slice;
alias slice_template!(wchar[]) slice;

Nick


September 01, 2004
On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
> In article <opsdmdnblz5a2sq9@digitalmars.com>, Regan Heath says...
>> [...]
>>
>> template slice(Type : Type[])
>> {
>>   Type[] slice(Type[] str, size_t a, size_t b)
>>   in {
>>     assert(b > a); // b >= a?
>>   }
>>   body {
>>     size_t ai = character(str,a);
>>     size_t bi = character(str,b-a,ai);
>>     return str[ai .. bi];
>>   }
>> }
>>
>> or something like that.
>>
>>> I suppose that is why C++ string object is templated (so you can
>>> use wchar instead of char).
>
> Nice. Except now you have to add a !(char[]) for every slice operation, since D
> doesn't auto detect types :-(
>
> A workaround could be something like:
>
> template slice_template(Type: Type[])
> {...}
>
> alias slice_template!(char[]) slice;
> alias slice_template!(wchar[]) slice;

Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

The other option is to write wrapper functions, e.g.

char[] slice(char[] str, size_t a, size_t b)
{
  return slice!(char[])(str,a,b);
}

wchar[] slice(wchar[] str, size_t a, size_t b)
{
  return slice!(wchar[])(str,a,b);
}

dchar[] slice(dchar[] str, size_t a, size_t b)
{
  return slice!(dchar[])(str,a,b);
}

Regan

September 02, 2004
In article <opsdn8ouhn5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
>
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
>Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

Yes, it works because the prototypes are different.  I used this trick at some point in my std.stream rewrite, though I think I tossed all the template code before I posted the version that's available now.


Sean


September 02, 2004
In article <opsdn8ouhn5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
>>
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
>Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

Yep, it works. The second does not over-rule the first, it over-*loads* it, meaning slice() is subject to normal function overloading rules. I use this on almost all my templates; I find it makes the code less rough on the eyes and means less typing as well.

Nick
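
A minimal sketch of the alias trick as it might look with a current compiler. The template name sliceTemplate and its body are stand-ins (the post above elides the real body), and the `alias name = target;` spelling is the newer form of the declarations quoted above.

```d
import std.utf : decode;

// Hypothetical stand-in for the slice_template mentioned above.
template sliceTemplate(Char)
{
    const(Char)[] sliceTemplate(const(Char)[] str, size_t a, size_t b)
    {
        size_t ai = 0;
        foreach (_; 0 .. a)
            decode(str, ai);   // skip the first a characters
        size_t bi = ai;
        foreach (_; a .. b)
            decode(str, bi);   // then the next b-a characters
        return str[ai .. bi];
    }
}

// Two aliases with the same name: the second overloads the first.
alias slice = sliceTemplate!char;
alias slice = sliceTemplate!wchar;

unittest
{
    string  s = "héllo";
    wstring w = "héllo"w;
    assert(slice(s, 1, 3) == "él");   // resolves to the char instantiation
    assert(slice(w, 1, 3) == "él"w);  // resolves to the wchar instantiation
}
```

Callers write slice(s, 1, 3) with no explicit !(...), and ordinary overload resolution picks the instantiation from the argument type.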

