arrays and strings (page 2) - D Programming Language Discussion Forum

August 31, 2004

Re: arrays and strings

Posted by Berin Loritsch
in reply to Ben Hinkle

The code is not as straightforward as I am used to, but as I read
it, the character() method decodes the string one character at a
time, starting from the passed-in index.  The index (i) is only
used to resume where you may have left off.  OK.  So we have a
little optimization here so that we don't decode anything twice...

It seemed a bit odd to me to do the b-a subtraction in the slice
method, but then I realized what you were doing (resuming from the
last point).

Of course this also assumes that someone didn't put in bad data
like:

slice(mystr, 5, 4);

Not to mention you could genericise the functions since they are
identical except for the element type of the array.

I suppose that is why the C++ string class is templated (so you
can use wchar instead of char).

The decode method would actually be different though based on the
type.

Ben Hinkle wrote:

> import std.utf;
> 
> size_t character(char[] str, size_t n, size_t i = 0) {
>   while (n--) {
>     decode(str,i);
>   }
>   return i;
> }
> 
> size_t character(wchar[] str, size_t n, size_t i = 0) {
>   while (n--) {
>     decode(str,i);
>   }
>   return i;
> }
> 
> char[] slice(char[] str, size_t a, size_t b) {
>   size_t ai = character(str,a);
>   size_t bi = character(str,b-a,ai);
>   return str[ai .. bi];
> }
> 
> wchar[] slice(wchar[] str, size_t a, size_t b) {
>   size_t ai = character(str,a);
>   size_t bi = character(str,b-a,ai);
>   return str[ai .. bi];
> }
> 
>
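
For readers following along, here is a minimal sketch of how the quoted char[] overloads behave on a string with multi-byte characters. It assumes a current D compiler and Phobos, where std.utf.decode takes the string and a ref index; the main function and the sample text are illustrative assumptions, not part of Ben's post.

```d
import std.utf : decode;

// Ben's helper: step n characters forward from code-unit index i
// and return the resulting code-unit index.
size_t character(char[] str, size_t n, size_t i = 0) {
  while (n--) {
    decode(str, i); // decode advances i past one whole character
  }
  return i;
}

// Ben's slice: a and b count characters, not bytes.
char[] slice(char[] str, size_t a, size_t b) {
  size_t ai = character(str, a);
  size_t bi = character(str, b - a, ai); // resume from ai, no re-decoding
  return str[ai .. bi];
}

void main() {
  // Sample text (an assumption for illustration): U+00E9 takes two
  // bytes in UTF-8, so character and byte indexes diverge after it.
  char[] s = "h\u00E9llo".dup;

  assert(character(s, 2) == 3);        // 2 characters span 3 bytes
  assert(slice(s, 1, 4) == "\u00E9ll"); // characters 1 up to 4
  assert(s[1 .. 4] == "\u00E9l");       // raw byte slice is shorter
}
```

The last two assertions show the difference the thread is about: the built-in slice counts UTF-8 code units, while slice() counts decoded characters.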

August 31, 2004

Re: arrays and strings

Posted by Regan Heath
in reply to Berin Loritsch

On Tue, 31 Aug 2004 17:10:50 -0400, Berin Loritsch <bloritsch@d-haven.org> wrote:
> Considering the code is not as straight forward as I am used to,
> what the character() method is doing is decoding the string byte
> by byte using the passed in index.  The index (i) is only used
> to resume where you may have left off.  Ok.  So we have a little
> optimization here so that we don't double-decode something...

Clever optimisation.

> It seemed a bit odd to me to do the b-a subtraction in the slice
> method, but then I realized what you were doing (resuming from the
> last point).

Yeah.. it took me a while too.

> Of course this also assumes that someone didn't put in bad data
> like:
>
> slice(mystr, 5, 4);

A perfect opportunity for DbC, e.g.

char[] slice(char[] str, size_t a, size_t b)
in {
  assert(b > a); // b >= a?
}
body {
  size_t ai = character(str,a);
  size_t bi = character(str,b-a,ai);
  return str[ai .. bi];
}

> Not to mention you could genericise the functions since they are
> identical except for the element type of the array.

Yep.

template character(Type : Type[]) {
  size_t character(Type[] str, size_t n, size_t i = 0) {
    while (n--) {
      decode(str,i);
    }
    return i;
  }
}

template slice(Type : Type[])
{
  Type[] slice(Type[] str, size_t a, size_t b)
  in {
    assert(b > a); // b >= a?
  }
  body {
    size_t ai = character(str,a);
    size_t bi = character(str,b-a,ai);
    return str[ai .. bi];
  }
}

or something like that.
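
As a rough sketch of the contract idea in present-day D (an assumption: a recent compiler, where the contract body keyword is `do` rather than `body`), the check could look like the following. It uses `a <= b`, which answers the "b >= a?" question by permitting an empty slice; that choice, the assertion message, and the sample strings are mine and not something settled in this thread.

```d
import std.utf : decode;

// Step n characters forward from code-unit index i.
size_t character(const(char)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);
    return i;
}

const(char)[] slice(const(char)[] str, size_t a, size_t b)
in
{
    // a == b is allowed here and yields an empty slice.
    assert(a <= b, "slice: start index must not exceed end index");
}
do
{
    immutable ai = character(str, a);
    immutable bi = character(str, b - a, ai);
    return str[ai .. bi];
}

void main()
{
    assert(slice("a\u00F1b", 1, 2) == "\u00F1");
    assert(slice("a\u00F1b", 2, 2).length == 0);
    // slice("a\u00F1b", 2, 1) would trip the in-contract in a build
    // that keeps contracts enabled.
}
```

Without the check, `b - a` on size_t values wraps around to a very large number when b is smaller than a, so the loop in character() would try to decode far past the end of the string.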

> I suppose that is why C++ string object is templated (so you can
> use wchar instead of char).

Probably.

> The decode method would actually be different though based on the
> type.

True.

Regan

> Ben Hinkle wrote:
>
>> import std.utf;
>>
>> size_t character(char[] str, size_t n, size_t i = 0) {
>>   while (n--) {
>>     decode(str,i);
>>   }
>>   return i;
>> }
>>
>> size_t character(wchar[] str, size_t n, size_t i = 0) {
>>   while (n--) {
>>     decode(str,i);
>>   }
>>   return i;
>> }
>>
>> char[] slice(char[] str, size_t a, size_t b) {
>>   size_t ai = character(str,a);
>>   size_t bi = character(str,b-a,ai);
>>   return str[ai .. bi];
>> }
>>
>> wchar[] slice(wchar[] str, size_t a, size_t b) {
>>   size_t ai = character(str,a);
>>   size_t bi = character(str,b-a,ai);
>>   return str[ai .. bi];
>> }
>>
>>




August 31, 2004

Re: arrays and strings

Posted by Regan Heath
in reply to Regan Heath

This sort of useful code should go into the standard library; the 'phoenix' (or whatever we call it) library should include this.


September 01, 2004

Re: arrays and strings

Posted by Walter
in reply to Ben Hinkle

Nice work! Can I add it to std.string? Or should it go in std.utf?

September 01, 2004

Re: arrays and strings

Posted by Arcane Jill
in reply to Berin Loritsch

In article <ch24jt$rs0$1@digitaldaemon.com>, Berin Loritsch says...

>I think having something generally useful for internationalization is very important, or we shoot ourselves in the foot (we want D to succeed, as long as you speak English does not make sense).  General purpose i18n and l10n is not easy to do by any stretch--but I think it is generally agreed that it would have to be done in libraries.

ICU has the class UnicodeString to encapsulate strings, as well as the abstract class CharacterIterator for iterating over characters, with concrete implementations UCharCharacterIterator and StringCharacterIterator.

It also has a lot more besides. Check out the API guide at http://oss.software.ibm.com/icu/apiref/classes.html.

All of this will be a part of D (yes, via a library) in the not-too-distant
future.


>I just don't think we can rely on D's native (up to now) way of dealing
>with String manipulation.

That's why I'm wrapping ICU as we speak.

Arcane Jill

September 01, 2004

Re: arrays and strings

Posted by Ben Hinkle
in reply to Walter

Walter wrote:

> Nice work! Can I add it to std.string? Or should it go in std.utf?

Cool, thanks. I think most people would look in std.string, since the purpose of the operations is to index and slice strings; the encoding is somewhat secondary.

September 01, 2004

Re: arrays and strings

Posted by Nick
in reply to Regan Heath

In article <opsdmdnblz5a2sq9@digitalmars.com>, Regan Heath says...
>[...]
>
>template slice(Type : Type[])
>{
>   Type[] slice(Type[] str, size_t a, size_t b)
>   in {
>     assert(b > a); // b >= a?
>   }
>   body {
>     size_t ai = character(str,a);
>     size_t bi = character(str,b-a,ai);
>     return str[ai .. bi];
>   }
>}
>
>or something like that.
>
>> I suppose that is why C++ string object is templated (so you can
>> use wchar instead of char).

Nice. Except now you have to add a !(char[]) for every slice operation, since D
doesn't auto detect types :-(

A workaround could be something like:

template slice_template(Type: Type[])
{...}

alias slice_template!(char[]) slice;
alias slice_template!(wchar[]) slice;

Nick

September 01, 2004

Re: arrays and strings

Posted by Regan Heath
in reply to Nick

On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
> In article <opsdmdnblz5a2sq9@digitalmars.com>, Regan Heath says...
>> [...]
>>
>> template slice(Type : Type[])
>> {
>>   Type[] slice(Type[] str, size_t a, size_t b)
>>   in {
>>     assert(b > a); // b >= a?
>>   }
>>   body {
>>     size_t ai = character(str,a);
>>     size_t bi = character(str,b-a,ai);
>>     return str[ai .. bi];
>>   }
>> }
>>
>> or something like that.
>>
>>> I suppose that is why C++ string object is templated (so you can
>>> use wchar instead of char).
>
> Nice. Except now you have to add a !(char[]) for every slice operation, since D
> doesn't auto detect types :-(
>
> A workaround could be something like:
>
> template slice_template(Type: Type[])
> {...}
>
> alias slice_template!(char[]) slice;
> alias slice_template!(wchar[]) slice;

Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

The other option is to write wrapper functions, e.g.

char[] slice(char[] str, size_t a, size_t b)
{
  return slice!(char[])(str,a,b);
}

wchar[] slice(wchar[] str, size_t a, size_t b)
{
  return slice!(wchar[])(str,a,b);
}

dchar[] slice(dchar[] str, size_t a, size_t b)
{
  return slice!(dchar[])(str,a,b);
}

Regan


September 02, 2004

Re: arrays and strings

Posted by Sean Kelly
in reply to Regan Heath

In article <opsdn8ouhn5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
>
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
>Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

Yes, it works because the prototypes are different.  I used this trick at some point in my std.stream rewrite, though I think I tossed all the template code before I posted the version that's available now.

Sean

September 02, 2004

Re: arrays and strings

Posted by Nick
in reply to Regan Heath

In article <opsdn8ouhn5a2sq9@digitalmars.com>, Regan Heath says...
>
>On Wed, 1 Sep 2004 12:50:28 +0000 (UTC), Nick <Nick_member@pathlink.com> wrote:
>>
>> alias slice_template!(char[]) slice;
>> alias slice_template!(wchar[]) slice;
>
>Does that work? (I haven't tried it, but I'd expect the second to over-rule the first?)

Yep, it works. The second does not over-rule the first, it over-*loads* it, meaning slice() is subject to normal function overloading rules. I use this on almost all my templates; I find it makes the code less rough on the eyes and means less typing as well.

Nick
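
For comparison, here is a hedged sketch of how the same helpers might be written with a present-day D compiler, which can deduce template arguments for function templates from the call, so neither the explicit !(char[]) nor the alias trick is needed at the call site. The parameter name C, the const(C)[] signatures and the sample strings are assumptions for illustration, not code from this thread.

```d
import std.utf : decode;

// One template covers char, wchar and dchar strings.
// Step n characters forward from code-unit index i.
size_t character(C)(const(C)[] str, size_t n, size_t i = 0)
{
    while (n--)
        decode(str, i);
    return i;
}

// a and b count characters; the result is a view into str.
const(C)[] slice(C)(const(C)[] str, size_t a, size_t b)
{
    immutable ai = character(str, a);
    immutable bi = character(str, b - a, ai);
    return str[ai .. bi];
}

void main()
{
    // Explicit instantiation, the style the thread wanted to avoid.
    assert(slice!char("na\u00EFve", 2, 4) == "\u00EFv");

    // Element type deduced from the argument.
    assert(slice("na\u00EFve", 2, 4) == "\u00EFv");
    assert(slice("na\u00EFve"w, 2, 4) == "\u00EFv"w);
    assert(slice("na\u00EFve"d, 2, 4) == "\u00EFv"d);
}
```

The calls still resolve per element type, much like the overload set built from aliases, but the compiler picks the instantiation instead of the programmer listing each one.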


Copyright © 1999-2021 by the D Language Foundation