dchar unicode phobos

June 07, 2006

Re: dchar unicode phobos

Posted by Oskar Linde
in reply to Sean Kelly

Sean Kelly wrote:
> Johan Granberg wrote:
>>
>> Yes, I have needed support for dchar[] with functions like split, splitline and strip in std.string.
>> Yes, your ways of doing the support look OK, though I would choose 3 instead of 1. That may be because I'm not 100% sure how 1 would work (care to give an example?).
>> No, I have not looked closely at mango yet. (Will do.)
> 
> Oskar has an array template library that can do much of this, and I have the beginnings of one in Ares as well.  The source is here:
> 
> http://svn.dsource.org/projects/ares/trunk/src/ares/std/array.d
> 
> As you can see however, half the functions are commented out because template function overloading basically just doesn't work yet.

I agree that it would be really nice if those types of templates worked today, but all of those functions can be rewritten in a way that works with current D. Considering the amount of time it took us to get the current (most basic) IFTI (implicit function template instantiation) support, I would rather use a solution that works today than wait an indefinite amount of time for something that may never happen. :) I fully appreciate your standpoint though, and would love to hear something from Walter regarding future IFTI support.

Some things that I would like to see improved (in descending order of importance) are IFTI support for:
1. template member functions
2. mixed explicit/implicit arguments: f!(int)('x') => f!(int,char)('x')
3. template specializations
4. better template function overloading
5. generic matching: template t(X) { void t(X[] a, X b) {}}
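
For readers on a current D2 compiler, here is a minimal sketch of what items 1, 2 and 5 look like when argument deduction works. It is written against today's D, not the 2006 compiler discussed in this thread, and the names `contains`, `convert` and `Box` are hypothetical, chosen only for illustration.

```d
// Item 5 (generic matching): X is deduced once and must fit both
// the array's element type and the scalar argument.
bool contains(X)(X[] a, X b)
{
    foreach (e; a)
        if (e == b)
            return true;
    return false;
}

// Item 2 (mixed explicit/implicit arguments): T is given explicitly,
// U is deduced from the call, so convert!(int)('x') acts like
// convert!(int, char)('x').
T convert(T, U)(U value)
{
    return cast(T) value;
}

// Item 1 (template member functions): T is deduced per call.
struct Box
{
    int base;

    auto plus(T)(T x)
    {
        return base + x;
    }
}

unittest
{
    assert(contains([1, 2, 3], 2));
    assert(convert!(int)('x') == 120);
    assert(Box(1).plus(2.5) == 3.5);
}
```

Compiling with `-unittest -main` should run the checks; none of this says anything about how the 2006 compiler would have handled the same code.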

> Eventually however, I plan to add split, join, etc.  These will probably all assume fixed-width elements, with improved support for char and wchar strings in a std.string module, as supporting variable width encoding will slow down the algorithms.

It sounds reasonable to avoid any variable length awareness in std.array, but I don't really see how supporting that will make split or join any slower. For instance

(char[]).split(char)
(char[]).split(char[])
(char[]).split(bool delegate(char))

aren't affected by variable-length encodings. Only:

(char[]).split(dchar)
(char[]).split(bool delegate(dchar))

are (by using a dchar foreach over a char[]), but here the user is explicit about wanting a multi-byte implementation. Putting the implementation of the last two versions in std.string gives a neat std.string/std.array separation, but risks confusing the user:

- Why would "abc".split('a') be in std.array while "abc".split('å') requires std.string?
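
A rough sketch of the "dchar foreach over a char[]" variant, assuming a current D2 compiler and Phobos. The function name `splitOn` is hypothetical, and this is not the code from either array library mentioned in the thread; it only illustrates the idea.

```d
import std.utf : codeLength;

// Split UTF-8 text on a single code point. The foreach decodes
// s into dchars, while i stays a code-unit (byte) index into s.
string[] splitOn(string s, dchar sep)
{
    string[] parts;
    size_t start = 0;

    foreach (size_t i, dchar c; s)
    {
        if (c == sep)
        {
            parts ~= s[start .. i];
            // Skip over every code unit of the separator.
            start = i + codeLength!char(c);
        }
    }
    parts ~= s[start .. $];
    return parts;
}

unittest
{
    assert(splitOn("abc", 'a') == ["", "bc"]);
    assert(splitOn("blåbär", 'å') == ["bl", "bär"]);
}
```

The same function handles both the ASCII and the non-ASCII separator, which is the point of the question above: the caller does not need to know which module owns which overload.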

Regards,

Oskar

Oskar Linde wrote:
> Sean Kelly wrote:
> 
>> Eventually however, I plan to add split, join, etc.  These will probably all assume fixed-width elements, with improved support for char and wchar strings in a std.string module, as supporting variable width encoding will slow down the algorithms.
> 
> It sounds reasonable to avoid any variable length awareness in std.array, but I don't really see how supporting that will make split or join any slower. For instance
> 
> (char[]).split(char)
> (char[]).split(char[])
> (char[]).split(bool delegate(char))
> 
> aren't affected by variable-length encodings.

The most obvious performance issue with variable width encodings is with searching and matching routines.  And most routines in std.array ultimately rely on searching and matching in some form.  However, I wasn't going to go so far as to support type conversion for this stuff:

    size_t find( char[] str, dchar elem );

which does help a bit.

> Only:
> 
> (char[]).split(dchar)
> (char[]).split(bool delegate(dchar))
> 
> are (by using a dchar foreach over a char[]), but here the user is explicit about wanting a multi-byte implementation. Putting the implementation of the last two versions in std.string gives a neat std.string/std.array separation, but risks confusing the user:
> 
> - Why would "abc".split('a') be in std.array while "abc".split('å') requires std.string?

I had initially thought that std.utf.stride would be required to avoid false matches for search routines but have since been told otherwise, so there may be no reason for the specialized std.string functions I'd mentioned.  I forgot about this bit while writing my last post :-)
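
To illustrate why stride is not needed here: in UTF-8, lead bytes and continuation bytes occupy disjoint ranges, so the encoded form of one code point cannot match starting in the middle of another character. A plain code-unit search is therefore enough. The sketch below assumes a current D2 compiler and Phobos, and `findCodePoint` is a hypothetical name, not the `find` declared above.

```d
import std.utf : encode;

// Find a code point in UTF-8 text by comparing raw code units.
// Returns the code-unit index of the match, or -1 if there is none.
ptrdiff_t findCodePoint(const(char)[] haystack, dchar needle)
{
    char[4] buf;
    immutable n = encode(buf, needle); // UTF-8 form of needle
    const(char)[] pat = buf[0 .. n];

    if (haystack.length < n)
        return -1;

    foreach (i; 0 .. haystack.length - n + 1)
    {
        // No decoding and no stride: a match can only begin
        // on a character boundary.
        if (haystack[i .. i + n] == pat)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

unittest
{
    assert(findCodePoint("abcå", 'å') == 3);
    assert(findCodePoint("åb", 'b') == 2);  // 'å' occupies two code units
    assert(findCodePoint("ã", 'å') == -1);  // same lead byte, no false match
}
```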

Sean
