[phobos] Clean up patch for std.utf
November 14, 2010
The current std.utf is a bit messy and lacks attributes, so I wrote a patch. The patch passes Phobos's unittests.

Changes:

* Remove UtfError

UtfError has been deprecated since Phobos 0.140 (according to the revision
log on dsource).
I think removing UtfError is not a problem.

* Add @safe, @trusted, pure and nothrow attributes

I think Unicode operations should be @safe and pure, but the functions
they depend on are not.
So some functions are @trusted and not pure.
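
For readers unfamiliar with these attributes, here is a minimal sketch of the distinction. Both functions are hypothetical and written only for illustration; they are not taken from the patch. The first can be fully checked by the compiler, while the second is marked @trusted because its body uses an operation that current compilers reject in @safe code.

```d
// Hypothetical helper (not from the patch): counts UTF-8 continuation
// bytes, i.e. code units of the form 10xxxxxx. Nothing here allocates,
// throws, or touches global state, so all three attributes apply.
size_t continuationBytes(const(char)[] s) @safe pure nothrow
{
    size_t n = 0;
    foreach (char c; s)          // iterate by code unit, no decoding
    {
        if ((c & 0xC0) == 0x80)
            ++n;
    }
    return n;
}

// Hypothetical wrapper: @trusted means the author vouches for the body.
// Taking .ptr of a slice is not accepted in @safe code by current
// compilers, so the function cannot simply be marked @safe.
char firstByte(const(char)[] s) @trusted pure nothrow
{
    assert(s.length > 0);
    return *s.ptr;
}

unittest
{
    assert(continuationBytes("h\u00E9llo") == 1); // U+00E9 is 2 bytes in UTF-8
    assert(firstByte("abc") == 'a');
}
```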

* char version of stride

I removed the assert because the comment says "0xFF meaning s[i] is not the
start of of UTF-8 sequence.".
Until now, my library checked for 0xFF :(

* validate

Add a constraint.

* toUTF* functions

Unify the argument types using 'in'.
The current implementation mixes "in char[]" and "const(char)[]".

Remove some functions that take string, wstring and dstring. The bodies of these functions only call validate. Are they needed?

* count supports dchar

I wrote the following code in my library.

static if (is(Char == dchar))
     immutable num = text.length;
else
     immutable num = text.count();

Why doesn't count support dchar?

In addition, why does count depend on walkLength?
count's call graph is:

std.utf.count -> std.range.walkLength -> std.array.empty, front, popFront -> std.utf.stride

This seems weird. I think it would be better for count itself to calculate
the total number of code points and for walkLength to depend on count.
The patch doesn't include this proposal.
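
As an illustration of the request above, the static if could be folded into one helper that accepts all three character types. This is only a sketch: `codePointCount` is a hypothetical name, not a function in std.utf and not part of the patch, and it still relies on walkLength for the char and wchar cases.

```d
import std.range : walkLength;
import std.traits : Unqual;

// Hypothetical helper: number of code points in a UTF-8, UTF-16 or
// UTF-32 string.
size_t codePointCount(Char)(const(Char)[] text)
{
    static if (is(Unqual!Char == dchar))
        return text.length;       // UTF-32: one code unit per code point
    else
        return text.walkLength;   // UTF-8/16: decode through the range primitives
}

unittest
{
    assert(codePointCount("h\u00E9llo")  == 5); // 6 UTF-8 code units
    assert(codePointCount("h\u00E9llo"w) == 5);
    assert(codePointCount("h\u00E9llo"d) == 5);
}
```

With a helper like this, calling code no longer needs its own static if on the character type.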

What do you think?


Masahiro
-------------- next part --------------
A non-text attachment was scrubbed...
Name: utf.patch
Type: application/octet-stream
Size: 31741 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20101114/a0edf41c/attachment-0001.obj>
November 18, 2010
Does anyone see any problems with this patch?
If everything is OK, I will commit it around next Sunday.


Masahiro

November 19, 2010
"Masahiro Nakagawa" <repeatedly at gmail.com> wrote:
> * char version of stride
> 
> I removed assert because the comment says "0xFF meaning s[i] is not the
> start of of UTF-8 sequence.".
> Until now, my library checked 0xFF :(

Shouldn't it throw an exception?  Consider this use of stride() with a
broken UTF-8 string:

  broken[broken.stride(0) .. $]

It silently succeeds if 'broken' is at least 255 bytes long.
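
To make the failure mode concrete, here is a minimal sketch. `legacyStride` is a hypothetical stand-in written for this illustration, not the std.utf implementation; it assumes the behavior described in the quoted comment, returning 0xFF when s[i] is not the start of a UTF-8 sequence.

```d
// Hypothetical stand-in: returns the sequence length implied by the
// lead byte, or 0xFF if s[i] cannot start a UTF-8 sequence.
uint legacyStride(const(char)[] s, size_t i) @safe pure nothrow
{
    immutable uint c = s[i];
    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0xFF;            // not a valid lead byte
}

unittest
{
    auto broken = new char[](300);
    broken[] = 'a';
    broken[0] = 0x80;       // a continuation byte in lead position

    // 0xFF (255) is a valid index into a 300-byte array, so the slice
    // succeeds and 255 bytes are skipped without any error.
    auto rest = broken[legacyStride(broken, 0) .. $];
    assert(rest.length == 45);
}
```

For inputs shorter than 255 bytes the same slice fails the array bounds check (when bounds checking is enabled), so whether the problem is noticed depends only on the length of the input.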


Shin
November 20, 2010
On Fri, 19 Nov 2010 06:21:45 +0900, Shin Fujishiro <rsinfu at gmail.com> wrote:

> "Masahiro Nakagawa" <repeatedly at gmail.com> wrote:
>> * char version of stride
>>
>> I removed assert because the comment says "0xFF meaning s[i] is not the
>> start of of UTF-8 sequence.".
>> Until now, my library checked 0xFF :(
>
> Shouldn't it throw an exception?  Consider this use of stride() with a
> broken UTF-8 string:
>
>   broken[broken.stride(0) .. $]
>
> It silently succeeds if 'broken' is longer or equal to 255 bytes.
>

Hmm... I don't know what the correct behavior is. Is the DDoc outdated?
A TDPL example uses your code. If TDPL is correct, I will revert the char
version of stride.


Masahiro
November 24, 2010
I made two commits.

- changeset 2189: apply patch
- changeset 2190: fix issue 5247

Thanks for the response and the report.


Masahiro

January 10, 2011
Masahiro, I understand all work items in this older email of yours have been completed. If not, please reply to this.

Thanks,

Andrei
