November 14, 2010 [phobos] Clean up patch for std.utf

The current std.utf is a bit messy and lacks attributes, so I wrote a patch. The patch passes Phobos's unittests.

Changes:

* Remove UtfError

UtfError has been deprecated since Phobos 0.140 (according to the revision log on dsource). I think removing UtfError is no problem.

* Add @safe, @trusted, pure and nothrow attributes

I think Unicode operations should be @safe and pure, but the functions they depend on are not. So some functions are @trusted and not pure.

* char version of stride

I removed the assert because the comment says "0xFF meaning s[i] is not the start of of UTF-8 sequence.". Until now, my library checked for 0xFF :(

* validate

Add a constraint.

* toUTF* functions

Unify the argument types using 'in'. The current implementation mixes "in char[]" and "const(char)[]".

Remove some functions that take string, wstring and dstring. The bodies of these functions only call validate. Are they needed?

* count supports dchar

I wrote the following code in my library:

    static if (is(Char == dchar))
        immutable num = text.length;
    else
        immutable num = text.count();

Why doesn't count support dchar?

In addition, why does count depend on walkLength? count's call graph is:

    std.utf.count -> std.range.walkLength -> std.array.empty, front, popFront -> std.utf.stride

This seems weird. I think it would be better if count itself calculated the total number of code points and walkLength depended on count. The patch doesn't include this proposal.

What do you think?

Masahiro

Attachment: utf.patch (application/octet-stream, 31741 bytes)
URL: <http://lists.puremagic.com/pipermail/phobos/attachments/20101114/a0edf41c/attachment-0001.obj>
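
For readers following the count question, here is a minimal sketch of the workaround described in the message, wrapped in a function. The name codePoints is hypothetical and not part of Phobos, and the sketch assumes a reasonably current Phobos in which std.utf.count and std.range.walkLength are both available for narrow strings. It is only meant to show that the UTF-32 case reduces to the array length while UTF-8 and UTF-16 need decoding.

```d
import std.range : walkLength;
import std.utf : count;

// Hypothetical helper (not part of Phobos): number of code points in a
// string of any character type, using the shortcut from the message.
size_t codePoints(Char)(const(Char)[] text)
{
    static if (is(Char == dchar))
        return text.length;   // UTF-32: one code unit per code point
    else
        return text.count();  // UTF-8 / UTF-16: walk the encoded sequences
}

void main()
{
    // "h\u00E9llo" is 6 UTF-8 code units but 5 code points.
    assert("h\u00E9llo".length == 6);
    assert(codePoints("h\u00E9llo") == 5);
    assert(codePoints("h\u00E9llo"w) == 5);
    assert(codePoints("h\u00E9llo"d) == 5);

    // The range-based route mentioned in the call graph gives the same count.
    assert("h\u00E9llo".walkLength == 5);
}
```

If count accepts dchar strings directly, as the message proposes, the static if branch becomes unnecessary and callers can use count for every string type.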
November 18, 2010 [phobos] Clean up patch for std.utf

Posted in reply to Masahiro Nakagawa

Does anyone see any problems with this patch? If everything is OK, I will commit it around next Sunday.

Masahiro
November 19, 2010 [phobos] Clean up patch for std.utf

Posted in reply to Masahiro Nakagawa

"Masahiro Nakagawa" <repeatedly at gmail.com> wrote:
> * char version of stride
>
> I removed assert because the comment says "0xFF meaning s[i] is not the
> start of of UTF-8 sequence.".
> Until now, my library checked 0xFF :(
Shouldn't it throw an exception? Consider this use of stride() with a
broken UTF-8 string:

    broken[broken.stride(0) .. $]

It silently succeeds if 'broken' is at least 255 bytes long.

Shin
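
The concern above can be sketched as follows. This is an illustration under stated assumptions, not the Phobos implementation: checkedStride is a hypothetical helper that classifies only the lead byte and throws std.utf.UTFException instead of returning a sentinel such as 0xFF. It does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```d
import std.exception : assertThrown;
import std.utf : UTFException;

// Hypothetical helper (not the Phobos implementation): byte length of the
// UTF-8 sequence starting at s[i], judged from the lead byte alone.
size_t checkedStride(const(char)[] s, size_t i)
{
    immutable c = s[i];
    if (c < 0x80) return 1;              // 0xxxxxxx: ASCII
    if ((c & 0xE0) == 0xC0) return 2;    // 110xxxxx
    if ((c & 0xF0) == 0xE0) return 3;    // 1110xxxx
    if ((c & 0xF8) == 0xF0) return 4;    // 11110xxx
    // Continuation bytes (10xxxxxx) and 0xF8..0xFF cannot start a sequence.
    throw new UTFException("not the start of a UTF-8 sequence");
}

void main()
{
    assert(checkedStride("a", 0) == 1);
    assert(checkedStride("\u00E9", 0) == 2);

    // 300 continuation bytes: no position is a valid start of a sequence.
    auto broken = new char[](300);
    broken[] = cast(char) 0x80;

    // With a 0xFF return value, broken[0xFF .. $] would be an in-bounds
    // slice and the error would go unnoticed. Here it surfaces first.
    assertThrown!UTFException(broken[checkedStride(broken, 0) .. $]);
}
```

Throwing keeps the slicing idiom safe for inputs of any length, at the cost of making the function unusable in nothrow code.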
November 20, 2010 [phobos] Clean up patch for std.utf

Posted in reply to Shin Fujishiro

On Fri, 19 Nov 2010 06:21:45 +0900, Shin Fujishiro <rsinfu at gmail.com> wrote:
> "Masahiro Nakagawa" <repeatedly at gmail.com> wrote:
>> * char version of stride
>>
>> I removed assert because the comment says "0xFF meaning s[i] is not the
>> start of of UTF-8 sequence.".
>> Until now, my library checked 0xFF :(
>
> Shouldn't it throw an exception? Consider this use of stride() with a
> broken UTF-8 string:
>
> broken[broken.stride(0) .. $]
>
> It silently succeeds if 'broken' is longer or equal to 255 bytes.
>
Hmm... I don't know the correct behavior. Is the DDoc outdated?
A TDPL example uses your code. If TDPL is correct, I will revert the char
version of stride.
Masahiro
November 24, 2010 [phobos] Clean up patch for std.utf

Posted in reply to Shin Fujishiro

I did two commits.
- changeset 2189: apply patch
- changeset 2190: fix issue 5247
Thanks for the response and the report.
Masahiro
January 10, 2011 [phobos] Clean up patch for std.utf

Posted in reply to Masahiro Nakagawa

Masahiro, I understand all work items in this older email of yours have been completed. If not, please reply to this.
Thanks,
Andrei
Copyright © 1999-2021 by the D Language Foundation