January 11, 2014
On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg
wrote:
> On 2014-01-11 08:50, Brad Anderson wrote:
>> The recent discussion got me wondering how Phobos stacked up
>> against the C++ Boost String Algorithms library.
>>
>> Some background on the design of the Boost library:
>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>
>> TL;DR: It works somewhat like ranges.
>>
>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>
> toLower/Upper doesn't really work in place.

Yeah, it's kind of an argument both for and against Phobos/D. The
InPlace versions can't be truly in place like Boost's because we
have actual Unicode support.
January 11, 2014
12-Jan-2014 01:22, monarch_dodra wrote:
> On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
>> On 2014-01-11 08:50, Brad Anderson wrote:
>>> The recent discussion got me wondering how Phobos stacked up
>>> against the C++ Boost String Algorithms library.
>>>
>>> Some background on the design of the Boost library:
>>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>>
>>> TL;DR: It works somewhat like ranges.
>>>
>>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>>
>> toLower/Upper doesn't really work in place.
>
> Yeah, "toLowerInplace" is actually more like "toLowerProbablyInPlace"

With high probability :)

And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.

-- 
Dmitry Olshansky
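
Editor's sketch, not part of the original post: one rough way to check how many "bad sheep" there are is to ask a runtime that implements the full Unicode case mappings. The snippet below uses Python only as a convenient way to query those mappings; it is not D and says nothing about how Phobos implements toUpperInPlace. The helper name length_changing is made up for this illustration. It counts in code points (the dchar[] view), not in UTF-8 bytes, and the totals depend on the Unicode version bundled with the interpreter, so none are quoted here.

```python
import sys

def length_changing(mapping):
    """Return the code points whose case mapping is not exactly one code point long.

    `mapping` is str.upper or str.lower. Hypothetical helper for illustration only.
    """
    result = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters, skip them
        ch = chr(cp)
        if len(mapping(ch)) != 1:
            result.append(ch)
    return result

changes_on_upper = length_changing(str.upper)
changes_on_lower = length_changing(str.lower)

print(len(changes_on_upper), "code points change length under upper()")
print(len(changes_on_lower), "code points change length under lower()")
```

A character such as "ß" shows up in the upper() list, and "İ" (U+0130) in the lower() list, since both map to two code points.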
January 11, 2014
On Saturday, 11 January 2014 at 20:46:32 UTC, Andrei Alexandrescu
wrote:
> On 1/10/14 11:50 PM, Brad Anderson wrote:
>> The recent discussion got me wondering how Phobos stacked up
>> against the C++ Boost String Algorithms library.
>>
>> Some background on the design of the Boost library:
>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>
>> TL;DR: It works somewhat like ranges.
>>
>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
> [snip]
>
> Awesome! Shall we create an issue and link the spreadsheet from there?
>
> Andrei

I'll probably just make an issue for each group of problems after
this is done getting feedback.

The big issues appear to boil down to two things: 1) The complete
inability to do replace/erase functions easily and 2) the lack of
Unicode collation support getting in the way of case-insensitive
operations which are correct in every language.

Number 1 is pretty serious for day to day coding. Number 2 would
just fill a hole in our otherwise excellent Unicode support
(something Boost doesn't even truly have, instead using locales
and character sets). In the meantime, for English and a few other
languages what we have already can be used to perform
case-insensitive operations.
January 12, 2014
On 2014-01-11 22:42, Dmitry Olshansky wrote:

> With high probability :)
>
> And it's indeed quite high, the amount of "bad sheep" that gets
> longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.

The least we can do is make that very clear in the documentation.

-- 
/Jacob Carlborg
January 12, 2014
On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
> 12-Jan-2014 01:22, monarch_dodra wrote:
>> On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
>>> On 2014-01-11 08:50, Brad Anderson wrote:
>>>> The recent discussion got me wondering how Phobos stacked up
>>>> against the C++ Boost String Algorithms library.
>>>>
>>>> Some background on the design of the Boost library:
>>>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>>>
>>>> TL;DR: It works somewhat like ranges.
>>>>
>>>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>>>
>>> toLower/Upper doesn't really work in place.
>>
>> Yeah, "toLowerInplace" is actually more like "toLowerProbablyInPlace"
>
> With high probability :)
>
> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.

More important than the absolute amount of "bad sheep" is the frequency of them in your input :-)
January 13, 2014
On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
> On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
>> 12-Jan-2014 01:22, monarch_dodra wrote:
>> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.
>
> More important than the absolute amount of "bad sheep" is the frequency of them in your input :-)

In German the frequency of "ß" is 0.31%, and the mess with getting a longer
result ("SS") only happens with toUpper().
I think Greek has a similar problem, but I don't know the frequency there...
January 13, 2014
On 2014-01-13 17:15:21 +0000, "Dominikus Dittes Scherkl" <Dominikus.Scherkl@continental-corporation.com> said:

> On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
>> On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
>>> 12-Jan-2014 01:22, monarch_dodra wrote:
>>> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.
>> 
>> More important than the absolute amount of "bad sheep" is the frequency of them in your input :-)
> 
> In German the frequency of "ß" is 0.31%, and the mess with getting a longer
> result ("SS") only happens with toUpper().
> I think Greek has a similar problem, but I don't know the frequency there...

The funny thing about "ß" is that in UTF-8 it's two bytes (0xC3 0x9F) and you replace it with "SS" which is two bytes too (0x53 0x53). So with some cleverness it can be done in place for char[], but not for wchar[] or dchar[]. :-)

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca
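
Editor's sketch, not part of the original post: the code-unit arithmetic above can be checked quickly. Python is used here only to count code units per encoding; the helper name code_units is invented for this illustration, and the three encodings stand in for D's char[], wchar[] and dchar[].

```python
# Bytes per code unit for each encoding (stand-ins for char[], wchar[], dchar[]).
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_units(text):
    """Return how many code units `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) // size for enc, size in UNIT_SIZE.items()}

before = code_units("ß")          # the lowercase original
after = code_units("ß".upper())   # full case mapping gives "SS"

for enc in UNIT_SIZE:
    fits = "fits in place" if after[enc] <= before[enc] else "needs to grow"
    print(f"{enc}: {before[enc]} -> {after[enc]} code units ({fits})")
```

Only the UTF-8 case keeps the same length (two code units before and after); in UTF-16 and UTF-32 the result grows from one code unit to two.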
