Phobos strings versus C++ Boost (page 2)

On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
> On 2014-01-11 08:50, Brad Anderson wrote:
>> The recent discussion got me wondering how Phobos stacked up
>> against the C++ Boost String Algorithms library.
>>
>> Some background on the design of the Boost library:
>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>
>> TL;DR: It works somewhat like ranges.
>>
>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>
> toLower/Upper doesn't really work in place.

Yeah, it's kind of an argument for and against Phobos/D. InPlace
can't be truly in place like Boost's is because we have actual
Unicode support.

12-Jan-2014 01:22, monarch_dodra wrote:
> On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
>> On 2014-01-11 08:50, Brad Anderson wrote:
>>> The recent discussion got me wondering how Phobos stacked up
>>> against the C++ Boost String Algorithms library.
>>>
>>> Some background on the design of the Boost library:
>>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>>
>>> TL;DR: It works somewhat like ranges.
>>>
>>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>>
>> toLower/Upper doesn't really work in place.
>
> Yeah, "toLowerInplace" is actually more like "toLowerProbablyInPlace"

With high probability :)

And it's indeed quite high: the number of "bad sheep" that get
longer/shorter across the whole of Unicode is around 5-10 code points, IIRC.

--
Dmitry Olshansky
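For anyone who wants to count the "bad sheep" themselves, here is a rough sketch. It is written in Python rather than D, and only because Python's `str.upper` and `str.lower` apply the full Unicode case mappings. The helper name `length_changing` is made up for this example, and the counts depend on the Unicode version bundled with the interpreter and on which encoding the length is measured in, so no figure is claimed here.

```python
import sys

def length_changing(convert, encoding):
    """Return the characters whose case-converted form has a different
    encoded length than the original (hypothetical helper)."""
    found = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded
        ch = chr(cp)
        converted = convert(ch)
        if converted == ch:
            continue  # no case mapping, so the length cannot change
        if len(converted.encode(encoding)) != len(ch.encode(encoding)):
            found.append(ch)
    return found

# utf-8 corresponds to D's char[], utf-32-le to dchar[] (one unit per code point).
for name, convert in (("upper", str.upper), ("lower", str.lower)):
    for encoding in ("utf-8", "utf-32-le"):
        count = len(length_changing(convert, encoding))
        print(f"{name} / {encoding}: {count} length-changing characters")
```

Measuring in UTF-8 and in UTF-32 gives different lists, because a character can keep its code point count while its byte length changes, and the other way round.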

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Brad Anderson
in reply to Andrei Alexandrescu

On Saturday, 11 January 2014 at 20:46:32 UTC, Andrei Alexandrescu
wrote:
> On 1/10/14 11:50 PM, Brad Anderson wrote:
>> The recent discussion got me wondering how Phobos stacked up
>> against the C++ Boost String Algorithms library.
>>
>> Some background on the design of the Boost library:
>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>
>> TL;DR: It works somewhat like ranges.
>>
>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
> [snip]
>
> Awesome! Shall we create an issue and link the spreadsheet from there?
>
> Andrei

I'll probably just make an issue for each group of problems after
this is done getting feedback.

The big issues appear to boil down to two things: 1) The complete
inability to do replace/erase functions easily and 2) the lack of
Unicode collation support getting in the way of case-insensitive
operations which are correct in every language.

Number 1 is pretty serious for day to day coding. Number 2 would
just fill a hole in our otherwise excellent Unicode support
(something Boost doesn't even truly have, instead using locales
and character sets). In the meantime, for English and a few other
languages what we have already can be used to perform
case-insensitive operations.
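
To illustrate what "works for English and a few other languages" means in practice, here is a small sketch in Python (not Phobos). It uses Unicode default case folding via `str.casefold`, which knows nothing about the language of the text. The function name `equal_ignoring_case` is invented for the example, and Turkish is just one case where language-specific rules differ from the default.

```python
def equal_ignoring_case(a, b):
    """Language-independent comparison using Unicode default case folding."""
    return a.casefold() == b.casefold()

# English: default folding is enough.
print(equal_ignoring_case("Phobos", "PHOBOS"))      # True

# German: full folding maps "ß" to "ss", so this matches as well.
print(equal_ignoring_case("straße", "STRASSE"))     # True

# Turkish: the lowercase of "I" is dotless "ı", so a Turkish reader
# would expect a match here, but default folding maps "I" to "i".
print(equal_ignoring_case("ISTANBUL", "ıstanbul"))  # False
```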

On 2014-01-11 22:42, Dmitry Olshansky wrote:
> With high probability :)
>
> And it's indeed quite high, the amount of "bad sheep" that gets
> longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.

The least we can do is make that very clear in the documentation.

--
/Jacob Carlborg

On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
> 12-Jan-2014 01:22, monarch_dodra wrote:
>> On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
>>> On 2014-01-11 08:50, Brad Anderson wrote:
>>>> The recent discussion got me wondering how Phobos stacked up
>>>> against the C++ Boost String Algorithms library.
>>>>
>>>> Some background on the design of the Boost library:
>>>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>>>
>>>> TL;DR: It works somewhat like ranges.
>>>>
>>>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>>>
>>> toLower/Upper doesn't really work in place.
>>
>> Yeah, "toLowerInplace" is actually more like "toLowerProbablyInPlace"
>
> With high probability :)
>
> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.

More important than the absolute number of "bad sheep" is how often
they occur in your input :-)

On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
> On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
>> 12-Jan-2014 01:22, monarch_dodra wrote:
>> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.
>
> More important than the absolute amount of "bad sheep" is the frequency of them in your input :-)

In German the frequency of "ß" is 0.31%, and the mess of getting a
longer result ("SS") only arises with toUpper().
I think Greek has a similar problem, but I don't know the frequency there...
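The frequency in a given input is easy to measure. Here is a minimal sketch in Python (used because its `str.upper` applies the full mapping "ß" to "SS"). The function name `growth_sites` and the sample sentence are made up for the example, and length is counted in code points, so the figures say nothing about UTF-8 byte lengths.

```python
def growth_sites(text):
    """Count characters whose uppercase form is more than one code point."""
    return sum(1 for ch in text if len(ch.upper()) > 1)

# Hypothetical sample text containing three occurrences of "ß".
sample = "Die Straße ist groß, aber der Fluss fließt weiter."

hits = growth_sites(sample)
print(f"{hits} of {len(sample)} characters grow under upper(): "
      f"{hits / len(sample):.2%}")

# Uppercasing grows the string by one code point per hit;
# lowercasing this sample leaves its length unchanged.
print(len(sample), len(sample.upper()), len(sample.lower()))
```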

On 2014-01-13 17:15:21 +0000, "Dominikus Dittes Scherkl" <Dominikus.Scherkl@continental-corporation.com> said:
> On Sunday, 12 January 2014 at 12:48:05 UTC, Tobias Pankrath wrote:
>> On Saturday, 11 January 2014 at 21:42:46 UTC, Dmitry Olshansky wrote:
>>> 12-Jan-2014 01:22, monarch_dodra wrote:
>>> And it's indeed quite high, the amount of "bad sheep" that gets longer/shorter across the whole Unicode is around 5-10 codepoints IIRC.
>>
>> More important than the absolute amount of "bad sheep" is the frequency of them in your input :-)
>
> In German the frequency of "ß" is 0.31% and the mess with getting a longer
> result ("SS") is only for toUpper().
> I think Greek has a similar problem but don't know the frequency there...

The funny thing about "ß" is that in UTF-8 it's two bytes (0xC3 0x9F)
and you replace it with "SS", which is two bytes too (0x53 0x53). So with
some cleverness it can be done in place for char[], but not for wchar[]
or dchar[]. :-)

--
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca
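The code unit arithmetic above can be checked with a short sketch. It is in Python rather than D, and the helper `code_units` is invented for the example. UTF-8, UTF-16 and UTF-32 stand in for D's char[], wchar[] and dchar[].

```python
def code_units(text):
    """Number of code units the text occupies in each encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 1 byte per unit
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 2 bytes per unit
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 4 bytes per unit
    }

lower = "ß"
upper = lower.upper()  # full case mapping gives "SS"

before = code_units(lower)
after = code_units(upper)
for form in before:
    fits = after[form] <= before[form]
    print(f"{form}: {before[form]} -> {after[form]} code units, "
          f"fits in place: {fits}")
```

Only the UTF-8 form keeps its length (2 code units before and after). In UTF-16 and UTF-32 the result grows from 1 code unit to 2.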
