Phobos strings versus C++ Boost - D Programming Language Discussion Forum

Thread overview

Phobos strings versus C++ Boost
Jan 11, 2014 Brad Anderson
Jan 11, 2014 Jakob Ovrum
Jan 11, 2014 Brad Anderson
Jan 11, 2014 Michel Fortin
Jan 11, 2014 Brad Anderson
Jan 11, 2014 monarch_dodra
Jan 11, 2014 Brad Anderson
Jan 11, 2014 Jacob Carlborg
Jan 11, 2014 monarch_dodra
Jan 11, 2014 Dmitry Olshansky
Jan 12, 2014 Jacob Carlborg
Jan 12, 2014 Tobias Pankrath
Jan 13, 2014 Dominikus Dittes Scherkl
Jan 13, 2014 Michel Fortin
Jan 11, 2014 Brad Anderson
Jan 11, 2014 Andrei Alexandrescu
Jan 11, 2014 Brad Anderson

January 11, 2014

Phobos strings versus C++ Boost

Posted by Brad Anderson

The recent discussion got me wondering how Phobos stacked up
against the C++ Boost String Algorithms library.

Some background on the design of the Boost library:
http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html

TL;DR: It works somewhat like ranges.

Google Spreadsheet with the comparison: http://goo.gl/Wmotu4

I wouldn't be surprised if I missed functions that would do
things easily but I did look reasonably hard for ways to
accomplish things. Do share if you spot anything I missed but
everything should be intuitive rather than clever.

A few things stand out:

1. They have case-insensitive versions of pretty much everything.
It's not hard to do a map!toLower/toUpper in D but it's also not
obvious (nor do I know if that's actually correct in languages
outside of English).

2. Replace and erase options are very slim. Doing something like a
chain() on the results of findSplit() and what you want to inject
would probably work for replacing, but that's really not very
elegant. remove() is simply way too cumbersome to use. I guess
you could use indexOf, then indexOf again on a slice starting at the
first result, then pass both as a tuple to remove. That's
terrible though.

3. Doing an action several times rather than once is tricky.  As
in, there is no findAll() that returns a range of ranges. Doing
the things mentioned in 2 several times over a whole range just
adds another level of complication.
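For the simple cases in points 2 and 3, here is a rough sketch of what Phobos can already do, assuming current std.array and std.string. `replace` handles replace-all and erase-all, and `replaceFirst` handles a single replacement. `findAllOffsets` is a hypothetical helper (not a Phobos function) that only approximates Boost's `find_all`: it returns the code-unit offsets of non-overlapping matches rather than a range of ranges.

```d
import std.array : replace, replaceFirst;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper, not part of Phobos: collects the code-unit offsets of
// every non-overlapping occurrence of `needle` in `haystack`.
size_t[] findAllOffsets(string haystack, string needle)
{
    assert(needle.length > 0, "needle must be non-empty");
    size_t[] offsets;
    size_t start = 0;
    while (true)
    {
        // indexOf returns -1 when the needle is absent from the remaining slice.
        immutable i = haystack[start .. $].indexOf(needle);
        if (i < 0)
            break;
        offsets ~= start + i;
        start += i + needle.length; // skip past this match (non-overlapping)
    }
    return offsets;
}

void main()
{
    auto s = "a-b-c";
    writeln(s.replace("-", "+"));      // replace every occurrence
    writeln(s.replaceFirst("-", "+")); // replace only the first occurrence
    writeln(s.replace("-", ""));       // erase every occurrence
    writeln(findAllOffsets(s, "-"));   // offsets of every "-"
}
```

Anything beyond this (erasing the nth match, case-insensitive replace, acting on each match lazily) still needs hand-written code, which is the gap points 2 and 3 describe.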

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Jakob Ovrum
in reply to Brad Anderson

On Saturday, 11 January 2014 at 07:50:56 UTC, Brad Anderson wrote:
> The recent discussion got me wondering how Phobos stacked up
> against the C++ Boost String Algorithms library.
>
> Some background on the design of the Boost library:
> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>
> TL;DR: It works somewhat like ranges.
>
> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4

Some comments:

 * `empty` is a property - do not append parentheses/call syntax
 * `!find().empty` => `canFind` or `any`
 * `ifind_first/last` can use `find!((a, b) => a.toLower() == b.toLower())`
 * I think the Phobos equivalent of `find_tail` needs a second `retro`?
 * I don't like the idea of adding a predicate to joiner, I think using filter is better
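Two of the suggestions above, sketched under the assumption that per-code-point `std.uni.toLower` is good enough (later replies explain why it is not, in general). `ifindFirst` and `joinNonEmpty` are hypothetical names standing in for Boost's `ifind_first` and `join_if`:

```d
import std.algorithm : filter, find, joiner;
import std.conv : to;
import std.stdio : writeln;
import std.uni : toLower;

// Case-insensitive search via a custom find predicate. Strings are
// autodecoded, so the predicate sees one dchar per code point.
string ifindFirst(string haystack, string needle)
{
    return find!((a, b) => a.toLower == b.toLower)(haystack, needle);
}

// Boost-style join_if expressed as filter followed by joiner.
string joinNonEmpty(string[] parts, string sep)
{
    return parts.filter!(p => p.length > 0).joiner(sep).to!string;
}

void main()
{
    writeln(ifindFirst("Hello World", "WORLD")); // haystack from the match onward
    writeln(joinNonEmpty(["a", "", "b"], ","));
}
```

Like `find`, `ifindFirst` returns the rest of the haystack starting at the match, or an empty slice when there is no match.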

> 1. They have case-insensitive versions of pretty much everything.
> It's not hard to do a map!toLower/toUpper in D but it's also not
> obvious (nor do I know if that's actually correct in languages
> outside of English).

There are two pairs of toLower/toUpper - the ones in std.ascii and std.uni (the std.string pair aliases to std.uni). The latter pair works correctly for all scripts.
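A small illustration of the difference between the two pairs. Renamed imports are used here only to keep both in one module:

```d
import std.ascii : asciiToLower = toLower;
import std.stdio : writeln;
import std.uni : uniToLower = toLower;

void main()
{
    // std.ascii only maps 'A'..'Z'; any other character is returned unchanged.
    writeln(asciiToLower('Ä'));
    // std.uni applies the Unicode lowercase mapping.
    writeln(uniToLower('Ä'));
    // std.uni also has string overloads; std.ascii works per character only.
    writeln(uniToLower("ÄÖÜ abc"));
}
```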

> 2. Replace and erase options are very slim. Doing something like a
> chain() on the results of findSplit() and what you want to inject
> would probably work for replacing, but that's really not very
> elegant. remove() is simply way too cumbersome to use. I guess
> you could use indexOf, then indexOf again on a slice starting at the
> first result, then pass both as a tuple to remove. That's
> terrible though.

I think the mutation algorithms in std.algorithm can handle most of these when used in conjunction with other algorithms, except that narrow strings do not have the property of assignable elements, which is kind of a fatal blow.
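A sketch of the narrow-string problem, assuming current Phobos where autodecoding is still the default. A `char[]` is viewed as a range of decoded `dchar` values, so it fails the traits that mutation algorithms such as `sort` require, while `dchar[]` passes them. `sortCodePoints` is a hypothetical helper showing the usual workaround of converting to `dchar[]` first, at the cost of two allocations:

```d
import std.algorithm : sort;
import std.conv : to;
import std.range : hasAssignableElements, isRandomAccessRange;
import std.stdio : writeln;

// Autodecoding: front of a char[] is a decoded dchar that cannot be written
// back, and indexing by code point is not O(1), so sort rejects char[].
static assert(!hasAssignableElements!(char[]));
static assert(!isRandomAccessRange!(char[]));
static assert(!__traits(compiles, "dcba".dup.sort()));

// One element per code point: assignable and random access.
static assert(hasAssignableElements!(dchar[]));
static assert(isRandomAccessRange!(dchar[]));

// Hypothetical helper: sorts the code points of a string by round-tripping
// through a dchar[] buffer.
string sortCodePoints(string s)
{
    auto buf = s.to!(dchar[]);
    buf.sort();
    return buf.to!string;
}

void main()
{
    writeln(sortCodePoints("dcba"));
}
```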

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Michel Fortin
in reply to Brad Anderson

On 2014-01-11 07:50:54 +0000, "Brad Anderson" <eco@gnuk.net> said:

> 1. They have case-insensitive versions of pretty much everything.
> It's not hard to do a map!toLower/toUpper in D but it's also not
> obvious (nor do I know if that's actually correct in languages
> outside of English).

Uppercasing, lowercasing, and case-insensitive comparison are locale-dependent for Unicode. In the general case you can't just compare the lowercase/uppercase versions. For instance, look at the Turkish i/İ and ı/I (dotless i), or the German ß/SS and ss/SS pairs. Also, if you're sorting in alphabetical order you probably want to do something special with diacritics.

The correct way to do this is to implement the Unicode Collation Algorithm:
http://www.unicode.org/reports/tr10/

-- 
Michel Fortin
michel.fortin@michelf.ca
http://michelf.ca
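To make the ß example concrete, the sketch below contrasts naive lowercasing with `std.uni.icmp`, which applies full Unicode case folding (mapping ß to "ss"). This is only case folding, not the Unicode Collation Algorithm, and neither function takes a locale, so the Turkish case is still not handled. `naiveEqualsIgnoreCase` is a hypothetical name for the map!toLower approach from the first post:

```d
import std.stdio : writeln;
import std.uni : icmp, toLower;

// Naive approach: lowercase both sides and compare.
bool naiveEqualsIgnoreCase(string a, string b)
{
    return a.toLower == b.toLower;
}

void main()
{
    // ß lowercases to itself, so the naive comparison never matches "ss".
    writeln(naiveEqualsIgnoreCase("Rußland", "Russland"));
    // icmp uses full case folding, under which ß and "ss" compare equal.
    writeln(icmp("Rußland", "Russland") == 0);
    // No locale parameter: "I" always lowercases to "i", never to Turkish "ı".
    writeln("I".toLower);
}
```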

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Brad Anderson
in reply to Michel Fortin

On Saturday, 11 January 2014 at 12:47:12 UTC, Michel Fortin wrote:
> On 2014-01-11 07:50:54 +0000, "Brad Anderson" <eco@gnuk.net> said:
>
>> 1. They have case-insensitive versions of pretty much everything.
>> It's not hard to do a map!toLower/toUpper in D but it's also not
>> obvious (nor do I know if that's actually correct in languages
>> outside of English).
>
> Uppercasing, lowercasing, and case-insensitive comparison are locale-dependent for Unicode. In the general case you can't just compare the lowercase/uppercase versions. For instance, look at the Turkish i/İ and ı/I (dotless i), or the German ß/SS and ss/SS pairs. Also, if you're sorting in alphabetical order you probably want to do something special with diacritics.
>
> The correct way to do this is to implement the Unicode Collation Algorithm:
> http://www.unicode.org/reports/tr10/

I thought it was probably more complicated than that.

Looks like Dmitry put it in the tracker:
http://d.puremagic.com/issues/show_bug.cgi?id=10566

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Brad Anderson
in reply to Jakob Ovrum

On Saturday, 11 January 2014 at 08:25:39 UTC, Jakob Ovrum wrote:

> Some comments:
>
>  * `empty` is a property - do not append parentheses/call syntax

*Nod*

>  * `!find().empty` => `canFind` or `any`

The documentation needs to be improved for canFind then. It takes
an `E needle` so I assumed it was an element type only.  The
other overload of canFind takes `Ranges needles` and stops when
it finds just one of them so I assumed it'd be called in the case
assert("123".canFind("321")) and would be true (>0). Looks like
the first overload just hands off to find() which can do either
element type or a subrange but that's not clear from the
documentation.

any() needs some examples. I'm not sure how it'd be used for this
purpose.
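For reference, the canFind overloads discussed above behave roughly as follows in current std.algorithm. The single-needle form forwards to find() and accepts either an element or a subrange; the multi-needle form returns the 1-based index of the needle that matched (0 if none did) rather than a bool. The last line shows one way to use any() with a predicate:

```d
import std.algorithm : any, canFind;
import std.stdio : writeln;

void main()
{
    // Single needle: element or contiguous subrange.
    writeln("123".canFind('2'));
    writeln("123".canFind("23"));
    writeln("123".canFind("321")); // "321" is not a substring of "123"
    // Several needles: 1-based index of the needle found, 0 if none.
    writeln("123".canFind("9", "3"));
    // any: true if at least one element satisfies the predicate.
    writeln("123".any!(c => c == '2'));
}
```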

I'll try to make some pull requests to fix both of these doc
issues.

>  * `ifind_first/last` can use `find!((a, b) => a.toLower() == b.toLower())`

Yeah, but as Michel pointed out, this isn't really a valid way to
do case-insensitive comparison anyway.

>  * I think the Phobos equivalent of `find_tail` needs a second `retro`?

Yeah, very ugly.

>  * I don't like the idea of adding a predicate to joiner, I think using filter is better

I just figured for consistency since so much of std.algorithm
accepts a predicate. I'm not opposed to sticking with filter
though.

>> 1. They have case-insensitive versions of pretty much everything.
>> It's not hard to do a map!toLower/toUpper in D but it's also not
>> obvious (nor do I know if that's actually correct in languages
>> outside of English).
>
> There are two pairs of toLower/toUpper - the ones in std.ascii and std.uni (the std.string pair aliases to std.uni). The latter pair works correctly for all scripts.
>
>> 2. Replace and erase options are very slim. Doing something like a
>> chain() on the results of findSplit() and what you want to inject
>> would probably work for replacing, but that's really not very
>> elegant. remove() is simply way too cumbersome to use. I guess
>> you could use indexOf, then indexOf again on a slice starting at the
>> first result, then pass both as a tuple to remove. That's
>> terrible though.
>
> I think the mutation algorithms in std.algorithm can handle most of these when used in conjunction with other algorithms, except that narrow strings do not have the property of assignable elements, which is kind of a fatal blow.

Something needs to be done about this. I'm not sure what.

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by monarch_dodra
in reply to Brad Anderson

On Saturday, 11 January 2014 at 18:14:24 UTC, Brad Anderson wrote:
> On Saturday, 11 January 2014 at 12:47:12 UTC, Michel Fortin
>> The correct way to do this is to implement the Unicode Collation Algorithm:
>> http://www.unicode.org/reports/tr10/
>
> I thought it was probably more complicated than that.

You should read the report...

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Brad Anderson
in reply to monarch_dodra

On Saturday, 11 January 2014 at 18:56:53 UTC, monarch_dodra wrote:
> On Saturday, 11 January 2014 at 18:14:24 UTC, Brad Anderson wrote:
>> On Saturday, 11 January 2014 at 12:47:12 UTC, Michel Fortin
>>> The correct way to do this is to implement the Unicode Collation Algorithm:
>>> http://www.unicode.org/reports/tr10/
>>
>> I thought it was probably more complicated than that.
>
> You should read the report...

I meant more complicated than toLower. I'm already plenty
intimidated by Unicode publications :)

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Jacob Carlborg
in reply to Brad Anderson

On 2014-01-11 08:50, Brad Anderson wrote:
> The recent discussion got me wondering how Phobos stacked up
> against the C++ Boost String Algorithms library.
>
> Some background on the design of the Boost library:
> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>
> TL;DR: It works somewhat like ranges.
>
> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4

toLower/Upper doesn't really work in place.

-- 
/Jacob Carlborg

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by Andrei Alexandrescu
in reply to Brad Anderson

On 1/10/14 11:50 PM, Brad Anderson wrote:
> The recent discussion got me wondering how Phobos stacked up
> against the C++ Boost String Algorithms library.
>
> Some background on the design of the Boost library:
> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>
> TL;DR: It works somewhat like ranges.
>
> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
[snip]

Awesome! Shall we create an issue and link the spreadsheet from there?

Andrei

January 11, 2014

Re: Phobos strings versus C++ Boost

Posted by monarch_dodra
in reply to Jacob Carlborg

On Saturday, 11 January 2014 at 20:36:31 UTC, Jacob Carlborg wrote:
> On 2014-01-11 08:50, Brad Anderson wrote:
>> The recent discussion got me wondering how Phobos stacked up
>> against the C++ Boost String Algorithms library.
>>
>> Some background on the design of the Boost library:
>> http://www.boost.org/doc/libs/1_55_0/doc/html/string_algo/design.html
>>
>> TL;DR: It works somewhat like ranges.
>>
>> Google Spreadsheet with the comparison: http://goo.gl/Wmotu4
>
> toLower/Upper doesn't really work in place.

Yeah, "toLowerInPlace" is actually more like "toLowerProbablyInPlace"
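A sketch of the "probably in place" behavior, assuming `std.uni.toLowerInPlace` works as documented: it takes its `char[]` argument by `ref` and reallocates when the lowercase form needs more UTF-8 code units than the original. U+023A (Ⱥ) is one such character: 2 bytes in UTF-8, while its lowercase U+2C65 (ⱥ) takes 3.

```d
import std.stdio : writeln;
import std.uni : toLowerInPlace;

void main()
{
    // "Ⱥ" occupies 2 code units; its lowercase form needs 3.
    char[] buf = "Ⱥ".dup;
    writeln(buf.length);
    // buf is passed by ref; since the result cannot fit, the function
    // has to hand back a different, longer array.
    toLowerInPlace(buf);
    writeln(buf, " ", buf.length);
}
```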

Copyright © 1999-2021 by the D Language Foundation