splitter for strings - D Programming Language Discussion Forum

June 09, 2014

splitter for strings

Posted by Chris

Say I wanna split a string that contains hyphens. If I use std.algorithm.splitter I end up with empty elements for each hyphen, e.g.:

auto word = "bla-bla";
auto parts = appender!(string[]);
word.splitter('-').copy(parts);
// parts.data.length == 3 ["bla", "", "bla"]

This is not ideal for my purposes, so I filter like so:

auto parts = appender!(string[]);
foreach (p; word.splitter('-')) {
  if (p != "") {
    parts ~= p;
  }
}

or even better like so:

word.splitter('-').filter!(a => a != "").copy(parts);

I wonder, however, whether this is ideal or whether regex's split would be a better match (pardon the pun!). I try to avoid regexes whenever possible since they are more awkward to use and usually more expensive.

June 09, 2014

Re: splitter for strings

Posted by bearophile
in reply to Chris

Chris:

> auto word = "bla-bla";
> auto parts = appender!(string[]);
> word.splitter('-').copy(parts);
> // parts.data.length == 3 ["bla", "", "bla"]

With the current dmd 2.066alpha this code:

void main() {
    import std.stdio, std.string, std.algorithm;
    const txt = "bla-bla";
    txt.split("-").writeln;
    txt.splitter("-").writeln;
    txt.splitter('-').writeln;
}

Prints:

["bla", "bla"]
["bla", "bla"]
["bla", "bla"]

Bye,
bearophile
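
For reference, a minimal sketch of when splitter does produce empty elements. It assumes a reasonably recent Phobos and uses only std.algorithm; the input strings are made up for illustration:

```d
import std.algorithm : equal, splitter;

void main()
{
    // A single separator between two tokens: no empty element.
    assert("bla-bla".splitter('-').equal(["bla", "bla"]));

    // Two adjacent separators enclose one empty token.
    assert("bla--bla".splitter('-').equal(["bla", "", "bla"]));

    // A trailing or leading separator yields an empty token at that end.
    assert("bla-".splitter('-').equal(["bla", ""]));
    assert("-bla".splitter('-').equal(["", "bla"]));
}
```

So the empty strings come from doubled, leading, or trailing hyphens, not from a single hyphen between two words.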

June 09, 2014

Re: splitter for strings

Posted by Chris
in reply to bearophile

Ok, thanks. I'll keep that in mind for the next version.

June 09, 2014

Re: splitter for strings

Posted by monarch_dodra
in reply to Chris

On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>
> Ok, thanks. I'll keep that in mind for the next version.

Seems to me to also work with 2.065 and 2.064.

June 09, 2014

Re: splitter for strings

Posted by Chris
in reply to monarch_dodra

On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>
>> Ok, thanks. I'll keep that in mind for the next version.
>
> Seems to me to also work with 2.065 and 2.064.

From the library reference:

assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));

and

"If a range with one separator is given, the result is a range with two empty elements."

My problem was that if I have input like

auto word = "bla-";

it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.

length == 2 // grab [0] grab [1]
length == 1 // grab [0] (no second part, as in "bla-")
length > 2 // do something else
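
The length-based dispatch above could be sketched roughly like this, assuming the empty pieces are filtered out first. The function name `classify` and the returned labels are hypothetical placeholders, not anything from Phobos:

```d
import std.algorithm : filter, splitter;
import std.array : array;
import std.range : empty;

/// Hypothetical helper: the labels stand in for whatever the real code does.
string classify(string word)
{
    // Split on '-' and keep only the non-empty pieces.
    auto parts = word.splitter('-').filter!(a => !a.empty).array;

    switch (parts.length)
    {
        case 1:  return "one part: " ~ parts[0];
        case 2:  return "two parts: " ~ parts[0] ~ " + " ~ parts[1];
        default: return "something else";
    }
}

void main()
{
    assert(classify("bla-") == "one part: bla");
    assert(classify("bla-bla") == "two parts: bla + bla");
    assert(classify("a-b-c") == "something else");
}
```

Note that an input consisting only of hyphens leaves zero parts and lands in the default branch here.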

June 09, 2014

Re: splitter for strings

Posted by monarch_dodra
in reply to Chris

You can just pipe in an extra "filter!(a=>!a.empty)", and it'll do what you want:
put(parts, word.splitter('-').filter!(a=>!a.empty)());

The rationale for this behavior is that it preserves the "total amount of information" from your input. E.g.:

assert(equal(myString.splitter(sep).join(sep), myString));

If the empty tokens were all stripped out, that wouldn't work, you'd have lost information about how many separators there actually were, and where they were.
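
A small sketch of that round-trip property, assuming a recent Phobos (the sample string is made up):

```d
import std.algorithm : filter, splitter;
import std.array : join;
import std.range : empty;

void main()
{
    auto s = "-bla--bla-";

    // With the empty tokens kept, joining on the separator restores the input.
    assert(s.splitter('-').join("-") == s);

    // With the empty tokens filtered out, the number and position of the
    // original separators can no longer be recovered.
    assert(s.splitter('-').filter!(a => !a.empty).join("-") == "bla-bla");
}
```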

June 09, 2014

Re: splitter for strings

Posted by Chris
in reply to monarch_dodra

I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.

June 09, 2014

Re: splitter for strings

Posted by monarch_dodra
in reply to Chris

Arguably, none, since someone has to do the check anyways. If it's not done "outside" of splitter, it has to be done inside...

June 09, 2014

Re: splitter for strings

Posted by Chris
in reply to monarch_dodra

Yes, of course. I just thought if it's done in the library function, the optimization might be better than when it is done in my code. (filter!() is arguably also in the library :)

June 09, 2014

Re: splitter for strings

Posted by Steven Schveighoffer
in reply to Chris

On Mon, 09 Jun 2014 07:04:11 -0400, Chris <wendlec@tcd.ie> wrote:

> On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
>> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>>
>>> Ok, thanks. I'll keep that in mind for the next version.
>>
>> Seems to me to also work with 2.065 and 2.064.
>
>  From the library reference:
>
> assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));

Note the 2 spaces between hello and world

> and
>
> "If a range with one separator is given, the result is a range with two empty elements."

Right, it allows you to distinguish cases where the range starts or ends with the separator.

> My problem was that if I have input like
>
> auto word = "bla-";
>
> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>
> length == 2 // grab [0] grab [1]
> length == 1 // grab [0] (no second part, as in "bla-")
> length > 2 // do something else

One thing you could do is strip any leading or trailing hyphens:


assert("-bla-".chomp("-").chompPrefix("-").split('-').length == 1);

I just looked in std.string for a strip function that accepts a custom character to strip, but apparently there isn't one. The above is quite awkward.

-Steve
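
As far as I know, later Phobos releases offer a generic strip in std.algorithm.mutation that takes the element to strip, which would shorten the chomp/chompPrefix chain. A sketch under that assumption (it will not compile with releases that lack std.algorithm.mutation):

```d
import std.algorithm.iteration : splitter;
import std.algorithm.mutation : strip;
import std.array : array;

void main()
{
    // Remove leading and trailing hyphens in one call.
    assert("-bla-".strip('-') == "bla");

    // Splitting afterwards yields no empty tokens at either end.
    assert("-bla-".strip('-').splitter('-').array == ["bla"]);
    assert("-bla-bla-".strip('-').splitter('-').array.length == 2);
}
```

This only handles the ends of the string; doubled hyphens in the middle would still need the filter.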
