splitter for strings
June 09, 2014
Say I want to split a string that contains hyphens. If I use std.algorithm.splitter I end up with empty elements for each hyphen, e.g.:

import std.algorithm, std.array;
auto word = "bla-bla";
auto parts = appender!(string[]);
word.splitter('-').copy(parts);
// parts.data.length == 3 ["bla", "", "bla"]

This is not ideal for my purposes, so I filter like so:

auto parts = appender!(string[]);
foreach (p; word.splitter('-')) {
  if (p != "") {
    parts ~= p;
  }
}

or even better like so:

word.splitter('-').filter!(a => a != "").copy(parts);

I wonder, however, whether this is ideal or whether regex's split would be a better match (pardon the pun!). I try to avoid regexes whenever possible, since they are more awkward to use and usually more expensive.
June 09, 2014
Chris:

> auto word = "bla-bla";
> auto parts = appender!(string[]);
> word.splitter('-').copy(parts);
> // parts.data.length == 3 ["bla", "", "bla"]

With the current dmd 2.066alpha this code:

void main() {
    import std.stdio, std.string, std.algorithm;
    const txt = "bla-bla";
    txt.split("-").writeln;
    txt.splitter("-").writeln;
    txt.splitter('-').writeln;
}

Prints:

["bla", "bla"]
["bla", "bla"]
["bla", "bla"]

Bye,
bearophile
June 09, 2014
On Monday, 9 June 2014 at 10:14:40 UTC, bearophile wrote:
> Chris:
>
>> auto word = "bla-bla";
>> auto parts = appender!(string[]);
>> word.splitter('-').copy(parts);
>> // parts.data.length == 3 ["bla", "", "bla"]
>
> With the current dmd 2.066alpha this code:
>
> void main() {
>     import std.stdio, std.string, std.algorithm;
>     const txt = "bla-bla";
>     txt.split("-").writeln;
>     txt.splitter("-").writeln;
>     txt.splitter('-').writeln;
> }
>
> Prints:
>
> ["bla", "bla"]
> ["bla", "bla"]
> ["bla", "bla"]
>
> Bye,
> bearophile

Ok, thanks. I'll keep that in mind for the next version.
June 09, 2014
On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>
> Ok, thanks. I'll keep that in mind for the next version.

Seems to me to also work with 2.065 and 2.064.
June 09, 2014
On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>
>> Ok, thanks. I'll keep that in mind for the next version.
>
> Seems to me to also work with 2.065 and 2.064.

From the library reference:

assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));

and

"If a range with one separator is given, the result is a range with two empty elements."

My problem was that if I have input like

auto word = "bla-";

it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.

length == 2 // grab [0] grab [1]
length == 1 // grab [0] (no second part, as in "bla-")
length > 2 // do something else
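
The three cases above could be handled roughly as follows. This is only a sketch: `wordParts` and `describe` are hypothetical helper names (not Phobos functions), and dropping the empty parts with filter is an assumption about the behaviour wanted here.

```d
import std.algorithm : filter, splitter;
import std.array : array;
import std.range : empty;

/// Hypothetical helper: the non-empty pieces of `word` around each '-'.
string[] wordParts(string word)
{
    return word.splitter('-').filter!(p => !p.empty).array;
}

/// Hypothetical helper: dispatch on the number of non-empty parts.
string describe(string word)
{
    auto parts = wordParts(word);
    switch (parts.length)
    {
        case 0:  return "no parts";
        case 1:  return "one part: " ~ parts[0];
        case 2:  return "two parts: " ~ parts[0] ~ ", " ~ parts[1];
        default: return "more than two parts";
    }
}

unittest
{
    // "bla-" has no second part once the empty element is dropped.
    assert(wordParts("bla-") == ["bla"]);
    assert(describe("bla-bla") == "two parts: bla, bla");
}
```

With the empty elements filtered out up front, the length check no longer needs a separate test on parts[1].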
June 09, 2014
On Monday, 9 June 2014 at 11:04:12 UTC, Chris wrote:
> From the library reference:
>
> assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));
>
> and
>
> "If a range with one separator is given, the result is a range with two empty elements."
>
> My problem was that if I have input like
>
> auto word = "bla-";
>
> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>
> length == 2 // grab [0] grab [1]
> length == 1 // grab [0] (no second part, as in "bla-")
> length > 2 // do something else

You can just pipe in an extra "filter!(a=>!a.empty)", and it'll do what you want:
put(parts, word.splitter('-').filter!(a=>!a.empty)());

The rationale for this behavior is that it preserves the "total amount of information" from your input. E.g.:

assert(equal(myString.splitter(sep).join(sep), myString));

If the empty tokens were all stripped out, that wouldn't work: you'd have lost information about how many separators there actually were, and where they were.
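
To make the round-trip argument concrete, here is a small sketch. The names `roundTrips` and `roundTripsFiltered` are hypothetical, and the separator is hard-coded to a hyphen purely for illustration.

```d
import std.algorithm : filter, splitter;
import std.array : join;
import std.range : empty;

/// Hypothetical helper: true if splitting on '-' and re-joining with "-"
/// reproduces the original string.
bool roundTrips(string s)
{
    return s.splitter('-').join("-") == s;
}

/// Hypothetical helper: the same check, but with empty tokens dropped first.
bool roundTripsFiltered(string s)
{
    return s.splitter('-').filter!(a => !a.empty).join("-") == s;
}

unittest
{
    // The empty tokens record the doubled and the trailing separator.
    assert(roundTrips("bla--bla-"));
    // Once they are filtered out, the original can no longer be rebuilt.
    assert(!roundTripsFiltered("bla--bla-"));
}
```

For input without leading, trailing, or adjacent separators (such as "bla-bla") both checks pass, which is why the difference only shows up in the edge cases.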
June 09, 2014
On Monday, 9 June 2014 at 11:16:18 UTC, monarch_dodra wrote:
> You can just pipe in an extra "filter!(a=>!a.empty)", and it'll do what you want:
> put(parts, word.splitter('-').filter!(a=>!a.empty)());
>
> The rationale for this behavior is that it preserves the "total amount of information" from your input. E.g.:
>
> assert(equal(myString.splitter(sep).join(sep), myString));
>
> If the empty tokens were all stripped out, that wouldn't work: you'd have lost information about how many separators there actually were, and where they were.

I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.
June 09, 2014
On Monday, 9 June 2014 at 11:40:24 UTC, Chris wrote:
> I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.

Arguably, none, since someone has to do the check anyways. If it's not done "outside" of splitter, it has to be done inside...
June 09, 2014
On Monday, 9 June 2014 at 12:16:30 UTC, monarch_dodra wrote:
> On Monday, 9 June 2014 at 11:40:24 UTC, Chris wrote:
>> I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.
>
> Arguably, none, since someone has to do the check anyways. If it's not done "outside" of splitter, it has to be done inside...

Yes, of course. I just thought if it's done in the library function, the optimization might be better than when it is done in my code. (filter!() is arguably also in the library :)
June 09, 2014
On Mon, 09 Jun 2014 07:04:11 -0400, Chris <wendlec@tcd.ie> wrote:

> On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
>> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>>
>>> Ok, thanks. I'll keep that in mind for the next version.
>>
>> Seems to me to also work with 2.065 and 2.064.
>
> From the library reference:
>
> assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));

Note the two spaces between "hello" and "world".

> and
>
> "If a range with one separator is given, the result is a range with two empty elements."

Right, it allows you to distinguish cases where the range starts or ends with the separator.
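
A quick sketch of what that looks like for the hyphen case; `pieces` is a hypothetical helper name used only to materialise the lazy range so the results can be compared.

```d
import std.algorithm : splitter;
import std.array : array;

/// Hypothetical helper: materialise the pieces of `s` around each '-'.
string[] pieces(string s)
{
    return s.splitter('-').array;
}

unittest
{
    assert(pieces("bla-bla") == ["bla", "bla"]); // no empty element
    assert(pieces("-bla") == ["", "bla"]);       // leading separator
    assert(pieces("bla-") == ["bla", ""]);       // trailing separator
    assert(pieces("-") == ["", ""]);             // a lone separator
}
```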

> My problem was that if I have input like
>
> auto word = "bla-";
>
> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>
> length == 2 // grab [0] grab [1]
> length == 1 // grab [0] (no second part, as in "bla-")
> length > 2 // do something else

One thing you could do is strip any leading or trailing hyphens:

assert("-bla-".chomp("-").chompPrefix("-").split('-').length == 1);

I just looked in std.string for a strip function that takes a custom set of characters to strip, but apparently there isn't one. The above is quite awkward.
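
A hand-rolled alternative is short enough to write inline. The sketch below assumes an ASCII separator; `stripChar` is a hypothetical name, not something from std.string. Unlike the chomp/chompPrefix pair, it removes every leading and trailing occurrence rather than just one.

```d
import std.array : split;

/// Hypothetical helper (not a Phobos function): removes every leading and
/// trailing occurrence of the ASCII character `c` from `s`.
string stripChar(string s, char c)
{
    while (s.length && s[0] == c)
        s = s[1 .. $];
    while (s.length && s[$ - 1] == c)
        s = s[0 .. $ - 1];
    return s;
}

unittest
{
    assert(stripChar("--bla-", '-') == "bla");
    // After stripping, a plain split no longer produces empty end pieces.
    assert(stripChar("-bla-", '-').split('-').length == 1);
}
```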

-Steve