June 09, 2014 splitter for strings
Say I want to split a string that contains hyphens. If I use std.algorithm.splitter, I end up with empty elements for each hyphen, e.g.:

auto word = "bla-bla";
auto parts = appender!(string[]);
word.splitter('-').copy(parts);
// parts.data.length == 3 ["bla", "", "bla"]

This is not ideal for my purposes, so I filter like so:

auto parts = appender!(string[]);
foreach (p; word.splitter('-')) {
    if (p != "") {
        parts ~= p;
    }
}

or, even better, like so:

word.splitter('-').filter!(a => a != "").copy(parts);

I wonder, however, whether this is ideal or whether regex's split would be a better match (pardon the pun!). I try to avoid regex whenever possible, since regexes are more awkward to use and usually more expensive.

June 09, 2014 Re: splitter for strings
Posted in reply to Chris

Chris:
> auto word = "bla-bla";
> auto parts = appender!(string[]);
> w.splitter('-').copy(parts);
> // parts.data.length == 3 ["bla", "", "bla"]
With the current dmd 2.066alpha this code:
void main() {
    import std.stdio, std.string, std.algorithm;
    const txt = "bla-bla";
    txt.split("-").writeln;
    txt.splitter("-").writeln;
    txt.splitter('-').writeln;
}
Prints:
["bla", "bla"]
["bla", "bla"]
["bla", "bla"]
Bye,
bearophile
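A minimal sketch of the behaviour the thread converges on, assuming a recent Phobos (it has not been checked against the 2.064 to 2.066 compilers mentioned here): a single separator between two words produces no empty token; empty tokens appear only for leading, trailing, or doubled separators, and `filter` removes them lazily.

```d
// Sketch assuming a recent Phobos; not checked against the 2014 compilers.
import std.algorithm : equal, filter, splitter;
import std.array : array, split;
import std.range.primitives : empty;
import std.stdio : writeln;

void main()
{
    // One separator between two words: no empty tokens at all.
    assert("bla-bla".splitter('-').equal(["bla", "bla"]));
    assert("bla-bla".split("-") == ["bla", "bla"]);

    // A trailing separator yields one trailing empty token.
    assert("bla-".splitter('-').equal(["bla", ""]));

    // A doubled separator yields an empty token in the middle.
    assert("bla--bla".splitter('-').equal(["bla", "", "bla"]));

    // A lone separator yields two empty tokens.
    assert("-".splitter('-').equal(["", ""]));

    // filter drops the empty tokens lazily; .array materializes the result.
    auto parts = "-bla--bla-".splitter('-').filter!(a => !a.empty).array;
    writeln(parts); // ["bla", "bla"]
}
```

Because `splitter` and `filter` are both lazy, no intermediate array is built until `.array` (or `copy` into an appender) consumes the range.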

June 09, 2014 Re: splitter for strings
Posted in reply to bearophile

On Monday, 9 June 2014 at 10:14:40 UTC, bearophile wrote:
> With the current dmd 2.066alpha this code:
>
> void main() {
> import std.stdio, std.string, std.algorithm;
> const txt = "bla-bla";
> txt.split("-").writeln;
> txt.splitter("-").writeln;
> txt.splitter('-').writeln;
> }
>
> Prints:
>
> ["bla", "bla"]
> ["bla", "bla"]
> ["bla", "bla"]
Ok, thanks. I'll keep that in mind for the next version.

June 09, 2014 Re: splitter for strings
Posted in reply to Chris

On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
> Ok, thanks. I'll keep that in mind for the next version.

It seems to work with 2.065 and 2.064 as well.

June 09, 2014 Re: splitter for strings
Posted in reply to monarch_dodra

On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>
>> Ok, thanks. I'll keep that in mind for the next version.
>
> Seems to me to also work with 2.065 and 2.064.
From the library reference:
assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));
and
"If a range with one separator is given, the result is a range with two empty elements."
My problem was that if I have input like
auto word = "bla-";
it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
length == 2 // grab [0] grab [1]
length == 1 // grab [0] (no second part, as in "bla-")
length > 2 // do something else
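The three cases above could be handled by filtering out the empty tokens first and then switching on the number of parts that remain. This is a sketch only: `wordParts` and `describe` are hypothetical names, and an input with no non-empty parts (such as "-") falls into the default branch, a case the list above does not cover.

```d
// Sketch; wordParts and describe are hypothetical helpers, not Phobos API.
import std.algorithm : filter, splitter;
import std.array : array;
import std.range.primitives : empty;
import std.stdio : writeln;

/// Split `word` on '-' and drop the empty tokens that leading,
/// trailing, or doubled hyphens would otherwise produce.
string[] wordParts(string word)
{
    return word.splitter('-').filter!(p => !p.empty).array;
}

/// Dispatch on the number of non-empty parts. Zero parts and more than
/// two parts both end up in the default branch.
string describe(string word)
{
    auto parts = wordParts(word);
    switch (parts.length)
    {
        case 1:
            return "one part: " ~ parts[0];
        case 2:
            return "two parts: " ~ parts[0] ~ " + " ~ parts[1];
        default:
            return "something else";
    }
}

void main()
{
    writeln(describe("bla-bla")); // two parts: bla + bla
    writeln(describe("bla-"));    // one part: bla
    writeln(describe("a-b-c"));   // something else
}
```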

June 09, 2014 Re: splitter for strings
Posted in reply to Chris

On Monday, 9 June 2014 at 11:04:12 UTC, Chris wrote:
> My problem was that if I have input like
>
> auto word = "bla-";
>
> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>
> length == 2 // grab [0] grab [1]
> length == 1 // grab [0] (no second part, as in "bla-")
> length > 2 // do something else

You can just pipe in an extra "filter!(a=>!a.empty)", and it'll do what you want:

put(parts, word.splitter('-').filter!(a=>!a.empty)());

The rationale for this behavior is that it preserves the "total amount of information" from your input. E.g.:

assert(equal(myString.splitter(sep).join(sep), myString));

If the empty tokens were all stripped out, that wouldn't work: you'd have lost information about how many separators there actually were, and where they were.
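A sketch of the round-trip property described above, assuming a recent Phobos and using `std.array.join` to reassemble the tokens. Without the filter, every input is restored exactly; with it, inputs that differ only in their extra hyphens collapse to the same string.

```d
// Sketch assuming a recent Phobos; std.array.join reassembles the tokens.
import std.algorithm : filter, splitter;
import std.array : join;
import std.range.primitives : empty;

void main()
{
    string sep = "-";

    // Without filtering, splitting and re-joining restores every input
    // exactly, including leading, trailing, and doubled separators.
    foreach (s; ["bla-bla", "bla-", "-bla--bla-", "-", ""])
        assert(s.splitter(sep).join(sep) == s);

    // With filtering, inputs that differ only in their extra separators
    // become indistinguishable: their count and position are lost.
    auto a = "bla-bla".splitter(sep).filter!(t => !t.empty).join(sep);
    auto b = "-bla--bla-".splitter(sep).filter!(t => !t.empty).join(sep);
    assert(a == b && a == "bla-bla");
}
```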

June 09, 2014 Re: splitter for strings
Posted in reply to monarch_dodra

On Monday, 9 June 2014 at 11:16:18 UTC, monarch_dodra wrote:
> You can just pipe in an extra "filter!(a=>!a.empty)", and it'll do what you want:
> put(parts, w.splitter('-').filter!(a=>!a.empty)());
I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.

June 09, 2014 Re: splitter for strings
Posted in reply to Chris

On Monday, 9 June 2014 at 11:40:24 UTC, Chris wrote:
> I see, I've already popped in a filter. I only wonder how much of a performance loss that is. Probably negligible.
Arguably, none, since someone has to do the check anyway. If it's not done "outside" of splitter, it has to be done inside...

June 09, 2014 Re: splitter for strings
Posted in reply to monarch_dodra

On Monday, 9 June 2014 at 12:16:30 UTC, monarch_dodra wrote:
> Arguably, none, since someone has to do the check anyways. If it's not done "outside" of splitter, it has to be done inside...
Yes, of course. I just thought that if it's done in the library function, it might be better optimized than when it's done in my code. (Then again, filter!() is arguably also in the library :)

June 09, 2014 Re: splitter for strings
Posted in reply to Chris

On Mon, 09 Jun 2014 07:04:11 -0400, Chris <wendlec@tcd.ie> wrote:

> From the library reference:
>
> assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));

Note the two spaces between hello and world.

> and
>
> "If a range with one separator is given, the result is a range with two empty elements."

Right, it allows you to distinguish cases where the range starts or ends with the separator.

> My problem was that if I have input like
>
> auto word = "bla-";
>
> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>
> length == 2 // grab [0] grab [1]
> length == 1 // grab [0] (no second part, as in "bla-")
> length > 2 // do something else

One thing you could do is strip any leading or trailing hyphens:

assert("-bla-".chomp("-").chompPrefix("-").split('-').length == 1);

I just looked in std.string for a strip function that allows stripping custom characters, but apparently there isn't one. The above is quite awkward.

-Steve
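A sketch comparing the `chomp`/`chompPrefix` approach above with `std.algorithm.mutation.strip(range, element)`. That overload exists in current Phobos; whether it was available in the 2014 releases discussed here is an assumption worth checking against your compiler. Note that `chomp` and `chompPrefix` remove at most one hyphen from each end, and neither approach removes the empty tokens produced by doubled inner hyphens.

```d
// Sketch assuming a recent Phobos where std.algorithm.mutation.strip
// accepts an element to strip; not verified against 2.064 to 2.066.
import std.algorithm.mutation : strip;
import std.array : split;
import std.string : chomp, chompPrefix;

void main()
{
    // chomp / chompPrefix remove at most one occurrence from each end.
    assert("-bla-".chomp("-").chompPrefix("-") == "bla");
    assert("--bla--".chomp("-").chompPrefix("-") == "-bla-");

    // strip(range, element) removes every leading and trailing occurrence.
    assert("--bla--".strip('-') == "bla");
    assert("-bla-".strip('-').split('-').length == 1);

    // Inner runs of separators still yield empty tokens after stripping.
    assert("-bla--bla-".strip('-').split('-') == ["bla", "", "bla"]);
}
```

If inner empty tokens also need to go, the `filter!(a => !a.empty)` approach from earlier in the thread handles all three positions at once.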
Copyright © 1999-2021 by the D Language Foundation