June 09, 2014  Re: splitter for strings
Posted in reply to Steven Schveighoffer
On Monday, 9 June 2014 at 14:21:21 UTC, Steven Schveighoffer wrote:
> On Mon, 09 Jun 2014 07:04:11 -0400, Chris <wendlec@tcd.ie> wrote:
>
>> On Monday, 9 June 2014 at 10:54:09 UTC, monarch_dodra wrote:
>>> On Monday, 9 June 2014 at 10:23:16 UTC, Chris wrote:
>>>>
>>>> Ok, thanks. I'll keep that in mind for the next version.
>>>
>>> Seems to me to also work with 2.065 and 2.064.
>>
>> From the library reference:
>>
>> assert(equal(splitter("hello  world", ' '), [ "hello", "", "world" ]));
>
> Note the 2 spaces between hello and world
>
>> and
>>
>> "If a range with one separator is given, the result is a range with two empty elements."
>
> Right, it allows you to distinguish cases where the range starts or ends with the separator.
>
>> My problem was that if I have input like
>>
>> auto word = "bla-";
>>
>> it will return parts.data.length == 2, so I would have to check parts.data[1] != "". This is too awkward. I just want the parts of the word, i.e.
>>
>> length == 2 // grab [0] grab [1]
>> length == 1 // grab [0] (no second part, as in "bla-")
>> length > 2 // do something else
>
> One thing you could do is strip any leading or trailing hyphens:
>
>
> assert("-bla-".chomp("-").chompPrefix("-").split('-').length == 1);
>
> Just looked at std.string for a strip function that allows custom character strippage, but apparently not there. The above is quite awkward.
>
> -Steve
Atm, I have

auto parts = appender!(string[]);
w.splitter('-').filter!(a => !a.empty).copy(parts);

Which looks more elegant and gives me what I want. IMO, the module that handles the splitting of hyphenated words should be able to deal with cases like "blah-" without the input being prepared in a certain way. Now I have:

if (parts.data.length == 1) {
    // false alarm. Trailing hyphen
}
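
A minimal sketch of the behaviour described above, assuming a current Phobos: `splitter` keeps the empty element that follows a trailing separator, and filtering drops it. `hyphenParts` is a hypothetical helper name, and `.array` stands in for the `appender` plus `copy` pair.

```d
import std.algorithm : equal, filter, splitter;
import std.array : array;
import std.range : empty;

// Hypothetical helper (not from the thread): split on '-' and
// drop every empty part, as the filter expression above does.
string[] hyphenParts(string w)
{
    return w.splitter('-').filter!(a => !a.empty).array;
}

void main()
{
    // The trailing separator produces a trailing empty element.
    assert(equal("bla-".splitter('-'), ["bla", ""]));

    // After filtering, only the non-empty parts remain.
    assert(hyphenParts("bla-") == ["bla"]);
    assert(hyphenParts("foo-bar") == ["foo", "bar"]);
}
```

With the empty parts removed, the length check alone is enough to tell "bla-" apart from "foo-bar".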

June 09, 2014  Re: splitter for strings
Posted in reply to Chris
On Mon, 09 Jun 2014 10:39:39 -0400, Chris <wendlec@tcd.ie> wrote:
> Atm, I have
>
> auto parts = appender!(string[]);
> w.splitter('-').filter!(a => !a.empty).copy(parts);
>
> Which looks more elegant and gives me what I want. IMO, the module that handles the splitting of hyphenated words should be able to deal with cases like "blah-" without the input being prepared in a certain way.
It's not mishandled. It's handled exactly as I would have expected. If "blah-" and "blah" result in the same thing, then how do you know the difference?
Stripping any possible leading or trailing hyphens is much more efficient than checking every single word to see if it's empty.
However, if you have an instance of "--", your solution will remove the extra empty string, whereas mine does not. Not sure if that's important.
-Steve
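
To make the difference between the two approaches concrete, here is a small sketch, assuming a current Phobos. The function names `trimThenSplit` and `splitThenFilter` are hypothetical; the bodies use the same calls as the snippets quoted in this thread.

```d
import std.algorithm : filter, splitter;
import std.array : array, split;
import std.range : empty;
import std.string : chomp, chompPrefix;

// Remove at most one leading and one trailing hyphen, then split.
string[] trimThenSplit(string w)
{
    return w.chomp("-").chompPrefix("-").split('-');
}

// Split first, then discard every empty part.
string[] splitThenFilter(string w)
{
    return w.splitter('-').filter!(a => !a.empty).array;
}

void main()
{
    // Both agree when hyphens only appear once at each end.
    assert(trimThenSplit("-bla-") == ["bla"]);
    assert(splitThenFilter("-bla-") == ["bla"]);

    // An inner "--" leaves an empty part in the first approach only.
    assert(trimThenSplit("foo--bar") == ["foo", "", "bar"]);
    assert(splitThenFilter("foo--bar") == ["foo", "bar"]);
}
```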

June 09, 2014  Re: splitter for strings
Posted in reply to Steven Schveighoffer
On Monday, 9 June 2014 at 14:47:45 UTC, Steven Schveighoffer wrote:
> On Mon, 09 Jun 2014 10:39:39 -0400, Chris <wendlec@tcd.ie> wrote:
>
>> Atm, I have
>>
>> auto parts = appender!(string[]);
>> w.splitter('-').filter!(a => !a.empty).copy(parts);
>>
>> Which looks more elegant and gives me what I want. IMO, the module that handles the splitting of hyphenated words should be able to deal with cases like "blah-" without the input being prepared in a certain way.
>
> It's not mishandled. It's handled exactly as I would have expected. If "blah-" and "blah" result in the same thing, then how do you know the difference?
>
> Stripping any possible leading or trailing hyphens is much more efficient than checking every single word to see if it's empty.
>
> However, if you have an instance of "--", your solution will remove the extra empty string, whereas mine does not. Not sure if that's important.
>
> -Steve
It is important. "blah--" should come out as "blah". The logic is along the following lines:

if (canFind(w, "-")) {
    auto parts = appender!(string[]);
    w.splitter('-').filter!(a => !a.empty).copy(parts);
    if (parts.data.length == 1) {
        // false alarm. Trailing hyphen
    }
}

The more common case is that it's not a trailing hyphen. std.string.strip() only works for whitespace. It would be nice to have something like that for arbitrary characters: strip(s, '-') or strip(s, ['-', '+', '@'])
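
The branching described above could be sketched roughly as follows. The `Hyphenation` enum and `classify` function are hypothetical names introduced only for illustration, and treating a word made up solely of hyphens as a false alarm is an assumption.

```d
import std.algorithm : canFind, filter, splitter;
import std.array : array;
import std.range : empty;

// Hypothetical result type for the cases listed in the thread.
enum Hyphenation { none, falseAlarm, twoParts, manyParts }

Hyphenation classify(string w)
{
    if (!w.canFind("-"))
        return Hyphenation.none;

    auto parts = w.splitter('-').filter!(a => !a.empty).array;

    // One part (or none, for input like "-") means the hyphen
    // did not actually join two word parts.
    if (parts.length <= 1)
        return Hyphenation.falseAlarm;
    if (parts.length == 2)
        return Hyphenation.twoParts;
    return Hyphenation.manyParts;
}

void main()
{
    assert(classify("blah") == Hyphenation.none);
    assert(classify("blah--") == Hyphenation.falseAlarm);
    assert(classify("foo-bar") == Hyphenation.twoParts);
    assert(classify("a-b-c") == Hyphenation.manyParts);
}
```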

June 09, 2014  Re: splitter for strings
Posted in reply to Steven Schveighoffer
On Monday, 9 June 2014 at 14:21:21 UTC, Steven Schveighoffer wrote:
> Just looked at std.string for a strip function that allows custom character strippage, but apparently not there. The above is quite awkward.
>
> -Steve
It's in algorithm, because it's more generic than just strings.

June 09, 2014  Re: splitter for strings
Posted in reply to Chris
On Monday, 9 June 2014 at 15:19:05 UTC, Chris wrote:
> On Monday, 9 June 2014 at 14:47:45 UTC, Steven Schveighoffer wrote:
>> On Mon, 09 Jun 2014 10:39:39 -0400, Chris <wendlec@tcd.ie> wrote:
>>
>>> Atm, I have
>>>
>>> auto parts = appender!(string[]);
>>> w.splitter('-').filter!(a => !a.empty).copy(parts);
>>>
>>> Which looks more elegant and gives me what I want. IMO, the module that handles the splitting of hyphenated words should be able to deal with cases like "blah-" without the input being prepared in a certain way.
>>
>> It's not mishandled. It's handled exactly as I would have expected. If "blah-" and "blah" result in the same thing, then how do you know the difference?
>>
>> Stripping any possible leading or trailing hyphens is much more efficient than checking every single word to see if it's empty.
>>
>> However, if you have an instance of "--", your solution will remove the extra empty string, whereas mine does not. Not sure if that's important.
>>
>> -Steve
>
> It is important. "blah--" should come out as "blah". The logic is along the following lines:
>
> if (canFind(w, "-")) {
>     auto parts = appender!(string[]);
>     w.splitter('-').filter!(a => !a.empty).copy(parts);
>     if (parts.data.length == 1) {
>         // false alarm. Trailing hyphen
>     }
> }
>
> The more common case is that it's not a trailing hyphen. std.string.strip() only works for whitespaces. Would be nice to have something like that for random characters. strip(s, '-') or strip(s, ['-', '+', '@'])

http://dlang.org/phobos/std_algorithm.html#strip

w = w.strip('-');
if (canFind(w, "-")) {
    ...
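
A short sketch of that suggestion, assuming a current Phobos where `std.algorithm.strip` accepts an element to remove from both ends. `splitHyphenated` is a hypothetical helper name.

```d
import std.algorithm : splitter, strip;
import std.array : array;

// Hypothetical helper: remove all leading and trailing hyphens,
// then split whatever is left on '-'.
string[] splitHyphenated(string w)
{
    return w.strip('-').splitter('-').array;
}

void main()
{
    // Any number of trailing hyphens is removed before splitting.
    assert(splitHyphenated("blah--") == ["blah"]);
    assert(splitHyphenated("-foo-bar-") == ["foo", "bar"]);

    // Hyphens inside the word are untouched, so "--" in the
    // middle still yields an empty part.
    assert(splitHyphenated("foo--bar") == ["foo", "", "bar"]);
}
```

Unlike the filter version, this only cleans the ends of the word, so whether an inner "--" should produce an empty part remains a separate decision.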

June 09, 2014  Re: splitter for strings
Posted in reply to monarch_dodra
On Mon, 09 Jun 2014 11:49:29 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
> On Monday, 9 June 2014 at 14:21:21 UTC, Steven Schveighoffer wrote:
>> Just looked at std.string for a strip function that allows custom character strippage, but apparently not there. The above is quite awkward.
>>
>> -Steve
>
> It's in algorithm, because it's more generic than just strings.
Ugh.. This makes things difficult. If I want to work with strings, I import std.string.
I understand that the algorithm is applicable to all types, but this makes for some awkward coding. What if you wanted to use both? Surely we can come up with a better solution than this.
-Steve

June 09, 2014  Re: splitter for strings
Posted in reply to Steven Schveighoffer
On Monday, 9 June 2014 at 15:54:29 UTC, Steven Schveighoffer wrote:
> On Mon, 09 Jun 2014 11:49:29 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>
>> On Monday, 9 June 2014 at 14:21:21 UTC, Steven Schveighoffer wrote:
>>> Just looked at std.string for a strip function that allows custom character strippage, but apparently not there. The above is quite awkward.
>>>
>>> -Steve
>>
>> It's in algorithm, because it's more generic than just strings.
>
> Ugh.. This makes things difficult. If I want to work with strings, I import std.string.
>
> I understand that the algorithm is applicable to all types, but this makes for some awkward coding. What if you wanted to use both? Surely we can come up with a better solution than this.
>
> -Steve
There are 2 different issues: The first is that "split(string)" was pre-existing in std.string, and *then* split was introduced in algorithm. Where ideally (?) everything would have been placed in the same module, we try to avoid moving things around now.
The second thing is that "split" without any predicate/item can only make sense for strings, but not for generic ranges.
For what it's worth, I find it makes sense.
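
As an illustration of that second point, assuming a current Phobos: the argument-free `split` splits a string on whitespace, which has no obvious counterpart for a generic range, while `splitter` with an explicit separator works for strings and other ranges alike.

```d
import std.algorithm : equal, splitter;
import std.array : split;

void main()
{
    // Without a separator, split works on strings and treats
    // any run of whitespace as one delimiter.
    assert("hello  world".split == ["hello", "world"]);

    // With an explicit separator, every occurrence counts, so two
    // spaces in a row produce an empty element.
    assert(equal("hello  world".splitter(' '), ["hello", "", "world"]));

    // The separator form is generic and also works on non-strings.
    assert(equal([1, 2, 3].splitter(2), [[1], [3]]));
}
```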

June 09, 2014  Re: splitter for strings
Posted in reply to monarch_dodra
On Mon, 09 Jun 2014 12:06:13 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
> On Monday, 9 June 2014 at 15:54:29 UTC, Steven Schveighoffer wrote:
>> On Mon, 09 Jun 2014 11:49:29 -0400, monarch_dodra <monarchdodra@gmail.com> wrote:
>>
>>> On Monday, 9 June 2014 at 14:21:21 UTC, Steven Schveighoffer wrote:
>>>> Just looked at std.string for a strip function that allows custom character strippage, but apparently not there. The above is quite awkward.
>>>>
>>>> -Steve
>>>
>>> It's in algorithm, because it's more generic than just strings.
>>
>> Ugh.. This makes things difficult. If I want to work with strings, I import std.string.
>>
>> I understand that the algorithm is applicable to all types, but this makes for some awkward coding. What if you wanted to use both? Surely we can come up with a better solution than this.
>>
>> -Steve
>
I think we are confusing things here, I was talking about strip :)

> There are 2 different issues: The first is that "split(string)" was pre-existing in std.string, and *then* split was introduced in algorithm. Where ideally (?) everything would have been placed in the same module, we try to avoid moving things around now.
>
> The second thing is that "split" without any predicate/item can only make sense for strings, but not for generic ranges.
>
> For what it's worth, I find it makes sense.

Well, I suppose it should probably work if you try both strip and strip('-')... and indeed it does. It is not as bad as I thought (I thought they would conflict).

It still leaves me a bit uneasy that std.string does not provide everything you would need to work with strings. But we don't want std.string importing std.algorithm, or at least we don't want it importing ALL of std.algorithm.

If we could split up std.algorithm into individual modules, that would probably help.

-Steve
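
A minimal sketch of the "try both" check, assuming a current Phobos: with both modules imported in full, the whitespace form and the element form of `strip` can be called side by side without an ambiguity error.

```d
import std.algorithm;
import std.string;

void main()
{
    // No argument: the whitespace-stripping form.
    assert("  bla  ".strip() == "bla");

    // Element argument: the generic form that strips
    // that element from both ends.
    assert("--bla--".strip('-') == "bla");
}
```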

June 09, 2014  Re: splitter for strings
Posted in reply to Steven Schveighoffer
On Monday, 9 June 2014 at 17:57:24 UTC, Steven Schveighoffer wrote:
> I think we are confusing things here, I was talking about strip :)

strip and split are actually both pretty much in the same boat in regards to that, so just 's/split/strip/g', and the same answer will apply.

"split" (and "splitter") actually have it a bit more complicated, because historically, if you imported both string and algorithm, then "split(myString)" would create an ambiguous call. The issue is that you can't do selective imports when you already have a local object with the same name, so algorithm had:

----
auto split(String)(String myString)
{
    return std.string.split(myString);
}
----

rather than

----
public import std.string : split;
----

I tried to "fix" the issue by removing "split(String)" from algorithm, but that created some breakage. So Andrei just came down and put *everything* in algorithm, and added a "public import std.algorithm : split" in std.string.

This works, but it does mean that:
1. string unconditionally pulls algorithm.
2. You can do things like: std.string.split([1, 2, 3], 2);

IMO, the "strip" solution is better :/

> If we could split up std.algorithm into individual modules, that would probably help.
>
> -Steve

Yes.

June 09, 2014  Re: splitter for strings
Posted in reply to monarch_dodra
On Monday, 9 June 2014 at 15:52:24 UTC, monarch_dodra wrote:
> On Monday, 9 June 2014 at 15:19:05 UTC, Chris wrote:
>> On Monday, 9 June 2014 at 14:47:45 UTC, Steven Schveighoffer wrote:
>>> On Mon, 09 Jun 2014 10:39:39 -0400, Chris <wendlec@tcd.ie> wrote:
>>>
>>>> Atm, I have
>>>>
>>>> auto parts = appender!(string[]);
>>>> w.splitter('-').filter!(a => !a.empty).copy(parts);
>>>>
>>>> Which looks more elegant and gives me what I want. IMO, the module that handles the splitting of hyphenated words should be able to deal with cases like "blah-" without the input being prepared in a certain way.
>>>
>>> It's not mishandled. It's handled exactly as I would have expected. If "blah-" and "blah" result in the same thing, then how do you know the difference?
>>>
>>> Stripping any possible leading or trailing hyphens is much more efficient than checking every single word to see if it's empty.
>>>
>>> However, if you have an instance of "--", your solution will remove the extra empty string, whereas mine does not. Not sure if that's important.
>>>
>>> -Steve
>>
>> It is important. "blah--" should come out as "blah". The logic is along the following lines:
>>
>> if (canFind(w, "-")) {
>> auto parts = appender!(string[]);
>> w.splitter('-').filter!(a => !a.empty).copy(parts);
>> if (parts.data.length == 1) {
>> // false alarm. Trailing hyphen
>> }
>> }
>>
>> The more common case is that it's not a trailing hyphen. std.string.strip() only works for whitespaces. Would be nice to have something like that for random characters. strip(s, '-') or strip(s, ['-', '+', '@'])
>
> http://dlang.org/phobos/std_algorithm.html#strip
>
> w = w.strip('-');
> if (canFind(w, "-")) {
> ...
Uh, I see, I misread the signature of std.string.strip(). So that's one option now, to strip all trailing hyphens with std.string.strip(). Well, I'll give it a shot tomorrow.