| Thread overview | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
June 10, 2014 Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
I'm missing a version of splitter that can be used to split ranges based on arbitrary predicates. I need to for conversion between different symbol casings, typically: 1. someName => SomeName In this case the lambda should take two arguments (a,b) where in 1. a should be lowercase and b should be uppercase Have i missed a reusable component that can be applied here? | ||||
June 10, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | > 1. someName => SomeName
My example is dumb and incorrect.
I actually want this to do the following
2. "_someGreatVariableName" => "Some Great Varible Name"
| |||
June 10, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On Tuesday, 10 June 2014 at 21:11:17 UTC, Nordlöw wrote:
>> 1. someName => SomeName
>
> My example is dumb and incorrect.
>
> I actually want this to do the following
>
> 2. "_someGreatVariableName" => "Some Great Varible Name"
The current splitter works on the notion of splitter "tokens", eg, it splits when it find an element or range that corresponds to the passed value/pred.
What exactly are you requesting though?
- Split on the "edge" lowercase to uppercase?
- Split on uppercase but keep the uppercase element?
Either way, your example should look like this:
"SomeGreatVariableName" => ["Some", "Great", "Variable", "Name"]
Since you are splitting up your range into subranges.
And also either way, AFAIK, yeah, we don't have any splitter that does that. We are also missing the version that takes both a range an predicate, which would allow things like:
"This Cool thing is COOL!!!".split((a, b)=>a == b.toLower())("cool")
=>
["This ", " thing is ", "!!!"]
Looks like you'll have to roll your own :/
| |||
June 10, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On Tuesday, 10 June 2014 at 21:26:50 UTC, monarch_dodr
> What exactly are you requesting though?
> - Split on the "edge" lowercase to uppercase?
> - Split on uppercase but keep the uppercase element?
Thinking about this more: Do you *actually* have two different predicates, or are they mutually exclusive? EG: predicate "isUpper", and split on a false->true?
Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your requirements could even make it simpler, since you don't need to "skip" the splitter element (which, for strings, can actually be tricky business...).
You'd just have to use 2 calls to find though, for "both your predicates" or for "not predicate then predicate".
| |||
June 10, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | > Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your
What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
Thx.
| |||
June 11, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote: >> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your > > What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? > > Thx. I meant mostly copy pasting it, and modifying it to your needs. For example, I adapted it into this. For simplicity, I stripped infinite and forward only range support. The only functions I actually modified were "findTerminator", to actually find according to what I want, and popFront. //---- auto slicer(alias isTerminator, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(unaryFun!isTerminator(input.front)))) { return SlicerResult!(unaryFun!isTerminator, Range)(input); } private struct SlicerResult(alias isTerminator, Range) { alias notTerminator = not!isTerminator; private Range _input; private size_t _end = 0; private void findTerminator() { auto r = _input.save.find!(not!isTerminator).find!isTerminator(); _end = _input.length - r.length; } this(Range input) { _input = input; if (!_input.empty) findTerminator(); else _end = size_t.max; } static if (isInfinite!Range) enum bool empty = false; // Propagate infiniteness. else @property bool empty() { return _end == size_t.max; } @property auto front() { return _input[0 .. _end]; } void popFront() { _input = _input[_end .. _input.length]; if (_input.empty) { _end = size_t.max; return; } findTerminator(); } @property typeof(this) save() { auto ret = this; ret._input = _input.save; return ret; } } //---- This will split on before the first element where pred is true, provided there are previous elements where pred is false: //---- void main() { "SomeGreatVariableName" .slicer!isUpper.writeln(); "someGGGreatVariableName".slicer!isUpper.writeln(); "".slicer!isUpper.writeln(); "a".slicer!isUpper.writeln(); "A".slicer!isUpper.writeln(); } //---- ["Some", "Great", "Variable", "Name"] ["some", "GGGreat", "Variable", "Name"] [] ["a"] ["A"] //---- This may or may not be what you wanted, depending on how you want to split "GGGreat". If you wanted it to simply split *ON* the left of every capital letter, then you can modify the the find terminator into: private void findTerminator() { auto r = _input.save.dropOne.find!isTerminator; _end = _input.length - r.length; } And you'd get: ["some", "G", "G", "Great", "Variable", "Name"] ******************************************* ******************************************* ******************************************* In any case, yeah, it shouldn't be too hard to shape it into what you want. A more involved solution to this problem could be to simply pass a "searchFun" predicate, in which case you'd be able to split not just according to any "unitary predicate", but according to an entire "range search strategy: //---- auto slicer(alias searchFun, Range)(Range input) if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range) && is(typeof(searchFun(input)))) { return SlicerResult!(searchFun, Range)(input); } private struct SlicerResult(alias searchFun, Range) { private Range _input; private size_t _end = 0; private void findTerminator() { auto r = searchFun(_input.save); _end = _input.length - r.length; } ... //---- And then: "SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln(); "someGGGreatVariableName" .slicer!((s)=>s.dropOne.find!isUpper).writeln(); ["Some", "GGGGreat", "Variable", "Name"] ["some", "G", "G", "Great", "Variable", "Name"] Just ideas. | |||
June 11, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Nordlöw | On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote: >> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your > > What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation... For example (using monarch_dodra's test inputs): auto slicer(alias PRED, R)(R r) { import std.algorithm, std.array, std.range; struct Slicer { R r; size_t c; bool empty() @property const { return r.empty; } auto front() @property { c = r.dropExactly(1).countUntil!PRED()+1; if (c==0) c = r.length; return r.takeExactly(c); } void popFront() { r.popFrontN(c); } } return Slicer(r); } auto slicer(R)(R r) { import std.uni; return slicer!(a=>isUpper(a))(r); } void main() { import std.stdio, std.range; "SomeGreatVariableName" .slicer().writeln(); "someGGGreatVariableName".slicer().join(" ").writeln(); "".slicer().writeln(); "a".slicer().writeln(); "A".slicer().writeln(); } artur | |||
June 11, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to Artur Skawina | On Wednesday, 11 June 2014 at 11:42:42 UTC, Artur Skawina via Digitalmars-d-learn wrote: > On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote: >>> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your >> >> What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch? > > Starting from scratch is actually not a bad idea, at least for this kind > of trivial functionality. A working version can be written in less time > than copying, analyzing and modifying another implementation... > > ... > > artur I don't know about "starting from scratch" entirely. Maybe not copy paste, but it always helps to have a reference implementation. For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG: string s = "someGGGreatVariableName".slicer().front; Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string That's why the splitter code uses the more verbose "r.length - r.find!pred". | |||
June 11, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
Posted in reply to monarch_dodra | On 06/11/14 14:40, monarch_dodra via Digitalmars-d-learn wrote: > For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG: > string s = "someGGGreatVariableName".slicer().front; > Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string That's a problem with D's string handling. For both examples and rarely called code, i always prefer correctness over performance. Of course if that code ends up being perf significant it could be further optimized by adding a specialization for D strings... > That's why the splitter code uses the more verbose "r.length - r.find!pred". ... but that's the wrong approach. You end up writing a string-specific (or -aware) special version for practically every algorithm... If, instead, you create a string-specific 'countUntil' that returns a type that holds both the byte and code-point counts and implicitly converts to the latter, then you can have a 'takeExactly' overload that uses the extra info to avoid the unnecessary decoding and is able to directly return slices. The overhead is minimal; one extra integer that's passed around, and often completely eliminated by function inlining. But now you don't need to write a string-specific version of every algorithm. (Of course this description is slightly oversimplified) There is a reason why I never use D's std lib. artur | |||
June 11, 2014 Re: Splitting Ranges using Lambda Predicates | ||||
|---|---|---|---|---|
| ||||
On 06/11/14 15:44, Artur Skawina wrote:
> If, instead, you create a string-specific 'countUntil' that returns
> a type that holds both the byte and code-point counts and implicitly
> converts to the latter, then you can have a 'takeExactly' overload
> that uses the extra info to avoid the unnecessary decoding and is able
> to directly return slices. The overhead is minimal; one extra integer
> that's passed around, and often completely eliminated by function inlining.
> But now you don't need to write a string-specific version of every
> algorithm. (Of course this description is slightly oversimplified)
Well, for safety, you'd most likely want to pass around the range too (ie string slice, or just the pointer as the min-length is implied); so it's two size_ts, not one. Still, the overhead is tiny and the advantage of not having to special-case for strings everywhere is worth it.
artur
| ||||
Copyright © 1999-2021 by the D Language Foundation
Permalink
Reply