Jump to page: 1 2
Thread overview
Splitting Ranges using Lambda Predicates
Jun 10, 2014
Nordlöw
Jun 10, 2014
Nordlöw
Jun 10, 2014
monarch_dodra
Jun 10, 2014
monarch_dodra
Jun 10, 2014
Nordlöw
Jun 11, 2014
monarch_dodra
Oct 09, 2014
Nordlöw
Oct 09, 2014
monarch_dodra
Oct 10, 2014
Nordlöw
Oct 10, 2014
monarch_dodra
Jun 11, 2014
Artur Skawina
Jun 11, 2014
monarch_dodra
Jun 11, 2014
Artur Skawina
Jun 11, 2014
monarch_dodra
Jun 11, 2014
Artur Skawina
Jun 11, 2014
Artur Skawina
June 10, 2014
I'm missing a version of splitter that can be used to split ranges based on arbitrary predicates. I need to for conversion between different symbol casings, typically:

1. someName => SomeName

In this case the lambda should take two arguments (a,b)

where in

1. a should be lowercase and b should be uppercase

Have i missed a reusable component that can be applied here?
June 10, 2014
> 1. someName => SomeName

My example is dumb and incorrect.

I actually want this to do the following

2. "_someGreatVariableName" => "Some Great Varible Name"
June 10, 2014
On Tuesday, 10 June 2014 at 21:11:17 UTC, Nordlöw wrote:
>> 1. someName => SomeName
>
> My example is dumb and incorrect.
>
> I actually want this to do the following
>
> 2. "_someGreatVariableName" => "Some Great Varible Name"

The current splitter works on the notion of splitter "tokens", eg, it splits when it find an element or range that corresponds to the passed value/pred.

What exactly are you requesting though?
- Split on the "edge" lowercase to uppercase?
- Split on uppercase but keep the uppercase element?

Either way, your example should look like this:

"SomeGreatVariableName" => ["Some", "Great", "Variable", "Name"]

Since you are splitting up your range into subranges.

And also either way, AFAIK, yeah, we don't have any splitter that does that. We are also missing the version that takes both a range an predicate, which would allow things like:

"This Cool thing is COOL!!!".split((a, b)=>a == b.toLower())("cool")
  =>
["This ", " thing is ", "!!!"]

Looks like you'll have to roll your own :/
June 10, 2014
On Tuesday, 10 June 2014 at 21:26:50 UTC, monarch_dodr
> What exactly are you requesting though?
> - Split on the "edge" lowercase to uppercase?
> - Split on uppercase but keep the uppercase element?

Thinking about this more: Do you *actually* have two different predicates, or are they mutually exclusive? EG: predicate "isUpper", and split on a false->true?

Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your requirements could even make it simpler, since you don't need to "skip" the splitter element (which, for strings, can actually be tricky business...).

You'd just have to use 2 calls to find though, for "both your predicates" or for "not predicate then predicate".
June 10, 2014
> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your

What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?

Thx.
June 11, 2014
On Tuesday, 10 June 2014 at 22:31:37 UTC, Nordlöw wrote:
>> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your
>
> What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
>
> Thx.

I meant mostly copy pasting it, and modifying it to your needs. For example, I adapted it into this. For simplicity, I stripped infinite and forward only range support. The only functions I actually modified were "findTerminator", to actually find according to what I want, and popFront.

//----
auto slicer(alias isTerminator, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range)
    && is(typeof(unaryFun!isTerminator(input.front))))
{
    return SlicerResult!(unaryFun!isTerminator, Range)(input);
}

private struct SlicerResult(alias isTerminator, Range)
{
    alias notTerminator = not!isTerminator;

    private Range _input;
    private size_t _end = 0;

    private void findTerminator()
    {
        auto r = _input.save.find!(not!isTerminator).find!isTerminator();
        _end = _input.length - r.length;
    }

    this(Range input)
    {
        _input = input;

        if (!_input.empty)
            findTerminator();
        else
            _end = size_t.max;
    }

    static if (isInfinite!Range)
        enum bool empty = false;  // Propagate infiniteness.
    else
        @property bool empty()
        {
            return _end == size_t.max;
        }

    @property auto front()
    {
        return _input[0 .. _end];
    }

    void popFront()
    {
        _input = _input[_end .. _input.length];
        if (_input.empty)
        {
            _end = size_t.max;
            return;
        }
        findTerminator();
    }

    @property typeof(this) save()
    {
        auto ret = this;
        ret._input = _input.save;
        return ret;
    }
}
//----

This will split on before the first element where pred is true, provided there are previous elements where pred is false:

//----
void main()
{
    "SomeGreatVariableName"  .slicer!isUpper.writeln();
    "someGGGreatVariableName".slicer!isUpper.writeln();
    "".slicer!isUpper.writeln();
    "a".slicer!isUpper.writeln();
    "A".slicer!isUpper.writeln();
}
//----
["Some", "Great", "Variable", "Name"]
["some", "GGGreat", "Variable", "Name"]
[]
["a"]
["A"]
//----

This may or may not be what you wanted, depending on how you want to split "GGGreat". If you wanted it to simply split *ON* the left of every capital letter, then you can modify the the find terminator into:


    private void findTerminator()
    {
        auto r = _input.save.dropOne.find!isTerminator;
        _end = _input.length - r.length;
    }

And you'd get:
["some", "G", "G", "Great", "Variable", "Name"]


*******************************************
*******************************************
*******************************************


In any case, yeah, it shouldn't be too hard to shape it into what you want. A more involved solution to this problem could be to simply pass a "searchFun" predicate, in which case you'd be able to split not just according to any "unitary predicate", but according to an entire "range search strategy:

//----
auto slicer(alias searchFun, Range)(Range input)
if (((isRandomAccessRange!Range && hasSlicing!Range) || isSomeString!Range)
    && is(typeof(searchFun(input))))
{
    return SlicerResult!(searchFun, Range)(input);
}

private struct SlicerResult(alias searchFun, Range)
{
    private Range _input;
    private size_t _end = 0;

    private void findTerminator()
    {
        auto r = searchFun(_input.save);
        _end = _input.length - r.length;
    }

    ...
//----

And then:

    "SomeGGGGreatVariableName".slicer!((s)=>s.find!isLower.find!isUpper).writeln();
    "someGGGreatVariableName" .slicer!((s)=>s.dropOne.find!isUpper).writeln();

["Some", "GGGGreat", "Variable", "Name"]
["some", "G", "G", "Great", "Variable", "Name"]

Just ideas.
June 11, 2014
On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
>> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your
> 
> What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?

Starting from scratch is actually not a bad idea, at least for this kind of trivial functionality. A working version can be written in less time than copying, analyzing and modifying another implementation...

For example (using monarch_dodra's test inputs):

   auto slicer(alias PRED, R)(R r) {
      import std.algorithm, std.array, std.range;
      struct Slicer {
         R r;
         size_t c;
         bool empty() @property const { return r.empty; }
         auto front() @property {
            c = r.dropExactly(1).countUntil!PRED()+1;
            if (c==0)
               c = r.length;
            return r.takeExactly(c);
         }
         void popFront() { r.popFrontN(c); }
      }
      return Slicer(r);
   }

   auto slicer(R)(R r) {
      import std.uni;
      return slicer!(a=>isUpper(a))(r);
   }

   void main() {
      import std.stdio, std.range;
      "SomeGreatVariableName"  .slicer().writeln();
      "someGGGreatVariableName".slicer().join(" ").writeln();
      "".slicer().writeln();
      "a".slicer().writeln();
      "A".slicer().writeln();
   }

artur
June 11, 2014
On Wednesday, 11 June 2014 at 11:42:42 UTC, Artur Skawina via Digitalmars-d-learn wrote:
> On 06/11/14 00:31, "Nordlöw" via Digitalmars-d-learn wrote:
>>> Either way, it shouldn't be too hard to implement. Base it off "splitter!pred", which is actually quite trivial. AFAIK, your
>> 
>> What do you mean by basing it off splitter!pred - should I start with some existing splitter algorithm in Phobos or start from scratch?
>
> Starting from scratch is actually not a bad idea, at least for this kind
> of trivial functionality. A working version can be written in less time
> than copying, analyzing and modifying another implementation...
>
> ...
>
> artur

I don't know about "starting from scratch" entirely. Maybe not copy paste, but it always helps to have a reference implementation.

For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG:
string s = "someGGGreatVariableName".slicer().front;
Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string

That's why the splitter code uses the more verbose "r.length - r.find!pred".
June 11, 2014
On 06/11/14 14:40, monarch_dodra via Digitalmars-d-learn wrote:
> For example, you should avoid "countUntil" and "takeExactly" when dealing with strings, since these are not O(1) operations, and don't actually return string slices. EG:
> string s = "someGGGreatVariableName".slicer().front;
> Error: cannot implicitly convert expression (slicer("someGGGreatVariableName").front()) of type Result to string

That's a problem with D's string handling. For both examples and rarely called code, i always prefer correctness over performance. Of course if that code ends up being perf significant it could be further optimized by adding a specialization for D strings...

> That's why the splitter code uses the more verbose "r.length - r.find!pred".

... but that's the wrong approach. You end up writing a string-specific (or -aware) special version for practically every algorithm...

If, instead, you create a string-specific 'countUntil' that returns
a type that holds both the byte and code-point counts and implicitly
converts to the latter, then you can have a 'takeExactly' overload
that uses the extra info to avoid the unnecessary decoding and is able
to directly return slices. The overhead is minimal; one extra integer
that's passed around, and often completely eliminated by function inlining.
But now you don't need to write a string-specific version of every
algorithm. (Of course this description is slightly oversimplified)

There is a reason why I never use D's std lib.

artur
June 11, 2014
On 06/11/14 15:44, Artur Skawina wrote:
> If, instead, you create a string-specific 'countUntil' that returns
> a type that holds both the byte and code-point counts and implicitly
> converts to the latter, then you can have a 'takeExactly' overload
> that uses the extra info to avoid the unnecessary decoding and is able
> to directly return slices. The overhead is minimal; one extra integer
> that's passed around, and often completely eliminated by function inlining.
> But now you don't need to write a string-specific version of every
> algorithm. (Of course this description is slightly oversimplified)

Well, for safety, you'd most likely want to pass around the range too (ie string slice, or just the pointer as the min-length is implied); so it's two size_ts, not one. Still, the overhead is tiny and the advantage of not having to special-case for strings everywhere is worth it.

artur

« First   ‹ Prev
1 2