phobos and splitting things... but not with whitespace. - D Programming Language Discussion Forum

June 23, 2012

phobos and splitting things... but not with whitespace.

Posted by Chad J

http://dlang.org/phobos/std_array.html#splitter

The first thing I don't understand is why splitter is in /std.array/ and yet only works on /strings/.  It is defined in terms of whitespace, and I don't understand how whitespace is well-defined for things besides text.  Why wouldn't it be in std.string?

That said, I'd like to split on something that isn't whitespace.  So where's "auto splitter(C)(C[] s, C[] delim)"??  Is there a hole in functionality?

The next thing I want to do is split on a delimiter, but only once, and recover the tail.  I want to write this function:

string snip(string text)
{
	string head, tail;
	head = getHead(text, "// -- snip --", tail);
	return tail;
}

I would expect these functions to exist:
auto getHead(C)(C[] s, C[] delim, ref C[] tail);
auto getHead(C)(C[] s, C[] delim);
auto getTail(C)(C[] s, C[] delim);

Maybe even this, though it could be a bit redundant:
auto getTail(C)(C[] s, C[] delim, ref C[] head);

Do these exist in phobos?  Otherwise, is it a hole in the functionality or some kind of intentional design minimalism?

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by simendsjo
in reply to Chad J

On Sat, 23 Jun 2012 17:19:59 +0200, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:
> http://dlang.org/phobos/std_array.html#splitter

> The first thing I don't understand is why splitter is in /std.array/ and yet only works on /strings/.  It is defined in terms of whitespace, and I don't understand how whitespace is well-defined for things besides text.  Why wouldn't it be in std.string?

See http://dlang.org/phobos/std_algorithm.html#splitter
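For illustration, a minimal sketch of the std.algorithm version of splitter, which takes an explicit separator and is not tied to text. The sample inputs are made up for this example.

```d
import std.algorithm : equal, splitter;

void main()
{
    // Split on an arbitrary substring instead of whitespace.
    auto parts = "a, b, c".splitter(", ");
    assert(parts.equal(["a", "b", "c"]));

    // The same function works on non-text arrays, splitting on one element.
    int[][] expected = [[1, 2], [3], [4]];
    auto nums = [1, 2, 0, 3, 0, 4].splitter(0);
    assert(nums.equal(expected));
}
```

The result is a lazy range, so nothing past the fields actually consumed gets scanned.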

> I would expect these functions to exist:
> auto getHead(C)(C[] s, C[] delim, ref C[] tail);
> auto getHead(C)(C[] s, C[] delim);
> auto getTail(C)(C[] s, C[] delim);

As head is simply splitter(..)[0] and tail splitter(...)[1..$], extra functions could be implemented much like this

@property T head(T)(T[] arr) { return arr.front; }
@property T[] tail(T)(T[] arr) { return arr[1..$]; }

..and UFCS takes care of the rest:
auto fields = splitter(...);
auto head = fields.head;
auto tail = fields.tail;
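One caveat worth sketching: splitter returns a lazy range rather than an array, so indexing and slicing are not available on its result directly. Assuming only the first field and "the remaining fields" are needed, the range primitives front and std.range.dropOne can play the roles of head and tail. The input string here is invented for the example.

```d
import std.algorithm : equal, splitter;
import std.range : dropOne;

void main()
{
    auto fields = "key=value=more".splitter("=");

    auto head = fields.front;    // first field
    auto tail = fields.dropOne;  // lazy range over the remaining fields

    assert(head == "key");
    assert(tail.equal(["value", "more"]));
}
```

Note that tail here is still a sequence of separate fields, not the unsplit remainder of the original string.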

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by Chad J
in reply to Chad J

I'm realizing that if I want to remove exactly one line from a string of text, making no assumptions about the type of newline ("\n", "\r\n", or "\r") and without scanning the rest of the text, then I'm not sure how to do this with a single call to phobos functions.  I'd have to use indexOf and do a bunch of twiddling and maybe look ahead a character.  It seems unusually complicated for such a simple operation.

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by Chad J
in reply to simendsjo

On 06/23/2012 11:31 AM, simendsjo wrote:
> On Sat, 23 Jun 2012 17:19:59 +0200, Chad J
> <chadjoan@__spam.is.bad__gmail.com> wrote:
>> http://dlang.org/phobos/std_array.html#splitter
>
>> The first thing I don't understand is why splitter is in /std.array/
>> and yet only works on /strings/. It is defined in terms of
>> whitespace, and I don't understand how whitespace is well-defined for
>> things besides text. Why wouldn't it be in std.string?
>
> See http://dlang.org/phobos/std_algorithm.html#splitter
>
>> I would expect these functions to exist:
>> auto getHead(C)(C[] s, C[] delim, ref C[] tail);
>> auto getHead(C)(C[] s, C[] delim);
>> auto getTail(C)(C[] s, C[] delim);
>
> As head is simply splitter(..)[0] and tail splitter(...)[1..$], extra
> functions could be implemented much like this
>
> @property T head(T)(T[] arr) { return arr.front; }
> @property T[] tail(T)(T[] arr) { return arr[1..$]; }
>
> ..and UFCS takes care of the rest:
> auto fields = splitter(...);
> auto head = fields.head;
> auto tail = fields.tail;

But I don't want tail as an array.  Assume that arr is HUGE and scanning the rest of it is a bad idea.  join(arr[1..$]) then becomes a slow operation: O(n) when I could have O(1).

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by simendsjo
in reply to Chad J

On Sat, 23 Jun 2012 17:39:55 +0200, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:

> On 06/23/2012 11:31 AM, simendsjo wrote:
>> On Sat, 23 Jun 2012 17:19:59 +0200, Chad J
>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>> http://dlang.org/phobos/std_array.html#splitter
>>
>>> The first thing I don't understand is why splitter is in /std.array/
>>> and yet only works on /strings/. It is defined in terms of
>>> whitespace, and I don't understand how whitespace is well-defined for
>>> things besides text. Why wouldn't it be in std.string?
>>
>> See http://dlang.org/phobos/std_algorithm.html#splitter
>>
>>> I would expect these functions to exist:
>>> auto getHead(C)(C[] s, C[] delim, ref C[] tail);
>>> auto getHead(C)(C[] s, C[] delim);
>>> auto getTail(C)(C[] s, C[] delim);
>>
>> As head is simply splitter(..)[0] and tail splitter(...)[1..$], extra
>> functions could be implemented much like this
>>
>> @property T head(T)(T[] arr) { return arr.front; }
>> @property T[] tail(T)(T[] arr) { return arr[1..$]; }
>>
>> ..and UFCS takes care of the rest:
>> auto fields = splitter(...);
>> auto head = fields.head;
>> auto tail = fields.tail;
>
> But I don't want tail as an array.  Assume that arr is HUGE and scanning the rest of it is a bad idea.  join(arr[1..$]) then becomes a slow operation: O(n) when I could have O(1).


Looking for findSplit? http://dlang.org/phobos/std_algorithm.html#findSplit
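As a sketch of how the snip function from the first post might look with findSplit (the marker string comes from that post; the sample text is invented here):

```d
import std.algorithm : findSplit;

/// Returns everything after the first occurrence of the marker,
/// or an empty string when the marker is absent.
string snip(string text)
{
    auto parts = text.findSplit("// -- snip --");
    // parts[0] is the text before the marker, parts[1] the marker itself,
    // parts[2] the text after it.
    return parts[2];
}

void main()
{
    auto text = "header\n// -- snip --\nbody";
    auto tail = snip(text);
    assert(tail == "\nbody");

    // For arrays the tail is a slice of the input, not a copy.
    assert(tail.ptr == text.ptr + "header\n// -- snip --".length);

    assert(snip("no marker here") == "");
}
```

Only the part of the input up to and including the marker is searched; the tail is returned as a slice without being scanned.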

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by Chad J
in reply to simendsjo

On 06/23/2012 11:44 AM, simendsjo wrote:
> On Sat, 23 Jun 2012 17:39:55 +0200, Chad J
> <chadjoan@__spam.is.bad__gmail.com> wrote:
>
>> On 06/23/2012 11:31 AM, simendsjo wrote:
>>> On Sat, 23 Jun 2012 17:19:59 +0200, Chad J
>>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>>> http://dlang.org/phobos/std_array.html#splitter
>>>
>>>> The first thing I don't understand is why splitter is in /std.array/
>>>> and yet only works on /strings/. It is defined in terms of
>>>> whitespace, and I don't understand how whitespace is well-defined for
>>>> things besides text. Why wouldn't it be in std.string?
>>>
>>> See http://dlang.org/phobos/std_algorithm.html#splitter
>>>
>>>> I would expect these functions to exist:
>>>> auto getHead(C)(C[] s, C[] delim, ref C[] tail);
>>>> auto getHead(C)(C[] s, C[] delim);
>>>> auto getTail(C)(C[] s, C[] delim);
>>>
>>> As head is simply splitter(..)[0] and tail splitter(...)[1..$], extra
>>> functions could be implemented much like this
>>>
>>> @property T head(T)(T[] arr) { return arr.front; }
>>> @property T[] tail(T)(T[] arr) { return arr[1..$]; }
>>>
>>> ..and UFCS takes care of the rest:
>>> auto fields = splitter(...);
>>> auto head = fields.head;
>>> auto tail = fields.tail;
>>
>> But I don't want tail as an array. Assume that arr is HUGE and
>> scanning the rest of it is a bad idea. join(arr[1..$]) then becomes a
>> slow operation: O(n) when I could have O(1).
>
>
> Looking for findSplit? http://dlang.org/phobos/std_algorithm.html#findSplit

Cool, that's what I want!

Now if I could find the elegant way to remove exactly one line from the text without scanning the text after it...

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by simendsjo
in reply to Chad J

On Sat, 23 Jun 2012 18:50:05 +0200, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:

>> Looking for findSplit? http://dlang.org/phobos/std_algorithm.html#findSplit
> Cool, that's what I want!
> Now if I could find the elegant way to remove exactly one line from the text without scanning the text after it...

Isn't that exactly what findSplit does? It doesn't have to search the rest of the string after the match, it just returns a slice of the rest of the array (I guess - haven't read the code)

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by simendsjo
in reply to simendsjo

On Sat, 23 Jun 2012 18:56:24 +0200, simendsjo <simendsjo@gmail.com> wrote:

> On Sat, 23 Jun 2012 18:50:05 +0200, Chad J <chadjoan@__spam.is.bad__gmail.com> wrote:
>
>>> Looking for findSplit? http://dlang.org/phobos/std_algorithm.html#findSplit
>> Cool, that's what I want!
>> Now if I could find the elegant way to remove exactly one line from the text without scanning the text after it...
>
> Isn't that exactly what findSplit does? It doesn't have to search the rest of the string after the match, it just returns a slice of the rest of the array (I guess - haven't read the code)


import std.stdio, std.algorithm;

void main() {
    auto text = "1\n2\n3\n4";
    auto res = text.findSplit("\n");

    auto pre = res[0];
    assert(pre.ptr == text.ptr); // no copy for pre match

    auto match = res[1];
    assert(match.ptr == &text[1]); // no copy for needle

    auto post = res[2];
    assert(post.ptr == &text[2]); // no copy for post match
    assert(post.length == 5);
}

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by Chad J
in reply to simendsjo

On 06/23/2012 01:02 PM, simendsjo wrote:
> On Sat, 23 Jun 2012 18:56:24 +0200, simendsjo <simendsjo@gmail.com> wrote:
>
>> On Sat, 23 Jun 2012 18:50:05 +0200, Chad J
>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>
>>>> Looking for findSplit?
>>>> http://dlang.org/phobos/std_algorithm.html#findSplit
>>> Cool, that's what I want!
>>> Now if I could find the elegant way to remove exactly one line from
>>> the text without scanning the text after it...
>>
>> Isn't that exactly what findSplit does? It doesn't have to search the
>> rest of the string after the match, it just returns a slice of the
>> rest of the array (I guess - haven't read the code)
>
>
> import std.stdio, std.algorithm;
>
> void main() {
> auto text = "1\n2\n3\n4";
> auto res = text.findSplit("\n");
>
> auto pre = res[0];
> assert(pre.ptr == text.ptr); // no copy for pre match
>
> auto match = res[1];
> assert(match.ptr == &text[1]); // no copy for needle
>
> auto post = res[2];
> assert(post.ptr == &text[2]); // no copy for post match
> assert(post.length == 5);
> }

Close... the reason findSplit doesn't work is because a new line could be "\n" or it could be "\r\n" or it could be "\r".

June 23, 2012

Re: phobos and splitting things... but not with whitespace.

Posted by Chad J
in reply to Chad J

On 06/23/2012 01:24 PM, Chad J wrote:
> On 06/23/2012 01:02 PM, simendsjo wrote:
>> On Sat, 23 Jun 2012 18:56:24 +0200, simendsjo <simendsjo@gmail.com>
>> wrote:
>>
>>> On Sat, 23 Jun 2012 18:50:05 +0200, Chad J
>>> <chadjoan@__spam.is.bad__gmail.com> wrote:
>>>
>>>>> Looking for findSplit?
>>>>> http://dlang.org/phobos/std_algorithm.html#findSplit
>>>> Cool, that's what I want!
>>>> Now if I could find the elegant way to remove exactly one line from
>>>> the text without scanning the text after it...
>>>
>>> Isn't that exactly what findSplit does? It doesn't have to search the
>>> rest of the string after the match, it just returns a slice of the
>>> rest of the array (I guess - haven't read the code)
>>
>>
>> import std.stdio, std.algorithm;
>>
>> void main() {
>> auto text = "1\n2\n3\n4";
>> auto res = text.findSplit("\n");
>>
>> auto pre = res[0];
>> assert(pre.ptr == text.ptr); // no copy for pre match
>>
>> auto match = res[1];
>> assert(match.ptr == &text[1]); // no copy for needle
>>
>> auto post = res[2];
>> assert(post.ptr == &text[2]); // no copy for post match
>> assert(post.length == 5);
>> }
>
> Close... the reason findSplit doesn't work is because a new line could
> be "\n" or it could be "\r\n" or it could be "\r".

As an additional note: I could probably do this easily if I had a function like findSplit where the predicate is used /instead/ of a delimiter.  So like this:

auto findSplit(alias pred = "a", R)(R haystack);
...
auto tuple = findSplit!(`a == "\n" || a == "\r\n" || a == "\r"`)(text);
return tuple[2];
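One possible way to get this effect with existing functions, offered as a sketch rather than a Phobos API: std.algorithm.find accepts a predicate, so it can stop at the first '\r' or '\n', and a single startsWith check handles the two-character "\r\n" case. The helper name dropFirstLine is hypothetical.

```d
import std.algorithm : find, startsWith;

/// Hypothetical helper: returns the text after the first line terminator
/// ("\n", "\r\n", or "\r"), or an empty string if there is none.
string dropFirstLine(string text)
{
    // Advance to the first carriage return or line feed.
    auto rest = text.find!(c => c == '\r' || c == '\n')();
    if (rest.length == 0)
        return rest;

    // Treat "\r\n" as one terminator; otherwise skip a single character.
    immutable skip = rest.startsWith("\r\n") ? 2 : 1;
    return rest[skip .. $];
}

void main()
{
    assert(dropFirstLine("one\ntwo\nthree") == "two\nthree");
    assert(dropFirstLine("one\r\ntwo") == "two");
    assert(dropFirstLine("one\rtwo") == "two");
    assert(dropFirstLine("no newline") == "");

    // The result is a slice of the input; nothing after the terminator is read.
    auto text = "a\r\nb";
    assert(dropFirstLine(text).ptr == text.ptr + 3);
}
```

The scan touches only the first line plus at most two terminator characters, so the cost does not depend on how much text follows.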

