Thread overview
How to divide by space keeping words with spaces inside quotes?
Aug 08, 2021
Marcone
Aug 09, 2021
cy
Aug 09, 2021
Basile.B
Aug 09, 2021
Basile.B
Aug 09, 2021
jfondren
Aug 09, 2021
Marcone
August 08, 2021

How to divide by space keeping words with spaces inside quotes?

Example:

string text = "Duck Cat \"Carl Rivers\" Dog";

I want to split it into:

["Duck", "Cat", "Carl Rivers", "Dog"]

ATTENTION: I DON'T WANT:

["Duck", "Cat", "Carl", "Rivers", "Dog"]

How can I get it in Dlang?

August 09, 2021

On Sunday, 8 August 2021 at 23:04:32 UTC, Marcone wrote:

>

How to divide by space keeping words with spaces inside quotes?

Well, the designers of ASCII decided that open quote and close quote would be the same character, so it's a little trickier. Basically, you have to process the string character by character with a finite state machine that switches between word mode, space mode, and quoting mode. It also has to account for backslash escapes, since that's the only way to get a literal quote inside a quoted word, so you kinda need them.

I don't know of any specific module that does it, but something like:

import std.array : appender;
import std.stdio : writeln;

enum Mode { quoting, word, space }

string[] splitQuoted(string s) {
	auto cur = appender!string;        // token currently being built
	auto accum = appender!(string[]);  // finished tokens
	auto mode = Mode.space;
	bool backslash;                    // true right after an unescaped backslash

	// Store the current token and start a fresh one.
	void flush() {
		accum.put(cur.data);
		cur = appender!string;
	}

	foreach (char ch; s) {
		if (backslash) {
			// The escaped character is taken literally; the backslash itself is dropped.
			backslash = false;
			cur.put(ch);
			continue;
		}

		final switch (mode) {
		case Mode.quoting:
			switch (ch) {
			case '\\':
				backslash = true;
				break;
			case '"':
				// Closing quote ends the token, even if it is empty.
				mode = Mode.space;
				flush();
				break;
			default:
				cur.put(ch);
				break;
			}
			break;
		case Mode.word:
			switch (ch) {
			case '\\':
				backslash = true;
				break;
			case ' ', '\t', '\n':
				mode = Mode.space;
				flush();
				break;
			default:
				cur.put(ch);
				break;
			}
			break;
		case Mode.space:
			switch (ch) {
			case '\\':
				backslash = true;
				mode = Mode.word;
				break;
			case ' ', '\t', '\n':
				break;
			case '"':
				mode = Mode.quoting;
				break;
			default:
				cur.put(ch);
				mode = Mode.word;
				break;
			}
			break;
		}
	}

	// Flush a trailing word (or an unterminated quote).
	if (mode != Mode.space) flush();
	return accum.data;
}

void main() {
	string somestr = "Duck Cat \"Carl Rivers\" Dog";
	writeln(splitQuoted(somestr)); // prints ["Duck", "Cat", "Carl Rivers", "Dog"]
}

(a sketch, not thoroughly tested)

August 09, 2021

On Sunday, 8 August 2021 at 23:04:32 UTC, Marcone wrote:

>

How to divide by space keeping words with spaces inside quotes?

You can use a regex. Apparently the pattern (\"[\w ]*\")|\w* would work. The other option is to write a dedicated lexer, as suggested in the other answer.

August 09, 2021

On Monday, 9 August 2021 at 04:19:05 UTC, Basile.B wrote:

>

On Sunday, 8 August 2021 at 23:04:32 UTC, Marcone wrote:

>

How to divide by space keeping words with spaces inside quotes?

You can use a regex. Apparently the pattern (\"[\w ]*\")|\w* would work

Actually, use + as the quantifier: (\"[\w ]+\")|\w+ (with *, the \w* alternative can also match the empty string).
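A minimal sketch of how that pattern could be tried out with std.regex. The helper names rawTokens and unquote are made up for this illustration and are not from the thread. Note that the whole match of a quoted alternative still includes the quotes, so they have to be stripped in a second step (or captured with a group instead).

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, regex;

// Hypothetical helper: returns the whole text of every match of the pattern.
// In a backtick (WYSIWYG) string the quotes need no backslash.
string[] rawTokens(string s) {
    return s.matchAll(regex(`"[\w ]+"|\w+`)).map!(m => m.hit).array;
}

// Hypothetical helper: drops one pair of surrounding double quotes, if present.
string unquote(string tok) {
    return tok.length >= 2 && tok[0] == '"' && tok[$ - 1] == '"'
        ? tok[1 .. $ - 1]
        : tok;
}

unittest {
    auto toks = rawTokens(`Duck Cat "Carl Rivers" Dog`);
    // The quoted phrase stays together, but keeps its quotes.
    assert(toks == ["Duck", "Cat", `"Carl Rivers"`, "Dog"]);
    assert(toks.map!unquote.array == ["Duck", "Cat", "Carl Rivers", "Dog"]);
}
```

Run it with something like dmd -unittest -main -run filename.d.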

August 09, 2021

On Sunday, 8 August 2021 at 23:04:32 UTC, Marcone wrote:

>

How to divide by space keeping words with spaces inside quotes?

regex:

// test with: dmd -unittest -main -run filename.d

string[] splitquote(string s) {
    import std.regex : matchAll, regex;
    import std.array : array;
    import std.algorithm : map;

    return s.matchAll(regex(`"([\w ]+)"|(\w+)`)).map!"a[2] ? a[2] : a[1]".array;
}

unittest {
    assert(`Duck Cat Carl`.splitquote == ["Duck", "Cat", "Carl"]);
    assert(`Duck "Cat" Carl`.splitquote == ["Duck", "Cat", "Carl"]);
    assert(`Duck "Cat Carl"`.splitquote == ["Duck", "Cat Carl"]);
    assert(`"Duck" "Cat Carl`.splitquote == ["Duck", "Cat", "Carl"]); // GIGO
    assert(`"Duck Cat" "Carl"`.splitquote == ["Duck Cat", "Carl"]);
}
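One limitation of the \w-based pattern is that punctuation splits words, so Carl-Rivers would come back as two tokens. A possible variant, offered only as a sketch: the name splitquoteAny and the pattern are assumptions for illustration, not something posted in the thread. It takes anything between double quotes as one token and otherwise any run of non-whitespace characters.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, regex;

// Hypothetical variant of splitquote: group 1 is the text inside double
// quotes, group 2 is a run of non-whitespace characters.
string[] splitquoteAny(string s) {
    return s.matchAll(regex(`"([^"]*)"|(\S+)`))
        // Group 2 is non-empty whenever it matched; otherwise use group 1.
        .map!(m => m[2].length ? m[2] : m[1])
        .array;
}

unittest {
    assert(`Duck "Carl-Rivers Jr." Dog-2`.splitquoteAny
        == ["Duck", "Carl-Rivers Jr.", "Dog-2"]);
    assert(`Duck Cat "Carl Rivers" Dog`.splitquoteAny
        == ["Duck", "Cat", "Carl Rivers", "Dog"]);
}
```

Checking the length of group 2 rather than its truthiness keeps the choice between the two groups explicit, and lets an empty quoted string come through as an empty token.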

PEG:

/++ dub.sdl:
    dependency "pegged" version="~>0.4.5"
+/
// test with: dub run -bunittest --single filename.d
import pegged.grammar;

mixin(grammar(q"PEG
Quotable:
    Words    < (' '* (Quoted/Unquoted))*
    Quoted   <~ :doublequote (!doublequote .)+ :doublequote
    Unquoted < identifier+
PEG"));

string[] splitquote(string s) {
    return Quotable(s).matches;
}

unittest {
    assert(`Duck Cat Carl`.splitquote == ["Duck", "Cat", "Carl"]);
    assert(`Duck "Cat" Carl`.splitquote == ["Duck", "Cat", "Carl"]);
    assert(`Duck "Cat Carl"`.splitquote == ["Duck", "Cat Carl"]);
    assert(`"Duck" "Cat Carl`.splitquote == ["Duck"]);
    assert(`"Duck Cat" "Carl"`.splitquote == ["Duck Cat", "Carl"]);
}

void main() { }

August 09, 2021

Thank you very much! With your help I created this function, which works fine:

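// Function splitcommas(): split on whitespace, keeping quoted phrases together
import std.algorithm : filter, map;
import std.array : array, replace;
import std.regex : regex, splitter;
import std.string : strip;
import std.typecons : Yes;
string[] splitcommas(string text) nothrow {
	try {
		// The regex matches the tokens themselves, so keepSeparators returns them; the blank gaps in between are filtered out.
		return text.splitter!(Yes.keepSeparators)(regex("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'")).map!(x => x.replace("\"", "")).filter!(x => x.strip.length).array;
	} catch (Exception) { return []; } // catch Exception rather than Throwable so Errors still propagate
}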