July 21, 2013 Code generation tricks
This seems to be a somewhat efficient string splitter:

http://dpaste.dzfl.pl/4307aa5f

The basic idea is

    for(int j = 0; j < s.length; j++)
    {
        mixin(ExpandVariadicIf!("??Cs[j]??s[j..min(s.length-1, j + %%L)]::", "d", "
            if (r.length <= i) r.length += 5;
            if (j != 0)
            {
                r[i++] = s[oldj..j];
                oldj = j + %%L;
            }
            else
                oldj = %%L;
            j += %%L; continue;", T));
    }

ExpandVariadicIf creates a series of ifs, one for each variadic argument. There is some strange formatting (just some crap I threw together to get something working), but it boils down to generating compile-time code that minimizes computations and lookups by directly using the known compile-time literals passed.

IMO these types of functions seem useful, but ATM they are just hacks. Hopefully there is a better way to do these sorts of things, as I find them pretty useful.

One of the big issues is not being able to pass a variadic variable to a template directly, which is why the formatting string is necessary. (You can pass the TypeTuple to get the types and size, but not the compile-time values, if they exist.)

I think in this case a variadic alias would be very useful:

    alias T... => alias T0, alias T1, etc.

(e.g. T[0] is an alias, T.length is the number of aliases, etc.)

In any case, maybe someone has a good way to make these things easier and more useful. Being able to handle variadic types and values in a consistent and simple way would make them far more useful.
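For comparison, a minimal sketch of how close D's template sequence parameters (T...) already come to the "variadic alias" described above, assuming a compiler with std.meta.AliasSeq (older Phobos called it std.typetuple.TypeTuple). A T... parameter accepts types, compile-time values and symbols alike, with T[i] and T.length working as wished for. The catch is that the values have to be passed as template arguments (f!(',', "::")(s)); values passed as ordinary function arguments are runtime values even when they are literals. Count, Mixed and someSymbol are hypothetical names used only for illustration.

```d
import std.meta : AliasSeq;

// Hypothetical template: a sequence parameter (T...) takes types, values
// and symbols; T.length counts them and T[i] selects one.
template Count(T...)
{
    enum Count = T.length;
}

int someSymbol; // hypothetical symbol, only here to show that symbols work too

// A type, a char literal, a string literal and a symbol in one sequence.
alias Mixed = AliasSeq!(int, ',', "::", someSymbol);

static assert(Count!Mixed == 4);
static assert(is(Mixed[0] == int));                    // a type
static assert(Mixed[1] == ',');                        // a compile-time char value
static assert(Mixed[2] == "::");                       // a compile-time string value
static assert(__traits(isSame, Mixed[3], someSymbol)); // an aliased symbol
```

Everything above is checked at compile time; no string formatting is needed to get at the individual values.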
July 22, 2013 Re: Code generation tricks
Posted in reply to JS

On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
> This seems to be a somewhat efficient string splitter
>
> http://dpaste.dzfl.pl/4307aa5f
How does this perform compared to naive/Phobos splitting?
July 23, 2013 Re: Code generation tricks
Posted in reply to John Colvin

On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
> On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
>> This seems to be a somewhat efficient string splitter
>> [...]
>
> How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos' is heavily optimized. With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, it should be even better.

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine on my machine. Must be some command line switch or something.
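To put numbers on John's question, one possible naive baseline is a splitter whose separators are an ordinary runtime array, so every position loops over them at run time instead of testing compile-time literals. This is a hypothetical sketch (naiveSplit is not from the thread); it keeps empty pieces between adjacent separators, which differs slightly from JS's version.

```d
import std.algorithm.searching : startsWith;

// Hypothetical naive baseline: separators are runtime data, so each
// position tries every separator in a runtime loop.
string[] naiveSplit(string s, const(string)[] seps)
{
    string[] parts;
    size_t start = 0, j = 0;
    outer: while (j < s.length)
    {
        foreach (sep; seps)
        {
            // Skip empty separators, which would otherwise match everywhere.
            if (sep.length > 0 && s[j .. $].startsWith(sep))
            {
                parts ~= s[start .. j]; // piece before the separator
                j += sep.length;        // jump past the separator
                start = j;
                continue outer;         // resume scanning at the new position
            }
        }
        j++;
    }
    parts ~= s[start .. $]; // trailing piece (possibly empty)
    return parts;
}
```

Timing this against the compile-time version on the same input (for example with std.datetime.stopwatch.benchmark) would show whether the unrolled comparisons pay off as the number of delimiters grows; no measurements are claimed here.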
July 23, 2013 Re: Code generation tricks
Posted in reply to John Colvin

On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
> On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
>> This seems to be a somewhat efficient string splitter
>> [...]
>
> How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos' is heavily optimized. With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, it should be even better.

Here's my test code, which you might be able to profile if you want:

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine on my machine. Must be some command line switch or something.

The Expand templates simply allow one to expand the variadic args into compile-time expressions. E.g., we can write if (a == b) with normal args but not with variadic args; the templates help accomplish that. (I'm sure there are better ways; think of the code as a proof of concept.)
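The "if (a == b) over variadic args" expansion can also be done without a string mixin, assuming the separators are passed as template arguments: a foreach over a template value sequence is unrolled by the compiler, emitting one comparison per separator with its literal inlined. A minimal sketch; sepAt and splitOn are hypothetical names, and unlike JS's version this one keeps empty pieces.

```d
import std.algorithm.comparison : min;

// Hypothetical helper: does one of the compile-time separators in Seps
// start at s[j]? The foreach over Seps is unrolled at compile time.
bool sepAt(Seps...)(const(char)[] s, size_t j, out size_t len)
{
    foreach (k, _; Seps)
    {
        enum sep = Seps[k]; // compile-time constant for this iteration
        static if (is(typeof(sep) : char))
        {
            if (s[j] == sep) { len = 1; return true; }
        }
        else
        {
            static assert(sep.length > 0, "empty separators are not supported");
            if (s[j .. min(s.length, j + sep.length)] == sep)
            {
                len = sep.length;
                return true;
            }
        }
    }
    return false;
}

// Hypothetical split built on sepAt; empty pieces are kept.
string[] splitOn(Seps...)(string s)
{
    string[] parts;
    size_t start = 0, j = 0;
    while (j < s.length)
    {
        size_t len;
        if (sepAt!Seps(s, j, len))
        {
            parts ~= s[start .. j];
            j += len;   // continue right after the separator
            start = j;
        }
        else
            j++;
    }
    parts ~= s[start .. $];
    return parts;
}
```

Usage: splitOn!(',', "::")("a,b::c") splits on both a char and a string separator, with each check compiled against its literal.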
July 23, 2013 Re: Code generation tricks
Posted in reply to JS

On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
> This seems to be a somewhat efficient string splitter
>
> http://dpaste.dzfl.pl/4307aa5f
I probably shouldn't have done this, but I wanted to know what that abomination actually does, so I reduced it (code below). In the end, all it does is accept both char and string separators, something rather simple when you have static if.
Some comments on the result:
* I think the fiddling with i and r.length is silly, but it had some impact on performance, so I left it in.
* Likewise, I'd rather just use std.algorithm.startsWith and not distinguish between char and string separators in split. Again, performance was slightly worse.
And here it is:
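    import std.algorithm.comparison : min;

    inout(char)[][] split(Separators ...)(inout(char)[] s, Separators separators)
    {
        size_t i = 0, oldj = 0;
        inout(char)[][] r;
        outer: for(size_t j = 0; j < s.length; j++)
        {
            // Unrolled at compile time: one check per separator.
            foreach(si, S; Separators)
            {
                immutable sep = separators[si];
                static if(is(S : char))
                {
                    auto slice = s[j];
                    enum seplen = 1;
                }
                else static if(is(S : const(char)[]))
                {
                    assert(sep.length > 0, "empty separator");
                    auto slice = s[j .. min(s.length, j + sep.length)];
                    immutable seplen = sep.length;
                }
                else static assert(false, "separators must be chars or strings");
                if(slice == sep)
                {
                    if(r.length <= i) r.length += 5;
                    if(j != 0) r[i++] = s[oldj .. j];
                    oldj = j + seplen;
                    // The loop's j++ then lands on the first char after the
                    // separator, so no character is skipped.
                    j = oldj - 1;
                    // Don't test the remaining separators at a stale position
                    // (this also avoids indexing past the end of s).
                    continue outer;
                }
            }
        }
        if(oldj < s.length)
        {
            if(r.length <= i) r.length++;
            r[i++] = s[oldj .. $];
        }
        r.length = i;
        return r;
    }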
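The second comment above mentions using std.algorithm.startsWith instead of distinguishing char and string separators. One way to do that (a sketch, not the poster's code) is the variadic overload of startsWith, which takes several needles, chars or strings, and returns the 1-based index of the one that matched, or 0 if none did. splitStartsWith is a hypothetical name; char separators are assumed to be ASCII, and empty pieces are kept.

```d
import std.algorithm.searching : startsWith;

// Hypothetical variant: all separators go to one startsWith call.
// Requires at least two separators so startsWith returns an index.
string[] splitStartsWith(Separators...)(string s, Separators separators)
    if (Separators.length > 1)
{
    // Length of each separator in code units (ASCII chars count as 1).
    size_t[Separators.length] lens;
    foreach (k, sep; separators)
    {
        static if (is(typeof(sep) : const(char)[]))
        {
            assert(sep.length > 0, "empty separator");
            lens[k] = sep.length;
        }
        else
            lens[k] = 1;
    }

    string[] parts;
    size_t start = 0, j = 0;
    while (j < s.length)
    {
        // 0 if no separator starts here, otherwise 1 + index of the match.
        immutable idx = s[j .. $].startsWith(separators);
        if (idx != 0)
        {
            parts ~= s[start .. j];
            j += lens[idx - 1];
            start = j;
        }
        else
            j++;
    }
    parts ~= s[start .. $];
    return parts;
}
```

The separators here are runtime arguments, so this trades the compile-time unrolling for simpler code; as the comment notes, the poster saw slightly worse performance with startsWith.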
Copyright © 1999-2021 by the D Language Foundation