Code generation tricks

Thread overview:
Jul 21, 2013: JS
Jul 22, 2013: John Colvin
Jul 23, 2013: JS
Jul 23, 2013: JS
Jul 23, 2013: anonymous
July 21, 2013
This seems to be a somewhat efficient string splitter

http://dpaste.dzfl.pl/4307aa5f

The basic idea is

for(int j = 0; j < s.length; j++)
	{
		mixin(ExpandVariadicIf!("??Cs[j]??s[j..min(s.length-1, j + %%L)]::", "d", "
			if (r.length <= i) r.length += 5;
			if (j != 0)
			{
				r[i++] = s[oldj..j];
				oldj = j + %%L;
			}
			else
				oldj = %%L;
		 j += %%L; continue;", T));
		
	}

ExpandVariadicIf creates a series of ifs, one for each variadic argument. The format string is strange (just some crap I threw together to get something working), but it boils down to generating code at compile time that minimizes computations and lookups by directly using the compile-time literals passed in.
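
To make the idea concrete, here is a minimal sketch of generating one if per compile-time literal with a string mixin. It is not the ExpandVariadicIf from the paste: `makeIfChain` and `indexOfLiteral` are hypothetical names invented for this illustration, and the sketch assumes the literals contain no quote or backslash characters.

```d
import std.conv : to;

// Hypothetical helper (not from the paste): builds the source text of one
// `if` per literal. It runs at compile time via CTFE when used in a mixin.
string makeIfChain(string[] literals)
{
    string code;
    foreach (idx, lit; literals)
    {
        // Each branch compares against a literal baked into the generated code.
        // Assumption: `lit` contains no '"' or '\' characters.
        code ~= "if (s == \"" ~ lit ~ "\") return " ~ idx.to!string ~ ";\n";
    }
    return code ~ "return -1;\n";
}

// Returns the index of the literal equal to `s`, or -1 if none matches.
int indexOfLiteral(literals...)(string s)
    if (literals.length > 0)
{
    // Expands to: if (s == "foo") return 0; if (s == "bar") return 1; ...
    mixin(makeIfChain([literals]));
}

unittest
{
    assert(indexOfLiteral!("foo", "bar")("bar") == 1);
    assert(indexOfLiteral!("foo", "bar")("baz") == -1);
}
```

The generated function body contains no loop and no array of candidates, only straight-line comparisons against constants.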

IMO these kinds of functions are useful, but ATM they are just hacks. Hopefully there is a better way to do this sort of thing.

One of the big issues is not being able to pass a variadic variable to a template directly, which is why the format string is necessary. (You can pass the type tuple to get the types and the size, but not the compile-time values, if they exist.)

I think in this case a variadic alias would be very useful.

alias T... => alias T0, alias T1, etc....
(e.g. T[0] is an alias, T.length is number of aliases, etc...)
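
As a hedged aside: D's template sequence parameters (`T...`) can hold compile-time values as well as types, so when the separators are literals they can be passed as template arguments and read back inside the template. This does not help with runtime values. A minimal sketch under that assumption; `matchLength` is a hypothetical name, not part of the linked code:

```d
// Sketch: separators are passed as template arguments, so each one is a
// compile-time constant inside the unrolled loop below.
// Returns the length of the separator found at s[j], or 0 if none matches.
size_t matchLength(Seps...)(const(char)[] s, size_t j)
{
    foreach (sep; Seps) // unrolled at compile time, one branch per separator
    {
        static if (is(typeof(sep) : char))
        {
            // Single-character separator: compare one code unit.
            if (j < s.length && s[j] == sep)
                return 1;
        }
        else
        {
            // String separator: compare a slice of the same length.
            if (j + sep.length <= s.length && s[j .. j + sep.length] == sep)
                return sep.length;
        }
    }
    return 0;
}

unittest
{
    assert(matchLength!(',', "::")("a::b", 1) == 2);
    assert(matchLength!(',', "::")("a,b", 1) == 1);
    assert(matchLength!(',', "::")("a,b", 0) == 0);
}
```

`Seps[0]` and `Seps.length` work as the proposal describes for `T[0]` and `T.length`.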

In any case, maybe someone has a good way to make these things easier and more useful. Being able to handle variadic types and values in a consistent and simple way would make them more useful.


July 22, 2013
On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
> This seems to be a somewhat efficient string splitter
>
> http://dpaste.dzfl.pl/4307aa5f

How does this perform compared to naive or Phobos splitting?
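
One way to answer that question would be a small timing harness. The sketch below is an assumption about how such a comparison could be set up, not the test code from the thread: `naiveSplit` and `compareSplitters` are hypothetical names, the input and iteration count are up to the caller, and no results are claimed here. The splitter from the paste would be added as a third lambda.

```d
import std.array : split;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;

// Naive baseline: scan once and slice at each ','.
string[] naiveSplit(string s)
{
    string[] parts;
    size_t start = 0;
    foreach (j, c; s) // iterates over code units
    {
        if (c == ',')
        {
            parts ~= s[start .. j];
            start = j + 1;
        }
    }
    parts ~= s[start .. $];
    return parts;
}

// Times the naive loop against Phobos' std.array.split on the same input.
void compareSplitters(string input, uint iterations)
{
    auto results = benchmark!(
        () => naiveSplit(input),
        () => input.split(",")
    )(iterations);
    writefln("naive:  %s", results[0]);
    writefln("phobos: %s", results[1]);
}

unittest
{
    // Both splitters should agree before their timings are compared.
    assert(naiveSplit("a,b,,c") == ["a", "b", "", "c"]);
    assert(naiveSplit("a,b") == "a,b".split(","));
}
```

Calling something like `compareSplitters("alpha,beta,gamma", 100_000)` from `main` prints both durations; building with optimizations enabled is advisable for any meaningful comparison.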
July 23, 2013
On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
> On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
>> This seems to be a somewhat efficient string splitter
>>
>> http://dpaste.dzfl.pl/4307aa5f
>
> How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos's splitter is heavily optimized.

With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, then it should be even better.

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine on my machine. Must be some command-line switch or something.


July 23, 2013
On Monday, 22 July 2013 at 21:04:42 UTC, John Colvin wrote:
> On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
>> This seems to be a somewhat efficient string splitter
>>
>> http://dpaste.dzfl.pl/4307aa5f
>>
>
> How does this perform compared to naive/phobos splitting?

I don't know... probably not a huge difference unless Phobos's splitter is heavily optimized.

With just one delimiter, there should be no difference. With 100 delimiter literals, the difference should probably be significant, more so when chars are used. If the compiler is able to optimize slices of literal strings, then it should be even better.

Here's my test code, which you might be able to profile if you want:

http://dpaste.dzfl.pl/2f10d24a

The code shows a bunch of errors on dpaste but compiles fine on my machine. Must be some command-line switch or something.

The Expand templates simply allow one to expand the variadic args into compile-time expressions. E.g., we can write if (a == b) with normal args but not with variadic args; the templates help accomplish that. (I'm sure there are better ways... think of the code as a proof of concept.)

July 23, 2013
On Sunday, 21 July 2013 at 17:24:11 UTC, JS wrote:
> This seems to be a somewhat efficient string splitter
>
> http://dpaste.dzfl.pl/4307aa5f

I probably shouldn't have done this, but I wanted to know what that abomination actually does, so I reduced it (code below). In the end, all it does is accept both char and string separators, which is rather simple when you have static if.

Some comments on the result:

* I think the fiddling with i and r.length is silly, but it had some impact on performance, so I left it in.

* Likewise, I'd rather just use std.algorithm.startsWith and not distinguish between char and string separators in split, but performance was slightly worse that way.

And here it is:

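import std.algorithm : min;

// Reduced version: splits s at any of the given char or string separators.
inout(char)[][] split(Separators ...)(inout(char)[] s, Separators separators)
{
    size_t i = 0, oldj = 0; // i: used slots in r, oldj: start of current piece
    inout(char)[][] r;

    for(size_t j = 0; j < s.length; j++)
    {
        // Unrolled at compile time: one comparison block per separator.
        foreach(si, S; Separators)
        {
            // A match by an earlier separator may have moved j to the end;
            // without this guard a char separator would read s[j] out of range.
            if(j >= s.length) break;

            immutable sep = separators[si];

            static if(is(S : char))
            {
                // char separator: compare a single code unit
                auto slice = s[j];
                enum seplen = 1;
            }
            else static if(is(S : const(char)[]))
            {
                // string separator: compare a slice of the same length
                auto slice = s[j .. min(s.length, j + sep.length)];
                immutable seplen = sep.length;
            }
            else static assert(false);

            if(slice == sep)
            {
                if(r.length <= i) r.length += 5; // grow in chunks of 5
                if(j != 0) r[i++] = s[oldj .. j];
                j += seplen;
                oldj = j;
            }
        }
    }

    // Append whatever follows the last separator.
    if(oldj < s.length)
    {
        auto tail = s[oldj .. $];
        if(tail.length > 0)
        {
            if(r.length <= i) r.length++;
            r[i++] = tail;
        }
    }

    r.length = i; // trim unused slots
    return r;
}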
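
For comparison, the startsWith variant mentioned in the second comment could look roughly like the sketch below. This is a guess at the shape of that version, not the poster's actual code: `splitStartsWith` is a hypothetical name, it takes `string` rather than `inout(char)[]` to keep the example simple, it never emits empty pieces, and it requires at least two separators because `startsWith` only reports which needle matched when given more than one.

```d
import std.algorithm.searching : startsWith;
import std.utf : stride;

// Hypothetical variant: lets Phobos' startsWith test all separators at once
// instead of hand-written char/string branches.
string[] splitStartsWith(Separators...)(string s, Separators separators)
    if (Separators.length > 1)
{
    string[] r;
    size_t oldj = 0, j = 0;

    while (j < s.length)
    {
        // 0 means no match, otherwise the 1-based index of the matching needle.
        immutable which = s[j .. $].startsWith(separators);

        // Translate the needle index back into the separator's length.
        size_t seplen = 0;
        foreach (si, sep; separators)
        {
            if (which == si + 1)
            {
                static if (is(typeof(sep) : char)) seplen = 1;
                else seplen = sep.length;
            }
        }

        if (seplen == 0)
        {
            j += stride(s, j); // no separator here: skip one whole code point
            continue;
        }

        if (j > oldj) r ~= s[oldj .. j]; // skip empty pieces
        j += seplen;
        oldj = j;
    }

    if (oldj < s.length) r ~= s[oldj .. $];
    return r;
}

unittest
{
    assert(splitStartsWith("a,b::c", ',', "::") == ["a", "b", "c"]);
    assert(splitStartsWith(",a", ',', ';') == ["a"]);
}
```

The extra pass that maps the needle index back to a length is one plausible source of the slightly worse performance reported above, though that is speculation and would need measuring.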