February 09, 2017
On 02/09/2017 02:04 PM, Nestor wrote:

> I tried running each algoritm a few times through avgtime using
> different digit lengths

avgtime profiles the whole process, right? It measures everything that is involved in that little program. At least OS starting the program, D runtime initializing, program interacting with the console, etc.

Since you're comparing algorithms, you need to take everything else out of the equation. A better approach is to use std.datetime.benchmark, which would repeat just the algorithm that you're interested in.

Ali

February 09, 2017
Dne 9.2.2017 v 23:08 Ali Çehreli via Digitalmars-d-learn napsal(a):

> On 02/09/2017 02:04 PM, Nestor wrote:
>
> > I tried running each algoritm a few times through avgtime using
> > different digit lengths
>
> avgtime profiles the whole process, right? It measures everything that is involved in that little program. At least OS starting the program, D runtime initializing, program interacting with the console, etc.
>
> Since you're comparing algorithms, you need to take everything else out of the equation. A better approach is to use std.datetime.benchmark, which would repeat just the algorithm that you're interested in.
>
> Ali
>
Yes this is important, and for example when you used shared libphobos it makes really big difference
February 09, 2017
On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:
> OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?

 Truthfully, because you'll need tens of millions or hundreds of millions in length to determine if it makes a difference and how much. Addition, subtraction and simple memory lookups take very little time, and since the entire array (100 bytes) fits in the cache, it is going to perform very very very well regardless if you can optimize it further.

 If you tested this on a much slower system, say an 8bit 6502 the differences would be far more pronounced, but not that much different.

 Since the algorithm is more or less O(n) optimizing it won't make many differences.

 It's possible you could get a speedup by making them ints instead of chars, since then it might not have a penalty for the 'address not divisible by 4' that applies (which is more for ARM architectures and less for x86).

 Other optimizations could be to make it multiple levels, taking the basic 100 elements and expanding them 2-3 levels deep in a lookup and having it do it in more or less a single operation. (100 bytes for 1 level, 10,000 for 2 levels, 1,000,000 for 3 levels, 100,000,000 for 4 levels, etc), but the steps of converting it to the array lookup won't give you that much gain, although fewer memory lookups but none of them will be cached, so any advantage from that is probably lost. Although if you bump up the size to 16x10 instead of 10x10, you could use a shift instead of *10 which will make that slightly faster (there will be unused empty padded spots)

 In theory if you avoid the memory lookup at all, you could gain some amount of speed, depending on how it searches a manual table, although using a switch-case and a mixin to do all the details it feels like it wouldn't give you any gain...

 Division operations are the slowest operations you can do, but otherwise most instructions run really fast. Unless you're trying to get it smaller (fewer bytes for the call) or shaving for speed by instruction cycle counting (like on the 6502) i doubt you'll get much benefit.
February 10, 2017
On Thursday, 9 February 2017 at 23:49:19 UTC, Era Scarecrow wrote:
>  Other optimizations could be to make it multiple levels, taking the basic 100 elements and expanding them 2-3 levels deep in a lookup and having it do it in more or less a single operation. (100 bytes for 1 level, 10,000 for 2 levels, 1,000,000 for 3 levels, 100,000,000 for 4 levels, etc), but the steps of converting it to the array lookup won't give you that much gain, although fewer memory lookups but none of them will be cached, so any advantage from that is probably lost. Although if you bump up the size to 16x10 instead of 10x10, you could use a shift instead of *10 which will make that slightly faster (there will be unused empty padded spots)
>
>  In theory if you avoid the memory lookup at all, you could gain some amount of speed, depending on how it searches a manual table, although using a switch-case and a mixin to do all the details it feels like it wouldn't give you any gain...

Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(
February 11, 2017
On Friday, 10 February 2017 at 11:27:02 UTC, Nestor wrote:
> Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(

 The idea behind it is like this (which you can scale up):

static immutable int[] QG10Matrix2 = buildMatrix2();

int[] buildMatrix2() {
    string digits = "0123456789";
    int[] l = new int[16*16*10];
    char[3] s;
    foreach(a; digits)
    foreach(b; digits)
    foreach(c; digits) {
        s[] = [a,b,c];
        l[(a-'0')<< 8|(b-'0')<<4|(c-'0')]=checkDigit(cast(string) s) - '0';
    }

    return l;
}


Using that it SHOULD allow you to get the result of 2 inputs simply by using 2 characters (plus the old result)

char checkDigit2(string str) {
    int tmpdigit = 0;
    for(;str.length >= 2;str=str[2 .. $]) {
        tmpdigit = QG10Matrix2[tmpdigit<<8|(str[0]-'0')<< 4|(str[1]-'0')];
    }
   // handle remainder single character and return value


 While it should be easy, I'm having issues trying to get the proper results via unittests and I'm not sure why. Probably something incredibly simple on my part.
February 11, 2017
On Saturday, 11 February 2017 at 11:45:02 UTC, Era Scarecrow wrote:
> On Friday, 10 February 2017 at 11:27:02 UTC, Nestor wrote:
>> Thank you for the detailed reply. I wasn't able to follow you regarding the multilevel stuff though :(
>
>  The idea behind it is like this (which you can scale up):
>
> static immutable int[] QG10Matrix2 = buildMatrix2();
>
> int[] buildMatrix2() {
>     string digits = "0123456789";
>     int[] l = new int[16*16*10];
>     char[3] s;
>     foreach(a; digits)
>     foreach(b; digits)
>     foreach(c; digits) {
>         s[] = [a,b,c];
>         l[(a-'0')<< 8|(b-'0')<<4|(c-'0')]=checkDigit(cast(string) s) - '0';
>     }
>
>     return l;
> }
>
>
> Using that it SHOULD allow you to get the result of 2 inputs simply by using 2 characters (plus the old result)
>
> char checkDigit2(string str) {
>     int tmpdigit = 0;
>     for(;str.length >= 2;str=str[2 .. $]) {
>         tmpdigit = QG10Matrix2[tmpdigit<<8|(str[0]-'0')<< 4|(str[1]-'0')];
>     }
>    // handle remainder single character and return value
>
>
>  While it should be easy, I'm having issues trying to get the proper results via unittests and I'm not sure why. Probably something incredibly simple on my part.

Notice this is no ordinary matrix, but an Anti-Simmetric QuasiGroup of order 10, and tmpdigit (called interim in the algorithm) is used in each round (although the function isn't recursive) together with each digit to calculate final check digit.
February 11, 2017
On Saturday, 11 February 2017 at 20:19:51 UTC, Nestor wrote:
> Notice this is no ordinary matrix, but an Anti-Simmetric QuasiGroup of order 10, and tmpdigit (called interim in the algorithm) is used in each round (although the function isn't recursive) together with each digit to calculate final check digit.

 Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation.

 If however you were calculating a fixed number of digits a single table could be made and do a single lookup, assuming it wasn't too large to make it uncumbersome or impractical.
February 11, 2017
On Saturday, 11 February 2017 at 21:02:40 UTC, Era Scarecrow wrote:
>  Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation.

 Alright I've found the bug and fixed it, and it passes with flying colors (brute force tests up to 6 digits); However it doesn't use the original function to build the table. So I'm satisfied it will handle any length now.

 But it seriously is a lot of overhead for such a simple function.

int[] buildMatrix2() {
    string digits = "0123456789";
    int[] l = new int[16*16*10];
    l[] = -1; //printing the array it's obvious to see what is padding
    foreach(a; digits)
    foreach(b; digits)
    foreach(c; digits) {
        int t = (a-'0')*10,
            t2 = (QG10Matrix[(b - '0') + t]-'0') * 10,
            off = (a - '0') << 8 | (b - '0') << 4 | (c - '0');
        l[off] = (QG10Matrix[(c - '0') + t2]-'0')<<8;
    }

    return l;
}

char checkDigit2(string str) {
    int tmpdigit = 0;
    for(;str.length >= 2;str = str[2 .. $])
        tmpdigit = QG10Matrix2[tmpdigit|(str[0]-'0')<<4|(str[1]-'0')];

    tmpdigit>>=8;
    if (str.length==1)
        return QG10Matrix[(str[0]-'0')+tmpdigit*10];

    return (tmpdigit+'0') & 0xff;
}


February 11, 2017
On Saturday, 11 February 2017 at 21:41:11 UTC, Era Scarecrow wrote:
>  But it seriously is a lot of overhead for such a simple function.

 Just ran the unittests under the dmd profiler, says the algorithm is 11% faster now. So yeah slightly more optimized. Another level and we could probably get 25%, but the built matrix will blow up far larger than the 10k it is now.

  Num          Tree        Func        Per
  Calls        Time        Time        Call
12000000     1281989     1281989           0     char damm.checkDigit(immutable(char)[])
12000000     1146308     1146308           0     char damm.checkDigit2(immutable(char)[])


February 12, 2017
On Saturday, 11 February 2017 at 21:41:11 UTC, Era Scarecrow wrote:
> On Saturday, 11 February 2017 at 21:02:40 UTC, Era Scarecrow wrote:
>>  Yes i know, which is why i had 3 to calculate 2 inputs because the third is the temp/previous calculation.
>
>  Alright I've found the bug and fixed it, and it passes with flying colors (brute force tests up to 6 digits); However it doesn't use the original function to build the table. So I'm satisfied it will handle any length now.
>
>  But it seriously is a lot of overhead for such a simple function.
>
> int[] buildMatrix2() {
>     string digits = "0123456789";
>     int[] l = new int[16*16*10];
>     l[] = -1; //printing the array it's obvious to see what is padding
>     foreach(a; digits)
>     foreach(b; digits)
>     foreach(c; digits) {
>         int t = (a-'0')*10,
>             t2 = (QG10Matrix[(b - '0') + t]-'0') * 10,
>             off = (a - '0') << 8 | (b - '0') << 4 | (c - '0');
>         l[off] = (QG10Matrix[(c - '0') + t2]-'0')<<8;
>     }
>
>     return l;
> }
>
> char checkDigit2(string str) {
>     int tmpdigit = 0;
>     for(;str.length >= 2;str = str[2 .. $])
>         tmpdigit = QG10Matrix2[tmpdigit|(str[0]-'0')<<4|(str[1]-'0')];
>
>     tmpdigit>>=8;
>     if (str.length==1)
>         return QG10Matrix[(str[0]-'0')+tmpdigit*10];
>
>     return (tmpdigit+'0') & 0xff;
> }

I fail to see where you are declaring QG10Matrix2, because apparently it's an array of chars, but buildMatrix2 returns an array of int (2560 elements??) with lots of -1 values.