Can this implementation of Damm algorithm be optimized? - D Programming Language Discussion Forum

Forums

New users
- Learn
Community
- General
- Announce
Improvements
- DIP Ideas
- DIP Devel.
Ecosystem
- GDC
- LDC
- Debuggers
- IDEs
- DWT
Development
- Internals
- Issues
- Beta
- DMD
- Phobos
- Druntime
- Study
Turkish
- Genel
- Duyuru

Index » Learn » Can this implementation of Damm algorithm be optimized?

Thread overview

Can this implementation of Damm algorithm be optimized?
Feb 09, 2017 Nestor
Feb 09, 2017 Cym13
Feb 09, 2017 Era Scarecrow
Feb 09, 2017 Nestor
Feb 09, 2017 Daniel Kozak
Feb 09, 2017 Nestor
Feb 09, 2017 Daniel Kozak
Feb 09, 2017 Nestor
Feb 09, 2017 Ali Çehreli
Feb 09, 2017 Daniel Kozak
Feb 09, 2017 Cym13
Feb 09, 2017 Era Scarecrow
Feb 10, 2017 Nestor
Feb 11, 2017 Era Scarecrow
Feb 11, 2017 Nestor
Feb 11, 2017 Era Scarecrow
Feb 11, 2017 Era Scarecrow
Feb 11, 2017 Era Scarecrow
Feb 12, 2017 Era Scarecrow
Feb 13, 2017 Nestor
Feb 13, 2017 Era Scarecrow
Feb 12, 2017 Nestor
Feb 12, 2017 Era Scarecrow
Feb 09, 2017 Nestor

February 09, 2017

Can this implementation of Damm algorithm be optimized?

Posted by Nestor

Nestor

Hi,

I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:

import std.stdio;

static immutable char[] QG10Matrix =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

char checkDigit(string str) {
  char tmpdigit = '0';
  foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
  return tmpdigit;
}

enum {
  EXIT_SUCCESS = 0,
  EXIT_FAILURE = 1,
}

int main(string[] args) {
  scope(failure) {
    writeln("Invalid arguments. You must pass a number.");
    return EXIT_FAILURE;
  }
  assert(args.length == 2);
  char digit = checkDigit(args[1]);
  if(digit == '0') writefln("%s is a valid number.", args[1]);
  else {
    writefln("%s is not a valid number (but it would be, appending digit %s).",
      args[1], digit);
  }

  return EXIT_SUCCESS;
}

[1] https://en.wikiversity.org/wiki/Damm_algorithm

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Cym13
in reply to Nestor

Cym13

Posted in reply to Nestor

On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
> Hi,
>
> I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:
>
> import std.stdio;
>
> static immutable char[] QG10Matrix =
>   "03175986427092154863420687135917509834266123045978" ~
>   "36742095815869720134894536201794386172052581436790";
>
> char checkDigit(string str) {
>   char tmpdigit = '0';
>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>   return tmpdigit;
> }
>
> enum {
>   EXIT_SUCCESS = 0,
>   EXIT_FAILURE = 1,
> }
>
> int main(string[] args) {
>   scope(failure) {
>     writeln("Invalid arguments. You must pass a number.");
>     return EXIT_FAILURE;
>   }
>   assert(args.length == 2);
>   char digit = checkDigit(args[1]);
>   if(digit == '0') writefln("%s is a valid number.", args[1]);
>   else {
>     writefln("%s is not a valid number (but it would be, appending digit %s).",
>       args[1], digit);
>   }
>
>   return EXIT_SUCCESS;
> }
>
> [1] https://en.wikiversity.org/wiki/Damm_algorithm

I can't see how to make it more efficient than it is (aside from finding a better algorithm which isn't a language concern). You are using C style and will get the same machine code as if you'd written it with C. The only difference would be boundchecking that you can disable by using the -boundscheck=off compiler switch.

By the way note that you can use the -profile switch to see where the bottlenecks are, in your case writting the result to stdout is *by far* what takes the most time (and I'm quite certain the bottleneck is the same in the C version for it's a fact that I/O is slow).

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Era Scarecrow
in reply to Nestor

Era Scarecrow

Posted in reply to Nestor

On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
> I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:
>
> import std.stdio;
>
> static immutable char[] QG10Matrix =
>   "03175986427092154863420687135917509834266123045978" ~
>   "36742095815869720134894536201794386172052581436790";
>
> char checkDigit(string str) {
>   char tmpdigit = '0';
>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>   return tmpdigit;
> }

Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference.

Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand)


 static immutable char[] QG10Matrix = [
    00,30,10,70,50,90,80,60,40,20,
    70,00,90,20,10,50,40,80,60,30,
    40,20,00,60,80,70,10,30,50,90,
    10,70,50,00,90,80,30,40,20,60,
    60,10,20,30,00,40,50,90,70,80,
    30,60,70,40,20,00,90,50,80,10,
    50,80,60,90,70,20,00,10,30,40,
    80,90,40,50,30,60,20,00,10,70,
    90,40,30,80,60,10,70,20,00,50,
    20,50,80,10,40,30,60,70,90,00];

 char checkDigit(string str) {
   char tmpdigit = 0;
   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') +
 tmpdigit];
   return (tmpdigit/10) + '0';
 }

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Nestor
in reply to Era Scarecrow

Nestor

Posted in reply to Era Scarecrow

On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:
> On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
>> I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:
>>
>> import std.stdio;
>>
>> static immutable char[] QG10Matrix =
>>   "03175986427092154863420687135917509834266123045978" ~
>>   "36742095815869720134894536201794386172052581436790";
>>
>> char checkDigit(string str) {
>>   char tmpdigit = '0';
>>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>>   return tmpdigit;
>> }
>
> Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference.
>
> Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand)
>
>
>  static immutable char[] QG10Matrix = [
>     00,30,10,70,50,90,80,60,40,20,
>     70,00,90,20,10,50,40,80,60,30,
>     40,20,00,60,80,70,10,30,50,90,
>     10,70,50,00,90,80,30,40,20,60,
>     60,10,20,30,00,40,50,90,70,80,
>     30,60,70,40,20,00,90,50,80,10,
>     50,80,60,90,70,20,00,10,30,40,
>     80,90,40,50,30,60,20,00,10,70,
>     90,40,30,80,60,10,70,20,00,50,
>     20,50,80,10,40,30,60,70,90,00];
>
>  char checkDigit(string str) {
>    char tmpdigit = 0;
>    foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') +
>  tmpdigit];
>    return (tmpdigit/10) + '0';
>  }



OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?

import std.stdio;

static immutable ubyte[][] QG10Matrix = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static int charToInt(char chr) {
  scope(failure) return -1;
  return cast(int)(chr - '0');
}

ubyte checkDigit(string str) {
  ubyte tmpdigit;
  foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
  return tmpdigit;
}

enum {
  EXIT_SUCCESS = 0,
  EXIT_FAILURE = 1,
}

int main(string[] args) {
  scope(failure) {
    writeln("Invalid arguments. You must pass a number.");
    return EXIT_FAILURE;
  }
  assert(args.length == 2);
  ubyte digit = checkDigit(args[1]);
  if(digit == 0) writefln("%s is a valid number.", args[1]);
  else {
    writefln("%s is not a valid number (but it would be, appending digit %s).",
      args[1], digit);
  }

  return EXIT_SUCCESS;
}

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Nestor
in reply to Era Scarecrow

Nestor

Posted in reply to Era Scarecrow

On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:
> ...
> Actually since you're also multiplying by 10, you can incorporate that in the table too...

I forgot to comment that what is multiplied by ten is not the value but the starting position in the array (a way to emulate a matrix), but anyway see my comments in previous post.

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Daniel Kozak
in reply to Nestor

Daniel Kozak

Posted in reply to Nestor

Dne 9.2.2017 v 20:39 Nestor via Digitalmars-d-learn napsal(a):

> On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:
>> On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
>>> I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:
>>>
>>> import std.stdio;
>>>
>>> static immutable char[] QG10Matrix =
>>>   "03175986427092154863420687135917509834266123045978" ~
>>>   "36742095815869720134894536201794386172052581436790";
>>>
>>> char checkDigit(string str) {
>>>   char tmpdigit = '0';
>>>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>>>   return tmpdigit;
>>> }
>>
>> Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference.
>>
>> Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand)
>>
>>
>>  static immutable char[] QG10Matrix = [
>>     00,30,10,70,50,90,80,60,40,20,
>>     70,00,90,20,10,50,40,80,60,30,
>>     40,20,00,60,80,70,10,30,50,90,
>>     10,70,50,00,90,80,30,40,20,60,
>>     60,10,20,30,00,40,50,90,70,80,
>>     30,60,70,40,20,00,90,50,80,10,
>>     50,80,60,90,70,20,00,10,30,40,
>>     80,90,40,50,30,60,20,00,10,70,
>>     90,40,30,80,60,10,70,20,00,50,
>>     20,50,80,10,40,30,60,70,90,00];
>>
>>  char checkDigit(string str) {
>>    char tmpdigit = 0;
>>    foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') +
>>  tmpdigit];
>>    return (tmpdigit/10) + '0';
>>  }
>
>
>
> OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?
>
> import std.stdio;
>
> static immutable ubyte[][] QG10Matrix = [
>   [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
>   [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
>   [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
>   [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
>   [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
> ];
>
> static int charToInt(char chr) {
>   scope(failure) return -1;
>   return cast(int)(chr - '0');
> }
>
> ubyte checkDigit(string str) {
>   ubyte tmpdigit;
>   foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
>   return tmpdigit;
> }
>
> enum {
>   EXIT_SUCCESS = 0,
>   EXIT_FAILURE = 1,
> }
>
> int main(string[] args) {
>   scope(failure) {
>     writeln("Invalid arguments. You must pass a number.");
>     return EXIT_FAILURE;
>   }
>   assert(args.length == 2);
>   ubyte digit = checkDigit(args[1]);
>   if(digit == 0) writefln("%s is a valid number.", args[1]);
>   else {
>     writefln("%s is not a valid number (but it would be, appending digit %s).",
>       args[1], digit);
>   }
>
>   return EXIT_SUCCESS;
> }
>
Maybe you can try use static array instead of dynamic
static immutable ubyte[10][10] QG10Matrix = ...

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Cym13
in reply to Nestor

Cym13

Posted in reply to Nestor

On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:
> On Thursday, 9 February 2017 at 18:34:30 UTC, Era Scarecrow wrote:
>> On Thursday, 9 February 2017 at 17:36:11 UTC, Nestor wrote:
>>> I was trying to port C code from the article in Wikiversity [1] to D, but I'm not sure this implementation is the most efficient way to do it in D, so suggestions to optimize it are welcome:
>>>
>>> import std.stdio;
>>>
>>> static immutable char[] QG10Matrix =
>>>   "03175986427092154863420687135917509834266123045978" ~
>>>   "36742095815869720134894536201794386172052581436790";
>>>
>>> char checkDigit(string str) {
>>>   char tmpdigit = '0';
>>>   foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') + (tmpdigit - '0') * 10];
>>>   return tmpdigit;
>>> }
>>
>> Well one thing is you can probably reduce them from chars to just bytes, instead of having to subtract you can instead add at the end. Although unless you're working with a VERY large input you won't see a difference.
>>
>> Actually since you're also multiplying by 10, you can incorporate that in the table too... (although a mixin might be better for the conversion than by hand)
>>
>>
>>  static immutable char[] QG10Matrix = [
>>     00,30,10,70,50,90,80,60,40,20,
>>     70,00,90,20,10,50,40,80,60,30,
>>     40,20,00,60,80,70,10,30,50,90,
>>     10,70,50,00,90,80,30,40,20,60,
>>     60,10,20,30,00,40,50,90,70,80,
>>     30,60,70,40,20,00,90,50,80,10,
>>     50,80,60,90,70,20,00,10,30,40,
>>     80,90,40,50,30,60,20,00,10,70,
>>     90,40,30,80,60,10,70,20,00,50,
>>     20,50,80,10,40,30,60,70,90,00];
>>
>>  char checkDigit(string str) {
>>    char tmpdigit = 0;
>>    foreach(chr; str) tmpdigit = QG10Matrix[(chr - '0') +
>>  tmpdigit];
>>    return (tmpdigit/10) + '0';
>>  }
>
>
>
> OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?
>
> import std.stdio;
>
> static immutable ubyte[][] QG10Matrix = [
>   [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
>   [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
>   [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
>   [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
>   [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
> ];
>
> static int charToInt(char chr) {
>   scope(failure) return -1;
>   return cast(int)(chr - '0');
> }
>
> ubyte checkDigit(string str) {
>   ubyte tmpdigit;
>   foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
>   return tmpdigit;
> }
>
> enum {
>   EXIT_SUCCESS = 0,
>   EXIT_FAILURE = 1,
> }
>
> int main(string[] args) {
>   scope(failure) {
>     writeln("Invalid arguments. You must pass a number.");
>     return EXIT_FAILURE;
>   }
>   assert(args.length == 2);
>   ubyte digit = checkDigit(args[1]);
>   if(digit == 0) writefln("%s is a valid number.", args[1]);
>   else {
>     writefln("%s is not a valid number (but it would be, appending digit %s).",
>       args[1], digit);
>   }
>
>   return EXIT_SUCCESS;
> }

The question is, why do you expect it to be noticably faster? On modern hardware it the optimizations are such that so little a change in code is very hard to link to a difference in running time. If you really want to show one the following code does the benchmark the right way (ie: using a very long input, tens of thousands of runs, and avoiding I/O and load time of the program to compare the bare function implementations):

import std.conv;
import std.stdio;
import std.range;
import std.array;
import std.algorithm;

string testcase;

static immutable char[] QG10MatrixOne =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

char checkDigitOne(string str) {
  char tmpdigit = 0;
  foreach(chr; str) tmpdigit = QG10MatrixOne[(chr - '0') + (tmpdigit - '0') * 10];
  return tmpdigit;
}

void testCheckDigitOne() {
    checkDigitTwo(testcase);
}

static immutable ubyte[][] QG10MatrixTwo = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static int charToInt(char chr) {
  scope(failure) return -1;
  return cast(int)(chr - '0');
}

ubyte checkDigitTwo(string str) {
  ubyte tmpdigit;
  foreach(chr; str) {
      tmpdigit = QG10MatrixTwo[tmpdigit][charToInt(chr)];
  }
  return tmpdigit;
}

void testCheckDigitTwo() {
    checkDigitTwo(testcase);
}

void main(string[] args) {
  testcase = iota(10).cycle.take(100000).map!(to!string).array.join;

  import std.datetime: benchmark;
  benchmark!(testCheckDigitOne, testCheckDigitTwo)(10000).each!writeln;
}


Yet on my machine I have the following times (note that the time itself depends on my hardware, what's important is the difference):

TickDuration(15785255852)
TickDuration(15784826803)

So it seems that the first version is slower than the second one, but by so little that it's hard to link to the actual implementation. If anything it shows the futility of such low-level tricks for meaningless optimization. And by the way note that the benchmark avoid measuring IO because we were interested in the functions. But if we were measuring it it would represent 4 fifth of the running time (on my machine, with numbers as long as the one used in the benchmark). This means even a 20% improvement in checkDigit would actually only represent a 4% improvement of the program.

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Nestor
in reply to Daniel Kozak

Nestor

Posted in reply to Daniel Kozak

On Thursday, 9 February 2017 at 20:46:06 UTC, Daniel Kozak wrote:
> Maybe you can try use static array instead of dynamic
> static immutable ubyte[10][10] QG10Matrix = ...

I shaved it to this to discard unneccessary time-consuming functions:

static immutable ubyte[10][10] QG10Matrix = [
  [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
  [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
  [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
  [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
  [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
];

static immutable string number =
  "0123456789012345678901234567890123456789012345678901234567890123456789";

static int charToInt(char chr) {
  return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
}

ubyte checkDigit(string str) {
  ubyte tmpdigit;
  foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
  return tmpdigit;
}

void main() {
  auto digit = checkDigit(number);
}

I even tried making checkDigit static, but surprisingly this increased average execution time by 1ms.

Anyway, the previopus version (modified to benefit from a little optimization too) is almost as performant, even though it includes multiplication and sum:

static immutable char[] QG10Matrix =
  "03175986427092154863420687135917509834266123045978" ~
  "36742095815869720134894536201794386172052581436790";

static immutable string number =
  "0123456789012345678901234567890123456789012345678901234567890123456789";

static int charToInt(char chr) {
  return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
}

char checkDigit(string str) {
  char tmpdigit = '0';
  foreach(chr; str) tmpdigit =
    QG10Matrix[charToInt(chr) + (charToInt(tmpdigit) * 10)];
  return tmpdigit;
}

void main() {
  auto digit = checkDigit(number);
}

I compiled both with -inline -noboundscheck -release and the multidimensional array version does perform 1ms faster for a couple hundred runs, but I expected the difference to be much more noticeable.

Any idea of what might be happening here?

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Daniel Kozak
in reply to Nestor

Daniel Kozak

Posted in reply to Nestor

Dne 9.2.2017 v 22:29 Nestor via Digitalmars-d-learn napsal(a):

> On Thursday, 9 February 2017 at 20:46:06 UTC, Daniel Kozak wrote:
>> Maybe you can try use static array instead of dynamic
>> static immutable ubyte[10][10] QG10Matrix = ...
>
> I shaved it to this to discard unneccessary time-consuming functions:
>
> static immutable ubyte[10][10] QG10Matrix = [
>   [0,3,1,7,5,9,8,6,4,2],[7,0,9,2,1,5,4,8,6,3],
>   [4,2,0,6,8,7,1,3,5,9],[1,7,5,0,9,8,3,4,2,6],
>   [6,1,2,3,0,4,5,9,7,8],[3,6,7,4,2,0,9,5,8,1],
>   [5,8,6,9,7,2,0,1,3,4],[8,9,4,5,3,6,2,0,1,7],
>   [9,4,3,8,6,1,7,2,0,5],[2,5,8,1,4,3,6,7,9,0],
> ];
>
> static immutable string number =
> "0123456789012345678901234567890123456789012345678901234567890123456789";
>
> static int charToInt(char chr) {
>   return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
> }
>
> ubyte checkDigit(string str) {
>   ubyte tmpdigit;
>   foreach(chr; str) tmpdigit = QG10Matrix[tmpdigit][charToInt(chr)];
>   return tmpdigit;
> }
>
> void main() {
>   auto digit = checkDigit(number);
> }
>
> I even tried making checkDigit static, but surprisingly this increased average execution time by 1ms.
>
> Anyway, the previopus version (modified to benefit from a little optimization too) is almost as performant, even though it includes multiplication and sum:
>
> static immutable char[] QG10Matrix =
>   "03175986427092154863420687135917509834266123045978" ~
>   "36742095815869720134894536201794386172052581436790";
>
> static immutable string number =
> "0123456789012345678901234567890123456789012345678901234567890123456789";
>
> static int charToInt(char chr) {
>   return ((chr >= '0') || (chr <= '9')) ? cast(int)(chr - '0') : -1;
> }
>
> char checkDigit(string str) {
>   char tmpdigit = '0';
>   foreach(chr; str) tmpdigit =
>     QG10Matrix[charToInt(chr) + (charToInt(tmpdigit) * 10)];
>   return tmpdigit;
> }
>
> void main() {
>   auto digit = checkDigit(number);
> }
>
> I compiled both with -inline -noboundscheck -release and the multidimensional array version does perform 1ms faster for a couple hundred runs, but I expected the difference to be much more noticeable.
>
> Any idea of what might be happening here?
Did you try it with different backends? llvm (ldc), gcc(gdc)?

February 09, 2017

Re: Can this implementation of Damm algorithm be optimized?

Posted by Nestor
in reply to Daniel Kozak

Nestor

Posted in reply to Daniel Kozak

On Thursday, 9 February 2017 at 21:43:08 UTC, Daniel Kozak wrote:
>> Any idea of what might be happening here?
> Did you try it with different backends? llvm (ldc), gcc(gdc)?

Not really, just standard dmd.

I tried running each algoritm a few times through avgtime using different digit lengths (upto 10,000, my PC won't handle much more) and different amount of repetitions, and the results aren't consistent, some times one algorithm is marginally faster, sometimes the other. Apparently the compiler is capable of optimizing the unidimensional array version.

Thank you all nevertheless for the suggestions.

Top | Forum index | About this forum

Copyright © 1999-2021 by the D Language Foundation