August 16, 2012
I've now ran in something odd. Using a slight variant from my program on dpaste (can't upload modified version atm but changes are minimal) it takes 0.6 seconds to solve the hard puzzle the python version took 180 seconds for. Yet on the last puzzle, the impossible one, python takes 1800 seconds to figure out it's impossible yet mine takes over 3885 seconds. Where is this slowdown coming from?


August 16, 2012
On 08/16/2012 09:52 PM, maarten van damme wrote:
> I've now ran in something odd. Using a slight variant from my program on
> dpaste (can't upload modified version atm but changes are minimal) it
> takes 0.6 seconds to solve the hard puzzle the python version took 180
> seconds for.

This is because the python version happened to choose an unlucky search
order. Sudoku is NP-complete after all, so it is not surprising that
programs have a bad worst case run time.

> Yet on the last puzzle, the impossible one, python takes
> 1800 seconds to figure out it's impossible yet mine takes over 3885
> seconds. Where is this slowdown coming from?
>

This is because your specific solution is slow.

Mine takes <20ms on the 'hard' puzzle and ~13s on the impossible one.
(~2.4 GHZ intel x86_64 machine, with gdmd -O -release -inline -noboundscheck.)
There is a constant factor between those two solutions.
August 16, 2012
> This is because your specific solution is slow.
>
> Mine takes <20ms on the 'hard' puzzle and ~13s on the impossible one. (~2.4 GHZ intel x86_64 machine, with gdmd -O -release -inline
-noboundscheck.)
> There is a constant factor between those two solutions.

I've compiled it using dmd on my latitude e5500 which is not that fast so I don't think it's that slow...

Besides, lets say mine is five times slower, 3000 seconds is still waaay to much. When I'm back able to get my laptop to use my crapy data connection I'll compare.

Do you have some optimization ideas by the way?


August 16, 2012
On 08/16/2012 11:51 PM, maarten van damme wrote:
>
>  > This is because your specific solution is slow.
>  >
>  > Mine takes <20ms on the 'hard' puzzle and ~13s on the impossible one.
>  > (~2.4 GHZ intel x86_64 machine, with gdmd -O -release -inline
> -noboundscheck.)
>  > There is a constant factor between those two solutions.
>
> I've compiled it using dmd on my latitude e5500 which is not that fast
> so I don't think it's that slow...
>

Compiled and run in the same environment, your solution takes 0.26s on
the 'hard' one, whereas mine takes 0.0013s. Your solution takes ~1600s
on the impossible one whereas mine takes ~13s.

> Besides, lets say mine is five times slower,

Hard facts say that it is at around 100x-200x slower.

> 3000 seconds is still waaay  to much.

Sounds reasonable to me.

> When I'm back able to get my laptop to use my crapy data
> connection I'll compare.
>
> Do you have some optimization ideas by the way?
>

First of all, getting rid of the dynamic allocations is low hanging fruit.

Then you'll have to reduce the number of times you recompute the same
information. Update/restore the possibilities array as you
update/restore the solution attempts. Do this for the whole board at
once and use a compact representation of possibilities.
August 16, 2012
great, going to play around with it tomorrow.
Caching the possibilities is going to look really ugly but you're
right, it's going to give quiet a boost in performance.

I'm also going to format your source code a bit more and see if I can follow it's logic as it seems to be way more efficient. (although compilation is failing here, I'm running dmd 2.059 and it gives "can't stride on chunks!(short))

would using short or bytes increase performance compared to integers? if so, why did you use shorts instead of bytes?

2012/8/17, Timon Gehr <timon.gehr@gmx.ch>:
> On 08/16/2012 11:51 PM, maarten van damme wrote:
>>
>>  > This is because your specific solution is slow.
>>  >
>>  > Mine takes <20ms on the 'hard' puzzle and ~13s on the impossible one.
>>  > (~2.4 GHZ intel x86_64 machine, with gdmd -O -release -inline
>> -noboundscheck.)
>>  > There is a constant factor between those two solutions.
>>
>> I've compiled it using dmd on my latitude e5500 which is not that fast so I don't think it's that slow...
>>
>
> Compiled and run in the same environment, your solution takes 0.26s on the 'hard' one, whereas mine takes 0.0013s. Your solution takes ~1600s on the impossible one whereas mine takes ~13s.
>
>> Besides, lets say mine is five times slower,
>
> Hard facts say that it is at around 100x-200x slower.
>
>> 3000 seconds is still waaay  to much.
>
> Sounds reasonable to me.
>
>> When I'm back able to get my laptop to use my crapy data connection I'll compare.
>>
>> Do you have some optimization ideas by the way?
>>
>
> First of all, getting rid of the dynamic allocations is low hanging fruit.
>
> Then you'll have to reduce the number of times you recompute the same information. Update/restore the possibilities array as you update/restore the solution attempts. Do this for the whole board at once and use a compact representation of possibilities.
>
August 17, 2012
On Thursday, 16 August 2012 at 23:13:56 UTC, maarten van damme wrote:
> great, going to play around with it tomorrow.
> Caching the possibilities is going to look really ugly but you're
> right, it's going to give quiet a boost in performance.
>
> I'm also going to format your source code a bit more and see if I can
> follow it's logic as it seems to be way more efficient. (although
> compilation is failing here, I'm running dmd 2.059 and it gives "can't
> stride on chunks!(short))
>
> would using short or bytes increase performance compared to integers?
> if so, why did you use shorts instead of bytes?

Gonna chime in a bit here:

There's a lot of factors at play when deciding to use shorts vs bytes vs native-sized ints. The best way to decide is to time all of them and see which works best overall.

With caching on a larger problem, I'd guess that the smaller you go, the better. The reason is that you run the risk of your data getting large enough that it can't all fit in the L2 cache and waiting for information to come from memory takes forever (comparatively speaking).

Also, I just looked at your solution (not Mr. Gehr's solution), but it looks like you could approach this much more efficiently.

It seems like you're storing a lot of information in arrays of ints. At least some of that could be stored in a bitmap/bitset (http://en.wikipedia.org/wiki/Bit_array) and give you significantly better memory efficiency. Array!bool in std.container will actually do the correct thing and store things like a bitset, so you don't necessarily have to implement your own (however, I think that it stores it in an int when you could use a short to hold 1-9 ... but it's not enough to really worry about).
August 17, 2012
On 08/17/2012 01:13 AM, maarten van damme wrote:
> great, going to play around with it tomorrow.
> Caching the possibilities is going to look really ugly but you're
> right, it's going to give quiet a boost in performance.
>
> I'm also going to format your source code a bit more and see if I can
> follow it's logic as it seems to be way more efficient. (although
> compilation is failing here, I'm running dmd 2.059 and it gives "can't
> stride on chunks!(short))

Works for me both with DMD 2.060 and GDC with the 2.059 front end.

>
> would using short or bytes increase performance compared to integers?

Using less memory sometimes increases performance, but here it is not significant.

> if so, why did you use shorts instead of bytes?

Because I need 9 bits per entry to keep track of all the combinations
of possibilities.
August 19, 2012
Great, i tried to get rid of the dynamic array "possibilities" by representing it using a static array of bools. This approach made it 4 times faster :)

When i have a solid wifi connection I'm going to install dmd 2.60 to compile timon's code. In the meantime I've started beautifying the source so i can follow the logic.

I still have a few questions however:
- era claims his code takes 12 seconds on the hardest supplied puzzle yet
it enters an infinite loop when the puzzle isnt solvable. Is he talking
about a different puzzle?

-is it normal that const ref is slower then ref?

- in an effort of trying to make the program reuse the same memory I've moved some temporary variables outside the function but the program became slower, how can this be?

- in a few hours i'll upload my newest source. I cant find that much stupid design decisions anymore that cause slowdowns yet it keeps lagging behind by an enormous amount to timon's solver. What's that much more efficient in his solution?


August 19, 2012
my code is located at http://dpaste.dzfl.pl/93cd5f45

2012/8/19, maarten van damme <maartenvd1994@gmail.com>:
> Great, i tried to get rid of the dynamic array "possibilities" by representing it using a static array of bools. This approach made it 4 times faster :)
>
> When i have a solid wifi connection I'm going to install dmd 2.60 to compile timon's code. In the meantime I've started beautifying the source so i can follow the logic.
>
> I still have a few questions however:
> - era claims his code takes 12 seconds on the hardest supplied puzzle yet
> it enters an infinite loop when the puzzle isnt solvable. Is he talking
> about a different puzzle?
>
> -is it normal that const ref is slower then ref?
>
> - in an effort of trying to make the program reuse the same memory I've moved some temporary variables outside the function but the program became slower, how can this be?
>
> - in a few hours i'll upload my newest source. I cant find that much stupid design decisions anymore that cause slowdowns yet it keeps lagging behind by an enormous amount to timon's solver. What's that much more efficient in his solution?
>
August 19, 2012
On Sunday, 19 August 2012 at 09:39:53 UTC, maarten van damme wrote:
> Great, I tried to get rid of the dynamic array "possibilities" by representing it using a static array of bools. This approach made it 4 times faster :)
>
> When I have a solid wifi connection I'm going to install dmd 2.60 to compile timon's code. In the meantime I've started beautifying the source so I can follow the logic.
>
> I still have a few questions however:
> - era claims his code takes 12 seconds on the hardest supplied puzzle yet it enters an infinite loop when the puzzle isn't solvable. Is he talking about a different puzzle?

The one supplied: .....6....59.....82....8....45........3........6..3.54...325..6..................

 The infinite loop is likely the brute force actually brute forcing. I'm sure most other programs will lock up too until they can prove it won't work. I can re-check my code, but if it can't solve it it should tell you as it throws an exception. On ones filled with a lot more numbers it will take a lot less time.

> -is it normal that const ref is slower then ref?

 Ummm... No? I don't see why it would be.