It is a logic puzzle with fairly simple rules. There is a 9×9 grid, broken down into nine 3×3 blocks. The aim is to lay out the digits 1 to 9 such that they follow some simple constraints – no row, column or block may have a repeated digit.
You can find a far better description at the Sudoku home page.
Okay, so a new type of puzzle? I’m up for that. Earlier today, I wrote about my approach to tackling such puzzles.
Sudoku turns out to be a similar type of puzzle to the unfortunately-named Logic Problems class of puzzles – you know, the type with clues like “The butcher’s wife plays cribbage with the redhead on Thursdays.” I have seen this type of problem solved by some software written by a colleague in 1992 as he experimented with the Prolog programming language.
As I read the hype about the puzzle, I thought it would probably fall to a similar method – which may, or may not, end up being brute-force – I could never be sure with Prolog software.
Then something in the article caught my eye. Wayne Gould, a sudoku expert and owner of the Sudoku home page, chided a beginner:
“So you tried trial and error, then? That’s not the way to go. One golden rule of sudoku is never guess.”
Never guess? That’s an excellent sign from the perspective of solving it automatically. However, a terrible thought occurred to me. Could it be that this entire puzzle, despite all of the hype, is merely using simple elimination, with no higher thought processes? Could the whole thing be that straight-forward?
Let me give an example.
Suppose we have five grid cells in a row A, B, C, D and E which must be assigned unique values 1, 2, 3, 4 or 5.
Suppose we already know some constraints:
A is NOT 3, 4 or 5
B is NOT 3, 4 or 5
C is NOT 5
D is NOT 1 or 5
E is NOT 1
Now, we know that one of the cells must have the value 5, and only E can have the value 5, so by elimination E must equal 5.
It takes a little bit more thought to see that A and B both can only be 1 or 2, so therefore no other cell can have the value 1 or 2 – that would mean that there wasn’t room for both A and B.
We can conclude that:
A is NOT 3, 4 or 5
B is NOT 3, 4 or 5
C is NOT 1, 2 or 5
D is NOT 1, 2 or 5
You could draw this same conclusion by backtracking; guessing that D = 2 would eventually lead to the same conclusion. However, recognising this pattern in general is a bit harder than simple elimination.
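The five-cell example above can be checked mechanically. This is only a minimal sketch, not my actual solver: the function names (only_home, matching_pairs, propagate) are made up for illustration, and each cell is represented simply as the set of values it may still take.

```python
from itertools import combinations

def only_home(cands):
    """Simple elimination: a value that fits in only one cell must go there."""
    changed = False
    for v in set().union(*cands.values()):
        homes = [c for c in cands if v in cands[c]]
        if len(homes) == 1 and len(cands[homes[0]]) > 1:
            cands[homes[0]] = {v}
            changed = True
    return changed

def matching_pairs(cands):
    """Two cells limited to the same two values lock those values away
    from every other cell in the row."""
    changed = False
    for a, b in combinations(cands, 2):
        if cands[a] == cands[b] and len(cands[a]) == 2:
            for c in cands:
                if c not in (a, b) and cands[c] & cands[a]:
                    cands[c] = cands[c] - cands[a]
                    changed = True
    return changed

def propagate(cands):
    # Apply both rules until neither makes progress.
    # `|` (not `or`) so that both rules run on every pass.
    while only_home(cands) | matching_pairs(cands):
        pass
    return cands

# The constraints from the example, written as remaining candidates.
cells = {
    "A": {1, 2},           # not 3, 4 or 5
    "B": {1, 2},           # not 3, 4 or 5
    "C": {1, 2, 3, 4},     # not 5
    "D": {2, 3, 4},        # not 1 or 5
    "E": {2, 3, 4, 5},     # not 1
}
result = propagate(cells)
```

Running this leaves E fixed at 5 by the first rule, and C and D reduced to 3 or 4 by the second, which matches the conclusion above.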
For all I knew, the general Sudoku problem could require brute-force to solve (i.e. it could be NP-complete). Alternatively, it could require more sophisticated logic than simply ticking off items until a conclusion could be drawn.
However, it could be that the subset of puzzles that actually get printed in the newspaper might not be so hard. The hype just seemed too much; perhaps the printed puzzles were actually trivial.
The weekend SMH ones are supposed to be the hardest ones they publish. If my theory was right, even the hardest published ones would fall easily – not to brute-force guessing, but to methodical elimination.
So away I went.
I wrote 150 lines of Python.
The design consists of three classes, Cells, Neighbourhoods (representing rows, columns and blocks) and the Big Grid.
The Big Grid is just a container which creates the cells and neighbourhoods, and helps them link together, so each cell knows of its three neighbourhoods, and each neighbourhood knows of its nine cells.
Each cell is responsible for tracking what values are still legal for that cell. If it ever works out by elimination what the correct value is for that cell, it notifies its three neighbourhoods of the happy news.
Each neighbourhood is responsible for two tasks – (i) informing the unsolved cells of values that are no longer legal (because they were used by other cells in the neighbourhood) and (ii) watching out for the case that only one cell in the neighbourhood could have a particular value, meaning it could be resolved by elimination.
Notifications go back and forth between the cells informing their neighbourhoods of news, who in turn notify the neighbouring cells, who themselves draw further conclusions and call back to the neighbourhoods.
The system was very recursive – the recursion tails out when a cell is informed of a fact of which it is already aware. Cells also unsubscribe from notifications once they are resolved.
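To make the design concrete, here is a rough sketch of how the three classes might fit together. It is not the original 150 lines: the method names (assign, exclude, cell_solved, look_for_only_home, enter_clue) are invented for illustration, and instead of truly unsubscribing, a resolved cell here simply ignores news it already knows.

```python
class Cell:
    def __init__(self):
        self.candidates = set(range(1, 10))  # values still legal here
        self.neighbourhoods = []             # its row, column and block

    @property
    def value(self):
        # The resolved value, or None while several candidates remain.
        return next(iter(self.candidates)) if len(self.candidates) == 1 else None

    def assign(self, v):
        if self.candidates == {v}:
            return                           # already known: recursion stops
        if v not in self.candidates:
            raise ValueError("contradictory clues")
        removed = self.candidates - {v}
        self.candidates = {v}
        for n in self.neighbourhoods:
            n.cell_solved(self, v)           # the happy news
        for w in removed:                    # w has lost a possible home
            for n in self.neighbourhoods:
                n.look_for_only_home(w)

    def exclude(self, v):
        if v not in self.candidates:
            return                           # already known: recursion stops
        if len(self.candidates) == 1:
            raise ValueError("contradictory clues")
        self.candidates.remove(v)
        if len(self.candidates) == 1:
            for n in self.neighbourhoods:
                n.cell_solved(self, self.value)
        for n in self.neighbourhoods:
            n.look_for_only_home(v)


class Neighbourhood:
    def __init__(self, cells):
        self.cells = cells
        for c in cells:
            c.neighbourhoods.append(self)

    def cell_solved(self, solved, v):
        # Task (i): tell the other cells that v is no longer legal.
        for c in self.cells:
            if c is not solved:
                c.exclude(v)

    def look_for_only_home(self, v):
        # Task (ii): if only one cell can still hold v, it must be v.
        homes = [c for c in self.cells if v in c.candidates]
        if len(homes) == 1:
            homes[0].assign(v)


class Grid:
    def __init__(self):
        self.cells = [[Cell() for _ in range(9)] for _ in range(9)]
        rows = [list(row) for row in self.cells]
        cols = [[self.cells[r][c] for r in range(9)] for c in range(9)]
        blocks = [[self.cells[r][c]
                   for r in range(br, br + 3) for c in range(bc, bc + 3)]
                  for br in (0, 3, 6) for bc in (0, 3, 6)]
        self.neighbourhoods = [Neighbourhood(g) for g in rows + cols + blocks]

    def enter_clue(self, row, col, v):
        self.cells[row][col].assign(v)
```

Entering the clues one at a time through enter_clue lets each one ripple outwards before the next arrives, which is how the clue-by-clue results below were observed.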
An optimisation would be to push all notifications on a stack and then iterate through them rather than recursing. With this technique, it would be possible to prioritise notifications (“You are a 6! You can stop listening now.” is a more important message than “You are not a 3.”), to consolidate them (“Here is a bunch of messages at once for this cell.”), and to dismiss them faster (“This cell is no longer subscribing – discard the message.”)
However, the speed of the application was not slow, and these optimisations were not required.
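For the curious, the notification-queue idea might look something like the sketch below. It is an untested-in-anger illustration rather than anything I built: it swaps the plain stack for a priority queue (Python's heapq) so that "you are a 6" messages jump ahead of "you are not a 3" messages, it discards messages for cells that have already been resolved, and it only handles the first neighbourhood task (striking out used values). The names peers_of and run_queue are made up.

```python
import heapq
from itertools import count

SOLVED, EXCLUDE = 0, 1      # lower number is handled first

def peers_of(r, c):
    """Every other cell sharing a row, column or block with (r, c)."""
    same = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    same |= {(i, j) for i in range(br, br + 3) for j in range(bc, bc + 3)}
    same.discard((r, c))
    return same

def run_queue(clues):
    """clues: iterable of ((row, col), value). Returns the candidate sets."""
    cands = {(r, c): set(range(1, 10)) for r in range(9) for c in range(9)}
    done = set()            # cells that have stopped listening
    tick = count()          # tie-breaker keeps equal priorities in order
    queue = [(SOLVED, next(tick), cell, v) for cell, v in clues]
    heapq.heapify(queue)
    while queue:
        kind, _, cell, v = heapq.heappop(queue)
        if kind == SOLVED:
            if cell in done:
                continue    # duplicate news
            if v not in cands[cell]:
                raise ValueError("contradictory clues")
            cands[cell] = {v}
            done.add(cell)
            for p in peers_of(*cell):
                if p not in done:
                    heapq.heappush(queue, (EXCLUDE, next(tick), p, v))
        else:
            if cell in done:
                if cands[cell] == {v}:
                    raise ValueError("contradictory clues")
                continue    # no longer subscribing: discard the message
            if v not in cands[cell]:
                continue    # already aware of this fact
            cands[cell].discard(v)
            if not cands[cell]:
                raise ValueError("contradictory clues")
            if len(cands[cell]) == 1:
                (last,) = cands[cell]
                heapq.heappush(queue, (SOLVED, next(tick), cell, last))
    return cands
```

Because nothing recurses, the depth of the call stack no longer depends on how far a clue ripples through the grid.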
So that was my experimental method – what were the results?
An empty grid has 81 cells. A total of 31 clues were provided.
After entering 29 of the clues, the software had still failed to fill in additional cells. After entering 30 of them, it was able to work out only a few of the cells.
After entering the final clue, it worked out a few more – a total of 12 cells of a possible 50.
My hypothesis was wrong: Sudoku is not a trivial puzzle that relies only on elimination.
Only after getting this far did I let myself search the web to see what other Sudoku solvers are out there.
The first I looked at was Sudoku Solver (I didn’t install it. Install at own risk.)
The techniques Sudoku Solver uses are not surprising. The manual states:
The system will then attempt to solve the puzzle deterministically. For most puzzles this will yield the solution. For some puzzles the deterministic capability produces no further progress, either because the puzzle has more than one solution or because it is very subtle. At this point the system does a structured exploration of solutions, looking for inconsistencies until the solution is found.
A far more interesting site is the similarly named Sudoku Solver by logic.
While this solver, too, will resort to backtracking where necessary (they call it “guess-and-check”), the authors seem to have a similar aversion to it. Their web-site explains:
There’s an argument as to whether [guess-and-check] is ‘logic’, so we have made it an option using a tickbox.
Even with backtracking turned off, their software quickly solved the puzzle in the SMH that had thwarted my software. The solution even showed that my half-finished initial practice attempt at solving the same Sudoku manually had a mistake in it.
The authors take the time to explain their methods. They seem to have neglected to mention the most obvious method, which they have clearly implemented (i.e. if the neighbours of a cell already use eight different digits, the cell must be the one unused digit). Using their terminology, I only implemented that obvious method, plus Method A. My example above, demonstrating more sophisticated logic, would have been cracked by their Method C.
They also admit that they haven’t completely cracked it, and offer you the chance to help – take a stab at one of the solutions they can’t solve automatically, describe how you solved it, and they will try to implement that method.
An exciting idea, until a horrible thought occurs – what if the problem is NP-complete?
Sure enough, the Wikipedia article on Sudoku confirms the bad news: Sudoku is NP-complete. Elegant logic may speed up the processing a little, but when it all boils down to it, you are going to have to try out values one by one until you find one that works.
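Trying values one by one is, at least, easy to write down. The following is a bare-bones backtracking sketch under my own assumptions (a grid stored as nine lists of nine integers, with 0 for an empty cell, and the invented names allowed and solve); it is not taken from either of the solvers mentioned above.

```python
def allowed(grid, r, c, v):
    """True if v does not already appear in row r, column c or their block."""
    if v in grid[r]:
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != v
               for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Fill the grid in place by guessing; return True once it is complete."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for v in range(1, 10):
                    if allowed(grid, r, c, v):
                        grid[r][c] = v        # guess
                        if solve(grid):
                            return True
                        grid[r][c] = 0        # wrong guess: undo it
                return False                  # nothing fits: backtrack
    return True                               # no empty cells left
```

In practice one would run the elimination logic first and only guess when it stalls, so that each guess prunes as much of the search as possible.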
Wayne Gould, you’re wrong! The golden rule of sudoku is, sometimes, guessing is the only way forward.