Solving a sudoku with deep learning and backtracking.

Simon Burgermeister
6 min readMar 5, 2020

--

In this article we are going to see how to recognize a Sudoku from a simple cell phone picture, extracting the grid and solve it.

Steps covered

A follow-up article will cover how to wrap everything in a mobile application.

Detecting the grid:

First we need to some basic preprocessing of the image using the open-cv library for blurring and thresholding:

Gaussian Blur & thresholding with open-cv
prepossessing steps.

Then we will use the cv2.findContours() function to detect the biggest contour in the image. From the contours we are then able able to crop the image adjust the grid to fit a square using cv2.getPerspectiveTransform.

We can then separate the grid into equal sized cells representing each digit area by dividing the length and high of the square by 9:

Digit area delimitation.

Next, we will need to find a way for those cell areas to be recognized using deep learning.

Training a Neural Network to recognize the digits:

We will use convolutional neural networks trained on synthetic image data (training set) to recognize the cell content of the Sudoku grid. The resulting Model will then be evaluated using real Sudoku cell (validation set).

Generate the training Data:

We want to train our model to recognize digit in a square cell with random noise and residues of the delimitation lines while taking into account that the picture can be distorted.

First we generate a random digit inside a square cell with random background and select a random square area around the center of the cell:

Randomly selected areas around the cell’s center.

Then we have to account for random noise and image deformation:

Random noise and deformation

This is done with those two functions:

Here is an example of the images generated using the above described technique:

Samples of generated training images

Model Training:

We will use the data generated in the previous section to train a convolutional neural network using Keras with tensorflow back-end.

from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
x_train, x_test, y_train, y_test = train_test_split(images_array, labels, test_size=0.25)cell_size=45
batch_size = 128
num_classes = len(np.unique(labels))
epochs = 2
# input image dimensions
img_rows, img_cols = cell_size, cell_size
if K.image_data_format() == 'channels_first':
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

The Model used here is inspired from the official Keras Website (https://keras.io/examples/cifar10_cnn/) and is composed of 4 convolutional layers, 2 fully connected layers and 2 dropout to avoid over-fitting (http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf):

model = Sequential()
model.add(Conv2D(32, (4, 4), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32,(3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

Train the model:

opt = keras.optimizers.RMSprop(learning_rate=0.0001, decay=1e-6)model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=opt,
metrics=['accuracy'])
history=model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

For the validation, we will use “real” Sudoku cells extracted from 28 grids (2268 cells in total). The validation data can be found here. The results of training and validation for each batch are shown here:

Training for each epoch.

We can see the accuracy keeps increasing and the loss keeps decreasing but they both reach a plateau. The final accuracy on the validation set is 97.71%.

We can check some of the failed recognitions of our validation set:

Failed recognitions on validation set.

We can see that the failed images have different letter size which is something we did not account for in our training data. They also appear to be noisier, we will therefor also add Perlin and Salt and Pepper noise:

additional Noise
import noise # New package needed for perlin noise.def add_perlin_noise(img):
shape = np.shape(img)[0], np.shape(img)[1]
scale=900
octaves=6

persistence=0.5
lacunarity=30
lacunarity=3
noisy = np.zeros(shape)
for i in range(shape[0]):
for j in range(shape[1]):
noisy[i][j] = noise.pnoise2(i/scale,
j/scale,
octaves=octaves,
persistence=persistence,
lacunarity=lacunarity,
repeatx=1024,
repeaty=1024,
base=0)
noisy=noisy*250
img2=img+noisy
img2=np.minimum(255, img2)
img2=np.maximum(0, img2)
return img2, noisy
def salt_pepper(image):

row,col = image.shape
s_vs_p = 0.5
amount = 0.3
out = np.copy(image)
# Salt
num_salt = np.ceil(amount * image.size * s_vs_p)
coords = [np.random.randint(0, i - 1, int(num_salt)) for i in image.shape]

out[tuple(coords)] = 1
#Salt num_pepper = np.ceil(amount* image.size * (1. - s_vs_p))
coords = [np.random.randint(0, i - 1, int(num_pepper)) for i in image.shape]

out[tuple(coords)] =0
return out

Now let’s look at the new training data we generated with those changes:

New Training data

If we retrain the network using the same architecture and same validation set:

Training Stats

We can see that the validation accuracy and loss fluctuate less than in the previous training which indicate that our training set provide a more robust representation of the real data. The model is therefor more stable.

Confusion Matrix

This time we reach an accuracy on the validation set of 99.56%.

Note that we could probably increase the Model performance even further by using transfer learning.

Solving the Grid

Here we are going to use a very simple algorithm called “Backtracking”.

Basically what it does is look for a possible answer to an empty grid cell check if that answer is possible, fill it in and goes back if it reaches a dead end. The function that we will us to check if an answer is possible is the following:

def verify_possile_value(x, y, n, grid):
for i in range(0, 9):
if grid[x, i]==n:
return False

for i in range(0, 9):
if grid[i, y]==n:
return False
x0=(x//3)*3
y0=(y//3)*3
for i in range(0, 3):
for j in range(0, 3):
if grid[x0+i, y0+j]==n:
return False

return True

This function simply checks if the value tested is already present in the same row, columns or 9-cell square. It returns False if the value is already present and True otherwise.

Then we have the backtracking algorithm that will simply use the previous function to check if a values is a possible fit for an empty cell, replace the value and continue. In the case of a dead end, the value is replaced again by 0 and we restart the process.

def grid_solver(grid):
for x in range(9):
for y in range(9):
if grid[x, y]==0:
for n in range(1, 10):
if verify_possile_value(x, y, n, grid):
grid[x, y]=n
result=grid_solver(grid)

if result is not None:
return result

grid[x, y]=0 #Backtracking
return None
return grid
Result

Full code and images available here.

--

--

No responses yet