Solving a sudoku with deep learning and backtracking.
In this article we are going to see how to recognize a Sudoku from a simple cell phone picture, extracting the grid and solve it.
A follow-up article will cover how to wrap everything in a mobile application.
Detecting the grid:
First we need to some basic preprocessing of the image using the open-cv library for blurring and thresholding:
Then we will use the cv2.findContours() function to detect the biggest contour in the image. From the contours we are then able able to crop the image adjust the grid to fit a square using cv2.getPerspectiveTransform.
We can then separate the grid into equal sized cells representing each digit area by dividing the length and high of the square by 9:
Next, we will need to find a way for those cell areas to be recognized using deep learning.
Training a Neural Network to recognize the digits:
We will use convolutional neural networks trained on synthetic image data (training set) to recognize the cell content of the Sudoku grid. The resulting Model will then be evaluated using real Sudoku cell (validation set).
Generate the training Data:
We want to train our model to recognize digit in a square cell with random noise and residues of the delimitation lines while taking into account that the picture can be distorted.
First we generate a random digit inside a square cell with random background and select a random square area around the center of the cell:
Then we have to account for random noise and image deformation:
This is done with those two functions:
Here is an example of the images generated using the above described technique:
Model Training:
We will use the data generated in the previous section to train a convolutional neural network using Keras with tensorflow back-end.
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as Kx_train, x_test, y_train, y_test = train_test_split(images_array, labels, test_size=0.25)cell_size=45
batch_size = 128
num_classes = len(np.unique(labels))
epochs = 2# input image dimensions
img_rows, img_cols = cell_size, cell_sizeif K.image_data_format() == 'channels_first':
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
input_shape = (1, img_rows, img_cols)
else:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
The Model used here is inspired from the official Keras Website (https://keras.io/examples/cifar10_cnn/) and is composed of 4 convolutional layers, 2 fully connected layers and 2 dropout to avoid over-fitting (http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf):
model = Sequential()
model.add(Conv2D(32, (4, 4), padding='same', input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32,(3, 3)))
model.add(Activation('relu'))model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))model.add(Dense(num_classes))
model.add(Activation('softmax'))
Train the model:
opt = keras.optimizers.RMSprop(learning_rate=0.0001, decay=1e-6)model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=opt,
metrics=['accuracy'])history=model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
For the validation, we will use “real” Sudoku cells extracted from 28 grids (2268 cells in total). The validation data can be found here. The results of training and validation for each batch are shown here:
We can see the accuracy keeps increasing and the loss keeps decreasing but they both reach a plateau. The final accuracy on the validation set is 97.71%.
We can check some of the failed recognitions of our validation set:
We can see that the failed images have different letter size which is something we did not account for in our training data. They also appear to be noisier, we will therefor also add Perlin and Salt and Pepper noise:
import noise # New package needed for perlin noise.def add_perlin_noise(img):
shape = np.shape(img)[0], np.shape(img)[1]
scale=900
octaves=6
persistence=0.5
lacunarity=30
lacunarity=3
noisy = np.zeros(shape)
for i in range(shape[0]):
for j in range(shape[1]):
noisy[i][j] = noise.pnoise2(i/scale,
j/scale,
octaves=octaves,
persistence=persistence,
lacunarity=lacunarity,
repeatx=1024,
repeaty=1024,
base=0)
noisy=noisy*250
img2=img+noisy
img2=np.minimum(255, img2)
img2=np.maximum(0, img2)
return img2, noisydef salt_pepper(image):
row,col = image.shape
s_vs_p = 0.5
amount = 0.3
out = np.copy(image)
# Salt
num_salt = np.ceil(amount * image.size * s_vs_p)
coords = [np.random.randint(0, i - 1, int(num_salt)) for i in image.shape]
out[tuple(coords)] = 1 #Salt num_pepper = np.ceil(amount* image.size * (1. - s_vs_p))
coords = [np.random.randint(0, i - 1, int(num_pepper)) for i in image.shape]
out[tuple(coords)] =0
return out
Now let’s look at the new training data we generated with those changes:
If we retrain the network using the same architecture and same validation set:
We can see that the validation accuracy and loss fluctuate less than in the previous training which indicate that our training set provide a more robust representation of the real data. The model is therefor more stable.
This time we reach an accuracy on the validation set of 99.56%.
Note that we could probably increase the Model performance even further by using transfer learning.
Solving the Grid
Here we are going to use a very simple algorithm called “Backtracking”.
Basically what it does is look for a possible answer to an empty grid cell check if that answer is possible, fill it in and goes back if it reaches a dead end. The function that we will us to check if an answer is possible is the following:
def verify_possile_value(x, y, n, grid):
for i in range(0, 9):
if grid[x, i]==n:
return False
for i in range(0, 9):
if grid[i, y]==n:
return False
x0=(x//3)*3
y0=(y//3)*3
for i in range(0, 3):
for j in range(0, 3):
if grid[x0+i, y0+j]==n:
return False
return True
This function simply checks if the value tested is already present in the same row, columns or 9-cell square. It returns False if the value is already present and True otherwise.
Then we have the backtracking algorithm that will simply use the previous function to check if a values is a possible fit for an empty cell, replace the value and continue. In the case of a dead end, the value is replaced again by 0 and we restart the process.
def grid_solver(grid):
for x in range(9):
for y in range(9):
if grid[x, y]==0:
for n in range(1, 10):
if verify_possile_value(x, y, n, grid):
grid[x, y]=n
result=grid_solver(grid)
if result is not None:
return result
grid[x, y]=0 #Backtracking
return None
return grid
Full code and images available here.