How the Tic Tac Toe Neural Network Works
This neural network learns to play Tic Tac Toe by representing the board as a numerical input and using a simple feedforward network with one hidden layer to decide moves. Here’s how it works:
1. Programming Language
I decided to use the programming language designed by OCR for their computer science exams, the Exam Reference Language (ERL), to make this program.
Why? It seemed fun at the time.
Drawbacks
Compared with Python, ERL is much more limited in scope and flexibility. While Python is a general-purpose language with extensive libraries, dynamic typing, and support for object-oriented, functional, and procedural programming, ERL is primarily procedural and designed for educational purposes. Key limitations include:
- Data Structures: ERL relies on fixed-size arrays and manual indexing, lacking Python’s lists, dictionaries, sets, and NumPy arrays for efficient numerical computation.
- Mathematical Operations: ERL only supports basic arithmetic and simple random functions, whereas Python can leverage vectorized operations, linear algebra, and GPU-accelerated computation via libraries like NumPy or PyTorch.
- Functions & Abstraction: ERL has basic functions without closures, higher-order functions, or classes. Python allows rich abstractions, making code modular, reusable, and easier to maintain.
- Libraries & Ecosystem: ERL has no external libraries for machine learning, plotting, or file/database handling. Python has a vast ecosystem that enables fast prototyping, visualization, and advanced AI development.
- Performance & Scalability: ERL is interpreted and slow for nested loops and repeated computations, while Python can exploit vectorization and parallelization to handle larger datasets and more complex networks efficiently.
These limitations did mean that I had to implement a lot of common library functionality as my own functions. But, in the end, it did work out; after a few hours of training it became semi-competent.
2. Board Representation
- The Tic Tac Toe board is a 1D array of 9 elements:
  - `0` = empty square
  - `1` = player X
  - `-1` = player O
- Each position in the array corresponds to a cell on the 3x3 board.
- Input to the network is the current board state (`inputLayer`).
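The original program is written in ERL, but the encoding is easy to sketch in Python; `cell_index` below is an illustrative helper of my own, not a function from the original code:

```python
# Board encoding as described above: a flat array of 9 cells.
# 0 = empty, 1 = player X, -1 = player O.

def cell_index(row, col):
    """Map a (row, col) coordinate on the 3x3 grid to the 1D array index."""
    return row * 3 + col

board = [0] * 9                  # empty board
board[cell_index(1, 1)] = 1      # X takes the centre
board[cell_index(0, 0)] = -1     # O takes the top-left corner
```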
3. Network Architecture
- Input Layer: 9 neurons (one per board cell)
- Hidden Layer: 18 neurons with ReLU activation
- Output Layer: 9 neurons representing move probabilities for each board cell
- Weights and Biases:
  - `W1` and `b1` connect input → hidden layer
  - `W2` and `b2` connect hidden → output layer
- Forward Pass:
  - Compute `hiddenLayer = ReLU(inputLayer × W1 + b1)`
  - Compute `outputLayer = hiddenLayer × W2 + b2`
  - Normalize the output with a softmax-like function to get move probabilities (`AImoveProbs`)
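The forward pass above can be sketched in plain Python (the original is ERL with manual loops; I'm assuming a standard softmax for the "softmax-like" normalization, and the function names are mine):

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def forward(input_layer, W1, b1, W2, b2):
    """Forward pass for the 9-18-9 network described above."""
    # hiddenLayer = ReLU(inputLayer x W1 + b1)
    hidden = relu([sum(input_layer[i] * W1[i][j] for i in range(9)) + b1[j]
                   for j in range(18)])
    # outputLayer = hiddenLayer x W2 + b2
    out = [sum(hidden[j] * W2[j][k] for j in range(18)) + b2[k]
           for k in range(9)]
    # Softmax normalization -> move probabilities (AImoveProbs)
    m = max(out)
    exps = [math.exp(o - m) for o in out]
    total = sum(exps)
    return [e / total for e in exps]

# With an all-zero network and an empty board, every move comes out
# equally likely, which is a handy sanity check.
probs = forward([0] * 9,
                [[0.0] * 18 for _ in range(9)], [0.0] * 18,
                [[0.0] * 9 for _ in range(18)], [0.0] * 9)
```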
4. Move Selection
- Uses an epsilon-greedy strategy:
  - With probability `epsilon` (small, e.g., 1%), choose a random legal move (exploration)
  - Otherwise, pick the move with the highest probability from the network (exploitation)
- This ensures the AI explores new moves and doesn’t get stuck repeating suboptimal moves.
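The selection rule is simple enough to show directly; this is a minimal Python sketch (the function name and example probabilities are illustrative, not from the original program):

```python
import random

def choose_move(board, move_probs, epsilon=0.01):
    """Epsilon-greedy selection restricted to legal (empty) squares."""
    legal = [i for i in range(9) if board[i] == 0]
    if random.random() < epsilon:
        return random.choice(legal)                 # explore
    return max(legal, key=lambda i: move_probs[i])  # exploit

# Example position: squares 2, 3, 5, 6 and 7 are empty.
board = [1, -1, 0,
         0, 1, 0,
         0, 0, -1]
probs = [0.05, 0.05, 0.30, 0.10, 0.05, 0.20, 0.10, 0.10, 0.05]
greedy = choose_move(board, probs, epsilon=0.0)  # epsilon 0: always exploit
```

Restricting the argmax to legal squares matters: the raw network output assigns probability to occupied cells too.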
5. Game Play
- For each turn:
  - Update the `inputLayer` with the current board
  - Perform a forward pass to calculate move probabilities
  - Choose and play a move
  - Record the move in `AImoves` for later training
- Human vs AI: the human inputs coordinates
- AI vs AI: both sides use the network to select moves
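The per-turn loop can be sketched like this in Python. For brevity this sketch plays until the board is full rather than checking for wins, and the names (`play_game`, `policy`) are mine; the key idea it shows is logging each `(state, move)` pair, like `AImoves`, for training:

```python
def play_game(policy):
    """Self-play loop sketch: snapshot the board (the inputLayer),
    choose a move, play it, and log (state, move) for each player."""
    board = [0] * 9
    moves = {1: [], -1: []}   # per-player move log, like AImoves
    player = 1                # X moves first
    while 0 in board:         # simplified: no win detection
        state = list(board)               # snapshot fed to the network
        move = policy(state, player)
        board[move] = player
        moves[player].append((state, move))  # saved for training later
        player = -player                  # alternate X (1) and O (-1)
    return board, moves

# Dummy policy for illustration: always take the first empty square.
board, moves = play_game(lambda state, player: state.index(0))
```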
6. Training via Backpropagation
- After a game, the AI updates weights based on the result:
  - Calculate error at output: `dOut = (predictedProb - target) * reward`, where `target = 1` for the move actually made, and `reward = 1` for a win, `0` for a draw
  - Backpropagate error to hidden layer: `dHidden = W2 × dOut * ReLU_derivative(hiddenLayer)`
  - Update weights: `W2 -= learningRate × hiddenLayer × dOut` and `W1 -= learningRate × inputLayer × dHidden`
  - Biases (`b1` and `b2`) are updated similarly
- This allows the network to learn which moves lead to wins and improve over time.
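The update rules above can be written out for a single recorded move; this is a Python sketch under my reading of those rules (function names and the learning rate value are illustrative, and it recomputes the forward pass so the gradients use the current activations):

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def train_step(state, move, W1, b1, W2, b2, reward, lr=0.1):
    """One backprop update for a recorded (state, move) pair."""
    # Forward pass (same 9-18-9 shapes as the network above).
    hidden = relu([sum(state[i] * W1[i][j] for i in range(9)) + b1[j]
                   for j in range(18)])
    out = [sum(hidden[j] * W2[j][k] for j in range(18)) + b2[k]
           for k in range(9)]
    m = max(out)
    exps = [math.exp(o - m) for o in out]
    total = sum(exps)
    probs = [e / total for e in exps]
    # dOut = (predictedProb - target) * reward; target = 1 for the move made
    d_out = [(probs[k] - (1.0 if k == move else 0.0)) * reward
             for k in range(9)]
    # dHidden = W2 x dOut * ReLU'(hiddenLayer)
    d_hidden = [sum(W2[j][k] * d_out[k] for k in range(9))
                * (1.0 if hidden[j] > 0 else 0.0)
                for j in range(18)]
    # W2 -= lr * hiddenLayer x dOut; b2 -= lr * dOut
    for j in range(18):
        for k in range(9):
            W2[j][k] -= lr * hidden[j] * d_out[k]
    for k in range(9):
        b2[k] -= lr * d_out[k]
    # W1 -= lr * inputLayer x dHidden; b1 -= lr * dHidden
    for i in range(9):
        for j in range(18):
            W1[i][j] -= lr * state[i] * d_hidden[j]
    for j in range(18):
        b1[j] -= lr * d_hidden[j]
    return probs  # probabilities before this update

# Demo: repeatedly reinforce move 4 on a fixed state and watch its
# probability rise.
random.seed(1)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(18)] for _ in range(9)]
b1 = [0.0] * 18
W2 = [[random.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(18)]
b2 = [0.0] * 9
state = [1, -1, 0, 0, 0, 0, 0, 0, 0]
before = train_step(state, 4, W1, b1, W2, b2, reward=1.0)[4]
after = before
for _ in range(20):
    after = train_step(state, 4, W1, b1, W2, b2, reward=1.0)[4]
```

Note that with `reward = 0` for a draw the whole update vanishes, so under this scheme the network only learns from wins.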
7. Persistence
- The network saves all weights and biases to a file (`TicTacToeAI.txt`)
- On startup, the network loads the previous weights to continue learning from where it left off
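A plain-text weights file like this is easy to round-trip; here is a Python sketch of one possible layout, one number per line (the actual format of `TicTacToeAI.txt` may differ, and the demo filename is made up):

```python
import random

def save_weights(path, W1, b1, W2, b2):
    """Flatten all weights and biases to one number per line."""
    flat = ([w for row in W1 for w in row] + b1
            + [w for row in W2 for w in row] + b2)
    with open(path, "w") as f:
        f.write("\n".join(repr(v) for v in flat))

def load_weights(path):
    """Rebuild the 9-18-9 network's parameters from the saved file."""
    with open(path) as f:
        vals = [float(line) for line in f]
    it = iter(vals)
    W1 = [[next(it) for _ in range(18)] for _ in range(9)]
    b1 = [next(it) for _ in range(18)]
    W2 = [[next(it) for _ in range(9)] for _ in range(18)]
    b2 = [next(it) for _ in range(9)]
    return W1, b1, W2, b2

# Round-trip demo with random parameters.
random.seed(3)
W1 = [[random.random() for _ in range(18)] for _ in range(9)]
b1 = [random.random() for _ in range(18)]
W2 = [[random.random() for _ in range(9)] for _ in range(18)]
b2 = [random.random() for _ in range(9)]
save_weights("ttt_demo.txt", W1, b1, W2, b2)
restored = load_weights("ttt_demo.txt")
```

Using `repr` for floats means the values survive the round trip exactly, so training resumes from precisely where it stopped.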
8. Iterative Improvement
- By playing thousands of games against itself, the AI gradually:
- Avoids losing moves
- Learns winning patterns
- Balances exploration and exploitation
- Over time, it becomes nearly unbeatable at Tic Tac Toe
So, how was it?
Tiring. Did it teach me anything? Yes, and the outcome was pretty cool!