How the Tic Tac Toe Neural Network Works
This neural network learns to play Tic Tac Toe by representing the board as a numerical input and using a simple feedforward network with one hidden layer to decide moves. Here’s how it works:
1. Programming Language
I decided to use the programming language designed by OCR for their computer science exams, the Exam Reference Language (ERL), to make this program.
Why? It seemed fun at the time.
Drawbacks
Compared with Python, ERL is much more limited in scope and flexibility. While Python is a general-purpose language with extensive libraries, dynamic typing, and support for object-oriented, functional, and procedural programming, ERL is primarily procedural and designed for educational purposes. Key limitations include:
- Data Structures: ERL relies on fixed-size arrays and manual indexing, lacking Python’s lists, dictionaries, sets, and NumPy arrays for efficient numerical computation.
- Mathematical Operations: ERL only supports basic arithmetic and simple random functions, whereas Python can leverage vectorized operations, linear algebra, and GPU-accelerated computation via libraries like NumPy or PyTorch.
- Functions & Abstraction: ERL has basic functions without closures, higher-order functions, or classes. Python allows rich abstractions, making code modular, reusable, and easier to maintain.
- Libraries & Ecosystem: ERL has no external libraries for machine learning, plotting, or file/database handling. Python has a vast ecosystem that enables fast prototyping, visualization, and advanced AI development.
- Performance & Scalability: ERL is interpreted and slow for nested loops and repeated computations, while Python can exploit vectorization and parallelization to handle larger datasets and more complex networks efficiently.
These limitations did mean that I had to implement a lot of common library functionality as my own functions. But, in the end, it did work out; after a few hours of training it became semi-competent.
2. Board Representation
- The Tic Tac Toe board is a 1D array of 9 elements:
  - `0` = empty square
  - `1` = player X
  - `-1` = player O
- Each position in the array corresponds to a cell on the 3x3 board.
- Input to the network is the current board state (`inputLayer`).
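The original program is written in ERL, but the encoding is easy to sketch in Python; `cell_index` below is an illustrative helper of my own, not a function from the original code:

```python
# Board encoding as described above: a flat array of 9 cells.
# 0 = empty, 1 = player X, -1 = player O.

def cell_index(row, col):
    """Map a (row, col) coordinate on the 3x3 grid to the 1D array index."""
    return row * 3 + col

board = [0] * 9                  # empty board
board[cell_index(1, 1)] = 1      # X takes the centre
board[cell_index(0, 0)] = -1     # O takes the top-left corner
```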
3. Network Architecture
- Input Layer: 9 neurons (one per board cell)
- Hidden Layer: 18 neurons with ReLU activation
- Output Layer: 9 neurons representing move probabilities for each board cell
- Weights and Biases:
  - `W1` and `b1` connect input → hidden layer
  - `W2` and `b2` connect hidden → output layer
- Forward Pass:
  - Compute `hiddenLayer = ReLU(inputLayer × W1 + b1)`
  - Compute `outputLayer = hiddenLayer × W2 + b2`
  - Normalize the output with a softmax-like function to get move probabilities (`AImoveProbs`)
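The forward pass above can be sketched in plain Python (the original is ERL with manual loops; I'm assuming a standard softmax for the "softmax-like" normalization, and the function names are mine):

```python
import math

def relu(v):
    return [max(0.0, x) for x in v]

def forward(input_layer, W1, b1, W2, b2):
    """Forward pass for the 9-18-9 network described above."""
    # hiddenLayer = ReLU(inputLayer x W1 + b1)
    hidden = relu([sum(input_layer[i] * W1[i][j] for i in range(9)) + b1[j]
                   for j in range(18)])
    # outputLayer = hiddenLayer x W2 + b2
    out = [sum(hidden[j] * W2[j][k] for j in range(18)) + b2[k]
           for k in range(9)]
    # Softmax normalization -> move probabilities (AImoveProbs)
    m = max(out)
    exps = [math.exp(o - m) for o in out]
    total = sum(exps)
    return [e / total for e in exps]

# With an all-zero network and an empty board, every move comes out
# equally likely, which is a handy sanity check.
probs = forward([0] * 9,
                [[0.0] * 18 for _ in range(9)], [0.0] * 18,
                [[0.0] * 9 for _ in range(18)], [0.0] * 9)
```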
4. Move Selection
- Uses an epsilon-greedy strategy:
  - With probability `epsilon` (small, e.g., 1%), choose a random legal move (exploration)
  - Otherwise, pick the move with the highest probability from the network (exploitation)
- This ensures the AI explores new moves and doesn’t get stuck repeating suboptimal moves.
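The selection rule is simple enough to show directly; this is a minimal Python sketch (the function name and example probabilities are illustrative, not from the original program):

```python
import random

def choose_move(board, move_probs, epsilon=0.01):
    """Epsilon-greedy selection restricted to legal (empty) squares."""
    legal = [i for i in range(9) if board[i] == 0]
    if random.random() < epsilon:
        return random.choice(legal)                 # explore
    return max(legal, key=lambda i: move_probs[i])  # exploit

# Example position: squares 2, 3, 5, 6 and 7 are empty.
board = [1, -1, 0,
         0, 1, 0,
         0, 0, -1]
probs = [0.05, 0.05, 0.30, 0.10, 0.05, 0.20, 0.10, 0.10, 0.05]
greedy = choose_move(board, probs, epsilon=0.0)  # epsilon 0: always exploit
```

Restricting the argmax to legal squares matters: the raw network output assigns probability to occupied cells too.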
5. Game Play
- For each turn:
  - Update the `inputLayer` with the current board
  - Perform a forward pass to calculate move probabilities
  - Choose and play a move
  - Record the move in `AImoves` for later training
- Human vs AI: the human inputs coordinates
- AI vs AI: both sides use the network to select moves
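The per-turn loop can be sketched like this in Python. For brevity this sketch plays until the board is full rather than checking for wins, and the names (`play_game`, `policy`) are mine; the key idea it shows is logging each `(state, move)` pair, like `AImoves`, for training:

```python
def play_game(policy):
    """Self-play loop sketch: snapshot the board (the inputLayer),
    choose a move, play it, and log (state, move) for each player."""
    board = [0] * 9
    moves = {1: [], -1: []}   # per-player move log, like AImoves
    player = 1                # X moves first
    while 0 in board:         # simplified: no win detection
        state = list(board)               # snapshot fed to the network
        move = policy(state, player)
        board[move] = player
        moves[player].append((state, move))  # saved for training later
        player = -player                  # alternate X (1) and O (-1)
    return board, moves

# Dummy policy for illustration: always take the first empty square.
board, moves = play_game(lambda state, player: state.index(0))
```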
6. Training via Backpropagation
- After a game, the AI updates weights based on the result:
  - Calculate error at output: `dOut = (predictedProb - target) * reward`, where `target = 1` for the move actually made, and `reward = 1` for a win, `0` for a draw
  - Backpropagate error to hidden layer: `dHidden = W2 × dOut * ReLU_derivative(hiddenLayer)`
  - Update weights: `W2 -= learningRate × hiddenLayer × dOut` and `W1 -= learningRate × inputLayer × dHidden`
  - Biases (`b1` and `b2`) are updated similarly
- This allows the network to learn which moves lead to wins and improve over time.
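The update rules above can be written out for a single recorded move; this is a Python sketch under my reading of those rules (function names and the learning rate value are illustrative, and it recomputes the forward pass so the gradients use the current activations):

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def train_step(state, move, W1, b1, W2, b2, reward, lr=0.1):
    """One backprop update for a recorded (state, move) pair."""
    # Forward pass (same 9-18-9 shapes as the network above).
    hidden = relu([sum(state[i] * W1[i][j] for i in range(9)) + b1[j]
                   for j in range(18)])
    out = [sum(hidden[j] * W2[j][k] for j in range(18)) + b2[k]
           for k in range(9)]
    m = max(out)
    exps = [math.exp(o - m) for o in out]
    total = sum(exps)
    probs = [e / total for e in exps]
    # dOut = (predictedProb - target) * reward; target = 1 for the move made
    d_out = [(probs[k] - (1.0 if k == move else 0.0)) * reward
             for k in range(9)]
    # dHidden = W2 x dOut * ReLU'(hiddenLayer)
    d_hidden = [sum(W2[j][k] * d_out[k] for k in range(9))
                * (1.0 if hidden[j] > 0 else 0.0)
                for j in range(18)]
    # W2 -= lr * hiddenLayer x dOut; b2 -= lr * dOut
    for j in range(18):
        for k in range(9):
            W2[j][k] -= lr * hidden[j] * d_out[k]
    for k in range(9):
        b2[k] -= lr * d_out[k]
    # W1 -= lr * inputLayer x dHidden; b1 -= lr * dHidden
    for i in range(9):
        for j in range(18):
            W1[i][j] -= lr * state[i] * d_hidden[j]
    for j in range(18):
        b1[j] -= lr * d_hidden[j]
    return probs  # probabilities before this update

# Demo: repeatedly reinforce move 4 on a fixed state and watch its
# probability rise.
random.seed(1)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(18)] for _ in range(9)]
b1 = [0.0] * 18
W2 = [[random.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(18)]
b2 = [0.0] * 9
state = [1, -1, 0, 0, 0, 0, 0, 0, 0]
before = train_step(state, 4, W1, b1, W2, b2, reward=1.0)[4]
after = before
for _ in range(20):
    after = train_step(state, 4, W1, b1, W2, b2, reward=1.0)[4]
```

Note that with `reward = 0` for a draw the whole update vanishes, so under this scheme the network only learns from wins.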
7. Persistence
- The network saves all weights and biases to a file (`TicTacToeAI.txt`)
- On startup, the network loads the previous weights to continue learning from where it left off
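A plain-text weights file like this is easy to round-trip; here is a Python sketch of one possible layout, one number per line (the actual format of `TicTacToeAI.txt` may differ, and the demo filename is made up):

```python
import random

def save_weights(path, W1, b1, W2, b2):
    """Flatten all weights and biases to one number per line."""
    flat = ([w for row in W1 for w in row] + b1
            + [w for row in W2 for w in row] + b2)
    with open(path, "w") as f:
        f.write("\n".join(repr(v) for v in flat))

def load_weights(path):
    """Rebuild the 9-18-9 network's parameters from the saved file."""
    with open(path) as f:
        vals = [float(line) for line in f]
    it = iter(vals)
    W1 = [[next(it) for _ in range(18)] for _ in range(9)]
    b1 = [next(it) for _ in range(18)]
    W2 = [[next(it) for _ in range(9)] for _ in range(18)]
    b2 = [next(it) for _ in range(9)]
    return W1, b1, W2, b2

# Round-trip demo with random parameters.
random.seed(3)
W1 = [[random.random() for _ in range(18)] for _ in range(9)]
b1 = [random.random() for _ in range(18)]
W2 = [[random.random() for _ in range(9)] for _ in range(18)]
b2 = [random.random() for _ in range(9)]
save_weights("ttt_demo.txt", W1, b1, W2, b2)
restored = load_weights("ttt_demo.txt")
```

Using `repr` for floats means the values survive the round trip exactly, so training resumes from precisely where it stopped.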
8. Iterative Improvement
- By playing thousands of games against itself, the AI gradually:
- Avoids losing moves
- Learns winning patterns
- Balances exploration and exploitation
- Over time, it becomes nearly unbeatable at Tic Tac Toe
So, how was it?
Tiring. Did it teach me anything? Yes, and the outcome was pretty cool!