NIM: abstraction beats “raw board recognition”

NIM is solved by a simple invariant: the nim-sum (bitwise xor) of the pile sizes. This page lets you play against a near-perfect “teacher” (perfect play, except with probability ε it makes a random move) and see the winning/losing structure.

← Research demos

Play NIM

Current piles
nim-sum (xor)
Show the “winning test” (why xor matters)

Let x = a₁ ⊕ a₂ ⊕ … ⊕ a_m, where is bitwise xor. If x = 0, the position is losing for the next player (with perfect play). If x ≠ 0, there is a move that makes the nim-sum zero.

In other words: many “different looking” boards have the same strategic type. This mismatch is one reason a single-frame network that tries to directly associate patterns in the board image with the best move can struggle, compared to a model that can compute or represent the xor-structure.

Controls

Choose a start, then play a move.

Move log

Winning/losing is decided by nim-sum.

# Player Move Piles after nim-sum after

Figure

Single-frame vs two-frame learning behaviour on NIM.

Validation accuracy comparing single-frame and two-frame models on NIM
The one-frame model conditions only on the current board state, and tends to stay near chance because optimal play depends on the global xor invariant (nim-sum). The two-frame model also sees the previous position, making it possible to learn a local “restoration” rule from the observed change.
References
  • S. Riis, Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach, arXiv preprint (local copy: arXiv-2411.06403v2/paper_arxiv3_arxivready_H20k4_UK.tex).
  • B. Zhou & S. Riis, Impartial Games: A Challenge for Reinforcement Learning, forthcoming in Machine Learning.