AlphaZero Togyz Kumalak
Published:
I built an AlphaZero AI that plays Togyz Kumalak (a traditional Central Asian board game, often described as Chess meets Mancala). It has been played for centuries in Kazakhstan and Kyrgyzstan, but no one had trained an AlphaZero agent on it before, so I decided to change that!
I “vibe coded” the entire project with Claude Code. The setup includes:
- DeepMind’s AlphaZero algorithm (self-play + Monte Carlo Tree Search + neural network).
- Built with PyTorch, with the game simulation JIT-compiled by Numba (achieving 5x faster self-play).
- A ~1.8M parameter ResNet with both policy and value heads.
- Trained entirely through self-play using zero human game data.
- Squeeze-and-Excitation attention blocks and BF16 mixed precision on H100.
- A full GUI to play against the trained agent.
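To make the search part of the setup concrete, here is a minimal sketch of the PUCT selection rule that AlphaZero-style MCTS typically uses, plus the step that turns root visit counts into a policy training target. This is not the project's code: the `Node` class, the function names, and the `c_puct=1.5` constant are assumptions for illustration, and each child's value is assumed to be stored from the point of view of the player choosing at the parent.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """One search-tree node (hypothetical structure, for illustration only)."""
    prior: float                      # P(s, a) from the policy head
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # sum of backed-up values
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        # Mean backed-up value; unvisited nodes default to 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Q + U: exploitation term plus a prior-weighted exploration bonus
    # that shrinks as the child accumulates visits.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.q() + u


def select_child(parent: Node, c_puct: float = 1.5) -> Tuple[int, Node]:
    # Pick the (action, child) pair with the highest PUCT score.
    return max(parent.children.items(),
               key=lambda kv: puct_score(parent, kv[1], c_puct))


def visit_policy(root: Node, num_actions: int = 9) -> List[float]:
    # Normalised visit counts over the 9 pits: the policy target for training.
    total = sum(c.visit_count for c in root.children.values())
    return [
        root.children[a].visit_count / total if total and a in root.children else 0.0
        for a in range(num_actions)
    ]


root = Node(prior=1.0, visit_count=10)
root.children = {
    0: Node(prior=0.5, visit_count=8, value_sum=2.0),
    1: Node(prior=0.3, visit_count=2, value_sum=1.6),
    2: Node(prior=0.2),
}
best_action, _ = select_child(root)
target = visit_policy(root)
```

In this toy tree the second pit wins the selection because it combines a high mean value with few visits, while the training target still reflects where the search actually spent its simulations.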
After thousands of self-play games, the agent independently learned opening strategies, tactical maneuvers like creating a tuzdyk (permanent capture pit), and endgame stone counting. Watching the AI discover human-like tactics all on its own was genuinely exciting.
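For readers who have not played the game, the sketch below shows roughly what a single move and a tuzdyk look like in code. It follows the rules as they are commonly described (sow counterclockwise starting in the pit you emptied, capture on an even count, claim a tuzdyk on exactly three) and is not the project's Numba environment. The board layout, the `sow` function, and its arguments are hypothetical, and end-of-game handling is left out.

```python
from typing import List, Optional, Tuple

PITS = 9            # pits per side
CELLS = 2 * PITS    # all 18 pits, indexed in sowing order

def sow(board: List[int], kazans: List[int], tuzdyks: List[Optional[int]],
        player: int, pit: int) -> Tuple[List[int], List[int], List[Optional[int]]]:
    """Apply one move and return new (board, kazans, tuzdyks).

    board[0:9] are player 0's pits, board[9:18] player 1's.
    tuzdyks[p] is the board index of the pit player p owns, or None.
    """
    board, kazans, tuzdyks = list(board), list(kazans), list(tuzdyks)
    start = player * PITS + pit
    stones = board[start]
    if stones == 0:
        raise ValueError("cannot move from an empty pit")
    board[start] = 0

    # A single stone moves to the next pit; otherwise the first stone
    # goes back into the pit that was just emptied.
    if stones == 1:
        targets = [(start + 1) % CELLS]
    else:
        targets = [(start + i) % CELLS for i in range(stones)]

    for pos in targets:
        if pos in tuzdyks:
            # Stones sown into a tuzdyk go straight to its owner's kazan.
            kazans[tuzdyks.index(pos)] += 1
        else:
            board[pos] += 1

    last = targets[-1]
    opponent = 1 - player
    if last // PITS == opponent and last not in tuzdyks:
        pit_no = last % PITS
        may_claim = (
            tuzdyks[player] is None            # one tuzdyk per player
            and pit_no != PITS - 1             # never the opponent's ninth pit
            and (tuzdyks[opponent] is None
                 or tuzdyks[opponent] % PITS != pit_no)  # not the mirrored pit
        )
        if board[last] == 3 and may_claim:
            tuzdyks[player] = last
            kazans[player] += 3
            board[last] = 0
        elif board[last] % 2 == 0:
            kazans[player] += board[last]
            board[last] = 0
    return board, kazans, tuzdyks


# Opening move from the seventh pit: the last stone makes 10 in an opponent pit.
b1, k1, t1 = sow([9] * CELLS, [0, 0], [None, None], player=0, pit=6)

# A position where the last stone makes exactly three: a tuzdyk is claimed.
start_board = [0] * CELLS
start_board[8], start_board[9], start_board[10] = 3, 4, 2
b2, k2, t2 = sow(start_board, [0, 0], [None, None], player=0, pit=8)

# The opponent then sows across the tuzdyk and loses one stone to it.
b3, k3, t3 = sow(b2, k2, t2, player=1, pit=0)
```

Once a pit is claimed, every stone that would land there is diverted to the owner's kazan for the rest of the game, which is why an early tuzdyk is worth planning for.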
AI-assisted coding isn’t just about writing boilerplate faster. It let me ship a complete RL research project (game environment, training infrastructure, evaluation tools, and GUI) as a solo developer. The barrier between “I have an idea” and “I have a working system” has never been lower.
You can check out the source code and play against the agent using the links below: