COINS: Backwards Induction Through Reinforcement Learning
Contributors:  Tom Busey: Author
Rupali Parab: Programming
Dean Wyatte: Programming



Coins is a relatively simple game where fifteen coins are placed in a pile and two players take turns removing either one, two, or three coins from the pile. The winner is the player who removes the last coin from the pile. It is a game of strategy, and a perfect strategy exists where the player who goes first can always win.

This game shows how reinforement learning can perform backward inductive reasoning. The process of backward induction proceeds by first looking at the last possible action, then determining what the last player will do in each situation. Using this information, one can then determine what the second to last player will do. This process continues until one determines all possible actions.

Try playing and see if you can figure out the perfect strategy. You can play against four different kinds of computers. There is a random computer, which will just make random moves and is pretty easy to beat, a novice computer, which makes random moves unless it sees the opportunity to win, a perfect computer, which will always win if you do not play with the perfect strategy, and a learning computer, which is by far the most interesting. The learning computer starts out by essentially making random moves, but as it plays more and more it learns the perfect strategy. You might not want to play the learning computer enough to see it improve, it takes about 400 games, so instead you can pit it against one of the other computer players and let it play a few hundred games to get some experience. Try playing the learning computer for a few games and then let it get some experience against the novice or perfect computer for a few hundred games. After it gains some experience it might be good enough to win every time if it goes first, or play just as well as the perfect computer if it goes second.



This module was created by the Indiana University Cognitive Science Program and is also available on their website: IU Cognitive Science Software.


This module was supported by National Science Foundation Grant #0127561.