AlphaHearts Zero: Implementing AlphaZero techniques for Imperfect Information Trick-Taking Games

Author: 
Andrew Wei
Adviser(s): 
Dr. James Glenn
Abstract: 

The AlphaZero technique has seen remarkable success in perfect information games. However, owing to the difficulty of evaluating information sets rather than states, little work has applied this technique to imperfect information games. In this work, I apply the AlphaZero technique to Hearts, an imperfect information trick-taking game. I create an agent (“AlphaHearts Zero”) capable of playing Hearts at a human level, outclassing baseline agents and approaching the performance of cheating agents. In particular, the agent can be trained tabula rasa, without any prior knowledge or preloaded heuristics. AlphaHearts Zero uses the Perfect Information Monte Carlo (PIMC) technique to convert in-game information sets into perfect information states, and then applies a version of the Monte Carlo Tree Search (MCTS) algorithm enhanced with a Deep Q-Network (DQN). The DQN is trained by self-play through deep Reinforcement Learning (RL), as in other AlphaZero implementations, and self-play RL steadily improves the performance of AlphaHearts Zero over many iterations. Because AlphaHearts Zero is trained without any prior knowledge, the methods demonstrated here on Hearts show the viability of both PIMC and AlphaZero in a wide range of imperfect information games. Possibilities for future exploration include a specialized inference system to more accurately convert information sets into perfect information states, a more efficient tree search algorithm, further refinement of the neural network, and an improved self-play procedure.

Term: 
Spring 2022