By Tyler Johnson and Isaac Reibman
In the world of game theory, games such as Catan, Risk, and Civilization 6 are known as large-scale strategy games. Their defining trait is a massive number of components and the ways those components interact. These games often give players the option to compete against the computer. These computer opponents are called artificial intelligences (AIs), and their purpose is to give players an even challenge.
Most AIs follow a fixed algorithm to determine which move to make. However, because of the scale and complexity of these games, writing a perfect algorithm is incredibly difficult, if not impossible. As a result, most current AIs have flaws that players can exploit. While a human would adapt their strategy once they realized it was failing, an algorithmic AI will keep making the same decisions. This takes away the challenge and defeats the purpose of the AI.
Researchers Charles Madeira, Vincent Corruble, and Geber Ramalho at Pierre and Marie Curie University decided to approach the AI problem from a different direction. Reinforcement learning (RL) is a form of machine learning based on trial and error: after the AI makes a move, it judges whether that move was good or bad, adjusts its strategy accordingly, and continues. This lets it learn from its mistakes. The team wanted to see whether applying RL to large-scale strategy AIs would allow the computer to make better decisions.
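The judge-and-adjust loop described above can be sketched in a few lines. The following is a minimal illustration using Q-learning on a toy five-square game, not the researchers' actual system; every name and number in it is an assumption made up for the example.

```python
import random

random.seed(0)  # reproducible toy run

N_STATES = 5          # toy game: walk from square 0 to the goal at square 4
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

# The AI's current estimate of how good each move is in each situation.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Trial and error: usually pick the move that looks best so far,
        # but sometimes try a random one to see what happens.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = max(0, min(N_STATES - 1, state + action))
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Judge the move and adjust the strategy accordingly.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy is to always step right, toward the goal.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Here the "strategy" is just a table of scores, but the same loop of acting, judging the result, and adjusting drives far larger systems.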
Reinforcement learning works well in games like Backgammon, but a large-scale strategy game is much more complicated. In order to show the difference, the researchers used John Tiller’s Battleground™ for their AI. This is a complex game consisting of a battlefield of hexagons and hundreds of units. The entire game of Backgammon has a total of 10²⁰ separate states – ways the board can be at any given time. By comparison, just a single scenario of Battleground™ has 10¹⁸⁸⁷ possible states. To counter the massive scope of this game, the researchers decided to split the learning into stages.
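A rough back-of-the-envelope calculation shows why such state counts explode. The numbers below are invented, Battleground-scale guesses, not the game's real figures or the researchers' counting method:

```python
import math

# If each of U distinguishable units could sit on any of H hexes, the
# board positions alone would allow H**U arrangements – before even
# counting each unit's health, facing, or orders. (H and U here are
# assumed values chosen only for illustration.)
hexes, units = 1000, 600
digits = units * math.log10(hexes)   # log10 of H**U = number of digits
print(round(digits))                 # → 1800
```

Even this crude lower bound has a state count nearly two thousand digits long, far beyond anything an algorithm could enumerate.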
In Battleground™, each player controls every aspect of their army, from the commands of the general down to the actions of each individual soldier. Learning to control every aspect of an army is challenging, so the researchers only allowed the learning AI to control part of the hierarchy. A premade AI, called the bootstrap AI, controlled the rest of the army and the opponent. This way, the learning AI only has to study a small part of the decision-making process at any given time.
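The split can be pictured with a small sketch: a learning agent picks only the high-level objective, while a scripted stand-in for the bootstrap AI moves the individual units. Everything here – the grid, the objectives, the pretend learned scores – is invented for illustration.

```python
def bootstrap_unit_policy(unit, objective):
    """Scripted stand-in for the bootstrap AI: march one unit a single
    step toward whatever objective the level above handed down."""
    dr = (objective[0] > unit[0]) - (objective[0] < unit[0])
    dc = (objective[1] > unit[1]) - (objective[1] < unit[1])
    return (unit[0] + dr, unit[1] + dc)

def learning_corps_policy(candidate_objectives, q_values):
    """The learning AI studies only this one decision: which high-level
    objective to pursue. The scores would come from training."""
    return max(candidate_objectives, key=lambda obj: q_values.get(obj, 0.0))

units = [(0, 0), (0, 2), (1, 1)]
objectives = [(5, 5), (5, 0)]
learned_scores = {(5, 5): 1.2, (5, 0): 0.4}   # pretend these were learned

target = learning_corps_policy(objectives, learned_scores)
units = [bootstrap_unit_policy(u, target) for u in units]
print(target, units)   # → (5, 5) [(1, 1), (1, 3), (2, 2)]
```

The learning AI's decision space shrinks from every soldier's move to a handful of objectives, which is what makes training tractable.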
To make good decisions, the RL AI needs to understand the status of the game, known as the game state. This lets the computer know what to consider when making its decision. It also needs to know which moves it can make, known as the action space. Finally, it uses a “reward function” to know how well it is doing. These three things differ with each game and even among AIs for the same game.
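Concretely, those three ingredients might look like the toy definitions below. None of these names or rules come from the researchers' system; they are hypothetical stand-ins showing what a game state, an action space, and a reward function each are.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GameState:
    """Game state: everything the AI considers when deciding."""
    unit_pos: tuple      # (row, col) of our unit on the map
    enemy_pos: tuple     # (row, col) of the enemy unit
    unit_health: int

# Action space: every move the AI is allowed to consider.
ACTION_SPACE = ["north", "south", "east", "west", "hold", "fire"]

def reward(state: GameState, next_state: GameState) -> float:
    """Reward function: tells the AI how well its last move went."""
    def dist(s):  # Manhattan distance to the enemy
        return abs(s.unit_pos[0] - s.enemy_pos[0]) + abs(s.unit_pos[1] - s.enemy_pos[1])
    score = 0.0
    if next_state.unit_health < state.unit_health:
        score -= 1.0                 # penalize taking damage
    if dist(next_state) < dist(state):
        score += 0.5                 # reward closing in on the enemy
    return score

before = GameState(unit_pos=(0, 0), enemy_pos=(3, 3), unit_health=10)
after = GameState(unit_pos=(1, 0), enemy_pos=(3, 3), unit_health=10)
print(reward(before, after))   # → 0.5 (moved closer, took no damage)
```

Designing these three pieces well is the whole game: the next paragraph explains what goes wrong when any of them is faulty.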
This is the most challenging part of building an RL AI. The AI can’t make smart moves if it doesn’t correctly understand the game state and action space. If the reward function is faulty, the AI can’t tell whether it’s doing well or poorly, so it won’t adjust its behavior correctly.
The researchers decided that the positions of units relative to the terrain layout were the most crucial information for the AI to understand. This would allow the AI to identify strategic locations, such as places to hide troops and the best lines of sight and fire. To test the AI, they used a specific scenario: a battle between the French and Russian armies that took place in 1812.
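Terrain-relative features like these can be sketched as follows. The terrain codes, map, and rules here are invented for illustration, not taken from Battleground™ or the researchers' encoding.

```python
# A tiny made-up map: each cell is a terrain type.
TERRAIN = [
    ["plain", "forest", "plain"],
    ["hill",  "plain",  "forest"],
    ["plain", "plain",  "hill"],
]

def state_features(unit, enemy):
    """Describe a unit's position relative to terrain and the enemy."""
    r, c = unit
    return {
        "in_cover": TERRAIN[r][c] == "forest",      # a place to hide troops
        "on_high_ground": TERRAIN[r][c] == "hill",  # better line of sight and fire
        "enemy_distance": abs(r - enemy[0]) + abs(c - enemy[1]),
    }

features = state_features((1, 0), (2, 2))
print(features)   # → {'in_cover': False, 'on_high_ground': True, 'enemy_distance': 3}
```

Feeding the AI a handful of terrain-relative features like these, instead of the raw board, is one way to make a huge state space learnable.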
They used two different variations of the AI. One, called neural network (NN) LAI 1, used a single “brain” for all actions. The other, called NN LAI 45, used a different “brain” for each task. Both learning AIs controlled the French army while the bootstrap AI controlled the Russian one. They trained each AI 10,000 times, then checked the results of its training against random moves, the commercial AI, and a human player.
After 500 games, the AIs reached the skill level of an average human player. The single “brain” AI consistently scored in the 50-100 range. The multiple “brain” AI took much longer to learn, but it achieved greater scores on average, between 100-175. An average human player scores around 80, and the commercial AI scores around -380.
Both learning AIs adapted to the game and performed better than the commercial AI. The single-“brain” AI maintained a stable score similar to a human player’s. The multiple-“brain” AI took more risks, so its score varied more over the course of a game, but in the end it performed better. This suggests that learning AIs may be the next step for large-scale strategy games: they could challenge players more evenly, creating a more engaging single-player experience.