Read Time: 5 minutes, 457 words

Teaching a computer to play the game Go is difficult

A mathematical strategy called Objective Monte-Carlo helped a team of researchers teach a computer to play the ancient and complex game of Go, and it may work for other equally complex games.


Image Credit: "Go" game board. Source: Wallpapers in 4k

Go is a 4,000-year-old game in which each player tries to surround the opponent’s pieces to capture them. The winner is the player who controls the most territory on the board when both players pass. The game presents a challenge to artificial intelligence (AI) researchers because of its complexity.

It is difficult to evaluate how good a particular Go move is. Traditional AI game strategies don’t work as well for Go as they do for games like Chess because they rely on an algorithm called Minimax. Minimax finds the best outcome for the move you want to make, assuming that the other player always responds with the move that thwarts you most.
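
To make this concrete, here is a minimal Python sketch of Minimax for a generic two-player game. It is not code from the study; the helper functions (legal_moves, apply_move, evaluate, game_over) are hypothetical placeholders supplied by whatever game implementation you plug in.

def minimax(state, depth, maximizing, legal_moves, apply_move,
            evaluate, game_over):
    # Score a position by searching ahead, assuming the opponent
    # always replies with the move that is worst for us.
    if depth == 0 or game_over(state):
        return evaluate(state)  # Minimax leans on this position evaluator
    scores = [minimax(apply_move(state, m), depth - 1, not maximizing,
                      legal_moves, apply_move, evaluate, game_over)
              for m in legal_moves(state)]
    return max(scores) if maximizing else min(scores)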

Minimax’s shortcoming is that it needs a good move evaluator, a function that scores how favorable a board position is. Go’s complexity makes such a function hard to build, so Minimax alone performs poorly on Go. Pairing Minimax with a method that doesn’t require a good move evaluator works much better.

A mathematical technique called Monte-Carlo is a better fit for Go. Monte-Carlo, in its simplest form, tries out random moves, finds out which ones lead to wins, and picks the best move based on those results. In more sophisticated versions, Monte-Carlo also selects moves using previous knowledge of that move or similar moves, gained from playing experience.
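
Here is a minimal Python sketch of that simplest form, again with hypothetical game helpers (legal_moves, apply_move, game_over, winner) rather than anything from the study: play many random games after each candidate move and keep the move that wins most often.

import random

def random_playout(state, legal_moves, apply_move, game_over, winner):
    # Play uniformly random moves until the game ends.
    while not game_over(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return winner(state)

def monte_carlo_move(state, player, n_playouts,
                     legal_moves, apply_move, game_over, winner):
    # Rate each legal move by its win rate over random playouts.
    best_move, best_rate = None, -1.0
    for move in legal_moves(state):
        wins = sum(random_playout(apply_move(state, move), legal_moves,
                                  apply_move, game_over, winner) == player
                   for _ in range(n_playouts))
        if wins / n_playouts > best_rate:
            best_move, best_rate = move, wins / n_playouts
    return best_move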

A variation of Monte-Carlo, called Objective Monte-Carlo, evaluates moves differently. Objective Monte-Carlo picks a move if it gives the program a good chance of improving its position, an approach that does not require a good move evaluation function. Depending on how promising a move is judged to be, the computer gives it a score called “Ob”. The AI player then adjusts itself according to whether it is winning or losing: when Ob is high, the AI explores riskier moves, and when Ob drops, it plays safer ones.
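
The study defines Ob precisely; the Python sketch below is only a toy illustration of the self-adjusting idea, with an assumed scaling rule: a high winning estimate widens the exploration bonus for little-tried moves, while a low one shrinks it so that proven win rates dominate.

import math

def pick_move(stats, ob):
    # stats maps move -> (wins, playouts); ob in [0, 1] stands in for the
    # paper's Ob score, the current estimate that the program is winning.
    def score(move):
        wins, n = stats[move]
        win_rate = wins / n if n else 0.0
        exploration = ob * math.sqrt(1.0 / (n + 1))  # assumed scaling rule
        return win_rate + exploration
    return max(stats, key=score)  # high ob favors risky, little-tried moves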

The move selection method chooses the next move with a probability that increases with the move’s rating. A companion step, called backpropagation, updates each move’s rating based on how well the move performed and the total number of games played through it. Objective Monte-Carlo, with its specialized backpropagation function, performs better than Minimax paired with Monte-Carlo.
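
A hedged Python sketch of those two steps, with assumed data structures rather than the study’s exact formulas: selection draws the next move with probability proportional to its rating, and backpropagation updates a move’s win and playout counts after each simulated game.

import random

def select_move(ratings):
    # ratings maps move -> nonnegative rating; draw a move with
    # probability proportional to its rating.
    moves = list(ratings)
    r = random.uniform(0, sum(ratings[m] for m in moves))
    for move in moves:
        r -= ratings[move]
        if r <= 0:
            return move
    return moves[-1]  # guard against floating-point leftovers

def backpropagate(stats, move, won):
    # stats maps move -> [wins, playouts]; record one finished playout.
    stats[move][1] += 1
    if won:
        stats[move][0] += 1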

Objective Monte-Carlo even works for games other than Go. Because it does not require a way to evaluate moves, it can be applied to games for which a move evaluation function is difficult to create. And because it doesn’t have to explore every possibility, it scales up more easily. This could make it a much better alternative to other AI strategies for other games.

 

Study Information

Original study: Monte-Carlo Strategies for Computer Go

Study was published on: 01-01-2006

Study author(s): Guillaume Chaslot, Jahn-Takeshi Saito, Jos Uiterwijk, Bruno Bouzy, H. Jaap van den Herik

The study was done at: Universiteit Maastricht (The Netherlands), Université René Descartes (France)

The study was funded by: Dutch Organisation for Scientific Research (NWO)

Raw data availability: Not available

Featured image credit: "Go" game board. Source: Wallpapers in 4k

This summary was edited by: Gina Misra