Google DeepMind’s latest chess AI uses a language model architecture, plays at a high level, and shows that transformers can be more than just stochastic parrots.
Researchers at Google DeepMind have developed an AI model that plays chess at grandmaster level without the complex search algorithms and handcrafted heuristics that have characterized powerful chess programs such as Stockfish 16, IBM’s Deep Blue, and DeepMind’s AlphaZero.
Instead, the DeepMind team trained a 270-million-parameter Transformer model on chess games. Where traditional chess engines rely on sophisticated search strategies and evaluation functions to find the best move, the DeepMind model relies solely on predicting action-values.
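The idea can be illustrated with a minimal sketch: for each legal move, the model predicts an action-value (roughly, the win probability after that move), and play proceeds by simply taking the argmax, with no search tree. The `predict_action_value` function below is a hypothetical stand-in for the trained Transformer, not DeepMind's actual code.

```python
# Sketch of search-free move selection: score every legal move with a
# learned action-value predictor and play the highest-scoring one.

def predict_action_value(board_fen: str, move: str) -> float:
    """Hypothetical stand-in for the Transformer's value prediction.

    A real model would encode the board (e.g. as a FEN string) together
    with the candidate move and output an estimated win probability.
    Here we fake a deterministic score so the sketch runs on its own.
    """
    return (hash((board_fen, move)) % 1000) / 1000.0

def choose_move(board_fen: str, legal_moves: list[str]) -> str:
    # No lookahead and no search: one value prediction per legal move,
    # then argmax over the predicted action-values.
    return max(legal_moves, key=lambda m: predict_action_value(board_fen, m))

start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
best = choose_move(start_fen, ["e2e4", "d2d4", "g1f3"])
```

Everything that makes this approach strong lives inside the value predictor; the surrounding "engine" is just an argmax.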
Google DeepMind’s Transformer model learns from Stockfish
First, the team collected 10 million chess games and annotated each board position with a state-value: the probability of winning according to Stockfish 16. They then computed all legal moves for each position and scored those as well by assigning action-values, yielding a dataset of roughly 15 billion data points. The Transformer network was then trained on this dataset with supervised learning to predict these values, optimizing its outputs to match Stockfish’s as closely as possible. In essence, the team distilled Stockfish’s capabilities into a chess policy for the Transformer model.
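The dataset construction described above can be sketched as follows. This is an illustrative assumption of the pipeline's shape, not DeepMind's code: `stockfish_win_prob`, `legal_moves`, and `apply_move` are hypothetical placeholders for querying Stockfish 16 and for a chess move generator.

```python
# Sketch of building (state, action, action-value) training examples:
# every legal move from a position gets labeled with the engine's
# estimated win probability after that move.

def stockfish_win_prob(board_fen: str) -> float:
    """Placeholder for a Stockfish 16 evaluation converted to a win probability."""
    return (hash(board_fen) % 1000) / 1000.0  # fake, deterministic stand-in

def legal_moves(board_fen: str) -> list[str]:
    """Placeholder move generator (a real pipeline would use a chess library)."""
    return ["e2e4", "d2d4", "g1f3"]

def apply_move(board_fen: str, move: str) -> str:
    """Placeholder: return the position reached after playing `move`."""
    return board_fen + " " + move  # fake successor state

def build_examples(board_fen: str) -> list[tuple[str, str, float]]:
    # One (state, action, action-value) triple per legal move; applied to
    # the positions of 10 million games, this kind of expansion is what
    # produces the ~15 billion data points cited above.
    return [
        (board_fen, m, stockfish_win_prob(apply_move(board_fen, m)))
        for m in legal_moves(board_fen)
    ]

examples = build_examples("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```

Training then reduces to standard supervised learning: minimize the gap between the network's predicted value for each (state, action) pair and the Stockfish-derived label.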
In tests, the model achieved an Elo rating of 2895 in rapid chess against human players, putting it at grandmaster level, and it solved a series of challenging chess puzzles. The Transformer network thus also outperforms AlphaZero when AlphaZero plays without its MCTS (Monte Carlo Tree Search) component.
However, the model also has limitations: it is stateless, so it can neither remember the course of a game nor plan based on its history. It also performs worse against chess engines than against humans, particularly in positions where a human would normally resign but an engine plays on to the end despite slim chances. The team believes these problems can be solved.
Chess skills as an argument against parrots
This research is not only relevant to chess but also offers insights into the potential of the Transformer architecture in other domains. The team explicitly addresses the characterization of large language models as “stochastic parrots”: “Our work thus adds to a rapidly growing body of literature showing that complex and sophisticated algorithms can be distilled into feed-forward transformers, implying a paradigm-shift away from viewing large transformers as ‘mere’ statistical pattern recognizers to viewing them as a powerful technique for general algorithm approximation.”
Other projects, such as OthelloGPT, have already suggested that transformers can be more than just statistical pattern recognizers.