What can be learned from Pac-Man-playing computers
A University of Augsburg study shows which information helps users assess the quality of self-learning algorithms
Self-learning computer programmes can do a great deal today: predict the weather, detect tumours in X-ray images, play chess better than any human being. How the algorithms reach their conclusions, however, is often unknown even to those who programmed them. Researchers at the University of Augsburg and the Israel Institute of Technology (Technion) have now compared two approaches to shedding light on this "black box". The study shows which information helps users assess the quality of artificial intelligence methods. It was published in the journal Artificial Intelligence, one of the most prestigious international journals in AI research.

This is a general problem of artificially intelligent (AI) methods: even their programmers usually do not know how they reach their conclusions. Their behaviour is a black box, and the bigger the tasks we entrust to such programmes, the bigger the problem becomes. Who wants to trust a machine blindly with a life-and-death decision? And how can one judge which of several algorithms is best suited to a task? An important concern of current AI research is therefore to shed light on this black box. The task is anything but trivial, however, and has occupied the scientific community for many years.

The new study brings this research a significant step forward. Tobias Huber, Katharina Weitz and Elisabeth André from the University of Augsburg teamed up with Ofra Amir from the Israel Institute of Technology (Technion). The problem on which they trained their self-learning method was also a game - not chess, though, but Pac-Man, a Japanese computer game that began its triumphal march around the world in 1980. "It is one of the most difficult arcade games for an AI," explains Tobias Huber, who is completing his doctorate under Prof. André at the Chair for Human-Centred Artificial Intelligence.
The game character has to eat biscuits in a maze while being pursued by ghosts. It gets points for each biscuit; if it is caught, it dies. Like chess, the game is therefore ideally suited to a special category of AI algorithms, namely those that learn through reinforcement. "We let our programme play Pac-Man thousands of times in a row," says Huber. "The better the strategy, the more points it scores. Based on its previous experience, the algorithm learns over time how it should behave in which situation."

But how can an observer judge the criteria on which the AI's behaviour is based, and how good its decisions are? To assess this, the researchers devised a simple experiment. First, they trained the computer to play Pac-Man, but secretly modified the rules by which points were awarded. In one case, for example, the character did not lose any points when it died. The algorithm trained in this way (the researchers also call it an "agent") was therefore unimpressed by nearby ghosts when making its decisions. For a second agent, they changed the value of the biscuits; a third, by contrast, played by the normal rules.

"We then asked test subjects to assess the three agents," Huber explains. "They were not allowed to watch several complete games, though, but were only shown brief excerpts." On this basis, the test subjects were asked to indicate which of the agents they would most likely allow to compete for them in a Pac-Man game. They were also asked to briefly describe the strategy of all three agents in their own words. "We wanted to find out whether the test subjects had understood why the algorithm was performing certain actions," says the computer scientist. To this end, the participants were divided into four groups, each of which was shown five three-second excerpts from the games of the three agents. For the first group, these short clips were chosen at random.
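The reward modifications described above can be made concrete with a small sketch. The following is an illustrative snippet, not the authors' actual code: the three reward functions mimic the study's three agents (normal rules, a "fearless" agent that loses nothing on death, and one with a changed biscuit value), and `q_update` is the standard tabular Q-learning step that reinforcement-learning agents of this kind use to learn from experience. All point values and names are invented for illustration.

```python
# Illustrative sketch of reward shaping and reinforcement learning,
# as described in the study. Point values are invented.

def reward_normal(event):
    # Normal rules: biscuits score points, dying is penalised.
    return {"biscuit": 10, "death": -100, "step": 0}[event]

def reward_fearless(event):
    # Agent 1: death costs nothing, so nearby ghosts stop mattering.
    return {"biscuit": 10, "death": 0, "step": 0}[event]

def reward_greedy(event):
    # Agent 2: the value of a biscuit is changed.
    return {"biscuit": 50, "death": -100, "step": 0}[event]

ACTIONS = ("up", "down", "left", "right")

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference step: nudge Q(state, action) towards the
    observed reward plus the discounted value of the best next action."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Repeated over thousands of games, updates like this are what let each agent settle into the strategy its (modified) reward scheme encourages.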
For the second group, a kind of "attention map" was additionally overlaid on the random short clips. It showed which influences in its environment the agent was paying particular attention to at that moment. The third group, by contrast, saw the most "dramatic" game scenes - those in which the agent's decision had a particularly large impact, for example because it could lead to the death of the game character or to a particularly high score. In AI research, this is also referred to as a "strategy summary". For the fourth group, this summary was supplemented with the attention maps.

The result of the experiment was clear. "Test subjects who saw the summary were most likely to develop a feeling for the respective agent's strategy," explains Huber. "The attention maps, on the other hand, helped them significantly less. Even in combination with the summary, they provided only a small additional benefit." It was, he said, cognitively very demanding to watch the game excerpts and at the same time pay attention to the information in the attention maps. "We assume that their contribution would be greater if the information were better presented." The researchers now want to investigate how the attention maps could be optimised so that, together with the strategy summary, they make an AI agent's decisions even more comprehensible.
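The selection of "dramatic" scenes can also be sketched in a few lines. This is a hedged illustration, assuming importance is measured as the gap between the best and worst action values in a state - the criterion used by the HIGHLIGHTS family of strategy-summary methods; the study's exact measure may differ, and all names here are invented:

```python
# Sketch of a strategy summary: pick the states where the agent's
# decision had the largest impact. Importance measure is an assumption.

def importance(action_values):
    """A state is 'dramatic' when the choice of action matters a lot:
    the gap between the best and worst available action value."""
    return max(action_values) - min(action_values)

def strategy_summary(trace, k=5):
    """From a recorded game (a list of (state, action_values) pairs),
    keep the k states where the decision was most consequential."""
    ranked = sorted(trace, key=lambda pair: importance(pair[1]), reverse=True)
    return [state for state, _ in ranked[:k]]
```

For the experiment, short clips around such top-ranked moments would then be what the test subjects are shown, rather than randomly chosen excerpts.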
Email:
tobias.huber@thithi.de
At the World Chess Championship in Dubai, Norwegian Magnus Carlsen recently made a pawn sacrifice that surprised many observers - also because his move had apparently never been played in a major game before. When computers compete against each other, however, it appears from time to time. Carlsen is known to adjust his game based on insights gained from such clashes with machines; in an interview he once described the chess algorithm AlphaZero as his idol. AlphaZero is a self-learning programme that has acquired enormous skill in millions of duels against itself. But even experts can scarcely figure out why it decides as it does in which situations.

Eating biscuits for research
Summary of the "most dramatic" game scenes helps the most