Facebook and Intel reign supreme in ‘Doom’ AI deathmatch
This is VizDoom, a contest born from one man’s idea: to improve the state of artificial intelligence by teaching computers the art of fragging. That simple notion then spiraled into a battle between tech giants, universities and coders. Over the past few months, they’ve all been honing their bots (known as “agents”), building up to one final deathmatch.
OK, it was a lot more than one match. But that doesn’t sound nearly as dramatic.
The competition is all about machine visual learning. Just like when you or I play Doom, the agents can make decisions based only on what they “see” and have no access to information within the game’s code.
There were two “tracks” for agents to compete on, offering very different challenges. Track 1 featured a map known to the teams, and rocket launchers were the only weapons. The agents started with a weapon but were able to collect ammo and health kits.
Track 2 was a far harder challenge. It featured three maps, unknown to the teams, and a full array of weapons and items. While Track 1 agents could learn by repeating a single map over and over, agents competing in Track 2 needed more general AI capabilities to navigate their unknown environments. Both tracks were played for a total of two hours, with Track 1 consisting of 12 10-minute matches and Track 2 consisting of three sets of four 10-minute matches (one set per map).
As you might have expected, the winners in both categories came from the private sector. The agent F1, programmed by Facebook AI researchers Yuxin Wu and Yuandong Tian, won Track 1 overall, besting its opponents in 10 of 12 rounds. For Track 2, IntelAct, programmed by Intel Labs researchers Alexey Dosovitskiy and Vladlen Koltun, put in a similarly dominant performance, winning 10 of 12 rounds on its way to overall victory. But while Intel and Facebook may have won the overall prizes, there were other impressive performances. Three standout bots — Arnold, Clyde and Tuho — came from students.
Arnold
Arnold is the product of Devendra Singh Chaplot and Guillaume Lample, two master’s students from Carnegie Mellon University’s School of Computer Science. Their team, The Terminators, competed on Tracks 1 and 2 and saw success on both. In fact, Arnold was the only agent outside Facebook and Intel to win rounds. On Track 1, each bot had to sit out one round, and F1’s absence gave round three to Arnold. In round six, Arnold won outright, besting F1 by two frags. The overall result never looked in doubt, though, and Arnold finished in second place, 146 frags behind F1.
Track 2 was where things got interesting. Arnold was competitive in the first map, but IntelAct already had a 19-frag lead heading into map two. On the second map, however, Arnold suddenly came alive. It won the first two rounds, closing the gap to just 11 frags at one point and ending the map 15 behind. But it wasn’t to be. IntelAct excelled at the final map, scoring 130 frags in just four rounds and destroying the plucky underdog’s hopes of pulling off an upset. Arnold lost the overall count 256 to 164, again ending in second place.
Behind the scenes, though, all the work was done long before the final matches. Arnold is one of the more ambitious efforts in the VizDoom competition, combining multiple techniques. It’s actually the result of two distinct networks. The first is a deep Q-network (DQN), a technique Google DeepMind pioneered to master 49 Atari 2600 games. The second is a deep recurrent Q-network (DRQN). It’s similar to a DQN, but it processes information in a directed cycle, using its internal memory of what’s come before to decide what to do next. Arnold’s DRQN has also been augmented to help the agent detect when an enemy is visible in the frame.
In a deathmatch, Arnold can be in one of two states: navigation (exploring the map to pick up items and find enemies) or action (combat with enemies), with a separate neural network handling each. The DQN is for navigation. It’s responsible for moving the agent around the level when nothing much is happening, hunting down items and other players. As soon as an enemy shows up on the screen, however, it hands control to the DRQN, which sets about shooting things. Combining these two methods, which can be trained independently and in parallel, is the key to Arnold’s success.
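To give a rough sense of how that hand-off might look in code, here is a minimal, hypothetical sketch of a two-network controller. The stub networks, action lists and detection logic are illustrative assumptions for the example, not Arnold’s actual implementation.

```python
import numpy as np

# Hypothetical sketch of a two-network agent in the spirit described above:
# a navigation network drives exploration, and a combat network (with an
# enemy-detection head) takes over whenever an enemy appears in the frame.
# Both networks are stand-ins so the control flow runs on its own.

class StubNet:
    """Stand-in for a trained Q-network: maps a frame to action scores."""
    def __init__(self, n_actions, seed):
        self.n_actions = n_actions
        self.rng = np.random.default_rng(seed)

    def q_values(self, frame):
        return self.rng.random(self.n_actions)   # fake Q-values

    def enemy_visible(self, frame):
        return self.rng.random() > 0.7           # fake detection head


NAV_ACTIONS = ["forward", "turn_left", "turn_right"]       # assumed action sets
COMBAT_ACTIONS = ["shoot", "strafe_left", "strafe_right"]

navigation_net = StubNet(len(NAV_ACTIONS), seed=0)
combat_net = StubNet(len(COMBAT_ACTIONS), seed=1)


def choose_action(frame):
    """Hand control to the combat network only when an enemy is on screen."""
    if combat_net.enemy_visible(frame):
        return COMBAT_ACTIONS[int(np.argmax(combat_net.q_values(frame)))]
    return NAV_ACTIONS[int(np.argmax(navigation_net.q_values(frame)))]


if __name__ == "__main__":
    frame = np.zeros((60, 108, 3))               # placeholder screen buffer
    for step in range(5):
        print(step, choose_action(frame))
```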
But Arnold’s creators aren’t interested in pursuing an unbeatable Doom agent. Instead, they saw VizDoom as a nice application to test their ideas on reinforcement learning. Speaking by phone, Chaplot explained that the networks deployed in Arnold can be applied to robotics in the real world. Navigation and self-localization are a real challenge for machines, and the team is now focused on solving those issues. They’ve published their initial findings from Arnold and VizDoom, and are using what they’ve learned to try to create better robots.
Clyde
Clyde was created by Dino Ratcliffe, a Ph.D. candidate at the University of Essex in the Intelligent Games and Game Intelligence program. A one-person effort, the AI competed on Track 1 only. Though Clyde never won a round, it was extremely competitive throughout, besting Arnold in five rounds and, in one match, losing to F1 by only one frag. It ended the competition in third place with 393 frags, putting it 20 behind Arnold and 166 behind F1.
It could have gone so differently for Clyde. Ratcliffe began development in order to understand “what the state of the art in general video-game-playing” is for AI right now. He used asynchronous advantage actor-critic (A3C), a successor to the DQN approach that uses multiple neural networks learning in parallel to update a shared global network.
Ratcliffe told me he took a hands-off approach to training, preferring the agent to learn by itself what enemies are, what death is, what health packs are and so on. “I think it’s dangerous to start encoding your own domain knowledge into these agents as it inhibits their ability to generalize across games,” he explained. “I simply gave it a reward for killing opponents and increasing its health, ammo or armor.”
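As a rough illustration of that philosophy, here is a hypothetical sketch of the kind of reward shaping Ratcliffe describes: the agent gets credit only for frags and for gains in health, ammo or armor, with no hand-coded knowledge of what those objects are. The stat names and weights are assumptions chosen for the example, not Clyde’s actual values.

```python
# Hypothetical reward shaping in the spirit of Clyde's training: reward only
# the positive change in a handful of game stats, and encode nothing else
# about the game. Names and weights below are illustrative assumptions.

REWARD_WEIGHTS = {
    "frags": 1.0,     # full reward per kill
    "health": 0.02,   # small bonus per point of health gained
    "ammo": 0.01,
    "armor": 0.01,
}

def shaped_reward(prev_stats, curr_stats):
    """Sum the weighted positive deltas of each tracked stat since last step."""
    reward = 0.0
    for key, weight in REWARD_WEIGHTS.items():
        delta = curr_stats[key] - prev_stats[key]
        if delta > 0:                 # losing health or ammo is simply not rewarded
            reward += weight * delta
    return reward


# Example: the agent lands a frag and grabs a health kit on the same step.
prev = {"frags": 2, "health": 40, "ammo": 15, "armor": 0}
curr = {"frags": 3, "health": 65, "ammo": 15, "armor": 0}
print(shaped_reward(prev, curr))      # 1.0 + 0.02 * 25 = 1.5
```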
But a catastrophic failure (Ratcliffe’s PC power supply blew up 24 hours before the competition deadline) caused Clyde to complete only around 40 percent of its training regimen. That meant it had learned from 30 million frames rather than the necessary 80 million. The biggest downside of this incomplete training, Ratcliffe explained, is that the agent still occasionally commits suicide. It’s for this reason that Clyde got its moniker: it’s named for the weakest ghost in Pac-Man, which, rather than pursuing or holding position, just moves around at random.
Clyde learned a simple form of spawn camping
The fully trained Clyde, which wasn’t submitted, is far stronger. Ratcliffe said he’s observed Clyde using a simple form of “spawn camping,” a much-maligned tactic in multiplayer shooters in which you wait at strategic points on a map and kill players as they spawn in. “It notices certain corridors that have spawn points close by and shoots more,” he explained. This behavior is apparently present in the competition version of Clyde too, just far less noticeable.
Before the results were published, Ratcliffe said he didn’t think Clyde would be competitive, so a third-place rosette was definitely above expectations. He has already moved on to a new project: 2D platformers. “I had only started looking into deep reinforcement learning around one week before the competition was announced,” he said. “I pretty much had to learn the whole field in the process of competing, and that was the point of me taking part. So I now have a solid foundation to start my own research this year.” While other agents have mastered 2D platformers, he wants to teach one to learn Mario and then apply what it has learned to other games with minimal retraining.
Tuho
The final prize-winning spot was taken by Anssi “Miffyli” Kanervisto, a master of science student at the University of Eastern Finland’s School of Computing. His agent, Tuho (Finnish for “doom”), is a one-person effort, created with oversight from Ville Hautamäki, Ph.D., of the same university.
Some of Tuho’s best performances came on Track 1, where it managed to finish in second place, behind F1, in three rounds. It ultimately placed fourth, just outside the prize rankings. On Track 2, it didn’t get close to challenging F1 or Arnold, but it put in a solid performance on the first and last maps, which was enough to balance out a disastrous showing on the second. Tuho ended up in third place with 51 frags, despite spending the four middle rounds killing itself more often than it killed anyone else.
Kanervisto built a complex agent in Tuho, with a navigation system based on multiple techniques. The most important is a dueling DQN, a DQN variant that splits its estimates into two streams (how valuable the current state is, and how much better each action is than average) and recombines them for a better end result. Tuho’s shooting system is largely based on image recognition, matching potential enemies against a manually recorded library of images.
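For the curious, here is a minimal sketch of the dueling idea: keep separate estimates of how valuable the current state is and how much better each action is than average, then recombine them into Q-values. The numbers and action names are made up purely to show the arithmetic and are not taken from Tuho.

```python
import numpy as np

# Dueling Q-value combination (Wang et al., 2016):
#   Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
# In a real agent, V and A come from two heads of the same network; here they
# are supplied by hand just to demonstrate the recombination step.

def dueling_q_values(state_value, advantages):
    """Combine a state-value estimate with per-action advantage estimates."""
    advantages = np.asarray(advantages, dtype=float)
    return state_value + advantages - advantages.mean()


# Example: a promising state (V = 2.0) in which "shoot" looks best.
actions = ["move_forward", "turn_left", "shoot"]
q = dueling_q_values(2.0, [0.1, -0.3, 0.8])
print(dict(zip(actions, q.round(2))))   # "shoot" ends up with the highest Q-value
```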
Tuho was trained to prioritize movement speed, which encouraged it to run in straight lines, and the result, Kanervisto says, is a “well-behaving model that was able to move around and not get stuck, although it struggled with doorways.” But the entire training regimen took place on his personal computer, with an Ivy Bridge i7 processor and a GTX 760 graphics card. You typically need a very powerful computer, or better yet several, to train an AI at a reasonable speed. Because of this, he was limited in both the size of the network and the size of the input images.
Everyone’s a winner
It may be a mostly false cliché, but at least with VizDoom, it feels like everyone here is a winner. Arnold’s creators will receive €300 for their agent’s performance on Track 1, and €1,000 for Track 2, leaving them with around $1,450 to share. Ratcliffe earned €200 ($222) for Clyde’s third place. Tuho bagged Kanervisto €500 ($558) for its exploits.
Some are going home with prizes, but all the teams I’ve spoken to have gained a lot from their experience. Take Oliver Dressler and his agent, Abyss II. Dressler is a Ph.D. candidate in microfluidics (bioengineering) at ETH in Switzerland and had no previous experience in AI. I asked him what he’d learned from participating in VizDoom. “Literally all my machine-learning knowledge” was the answer.
Dressler based Abyss II on the A3C algorithm and had to learn everything as he went along. That led to some big mistakes, but also to plenty of new knowledge. One such lesson came during training. “Shooting is required to win,” he explained, “but shooting at the wrong moment (which is nearly every moment) will result in suicide.” The map was full of small corridors, and any nearby explosion would kill the agent. Just overcoming that was a challenge in itself.
Abyss II placed seventh on Track 1, but from speaking to Dressler before the contest, it was apparent he would be happy regardless of the result. “Given the short time frame, I really don’t expect my bot to perform particularly well, but it has been an amazing challenge,” he added. “It has even paid off more than I expected, and I can use this knowledge very well in my current work.”
VizDoom will have knock-on effects, too. Google DeepMind and other leaders in machine learning, despite not formally entering the competition, will have learned a few things. Doom is a highly complex title, and various DQN-, DRQN- and A3C-based agents performed with great success.
I don’t know what methods the Facebook and Intel researchers used to win the top prizes in their categories, but it’s likely we’ll see papers from them soon. Regardless, as is often the case with AI, the innovative techniques used to win VizDoom will serve to strengthen every researcher’s knowledge of vision-based machine learning.