How DeepMind is Reinventing the Robot
To train a robot, though, such huge data sets are unavailable. “This is a problem,” notes Hadsell. You can simulate thousands of games of Go in a few minutes, run in parallel on hundreds of CPUs. But if it takes 3 seconds for a robot to pick up a cup, then you can only do it 20 times per minute per robot. What’s more, if your image-recognition system gets the first million images wrong, it might not matter much. But if your bipedal robot falls over the first 1,000 times it tries to walk, then you’ll have a badly dented robot, if not worse.
There are more profound problems. The one that Hadsell is most interested in is that of catastrophic forgetting: When an AI learns a new task, it has an unfortunate tendency to forget all the old ones. “One of our classic examples was training an agent to play Pong,” says Hadsell. You could get it playing so that it would win every game against the computer 20 to zero, she says; but if you perturb the weights just a little bit, such as by training it on Breakout or Pac-Man, “then the performance will—boop!—go off a cliff.” Suddenly it will lose 20 to zero every time.
There are ways around the problem. An obvious one is to simply silo off each skill. Train your neural network on one task, save its network’s weights to its data storage, then train it on a new task, saving those weights elsewhere. Then the system need only recognize the type of challenge at the outset and apply the proper set of weights.
But that strategy is limited. For one thing, it’s not scalable. If you want to build a robot capable of accomplishing many tasks in a broad range of environments, you’d have to train it on every single one of them. And if the environment is unstructured, you won’t even know ahead of time what some of those tasks will be. Another problem is that this strategy doesn’t let the robot transfer the skills that it acquired solving task A over to task B. Such an ability to transfer knowledge is an important hallmark of human learning.
Hadsell’s preferred approach is something called “elastic weight consolidation.” The gist is that, after learning a task, a neural network will assess which of the synapselike connections between the neuronlike nodes are the most important to that task, and it will partially freeze their weights.