Self-Driving Twizy Cars Learning by Trial and Error

Meet the Renault Twizy. Two entrepreneurs from Cambridge University are using a simple electric vehicle to train an algorithm that can navigate a simple course with only one day of training, via trial and error.

The pair of AI Ph.D.s from Cambridge University are going all in on machine learning as the foundation of autonomous cars. Their company, Wayve, has just released video of a Renault Twizy teaching itself to follow a lane over the course of a mere 20 minutes.

“The missing piece of the self-driving puzzle is intelligent algorithms, not more sensors, rules and maps,” says Shah, Wayve co-founder and CEO. “Humans have a fascinating ability to perform complex tasks in the real world, because our brains allow us to learn quickly and transfer knowledge across our many experiences. We want to give our vehicles better brains, not more hardware.”

“DeepMind have shown us that deep reinforcement learning methods can lead to super-human performance in many games including Go, Chess and computer games, almost always outperforming any rule based system,” reads a Wayve blog post. “We here show that a similar philosophy is also possible in the real world, and in particular, in autonomous vehicles.”

Wayve co-founders Alex Kendall and Amar Shah

Wayve’s Amar Shah and Alex Kendall believe there’s been too much hand-engineering going on as people try to solve the self-driving car problem.

Wayve’s approach to teaching was a penalty/reward system. They put the Twizy on a narrow, gently curving lane. A human driver sat in the driver’s seat, then handed full control over to the car, not telling it what its task was, and let it experiment with the controls. Every time the car was about to drive off the road, they stopped it and corrected it. The algorithm “penalized” the car for making mistakes, and “rewarded” it based on how far it traveled without human intervention. Within 20 minutes, which represented fewer than 20 trials, the car had worked out how to follow a lane more or less indefinitely.
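To make the reward idea concrete, here is a toy sketch in Python. It is not Wayve's system: Wayve describes deep reinforcement learning on a real car, while this example swaps in plain hill-climbing over a single steering gain in a made-up one-dimensional lane simulator. Every name and number in it (`LANE_HALF_WIDTH`, `MAX_STEPS`, `run_episode`, `train`, the drift values) is a hypothetical chosen for illustration. What it does share with the description above is the reward: each trial is scored by how far the car gets before it would leave the lane and a safety driver would step in.

```python
import random

LANE_HALF_WIDTH = 1.0  # hypothetical: distance from lane centre to lane edge
MAX_STEPS = 200        # hypothetical: cap on the length of one trial


def run_episode(gain, rng):
    """Drive one trial; return how many steps passed before an intervention."""
    offset = rng.uniform(-0.2, 0.2)  # start close to the lane centre
    for step in range(MAX_STEPS):
        # The curving lane keeps pushing the car sideways by a noisy amount.
        drift = 0.1 + rng.uniform(-0.05, 0.05)
        # The "policy": steer back towards the centre in proportion to the offset.
        steer = -gain * offset
        offset += drift + steer
        if abs(offset) > LANE_HALF_WIDTH:
            return step  # car leaves the lane, the safety driver takes over
    return MAX_STEPS  # completed the trial without any intervention


def train(trials=20, seed=0):
    """Hill-climb the steering gain, keeping any change that is at least as good."""
    rng = random.Random(seed)
    best_gain = 0.0  # start with a policy that does not steer at all
    best_reward = run_episode(best_gain, rng)
    history = [best_reward]
    for _ in range(trials):
        candidate = best_gain + rng.gauss(0, 0.3)  # try a small random change
        reward = run_episode(candidate, rng)
        if reward >= best_reward:  # reward = distance without intervention
            best_gain, best_reward = candidate, reward
        history.append(best_reward)
    return best_gain, best_reward, history


if __name__ == "__main__":
    gain, reward, history = train()
    print("best gain:", round(gain, 3), "best reward:", reward)
    print("best reward after each trial:", history)
```

The only feedback the learner gets is the trial length, which mirrors the "how far it traveled without human intervention" signal. A real system would learn from camera images and far more parameters, which is where deep reinforcement learning comes in.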

“Imagine deploying a fleet of autonomous cars, with a driving algorithm which initially is 95% the quality of a human driver. Such a system would not be wobbly like the randomly initialized model in our demonstration video, but rather would be almost capable of dealing with traffic lights, roundabouts, intersections, etc. After a full day of driving and on-line improvement from human-safety driver take over, perhaps the system would improve to 96%. After a week, 98%. After a month, 99%. After a few months, the system may be super-human, having benefited from the feedback of many different safety drivers.”

There are certainly learning elements – and network learning elements – present in current self-driving operations. Tesla’s Autopilot, for example, logs any mistake a driver has to take over and correct for, and uses it to help educate other Teslas heading through the same area. But the idea of letting a self-driving car build its own model of how to operate in the world, much the same way as a human driver does, may be a leap forward.

With the marriage of Renault’s Twizy and Wayve’s training algorithm, and given the right amount of funding, testing and marketing, the advent of driverless vehicles has been pushed forward tremendously, and developers expect to put them on the world’s roads within months.

You can find the entire article by Loz Blain at