Welcome


How a stubborn computer scientist accidentally launched the deep learning boom

So in 2006, Nvidia announced the CUDA platform. CUDA allows programmers to write “kernels,” short programs designed to run on a single execution unit. Kernels allow a big computing task to be split up into bite-sized chunks that can be processed in parallel. This allows certain kinds of calculations to be completed far faster than with a CPU alone.

But there was little interest in CUDA when it was first introduced, wrote Steven Witt in The New Yorker last year:

When CUDA was released, in late 2006, Wall Street reacted with dismay. Huang was bringing supercomputing to the masses, but the masses had shown no indication that they wanted such a thing.

“They were spending a fortune on this new chip architecture,” Ben Gilbert, the co-host of “Acquired,” a popular Silicon Valley podcast, said. “They were spending many billions targeting an obscure corner of academic and scientific computing, which was not a large market at the time—certainly less than the billions they were pouring in.”

Huang argued that the simple existence of CUDA would enlarge the supercomputing sector. This view was not widely held, and by the end of 2008, Nvidia’s stock price had declined by seventy percent…

Downloads of CUDA hit a peak in 2009, then declined for three years. Board members worried that Nvidia’s depressed stock price would make it a target for corporate raiders.

Huang wasn’t specifically thinking about AI or neural networks when he created the CUDA platform. But it turned out that Hinton’s backpropagation algorithm could easily be split up into bite-sized chunks. So training neural networks turned out to be a killer app for CUDA.

According to Witt, Hinton was quick to recognize the potential of CUDA:

In 2009, Hinton’s research group used Nvidia’s CUDA platform to train a neural network to recognize human speech. He was surprised by the quality of the results, which he presented at a conference later that year. He then reached out to Nvidia. “I sent an e-mail saying, ‘Look, I just told a thousand machine-learning researchers they should go and buy Nvidia cards. Can you send me a free one?’ ” Hinton told me. “They said no.”

Despite the snub, Hinton and his graduate students, Alex Krizhevsky and Ilya Sutskever, obtained a pair of Nvidia GTX 580 GPUs for the AlexNet project. Each GPU had 512 execution units, allowing Krizhevsky and Sutskever to train a neural network hundreds of times faster than would be possible with a CPU. This speed allowed them to train a larger model—and to train it on many more training images. And they would need all that extra computing power to tackle the massive ImageNet dataset.