IBM Research Invents Jet Engine of Deep Learning

Posted on 12/07/2017

by QCM-Technologies

IBM Fellow Hillery Hunter develops new software enabling unprecedented GPU processing speeds

Summary: IBM Research has published on arXiv close-to-ideal scaling results with new distributed deep learning software, which achieved record-low communication overhead and 95% scaling efficiency on the Caffe deep learning framework over 256 NVIDIA GPUs in 64 IBM Power systems. The previous best scaling, 89%, was demonstrated by Facebook AI Research for a training run on Caffe2, at higher communication overhead. IBM Research also beat Facebook’s time by training the model in 50 minutes, versus the 1 hour Facebook took. Using this software, IBM Research achieved a new record image recognition accuracy of 33.8% for a neural network trained on a very large data set (7.5M images). The previous record, published by Microsoft, was 29.8% accuracy.
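To make the scaling figures concrete, here is a minimal sketch of how scaling efficiency is commonly computed: the measured speedup over a single GPU, divided by the number of GPUs. The function names and the throughput numbers are hypothetical and chosen only for illustration; only the 256-GPU count and the 95% figure come from the summary above.

```python
def scaling_efficiency(single_gpu_throughput, cluster_throughput, num_gpus):
    """Fraction of ideal (linear) speedup that the cluster actually delivers."""
    speedup = cluster_throughput / single_gpu_throughput
    return speedup / num_gpus


def effective_speedup(num_gpus, efficiency):
    """Speedup over one GPU implied by a given scaling efficiency."""
    return num_gpus * efficiency


# Figures from the summary: 256 GPUs at 95% scaling efficiency.
ibm_speedup = effective_speedup(256, 0.95)  # roughly 243x a single GPU

# Hypothetical throughputs (images/second), NOT measurements from the article.
measured = scaling_efficiency(
    single_gpu_throughput=100.0,
    cluster_throughput=24320.0,
    num_gpus=256,
)

print(f"Implied speedup on 256 GPUs: {ibm_speedup:.1f}x")
print(f"Efficiency for the hypothetical throughputs: {measured:.0%}")
```

Read this way, 95% efficiency means 256 GPUs do the work of about 243 perfectly scaled GPUs, with the remainder lost mostly to communication and synchronization.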

A technical preview of this IBM Research Distributed Deep Learning code is available today in the IBM PowerAI 4.0 distribution for TensorFlow and Caffe.

Deep learning is a widely used AI method to help computers understand and extract meaning from images and sounds through which humans experience much of the world. It holds promise to fuel breakthroughs in everything from consumer mobile app experiences to medical imaging diagnostics. But progress in accuracy and the practicality of deploying deep learning at scale is gated by technical challenges, such as the need to run massive and complex deep learning based AI models – a process for which training times are measured in days and weeks.

For our part, my team in IBM Research has been focused on reducing these training times for large models with large data sets. Our objective is to reduce the wait-time associated with deep learning training from days or hours to minutes or seconds, and enable improved accuracy of these AI models. To achieve this, we are tackling grand-challenge scale issues in distributing deep learning across large numbers of servers and NVIDIA GPUs.

Most popular deep learning frameworks scale to multiple GPUs in a server, but not to multiple servers with GPUs. Specifically, our team (Minsik Cho, Uli Finkler, David Kung, Sameer Kumar, Vaibhav Saxena, Dheeraj Sreedhar) wrote software and algorithms that automate and optimize the parallelization of this very large and complex computing task across hundreds of GPU accelerators attached to dozens of servers.

Our software does deep learning training fully synchronously with very low communication overhead. As a result, when we scaled to a large cluster with hundreds of NVIDIA GPUs, it yielded record image recognition accuracy of 33.8% on 7.5M images from the ImageNet-22k dataset, versus the previous best published result of 29.8% by Microsoft. An increase of 4 percentage points in accuracy is a big leap forward; typical improvements in the past have been less than 1%. Our innovative distributed deep learning (DDL) approach enabled us not just to improve accuracy, but also to train a ResNet-101 neural network model in just 7 hours by leveraging the power of tens of servers equipped with hundreds of NVIDIA GPUs; Microsoft took 10 days to train the same model. This achievement required us to create the DDL code and algorithms to overcome issues inherent to scaling these otherwise powerful deep learning frameworks.
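For readers unfamiliar with the term, the general idea of fully synchronous data-parallel training can be sketched as follows. This is a toy illustration, not the DDL code: the workers are simulated in a single process, the model is a one-parameter linear fit, and `allreduce_mean` is a hypothetical stand-in for the collective communication step whose cost is the "communication overhead" discussed above.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def synchronous_step(weights, shards, lr):
    """One synchronous step: compute locally, exchange, then update in lockstep."""
    grads = [local_gradient(w, shard) for w, shard in zip(weights, shards)]
    averaged = allreduce_mean(grads)  # all workers wait here for each other
    return [w - lr * g for w, g in zip(weights, averaged)]


# Hypothetical data for y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
weights = [0.0] * 4  # every worker starts from the same model copy

for _ in range(200):
    weights = synchronous_step(weights, shards, lr=0.01)

print(weights)  # all four copies stay identical and approach 3.0
```

Because every worker applies the same averaged gradient, the model copies never diverge. The price is that each step includes a communication phase, which is why keeping that phase cheap matters so much at the scale of hundreds of GPUs.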

These results are on a benchmark designed to test deep learning algorithms and systems to the extreme, so while 33.8% might not sound like a lot, it’s a result that is noticeably higher than prior publications. Given any random image, this trained AI model will give its top choice object (Top-1 accuracy), amongst 22,000 options, with an accuracy of 33.8%. Our technology will enable other AI models trained for specific tasks, such as detecting cancer cells in medical images, to be much more accurate, trained in hours, and re-trained in seconds.
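As a rough illustration of what Top-1 accuracy measures, the sketch below counts how often the highest-scoring class matches the true label. The scores and labels are made-up toy values with three classes, not ImageNet-22k data, and the chance-level comparison assumes uniform random guessing over the 22,000 options mentioned above.

```python
def top1_accuracy(scores, labels):
    """Share of examples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy example: 4 images, 3 classes (hypothetical scores).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 1, 2]
accuracy = top1_accuracy(scores, labels)  # 2 of the 4 top choices are right

# Chance level when guessing uniformly among 22,000 classes.
chance = 1 / 22000
print(f"Toy Top-1 accuracy: {accuracy:.0%}")
print(f"Chance level for 22,000 classes: {chance:.4%}")
```

Against a chance level of well under a hundredth of a percent, a Top-1 accuracy of 33.8% is far stronger than the raw number suggests.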

Need Deep Learning in your environment? Give us a call to learn more at 480-483-4371 or contact us at Info@box2449.temp.domains

SOURCE and Complete Article: IBM

VIDEO: Watch Here

LINK: Deep Learning Module for IBM Spectrum Conductor with Spark

Tagged AI, Caffe, Chainer, cognitive, David Kung, Dheeraj Sreedhar, IBM power systems, IBM research, ImageNet-22k dataset, Minsik Cho, NVIDIA GPU, Sameer Kumar, Tensorflow, Torch, Uli Finkler, Vaibhav Saxena
