Deep Learning Technology Isn’t New, So Why Is It A Big Deal Now?

Deep learning is exploding. According to Gartner, the number of open positions for deep learning experts grew from almost zero in 2014 to 41,000 today. Much of this growth is being driven by high-tech giants such as Facebook, Apple, Netflix, Microsoft, Google, and Baidu. These big players and others have invested heavily in deep learning: besides hiring experts, they have funded projects and experiments and acquired deep learning-related companies. And these investments are only the beginning. Gartner predicts that 80% of data scientists will be using deep learning tools by 2018.

Deep learning technology, which is meant to simulate biological neural networks in brains, arose in the 1950s, along with the first computers. So, if computers and deep learning began development together, why is deep learning only now reaching a mainstream computing audience?

The increased processing power afforded by graphics processing units (GPUs), the enormous amount of available data, and the development of more advanced algorithms have led to the rise of deep learning.

The Current State Of Deep Learning

Deep learning is all around us. It’s used to determine which online ads to display in real time, identify and tag friends in photos, translate your voice to text, translate text into different languages on a Web page, and drive autonomous vehicles.

Deep learning is also found in less visible places. Credit card companies use deep learning for fraud detection; businesses use it to predict whether you will cancel a subscription and provide personalized customer recommendations; banks use it to predict bankruptcy and loan risk; hospitals use it for detection, diagnosis, and treatment of diseases.

The range of applications is almost limitless. Other examples include text analysis, image captioning, image colorization, x-ray analysis, weather forecasting, financial prediction, and more.

Deep learning is already being widely used to automate processes, improve performance, detect patterns, and solve problems.

What Is Deep Learning?

Deep learning falls under the umbrella of machine learning, which in turn is a subset of artificial intelligence (AI). Loosely defined, artificial intelligence encompasses technology that simulates human capabilities, while machine learning algorithms learn and adapt to new events.

Deep learning is a term for technologies that use artificial neural network (ANN) algorithms, and many experts use the two terms interchangeably. Just like neural networks in brains, ANNs have neurons (nodes) interconnected by synapses (links). Each node receives data, performs an operation, and passes the new data to another node via a link. Each link carries a weight that influences the next node’s operation.

To illustrate the roles of nodes and links, imagine a company that wants to predict whether a customer will renew a subscription based on two predictors, gender and age. The company’s neural network has two input nodes--one for each predictor--connected via separate links to one output node. Gender and age values are fed into the input nodes. Those values are multiplied by preset weights in the links. If age happens to be a better predictor than gender, then the link that sends age data will have a higher weight.

The output node adds the weighted data from the input nodes and produces a value, which equates to a prediction. In this simplified example, the value could be between 0 and 1. The closer the value is to 1, the more likely the customer is to renew the subscription.
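To make that concrete, here is a minimal sketch in Python of the forward pass just described. The specific weights, the bias term, and the sigmoid squashing function are illustrative assumptions, not values from a real model.

```python
# A minimal sketch (plain Python) of the toy example above: each input is
# multiplied by its link weight, the output node sums the results, and a
# squashing function turns that sum into a value between 0 and 1.
import math

def predict_renewal(age, gender, w_age=0.9, w_gender=0.2, bias=-0.5):
    weighted_sum = age * w_age + gender * w_gender + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))   # squash into (0, 1)

# Example: age scaled to 0-1, gender encoded as 0 or 1.
print(predict_renewal(age=0.7, gender=1))   # closer to 1 = more likely to renew
```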

In a real project, an ANN may contain many thousands of nodes and millions or even billions of links. Each node belongs to a layer, which is a group of nodes. There are input layers, output layers, and layers in between the two, known as hidden layers. Adding nodes, links, and layers can increase the accuracy of the ANN.

Role of Training

Once built, ANNs require a lot of ‘training’ to work well. An untrained ANN will always fail. This is where the ‘learning’ in deep learning comes into play.

Data scientists can use supervised or unsupervised training. Under supervised training, ANNs process input values from labeled training data and produce output values (predictions), which are compared to the real output values in the training data. Then a training algorithm, specifically designed for ANNs, is applied. A few types of training algorithms exist, but the most widely used is called backpropagation. The backpropagation algorithm identifies the parts of the ANN responsible for an inaccurate prediction by following the error at the output nodes back through the hidden and input layers, and it adjusts the weights accordingly. This process is repeated over and over until the ANN produces consistent, accurate predictions on the training data. Then the ANN is ready to process new input values and predict unknown output values.
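As a rough illustration of that training loop, the sketch below fits a tiny network to made-up data using backpropagation. The fabricated data, the single four-node hidden layer, and the learning rate are all simplifying assumptions; real projects use deep learning frameworks rather than hand-rolled NumPy.

```python
# A minimal backpropagation sketch in NumPy: forward pass, compare with the
# known answers, push the error back through the layers, adjust the weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))                          # fake (age, gender) inputs in 0-1
y = (X[:, :1] * 0.8 + 0.1 > 0.5).astype(float)    # fake "renewed" labels

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5                                          # learning rate (arbitrary)

for epoch in range(1000):
    # Forward pass: compute the prediction.
    hidden = sigmoid(X @ W1 + b1)
    pred = sigmoid(hidden @ W2 + b2)

    # Backward pass: follow the error from the output back through the layers.
    err_out = pred - y                                     # error at the output node
    err_hidden = (err_out @ W2.T) * hidden * (1 - hidden)  # error pushed to hidden layer

    # Adjust the weights in the direction that reduces the error.
    W2 -= lr * hidden.T @ err_out / len(X)
    b2 -= lr * err_out.mean(axis=0)
    W1 -= lr * X.T @ err_hidden / len(X)
    b1 -= lr * err_hidden.mean(axis=0)
```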

The purpose of unsupervised training is to model the structure or distribution of data, not to produce a predictor. With unsupervised training there are no known output values to compare predictions against; instead, the ANN adjusts its weights to capture patterns in the input data itself.

Deep Learning Is Old Technology

The best place to start the AI and deep learning story is with Warren McCulloch and Walter Pitts. In 1943, they published A Logical Calculus of the Ideas Immanent in Nervous Activity, in which they outlined the first computational model of a neural network. This paper served as the blueprint for the first ANNs.

Six years later, Donald Hebb published The Organization of Behavior, which argued that the connections between neurons strengthen with use. This concept proved fundamental to understanding human learning and how to train ANNs.

In 1954, Belmont Farley and Wesley Clark, using the research done by McCulloch and Pitts, ran the first computer simulations of an artificial neural network. These networks of up to 128 neurons were trained to recognize simple patterns. 

In the summer of 1956, computer scientists met “to act on the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” This event, known as the Dartmouth Conference, is considered the birthplace of AI.

Following the Dartmouth Conference, the field of artificial intelligence took off. In 1957, Frank Rosenblatt began to study a type of neural network he called the perceptron and was able to apply the training method Farley and Clark used on their two-layer networks to multi-layer ones.

In 1959, Bernard Widrow and Marcian Hoff developed a single-layer neural network they called ADALINE, short for Adaptive Linear Elements, which could predict the next bit of information on an incoming phone line based on the prior bits. Their next development, a multilayer neural network called MADALINE, eliminated echoes on phone calls and is said to be the first practical application of an ANN.

Innovations continued through the ’60s, but funding, research, and advances slowed in the ’70s. AI scientists’ accomplishments failed to live up to media hype and government expectations. The so-called ‘AI winter’ ensued, during which there was little funding and minimal research on the topic.

Starting in 1986, research resurged for a few years after Geoffrey Hinton and his colleagues published Learning Representations by Back-propagating Errors, which describes the backpropagation learning procedure. However, a true resurgence did not occur until the mid-2000s. Today, deep learning and AI are in full bloom, and some would say overhyped.

So, Why Are ANNs Becoming Useful Now?

Three factors have unleashed the potential of deep learning:

1. The Exponential Explosion of Available Data

According to Cisco, global Internet traffic in 1992 was 100 GB per day. In 2015, that number was 17.5 million times greater, at 20,235 GB per second. Today, 90% of the world’s data has been created in the last two years.
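A quick back-of-the-envelope conversion (a sanity check of the figures quoted above, not additional Cisco data) shows where the 17.5 million multiple comes from.

```python
# Convert 2015 traffic (20,235 GB per second) to a daily total and compare it
# with 1992 traffic (100 GB per day).
per_day_2015 = 20_235 * 60 * 60 * 24   # GB per day in 2015
ratio = per_day_2015 / 100             # versus 100 GB per day in 1992
print(f"{ratio:,.0f}x")                # roughly 17.5 million times greater
```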

Without this data, training ANNs containing millions of connections and thousands of nodes would not be possible. For an ANN to recognize a face, detect credit card fraud, or translate voice to text in a noisy room, it takes far more than a few bits of training data to produce consistent, accurate predictions. This is why ANNs flourish in the age of big data.

The best and most visible example of data enabling an ANN is a project led by Google X, a somewhat secretive research and development team. Led by Andrew Ng, currently the chief scientist at Baidu Research, and Jeff Dean, a Google Senior Fellow, the team assembled 16,000 central processing units (CPUs) to power an ANN with over a billion connections.

The ANN then underwent training, processing 10 million images from randomly selected YouTube videos. According to many sources, the ANN trained itself to recognize cats. In reality, only one node in the ANN was responsible for recognizing cat images. Other nodes could identify human bodies and faces. Two decades ago, it would have been impossible to collect 10 million images to train the ANN.

2. The Rise of the Graphics Processing Unit (GPU)

Making a neural network run fast is difficult. Hundreds or thousands of neurons must interact with each other in parallel. Depending on the task, it could take weeks for traditional CPUs to train an ANN. With GPUs, the same task that took weeks may take only days or hours.

GPUs were popularized by NVIDIA to handle the massively parallel operations that video games require to render images many times a second for smooth video display. In 2009, Andrew Ng and several others found they could use GPUs for large-scale deep learning.

To illustrate the power of GPUs, Ng replicated the Google X project with a network of 11 billion connections running on 16 computers powered by just 64 GPUs; the previous project used 1,000 computers with 16,000 CPUs. The new project did not run much faster or perform better, but Ng made his point: sixty-four GPUs could handle the same amount of work as 16,000 CPUs in roughly the same amount of time.
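The same effect is easy to see on a single machine. The sketch below, which assumes the PyTorch library and an optional CUDA-capable GPU (the matrix sizes and repetition count are arbitrary), times the kind of parallel matrix arithmetic that ANNs spend most of their time on, first on a CPU and then on a GPU.

```python
# Time the same matrix multiplication on the CPU and, if present, a GPU.
import time
import torch

x = torch.randn(4096, 4096)
w = torch.randn(4096, 4096)

start = time.time()
for _ in range(10):
    _ = x @ w                          # CPU matrix multiplication
cpu_seconds = time.time() - start

if torch.cuda.is_available():
    xg, wg = x.cuda(), w.cuda()        # move the data to the GPU
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(10):
        _ = xg @ wg                    # the same work, done in parallel on the GPU
    torch.cuda.synchronize()
    gpu_seconds = time.time() - start
    print(f"CPU: {cpu_seconds:.2f}s  GPU: {gpu_seconds:.2f}s")
else:
    print(f"CPU: {cpu_seconds:.2f}s (no GPU available)")
```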

3. The Invention of Advanced Algorithms

Although a range of discoveries have increased the capabilities of ANNs, many consider the discoveries made by Geoffrey Hinton and his colleagues in 2006 to be the turning point.

Hinton introduced an algorithm that made it practical to train ANNs with multiple hidden layers. The key was a ‘greedy’ layer-by-layer procedure that trained each layer of the ANN separately before fine-tuning the whole network with gradient descent.

The other key discovery was a better way to set the initial weights. This allowed high-dimensional data, or data with many features, to be compressed into low-dimensional representations, increasing predictive power.
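A loose modern analogue of those ideas is an autoencoder: a network trained to squeeze high-dimensional inputs through a narrow, low-dimensional code and reconstruct them, so the code can feed later predictions. The sketch below assumes PyTorch, uses made-up layer sizes and random data, and skips the layer-by-layer pretraining Hinton actually used.

```python
# A tiny autoencoder: compress 784 input features into a 32-number code and
# learn to reconstruct the original input from that code.
import torch
import torch.nn as nn

high_dim, low_dim = 784, 32            # e.g. a 28x28 image squeezed to 32 numbers

model = nn.Sequential(
    nn.Linear(high_dim, 256), nn.ReLU(),
    nn.Linear(256, low_dim),           # the low-dimensional code
    nn.Linear(low_dim, 256), nn.ReLU(),
    nn.Linear(256, high_dim),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.rand(64, high_dim)           # a batch of fake high-dimensional data
for step in range(100):
    optimizer.zero_grad()
    reconstruction = model(x)
    loss = loss_fn(reconstruction, x)  # learn to reproduce the input
    loss.backward()                    # backpropagation adjusts every layer
    optimizer.step()
```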

Hinton is credited with putting the ‘deep’ in deep learning because he operationalized multiple hidden layers. Allegedly, Hinton and his team coined the term “deep learning” to rebrand ANNs; at the time, many professionals and funders had no interest in supporting ANNs because they were thought to be unprofitable.

What’s The Impact?

Thanks to heightened processing power, massive amounts of available data, and more advanced neural network algorithms, deep learning technology is solving highly complex problems that have eluded computer scientists for decades.

For example, deep learning is being used to improve natural language processing tools so that they consistently comprehend the meaning of a sentence, not just the individual words. So if someone wants to translate ‘take a hike’ or ‘get lost,’ the software will not take the expression literally; it will translate it into a corresponding expression in the other language.

Object recognition software will become more prevalent and accurate. For example, facial recognition software is already operating at a high level. Scientists are now training deep learning algorithms to differentiate between similar objects, such as teacups and bowls, houses and cabins, shoes and boots. This precision allows computers to differentiate between pedestrians on a street, detect anomalies in common objects, piece together panoramic photos, index images, and much more.

Drawbacks

Using ANNs comes with a couple of drawbacks, namely the black box problem and overfitting.

The black box problem is the inability to know how an ANN reached a prediction. Users can see the data in the input and output layers, which offers an inkling of which input variables the network deems important. However, the hidden layers mask the underlying reasoning behind a prediction. Hence, business leaders are less inclined to trust an untested ANN, because they cannot see how it reaches its conclusions, unlike other algorithms whose processes are clearly visible.

Overfitting is also a common problem with ANNs. Overfitting happens when an algorithm fits its training data so well that it fails to perform accurately on new, unseen data. This problem is not unique to deep learning and can be seen in other types of machine learning algorithms.
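A minimal way to see overfitting is to fit a very flexible model to a handful of noisy points and compare its error on the points it saw with its error on points it did not. The sketch below uses NumPy only; the data and the polynomial degree are fabricated for illustration.

```python
# Fit a high-degree polynomial to half the points and test on the other half.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

train_x, train_y = x[::2], y[::2]      # 15 points used for fitting
test_x, test_y = x[1::2], y[1::2]      # 15 points held out

coeffs = np.polyfit(train_x, train_y, deg=9)   # very flexible model
train_err = np.mean((np.polyval(coeffs, train_x) - train_y) ** 2)
test_err = np.mean((np.polyval(coeffs, test_x) - test_y) ** 2)

print(f"training error: {train_err:.4f}  held-out error: {test_err:.4f}")
# A much larger held-out error than training error is the signature of overfitting.
```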

Applications

There are many algorithms that data scientists can use to detect patterns and relationships in underlying data. Deep learning algorithms are some of the most powerful because they can adapt to a wide variety of data, require little statistical training, learn with simple algorithms, and scale to large data sets.

But in practical use, deep learning is overkill if your project uses small data volumes and solves simple problems. If you process large amounts of data and need to produce complex predictions, deep learning technology may be beneficial. And if there doesn’t seem to be a deep learning tool that fits your needs, just wait.

Henry H. Eckerson

Henry Eckerson covers business intelligence and analytics at Eckerson Group and has a keen interest in artificial intelligence, deep learning, predictive analytics, and cloud data warehousing.
