Amazon, Google, Netflix, Facebook, MIT researchers – the lineup of ‘deep learning’ (DL) users is expanding every quarter. Yearly market revenue from deep learning software for enterprise applications is expected to go beyond the $10.5 billion mark by 2024, up from less than $110 million in 2015. As per a recent study, total annual revenue from all types of deep learning tools (software plus hardware) is set to touch $100 billion by the end of 2024. The buzz around deep learning is enormous, and the technology has well and truly emerged from the realms of science fiction and shed its image as ‘just that tool that is very good at the board game Go’.
Before we get into the main points of interest about ‘deep learning’, it would be prudent to get the concept clear. Although the terms ‘artificial intelligence’, ‘machine learning’ and ‘deep learning’ are often used synonymously, the three are far from being one and the same. In essence, ‘deep learning’ can be referred to as Artificial Intelligence 3.0 (third-gen artificial intelligence, if you will): it is a subset of ‘machine learning’, which itself is a subset of the broad concept of AI. While AI is all about creating programs that help machines display ‘human-like intelligence’, ‘machine learning’ narrows things down: features are extracted from a starting object (a picture, text, audio or another form of media), usually by hand, and a descriptive or predictive model is then built on those features. ‘Deep learning’ adds another layer of efficiency and sophistication by doing away with the manual feature-extraction step, so that the model learns the relevant features directly from the data. To sum up:
Artificial Intelligence → Machine Learning → Deep Learning
Let us now turn our attention to some interesting facts to get a better understanding of ‘deep learning’ technology:
Not a new concept
Although the breakthroughs in ‘deep learning’ are often viewed as a recent phenomenon, the concept itself is not exactly new. The first ‘deep learning’ algorithms (perceptrons with several layers for supervised learning) were created in 1965 by Ivakhnenko and Lapa, and the concept was used by London’s World School Council later on. The computer identification setup in which the idea was used was called ‘Alpha’. Of course, the technology has evolved greatly since those days, and it now plays a central role in internet of things (IoT) applications.
Note: The term ‘artificial intelligence’ is more than 60 years old as well. It was coined for the 1956 Dartmouth conference by a group of computer scientists led by John McCarthy.
Works like the human brain
The way in which ‘deep learning’ models are trained has a lot in common with the working mechanism of our brains. Hence, the underlying computing model in DL applications can be explained through (artificial) neural networks. Data is passed through these networks from one artificial neuron to the next. The neural network of a ‘deep learning’ model ‘learns’ by gaining ‘experience’ from sample data and observations, and the behaviour of the neurons changes as the model gets more ‘experienced’. The principal purpose of the neural network in particular, and the DL model in general, is to approximate an unknown function accurately (e.g., the one that tells a cat from a rabbit) using labeled data. As DL gets more and more advanced, correctly identifying the differences between relatively similar objects is becoming possible.
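To make the idea of ‘learning from experience’ a little more concrete, here is a minimal sketch of a single artificial neuron whose weights change as it sees labeled examples. It is an illustration only: the logical-OR dataset, the function names and the learning rate are assumptions chosen for brevity, and a real DL model would stack many such neurons in layers.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, labels, epochs=2000, learning_rate=0.5):
    """Fit one artificial neuron (weights plus bias) by gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - y  # gradient of the log loss w.r.t. the weighted sum
            # Each labeled example nudges the weights: this is the 'experience'.
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the neuron's output is at least 0.5, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0

# Labeled data for a toy 'unknown function' (logical OR, assumed for illustration).
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]

weights, bias = train_neuron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

The neuron starts with all weights at zero and knows nothing; only the repeated exposure to labeled examples moves the weights to values that reproduce the labels.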
The value of deep learning lies in its accuracy. In select use cases, the performance of DL software can be much higher than human capabilities. However, there are two primary conditions for the optimal performance of deep learning applications. First, high-powered graphics processing units (GPUs), which have a large number of built-in cores, are required for working with DL software. That, in turn, brings the importance of parallel training on GPUs to the fore. Second, for the results generated by deep learning to be of any real value, the volume of ‘labeled data’ or ‘sample data’ has to be huge. It is similar to basic statistics: the greater the ‘sample’ size, the better a model can predict for the ‘population’, or the real world.
Note: For a complicated and potentially risky task like autonomous driving, hundreds of thousands of pictures and videos have to be fed to the DL algorithm in question.
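The statistics analogy can be put into numbers. The sketch below uses the textbook formula for the standard error of a sample mean (population standard deviation divided by the square root of the sample size). It is only an analogy: the accuracy of a deep learning model does not have to follow this exact rule, and the function name and the value 1.0 are assumptions made for the illustration.

```python
import math

def standard_error(population_std, sample_size):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return population_std / math.sqrt(sample_size)

# The uncertainty of the estimate shrinks as the sample grows,
# but only with the square root of the sample size.
for n in (100, 10_000, 1_000_000):
    print(n, standard_error(1.0, n))
```

Going from 100 to 10,000 samples cuts the uncertainty by a factor of ten, which gives a feel for why large gains in reliability call for very large datasets.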
The importance of structured data
In the previous point, the value of large sample data for deep learning models was highlighted. It has to be kept in mind that the data we are talking about here is ‘structured data’, meaning data that has been organized and labeled, and not just any random pieces of sound, text or photos. When a business plans to implement deep learning in its IT system, it first has to make sure that huge collections of such structured, organized data are available. The deep learning technology processes this data to arrive at judgements (which can range from identifying voices and reading traffic signals to anticipating the opponent’s next move in a board game).
Availability of deep learning frameworks
There are plenty of misplaced notions and myths about deep learning. For starters, DL is often considered a tool for academicians and researchers only, something that only people with advanced degrees and decades of experience can sink their teeth into. The actual scenario is almost the reverse: deep learning has many practical uses, and there are plenty of infrastructures, networks and frameworks that can be utilized for DL training and implementation. What’s more, many of these frameworks can be used by academicians and developers alike. The easy availability of extensive documentation on DL frameworks eases the learning curve for developers further. Many of the existing frameworks are free to use as well.
Note: Theano, TensorFlow and Caffe are some of the widely used frameworks for deep learning models.
Way of working
The overall function of deep learning models can be explained in two broad steps. In the first, a suitable model is built through thorough analysis of, and ‘learnings’ from, the available data (remember, feature extraction is automated here, unlike in classical machine learning). The trained model captures the major characteristics and traits of the objects under scrutiny. In Step 2, this model is used for identifying objects and making predictions from them on a real-time basis. The nature and quality of the training set/sample dataset determine the quality of the model generated, and that, in turn, affects the accuracy and efficiency of the output.
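The two steps can be sketched in a few lines of code. To keep the example short, it uses a deliberately simple stand-in (a nearest-centroid classifier) rather than a deep network, and it works on hand-picked numbers instead of raw images, so it shows only the train-then-predict workflow and not automated feature learning. The animal measurements and all names are hypothetical.

```python
def train(examples):
    """Step 1: learn one average point (centroid) per label from labeled data."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(model, features):
    """Step 2: assign a new input to the label with the closest centroid."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical training set: (ear length in cm, body weight in kg) with a label.
training_set = [
    ((6.0, 4.0), "cat"), ((7.0, 5.0), "cat"),
    ((10.0, 2.0), "rabbit"), ((12.0, 1.5), "rabbit"),
]

model = train(training_set)          # Step 1 happens once, ahead of time
result = predict(model, (11.0, 1.8))  # Step 2 runs for every new input
print(result)
```

Because the model is nothing more than a summary of the training set, poor or unrepresentative training examples would shift the centroids and directly degrade the predictions in Step 2.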
As the scope of artificial intelligence expands, the revenue-earning capacities of ‘insights-driven businesses’ (those that rely heavily on IoT and AI-based processes) are going through the roof. On a year-on-year basis, the total investment in AI tools and processes this year will be a staggering 300% more than the corresponding figure in 2016 (according to a Forrester report). Investments in deep learning software and devices are also rising rapidly. DL, as things currently stand, is a fairly pricey technology, mainly due to the requirement of expensive GPUs in its architecture. A GPU card alone can cost several thousand US dollars. Add to this the fact that separate servers will be required for most of these cards, and it becomes clear that implementing a ‘deep learning’ architecture involves rather steep expenses. Over the next few years though, as the technology becomes more commonplace and component prices fall, things should become a lot more competitive.
The importance of transfer learning
If a separate deep learning model (training sets, algorithms, hardware and all) had to be created for every different use case, that would have been a problem, both operationally and financially. Thankfully, ‘transfer learning’ considerably eases the pain in this regard. To put it simply, ‘transfer learning’ (TL) refers to the practice of expanding the scope of a specialized DL model to another (preferably related) use case. Another way of explaining this would be to say that what one DL model has already learned is used to give another model a head start in its training. An example of TL would be reusing a model that has learned to identify gender for the related task of telling male and female voices apart. Apart from reducing the number of deep learning models required, TL also helps in bringing down the total volume of sample data needed (for the similar yet different purposes).
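One common way of doing this is to keep the early layers of an existing model frozen and train only a small new output layer on the new task. The sketch below illustrates that idea under stated assumptions: the ‘pretrained’ weights are hard-coded stand-ins rather than the result of real training, and the dataset, names and learning rate are hypothetical.

```python
import math

# Stand-in for a layer learned on an earlier task. It is frozen: nothing
# below ever modifies it (a tuple of tuples cannot be changed in place).
FROZEN_WEIGHTS = ((1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0))

def extract_features(x):
    """Run the frozen layer (weighted sums followed by ReLU)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_new_head(samples, labels, epochs=500, learning_rate=0.5):
    """Train only the new output layer on top of the frozen features."""
    head = [0.0] * len(FROZEN_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            features = extract_features(x)
            error = sigmoid(sum(h * f for h, f in zip(head, features)) + bias) - y
            head = [h - learning_rate * error * f for h, f in zip(head, features)]
            bias -= learning_rate * error
    return head, bias

def predict(head, bias, x):
    features = extract_features(x)
    return 1 if sum(h * f for h, f in zip(head, features)) + bias >= 0 else 0

# Small labeled set for the new task (label is 1 when the second value is positive).
samples = [(1.0, 1.0), (-1.0, 2.0), (2.0, -1.0), (-2.0, -1.5), (0.5, 1.5), (-0.5, -2.0)]
labels = [1, 1, 0, 0, 1, 0]

head, bias = train_new_head(samples, labels)
predictions = [predict(head, bias, x) for x in samples]
print(predictions)
```

Only five numbers (four head weights and one bias) are learned here, which hints at why transfer learning can get by with far less sample data than training a full model from scratch.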
Number of layers
The underlying neural networks (‘deep neural networks’, or DNNs) for deep learning setups typically have multiple layers. The simplest of them has 3 layers (input, output, and a hidden layer where the processing of data takes place), while more complex networks can have as many as 140-150 layers. The ‘deep’ or ‘hidden’ layers in the neural network of a DL model remove the need for manual feature extraction from objects. Although we are still quite some way from it, experts from the field of software and app development feel that deep learning has the potential to completely replace all types of manual feature engineering in the tech space in future.
Note: Convolutional Neural Networks (CNNs) are a good example of the deep neural networks that are used for deep learning models.
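The layered structure described above can be sketched as a small fully connected network with an input layer, one hidden layer and an output layer. The layer sizes, the constant weights and the tanh activation are assumptions chosen to keep the example reproducible; a trained network would have learned weights instead.

```python
import math

def make_network(layer_sizes):
    """Create weights and biases for each pair of consecutive layers."""
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[0.1] * n_in for _ in range(n_out)]  # placeholder values
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network

def forward(network, inputs):
    """Pass the inputs through every layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

def count_parameters(network):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)

# 4 inputs, one hidden layer of 8 neurons, 3 outputs.
network = make_network([4, 8, 3])
output = forward(network, [1.0, 0.5, -0.5, 2.0])
print(len(output), count_parameters(network))
```

Making the network deeper is just a matter of passing a longer list of layer sizes to `make_network`; the number of parameters, and with it the computing power needed, grows with every layer added.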
DL for business applications
Product recommendations, image tagging, brand identification (of a company’s own brand as well as those of competitors) and news aggregation are all examples of business tasks that can be powered by deep learning modules. The biggest example of DL implementation in businesses, however, has to be AI-powered chatbots. These bots, with the help of cutting-edge deep learning capabilities, can simulate human conversations, ensuring 24×7 customer service (without companies having to hire additional manpower). AI bots are particularly useful for ecommerce portals: they can store and analyze received information, identify patterns in it, and even facilitate secure payments. By the end of this decade, 80% of all firms are expected to be in favour of chatbot integration in their work processes – a clear indication of the rapid ongoing developments in the deep learning technologies that will support the bots.
Key components of the system
Like any high-end software system, a DL model has multiple important components. The procedure that ‘studies’ the sample structured data and derives rules from it is known as the ‘training algorithm’. The set of all possible functions that this algorithm can consider is the ‘hypothesis space’, and the set of data points that describe the target function is the ‘training data’. The component of the DL module that actually performs the necessary action (e.g., identification) is known as the ‘performance element’.
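The four terms can be lined up against a tiny example. The sketch below is not a deep learning model: it uses a hypothesis space of only eleven simple threshold rules so that each component stays visible. The data points and all names are made up for the illustration.

```python
# Training data: (measurement, label) pairs that sample the unknown target function.
training_data = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (9.0, 1)]

# Hypothesis space: every candidate rule of the form 'answer 1 when x >= threshold'.
hypothesis_space = [float(t) for t in range(0, 11)]

def errors(threshold, data):
    """Count how many training examples a given rule gets wrong."""
    return sum(1 for x, label in data if (1 if x >= threshold else 0) != label)

def training_algorithm(space, data):
    """Search the hypothesis space for the rule with the fewest mistakes."""
    return min(space, key=lambda threshold: errors(threshold, data))

def performance_element(threshold, x):
    """Apply the chosen rule to a new, unseen input."""
    return 1 if x >= threshold else 0

best_threshold = training_algorithm(hypothesis_space, training_data)
print(best_threshold, performance_element(best_threshold, 8.0))
```

In a real DL system the hypothesis space is the set of all functions the network can represent, and the training algorithm searches it by adjusting weights rather than by trying each candidate in turn.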
Scalability is an advantage
The volume of ‘learning data’ required and whether features have to be extracted manually are important points in any deep learning vs machine learning (ML) comparison (ML can work with much smaller datasets than DL). Yet another advantage of deep learning over ML is its better scalability. The underlying algorithms in a DL model can scale up or down with the volume and type of data under consideration, and performance generally keeps improving as more data is supplied. In contrast, the performance and effectiveness of classical machine learning (which can also be referred to as ‘shallow learning’) tends to flatten out beyond a certain level.
Deep learning use cases
From identifying sounds and fully automated voice recognition to applications that boost the powers of computer vision and bioinformatics, the fields in which deep learning can be implemented are seemingly endless. In the domain of electronic tools and services, DL has already proven to be a handy tool for automatic (audio) translation and listening. Detection of cancer cells has been facilitated by deep learning too (it has been implemented in an HD data microscope at UCLA). Smart driving and industrial safety and automation are two other fields where DL has enormous scope to grow. The technology is also expected to make national defence and aerospace systems safer.
There is little doubt that deep learning is among the most powerful components/subsets of artificial intelligence. Provided that there are no problems in acquiring enough structured data samples, and that adequate investments can be made in GPUs, organizations can opt to integrate DL in their systems (in the absence of these resources, classical machine learning would be the better alternative). However, for all their advanced, futuristic capabilities, DL models still require the support of human beings to resolve any possible confusion. Deep learning takes computing power to the next level…without quite being able to act as a perfect substitute for the human brain. Yet.