The term "artificial intelligence" (AI) refers to intelligence exhibited by machines that have been programmed to carry out tasks. Artificial intelligence, also referred to as machine intelligence, is a branch of science that seeks to mimic human cognitive functions and behaviors. Machine learning is a mechanism that lets a computer system learn from inputs rather than being controlled only by explicitly programmed rules.
In the modern world, artificial intelligence is making life easier in a number of ways. The creation of general AI is a goal shared by many researchers. This blog's primary goal is to explore artificial intelligence in depth.
This whitepaper offers a thorough explanation of artificial intelligence. It also explains machine learning and its types and methods, artificial neural networks and their types, deep learning, the distinction between machine learning and deep learning, and TensorFlow.
The general theory of artificial intelligence includes the study of neural-like components and multidimensional neural-like expanding networks, short-term and long-term memory, and the functional organization of the brain of artificially intelligent systems to develop artificial personalities and purposeful behavior that are established through training and education.
The term "artificial intelligence" (AI) designates a field of computer science that employs a wide range of techniques to provide information using logic, processes, and algorithms.
Artificial intelligence has been applied in programs for natural language understanding, data processing, automated programming, robotics, scenario analysis, game playing, intelligent systems, and proving scientific theorems.
Artificial intelligence (AI) has several subfields, including machine learning, which enables computers to learn and grow without explicit programming. Algorithms and neural network models are used to assist computers in continuously improving their performance.
Making computer programs capable of approaching the data and learning on their own, without constant human assistance, is the focus of machine learning.
Learning begins with observations of data, such as examples, direct experience, or instruction, so that the system can look for patterns in the data and produce better results in the future based on the examples we provide.
In machine learning, algorithms are "trained" to find patterns and features in massive amounts of data so that they can make decisions and forecasts based on new data.
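The idea of training on known data and then forecasting from new data can be illustrated with a minimal sketch. The example below fits a straight line by least squares in plain Python; the data values and the function names `fit_line` and `predict` are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept ("training")."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: the pattern to discover is y = 2 * x.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

model = fit_line(inputs, targets)
prediction = predict(model, 5.0)  # forecast for newly added data
```

Real machine learning models have many more parameters than a slope and an intercept, but the workflow of fitting on examples and then predicting on new inputs is the same.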
There are numerous machine learning types, but this section only briefly discusses the three most common and widely used types.
There are many different kinds of machine learning methods, but this section contains a brief discussion of a few of the most popular ones.
An artificial neural network, also known as an ANN, is a type of machine learning algorithm that uses a graph of neurons to represent data. First developed in the 1950s, the perceptron algorithm gave birth to the concept of neural networks.
ANNs are the part of a computing system designed to evaluate and process data in a way loosely modeled on the human brain, solving problems that would be expensive or impossible to solve by human effort or conventional statistical methods. As more data becomes available, artificial neural networks can learn features to achieve better results.
Figure 1. Structure of ANN
A neural network is made up of an input layer, an output layer, and one or more hidden layers between them, and it uses mathematical computation to determine the conclusion or course of action the computer must take. Each hidden layer processes the data and passes it to the next layer through weighted connections. These hidden layers transform the input data into something that the output layer can use.
After one layer has processed the data, the values it computes, combined with the connection weights, determine what is passed on to the next layer.
Depending on the complexity of the problem, the data proceeds through successive higher-level units until it reaches the output layer. Before it can be fully deployed, an ANN needs to be trained.
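The layer-by-layer flow described above can be sketched as a forward pass in plain Python. This is a minimal illustration, not a production implementation: the weights, biases, and the choice of ReLU as the hidden activation are assumptions made for the example.

```python
def relu(z):
    """Rectified linear activation: negative values become zero."""
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus bias, then activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass the input through each layer in turn, input to output."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer
    ([[2.0, 1.0]], [0.5], identity),                # output layer
]

output = forward([1.0, 2.0], layers)
# Hidden activations are [0.0, 1.5], so the output is 2*0.0 + 1*1.5 + 0.5.
```

Each row of a weight matrix holds the weighted connections into one unit, which is how the hidden layers transform the input into something the output layer can use.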
A necessary part of this training is comparing the machine's output with a human-provided label of the expected output. If the two don't match, the computer uses the difference to adjust the layer weights through a process known as backpropagation. The adjusted weights then guide the network's subsequent processing.
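To make the weight adjustment concrete, here is a small sketch of backpropagation in plain Python. It trains a network with one hidden layer on the XOR truth table using sigmoid activations, a squared-error loss, and full-batch gradient descent. The network size, learning rate, seed, and the function name `train_xor` are all assumptions chosen for illustration, and the run is only expected to reduce the loss, not necessarily to solve XOR perfectly.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return the hidden activations and the scalar output for one input."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def train_xor(epochs=2000, lr=0.5, hidden=3, seed=0):
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gW2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, target in data:
            h, y = forward(x, W1, b1, W2, b2)
            loss += 0.5 * (y - target) ** 2
            # Error signal at the output unit (chain rule through sigmoid).
            delta_out = (y - target) * y * (1.0 - y)
            gb2 += delta_out
            for j in range(hidden):
                gW2[j] += delta_out * h[j]
                # Propagate the error back through W2 to hidden unit j.
                delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])
                gb1[j] += delta_h
                for i in range(2):
                    gW1[j][i] += delta_h * x[i]
        n = len(data)
        losses.append(loss / n)
        # Gradient descent step: move each weight against its gradient.
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(2):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n
    return losses, (W1, b1, W2, b2)


losses, params = train_xor(epochs=500)
```

The list `losses` records the mean error before each update, so comparing its first and last entries shows whether the adjustments moved the outputs closer to the expected labels.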
Deep learning (DL) is a subfield of machine learning that takes a set of data as input and passes it through multiple layers of nonlinear transformations before computing the result. As a form of machine learning, it lets computers gain knowledge through experience, analysis, and development of expertise without the need for human intervention.
A distinguishing ability of deep learning algorithms is automatic feature extraction: they identify the attributes that are pertinent to solving a problem without those attributes being specified by hand.
Figure 2. The distinction between ML and DL
Deep learning uses a hierarchy of artificial neural networks to carry out the ML process, which allows it to draw its own conclusions from unstructured and unlabeled data (Figure 2).
A deep neural network can have many hidden layers, as opposed to a traditional neural network's one or two hidden layers. In a deep learning neural network, each hidden layer learns a particular set of features based on the output of the layer before it. The complexity and abstraction of the learned features increase along with the number of hidden layers.
The deep learning algorithm can therefore solve more challenging problems involving numerous nonlinear transformational layers that are impossible for a human to solve.
Figure 3. Difference between traditional neural network and deep neural network
Although deep learning increases the capabilities of artificial intelligence, its use has so far been restricted to data scientists. However, deep learning is now on track to become a widely accessible set of technologies with a wide range of business applications.
Deep learning has numerous applications in different fields, including automated driving, fraud detection, object detection, traffic and earthquake prediction, medical research, electronics, automation, aerospace, and defense, to name a few. Consider fraud detection as an example: if a machine learning system has built a model with parameters based on the amounts of money a user sends or receives, a deep learning approach can build on the results of that model.
Each layer of the neural network builds on the previous layer by adding data such as the retailer, sender, user, social media event, credit score, IP address, and a large number of other features that would take a person a long time to connect together. Deep learning algorithms are trained to identify patterns across all of these transactions.
The network also learns when a pattern calls for a fraud investigation. The output layer sends a request to a human analyst, who may choose to restrict access to the user's account until all inquiries are resolved.
TensorFlow is a software library that uses data-flow graphs to carry out numerical operations, particularly for neural networks. TensorFlow is well known for implementing machine learning algorithms. It was created by Google and released as an open-source platform in 2015. It is currently one of the most popular platforms for developers building a wide range of machine learning projects.
Figure 4. A Diagram of How TensorFlow works
As shown in Figure 4, TensorFlow uses a data structure known as a tensor, which represents all of the data we want to use and can hold any kind of data. TensorFlow accepts a multi-dimensional array as the input for the tensor.
TensorFlow enables the creation of dataflow graphs that describe how this input data moves through a series of operations. The graph works like a flowchart of the operations to be carried out on the inputs, which enter at one end and come out at the other as output.
TensorFlow has three functional areas: preprocessing the data, building the model, and training and evaluating the model.
Figure 5. Schematic of the constructed computational graph in TensorFlow
Computations are possible because of the way tensors connect the graph. Each node of the graph performs a mathematical operation, while each edge describes the input-output relationship between nodes and carries a tensor from one node to the next (Figure 5).
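The node-and-edge structure can be sketched with a toy dataflow graph in plain Python. This is not TensorFlow's API; the `Node` class and the helper functions `constant`, `add`, `multiply`, and `evaluate` are hypothetical names used only to illustrate the idea, and scalars stand in for tensors.

```python
class Node:
    """One operation in the graph; its inputs are the incoming edges."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def multiply(a, b):
    return Node("mul", (a, b))


def evaluate(node):
    """Compute a node's output by first evaluating the nodes feeding it."""
    if node.op == "const":
        return node.value
    args = [evaluate(n) for n in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(f"unknown op: {node.op}")


# Build the graph first, then run it: (a * b) + c
a, b, c = constant(2), constant(3), constant(4)
result_node = add(multiply(a, b), c)
result = evaluate(result_node)
```

Building `result_node` performs no arithmetic; the values only flow along the edges when `evaluate` is called, which mirrors the separation between describing a graph and executing it.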
Tensor | Type | Example |
---|---|---|
0-Dimensional | Scalar | 1 |
1-Dimensional | Vector | [1,1] |
2-Dimensional | Matrix | [ [1,1],[1,1] ] |
3-Dimensional | 3-tensor | [ [ [1,1],[1,1] ], [ [1,1],[1,1] ] ] |
n-Dimensional | n-tensor | |
As Table 1 above shows, several types of tensors can be created: a scalar is 0-dimensional, a vector is 1-dimensional, a matrix is 2-dimensional, and so on.
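The relationship between nesting depth and tensor dimension can be checked with a short sketch. It uses plain Python nested lists rather than TensorFlow tensors, and the helper names `rank` and `shape` are hypothetical; both assume the lists are non-empty and regularly shaped.

```python
def rank(tensor):
    """Number of dimensions: how many levels of list nesting there are."""
    r = 0
    while isinstance(tensor, list):
        r += 1
        tensor = tensor[0]
    return r


def shape(tensor):
    """Length along each dimension, outermost first."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 1                                        # 0-dimensional
vector = [1, 1]                                   # 1-dimensional
matrix = [[1, 1], [1, 1]]                         # 2-dimensional
tensor3 = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]    # 3-dimensional
```

A bare number has no nesting at all, which is why a scalar counts as a 0-dimensional tensor, while each extra pair of brackets adds one dimension.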
TensorFlow is written in C++, Python, and CUDA, and it can now be used from many major programming languages, including Java, R, Go, and JavaScript. TensorFlow is versatile and cross-platform: it can run on the web, mobile devices, IoT and embedded systems, the cloud, and edge devices. It also supports hardware acceleration for running large-scale machine learning code on CPUs, GPUs, Android and iOS devices, a local machine, Google's TPUs, a cluster in the cloud, and more (Figure 6).
Figure 6: Model Diagram of TensorFlow
TensorFlow's simplicity is one of the key reasons why it has become one of the most widely used tools in deep learning and AI today. Text (document classification, translation, sentiment analysis), audio (voice recognition of the kind found in Siri, Alexa, Google Home, and Microsoft Cortana), and visual data (image or video processing, computer vision) can all be processed with TensorFlow. Many Google applications that use AI rely on TensorFlow. The performance of Google Translate improved markedly when the company switched to this technology. Today many large companies use TensorFlow to improve their internal operations as well as their other services, including Airbnb, Airbus, China Mobile, Coca-Cola, Intel, Lenovo, PayPal, Qualcomm, and many more. It is fair to say that Google, the maker of TensorFlow, has benefited from this technology as much as everyone else who uses it.
A description of each methodology used to create the web application can be found in this section.
Figure 7. The architecture of VGG16
VGG16 has a total of 16 weight layers, of which 13 are convolutional and 3 are fully connected, along with 5 max-pooling layers. As Figure 7 shows, it begins with 2 convolutional layers followed by a max-pooling layer, then 2 more convolutional layers followed by a max-pooling layer, and then three groups of 3 convolutional layers, each followed by a max-pooling layer. The network ends with 3 fully connected layers. The model has about 138 million parameters in total and an accuracy of 92.7%, which is the top-5 test accuracy commonly reported for VGG16 on ImageNet. It uses a 3x3 kernel for convolution and a 2x2 max-pool size.
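The figure of roughly 138 million parameters can be reproduced with a short calculation. The sketch below assumes the standard VGG16 configuration, which is not fully spelled out in the text: a 3-channel RGB input, a 7x7x512 feature map entering the first fully connected layer, fully connected sizes of 4096, 4096, and 1000, and one bias per output channel or unit.

```python
# Output channels of the 13 convolutional layers, in order.
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
# Sizes of the 3 fully connected layers (assumed: 1000 output classes).
FC_UNITS = [4096, 4096, 1000]


def conv_params(in_channels, out_channels, kernel=3):
    """Weights of a kernel x kernel convolution plus one bias per channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights of a fully connected layer plus one bias per unit."""
    return in_features * out_features + out_features


def vgg16_parameter_count(input_channels=3, final_map=(7, 7, 512)):
    total = 0
    in_channels = input_channels
    for out_channels in CONV_CHANNELS:
        total += conv_params(in_channels, out_channels)
        in_channels = out_channels
    features = final_map[0] * final_map[1] * final_map[2]
    for units in FC_UNITS:
        total += dense_params(features, units)
        features = units
    return total


total_params = vgg16_parameter_count()
```

Under these assumptions most of the parameters sit in the first fully connected layer, which connects all 25,088 flattened features to 4096 units.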
Layers | Convolution | Output dimension after convolution | Pooling | Output dimension after pooling |
---|---|---|---|---|
1 & 2 | Convolution layer of 64 channels of 3x3 kernel with padding 1, stride 1 | 224x224x64 | Max pool stride = 2, size 2x2 | 112x112x64 |
3 & 4 | Convolution layer of 128 channels of 3x3 kernel | 112x112x128 | Max pool stride = 2, size 2x2 | 56x56x128 |
5, 6, 7 | Convolution layer of 256 channels of 3x3 kernel | 56x56x256 | Max pool stride = 2, size 2x2 | 28x28x256 |
8, 9, 10 | Convolution layer of 512 channels of 3x3 kernel | 28x28x512 | Max pool stride = 2, size 2x2 | 14x14x512 |
11, 12, 13 | Convolution layer of 512 channels of 3x3 kernel | 14x14x512 | Max pool stride = 2, size 2x2 | 7x7x512 |
From Table 2 above, we can see that the network takes a fixed-size 224x224 RGB image as input, and after convolutional layers 1 and 2 the output dimension is 224x224x64. This is followed by max-pooling with a stride of 2 and a 2x2 pixel window, after which the output dimension is 112x112x64. After layers 3 and 4 and max-pooling, the output dimension becomes 56x56x128. The next set of convolutional layers, 5, 6, and 7, has 256 channels with 3x3 kernels, and after max-pooling the output dimension is 28x28x256. After convolutional layers 8, 9, and 10 and max-pooling, the output dimension is 14x14x512. Finally, after convolutional layers 11, 12, and 13 and max-pooling, the output dimension becomes 7x7x512. For each max-pooling, the stride is 2 and the pixel window size is 2x2.
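These dimensions follow from the usual output-size formula, output = (input + 2 x padding - kernel) / stride + 1. The sketch below traces the spatial size through the five blocks in plain Python. It assumes that every convolution uses a 3x3 kernel with padding 1 and stride 1 (the text states this only for the first block), and the names `conv_out`, `pool_out`, and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after max-pooling."""
    return (size - window) // stride + 1


# (number of convolutional layers, output channels) for each block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(size=224):
    """Return (shape after convolutions, shape after pooling) per block."""
    shapes = []
    for n_convs, channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)
        after_conv = (size, size, channels)
        size = pool_out(size)
        shapes.append((after_conv, (size, size, channels)))
    return shapes


for after_conv, after_pool in trace_shapes():
    print(after_conv, "->", after_pool)
```

With padding 1 and stride 1, a 3x3 convolution leaves the height and width unchanged, so only the pooling layers shrink the image, halving it five times from 224 down to 7.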
COCO: |
---|
164K complex images |
80 thing classes, 91 stuff classes and 1 class unlabeled |
Instance-level annotations for things |
5 captions per image |
COCO has annotations for 80 object detection categories, captioning (natural-language descriptions of the images), image segmentation, full scene segmentation, dense pose, and person instances with keypoints. The annotations for the training and validation images are publicly available.
System | VOC2007 test mAP | FPS (Titan X) | Number of Boxes | Input Resolution |
---|---|---|---|---|
Faster R-CNN (VGG16) | 73.2 | 7 | ~6000 | ~1000 x 600 |
YOLO (customized) | 63.4 | 45 | 98 | 448 x 448 |
SSD300* (VGG16) | 77.2 | 46 | 8732 | 300 x 300 |
SSD512* (VGG16) | 79.8 | 19 | 24564 | 512 x 512 |
API | Depends | Users | based | Depends |
Object detection in SSD takes place in two parts. First, it uses the VGG16 network to extract features, and then it uses convolutional filters to detect the objects. The base layers consist of the VGG16 convolutional network, to which SSD adds 6 auxiliary layers. These auxiliary layers provide multi-scale feature maps, convolutional predictors, and default boxes with multiple aspect ratios. Five of them are used for object detection, and in three of those layers SSD makes six predictions per location instead of four. In total, SSD uses 6 layers to make 8732 predictions (Figure 9).
Figure 9. SSD Architecture
A key feature of the SSD model is the use of multi-scale convolutional bounding box outputs linked to multiple feature maps at the network's top. This representation aids in easily and efficiently modeling the space of possible box shapes.
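The total of 8732 predictions can be reproduced by summing the default boxes over the six prediction layers. The feature-map sizes below (38, 19, 10, 5, 3, and 1) are the ones commonly cited for SSD300 and are an assumption here, since the text does not list them; the name `total_default_boxes` is hypothetical.

```python
# (feature map side length, default boxes per location) for SSD300.
# Three of the layers use 6 boxes per location, the others use 4.
SSD300_FEATURE_MAPS = [
    (38, 4),
    (19, 6),
    (10, 6),
    (5, 6),
    (3, 4),
    (1, 4),
]


def total_default_boxes(feature_maps):
    """Sum of side * side * boxes_per_location over all prediction layers."""
    return sum(side * side * boxes for side, boxes in feature_maps)


per_layer = [side * side * boxes for side, boxes in SSD300_FEATURE_MAPS]
total_boxes = total_default_boxes(SSD300_FEATURE_MAPS)
```

The largest feature map contributes most of the boxes, which is what lets the model cover small objects, while the coarser maps handle progressively larger ones.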
Alin Bhattacharyya is a “Full Stack” Enterprise Architect heading the Frontend practice at Coforge, with over 20 years of experience in Software Engineering, Web and Mobile Application development, Product development, Architecture Design, Media Analysis and Technology Management. His vast experience in designing solutions, client interactions, onsite-offshore model management, and research and development of POCs and new technologies gives him a well-rounded perspective of the industry.