Applications of Convolutional Neural Networks

POOJA BAGAD
5 min read · May 28, 2021

Convolutional Neural Networks:

A CNN is a type of feedforward artificial neural network. It consists of the following layers:

1. Convolutional layer

2. Pooling layer

3. Flattening layer

4. Fully-connected layer

In deep learning, a convolutional neural network is a class of deep neural network most commonly applied to analyzing visual imagery. CNNs are also referred to as shift-invariant or space-invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and produce translation-equivariant responses known as feature maps. Convolutional networks are a specialized kind of neural network that use convolution in place of general matrix multiplication in at least one of their layers.
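The four layers listed above can be sketched as a single forward pass in NumPy. This is a minimal illustration with random weights, not a trainable network; all sizes are arbitrary choices for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # toy grayscale input
kernel = rng.random((3, 3))                  # one filter (learned in practice)

fmap = np.maximum(conv2d(image, kernel), 0)  # 1. convolutional layer + ReLU -> 6x6 feature map
pooled = max_pool(fmap)                      # 2. pooling layer -> 3x3
flat = pooled.reshape(-1)                    # 3. flattening layer -> 9-vector
weights = rng.random((9, 2))                 # 4. fully-connected layer, 2 output classes
scores = flat @ weights
print(fmap.shape, pooled.shape, flat.shape, scores.shape)
```

A real network would stack several convolution/pooling stages and learn the kernel and weights by backpropagation; the data flow, however, is exactly this.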

Applications:

A. Computer Vision:

Convolutional neural networks are trainable multi-stage architectures in which the inputs and outputs of each stage are sets of feature maps. If the input is a color image, each feature map is a 2D array containing one color channel of the input image; for a video or a volumetric image it would be a 3D array. Each feature extracted at all locations on the input is represented by a feature map at the output.

1) Face Recognition:

Face recognition comprises a series of related problems:

1. Identifying all the faces within the picture

2. Focusing on each face in spite of bad lighting or different pose

3. Identifying unique features

4. Comparing the identified features to an existing database and determining the person’s name.

Faces represent a complex, multi-dimensional visual stimulus; early work represented them using a hybrid neural network combining local image sampling, a self-organizing map, and a convolutional neural network.
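Step 4 above, comparing features to a database, can be sketched as a nearest-neighbour search over feature vectors. The embeddings and names below are invented for illustration; in a real system the vectors would come from a trained network:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical database of face embeddings (in practice, CNN outputs).
database = {
    "alice": np.array([0.9, 0.1, 0.2]),
    "bob":   np.array([0.1, 0.8, 0.3]),
}

def identify(query, database, threshold=0.8):
    """Return the best-matching name, or None if no match is close enough."""
    best_name, best_score = None, -1.0
    for name, emb in database.items():
        score = cosine_similarity(query, emb)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

query = np.array([0.85, 0.15, 0.25])  # embedding of a probe face
print(identify(query, database))      # prints "alice"
```

The threshold controls the trade-off between false accepts and false rejects; faces whose best match falls below it are reported as unknown.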

2) Scene Labelling:

Each pixel is labelled with the category of the object it belongs to. A recurrent architecture for CNNs refers to a linear chain of networks sharing the same set of parameters. The network automatically learns to smooth its own predicted labels: because the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Fully convolutional networks trained end-to-end, pixels-to-pixels, address the shortcomings of earlier CNN approaches to semantic segmentation, in which each pixel was labelled with the class of its enclosing object or region.
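Per-pixel labelling can be sketched as a 1×1 convolution that maps each pixel's feature vector to class scores, followed by an argmax. The feature size and class count below are arbitrary, and the weights are random rather than learned:

```python
import numpy as np

rng = np.random.default_rng(1)
H, W, C, K = 4, 4, 8, 3           # height, width, feature channels, classes

features = rng.random((H, W, C))  # per-pixel features from a conv backbone
w = rng.random((C, K))            # a 1x1 convolution is a per-pixel linear map

scores = features @ w             # (H, W, K): class scores at every pixel
labels = scores.argmax(axis=-1)   # (H, W): one class label per pixel
print(labels.shape)               # (4, 4)
```

This is why fully convolutional networks can label images of any size: every layer, including the classifier, is applied at every spatial position.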

3) Image Classification:

CNNs achieve better classification accuracy on large-scale datasets because of their capability for joint feature and classifier learning. Hierarchical Deep Convolutional Neural Networks (HD-CNN) are based on the intuition that some classes in image classification are more easily confused than others. They build on conventional CNNs, which are N-way classifiers, and follow a coarse-to-fine classification strategy and modular design. Fine-grained image classification systems are based on the principle of identifying foreground objects to extract discriminative features. Visual attention derived from a CNN trained on the classification task can be applied to fine-grained classification under the weakest supervision setting, where only the class label is provided, in contrast to other methods that require object bounding boxes or part landmarks for training or testing.
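The coarse-to-fine strategy described above can be sketched as a coarse classifier that picks a category group, followed by a fine classifier within that group. The groups, features, and weights below are all invented for the example:

```python
import numpy as np

def softmax(x):
    """Convert raw scores to probabilities."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical grouping of fine classes into coarse categories.
groups = {"animal": ["cat", "dog"], "vehicle": ["car", "truck"]}

rng = np.random.default_rng(2)
x = rng.random(16)  # feature vector for one image

# Coarse stage: choose the category group.
w_coarse = rng.random((16, len(groups)))
coarse = list(groups)[int(softmax(x @ w_coarse).argmax())]

# Fine stage: a classifier specialized to that group picks the class.
w_fine = rng.random((16, len(groups[coarse])))
fine = groups[coarse][int(softmax(x @ w_fine).argmax())]
print(coarse, fine)
```

The benefit is that each fine classifier only has to separate the classes that are actually confusable with each other, rather than all N classes at once.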

4) Human Pose Estimation:

Human pose estimation is a long-standing problem in computer vision, primarily because of the high dimensionality of the input data and the high variability of possible body poses. The 3D CNN model applies convolutions to RGB videos in three dimensions (two spatial and one temporal). A heterogeneous multi-task learning architecture with a deep convolutional neural network was proposed for human pose estimation.

This framework consists of two parts:

1. Pose regression

2. Part detection using a sliding-window classifier.

Based on the face bounding box, CNNs are employed to learn the face orientation and alignment. An image is used as the input to the first layer of the CNN; in the second layer, the output of the first layer is convolved with a series of filters; in the third, the output of that layer is subsampled. The 3D CNN model performs best when the number of positive samples is small, and achieves an overall accuracy of 90.2%.
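The pose-regression part of the framework can be sketched as a regression head that maps an image feature vector to (x, y) coordinates for K body joints. The feature size and joint count below are arbitrary assumptions, and the head would be learned in practice:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 14                                 # number of body joints (arbitrary choice)

features = rng.random(128)             # features from the convolutional stages
w = rng.random((128, K * 2))           # regression head: one (x, y) pair per joint

joints = (features @ w).reshape(K, 2)  # predicted (x, y) for each joint
print(joints.shape)                    # (14, 2)
```

Framing pose estimation as direct coordinate regression is what distinguishes this part of the framework from the sliding-window part detector, which instead classifies image patches.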

B. Natural Language Processing:

CNNs have been used to extract information from raw signals. Speech is essentially a raw signal, and its recognition is one of the most important tasks in NLP. Recently, CNNs have also been applied to sentence classification, topic categorization, sentiment analysis, and many other tasks.

1) Speech Recognition:

Convolutional Neural Networks have been used in speech recognition and have given better results than Deep Neural Networks (DNNs). In 2015, researchers at Microsoft identified four domains in which CNNs give better results than DNNs.

They are:

1. Noise robustness

2. Distant speech recognition

3. Low-footprint models

4. Channel-mismatched training-test conditions

Certain properties of CNNs allow them to give better results in speech recognition. Robustness is increased when pooling is done over a local frequency region, and over-fitting is avoided by using fewer parameters to extract low-level features. Speech Emotion Recognition (SER) has become an important application in human-centered signal processing, and CNNs are used to learn the features important for SER.
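Pooling over a local frequency region, mentioned above, can be sketched as max-pooling along the frequency axis of a spectrogram; the bin counts and pool size below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
spec = rng.random((40, 100))  # spectrogram: 40 frequency bins x 100 time frames

pool = 4                      # pool over groups of 4 adjacent frequency bins
pooled = spec.reshape(40 // pool, pool, 100).max(axis=1)  # -> (10, 100)
print(pooled.shape)
```

Taking the maximum over neighbouring frequency bins makes the representation less sensitive to small pitch shifts between speakers, which is one source of the noise robustness noted above.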

2) Video Analysis:

Compared to image data, there is relatively little work on applying CNNs to video classification. Video is more complex than images since it has an additional (temporal) dimension. Several extensions of CNNs into the video domain have been researched. One approach is to treat space and time as equivalent dimensions of the input and perform convolutions in both. Another is to combine the features of two convolutional neural networks, one for the spatial stream and one for the temporal stream. Long Short-Term Memory (LSTM) recurrent units are often added after the CNN to account for inter-frame or inter-clip dependencies.
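The two-stream approach described above can be sketched as late fusion: averaging the class probabilities produced by the spatial and temporal networks. The class scores below are invented for illustration:

```python
import numpy as np

def softmax(x):
    """Convert raw scores to probabilities."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical class scores from the two networks for one video clip.
spatial_scores  = np.array([2.0, 0.5, 0.1])  # from RGB frames (appearance)
temporal_scores = np.array([1.2, 1.9, 0.3])  # from optical flow (motion)

# Late fusion: average the two probability distributions.
fused = (softmax(spatial_scores) + softmax(temporal_scores)) / 2
print(int(fused.argmax()))  # index of the predicted class
```

Averaging probabilities lets each stream contribute what it is good at: appearance cues from single frames and motion cues from the flow field.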
