The Deep Learning DS platform lets you train deep neural networks (DNNs) for computer vision tasks such as image classification and object detection. It is designed to simplify the development of such networks by providing all necessary tools through an easily accessible UI.
This introduction starts with an overview of deep learning for computer vision and then explains the steps needed to develop deep neural networks with Deep Learning DS. If you would like to jump directly into developing a network, you can follow our Get Started guide instead.
Deep learning refers to a large family of neural network architectures and machine learning algorithms that have set new performance standards in computer vision, machine translation, natural language processing, and other areas. In computer vision, convolutional neural networks (CNNs) have emerged that learn features from images automatically (representation learning). Using these features, tasks such as image classification and object detection can be solved simply by providing labeled images during a training phase. Because this training data must contain annotations, which are usually provided by humans, this approach is called supervised learning. In contrast, tasks such as anomaly detection can be solved with deep neural networks that are trained without annotations, using only the raw image data (unsupervised learning).
A variety of frameworks, libraries, and tools support the development and training of these networks. Building on technologies such as PyTorch, TensorFlow, ONNX, and the latest NVIDIA GPUs, our Deep Learning DS platform provides easy access to deep neural networks for image classification and object detection.
Image classification assigns one or more classes to an individual image. Typical tasks that can be solved with this method include sorting products into quality classes, for example grading agricultural products by their degree of ripeness, and automatically rejecting products that do not meet the required quality standards. Further examples are the automatic identification of animal and plant species and the optical condition monitoring of machines and industrial plants.
An image classification algorithm typically does not output a single class for the input image but a confidence value for each available class. The image is assigned to the class with the highest confidence value, as shown in the following figure. In multi-label settings, where an image can belong to several classes, a confidence threshold is provided for this assignment.
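A minimal sketch of both assignment rules in plain Python. It assumes that the network produces one raw score (logit) per class and that these scores are turned into confidences with a softmax (single-label) or an independent sigmoid per class (multi-label). These are common choices, not necessarily what Deep Learning DS uses internally; the class names and the threshold of 0.5 are illustrative.

```python
import math


def softmax(logits):
    """Convert raw class scores into confidences that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_single_label(logits, class_names):
    """Single-label case: pick the class with the highest confidence."""
    conf = softmax(logits)
    best = max(range(len(conf)), key=conf.__getitem__)
    return class_names[best], conf[best]


def assign_multi_label(logits, class_names, threshold=0.5):
    """Multi-label case: independent sigmoid confidences, keep every class above the threshold."""
    conf = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [name for name, c in zip(class_names, conf) if c >= threshold]


# Illustrative scores for one image
print(assign_single_label([2.0, 0.5, -1.0], ["ripe", "unripe", "overripe"]))
print(assign_multi_label([2.0, 0.5, -1.0], ["bruised", "cracked", "dirty"]))
```

The softmax couples the classes so that exactly one wins, whereas the per-class sigmoid lets several classes pass the threshold at once, which is what a multi-label assignment needs.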
A popular benchmark for evaluating new classification algorithms is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Its goal is to correctly assign the images of a very large dataset, containing more than one million images, to 1,000 possible classes. Deep neural networks have made it possible to develop algorithms that reach very high accuracy on this benchmark, as shown in the following figure.
Source: AI Index 2018, ImageNet
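ILSVRC results are commonly reported as top-1 and top-5 accuracy: a prediction counts as correct if the true class is the single most confident class, or among the five most confident classes, respectively. The following sketch computes top-k accuracy from per-image confidence lists; the tiny example data is made up for illustration.

```python
def top_k_accuracy(confidences, true_labels, k=5):
    """Fraction of images whose true class index is among the k most confident classes."""
    hits = 0
    for conf, label in zip(confidences, true_labels):
        # Class indices sorted by descending confidence, truncated to the top k
        top_k = sorted(range(len(conf)), key=lambda i: conf[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(true_labels)


# Made-up confidences for 3 images and 4 classes
confidences = [
    [0.10, 0.60, 0.20, 0.10],
    [0.50, 0.10, 0.30, 0.10],
    [0.25, 0.15, 0.10, 0.50],
]
true_labels = [1, 2, 0]
print(top_k_accuracy(confidences, true_labels, k=1))
print(top_k_accuracy(confidences, true_labels, k=2))
```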
Training a deep neural network for image classification requires a dataset that is large enough to cover the variety of scenes and classes that must be handled in production. The dataset consists of images and their corresponding labels. In image classification, the label is the class to which the image belongs. This kind of labeling can be done quickly because the images only need to be sorted by class; no further annotation is necessary.
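If images are sorted into one folder per class, the labels can be derived directly from the folder names. The sketch below assumes a hypothetical layout `root/<class_name>/<image file>` and a fixed set of image file extensions; it illustrates the idea and is not necessarily the import format Deep Learning DS expects.

```python
from pathlib import Path

# Assumed set of image file extensions
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_labeled_images(root):
    """Return (image_path, class_name) pairs for a root/<class_name>/<image> layout."""
    samples = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image in sorted(class_dir.iterdir()):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                # The folder name is the class label
                samples.append((image, class_dir.name))
    return samples
```

For a folder `dataset` containing `dataset/ripe/` and `dataset/unripe/`, `collect_labeled_images("dataset")` returns one labeled sample per image file and skips non-image files such as notes.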
The goal of object detection algorithms is to find objects in images and classify them, so the task is closely related to image classification. Applications range from recognizing people to finding scratches on surfaces for quality assurance. Problems such as counting objects, picking products, and tracking items can be solved with object detection algorithms.
Object detection methods that operate on single images output a list of bounding boxes, each with its class confidences. Typically, a so-called “background confidence” is provided as well. This value indicates how likely it is that the object candidate belongs to the image background rather than being an object of interest.
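To illustrate how such an output might be consumed, the sketch below uses a hypothetical `Detection` record (box coordinates, per-class confidences, background confidence) and keeps a candidate only if its best class is more likely than the background and reaches a threshold. Counting the kept objects then solves a simple counting task. The record layout and the threshold of 0.5 are assumptions, not the actual output format of the platform.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output for one object candidate."""
    box: tuple                 # (x_min, y_min, x_max, y_max) in pixels
    class_confidences: dict    # class name -> confidence
    background_confidence: float


def keep_objects(detections, min_confidence=0.5):
    """Keep candidates whose best class beats the background and the threshold."""
    kept = []
    for det in detections:
        label, conf = max(det.class_confidences.items(), key=lambda kv: kv[1])
        if conf > det.background_confidence and conf >= min_confidence:
            kept.append((label, conf, det.box))
    return kept


candidates = [
    Detection((10, 10, 50, 60), {"scratch": 0.8, "dent": 0.1}, 0.1),
    Detection((70, 20, 90, 40), {"scratch": 0.3, "dent": 0.2}, 0.5),  # likely background
    Detection((5, 80, 40, 120), {"scratch": 0.1, "dent": 0.6}, 0.3),
]
objects = keep_objects(candidates)
print(f"{len(objects)} objects found:", [label for label, _, _ in objects])
```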
Developing and training a deep neural network for object detection requires more manual work during data labeling, because each object in the training images must be labeled with a surrounding bounding box and the correct class. This procedure is more involved than sorting images for pure classification tasks, but the resulting algorithm does not merely classify the whole image: it finds individual objects within it.
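A bounding-box label carries the class name and the box coordinates of each object. The following sketch uses hypothetical `BoxLabel` and `AnnotatedImage` records with pixel coordinates `(x_min, y_min, x_max, y_max)` and checks them for common labeling mistakes, which can be useful, for example, before importing labels from another system. The field names and checks are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class BoxLabel:
    class_name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class AnnotatedImage:
    path: str
    width: int
    height: int
    boxes: list = field(default_factory=list)


def find_annotation_errors(image, known_classes):
    """Return human-readable problems with the bounding-box labels of one image."""
    errors = []
    for i, b in enumerate(image.boxes):
        if b.class_name not in known_classes:
            errors.append(f"box {i}: unknown class '{b.class_name}'")
        if not (b.x_min < b.x_max and b.y_min < b.y_max):
            errors.append(f"box {i}: empty or inverted box")
        if b.x_min < 0 or b.y_min < 0 or b.x_max > image.width or b.y_max > image.height:
            errors.append(f"box {i}: outside the image")
    return errors


img = AnnotatedImage("part_001.png", 640, 480, [
    BoxLabel("scratch", 10, 20, 100, 80),
    BoxLabel("dent", 600, 400, 700, 470),   # extends past the right image border
    BoxLabel("scrach", 50, 50, 40, 60),     # typo in class name, inverted x range
])
for problem in find_annotation_errors(img, {"scratch", "dent"}):
    print(problem)
```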
The development process of deep learning based algorithms comprises four stages, shown in the figure below. During data acquisition, raw images are collected, which are then labeled during the annotation stage. Once a sufficiently large and diverse dataset has been created, the training of the algorithm starts. During training, the accuracy and performance of the trained models are validated before the models are ready for deployment. With Deep Learning DS and Inference DS you can carry out every stage of this process.
During data acquisition, it is beneficial to collect data with the specific sensor and camera hardware that will be deployed later. Nonetheless, you can combine images from various sources to obtain a diverse dataset. Not only the number of images matters, but also their diversity: to train a network that reaches the desired accuracy, the dataset should cover the situations the algorithm will need to handle during model execution (inference).
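Counting images per class and per source is a quick, if incomplete, way to spot gaps in a dataset; it does not capture variation in lighting, background, or object pose. The sketch assumes that each sample is described by a `(class_name, source)` pair, where the source could be a camera ID, and flags classes below a hypothetical minimum share of 10%.

```python
from collections import Counter


def coverage_report(samples, min_share=0.1):
    """Count images per class and per source and list classes below min_share.

    samples: list of (class_name, source) pairs; source might be a camera ID.
    """
    per_class = Counter(class_name for class_name, _ in samples)
    per_source = Counter(source for _, source in samples)
    total = len(samples)
    rare = sorted(c for c, n in per_class.items() if n / total < min_share)
    return per_class, per_source, rare


# Made-up dataset: 19 "ok" images but only 1 "defect" image
samples = [("ok", "line_camera")] * 12 + [("defect", "line_camera")] + [("ok", "handheld")] * 7
per_class, per_source, rare = coverage_report(samples)
print(per_class, per_source, "under-represented:", rare)
```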
Deep Learning DS supports data acquisition through its dataset management features, several APIs, and its integration with Inference DS. You can upload acquired data to Deep Learning DS using the UI, the API, or directly from Inference DS. For instance, you can integrate uploading to Deep Learning DS into an existing system via the REST API and thereby automate the acquisition process. If the existing system already holds labels, you can upload them as well to reduce the annotation effort.
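The REST API itself is not documented in this introduction, so the following is only a sketch of what an automated upload from an existing system could look like. It builds a multipart `POST` request with Python's standard `urllib.request`; the endpoint path `/datasets/<dataset_id>/images`, the bearer-token header, and the form-field names `file` and `labels` are hypothetical placeholders that must be replaced with the values from the actual API reference.

```python
import json
import urllib.request
import uuid


def build_upload_request(base_url, dataset_id, token, filename, image_bytes, labels=None):
    """Build (but do not send) a multipart POST request that uploads one image.

    HYPOTHETICAL: endpoint path, auth header and form-field names are placeholders.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    # Part 1: the image file itself
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    body += image_bytes + b"\r\n"
    if labels is not None:
        # Part 2 (optional): labels already available in the source system, as JSON
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="labels"\r\n'
            "Content-Type: application/json\r\n\r\n"
            f"{json.dumps(labels)}\r\n"
        ).encode()
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=f"{base_url}/datasets/{dataset_id}/images",  # hypothetical endpoint
        data=bytes(body),
        headers={
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_upload_request(
    "https://dl-ds.example.com/api", "my-dataset", "MY_TOKEN",
    "img_0001.png", b"<png bytes>", labels={"class": "ok"},
)
print(req.get_method(), req.full_url)
```

Sending the request would then be a call to `urllib.request.urlopen(req)`; it is left out here because the endpoint is a placeholder. Building the request separately also makes the upload logic easy to test without a network connection.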
Annotating the collected raw data correctly is an important success factor in developing deep neural networks. Depending on the task (image classification, object detection, etc.), different labels are needed, which means more or less effort in this phase. Deep Learning DS includes integrated tools for annotating images.
Model Training & Validation
The central part of developing algorithms based on deep neural networks is training them and validating the results. During training, the deep neural network is optimized to accomplish the intended task by learning from the collected data. Training is performed iteratively: in each optimization step (iteration), one batch of images (for instance, 8 images) and the corresponding labels are fed to the network. Depending on the task and data, the training needs more or fewer iterations until it converges and reaches the desired accuracy. Within Deep Learning DS, the accuracy of the network is checked periodically on a test dataset. That way, you can see directly how well the algorithm performs during training and how its accuracy improves as the number of iterations grows. Once the model is trained and validated, it can be deployed for execution.
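The iterative procedure can be illustrated with a deliberately tiny model. The sketch below trains a logistic-regression classifier, standing in for a deep network, on synthetic 2-D points with mini-batch gradient descent: each iteration draws a batch of 8 samples and performs one optimization step, and every 50 iterations the accuracy is measured on a held-out test set. All data, hyperparameters, and function names are made up for illustration; the platform itself builds on frameworks such as PyTorch for real training.

```python
import math
import random


def make_data(n, rng):
    """Synthetic stand-in for labeled images: 2-D points, label 1 if x + y > 0."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x, y), 1 if x + y > 0 else 0))
    return data


def predict(w, b, features):
    """Model confidence for class 1 (logistic regression)."""
    z = w[0] * features[0] + w[1] * features[1] + b
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(w, b, data):
    return sum((predict(w, b, f) >= 0.5) == (label == 1) for f, label in data) / len(data)


def train(train_data, test_data, batch_size=8, iterations=300, lr=0.5, eval_every=50, seed=0):
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    history = []  # (iteration, test accuracy) pairs
    for it in range(1, iterations + 1):
        batch = rng.sample(train_data, batch_size)  # one batch per optimization step
        # Gradient of the mean binary cross-entropy loss over the batch
        gw0 = gw1 = gb = 0.0
        for (x, y), label in batch:
            err = predict(w, b, (x, y)) - label
            gw0 += err * x
            gw1 += err * y
            gb += err
        w = [w[0] - lr * gw0 / batch_size, w[1] - lr * gw1 / batch_size]
        b -= lr * gb / batch_size
        if it % eval_every == 0:
            # Periodic check on held-out data
            history.append((it, accuracy(w, b, test_data)))
    return w, b, history


data_rng = random.Random(42)
train_data, test_data = make_data(400, data_rng), make_data(100, data_rng)
w, b, history = train(train_data, test_data)
for it, acc in history:
    print(f"iteration {it:3d}: test accuracy {acc:.2f}")
```

Evaluating only every few iterations keeps the cost of monitoring low while still showing how the accuracy develops over the course of training.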
Deploying the trained and validated deep neural network comprises its execution (“inference”) as well as its integration into the target system. For that purpose, the input interfaces must be configured to obtain images, and the outputs of the deep neural network must be processed further as intended. You can download the trained model directly from the Deep Learning DS model view and use it within Inference DS.
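How the outputs are processed depends on the target system. As one hypothetical example, the sketch below wraps a trained classifier as a quality gate: on the input side it scales 8-bit pixel values to the range 0 to 1, and on the output side it converts raw scores to confidences and maps the result to accept, reject, or manual inspection. The `model` argument is any callable that returns one score per class; how a model downloaded from Deep Learning DS is loaded and called (for example through Inference DS) is not shown, and the `dummy_model` below merely stands in for it.

```python
import math


class QualityGate:
    """Hypothetical integration of a trained classifier into a sorting system."""

    def __init__(self, model, class_names, reject_classes, min_confidence=0.6):
        self.model = model  # callable: preprocessed image -> one raw score per class
        self.class_names = list(class_names)
        self.reject_classes = set(reject_classes)
        self.min_confidence = min_confidence

    @staticmethod
    def preprocess(pixels):
        """Input side: scale 8-bit gray values (0-255) to the range 0-1."""
        return [[p / 255.0 for p in row] for row in pixels]

    def decide(self, pixels):
        """Output side: confidences -> 'accept', 'reject' or 'manual_check'."""
        scores = self.model(self.preprocess(pixels))
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        conf = [e / total for e in exps]
        best = max(range(len(conf)), key=conf.__getitem__)
        label = self.class_names[best]
        if conf[best] < self.min_confidence:
            return "manual_check", label  # too uncertain for an automatic decision
        return ("reject" if label in self.reject_classes else "accept"), label


def dummy_model(image):
    """Stand-in for the real network: bright images look 'good', dark ones 'defect'."""
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return [4 * (mean - 0.5), -4 * (mean - 0.5)]


gate = QualityGate(dummy_model, ["good", "defect"], reject_classes={"defect"})
print(gate.decide([[255, 255], [255, 255]]))
print(gate.decide([[0, 0], [0, 0]]))
print(gate.decide([[128, 127], [127, 128]]))
```

Routing low-confidence results to manual inspection instead of forcing a decision is one common design choice; the threshold of 0.6 is an assumption to be tuned on real data.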