Using Pretrained AlexNet Deep Learning Neural Network for Recognition of Underwater Objects

The number of Autonomous Underwater Vehicles (AUVs) has been growing in recent years. These vehicles carry their own power supply and control systems on board. To operate autonomously, underwater robots have to be equipped with different sensors and with software that makes decisions based on the signals from these sensors. The goal of the paper is to present initial research on the recognition of underwater objects from video images. Following several examples in the literature, the object recognition algorithm proposed in the paper is based on a deep neural network. The research used the networks and training algorithms available in Matlab. The final software will be implemented on board the Biomimetic Autonomous Underwater Vehicle (BAUV), which is driven by undulating propulsion imitating the oscillating motion of fins, e.g. those of a fish.


INTRODUCTION / Uvod
In recent years, underwater robotics has developed dynamically. Among the latest innovative constructions in this field are biomimetic autonomous underwater vehicles (BAUVs). They imitate living underwater organisms, e.g. fish and marine mammals, in both their construction and their kinematics of motion. BAUVs are driven by undulating propulsion imitating real fins, e.g. those of a fish (Figure 1) [7] or of a seal (Figure 2) [15].
The undulating propulsion of a BAUV usually consists of two side fins and one tail fin (Figure 1) or of two tail fins (Figure 2). This propulsion system is usually supported by an artificial swim bladder, i.e. a ballast tank for changing the buoyancy of the BAUV [9]. Additionally, BAUVs are equipped with different sensors and with communication and navigation devices. Generally, the sensors can be divided into two groups: hydroacoustic and photosensitive. The former use hydroacoustic waves that are either sent and received (active) or only received (passive). The latter receive light rays (passive). Both kinds of sensors can create images, i.e. sonar and video images. One of the sensors of the BAUV (Figure 2) is a video camera mounted on the fixed mast. Usually, it is used to record video images for later analysis by a human expert, but after digital image processing it can also serve other tasks autonomously. Together, the sensors can support the autonomy of the BAUV: the video camera and the hydrophones can be used for passive detection of obstacles [13] at short and long distances, respectively, while the forward-looking sonar and the echo-sounders can be used for active obstacle detection. Moreover, after digital image processing all the sensors can be used for other tasks, e.g. object recognition.
Regarding obstacle detection, the general scheme of the autonomy system is to detect an obstacle and then to avoid it. This scheme can differ depending on the type of mission and the type of obstacle. For example, if the BAUV operates in an enemy area, it should submerge to a greater depth when it recognizes a diver or another underwater vehicle, but it can continue its mission without any avoidance maneuver if it recognizes a school of fish. Therefore, the recognition of underwater objects is an important problem, especially for military applications of a BAUV or a classical AUV.
After an initial literature analysis [3][4][6][8][16], it was concluded that deep learning is a modern and promising technique for underwater obstacle recognition. Deep learning is part of a broader family of machine learning methods based on learning data representations [11]. The learning process can be supervised, semi-supervised or unsupervised [1]. Deep learning architectures include deep neural networks, deep belief networks and recurrent neural networks [12]. As a result of the learning process, a deep learning architecture should behave similarly to a human expert in the cases it has learned. In general, a deep network is an artificial neural network with multiple layers between the input and output layers. Deep learning is used in many applications connected with data processing, such as voice recognition, image recognition, drug detection, etc. Deep learning-based software often produces more accurate results than human experts [2].
One of the software toolboxes for deep learning is the Neural Network Toolbox included in Matlab 2018 [10]. This paper addresses the problem of recognizing underwater objects using the deep learning networks and training algorithms included in Matlab 2018. If the research gives a positive final result, the deep neural network that properly recognizes underwater objects will be used in future research on the BAUV (Figure 2). The vehicle has an Nvidia Jetson TX2 installed on board, and Matlab can generate code for this hardware platform.
This paper presents the results of numerical research using deep learning techniques from Matlab. The next section describes the structure and training algorithms of the deep neural network. Then the research problem is formulated, followed by the results of the numerical research carried out in Matlab. Finally, the conclusions and the schedule of future research are presented.

DEEP LEARNING IN MATLAB / Duboko učenje u Matlabu
The Matlab environment contains 9 deep networks pre-trained on the ImageNet database, which is used in the ImageNet Large-Scale Visual Recognition Challenge [14]. The networks consist of dozens of layers. They have been trained on more than a million images and can therefore classify images into 1000 object categories, such as keyboard, mouse, cat, etc. One of these networks is AlexNet, which contains 25 layers (Table 1). This pre-trained network was used in the numerical research.
Three gradient-based methods for training deep networks are available in Matlab:
- SGDM - the stochastic gradient descent with momentum optimizer.
- RMSProp - the root mean square propagation optimizer.
- Adam - the optimizer whose name is derived from adaptive moment estimation.
The stochastic gradient descent algorithm updates the network parameters (weights and biases) to minimize the loss function by taking small steps in the direction of the negative gradient of the loss. The momentum term added to the parameter update helps to reduce the oscillation that may appear along the path of steepest descent towards the optimum [11]. The stochastic gradient descent with momentum algorithm uses a single learning rate for all the parameters. This algorithm can be defined as [10]:
$$\theta_{n+1} = \theta_n - \alpha \nabla E(\theta_n) + \gamma (\theta_n - \theta_{n-1}) \tag{1}$$
where $n$ is the number of the current step of the iterative training process, $\alpha$ is the learning rate, $\theta$ is the parameter vector, $E(\theta)$ is the loss function, and $\gamma$ is the momentum factor determining how much the previous step influences the current step of iteration.
The root mean square propagation algorithm uses learning rates that differ between parameters and that adapt automatically to the loss function being optimized. This algorithm can be defined as [10]:
$$\theta_{n+1} = \theta_n - \frac{\alpha \nabla E(\theta_n)}{\sqrt{\nu_n} + \varepsilon} \tag{2}$$
where
$$\nu_n = \beta_2 \nu_{n-1} + (1 - \beta_2) [\nabla E(\theta_n)]^2 \tag{3}$$
where $\beta_2$ is the decay rate of the moving average of the squared gradient and $\varepsilon$ is a small positive constant added to avoid division by zero.
The Adam method uses a parameter update that is similar to RMSProp with an added momentum term [5]. The update is calculated based on the following equation [10]:
$$\theta_{n+1} = \theta_n - \frac{\alpha m_n}{\sqrt{\nu_n} + \varepsilon} \tag{4}$$
where
$$m_n = \beta_1 m_{n-1} + (1 - \beta_1) \nabla E(\theta_n) \tag{5}$$
and
$$\nu_n = \beta_2 \nu_{n-1} + (1 - \beta_2) [\nabla E(\theta_n)]^2 \tag{6}$$
where $m_n$ is the moving average of the gradient and $\beta_1$ is its decay rate.
If gradients over many iterations are similar, the moving average of the gradient lets the parameter updates establish momentum in a certain direction. If the gradients contain mostly noise, then the moving average of the gradient and the parameter updates become smaller [10].
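The three update rules described in this section can be illustrated with a minimal pure-Python sketch. It is not the Matlab implementation: the toy quadratic loss, the `CURVATURE` values, the learning rates and the step count are assumptions chosen only for illustration, and the Adam variant follows the simple moving-average form without bias correction.

```python
import math

# Toy loss E(theta) = 0.5 * sum(c_i * theta_i**2), with its minimum at theta = 0.
CURVATURE = [1.0, 10.0]  # hypothetical values, chosen only for illustration


def loss(theta):
    return 0.5 * sum(c * t * t for c, t in zip(CURVATURE, theta))


def grad(theta):
    return [c * t for c, t in zip(CURVATURE, theta)]


def sgdm(theta, steps, alpha=0.01, gamma=0.9):
    """Gradient step plus momentum term gamma * (theta_n - theta_{n-1})."""
    prev = list(theta)  # first step has no previous update, so momentum is zero
    for _ in range(steps):
        g = grad(theta)
        new = [t - alpha * gi + gamma * (t - p)
               for t, gi, p in zip(theta, g, prev)]
        prev, theta = theta, new
    return theta


def rmsprop(theta, steps, alpha=0.001, beta2=0.9, eps=1e-8):
    """Per-parameter step scaled by a moving average of squared gradients."""
    v = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        theta = [t - alpha * gi / (math.sqrt(vi) + eps)
                 for t, gi, vi in zip(theta, g, v)]
    return theta


def adam(theta, steps, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """RMSProp-like scaling applied to a moving average of the gradient."""
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        theta = [t - alpha * mi / (math.sqrt(vi) + eps)
                 for t, mi, vi in zip(theta, m, v)]
    return theta


if __name__ == "__main__":
    start = [1.0, 1.0]
    for name, optimizer in (("SGDM", sgdm), ("RMSProp", rmsprop), ("Adam", adam)):
        print(name, loss(optimizer(start, 2000)))
```

On this toy problem all three optimizers reduce the loss; the sketch is meant only to show how the momentum factor and the two decay rates enter each update, not to predict their behaviour on a real network.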
Training deep networks is extremely computationally intensive. Matlab can run the calculations on a GPU, or on a CPU if no GPU is available. Moreover, the iterative calculations involved in training a deep neural network can be performed on multiple GPUs or CPU cores, or in parallel on a cluster. To compare the computational efficiency of CPU and GPU hardware, the same numerical research, presented later in the paper, was performed on a single CPU and on a single GPU.

RESEARCH PROBLEM / Istraživački problem
The research problem was formulated as follows. Because of its specific military purposes and missions, the BAUV has to recognize underwater objects based on video images and classify each detected object into one of three classes: divers, fish, or AUVs.
To solve this problem, the deep learning algorithms included in Matlab are examined, in particular the pretrained AlexNet deep network and the three training methods described in the previous section. The examined variants of the deep network are described in the next section.
To train the networks and then verify them, 150 images were downloaded from the internet: 50 images of divers, 50 of fish, and 50 of Remotely Operated Vehicles (ROVs) and AUVs. The first 50 images show one or two divers in different shots against a background of green-yellow or blue water and a yellow-brown or grey bottom (Figure 3). Some divers are shown against a background of underwater infrastructure. The next 50 images show single fish or schools of fish (Figure 4). Like the previous pictures, they were taken in different waters (colour, visibility, etc.). All images of the last group present different constructions of ROVs/AUVs with various equipment (lighting, manipulator, sonar, etc.) (Figure 5).
As can be seen, the collected photos are not simple examples of divers, fish and ROVs/AUVs. They were taken in different waters, by different photographers, and at various scales and with various numbers of objects. Obtaining an optimal deep network is therefore likely to be hard, especially as some people might also have trouble classifying these images.

NUMERICAL RESEARCH / Numeričko istraživanje
The research was carried out in two stages. In each stage, 12 variants of the deep neural network were trained and then verified. The goal of the first stage was to examine the training methods. Therefore, only two simple training parameters were changed in this stage (Table 2). All the other parameters were left at their defaults, in particular for the RMSProp and Adam training methods: a squared gradient decay factor equal to 0.999 and a gradient decay factor equal to 0.9 [10]. The goal of the second stage was to examine additional training parameters (Table 3). These parameters influence the learning rate, which should adapt to the current state of the training process. An additional goal of this stage was to achieve better results in the training and verification process. For the SGDM training method, an initial learning rate equal to 0.001 was adopted. With a learning rate drop factor equal to 0.1, this value gives a learning rate of 0.0001 after the first learning rate drop.
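The learning rate drop described above can be checked with a short sketch of a piecewise-constant schedule. The initial rate 0.001 and the drop factor 0.1 come from the text; the drop period of 10 epochs and the function name `piecewise_learning_rate` are assumptions made only for this illustration.

```python
def piecewise_learning_rate(epoch, initial_rate=0.001, drop_factor=0.1,
                            drop_period=10):
    """Learning rate used in a given 1-based epoch.

    drop_period is a hypothetical value: the rate is multiplied by
    drop_factor once every drop_period epochs.
    """
    drops = (epoch - 1) // drop_period
    return initial_rate * drop_factor ** drops


if __name__ == "__main__":
    # Print the rate at a few epochs on either side of the assumed drop.
    for epoch in (1, 10, 11, 20):
        print(epoch, piecewise_learning_rate(epoch))
```

Under these assumptions the rate stays at 0.001 until the first drop and is 0.0001 afterwards, which matches the value stated for the SGDM variants.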
All network variants were trained and verified 30 times, which allowed statistical analysis of the results of the numerical research. Each time, the training and verification images were chosen randomly from the 150 images: 120 pictures were used for training and 30 for verification of the trained networks.
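The repeated random split can be sketched as follows. This is only an illustration of the protocol, not the code used in the research: the label names, the file-name pattern and `placeholder_evaluate` are hypothetical, the placeholder stands in for training and verifying a network, and the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
import random
import statistics

CLASSES = ("diver", "fish", "vehicle")  # hypothetical label names


def split_once(items, n_train, rng):
    """Shuffle a copy of items and cut it into training and verification parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def placeholder_evaluate(train, verify):
    """Stand-in for network training: predicts the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(CLASSES, key=labels.count)
    return sum(1 for _, label in verify if label == majority) / len(verify)


def run_trials(items, evaluate, n_trials=30, n_train=120, seed=0):
    """Return mean and sample standard deviation of accuracy over all trials."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        train, verify = split_once(items, n_train, rng)
        accuracies.append(evaluate(train, verify))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    # 150 labelled items, 50 per class, mirroring the size of the image set.
    images = [(f"img_{i:03d}", CLASSES[i // 50]) for i in range(150)]
    mean_acc, std_acc = run_trials(images, placeholder_evaluate)
    print(mean_acc, std_acc)
```

Replacing `placeholder_evaluate` with a function that trains a network on `train` and returns its accuracy on `verify` would give the mean and standard deviation of verification accuracy over the 30 trials.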
Each training process lasted 20 epochs. This number of epochs was established in initial research. The first stage of research was performed using two hardware platforms:
- CPU Intel Core i7-6500U 2.5 GHz.
The research enables us to compare the computational efficiency of the CPU and GPU platforms. The second stage of research was performed only on the GPU. To assess the training and validation of the networks, the following indicators were adopted:
- $A_{vav}$, $A_{vd}$ - average value and standard deviation of the accuracies obtained during 30 validation trials (accuracy indicates what fraction of the images was recognized correctly; '1' means that all the images were recognized properly).
Table 4 presents the results of the numerical research for the first 12 variants of the deep neural network. As can be seen, the networks trained with the SGDM method achieved the best scores, especially in the mean value and standard deviation of verification accuracy. Quite good scores were also obtained by the 8th and 12th variants, trained with the RMSProp and Adam methods respectively, with the initial learning rate equal to 0.0001 and the mini-batch size equal to 20. Variant no. 8 is the best with respect to the verification process: 5 of its 30 validation trials ended with all the images recognized.
During all the trials, the calculation time was also recorded and archived. The first stage of research took 73.8 h on the CPU and 3.7 h on the GPU, i.e. the GPU executed the numerical research 19.5 times faster than the CPU.
Table 5 presents the results of the numerical research for the second 12 variants of the deep neural network. As can be seen, no variant of the network trained with the SGDM method achieved accuracy comparable to that obtained for the variants trained in the first stage of research.
For the next training method (variants no. 5-8), the parameters adopted in the second stage give better results than those obtained in the first stage of the research. Based on variants no. 7 and 8, it can be concluded that the gradient decay factor in the RMSProp method should be quite large, equal to or greater than 0.9.
For the last tested method (variants no. 9-12), both coefficients, the squared gradient decay factor and the gradient decay factor, should be larger than 0.9, because the larger default values used in the first stage of research gave better verification accuracy than the values used in the second stage.
The results presented in Tables 4 and 5 show that the selection of a proper training method and its parameters is very important and has a large influence on the final results.

CONCLUSION / Zaključak
In the paper, the pretrained AlexNet deep neural network and the 3 gradient training methods available in Matlab were tested on the problem of underwater object recognition. The AlexNet network and the training methods in Matlab make it possible to obtain a deep neural network that correctly recognizes underwater objects. All the training methods give comparable results. Each method needs accurate tuning of all its training parameters. Using GPU hardware makes the numerical research almost 20 times faster.
Comparing the results of the second stage of research with those of the first stage, it can be underlined that an optimization method, e.g. a Genetic Algorithm (GA), should be applied to search for the best values of the training parameters. Moreover, to counteract overfitting to the training data, a database with a larger number of underwater images is needed. A partial remedy for a database that is too small is data augmentation, which will also be applied in future research, together with an increase in the number of underwater images.
In the final step of the research, underwater images recorded in the likely area of BAUV operation should be used for the training process. Then, verification in a real environment should be performed with the deep neural network implemented on the Nvidia Jetson TX2.