Recurrent Neural Network-based Path Planning for an Excavator Arm under Varying Environment

This paper proposes an algorithm to generate the reference trajectory based on recurrent neural networks for an excavator arm working in a dynamic environment. Firstly, the dynamic of the plant which includes the tracking controller, the arm, and the pile is appropriated by a recurrent neural network. Next, the recurrent neural network combined with a Model Reference Adaptive Controller (MRAC) is used to calculate the reference trajectory for the system. In this paper, the generated trajectory is changed depending on the variation of the pile to maximize the dug weight. This algorithm is simple but effective because it only needs information about the weight at each duty cycle of the excavator. The efficiency of the overall system is verified through simulations. The results show that the proposed scheme gives a good performance, i.e. the dug weight always remains at the desired value (nominal load) as the pile changes its shape during working time. Keywords-adaptive controller; excavator arm;neural network; path planning; uncertainties


INTRODUCTION
The automatic control of an excavator system is a major issue in the field of excavator research [1,2]. By unmanned operation, the excavator systems not only keep the workers safe but also increase efficiency. However, in order to finish the task without an operator, the trajectory should be designed carefully. Thus, many researches on creating the trajectory for the general manipulator and the excavator have been published. In [3,4], the 3D trajectory is built by using the information from sensors, cameras, and scanners. The advantage of these types of feedback signals is that they can track the change of the working environment. However, if the working environment lacks light or is dusty, the reliability of the obtained imagines can be reduced. In [5], a laser scanner is used to get the shape of the pile. From this information, the model of the pile is built and divided into small layers. The local path is designed according to these layers before creating the global path. In [6], the integrated physics-based model is presented for a mobile excavator. In this work, the current position of the excavator arm is returned to the control system to predict the trajectory for the next cycle. In [7,8], a neural network is used to calculate the optimal trajectory for the excavator arm. However, these trajectories only work well in static environments. In [9], excavation trajectories are generated using the velocity and the acceleration of each hydraulic cylinder. The generated trajectories are optimal and stable but the velocity and the acceleration are difficult to measure. In [10], with the purpose of optimizing the efficiency for a semi-automated or fully automated excavator system, the trajectory is classified into 4 categories based on the location and the angle of the bucket. From these 4 trajectory types, the operator or the automatic controller will make a suitable decision as the environment changes. The problem of optimal working time and torque motion of the excavator in consideration with the boundary of the actuator's ability is the role of path planning [11]. In this scheme, the trajectory is planned based on B-spline technique using information about soil parameters and system dynamics.
Neural networks are known as a good way of dealing with path planning problems. In [12][13][14], neural networks are used to generate collision-free trajectories for robots. In [12], the robot works in a dynamic environment with U-shaped and varying obstacles. The reference trajectory of the robot is generated by using a topologically organized neural network. In this network, the dynamic of each neuron is characterized by a shunting equation. The same neural network topology is used in [13] for a multirobot system with moving obstacles. The trajectory planning for the manipulator robot based on a neural network model of the harmonic function is introduced in [14]. Trajectories are built based on neural networks to optimize the jerk in [15] or the working time in [16]. Neural networks are used in the field of path planning with many different purposes [17][18][19].
In this paper, an algorithm based on neural networks is proposed for an excavator arm working in a dynamic environment. The dynamic model of the inner loop which includes the tracking controller, the excavator arm, and the pile is approximated by a neural network. A second, Recurrent Neural Network (RNN) combined with the Model Reference Adaptive Controller (MRAC) algorithm is used to calculate the reference trajectory for the system. By this combination, the calculated trajectory can be adjusted after each duty cycle to adapt with the change of the pile and the dug weight remains around the nominal value despite the reduction of the pile. The effectiveness of the overall system is verified through simulations. The contributions of the proposed algorithm are concluded as: • It can work in a dynamic environment, something that is restricted in [8].
• The dug weight remains in an acceptable range during the digging process although the material is reducing.
The proposed scheme uses only the feedback signal from the weight sensor to generate the path. This is more reliable than cameras or scanners [3][4][5] and it is easier to measure than velocity and acceleration [9].
II. PROBLEM DESCRIPTION In order to maximize the efficiency of the excavator, one of the requirements is that during the digging process, the dug mass at each time must be maintained in a given acceptable range. However, the shape of the pile is changing while the excavator is digging up the material. Therefore, if the trajectory of the excavator remains the same and the dug stack declines over each period of the process, the requirement will not be guaranteed. Hence, the excavator driver has to observe the excavated weight and trajectories in the previous digging to choose an appropriate trajectory next time with an expectation that the mass in the next period will be acceptable. A trajectory generator based on this structure is proposed to replace the excavator driver, which will completely automate the digging process. Before going to solve the described problem, the following assumptions are made: • The pile lies above the ground and has a triangular shape as shown in Figure 1.
• After a digging cycle, the material on the top of the pile will lie down and fill in the space that was taken, so the pile will maintain the triangular shape with a different slope.
• The dug weight in each period is limited by the volume of the bucket.
• The trajectory in each period is represented by a set of parameters.
The task of the trajectory generator is to observe the weight and the trajectory's parameters in the previous period in order to adjust suitably the trajectory parameters in the next one. In order to execute the algorithm, the excavator's trajectory has a parabolic shape, however, in fact, the excavator's trajectory often has a more complex shape due to many practical conditions. The parabolic shape is described by: where K, h and p are scalars. In this work, we have just change the parameter h, which corresponds to the coordinate of the vertex on the horizontal axis. Therefore, each trajectory will be represented by a single value h. The block diagram of the overall system is illustrated in Figure 2. It can be seen that the desired trajectory which is generated by the module is denoted as q*. The desired trajectory is a reference for the excavator controller, and the output of the excavator is q, which affects the dug weight. If the tracking controller has a good performance, the equivalent model of the controller and the excavator can be viewed as a dynamic model which has a static gain of one. In this paper, the equivalent model is considered as a first order system as follows: This means that the trajectory error reduces exponentially over time. Block diagram of the overall system.

TRAJECTORY
In fact, one does not have complete information about the dynamics of the pile. Therefore, a decision on the appropriate trajectory can only be done in practice by observing the pile's response in accordance with the trajectories that the excavator made. Hence, the pile can be considered as a black box, and designing a controller for a black box leads to the idea of the neural network-based MRAC. The control structure of the neural-network based MRAC can be seen in Figure 3. In this Figure, the trajectory generator block can be seen as a controller of the outer loop. This controller is responsible for keeping the dug weight within the specific range as the shape of the pile changes. This is an offline process, the first step of the design is identifying the plant with the usage of neural network, and the next step is creating a neural network-based controller according to the identified plant in the previous step. Note that, the plant now contains the pile, the excavator, and the tracking controller. The input of the plant is the desired trajectory q*, whereas the practical trajectory created by the excavator is q because of the control error which should affect the dug weight. Then, the robot excavates the pile with the trajectory q and obtains a weight m. In the identification problem, because the user concerns only about the dug weight after each time the excavator finishes, the object can be viewed as a discrete-time system. A discrete-time system can be described as: Discrete-time system identification can be carried out by using a nonlinear autoregressive network with exogenous inputs denoted as NARX. The NARX's architecture is illustrated in Figure 4, where the input and feedback output of the network pass through the Tapped-Delay-Lines (TDLs) which make up the dynamics of the network. A feature that makes the NARX become regularly usable in identification problems is that it has only a feedback from the network's output. Therefore, in the training process, one can remove this feedback and consider the feedback output as a second input to the network. The opened architecture, referred as a series-parallel architecture (Figure 4), is used in one-stepahead prediction. The training process of the opened network is much simpler than that of the closed network, because one can utilize the traditional back-propagation algorithm for the opened network training. After the opened-network training process is completed, the network can predict one-step ahead, however the objective is to identify the system, which means that the network can make a multi-step-ahead prediction. Therefore, the performance of the closed network is often not good enough. Hence, one should continue training the closed network based on the opened network. However, training a closed network can still confront the problem of gradient vanishing, which is a well-known problem in the training of RNNs. Thus, the original training set is divided into subsets which have a smaller size. The network is trained with these subsets. When the training process with these subsets is over, the size of each subset is increased by a small number, and the training restarts with the new bigger subset. These steps are repeated until the size of the subset becomes equal with the size of the original training set. The algorithm is shown in Figure 5, which begins with the size of each subset equal to the maximum number of delays (MD) in the TDL of the network plus one that is equivalent to the training opened network. In Figure 5, MD is the Maximum number of Delays of both input's TDL and feedback output's TDL. Q, c, n are the length of the original batch, the length of the mini batch, and the number of subsets' respectively, x is the network's parameters, e ik represents the error which corresponds with the k th element of the i th subset, l i (x) is the squared error which corresponds to the i th subset, L(x) is the loss function which is the Mean Square Error of the original batch. Note that each adjacent subset has some similar elements, with size equivalent to MD in the TDL. These elements are used as the initial conditions for the network. To guarantee that the objective function includes all elements in the original training batch, a subset is created to overlap the adjacent one. Once the closed network has finished training, we will create the controller network based on the object network. To train the controller network, the whole structure that contains the plant network and the controller network must be considered as a major network. Then, this network is used to approximate the given reference model. However, during the training process, only the controller network's parameters are adjusted in order to optimize the objective function, while the object network's parameters are kept unchanged. The well-known backpropagation algorithm is used to train the major network, the error between reference model's output and this network's output back-propagates through layers of the network, and the controller network's parameters are updated.

IV. NETWORK TRAINING AND SIMULATION
A. Plant Network Neural networks are known as a powerful technique to model a system with a mathematical model unknown or difficult to build [20,21]. In this section, a neural network is used to model the plant which consists of the pile, the excavator, and the tracking controller. The first step of training the plant network is collecting data from the object. The performance of the network depends significantly on the way the data cover and describe the operating range of the object. In system identification problems, the data are often obtained by generating the input signal in the form of a sequence of step functions that have random durations and amplitudes. In this case, the reference input signal is produced with a random amplitude within the intervals [0.3, 0.5] and [0.5, 0.8] in the first 100 and the next 100 steps respectively. The duration of each step function is d∈ [1,10]. The main reason is that the collected data need to cover the operation range of the object. In the first 100 steps, because the size of the pile is still large, one should use a trajectory with small h. In contrast, in the next 100 steps, because a large amount of the pile is already taken, one needs to shift the range of the trajectory deep inside the pile with the expectation that the excavated weight is still acceptable. This signal is applied to the plant, then the plant's output is collected. The obtained data are shown in Figure 6. Note that there are 40 mini batches whose length is 200 samples, however Figure 6 only shows an instance of the mini batches. Before training, the input data are normalized in the range [-1, 1]. Note that h is the representation of the desired trajectory, the control error in the controlling robot affects the practical trajectory, so the dug weight could not be the same as expected. In this case, the collected data include this control error. When the training data are available, the plant network can be designed. This network is illustrated in Figure 7. The input of the network passes through a 0:1 TDL. The TDL of the input has a current element (zero delay) due to the pile feature. When one adjusts the excavator's trajectory, the dug weight immediately changes depending on the shift of the trajectory, therefore the current component is added to demonstrate this feature. The TDL of network's feedback output is 1:5. The Bayesian regularization algorithm is used for training. The training result is shown in Figure 8. Then, the network is tested with the testing data. The test result is illustrated in Figure 9.

B. Controller Network
First of all, one must choose an appropriate reference model. Note that the plant network is of the 5th order, thus, a fifth-order dead-beat system is utilized as a reference model. The transfer function of the chosen reference model is: The next step is collecting data from the reference model for the controller training set. The training data for the controller is demonstrated in Figure 10. The left side in Figure  10 shows the input data of the training set. The right side shows the response of the reference model. Note that the initial value of the reference model's output is the limited dug weight, this means that the dug weight can reach the maximum value in the first period. It can be seen that although the controller only operates with unit set-point, the input has different values that range from 0.4 to 1. The reason is that the controller has to work with different states of the object, so this type of input helps training the controller with practical states of the object, e.g. the controller can be trained to give a suitable trajectory that will help the dug weight increase from 0.5, which is its current value, to the desired value of 1. The training data have 8 mini batches, each mini batch having a length of 90 samples. Figure 10 illustrates an example of the total training set. The major neural network's architecture, which includes the plant network and the controller network, is shown in Figure 11. The transfer function of the output layer of the controller is the tansig function, which means that the output of the controller is within the range from -1 to 1 which is equal to the normalized input range of the plant network.   The used training algorithm is the Bayesian regularization. Figure 12 describes the training result of the major network. It can be seen that the training error is very small, therefore the controller can be tested on the practical object.

C. Stability Analysis
The system contains a plant neural network and a controller neural network. It is a five-layer discrete-time RNN. To analyze the stability of the RNN, there are several stability criteria [22][23][24][25]. In this case, the stability criteria from [22] were applied. To do so, the recurrent neural network will be put in the standard form as: where x(k) is defined as a vector of layers' output in the past (such as a^1(k-1), a^2(k-1), a^2(k-2), a^2(k-3) ..., where a^i is the output of layer i, i=1,2…5), W^1 and W^2 are the weight matrices of layer 1 and 2. The size of the state vector is 41 in this case. We did obtain the matrices W^1 and W^2, b and f, but the upper bound and lower bound matrices for the function vector f(41×1) must be found as the requirement of the stability criteria in [20]. This leads to the problem that the equilibrium point of the system in (5) must be determined. It is not easy to find the equilibrium point in this case because its size is 41 and the function vector includes nonlinear functions such as the tansig. Thus, in this work the stability of the system is verified through simulations.

D. Simulation Results
In order to verify the effectiveness of the proposed scheme, simulations are done in Matlab/Simulink. In the simulations, the initial path of the arm is set so that the dug mass is 0.4. The trajectory generator should calculate the reference trajectory for the plant to meet the requirement. The dug weight desired value is tracked during working time despite the change of the pile. Figure 13(a) shows the change of the pile during working time. Initially, the slope of the pile is about 62 degrees. Because of the digging, this slope will decrease gradually and reach the value of 46 degrees at the 42 th step. Figure 13(b) shows the values of h for each step. The initial value of h is set at -0.9 and after 50 steps its value is 0.8. Corresponding with these 50 values of h, 50 trajectories will be generated to adapt to the change of the pile.   Figure 14 illustrates the dug weight response of the system during working time. In the first step, the excavator works with the given trajectory and the dug weight is 0.4. The information about the trajectory and the weight will be given to the controller to calculate the reference trajectory for the next step. After 5 steps, the dug weight accomplishes the desired value and remains at this state during the working time despite the change of the pile. V. CONCLUSION A simple but efficient algorithm has been proposed to design the reference trajectory for an excavator arm in a dynamic environment. The model of the tracking controller, the arm, and the pile is approximated, at first by an RNN. A second RNN combined with the MRAC algorithm is used to calculate the reference trajectory of the system. By this combination, the calculated trajectory can be adjusted after each duty cycle to adapt with the change of the pile so that the dug weight remains around the nominal value despite the reduction of the pile. The effectiveness of the overall system was verified through simulations. The results show that the proposed scheme gives a good performance, i.e. the dug weight always tracks the nominal value as the pile changes its shape during working time.