Development and Assessment of Feed Forward Back Propagation Neural Network Models to Predict Sunshine Duration

Sunshine duration is an important indicator of the amount of solar radiation received in a particular area. It has been used to study atmospheric energy balance, sustainable development, ecosystem evolution and climate change. This paper focuses on predicting the daily average sunshine duration (SD) for Duhok city, Iraq, using artificial neural networks (ANNs). Several ANN models with different input variables were used in the prediction process. The month, average temperature, maximum temperature, minimum temperature, relative humidity, wind direction, cloud level and atmospheric pressure were used as input parameters, with the daily average sunshine duration (SD) as the output. The eight-year data were divided into two categories: the first covers whole years (annual) and the second is seasonal. To assess the influence of different input parameters on sunshine duration, six ANN models were developed. For the annual models, the model with month, cloud level and average temperature as inputs gave the best results, with RMSE, MAE and R of 1.82, 1.175 and 0.89, respectively. For the seasonal models, the autumn season gave the best results, with RMSE, MAE and R of 1.450, 1.009 and 0.94, respectively. Accordingly, the artificial neural network is effective in predicting sunshine duration.


Introduction
Solar energy is considered one of the most important energy resources in the world. Demand for solar energy has recently increased because of the limitation of other energy resources and the growing demand for energy. As a result, researchers have conducted many studies, and continue to do so, to exploit the advantages of solar energy [1]. In general, solar radiation measurements are made with actinographs, which are often unreliable because of the frequent calibration required by the thermal sensitivity of the mechanical components in their sensors [2]. The most precise measurements can be obtained by building networks of calibrated modern pyranometers; however, this is not feasible for many nations because such instruments are expensive [3]. Many researchers have shown that solar radiation and sunshine duration (SD) are highly correlated [4]. This implies that knowledge of the sunshine duration over a certain region provides knowledge of the solar radiation over the same region, and vice versa [5]. Accordingly, long-term precise measurements of global SD have become necessary for climatological and other uses [6]. Indeed, SD has been measured more precisely than solar radiation at several locations across the world, using less expensive instruments over long periods of time [7]. Although SD can be observed practically everywhere, meteorological station networks are still inadequate or restricted in some areas of the world for geographic and sometimes economic reasons [8]. As a result, a number of countries have limited or sometimes unreliable SD maps and databases [9]. To address this problem, estimation methods are being developed for locations where SD measurements are unavailable or unreliable, particularly remote and difficult-to-reach areas [10].
So far, many researchers have focused their efforts on estimating global solar radiation, and many such papers can be found in the literature [11]. In contrast, only a few studies have estimated SD, its relationship with the geographic and atmospheric factors of a region, and its variation across space and time [12]. Lately, artificial neural networks (ANNs) have been widely employed to solve a wide range of problems in numerous fields, including engineering, climate science, estimation and economics. ANNs are powerful tools for simulating nonlinear systems [13]. Many prior studies have demonstrated that ANN approaches are a powerful alternative to standard regression models for predicting global solar radiation. Several meteorological variables have been used as input parameters in ANN-based solar radiation studies to obtain solar radiation as an output [14]. A radial basis function (RBF) neural network model has been used to construct contour maps of sunshine proportion and sunshine hours for Oman [15]; in that investigation, the month of the year, latitude and longitude were used as input variables. In Saudi Arabia, a study calculated SD using two neural network algorithms, with the maximum possible day length, the month number, longitude, latitude, height and extraterrestrial solar radiation at a specific location as input variables [16]. In Central Africa, a study assessed the long-term variation in sunshine duration and estimated its interplay with meteorological variables from 1950 to 2010 [17]. In China, a new method has been proposed to estimate sunshine duration using hourly cloud data from a geostationary meteorological satellite [18]. In Saharan Algeria, an ensemble learning approach has been used to estimate sunshine duration [19].
The aim of this paper is to estimate the average daily sunshine duration (SD) in Duhok city, Iraq, for future planning, using ANNs based on the multilayer perceptron (MLP) feed-forward (FF) technique.

Study area
Duhok governorate lies in the north of the Kurdistan Region of Iraq, at (43.20 -44.10) longitude and (36.40 -37.20) latitude [20]. Duhok city is close to the Syrian and Turkish borders and is bounded by mountains: Bekher mountain to the north and Zawa mountain to the south, while the land to the west and east is plain. The city has four seasons, each with a different climate. Based on the Köppen climate classification system, summer is hot and sunny, winter is cold, and autumn and spring are wet and partly sunny.

Data collection
Meteorological data (inputs and output) were gathered from the Directorate of Meteorology and Seismology based in Duhok city for the period from 2013 to 2020.
The annual average of the weather data is reported. The measured climatic data consist of daily records of the month (M), average temperature (Tave (°C)), maximum temperature (Tmax (°C)), minimum temperature (Tmin (°C)), relative humidity (RH (%)), wind direction (WD), cloud level (CL (1/8)) and atmospheric pressure (AP (mbar)) as inputs, and sunshine duration (SD) as the output. In this paper, the data were organized into two groups: the first group contains the whole dataset from 2013 to 2020, while the second group arranges the data by season, taking the three months of each of the four seasons (winter, spring, summer and autumn) over the whole period from 2013 to 2020.
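The seasonal grouping described above can be sketched as follows. This is only an illustration: the text does not state which three months form each season, so the mapping below (meteorological seasons, with winter as December to February) is an assumption, and the record layout and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# ASSUMPTION: meteorological seasons. The paper does not list the months
# that make up each season, so this mapping is illustrative only.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def group_by_season(records):
    """Group daily records by season, pooling all years together.

    Each record is a (date, values) pair; `values` can be any object
    holding that day's measurements.
    """
    groups = defaultdict(list)
    for day, values in records:
        groups[SEASON_BY_MONTH[day.month]].append((day, values))
    return dict(groups)


# Hypothetical daily records: (date, sunshine duration in hours).
records = [
    (date(2013, 1, 15), 4.2),
    (date(2013, 7, 1), 12.1),
    (date(2014, 10, 20), 7.5),
    (date(2020, 12, 31), 3.0),
]
seasonal = group_by_season(records)
annual = records  # the first group is simply the whole 2013-2020 dataset
```

Pooling the same season across all years, as done here, matches the description of the second group; a different month assignment only requires changing the mapping.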

Artificial Neural Network (ANN)
In this study, an ANN of the multilayer perceptron (MLP) type, consisting of a single input layer, a single hidden layer and a single output layer, was used to estimate sunshine duration from M, Tave, Tmax, Tmin, RH, WD, CL and AP. MLP networks are made up of neurons organized in layers (input, hidden and output layers), and feed-forward connections link the neurons of each layer to those of the next [21]. The numbers of input and output parameters determine the numbers of nodes in the input and output layers, respectively [22]. The ANN's performance depends on the number of nodes in the hidden layer. Since there are no formal guidelines for determining the optimal number of hidden nodes for a given task, this network variable is usually tuned by trial and error [23]. Fig.1 depicts the general configuration of the three-layer neural network employed in this paper. In this configuration, there are eight neurons (M, Tave, Tmax, Tmin, RH, WD, CL and AP) in the input layer, a variable number of neurons in the hidden layer, and one neuron in the output layer representing sunshine duration. A neural network adjusts the weights of the links between its components to fulfill a specific task, and each link has its own weight. The processing in each neuron has two parts. The first part sums the weighted inputs and the bias. The second part is essentially a non-linear filter, known as an activation function or transfer function. The activation function acts as a squashing function so that a neuron's output lies between particular values (-1 and 1, or 0 and 1). Fig.2 describes this process. The hyperbolic tangent activation function is employed in this study for both the hidden layer and the output layer. This function is considered one of the most popular activation functions.
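The two-part neuron processing described above (a weighted, biased sum followed by a squashing activation) can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation: the weights are random placeholders rather than trained values, the hidden-layer size of 7 is borrowed from the summer model reported in the results, and all function names are hypothetical.

```python
import math
import random


def neuron_output(inputs, weights, bias):
    """Part 1: weighted sum plus bias. Part 2: tanh squashing to (-1, 1)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)


def mlp_forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a single output neuron.

    hidden_layer is a list of (weights, bias) pairs, one per hidden neuron;
    output_neuron is a single (weights, bias) pair.
    """
    hidden = [neuron_output(inputs, w, b) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron_output(hidden, out_weights, out_bias)


# Placeholder (untrained) weights, for illustration only.
rng = random.Random(0)
n_inputs, n_hidden = 8, 7  # 8 inputs: M, Tave, Tmax, Tmin, RH, WD, CL, AP
hidden_layer = [
    ([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
    for _ in range(n_hidden)
]
output_neuron = ([rng.uniform(-1, 1) for _ in range(n_hidden)], rng.uniform(-1, 1))

# Hypothetical input vector, already scaled to [-1, 1].
x = [0.5, -0.2, 0.1, -0.6, 0.3, 0.0, -0.9, 0.4]
sd_normalized = mlp_forward(x, hidden_layer, output_neuron)
```

Because the output neuron also uses tanh, the prediction lies in (-1, 1) and has to be mapped back to hours of sunshine afterwards.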
The hyperbolic tangent function is a continuous function that changes gradually between two values, -1 and 1, and is defined by the equation below:
$$y_k = \tanh(x_k) = \frac{e^{x_k} - e^{-x_k}}{e^{x_k} + e^{-x_k}} \qquad (1)$$
where $x_k$ and $y_k$ represent the weighted sum of inputs to the kth hidden neuron and the output from that neuron, respectively [24]. The MLP network is trained by determining the connection weights and biases that minimize an error function between the output of the current network and the corresponding target values in the training set [25]. A feed-forward (FF) back-propagation algorithm was used to train the MLP network in this paper. Levenberg-Marquardt (LM) was chosen among the several FF training algorithms available. The LM algorithm is extensively used in a variety of disciplines since it is quicker and gives better outcomes than other training techniques [26]. To estimate the daily sunshine duration on an annual basis, six combinations of input parameters (meteorological predictor variables) were evaluated for the whole period 2013-2020, as indicated in Table 1, to assess their impact on SD; for the four seasons, one model incorporating all input parameters was studied. The dataset was divided into 70% for training, 15% for testing and 15% for validation. For every combination of input parameters, the Levenberg-Marquardt training algorithm and the hyperbolic tangent transfer function were examined with one hidden layer and varying numbers of neurons. The goal was to find the MLP-FF architecture with the highest coefficient of determination, the lowest mean absolute error and the lowest root mean square error [27,28].
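The 70% / 15% / 15% division of the dataset can be illustrated with a short sketch. The text does not say how samples were assigned to each subset, so the random shuffle below is an assumption, and `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(samples, train_frac=0.70, test_frac=0.15, seed=0):
    """Shuffle samples and split them into training, testing and validation sets.

    ASSUMPTION: random assignment. The paper gives only the proportions,
    not the assignment method. The validation set receives the remainder.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Hypothetical dataset of 100 daily samples, identified by index.
train_set, test_set, validation_set = split_dataset(range(100))
```

Fixing the seed makes the split reproducible, which matters when different hidden-layer sizes are compared on the same subsets.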
Since the objective of this study was to estimate sunshine duration, the ANN has a single output. The measured sunshine duration values were used as the target. To ensure model consistency, the variables were normalized to the range [-1, 1] and then restored to their original values after simulation using the following formula:
$$x_n = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1 \qquad (2)$$
where $x_n$ denotes the normalized value, $x$ denotes the real value, and $x_{\min}$ and $x_{\max}$ represent the lowest and highest real values, respectively [29].
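As a rough sketch of this scaling step, the functions below map a value to the range [-1, 1] and back. The linear min-max form is assumed here as the standard way of reaching that range; the function names and the sample temperatures are hypothetical.

```python
def normalize(x, x_min, x_max):
    """Map a real value linearly from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(x_n, x_min, x_max):
    """Inverse mapping: restore a normalized value to its original scale."""
    return (x_n + 1.0) * (x_max - x_min) / 2.0 + x_min


# Hypothetical daily average temperatures in degrees Celsius.
temperatures = [2.0, 15.5, 41.0]
t_min, t_max = min(temperatures), max(temperatures)

scaled = [normalize(t, t_min, t_max) for t in temperatures]
restored = [denormalize(s, t_min, t_max) for s in scaled]
```

The minimum and maximum should be taken from the data used to fit the scaling and reused unchanged when restoring the network output to hours of sunshine.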

Statistical criteria
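Three statistical criteria are employed to check the performance of the MLP-FF models and to assess the predicted sunshine duration: Root Mean Square Error (RMSE) [30], Mean Absolute Error (MAE) and correlation coefficient (R). The statistical indicators RMSE, MAE and R are computed using the following equations, respectively:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(SD_{p,i} - SD_{m,i}\right)^2} \qquad (3)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|SD_{p,i} - SD_{m,i}\right| \qquad (4)$$
$$R = \frac{\sum_{i=1}^{n}\left(SD_{m,i} - \overline{SD_m}\right)\left(SD_{p,i} - \overline{SD_p}\right)}{\sqrt{\sum_{i=1}^{n}\left(SD_{m,i} - \overline{SD_m}\right)^2 \sum_{i=1}^{n}\left(SD_{p,i} - \overline{SD_p}\right)^2}} \qquad (5)$$
where $SD_{p,i}$ and $SD_{m,i}$ are the predicted and actual (measured) values of sunshine duration, $n$ is the number of samples, and $\overline{SD_m}$ and $\overline{SD_p}$ are the means of the measured and predicted values of SD.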
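The three criteria can be computed directly from paired predicted and measured values, as in the sketch below. It uses the standard definitions of RMSE, MAE and the Pearson correlation coefficient; the function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def mae(predicted, measured):
    """Mean absolute error between predicted and measured values."""
    n = len(measured)
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / n


def correlation(predicted, measured):
    """Pearson correlation coefficient R between predicted and measured values."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((m - mean_m) * (p - mean_p) for p, m in zip(predicted, measured))
    spread_m = sum((m - mean_m) ** 2 for m in measured)
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance / math.sqrt(spread_m * spread_p)


# Hypothetical sunshine durations in hours (illustrative only).
measured_sd = [1.0, 2.0, 3.0, 6.0]
predicted_sd = [1.0, 2.0, 3.0, 4.0]

print(rmse(predicted_sd, measured_sd))  # 1.0
print(mae(predicted_sd, measured_sd))   # 0.5
print(correlation(predicted_sd, measured_sd))
```

RMSE and MAE are expressed in the same unit as SD (hours), so lower is better, while R is dimensionless and values closer to 1 indicate closer agreement in trend.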

Results and discussion
In this study, the multilayer perceptron (MLP) feed-forward (FF) technique with the Levenberg-Marquardt back-propagation training algorithm, implemented in MATLAB R2016a, was used to estimate the daily average sunshine duration (SD) for the Directorate of Meteorology and Seismology station in Duhok city [20]. The inputs of the models were the daily average values of M, Tave, Tmax, Tmin, RH, WD, CL and AP, and the output was the daily average sunshine duration.
The datasets for the period from 2013 to 2020 were organized into two categories. The first category covers all years (annual) and the second covers the seasons over all years (seasonal). Daily average SD values were calculated for each category by the suggested ANNs and compared with the values recorded at the meteorological station from 2013 to 2020, as follows:
1. The first part (annual): First, a feed-forward ANN was used to predict the daily average SD values based on the daily average M, Tave, Tmax, Tmin, RH, WD, CL and AP of all years. After several experiments, it was found that a network with M, CL and Tave as input parameters, 15 hidden neurons in a single layer and a single output unit was adequate for this application. Table 2 presents the six models for predicting daily sunshine duration with different input parameters for the whole year, together with the RMSE, MAE and R calculated for each model using Eqs. (3), (4) and (5), respectively. As is evident from Table 2, model M3 is the most reliable among the evaluated models.
The RMSE and MAE for model M3 are the lowest, at 1.82 and 1.175, respectively. The correlation coefficient (R) for the same model is 0.89, which means that this model gives a precise estimation of SD for Duhok city.
2. The second part (seasonal): Daily average SD values were predicted for each season by the suggested ANNs and compared with the values recorded at the meteorological station. First, for the summer season, an MLP-FF network was employed to estimate SD using the M, Tave, Tmax, Tmin, RH, WD, CL and AP input parameters of the summer season for all years. After many tests, it was concluded that a model with 8 inputs, 7 hidden neurons in one layer and one output unit was adequate for this application, as shown in Fig.3. Fig.4 compares the predicted and measured SD values for the testing process. In general, the predicted values were near to those recorded at the meteorological station.
Another MLP-FF network was used to estimate SD for the autumn season based on the same inputs as the summer season. A network of 8 inputs, 17 hidden neurons in one layer and one output was found to be better for this case. Fig.5 illustrates the predicted and recorded values of SD for the autumn season. In general, the predicted values were quite near to those recorded at the meteorological station. Finally, a model of 8 inputs, 20 hidden neurons in one layer and one output unit was trained on M, Tave, Tmax, Tmin, RH, WD, CL and AP to predict the SD for the winter and spring seasons, as shown in Figs. 6 and 7. The figures show the testing data of the recorded and predicted SD. In general, the predicted values were very close to those recorded at the meteorological station. Furthermore, to quantify these results, R, MAE and RMSE were calculated for each season, as shown in Table 3; the graphic representation for the autumn season (the best case) is given in Fig.8. R describes the relationship between the network outputs and the target values for the autumn season, as depicted in Fig.8. The R value for the autumn season on the test data was 0.94, which indicates that the predicted values are close to the actual values. Fig.8 also shows that the R values are 0.928, 0.761 and 0.913 for the training, validation and all data, respectively. Table 3 displays the values of RMSE, MAE and R, computed using Eqs. (3), (4) and (5), respectively. The results given in Table 3 show that the minimum RMSE, 1.450, was recorded in autumn, with an MAE of 1.009 and an R of 0.94. This implies that the autumn model is slightly superior to the winter, spring and summer models and that its accuracy is higher than that of the other seasons, since larger RMSE values usually indicate poorer performance.

Conclusions
In this study, MLP-FF networks were used to predict the SD in Duhok city, Iraq. Six models with different input combinations were built using the data for all years from 2013 to 2020 together, along with one model for each of the four seasons. To train the neural networks, the recorded meteorological data were divided into 70% for training, 15% for testing and 15% for validation. The best values of RMSE, MAE and R for whole years were 1.82, 1.175 and 0.89, respectively. For the seasons, the values were: summer 1.961, 1.057 and 0.66; autumn 1.450, 1.009 and 0.94; winter 2.068, 1.284 and 0.86; and spring 1.806, 1.453 and 0.89, respectively. The statistical criteria indicated that, among the annual models, the MLP model with month, cloud level and average temperature as inputs performed best, and that, among the seasonal models, the autumn model performed best. In summary, the city of Duhok comprises different geographical and climatic regions, and the results obtained here appear to be sufficient given that the distribution of SD across the city is not uniform. It can be concluded that the ANN approach can be successfully used to predict sunshine duration from existing climate data.