Evolutionary Algorithm-based Feature Selection for an Intrusion Detection System

Ensuring reliable, secure, and trustworthy data communication between different enterprises is a major security issue. Communication over the web or computer networks is always under threat from hackers or intruders. Many techniques have been used for intrusion detection, but all have shortcomings. In this paper, a new hybrid technique is proposed, which combines the Ensemble of Feature Selection (EFS) algorithm and Teaching Learning-Based Optimization (TLBO) techniques. In the proposed EFS-TLBO method, the EFS strategy is applied to rank the features in order to choose the best subset of relevant information, and TLBO is used to identify the most important features from the resulting datasets. The TLBO algorithm uses the Extreme Learning Machine (ELM) to choose the most effective attributes and to enhance classification accuracy. The performance of the proposed technique is evaluated on a benchmark dataset. The experimental results show that the proposed model performs well in terms of predictive accuracy, detection rate, and false-positive rate, and requires fewer significant attributes than other techniques known from the literature.

Keywords-classification; feature selection; teaching learning-based optimization; intrusion detection

I. INTRODUCTION
Despite growing awareness of cyber-security problems, existing solutions are not adequate for protecting computer applications or enterprise security frameworks against the risk of constantly evolving organized attacks [1]. Adaptive security methods were developed to address this issue, but they introduced further problems. Typical cyber-security methods, e.g. user authentication and firewalling, are insufficient to fully defend networked computer security, since they are confronted with intrusion attempts at the very start of the security procedure [2,3]. Thus, Intrusion Detection Systems (IDSs) are strongly recommended as an additional line of defense. An IDS is a program that monitors the network for malicious activity and policy violations [4]. At present, IDSs together with safeguard applications have become an indispensable element of computer security in most companies. Combining the above-mentioned security mechanisms provides increased resistance to network attacks and improves network security.
II. RELATED WORK
Over time, numerous approaches, e.g. selection and classification models, have been applied to intrusion datasets (i.e. KDD CUP 1999) for detecting network problems and attacks. Attribute selection with learning algorithms could not handle or scale to very large volumes of data [5]. To overcome this limitation, authors in [6,7] proposed a hybrid feature selection technique that removes irrelevant features and selects the best feature subsets. The study in [8] indicated that an individual search algorithm finds the most suitable subsets but amplifies data over-fitting, while an exploratory search is less prone to data over-fitting in feature selection when only a modest number of samples is available [9]. Authors in [10] proposed the use of ELM and alpha profiling to reduce the required time, while redundant features were discarded using an ensemble of filter-, correlation-, and consistency-based feature selection methods.

A. Filter-based Methods
Selecting an optimal and relevant feature subset means choosing attributes that are highly correlated with the class and uncorrelated with each other. In the Conditional Mutual Information Maximization (CMIM) method, Feature Subset Selection (FSS) is performed by maximizing the conditional mutual information with respect to the class [11]. The selected attributes are therefore strongly related to the class attribute and uncorrelated with the other attributes. CMIM trades off the predictive power of a candidate attribute (its relevance to the class) against its independence from all previously chosen attributes. The Mutual Information (MI) between the class label $y$ and the attributes $X$ is calculated as:
$I(y; X) = H(y) - H(y \mid X)$ (1)
where $H(y)$ and $H(y \mid X)$ denote the entropy and the conditional entropy of the class variable, respectively. Some authors have reported issues with the Mutual Information-based Feature Selection (MIFS) technique [7,12]. Therefore, we used the CMIM strategy to reduce the redundancy between class $y$ and the data attributes, as shown in (2). The primary objective of CMIM, expressed in (2), is to choose the final feature subset that conveys as much information as possible from the sample $S$. Here $M_{CMIM}$ estimates the mutual information between a candidate feature $x_i$ from the full feature set and the selected features $x_j$ with respect to class label $y$, whereas $S$ denotes the subset of selected features. $I(y; x_i \mid x_j)$ measures the amount of classification information that $x_i$ provides when $x_j$ has already been selected [13]. The selected feature subset $S$ cannot provide this information. Compared with $I(y; x_i)$, $I(y; x_i \mid x_j)$ does not contain the redundant information shared by pairs of attributes for classification.
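A minimal sketch of how such a mutual information estimate can be computed, assuming discrete (or already discretized) attribute values and empirical frequencies as probability estimates. The function names and the toy data are illustrative and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical Shannon entropy H(y) in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, feature):
    """Empirical conditional entropy H(y | x) in bits."""
    n = len(labels)
    groups = {}
    for y, x in zip(labels, feature):
        groups.setdefault(x, []).append(y)
    # Weight the class entropy inside each feature value by its frequency.
    return sum(len(g) / n * entropy(g) for g in groups.values())


def mutual_information(labels, feature):
    """I(y; x) = H(y) - H(y | x)."""
    return entropy(labels) - conditional_entropy(labels, feature)


# Toy data (hypothetical): one attribute that determines the class
# and one attribute that is independent of it.
labels = ["attack", "attack", "normal", "normal"]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

mi_informative = mutual_information(labels, informative)
mi_uninformative = mutual_information(labels, uninformative)
```

In this toy case the informative attribute carries the full one bit of class entropy, while the independent attribute carries none, which is the behavior a filter method relies on when ranking attributes.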
The importance of the input attributes defined by the Joint Mutual Information (JMI) is shown in (3), where $I(x_i, x_j; y)$ denotes the mutual information between the candidate attribute $x_i$ together with the selected attributes $x_j$ and the class $y$. In terms of mutual information, the purpose of attribute selection is to find an attribute subset $S$ with $N$ attributes that has maximum dependency on the target class $c$. This scheme, called Max-Dependency, is given in (4). In (3), the dependency among the attributes $x_i$ can be large [14]. The relevance and the redundancy between attributes are expressed in (5) and (6). The combination of (5) and (6) is known as Minimal-Redundancy-Maximal-Relevance (mRMR) [15] and is given in (7), where $x_i$ belongs to the selected attribute subset $S$ and $x_j$ belongs to the original feature set.
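For reference, the filter criteria discussed in this subsection are commonly written in the feature selection literature in the forms below. This is a sketch of the textbook formulations, on the assumption that the paper follows them. The symbols $J_{CMIM}$, $J_{JMI}$, $D$, $R$, and $\Phi$ are the usual ones from that literature and may differ from the authors' own notation and equation numbering.

```latex
\begin{align*}
% CMIM score of a candidate x_i given the already selected set S
J_{CMIM}(x_i) &= \min_{x_j \in S} I(y;\, x_i \mid x_j) \\
% JMI score of a candidate x_i given the already selected set S
J_{JMI}(x_i) &= \sum_{x_j \in S} I(x_i, x_j;\, y) \\
% Max-Dependency
\max D(S, c), \quad D &= I(\{x_i,\ i = 1, \ldots, N\};\, c) \\
% Max-Relevance
\max D(S, c), \quad D &= \frac{1}{|S|} \sum_{x_i \in S} I(x_i;\, c) \\
% Min-Redundancy
\min R(S), \quad R &= \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i;\, x_j) \\
% mRMR combines relevance and redundancy
\max \Phi(D, R), \quad \Phi &= D - R
\end{align*}
```

In each case a greedy search adds, at every step, the candidate attribute with the best score with respect to the attributes already selected.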

B. The Proposed Ensemble Feature Selection
The feature selection methods used are mRMR, JMI, and CMIM, which rank the features of the IDS datasets; their outputs are aggregated using a combination strategy [7].

C. Frequency Vote
Frequency Vote (FV) is a cooperative decision-making scheme that has been reported to be more useful than other, more complex schemes [16]. Thus, the most voted prediction is taken as the final prediction, as per (8), where £ is the number of attribute selection methods and $L$ is a selection of some attributes. For attribute $j$, the sum $\sum_{i=1}^{£} d_{i,j}$ tabulates the number of votes for $j$.
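The voting scheme can be sketched as follows, under the assumption that each selection method casts one vote for every attribute in its own top-ranked list and that the attributes with the most votes are kept. The function `frequency_vote`, its `top_k` parameter, the alphabetical tie-breaking rule, and the toy rankings are all illustrative and are not taken from the paper.

```python
from collections import Counter


def frequency_vote(selections, top_k):
    """Aggregate the attribute lists of several selection methods by voting.

    selections: list of attribute lists, one per selection method.
    top_k: number of attributes to keep (hypothetical parameter).
    """
    votes = Counter()
    for selected in selections:
        # Each method votes at most once per attribute.
        votes.update(set(selected))
    # Most votes first; ties are broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


# Toy top-3 lists for the three filters (hypothetical values).
mrmr_top = ["src_bytes", "flag", "service"]
jmi_top = ["src_bytes", "service", "duration"]
cmim_top = ["src_bytes", "flag", "count"]

selected = frequency_vote([mrmr_top, jmi_top, cmim_top], top_k=3)
```

Here `src_bytes` receives three votes and `flag` and `service` two each, so these three attributes form the aggregated subset.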

D. Using Teaching Learning-Based Optimization (TLBO)
TLBO [17][18][19] is a powerful metaheuristic method that achieves a high convergence rate with few tuning parameters. Its control parameters are simple to tune, and its memory requirements are low. The working methodology of the TLBO algorithm can produce better evaluation outcomes [20]. The position of the i-th learner is:
$X_i = \{X_{i,1}, X_{i,2}, \ldots, X_{i,D}\}$ (9)
where $L_k$ is the lower limit and $U_k$ is the upper limit of the k-th of the $D$ dimensions of the search area, so that $X_{i,k} \in [L_k, U_k]$ [21]. The learner $X_i$ is randomly initialized in the search area [22]. The evolution of $X_{i,k}$ is generated by:
$X_{i,k} = L_k + r_1 (U_k - L_k)$ (10)
where $i = 1, 2, 3, \ldots, nPop$, $k = 1, 2, 3, \ldots, D$, $r_1$ is a random variable, $L_k$ is the lower limit and $U_k$ is the upper limit value, and $nPop$ denotes the population size [23]. The TLBO algorithm simulates a classical teaching process in two main stages: the teacher stage and the learner stage.
In the TLBO algorithm, the teacher represents the best solution obtained so far for the optimization problem. The teacher can therefore raise the mean result of the classroom to a certain level, which depends on the capability of the whole class. Let $M_k = (1/nPop) \sum_{i} X_{i,k}$ be the mean value of the particular subject, where $k = 1, 2, \ldots, D$. Equation (11) shows the update process:
$X_{i,k}^{new} = X_{i,k}^{old} + r_2 (X_{teacher,k} - T_F M_k)$ (11)
where $X_{teacher}$ is the best learner of the population at the current iteration of the algorithm, $r_2$ is a random number, and $T_F$ is a teaching factor that decides the value of the mean to be changed. In each iteration, $X_{i,k}^{new}$ is updated from the old value $X_{i,k}^{old}$; $X_{i,k}^{new}$ and $X_{i,k}^{old}$ denote the k-th variable of the i-th learner after and before it is updated by the teacher.
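The random initialization and the teacher stage described above can be sketched as follows. This is a minimal illustration of the standard TLBO teacher phase from the literature, not the authors' implementation. Several points are assumptions: the teaching factor is drawn as 1 or 2, candidates are clamped to the bounds, a candidate replaces a learner only if it is better, and a toy sphere function is minimized in place of the ELM-based accuracy used in the paper. All function names are illustrative.

```python
import random


def init_population(n_pop, lower, upper, rng):
    """Random initialization: X[i][k] = L_k + r1 * (U_k - L_k)."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n_pop)]


def teacher_phase(pop, fitness, lower, upper, rng):
    """One teacher-phase pass over the population (minimization)."""
    dim = len(lower)
    # Mean of each decision variable over the whole class.
    mean = [sum(x[k] for x in pop) / len(pop) for k in range(dim)]
    # The teacher is the best learner of the current population.
    teacher = min(pop, key=fitness)
    new_pop = []
    for x in pop:
        tf = rng.randint(1, 2)  # teaching factor, assumed to be 1 or 2
        cand = []
        for k in range(dim):
            r2 = rng.random()
            value = x[k] + r2 * (teacher[k] - tf * mean[k])
            # Assumption: keep the candidate inside the search bounds.
            cand.append(min(max(value, lower[k]), upper[k]))
        # Assumption: greedy acceptance, keep the better of old and new.
        new_pop.append(cand if fitness(cand) < fitness(x) else x)
    return new_pop


def sphere(x):
    """Toy objective (hypothetical stand-in for the real fitness)."""
    return sum(v * v for v in x)


rng = random.Random(0)
lower, upper = [-5.0] * 3, [5.0] * 3
pop = init_population(10, lower, upper, rng)
best_before = min(sphere(x) for x in pop)
pop = teacher_phase(pop, sphere, lower, upper, rng)
best_after = min(sphere(x) for x in pop)
```

Because of the greedy acceptance step, the best fitness in the population never gets worse from one pass to the next. For feature selection, the continuous positions would additionally have to be mapped to a feature mask, which this sketch does not cover.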

E. The Fitness Function
The fitness function must maximize the classification Accuracy achieved with the best attributes during the evolutionary process, which is defined as:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
where TP, TN, FP, and FN stand for True Positive, True Negative, False Positive, and False Negative, respectively.
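As a small worked example of this Accuracy measure, the sketch below counts the four confusion-matrix entries from true and predicted labels and evaluates the ratio. The helper names, the choice of "attack" as the positive class, and the toy labels are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Return (TP, TN, FP, FN) for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total


# Toy labels (hypothetical): 4 of the 5 predictions are correct.
y_true = ["attack", "attack", "normal", "normal", "normal"]
y_pred = ["attack", "normal", "normal", "normal", "normal"]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
```

With these toy labels the counts are TP = 1, TN = 3, FP = 0, FN = 1, giving an Accuracy of 4/5.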

III. EXPERIMENTAL RESULTS AND DISCUSSION
In the experiments, each feature is described by its name, its records, and a feature description. Most IDS numerical studies have been performed on NSL-KDD [24]. This dataset has varying data importance and feature integrity. Authors in [25] analyzed the deliberate intrusion dataset called KDD Cup 1999 [36]. Every record is tagged as

A. Results and Discussion
Different experimental tools and techniques were applied to the NSL-KDD dataset (see Table I) [14,25,26]. Classification performance is estimated using support vector machine classification with four performance measures. These performance measures, along with Accuracy, are [27][28][29]

B. Result Comparison
Tenfold cross-validation was applied to ELM [7] and to other classifiers, namely SVM [30,31] and NB, on the IDS dataset. Table I compares the performance of the proposed algorithm with existing known algorithms. The results show that the proposed algorithm performs better in terms of the number of features, detection rate (DR), false-positive rate (FPR), and Accuracy on the same dataset. The proposed method selects only five attributes, which can identify intrusion attacks in the network with maximum Accuracy.

IV. CONCLUSION
In this study, a novel hybrid model called EFS-TLBO with ELM is proposed to identify threats using the attribute selection algorithm [7], which increases the discriminative power for better class distinction. To demonstrate the superiority of the proposed technique, the NSL-KDD intrusion dataset was employed. The results show that the proposed technique substantially reduces the number of required features and outperforms the advanced attribute selection techniques from the literature. The practical results show that the suggested technique achieved an accuracy of 99.95% on the NSL-KDD intrusion dataset [36,37], surpassing the other techniques.
Future work will focus on multi-objective algorithms that combine ensemble filter and classification methods for pattern analysis and intrusion attack detection. Other optimization algorithms for ELM parameter optimization will also be investigated.