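Taiyuan University of Technology
Doctoral Dissertation
Research on High-Speed Network Intrusion Detection Based on Imbalanced Data Classification
Author: Zhao Yue'ai
Degree sought: Doctor
Major: Computer Application Technology
Supervisor: Chen Junjie
Date: 2010-03-01

Taiyuan University of Technology Doctoral Dissertation

Research on High-Speed Network Intrusion Detection Based on Imbalanced Data Classification

Abstract

With the rapid development of computer and network technologies, the damage caused by an endless stream of network attacks keeps growing, and network security faces severe challenges. Processing large volumes of network packets promptly and efficiently in a high-speed network environment, while lowering the false-alarm rate, is currently one of the main problems facing network intrusion detection systems.

This dissertation studies intrusion detection system models for high-speed network environments and proposes a two-stage intrusion detection model based on a load-balancing mechanism, the TSMBLB model. Building on this model, it proposes an attack classification method oriented toward hierarchical detection. Because the data an intrusion detection system must process is massive and imbalanced, the offline modeling stage of the TSMBLB model uses imbalanced-data classification techniques to build the detection models. The main contributions of this dissertation are as follows:

1. To address the insufficient processing speed of high-speed network intrusion detection systems, the TSMBLB model is proposed. This two-stage, load-balancing-based intrusion detection model consists of a detection stage and an offline modeling stage. In the detection stage, a load balancer distributes the data captured from the network across multiple detectors according to a load-balancing algorithm, and each detector submits its results to an analysis host for further processing.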
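The summary does not name the load-balancing algorithm. One common choice, assumed here purely for illustration, is flow-level hashing: the connection 5-tuple of each packet is hashed to pick a detector, so every packet of a connection reaches the same detector, and the identical function can later split the training set in the offline stage. The names `FlowKey`, `assign_detector`, and `split` are hypothetical; this is a minimal sketch, not the TSMBLB implementation.

```python
import zlib
from typing import List, NamedTuple, Sequence, Tuple, TypeVar

T = TypeVar("T")


class FlowKey(NamedTuple):
    """Hypothetical connection 5-tuple used to route packets."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


def assign_detector(key: FlowKey, n_detectors: int) -> int:
    """Map a packet's flow to a detector index in [0, n_detectors).

    Assumption: balancing is done per flow. The two endpoints are
    ordered first, so both directions of a connection hash identically.
    """
    if n_detectors < 1:
        raise ValueError("need at least one detector")
    a = (key.src_ip, key.src_port)
    b = (key.dst_ip, key.dst_port)
    lo, hi = sorted([a, b])
    raw = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{key.proto}".encode()
    # CRC32 gives the same value in every process, unlike the salted
    # built-in hash() for strings, so online and offline splits agree.
    return zlib.crc32(raw) % n_detectors


def split(records: Sequence[Tuple[FlowKey, T]], n_detectors: int) -> List[List[T]]:
    """Partition (flow key, payload) records into per-detector buckets.

    Reusing this function offline on labeled training records gives each
    detector a model trained on the same traffic share it inspects online.
    """
    buckets: List[List[T]] = [[] for _ in range(n_detectors)]
    for key, payload in records:
        buckets[assign_detector(key, n_detectors)].append(payload)
    return buckets
```

Ordering the endpoints before hashing keeps request and reply packets of one connection on the same detector, which matters for detectors that track connection state; a per-packet round-robin would balance load more evenly but would scatter each connection across detectors.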
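In the offline modeling stage, the existing dataset passes through a preprocessor and is split with the same load-balancing algorithm used in the detection stage; a model is then learned from each part separately, and the resulting models are used for detection. The model speeds up data processing through load balancing and uses anomaly detection to detect new attacks.

2. To make attack detection systematic and to construct efficient detection methods, an attack classification method oriented toward hierarchical detection is proposed. Based on the TSMBLB model, attacks are classified according to how early the system is able to detect them, and the detection tasks are assigned in stages to the respective detectors. The detectors follow the principle that an attack detected at a higher layer is not examined again at a lower layer. This guarantees a classification without duplicates, simplifies the detection process, and improves detection efficiency.

3. To address the low detection rate of network intrusion detection systems for minority-class attacks, an imbalanced-data classification framework based on the TSMBLB model is designed. It uses the Relief method for feature selection, an improved SMOTE oversampling method to enlarge the minority classes, and the ensemble learners AdaBoost and random forest to build the classifiers. The prediction models are evaluated with stratified 10-fold cross-validation, and classification performance is compared using precision, recall, F-measure, and ROC curves. Experimental results show that the framework can effectively improve the detection rate of minority-class attacks.

4. Because network packets contain a large number of "useless" and "noisy" samples, a fast hierarchical nearest-neighbor resampling method, FHNN, is proposed.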
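According to the dissertation's summary, FHNN divides the dataset into subsets, seeds a construction set for each subset with one randomly drawn sample per class, adds every sample that nearest-neighbor classification against the construction set gets wrong, and merges the per-subset results; new data are handled as one extra subset. The sketch below is one reading of that procedure, not the author's code: it assumes Euclidean distance, 1-NN, a single pass over each subset, and round-robin partitioning into roughly equal subsets. `condense_subset`, `fhnn`, and `fhnn_update` are hypothetical names.

```python
import math
import random
from typing import Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, class label)


def condense_subset(subset: List[Sample], rng: random.Random) -> List[Sample]:
    """One FHNN-style pass over a single subset (sketch).

    1. Seed the construction set with one random sample per class.
    2. Classify every other sample by 1-NN against the construction set
       and add it to the set only if the prediction is wrong.
    """
    by_class = {}
    for idx, (_, label) in enumerate(subset):
        by_class.setdefault(label, []).append(idx)
    seed_idx = {rng.choice(idxs) for idxs in by_class.values()}
    kept = [subset[i] for i in sorted(seed_idx)]
    for i, (x, y) in enumerate(subset):
        if i in seed_idx:
            continue
        # 1-NN prediction with Euclidean distance (assumption).
        _, nearest_label = min(kept, key=lambda s: math.dist(s[0], x))
        if nearest_label != y:
            kept.append((x, y))
    return kept


def fhnn(data: List[Sample], n_subsets: int, seed: int = 0) -> List[Sample]:
    """Split data into roughly equal subsets, condense each, and merge."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    target: List[Sample] = []
    for k in range(n_subsets):
        subset = shuffled[k::n_subsets]  # round-robin: sizes differ by <= 1
        if subset:
            target.extend(condense_subset(subset, rng))
    return target


def fhnn_update(target: List[Sample], new_data: List[Sample], seed: int = 0) -> List[Sample]:
    """Incremental update: condense the new data as one extra subset."""
    return target + condense_subset(new_data, random.Random(seed))
```

A subset of size m costs at most on the order of m² distance computations, so with the subset size held fixed the total work grows linearly with the number of samples, in line with the linear complexity the summary reports. Samples deep inside a class region are usually predicted correctly by that class's seed and are therefore dropped, which is how this sketch thins out redundant samples.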
The original dataset is divided into several subsets, each subset is resampled separately, and the resampled results are merged to form the target sample set. When resampling a subset, one sample is first drawn at random from each class of the subset to form a construction set; each sample of the subset is then classified by nearest-neighbor learning against the construction set, and any misclassified sample is added to the construction set. Results show that FHNN removes noisy data and redundant information well, especially samples lying inside class regions, and reduces both the degree of imbalance and the total number of samples; moreover, because its time complexity is linear, it runs very fast when the number of samples is large. In addition, when new data arrive, the target sample set can be updated simply by treating the new data as one subset, applying FHNN resampling to it, and merging the result into the existing target sample set.

Keywords: high-speed network, intrusion detection, attack classification, imbalanced data, ensemble learning, resampling

RESEARCH ON HIGH-SPEED NETWORK INTRUSION DETECTION BASED ON IMBALANCED DATA SETS CLASSIFICATION

ABSTRACT

With the rapid development of network technologies and applications, more and more network attack techniques pose a serious challenge to network security. Ensuring that a network intrusion detection system can analyze network data packets in real time and reduce false positives under high-speed network traffic is becoming a very important problem. After a thorough study of IDS models, this dissertation proposes a novel Two-stage Strategy High-speed Intrusion Detection System Model Based on Load Balancing (TSMBLB). Based on this model, we propose a new method of attack classification oriented toward hierarchical detection. Because intrusions typically make up only a very small fraction of total network traffic, the off-line phase applies imbalanced-data approaches to high-speed network intrusion detection. The main contributions of the dissertation are as follows:

1. To address the efficiency problem of high-speed network intrusion detection, this dissertation presents the Two-stage Strategy High-speed Intrusion Detection System Model Based on Load Balancing. The framework has two phases. In the on-line phase, the system captures packets from the network and splits them into smaller streams according to the load-balancing algorithm, and the detection results of each sensor are handed over to the control and analysis host. In the off-line phase, we split the training dataset with the same algorithm used in the on-line phase and then build classification patterns for each subset.
After the intrusion patterns are created, the module outputs them as the input of the corresponding sensor. We use load balancing to improve processing speed and anomaly detection techniques to detect new attacks.

2. To build an efficient, real-time intrusion detection method, we propose a way of attack classification oriented toward hierarchical detection. Based on the TSMBLB model, attacks are classified by the order in which they can be detected, detection proceeds hierarchically, and the tasks are assigned to the individual sensors: an attack that can be detected at a higher layer is not examined again at a lower layer. This not only ensures that the classification contains no repetition but also simplifies detection and improves its efficiency.

3. To solve the problem of the low detection rate for minority attacks in network intrusion detection, we design a classification framework for imbalanced data sets based on the TSMBLB model. We employ the Relief algorithm for feature selection and SMOTE over-sampling to increase the number of minority-class attack samples, and use the AdaBoost algorithm, with C4.5 or random forest as the weak classifier, to improve the detection of rare classes. The classification ability of the learning methods is measured with precision, recall, F-measure, and ROC curves obtained from stratified 10-fold cross-validation. Experiments show that the framework can dramatically reduce the time needed to build patterns and increase the detection rate of minority intrusions.

4. Because network data packets contain a large number of "useless" and "noisy" samples, we propose a resampling method for learning from imbalanced datasets: Fast Hierarchical Nearest Neighbor (FHNN). The basic idea of our method is to divide the current set more or less equally i