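摘要 (Abstract)

An electronic nose is an instrument that mimics the olfactory system of humans and other mammals. An electronic nose system generally consists of three parts: a sensor array, a preprocessing system, and a pattern recognition system. Feature extraction is a dimensionality-reducing data preprocessing method: through a transformation, it maps a high-dimensional sample space to a low-dimensional space, extracting useful information and suppressing noise. It reduces the amount of data to be processed, raises the efficiency of pattern recognition, and can effectively improve the performance of an electronic nose. Feature extraction can map the original high-dimensional sample space to a two- or three-dimensional space in which the structure and distribution of the original data can be explored in a preliminary way. The mapping can also reduce the noise in the original sample space and yield a smaller sample set with higher information content. This addresses the problems caused by excessive dimensionality during pattern recognition, which in turn raises recognition efficiency and improves the recognition performance of the electronic nose system.

Through the qualitative identification of flammable liquids and the quantitative analysis of four volatile organic compounds, this thesis compares the characteristics of four typical feature extraction methods and analyzes and discusses their dimensionality reduction performance in qualitative identification and quantitative analysis. The four methods are principal component analysis (PCA), Fisher discriminant analysis (FDA), Sammon mapping, and the self-organizing map (SOM).

In the qualitative identification experiment, a sensor array of 6 metal oxide sensors was used to test four flammable liquids (gasoline, alcohol, kerosene, and diesel) and three beverages (orange juice, cola, and iced black tea). The original dataset was processed with each of the four feature extraction methods, and the resulting sample sets were then classified with three pattern recognition methods. The results show that, under every feature extraction method, the flammable and non-flammable classes can be distinguished with a correct recognition rate of 100%. Only with Fisher discriminant preprocessing, followed by stepwise recognition, did the recognition rate for the individual subclasses within the flammable and non-flammable classes reach 96%. The best projection dimension for each feature extraction method depends on how the samples are distributed in the high-dimensional space and on the mapping criterion of the method itself, while the best pattern recognition method depends on the distribution of the sample data.

In the quantitative analysis experiment, a sensor array of 14 metal oxide sensors was used to test four toxic and harmful volatile organic compounds (benzene, acetone, methanol, and n-pentane) at four concentrations: 100 ppm, 200 ppm, 300 ppm, and 400 ppm. Based on the sample distribution in the quantitative analysis, the original high-dimensional sample space was reduced to 6 dimensions with PCA, 3 dimensions with FDA, 3 dimensions with Sammon mapping, and 2 dimensions with SOM, and the samples at each concentration were recognized and analyzed in each reduced space. The pattern recognition results show that, after processing by any of the feature extraction methods, samples of different compounds and different concentrations were all recognized well. The sample set processed by FDA reached a recognition rate of 100%. Feature extraction can therefore describe the concentration and class information of the samples with only a few features and still achieve good recognition performance.

The thesis consists of four parts. The first part introduces the working principle and current applications of electronic nose systems, the background, purpose, and significance of research on feature extraction methods, and the state of research in China and abroad, and then sets out the content and methods of this study. The second part describes the principles and implementation steps of the four feature extraction methods in detail and compares them at the level of principle. The third part uses the qualitative identification of flammable liquids to compare the characteristics and dimensionality reduction performance of the four methods, and analyzes the best projection dimension and the corresponding best pattern recognition method for each. The fourth part uses the quantitative analysis of four volatile organic compounds to examine whether feature extraction is feasible for quantitative analysis, and compares the characteristics of the four methods and their ability to extract concentration information from the sample space.

Keywords: feature extraction; electronic nose; principal component analysis; Fisher discriminant analysis; Sammon mapping; self-organizing map; qualitative identification; quantitative analysis

Abstract

An electronic nose is an instrument that mimics the human sense of smell. An electronic nose system is generally composed of a sensor array, a signal preprocessing stage, and a pattern recognition stage. Feature extraction is a signal preprocessing method for dimensionality reduction that projects a high-dimensional space onto a low-dimensional space. The projection can reduce or eliminate noise, which makes subsequent pattern recognition more efficient. As a first step in data analysis, feature extraction methods can retain most of the sample information needed for classification in a new two- or three-dimensional projection space and show the sample distribution directly. In addition, the smaller but more informative input space obtained by feature extraction can improve computation speed.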
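The projection described above can be illustrated with a minimal sketch of PCA, the first of the four methods compared in this work. The code is an illustration under stated assumptions, not the implementation used in the thesis: the function names (`mean_center`, `covariance`, `top_eigenpair`, `pca_project`) are hypothetical, the eigenvectors are found by simple power iteration with deflation rather than by a numerical library, and the toy readings are made-up values, not measured sensor data.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-column mean so each sensor channel has zero mean."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def mat_vec(m, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(cov, iters=500, seed=0):
    """Largest eigenvalue and a unit eigenvector, by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to extract
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
    return eigval, v


def pca_project(rows, k):
    """Return the first k principal directions and the projected samples."""
    centered = mean_center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(k):
        eigval, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    scores = [[sum(a * b for a, b in zip(r, c)) for c in components]
              for r in centered]
    return components, scores


if __name__ == "__main__":
    # Made-up readings from three hypothetical sensors (illustration only).
    toy = [[1.0, 2.1, 2.0], [2.0, 3.9, 4.1], [3.0, 6.2, 5.9], [4.0, 8.0, 8.1]]
    components, scores = pca_project(toy, k=2)
    for s in scores:
        print([round(x, 3) for x in s])
```

Each row of `scores` is one sample expressed in the reduced space, so a 6- or 14-sensor reading would shrink to `k` numbers in the same way. The other three methods differ in the criterion they optimize and are not covered by this sketch.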

Problems arising from high dimensionality and large data size can also be alleviated. Because of these advantages, feature extraction plays an important role in improving the performance of the electronic nose.

This paper discusses the characteristics of four feature extraction methods through two experiments: a qualitative analysis of four flammable liquids and a quantitative analysis of four volatile organic compounds. The characteristics, dimensionality reduction capability, and best projection-space dimension of the four techniques were compared. The four feature extraction methods are principal component analysis (PCA), Fisher discriminant analysis (FDA), self-organizing mapping (SOM), and Sammon mapping. Three pattern recognition methods were used to assess the feature extraction techniques by their correct recognition rate: the k-nearest neighbor method (KNN), the probabilistic neural network (PNN), and the error back-propagation method (BP).

In the qualitative analysis of four flammable liquids, 6 metal oxide sensors (MOS) formed the sensor array of the electronic nose. The samples were four flammable liquids (gasoline, alcohol, kerosene, and diesel oil) and three ordinary beverages (orange juice, Coca-Cola, and black tea). The original high-dimensional sample dataset was first preprocessed by feature extraction, and the ability of each feature extraction method to separate the different kinds of samples was then judged with the pattern recognition techniques. The results showed that flammable liquids and beverages could be distinguished with 100% accuracy. With FDA and hierarchical classification, the individual flammable liquids and beverages could also be recognized. The best projection-space dimension depended on the feature extraction technique, while the best pattern recognition technique depended strongly on the dataset.

In the quantitative analysis of four volatile organic compounds (VOCs), the sensor array was composed of 14 MOS gas sensors. The samples were four VOCs (benzene, acetone, methanol, and pentane) at four concentrations: 100 ppm, 200 ppm, 300 ppm, and 400 ppm. The pattern recognition results indicated that all the samples could be correctly recognized, and FDA performed better than the other methods. A small number of features obtained by feature extraction could efficiently represent both the concentration and the species of a sample.

The paper is divided into four parts.
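The evaluation idea described above, scoring a feature extraction method by the correct recognition rate of a classifier applied to the reduced features, can be sketched with a small k-nearest-neighbor example. This is a hedged illustration only: the source does not state how the recognition rate was estimated, so the leave-one-out scheme, the function names (`knn_predict`, `leave_one_out_rate`), and the two-dimensional toy features are all assumptions made for this sketch, not the thesis procedure or data.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_rate(features, labels, k=3):
    """Fraction of samples classified correctly when each is held out in turn."""
    hits = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        rest_x = features[:i] + features[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, x, k) == y:
            hits += 1
    return hits / len(features)


if __name__ == "__main__":
    # Made-up 2-D features standing in for samples after feature extraction.
    features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05),
                (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.9, 5.1)]
    labels = ["flammable"] * 4 + ["beverage"] * 4
    print(leave_one_out_rate(features, labels, k=3))
```

Running the same scoring function on the output of each feature extraction method, with the classifier held fixed, gives one way to compare the methods on equal terms.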



