Chinese Academy of Sciences Pattern Recognition, Homework 3 (Chapter 5): Problems and Answers

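Chapter 5: Linear Discriminant Functions

Part 1: Computation and Proofs

1. Four samples in a two-dimensional space come from two classes. The first class has the samples $(1,4)^T$ and $(2,3)^T$, and the second class has $(4,1)^T$ and $(3,2)^T$, where the superscript $T$ denotes transposition. Assume the initial weight vector is $a=(0,1)^T$ and the gradient step size is fixed at $\eta_k = 1$. Use the batch perceptron algorithm to find the weight vector of the linear discriminant function $g(y)=a^T y$.

Solution: First normalize the samples by negating the second class, which gives $y_1=(1,4)^T$, $y_2=(2,3)^T$, $y_3=(-4,-1)^T$ and $y_4=(-3,-2)^T$. Then find the misclassified samples:
$g(y_1) = (0,1)(1,4)^T = 4 > 0$ (correct)
$g(y_2) = (0,1)(2,3)^T = 3 > 0$ (correct)
$g(y_3) = (0,1)(-4,-1)^T = -1 < 0$ (misclassified)
$g(y_4) = (0,1)(-3,-2)^T = -2 < 0$ (misclassified)
The misclassified set is therefore $\mathcal{Y} = \{(-4,-1)^T, (-3,-2)^T\}$. Its sum is $(-4,-1)^T + (-3,-2)^T = (-7,-3)^T$, so the first gradient-descent update is $a = (0,1)^T + (-7,-3)^T = (-7,-2)^T$.

Recompute the misclassified samples:
$g(y_1) = (-7,-2)(1,4)^T = -15 < 0$ (misclassified)
$g(y_2) = (-7,-2)(2,3)^T = -20 < 0$ (misclassified)
$g(y_3) = (-7,-2)(-4,-1)^T = 30 > 0$ (correct)
$g(y_4) = (-7,-2)(-3,-2)^T = 25 > 0$ (correct)
Now $\mathcal{Y} = \{(1,4)^T, (2,3)^T\}$ and its sum is $(1,4)^T + (2,3)^T = (3,7)^T$, so the second update is $a = (-7,-2)^T + (3,7)^T = (-4,5)^T$.

Recompute once more:
$g(y_1) = (-4,5)(1,4)^T = 16 > 0$ (correct)
$g(y_2) = (-4,5)(2,3)^T = 7 > 0$ (correct)
$g(y_3) = (-4,5)(-4,-1)^T = 11 > 0$ (correct)
$g(y_4) = (-4,5)(-3,-2)^T = 2 > 0$ (correct)
All samples are now correctly classified, so the algorithm stops with the weight vector $a=(-4,5)^T$.

2. For the linear perceptron algorithm, prove that after introducing a positive margin $b$, the solution region $a^T y_i \ge b$ lies inside the original solution region $a^T y_i > 0$, and that its boundary is at distance $b/\|y_i\|$ from the original boundary.

Proof: If $a^*$ satisfies $a^{*T} y_i \ge b$, then, because $b > 0$, it also satisfies $a^{*T} y_i > 0$. Hence the solution region with margin lies inside the original solution region $a^T y_i > 0$. The boundary of the region $a^T y_i \ge b$ is the hyperplane $a^T y_i = b$, and the boundary of the region $a^T y_i > 0$ is $a^T y_i = 0$. In weight space these two hyperplanes are parallel, both having normal vector $y_i$, so the distance between them is $b/\|y_i\|$. (Because $a^T y_i = 0$ passes through the origin, this distance equals the distance from the origin to $a^T y_i = b$.)

3. Prove that the perceptron criterion function is proportional to the sum of the distances from the misclassified samples to the decision surface.

Proof: The perceptron criterion function is
$$J_p(a) = \sum_{y \in \mathcal{Y}} \left(-a^T y\right),$$
where $\mathcal{Y}$ is the set of misclassified samples, and the decision surface is $a^T y = 0$. For a misclassified sample $a^T y \le 0$, so its distance to the decision surface is $-a^T y/\|a\|$. The sum of the distances from all misclassified samples to the decision surface is therefore
$$r = \sum_{y \in \mathcal{Y}} \frac{-a^T y}{\|a\|} = \frac{J_p(a)}{\|a\|},$$
that is, $J_p(a) = \|a\|\, r$, which proves the claim.

4. For the multiclass case, consider the one-vs-all technique, i.e., construct $c$ linear discriminant functions
$$g_i(x) = w_i^T x + w_{i0}, \quad i = 1, 2, \ldots, c.$$
The decision rule is: if $g_i(x) > g_j(x)$ for all $j \ne i$, then $x$ is assigned to class $\omega_i$. Consider a pattern classifier for three classes in a two-dimensional space whose discriminant functions are
$g_1(x) = -x_1 + x_2$, $g_2(x) = x_1 + x_2 - 1$, $g_3(x) = -x_2$.
Sketch the decision surfaces and explain why there is no region of classification ambiguity in this case.

Solution: By the decision rule, the region of class $\omega_1$ must satisfy $g_1(x) > g_2(x)$ and $g_1(x) > g_3(x)$, so the decision boundaries of $\omega_1$ are
$g_1(x) - g_2(x) = -2x_1 + 1 = 0$ and $g_1(x) - g_3(x) = -x_1 + 2x_2 = 0$.
Similarly, the region of class $\omega_2$ must satisfy $g_2(x) > g_1(x)$ and $g_2(x) > g_3(x)$, so the decision boundaries of $\omega_2$ are
$g_2(x) - g_1(x) = 2x_1 - 1 = 0$ and $g_2(x) - g_3(x) = x_1 + 2x_2 - 1 = 0$.
The region of class $\omega_3$ must satisfy $g_3(x) > g_1(x)$ and $g_3(x) > g_2(x)$, so the decision boundaries of $\omega_3$ are
$g_3(x) - g_1(x) = x_1 - 2x_2 = 0$ and $g_3(x) - g_2(x) = -x_1 - 2x_2 + 1 = 0$.

[Figure: the decision boundaries $g_1(x) - g_2(x) = -2x_1 + 1 = 0$, $g_2(x) - g_3(x) = x_1 + 2x_2 - 1 = 0$ and $g_3(x) - g_1(x) = x_1 - 2x_2 = 0$ in the $(x_1, x_2)$ plane, with the decision regions of classes 1, 2 and 3 labeled.]

Because the three decision boundaries intersect at a single point, $(x_1, x_2) = (1/2, 1/4)$, there is no ambiguous region. The reason is that the intersection of the lines $g_1(x) - g_2(x) = 0$ and $g_1(x) - g_3(x) = 0$ must also lie on the line $(g_1(x) - g_3(x)) - (g_1(x) - g_2(x)) = g_2(x) - g_3(x) = 0$; that is, $g_2(x) - g_3(x) = 0$ passes through their intersection. Equivalently, every point $x$ has a largest $g_i(x)$, so every point that is not on a boundary is assigned to exactly one class.

5. Given the sample set $\omega_1 = \{(0,0)^T, (1,1)^T\}$, $\omega_2 = \{(0,1)^T, (1,0)^T\}$, use the squared-error criterion algorithm (i.e., the Ho-Kashyap algorithm) to verify that it is not linearly separable. (Hint: fix $\eta_k = 1$ during the iterations and start from $b = (1,1,1,1)^T$.)

Solution: First write the samples in augmented (homogeneous) form, then normalize them by negating the second-class samples. This gives the normalized augmented training matrix
$$Y = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & -1 & -1 \\ -1 & 0 & -1 \end{pmatrix}.$$
The pseudoinverse of $Y$ is
$$Y^{+} = (Y^T Y)^{-1} Y^T = \frac{1}{4}\begin{pmatrix} -2 & 2 & 2 & -2 \\ -2 & 2 & -2 & 2 \\ 3 & -1 & -1 & -1 \end{pmatrix}.$$
First iteration: $a = Y^{+} b = (0,0,0)^T$, and the error is $e = Ya - b = (-1,-1,-1,-1)^T$.
There is no need to update $b$: we can already conclude that the inequality system $Ya > 0$ has no solution, because $e \ne 0$ and none of its components is positive (here all of them are negative). By the convergence theory of the Ho-Kashyap algorithm, the sample set is therefore not linearly separable.

6. Consider the hyperplane used in discrimination:
(a) Show that the distance from the hyperplane $g(x) = w^T x + w_0 = 0$ to the point $x_a$ is $|g(x_a)|/\|w\|$ by minimizing $\|x - x_a\|^2$ subject to the constraint $g(x) = 0$. (Hint: two things must be shown. First, the distance from the point $x_a$ to the hyperplane $g(x) = 0$ is $|g(x_a)|/\|w\|$; second, this is the distance from $x_a$ to the point $x$ on the hyperplane $g(x) = 0$ that minimizes the objective $\|x - x_a\|^2$.)
(b) Show that the projection $x_p$ of the point $x_a$ onto the hyperplane $g(x) = 0$ is given by
$$x_p = x_a - \frac{g(x_a)}{\|w\|^2}\, w.$$

Proof (a): Minimize $\|x - x_a\|^2$ subject to $w^T x + w_0 = 0$ using the Lagrangian $L(x, \lambda) = \|x - x_a\|^2 + 2\lambda (w^T x + w_0)$. Setting $\nabla_x L = 2(x - x_a) + 2\lambda w = 0$ gives $x = x_a - \lambda w$. Substituting into the constraint gives $w^T x_a - \lambda \|w\|^2 + w_0 = 0$, so $\lambda = g(x_a)/\|w\|^2$. Since the objective is strictly convex and the constraint is linear, this stationary point is the minimizer:
$$x_p = x_a - \frac{g(x_a)}{\|w\|^2}\, w,$$
and the minimum distance is $\|x_p - x_a\| = |g(x_a)|\,\|w\|/\|w\|^2 = |g(x_a)|/\|w\|$.

Proof (b): The result follows directly from the expression for the minimizer $x_p$ obtained in the proof of (a).

Part 2: Programming Problems

Data used in this chapter: [Table: training samples of classes $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$]

1. Write a program to implement the "batch perceptron" algorithm (see page 44 or 45 of the PPT).
(a) Starting with $a = 0$, apply your program to the training data from $\omega_1$ and $\omega_2$. Note the number of iterations required for convergence (i.e., record the number of steps until convergence).
(b) Apply your program to the training data from $\omega_3$ and $\omega_2$. Again, note the number of iterations required for convergence.
(c) Explain the difference between the numbers of iterations required in the two cases.

2. Implement the Ho-Kashyap algorithm and apply it to the training data from $\omega_1$ and $\omega_3$. Repeat for the training data from $\omega_2$ and $\omega_4$. Report the training errors and give some analysis.

3. Consider the relaxation methods described in the PPT (see the slides for the "Batch Relaxation with Margin" algorithm and page 62 of the PPT for the "Single-Sample Relaxation with Margin" algorithm):
(a) Implement batch relaxation with margin, set $b = 0.1$, initialize $a = 0$, and apply it to the data in $\omega_1$ and $\omega_3$. Plot the criterion function as a function of the number of passes through the training set.
(b) Repeat for $b = 0.5$ and $a_0 = 0$ (that is, initialize $a = 0$). Explain qualitatively any differences you find in the convergence rates.
(c) Modify your program to use single-sample learning. Again, plot the criterion function as a function of the number of passes through the training set.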
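The batch perceptron rule worked by hand in Problem 1 of Part 1, and requested in programming problem 1, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the slides' reference code: the helper names `dot`, `normalize` and `batch_perceptron` are hypothetical, a sample counts as misclassified when $a^T y \le 0$ (so the start $a = 0$ misclassifies every sample), and augmentation prepends a constant 1. The course data for $\omega_1$ to $\omega_4$ is not reproduced here, so the usage reruns Problem 1.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper names; a sketch of the batch perceptron rule,
# not the course slides' reference code.


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normalize(class1, class2, augment: bool = True) -> List[list]:
    """Optionally prepend a constant 1 (bias term), keep class-1 samples and
    negate class-2 samples, so a solution satisfies a^T y > 0 for all y."""
    def prep(x):
        return [1.0, *x] if augment else list(x)
    return [prep(x) for x in class1] + [[-v for v in prep(x)] for x in class2]


def batch_perceptron(samples, a0, eta=1.0, max_updates=1000) -> Tuple[list, int, bool]:
    """Repeat a <- a + eta * sum{y : a^T y <= 0} until nothing is misclassified.

    Returns (weight vector, number of updates, converged flag).
    """
    a = list(a0)
    updates = 0
    while True:
        wrong = [y for y in samples if dot(a, y) <= 0]
        if not wrong:
            return a, updates, True
        if updates == max_updates:
            return a, updates, False
        total = [sum(y[i] for y in wrong) for i in range(len(a))]
        a = [ai + eta * ti for ai, ti in zip(a, total)]
        updates += 1


if __name__ == "__main__":
    # Problem 1 of Part 1: no augmentation, a = (0, 1)^T, eta = 1.
    Y = normalize([(1, 4), (2, 3)], [(4, 1), (3, 2)], augment=False)
    print(batch_perceptron(Y, a0=(0, 1), eta=1))  # ([-4, 5], 2, True)
```

Counting `updates` is one way to report "the number of iterations required for convergence" in parts (a) and (b); whether the slides count weight updates or passes over the data is an assumption to check.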
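The Ho-Kashyap procedure used in Problem 5 and requested in programming problem 2 can be sketched as below, assuming the textbook form $a = Y^{+} b$, $e = Ya - b$, $b \leftarrow b + 2\eta e^{+}$ with $e^{+} = (e + |e|)/2$. To stay dependency-free, the sketch solves the normal equations $(Y^T Y) a = Y^T b$ with exact `Fraction` arithmetic instead of forming $Y^{+}$, which assumes $Y$ has full column rank. The names `solve` and `ho_kashyap` are hypothetical, and the stopping tests (all $Ya > 0$ means separable; $e \le 0$ with $e \ne 0$ means not separable) follow the convergence argument quoted in Problem 5.

```python
from fractions import Fraction

# Hypothetical helper names; a sketch, not the course's reference code.


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination (exact for Fractions).

    Assumes A is square and nonsingular; raises StopIteration otherwise.
    """
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def ho_kashyap(Y, b0, eta=1, max_iter=1000, tol=1e-9):
    """Ho-Kashyap on normalized, augmented rows Y with initial margins b0 > 0.

    Each step: a = Y^+ b (via the normal equations), e = Y a - b,
    b <- b + 2*eta*e_plus with e_plus = (e + |e|) / 2.
    Returns (status, a, b, e); status is "separable", "not separable"
    or "max_iter".
    """
    Y = [[Fraction(v) for v in row] for row in Y]
    b = [Fraction(v) for v in b0]
    d = len(Y[0])
    YtY = [[sum(row[i] * row[j] for row in Y) for j in range(d)] for i in range(d)]
    a = e = None
    for _ in range(max_iter):
        Ytb = [sum(row[i] * bi for row, bi in zip(Y, b)) for i in range(d)]
        a = solve(YtY, Ytb)
        Ya = [sum(yi * ai for yi, ai in zip(row, a)) for row in Y]
        e = [v - bi for v, bi in zip(Ya, b)]
        if all(v > 0 for v in Ya):
            return "separable", a, b, e      # a^T y > 0 for every sample
        if all(ei <= tol for ei in e):
            return "not separable", a, b, e  # e <= 0 with some e_i < 0
        b = [bi + 2 * eta * (ei + abs(ei)) / 2 for bi, ei in zip(b, e)]
    return "max_iter", a, b, e


if __name__ == "__main__":
    # Problem 5: rows are (x1, x2, 1), with the class-2 rows negated.
    Y5 = [[0, 0, 1], [1, 1, 1], [0, -1, -1], [-1, 0, -1]]
    print(ho_kashyap(Y5, [1, 1, 1, 1]))
```

On the Problem 5 matrix this stops at the first iteration with $a = 0$ and $e = (-1,-1,-1,-1)^T$, matching the hand computation. Converting float data to `Fraction` is exact but can produce large denominators, which is acceptable for homework-sized sets.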
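For programming problem 3, batch and single-sample relaxation with margin can be sketched as below, assuming the criterion $J_r(a) = \frac{1}{2}\sum_{a^T y \le b} (a^T y - b)^2/\|y\|^2$ and the update $a \leftarrow a + \eta \sum (b - a^T y)\, y/\|y\|^2$ over the samples with $a^T y \le b$ (applied one sample at a time in the single-sample version). The function names, the default $\eta = 1$, and logging $J_r$ once at the start of each pass are assumptions; relaxation rules are usually run with $0 < \eta < 2$. Plotting is omitted to keep the block dependency-free; the returned `history` list is what would be plotted against the pass index.

```python
from typing import Sequence

# Hypothetical helper names; a sketch of relaxation with margin, not the
# slides' reference code. Samples are assumed normalized (class 2 negated).


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def relaxation_criterion(a, samples, b) -> float:
    """J_r(a) = 1/2 * sum over {y : a^T y <= b} of (a^T y - b)^2 / ||y||^2."""
    return 0.5 * sum((dot(a, y) - b) ** 2 / dot(y, y)
                     for y in samples if dot(a, y) <= b)


def batch_relaxation(samples, b, a0, eta=1.0, max_passes=100):
    """One update per pass using every y with a^T y <= b.

    Returns (a, criterion value at the start of each pass, converged flag).
    """
    a = list(a0)
    history = []
    for _ in range(max_passes):
        history.append(relaxation_criterion(a, samples, b))
        wrong = [y for y in samples if dot(a, y) <= b]
        if not wrong:
            return a, history, True
        step = [0.0] * len(a)
        for y in wrong:  # all terms use the same a from the start of the pass
            coef = (b - dot(a, y)) / dot(y, y)
            step = [s + coef * yi for s, yi in zip(step, y)]
        a = [ai + eta * si for ai, si in zip(a, step)]
    return a, history, False


def single_sample_relaxation(samples, b, a0, eta=1.0, max_passes=100):
    """Cycle through the samples, updating a immediately whenever a^T y <= b."""
    a = list(a0)
    history = []
    for _ in range(max_passes):
        history.append(relaxation_criterion(a, samples, b))
        updated = False
        for y in samples:
            if dot(a, y) <= b:
                coef = (b - dot(a, y)) / dot(y, y)
                a = [ai + eta * coef * yi for ai, yi in zip(a, y)]
                updated = True
        if not updated:  # a full pass with a^T y > b for every sample
            return a, history, True
    return a, history, False


if __name__ == "__main__":
    # Normalized samples of Problem 1 in Part 1, used only as a stand-in.
    Y = [[1, 4], [2, 3], [-4, -1], [-3, -2]]
    for b in (0.1, 0.5):
        _, hist, ok = batch_relaxation(Y, b=b, a0=[0.0, 0.0], max_passes=20)
        print(b, ok, [round(h, 6) for h in hist[:5]])
```

With $\eta = 1$, a single-sample update moves $a$ exactly onto the hyperplane $a^T y = b$ of the sample being corrected, so that sample still counts as violating the margin on the next pass; this is one reason to compare several values of $\eta$ when explaining convergence rates.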
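Problem 3 of Part 1 can be checked numerically on the normalized samples of Problem 1, at the weight vector $a = (-7,-2)^T$ reached after the first batch update. There the misclassified samples are $y_1$ and $y_2$ with $a^T y = -15$ and $-20$, so $J_p(a) = 35$, and this should equal $\|a\|$ times the summed distances of those samples to $a^T y = 0$. The helper names below are hypothetical.

```python
import math

# Hypothetical helper names for a numeric check of Problem 3.


def margins(a, samples):
    """Return a^T y for every normalized sample y."""
    return [sum(ai * yi for ai, yi in zip(a, y)) for y in samples]


def perceptron_criterion(a, samples):
    """J_p(a): sum of -a^T y over the misclassified samples (a^T y <= 0)."""
    return sum(-m for m in margins(a, samples) if m <= 0)


def misclassified_distance_sum(a, samples):
    """Sum of distances -a^T y / ||a|| from misclassified y to a^T y = 0."""
    norm = math.hypot(*a)
    return sum(-m / norm for m in margins(a, samples) if m <= 0)


Y = [[1, 4], [2, 3], [-4, -1], [-3, -2]]  # normalized samples of Problem 1
a = [-7, -2]                               # weight vector after the first update
J = perceptron_criterion(a, Y)
r = misclassified_distance_sum(a, Y)
print(J, math.hypot(*a) * r)
```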
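For Problem 4 of Part 1, the decision rule can be checked point by point: every $x$ has at least one largest $g_i(x)$, and ties occur only on the decision boundaries. The sketch below (hypothetical helper names) returns all maximizing class indices, so a single index is an unambiguous decision and several indices mark a boundary point, such as the common intersection $(1/2, 1/4)$ or the origin, which lies on $x_1 - 2x_2 = 0$.

```python
# Hypothetical helpers for Problem 4 of Part 1 (three-class linear machine).


def discriminants(x1, x2):
    """g1, g2, g3 from Problem 4."""
    return [-x1 + x2, x1 + x2 - 1, -x2]


def decide(x1, x2, tol=1e-12):
    """Return every class index (1-based) whose g_i attains the maximum.

    A single index is an unambiguous decision; several indices mean x lies
    on a decision boundary (ties are the only non-unique case).
    """
    g = discriminants(x1, x2)
    top = max(g)
    return [i + 1 for i, gi in enumerate(g) if top - gi <= tol]


if __name__ == "__main__":
    for point in [(0, 1), (1, 1), (1, -1), (0, 0), (0.5, 0.25)]:
        print(point, decide(*point))
```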
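For Problem 6 of Part 1, the sketch below evaluates the distance $|g(x_a)|/\|w\|$ and the projection $x_p = x_a - g(x_a)\,w/\|w\|^2$, then brute-forces the minimization in part (a) for the 2-D case by walking along the line $g(x) = 0$. The hyperplane $w = (3,4)^T$, $w_0 = -5$, the point $x_a = (3,4)^T$ and the helper name are made-up example values, not data from the homework.

```python
import math


def distance_and_projection(w, w0, xa):
    """Hypothetical helper: distance from xa to w^T x + w0 = 0 and the
    orthogonal projection of xa onto that hyperplane."""
    g = sum(wi * xi for wi, xi in zip(w, xa)) + w0  # g(xa)
    norm_sq = sum(wi * wi for wi in w)               # ||w||^2
    dist = abs(g) / math.sqrt(norm_sq)               # |g(xa)| / ||w||
    xp = [xi - g * wi / norm_sq for xi, wi in zip(xa, w)]
    return dist, xp


# Made-up example values (not from the homework).
w, w0, xa = [3.0, 4.0], -5.0, [3.0, 4.0]
dist, xp = distance_and_projection(w, w0, xa)

# Brute-force check of part (a) in 2-D: the points xp + s * (-w2, w1) all lie
# on g(x) = 0, and none of them should be closer to xa than xp itself.
direction = [-w[1], w[0]]
closest = min(
    math.dist([p + s / 100 * d for p, d in zip(xp, direction)], xa)
    for s in range(-500, 501)
)
print(dist, xp, closest)
```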

展开阅读全文