Matrix Operations: Gradient and Differentiation Formulas


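Appendix D Matrix calculus

From too much study, and from extreme passion, cometh madnesse.
Isaac Newton [150, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is defined in terms of partial derivatives

$$\nabla f(x) \triangleq \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_K} \end{bmatrix} \in \mathbb{R}^K \tag{1719}$$

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian;

$$\nabla^2 f(x) \triangleq \begin{bmatrix} \dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\ \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(x)}{\partial x_K \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2} \end{bmatrix} \in \mathbb{S}^K \tag{1720}$$

© 2001 Jon Dattorro.

[...] For a vector-valued function $h(x) : \mathbb{R}^K \to \mathbb{R}^N$ on vector domain, the second-order gradient has a three-dimensional representation (a cubix [D.1]):

$$\nabla^2 h(x) \triangleq \begin{bmatrix} \nabla\dfrac{\partial h_1(x)}{\partial x_1} & \nabla\dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_1} \\ \nabla\dfrac{\partial h_1(x)}{\partial x_2} & \nabla\dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_2} \\ \vdots & \vdots & & \vdots \\ \nabla\dfrac{\partial h_1(x)}{\partial x_K} & \nabla\dfrac{\partial h_2(x)}{\partial x_K} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_K} \end{bmatrix} = \begin{bmatrix} \nabla^2 h_1(x) & \nabla^2 h_2(x) & \cdots & \nabla^2 h_N(x) \end{bmatrix} \in \mathbb{R}^{K \times N \times K} \tag{1724}$$

where the gradient of each real entry is with respect to vector $x$ as in (1719).

[D.1] The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.

The gradient of real function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}$ on matrix domain is

$$\nabla g(X) \triangleq \begin{bmatrix} \dfrac{\partial g(X)}{\partial X_{11}} & \dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\ \dfrac{\partial g(X)}{\partial X_{21}} & \dfrac{\partial g(X)}{\partial X_{22}} & \cdots & \dfrac{\partial g(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g(X)}{\partial X_{K1}} & \dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K \times L}$$

$$= \begin{bmatrix} \nabla_{X(:,1)}\, g(X) & \nabla_{X(:,2)}\, g(X) & \cdots & \nabla_{X(:,L)}\, g(X) \end{bmatrix} \in \mathbb{R}^{K \times 1 \times L} \tag{1725}$$

where the gradient $\nabla_{X(:,i)}$ is with respect to the $i$th column of $X$. The strange appearance of (1725) in $\mathbb{R}^{K \times 1 \times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix). The second-order gradient has representation

$$\nabla^2 g(X) \triangleq \begin{bmatrix} \nabla\dfrac{\partial g(X)}{\partial X_{11}} & \nabla\dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\ \nabla\dfrac{\partial g(X)}{\partial X_{21}} & \nabla\dfrac{\partial g(X)}{\partial X_{22}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{2L}} \\ \vdots & \vdots & & \vdots \\ \nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \nabla\dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}} \end{bmatrix} \in \mathbb{R}^{K \times L \times K \times L}$$

$$= \begin{bmatrix} \nabla\nabla_{X(:,1)}\, g(X) & \nabla\nabla_{X(:,2)}\, g(X) & \cdots & \nabla\nabla_{X(:,L)}\, g(X) \end{bmatrix} \in \mathbb{R}^{K \times 1 \times L \times K \times L} \tag{1726}$$

where the gradient $\nabla$ is with respect to matrix $X$.

Gradient of vector-valued function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}^N$ on matrix domain is a cubix

$$\nabla g(X) \triangleq \begin{bmatrix} \nabla_{X(:,1)}\, g_1(X) & \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla_{X(:,1)}\, g_N(X) \\ \nabla_{X(:,2)}\, g_1(X) & \nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla_{X(:,2)}\, g_N(X) \\ \vdots & \vdots & & \vdots \\ \nabla_{X(:,L)}\, g_1(X) & \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla_{X(:,L)}\, g_N(X) \end{bmatrix} = \begin{bmatrix} \nabla g_1(X) & \nabla g_2(X) & \cdots & \nabla g_N(X) \end{bmatrix} \in \mathbb{R}^{K \times N \times L} \tag{1727}$$

while the second-order gradient has a five-dimensional representation;

$$\nabla^2 g(X) \triangleq \begin{bmatrix} \nabla\nabla_{X(:,1)}\, g_1(X) & \nabla\nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,1)}\, g_N(X) \\ \nabla\nabla_{X(:,2)}\, g_1(X) & \nabla\nabla_{X(:,2)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,2)}\, g_N(X) \\ \vdots & \vdots & & \vdots \\ \nabla\nabla_{X(:,L)}\, g_1(X) & \nabla\nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla\nabla_{X(:,L)}\, g_N(X) \end{bmatrix} = \begin{bmatrix} \nabla^2 g_1(X) & \nabla^2 g_2(X) & \cdots & \nabla^2 g_N(X) \end{bmatrix} \in \mathbb{R}^{K \times N \times L \times K \times L} \tag{1728}$$

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K \times L} \to \mathbb{R}^{M \times N}$ on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)

$$\nabla g(X) \triangleq \begin{bmatrix} \nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X) \\ \nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X) \\ \vdots & \vdots & & \vdots \\ \nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X) \end{bmatrix} \in \mathbb{R}^{M \times N \times K \times L} \tag{1729}$$

while the second-order gradient has six-dimensional representation

$$\nabla^2 g(X) \triangleq \begin{bmatrix} \nabla^2 g_{11}(X) & \nabla^2 g_{12}(X) & \cdots & \nabla^2 g_{1N}(X) \\ \nabla^2 g_{21}(X) & \nabla^2 g_{22}(X) & \cdots & \nabla^2 g_{2N}(X) \\ \vdots & \vdots & & \vdots \\ \nabla^2 g_{M1}(X) & \nabla^2 g_{M2}(X) & \cdots & \nabla^2 g_{MN}(X) \end{bmatrix} \in \mathbb{R}^{M \times N \times K \times L \times K \times L} \tag{1730}$$

and so on.

D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

$$\nabla_X \big( f(X)^T g(X) \big) = \nabla_X(f)\, g + \nabla_X(g)\, f \tag{1731}$$

while [51, §8.3] [309]

$$\nabla_X \operatorname{tr}\big( f(X)^T g(X) \big) = \nabla_X \Big( \operatorname{tr}\big( f(X)^T g(Z) \big) + \operatorname{tr}\big( g(X)\, f(Z)^T \big) \Big) \Big|_{Z \leftarrow X} \tag{1732}$$

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.

Suppose $f(X) : \mathbb{R}^{2 \times 2} \to \mathbb{R}^2 = X^T a$ and $g(X) : \mathbb{R}^{2 \times 2} \to \mathbb{R}^2 = X b$. We wish to find

$$\nabla_X \big( f(X)^T g(X) \big) = \nabla_X\, a^T X^2 b \tag{1733}$$

using the product rule. Formula (1731) calls for

$$\nabla_X\, a^T X^2 b = \nabla_X(X^T a)\, X b + \nabla_X(X b)\, X^T a \tag{1734}$$

Consider the first of the two terms:

$$\nabla_X(f)\, g = \nabla_X(X^T a)\, X b = \begin{bmatrix} \nabla (X^T a)_1 & \nabla (X^T a)_2 \end{bmatrix} X b \tag{1735}$$

The gradient of $X^T a$ forms a cubix in $\mathbb{R}^{2 \times 2 \times 2}$; a.k.a. third-order tensor.

$$\nabla_X(X^T a)\, X b = \begin{bmatrix} \left( \dfrac{\partial (X^T a)_1}{\partial X_{11}},\ \dfrac{\partial (X^T a)_2}{\partial X_{11}} \right) & \left( \dfrac{\partial (X^T a)_1}{\partial X_{12}},\ \dfrac{\partial (X^T a)_2}{\partial X_{12}} \right) \\ \left( \dfrac{\partial (X^T a)_1}{\partial X_{21}},\ \dfrac{\partial (X^T a)_2}{\partial X_{21}} \right) & \left( \dfrac{\partial (X^T a)_1}{\partial X_{22}},\ \dfrac{\partial (X^T a)_2}{\partial X_{22}} \right) \end{bmatrix} \begin{bmatrix} (X b)_1 \\ (X b)_2 \end{bmatrix} \in \mathbb{R}^{2 \times 1 \times 2} \tag{1736}$$

Because gradient of the product (1733) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in the second dimension of the cubix (indicated by dotted line segments in the original figure; here each parenthesized pair runs along that second dimension);

$$\nabla_X(X^T a)\, X b = \begin{bmatrix} (a_1,\ 0) & (0,\ a_1) \\ (a_2,\ 0) & (0,\ a_2) \end{bmatrix} \begin{bmatrix} b_1 X_{11} + b_2 X_{12} \\ b_1 X_{21} + b_2 X_{22} \end{bmatrix} \in \mathbb{R}^{2 \times 1 \times 2}$$

$$= \begin{bmatrix} a_1 (b_1 X_{11} + b_2 X_{12}) & a_1 (b_1 X_{21} + b_2 X_{22}) \\ a_2 (b_1 X_{11} + b_2 X_{12}) & a_2 (b_1 X_{21} + b_2 X_{22}) \end{bmatrix} \in \mathbb{R}^{2 \times 2}$$

$$= a b^T X^T \tag{1737}$$

where the cubix appears as a complete $2 \times 2 \times 2$ matrix. In like manner for the second term $\nabla_X(g)\, f$

$$\nabla_X(X b)\, X^T a = \begin{bmatrix} (b_1,\ 0) & (b_2,\ 0) \\ (0,\ b_1) & (0,\ b_2) \end{bmatrix} \begin{bmatrix} X_{11} a_1 + X_{21} a_2 \\ X_{12} a_1 + X_{22} a_2 \end{bmatrix} \in \mathbb{R}^{2 \times 1 \times 2}$$

$$= X^T a b^T \in \mathbb{R}^{2 \times 2} \tag{1738}$$

The solution

$$\nabla_X\, a^T X^2 b = a b^T X^T + X^T a b^T \tag{1739}$$

can be found from Table D.2.1 or verified using (1732). □

D.1.2.1 Kronecker product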
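Looking back at the cubix example of D.1.2, result (1739) can also be checked numerically. The sketch below is not part of the source text: it is a minimal pure-Python check, with hypothetical helper names (`phi`, `analytic_gradient`, `numeric_gradient`) and arbitrarily chosen sample values for $X$, $a$, and $b$. It compares $a b^T X^T + X^T a b^T$ against central finite differences of the scalar function $a^T X^2 b$, assuming the entry-wise layout of (1725), where entry $(i, j)$ of the gradient is the partial derivative with respect to $X_{ij}$.

```python
# Finite-difference check of (1739): grad_X a^T X^2 b = a b^T X^T + X^T a b^T.
# Pure Python, 2x2 case. Helper names and sample values are illustrative
# assumptions, not taken from the source text.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def outer(u, v):
    """Outer product u v^T of two vectors."""
    return [[ui * vj for vj in v] for ui in u]

def phi(X, a, b):
    """Scalar function a^T X^2 b."""
    Xb = [sum(X[i][k] * b[k] for k in range(2)) for i in range(2)]
    XXb = [sum(X[i][k] * Xb[k] for k in range(2)) for i in range(2)]
    return sum(a[i] * XXb[i] for i in range(2))

def analytic_gradient(X, a, b):
    """Right-hand side of (1739)."""
    abT = outer(a, b)
    Xt = transpose(X)
    first = matmul(abT, Xt)    # a b^T X^T, the term from (1737)
    second = matmul(Xt, abT)   # X^T a b^T, the term from (1738)
    return [[first[i][j] + second[i][j] for j in range(2)] for i in range(2)]

def numeric_gradient(X, a, b, h=1e-4):
    """Central differences; entry (i, j) approximates d phi / d X_ij."""
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (phi(Xp, a, b) - phi(Xm, a, b)) / (2 * h)
    return G

X = [[1.0, 2.0], [3.0, 4.0]]
a = [0.5, -1.0]
b = [2.0, 3.0]

exact = analytic_gradient(X, a, b)
approx = numeric_gradient(X, a, b)
max_error = max(abs(exact[i][j] - approx[i][j])
                for i in range(2) for j in range(2))
print(exact)
print(max_error)
```

Because $a^T X^2 b$ is quadratic in the entries of $X$, central differences carry no truncation error here, so the printed discrepancy should reflect floating-point rounding only. For these sample values the analytic gradient works out to [[-1, 1.5], [-14, -27]].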
