Matrix Differentiation
CS5240 Theoretical Foundations in Multimedia
Leow Wee Kheng
Department of Computer Science, School of Computing, National University of Singapore

Leow Wee Kheng (NUS) Matrix Differentiation 1 / 34

Linear Fitting Revisited

Linear fitting solves this problem: Given $n$ data points $\mathbf{p}_i = [x_{i1} \cdots x_{im}]^\top$, $1 \le i \le n$, and their corresponding values $v_i$, find a linear function $f$ that minimizes the error
\[ E = \sum_{i=1}^{n} \bigl( f(\mathbf{p}_i) - v_i \bigr)^2. \tag{1} \]
The linear function $f(\mathbf{p}_i)$ has the form
\[ f(\mathbf{p}) = f(x_1, \ldots, x_m) = a_1 x_1 + \cdots + a_m x_m + a_{m+1}. \tag{2} \]

The data points are organized into a matrix equation
\[ \mathbf{D}\mathbf{a} = \mathbf{v}, \tag{3} \]
where
\[
\mathbf{D} = \begin{bmatrix} x_{11} & \cdots & x_{1m} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n1} & \cdots & x_{nm} & 1 \end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a_1 \\ \vdots \\ a_m \\ a_{m+1} \end{bmatrix}, \quad \text{and} \quad
\mathbf{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}. \tag{4}
\]
The solution of Eq. 3 is
\[ \mathbf{a} = (\mathbf{D}^\top \mathbf{D})^{-1} \mathbf{D}^\top \mathbf{v}. \tag{5} \]

Denote each row of $\mathbf{D}$ as $\mathbf{d}_i^\top$. Then,
\[ E = \sum_{i=1}^{n} (\mathbf{d}_i^\top \mathbf{a} - v_i)^2 = \|\mathbf{D}\mathbf{a} - \mathbf{v}\|^2. \tag{6} \]
So, linear least squares problem can be described very compactly as
\[ \min_{\mathbf{a}} \|\mathbf{D}\mathbf{a} - \mathbf{v}\|^2. \tag{7} \]
To show that the solution in Eq. 5 minimizes error $E$, need to differentiate $E$ with respect to $\mathbf{a}$ and set it to zero:
\[ \frac{dE}{d\mathbf{a}} = \mathbf{0}. \tag{8} \]
How to do this differentiation?

The obvious (but hard) way:
\[ E = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr)^2. \tag{9} \]
Expand equation explicitly giving
\[
\frac{\partial E}{\partial a_k} =
\begin{cases}
2 \displaystyle\sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr) x_{ik}, & \text{for } k \neq m + 1, \\[2ex]
2 \displaystyle\sum_{i=1}^{n} \Bigl( \sum_{j=1}^{m} a_j x_{ij} + a_{m+1} - v_i \Bigr), & \text{for } k = m + 1.
\end{cases}
\]
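As a sanity check on the component-wise derivative of Eq. 9, the sketch below compares it with a central finite difference of $E$. The toy data, the test point `a`, and the helper names (`error`, `gradient`, `finite_difference`) are illustrative assumptions, not part of the lecture.

```python
# Hypothetical toy data: n = 4 points in m = 2 dimensions (values are made up).
points = [[0.0, 1.0], [1.0, 0.5], [2.0, -1.0], [3.0, 2.0]]
values = [1.0, 2.0, 0.5, 4.0]


def error(a):
    """E = sum_i (sum_j a_j x_ij + a_{m+1} - v_i)^2, as in Eq. 9."""
    m = len(a) - 1
    total = 0.0
    for x, v in zip(points, values):
        residual = sum(a[j] * x[j] for j in range(m)) + a[m] - v
        total += residual * residual
    return total


def gradient(a):
    """Partial derivatives dE/da_k from the explicit expansion of Eq. 9."""
    m = len(a) - 1
    grad = [0.0] * (m + 1)
    for x, v in zip(points, values):
        residual = sum(a[j] * x[j] for j in range(m)) + a[m] - v
        for k in range(m):
            grad[k] += 2.0 * residual * x[k]  # case k != m + 1
        grad[m] += 2.0 * residual             # case k = m + 1
    return grad


def finite_difference(a, h=1e-6):
    """Central-difference approximation of each dE/da_k."""
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += h
        down = list(a)
        down[k] -= h
        grad.append((error(up) - error(down)) / (2.0 * h))
    return grad


a = [0.3, -0.7, 1.2]  # arbitrary point at which to compare the two gradients
analytic = gradient(a)
numeric = finite_difference(a)
max_gap = max(abs(p - q) for p, q in zip(analytic, numeric))
print(max_gap)  # expected to be tiny, since E is quadratic in a
```

The agreement only confirms the formula at one point; it is a check, not a derivation.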
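Then, set $\partial E / \partial a_k = 0$ and solve for $a_k$. This is slow, tedious and error prone!

Which one do you like to be? [Image slide; picture not recoverable.]

At least like these? [Image slide; picture not recoverable.]

Matrix Derivatives

There are 6 common types of matrix derivatives:
\[
\begin{array}{l|ccc}
\text{Type} & \text{Scalar} & \text{Vector} & \text{Matrix} \\ \hline
\text{Scalar} & \dfrac{\partial y}{\partial x} & \dfrac{\partial \mathbf{y}}{\partial x} & \dfrac{\partial \mathbf{Y}}{\partial x} \\[2ex]
\text{Vector} & \dfrac{\partial y}{\partial \mathbf{x}} & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}} & \\[2ex]
\text{Matrix} & \dfrac{\partial y}{\partial \mathbf{X}} & &
\end{array}
\]

Derivatives by Scalar

The scalar-by-scalar derivative $\dfrac{\partial y}{\partial x}$ is the same in both notations.

Numerator layout notation:
\[
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}, \qquad
\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_{11}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_{m1}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x} \end{bmatrix}.
\]
Denominator layout notation:
\[
\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix}.
\]

Derivatives by Vector

Numerator layout notation:
\[
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.
\]
Denominator layout notation:
\[
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_1}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.
\]

Derivative by Matrix

Numerator layout notation:
\[
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{m1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{1n}} & \cdots & \dfrac{\partial y}{\partial x_{mn}} \end{bmatrix}.
\]
Denominator layout notation:
\[
\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{m1}} & \cdots & \dfrac{\partial y}{\partial x_{mn}} \end{bmatrix}.
\]

Pictorial Representation

[Figure: numerator layout versus denominator layout.]

Caution

Most books and papers don't state which convention they use.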
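Reference 2 uses both conventions but clearly differentiates them.

Numerator layout:
\[
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.
\]
Denominator layout:
\[
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}, \qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_1}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.
\]

It is best not to mix the two conventions in your equations. We adopt numerator layout notation.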
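When a source does not state its convention, the shape of a vector-by-vector derivative is often the quickest clue. The sketch below builds a finite-difference Jacobian in numerator layout ($m \times n$) and obtains the denominator layout as its transpose ($n \times m$). The map `f`, the point `x0`, and the helper names are hypothetical and chosen only for illustration.

```python
def f(x):
    """Hypothetical map from R^3 to R^2, used only as an example."""
    x1, x2, x3 = x
    return [x1 * x2, x2 + 3.0 * x3]


def jacobian_numerator_layout(func, x, h=1e-6):
    """Return the m-by-n matrix whose (i, j) entry approximates dy_i/dx_j."""
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        y_up, y_down = func(up), func(down)
        for i in range(m):
            jac[i][j] = (y_up[i] - y_down[i]) / (2.0 * h)
    return jac


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*mat)]


x0 = [1.0, 2.0, 3.0]
num = jacobian_numerator_layout(f, x0)  # rows follow y, columns follow x
den = transpose(num)                    # rows follow x, columns follow y

print(len(num), len(num[0]))  # 2 3
print(len(den), len(den[0]))  # 3 2
```

Here $m = 2$ and $n = 3$, so a $2 \times 3$ result indicates numerator layout and a $3 \times 2$ result indicates denominator layout. The check is uninformative when $m = n$.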
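Commonly Used Derivatives

Here, scalar $a$, vector $\mathbf{a}$ and matrix $\mathbf{A}$ are not functions of $x$ and $\mathbf{x}$.

(C1) $\dfrac{d\mathbf{a}}{dx} = \mathbf{0}$ (column matrix)
(C2) $\dfrac{da}{d\mathbf{x}} = \mathbf{0}^\top$ (row matrix)
(C3) $\dfrac{da}{d\mathbf{X}} = \mathbf{0}^\top$ (matrix)
(C4) $\dfrac{d\mathbf{a}}{d\mathbf{x}} = \mathbf{0}$ (matrix)
(C5) $\dfrac{d\mathbf{x}}{d\mathbf{x}} = \mathbf{I}$
(C6) $\dfrac{d\,\mathbf{a}^\top \mathbf{x}}{d\mathbf{x}} = \dfrac{d\,\mathbf{x}^\top \mathbf{a}}{d\mathbf{x}} = \mathbf{a}^\top$
(C7) $\dfrac{d\,\mathbf{x}^\top \mathbf{x}}{d\mathbf{x}} = 2\mathbf{x}^\top$
(C8) $\dfrac{d\,(\mathbf{x}^\top \mathbf{a})^2}{d\mathbf{x}} = 2\,\mathbf{x}^\top \mathbf{a}\,\mathbf{a}^\top$
(C9) $\dfrac{d\,\mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{A}$
(C10) $\dfrac{d\,\mathbf{x}^\top \mathbf{A}}{d\mathbf{x}} = \mathbf{A}^\top$
(C11) $\dfrac{d\,\mathbf{x}^\top \mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{x}^\top (\mathbf{A} + \mathbf{A}^\top)$

Derivatives of Scalar by Scalar

(SS1) $\dfrac{\partial (u + v)}{\partial x} = \dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial x}$
(SS2) $\dfrac{\partial uv}{\partial x} = u \dfrac{\partial v}{\partial x} + v \dfrac{\partial u}{\partial x}$ (product rule)
(SS3) $\dfrac{\partial g(u)}{\partial x} = \dfrac{\partial g(u)}{\partial u} \dfrac{\partial u}{\partial x}$ (chain rule)
(SS4) $\dfrac{\partial f(g(u))}{\partial x} = \dfrac{\partial f(g)}{\partial g} \dfrac{\partial g(u)}{\partial u} \dfrac{\partial u}{\partial x}$ (chain rule)

Derivatives of Vector by Scalar

(VS1) $\dfrac{\partial\, a\mathbf{u}}{\partial x} = a \dfrac{\partial \mathbf{u}}{\partial x}$, where $a$ is not a function of $x$.
(VS2) $\dfrac{\partial\, \mathbf{A}\mathbf{u}}{\partial x} = \mathbf{A} \dfrac{\partial \mathbf{u}}{\partial x}$, where $\mathbf{A}$ is not a function of $x$.
(VS3) u x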