《cmu高级机器学习变分推理》由会员分享,可在线阅读,更多相关《cmu高级机器学习变分推理(19页珍藏版)》请在金锄头文库上搜索。
1、1Eric Xing Eric Xing CMU, 2006-20091Advanced Machine LearningAdvanced Machine LearningVariationalVariational InferenceInferenceEric XingEric XingLecture 12, August 12, 2009Reading:Eric Xing Eric Xing CMU, 2006-20092An Ising model on 2-D image?Nodes encode hidden information (patch- identity).?They r
2、eceive local information from the image (brightness, color).?Information is propagated though the graph over its edges.?Edges encode compatibility between nodes.?air or water ?2Eric Xing Eric Xing CMU, 2006-20093Why Approximate Inference?Tree-width of NxN graph is O(N)?N can be a huge number(1000s o
3、f pixels)?Exact inference will be too expensive+= =0?KL(Q1|Q2)=0 iff Q1=Q2?But, KL(Q1|Q2) KL(Q2|Q1) =FfaaaXfZXP)(/1)()()(log()()|(21 121XQXQXQQQKLX=4Eric Xing Eric Xing CMU, 2006-20097Which KL?Computing KL(P|Q) requires inference!?But KL(P|Q) can be computed without performing inference on P?Using )
4、()(log()()|(XPXQXQPQKLX=)(log)()(log)(XPXQXQXQXX=) )(/1log()()|( =FfaaQQaXfZEXHPQKL =FfaaaXfZXP)(/1)( =FfaaQQaXfEZXH)(log/1log)()(log)(XPEXHQQ=Eric Xing Eric Xing CMU, 2006-20098The Objective?We will call the “Energy Functional” *?=?F(P,Q) = F(P,P)ZXfEXHPQKLFfaaQQalog)(log)()|(+= ),(QPF),(QPF*also c
5、alled Gibbs Free Energy),(PPF5Eric Xing Eric Xing CMU, 2006-20099The Energy Functional?Let us look at the functional?can be computed if we have marginals over each fa?is harder! Requires summation over all possible values?Computing F, is therefore hard in general.?Approach 1: Approximate with easy t
6、o compute =FfaaQQaXfEXHQPF)(log)(),( FfaaQaXfE)(log=XQXQXQH)(log)(),(QPF),(QPFEric Xing Eric Xing CMU, 2006-200910Tree Energy Functionals?Consider a tree-structured distribution?The probability can be written as:?involves summation over edges and vertices and is therefore easy to compute()( )1=iii E
7、jijiijxbxxbb,)(x()()( )( ) +=ijixiiii iEjixxjiijjiijtreexbxbxxbxxbHln,ln, ,1()()( )( )()()( )( )()() ()( )( ) ( )( )( ) += +=iijiijiijixiiii iixiiii ii Ejijijijiijxxjiijixiiii Ejijiji xxjiij xiiii iEjijiij xxjiijTreexbxbxfxbxbxxfxxbxxbxfxbxxfxxbxbxbxxbxxbFlnln,ln,ln,ln, ln,ln, , ,211X2X3X4X5X6X7X8X7
8、3625178672312.FFFFFFFFFF+=6Eric Xing Eric Xing CMU, 2006-200911Tree Energy Functionals?Consider a tree-structured distribution?The probability can be written as:?involves summation over edges and vertices and is therefore easy to compute()( )idiii aaaxbbb=1xx)()()()( )( )+=iaiiii ii aaaaatreebbdbbHx
9、xxxxxlnln1()() ()()( )( )+=iaiiii ii aaaaa aaTreebbdfbbFxxxxxxxlnln11X2X3X4X5X6X7X8X73625178672312.FFFFFFFFFF+=Eric Xing Eric Xing CMU, 2006-200912Bethe Approximation to Gibbs Free Energy?For a general graph, choose?Called “Bethe approximation” after the physicist Hans Bethe?Equal to the exact Gibbs
10、 free energy when the factor graph is a tree?In general, HBetheis not the same as the H of a treeBethaFQPF= ),(1X2X3X4X5X6X7X8X862517867231222FFFFFFFFFFbethe+=.()() ()()( )( )()bethaaaiiii ii aaaaa aaBetheHfbbdfbbFia=+=xxxxxxxxlnln1()()()( )( )+=iaiiii ii aaaaaBethebbdbbHxxxxxxlnln17Eric Xing Eric X
11、ing CMU, 2006-200913Bethe Approximation?Pros:?Easy to compute, since entropy term involves sum over pairwise and single variables?Cons:?may or may not be well connected to?It could, in general, be greater, equal or less than ?Optimize each b(xa)s. ?For discrete belief, constrained opt. with Lagrangi
12、an multiplier ?For continuous belief, not yet a general formula?Not always convergebetheFQPF= ),(),(QPF),(QPFEric Xing Eric Xing CMU, 2006-200914Undirected graph (Markov random field)Directed graph (Bayesian network) =iijjiijiixxxZxP)()(),()(1)(ij)(iix),()(jiijxx)|()()(parents=iiixxPxPiParents(i)fac
13、tor graphsinteractionsvariablesFrom GM to factored graphs8Eric Xing Eric Xing CMU, 2006-200915Recall Beliefs and messages in FGi )()()()(iNaiiaiiiixmxfxb“beliefs”“messages”a )( )()()()(aNiaiNciicaaaaxmXfXbThe “belief” is the BP approximation of the marginal probability.Eric Xing Eric Xing CMU, 2006-
14、200916Bethe Free Energy for FG()() ()()( )( )+=iaiiii ii aaaaa aaBethabbdfbbFxxxxxxxlnln1()()()( )( )+=iaiiii ii aaaaaBethebbdbbHxxxxxxlnln1()bethaaaBetheHfF=x9Eric Xing Eric Xing CMU, 2006-200917( )()( ) +=axXiiaa aNixiaixii iiBetheiaiixbXbxxbFL)(1)(0)(=iixbL )()(11exp)(iNaiai iiixdxb0)(=aaXbL + )(
15、)()(exp)(aNiiaiaaaaxXEXbConstrained Minimization of the Bethe Free EnergyEric Xing Eric Xing CMU, 2006-200918Bethe = BP on FG?Identify?to obtain BP equations:i )()()()(iNaiiaiiiixmxfxb“beliefs”“messages”a )( )()()()(aNiaiNciicaaaaxmXfXbThe “belief” is the BP approximation of the marginal probability. =aiNbiibiaixmx)()(ln)(10Eric Xing Eric Xing CMU, 2006-200919Using, )()(=iaxXaaiiaXbxbwe getai =iaxXiaNjajNbjjbaaii