《人工智能的幻灯片ch14-bayesian-network-part1》由会员分享,可在线阅读,更多相关《人工智能的幻灯片ch14-bayesian-network-part1(28页珍藏版)》请在金锄头文库上搜索。
1、XIV. Bayesian networks (section 1-3),Autumn 2012 Instructor: Wang Xiaolong Harbin Institute of Technology, Shenzhen Graduate School Intelligent Computation Research Center (HITSGS ICRC),Outlines,Syntax Semantics Parameterized distributions,Bayesian networks,A simple, graphical notation for condition
2、al independence assertions and hence for compact specification of full joint distributions Syntax: a set of nodes, one per variable a directed, acyclic graph (link “directly influences“) a conditional distribution for each node given its parents: P (Xi | Parents (Xi) In the simplest case, conditiona
3、l distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values,Example,Topology of network encodes conditional independence assertions: Weather is independent of the other variables Toothache and Catch are conditionally indep
4、endent given Cavity,Example,Im at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesnt call. Sometimes its set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects “causal“ knowledge: A burglar
5、can set the alarm off An earthquake can set the alarm off The alarm can cause Mary to call The alarm can cause John to call,Example contd.,Compactness,A CPT for Boolean Xi with k Boolean parents has 2k rows for the combinations of parent values Each row requires one number p for Xi = true (the numbe
6、r for Xi = false is just 1-p) If each variable has no more than k parents, the complete network requires O(n 2k) numbers I.e., grows linearly with n, vs. O(2n) for the full joint distribution For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 25-1 = 31),Semantics,The semantics of Bayesian network
7、s: A representation of the joint probability distribution. (Numerical semantics) An encoding of a collection of conditional independence statements. (Topological semantics),Numerical semantics,The full joint distribution is defined as the product of the local conditional distributions:,Numerical sem
8、antics,The full joint distribution is defined as the product of the local conditional distributions:,Topological semantics,Topological semantics: Each node is conditionally independent of its non-descendants given its parents,Markov blanket,Each node is conditionally independent of all others given
9、its Markov blanket: parents + children + childrens parents Theorem: Topological semantics Numerical semantics,Constructing Bayesian networks,1. Choose an ordering of variables X1, ,Xn 2. For i = 1 to n add Xi to the network select parents from X1, ,Xi-1 such that P (Xi | Parents(Xi) = P (Xi | X1, .
10、Xi-1) This choice of parents guarantees: P (X1, ,Xn) = i =1 P (Xi | X1, , Xi-1) (chain rule) = i =1P (Xi | Parents(Xi) (by construction),n,n,Example,Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)?,Example,Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)? No P(A | J, M) = P(
11、A | J)? P(A | J, M) = P(A)?,Example,Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)? No P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?,Example,Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)? No P(A | J, M) = P(A | J)? P(A |
12、 J, M) = P(A)? No P(B | A, J, M) = P(B | A)? Yes P(B | A, J, M) = P(B)? No P(E | B, A ,J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)?,Example,Suppose we choose the ordering M, J, A, B, E P(J | M) = P(J)? No P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No P(B | A, J, M) = P(B | A)? Yes P(B | A, J,
13、 M) = P(B)? No P(E | B, A ,J, M) = P(E | A)? No P(E | B, A, J, M) = P(E | A, B)? Yes,Example contd.,Deciding conditional independence is hard in noncausal directions (Causal models and conditional independence seem hardwired for humans!) Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed
14、,Compact conditional distributions,CPT grows exponentially with number of parents (O(2k) ) CPT becomes infinite with continuous-valued parent or child Solution: canonical distributions that are defined compactly Deterministic nodes are the simplest case:,Compact conditional distributions contd.,Nois
15、y-OR distributions model multiple noninteracting causes 1) Parents U1Uk include all causes (can add leak node) 2) Independent failure probability qi for each cause alone Number of parameters linear in number of parents (O(k),Hybrid (discrete+continuous) networks,Discrete (Subsidy? and Buys?); contin
16、uous (Harvest and Cost) Option 1: discretizationpossibly large errors, large CPTs Option 2: finitely parameterized canonical families 1) Continuous variable, discrete+continuous parents (e.g., Cost) 2) Discrete variable, continuous parents (e.g., Buys?),Continuous child variables,Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents Most common is the linear Gaussian model, e.g.,: Mean Cost v