We chose that DAG pattern because it was the more probable one.
8.2. MODEL AVERAGING 451
is not small, finding the maximizing DAG patterns by exhaustively considering all DAG patterns is computationally infeasible. Indeed, Robinson [1977] has shown that the number of DAGs containing $n$ nodes is given by the following recurrence:

$$f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} f(n-i) \qquad n > 1$$
$$f(0) = 1$$
$$f(1) = 1$$
It is left as an exercise to show $f(2) = 3$, $f(3) = 25$, $f(5) = 29{,}000$, and $f(10) = 4.2 \times 10^{18}$. There are fewer DAG patterns than there are DAGs, but their number is also forbiddingly large. Indeed, Gillispie and Pearlman [2001] show that the ratio of the number of DAGs to the number of DAG patterns reaches an asymptote of about 3.7 when the number of nodes is only 10. Chickering [1996a] has proven that for certain classes of prior distributions the problem of finding the most probable DAG patterns is NP-complete.
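The recurrence is easy to evaluate directly. The following is a minimal sketch (the function name `count_dags` is ours, not the text's):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labeled nodes, via Robinson's recurrence:
    f(n) = sum_{i=1}^{n} (-1)^(i+1) * C(n,i) * 2^(i(n-i)) * f(n-i),
    with f(0) = f(1) = 1."""
    if n <= 1:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
               for i in range(1, n + 1))
```

Evaluating it confirms $f(2) = 3$ and $f(3) = 25$; the exact value of $f(5)$ is 29,281, consistent with the rounded figure above.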
Example 8.6 Recall that, given the Bayesian network structure learning schema and the data discussed in Example 8.2, $P(gp_1 \mid \mathsf{d}) = .51678$.
[Table: a data set with columns Case, $X_1$, $X_2$, and $X_3$; the entries were not recoverable from the source.]
$$P(X_1^{(9)} = 2 \mid X_2^{(9)} = 1, \mathsf{d}) = \sum_i P(X_1^{(9)} = 2 \mid X_2^{(9)} = 1, gp_i, \mathsf{d})\, P(gp_i \mid X_2^{(9)} = 1, \mathsf{d}) \qquad (8.3)$$
Note that we now explicitly show that this inference concerns the 9th case by using a superscript. To compute this probability we need $P(gp_i \mid X_2^{(9)} = 1, \mathsf{d})$, but we have $P(gp_i \mid \mathsf{d})$. We could either approximate the former probability by the latter, or use the technique that will be discussed in Section 8.3 to compute it. For the sake of simplicity, we approximate it by $P(gp_i \mid \mathsf{d})$. We then have
$$P(X_1^{(9)} = 2 \mid X_2^{(9)} = 1, \mathsf{d}) \approx \sum_i P(X_1^{(9)} = 2 \mid X_2^{(9)} = 1, gp_i, \mathsf{d})\, P(gp_i \mid \mathsf{d}).$$
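The averaging step itself is just a posterior-weighted sum. A minimal sketch, using the posterior $P(gp_1 \mid \mathsf{d}) = .51678$ from the text and assuming only two candidate patterns (so $P(gp_2 \mid \mathsf{d}) = 1 - .51678$); the structure-conditional predictive probabilities are hypothetical placeholders, not values from the text:

```python
# Posterior over DAG patterns; P(gp1|d) = .51678 is from the text,
# and P(gp2|d) follows if gp1 and gp2 are the only candidates.
posterior = {"gp1": 0.51678, "gp2": 1 - 0.51678}

# Hypothetical values of P(X1^(9) = 2 | X2^(9) = 1, gp_i, d) for illustration.
pred_given_pattern = {"gp1": 0.7, "gp2": 0.5}

# Model-averaged predictive probability P(X1^(9) = 2 | X2^(9) = 1, d).
p_avg = sum(posterior[gp] * pred_given_pattern[gp] for gp in posterior)
```

Note that the averaged prediction lies between the two structure-conditional predictions, weighted toward the more probable pattern.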
As is the case for model selection, when the number of possible structures is large, we cannot average over all structures. In these situations we heuristically search for high probability structures, and then we average over them. Such techniques are discussed in Section 9.2.
8.3 Learning Structure with Missing Data
Section 6.5.  Table 8.1 shows such a data set. 
