is not small, finding the maximizing DAG patterns by exhaustively considering all DAG patterns is computationally infeasible. Indeed, Robinson has shown that the number f(n) of DAGs containing n nodes is given by the following recurrence:

f(n) = Σ_{i=1}^{n} (−1)^{i+1} C(n, i) 2^{i(n−i)} f(n−i),   with f(0) = 1.
It is left as an exercise to show f(2) = 3, f(3) = 25, f(5) = 29,281, and f(10) ≈ 4.2 × 10^18. There are fewer DAG patterns than there are DAGs, but this number also is forbiddingly large. Indeed, Gillispie and Perlman show that the ratio of the number of DAGs to the number of DAG patterns reaches an asymptotic value of about 3.7 when the number of nodes is only 10. Chickering [1996a] has proven that for certain classes of prior distributions the problem of finding the most probable DAG patterns is NP-complete.
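As a quick check of these counts, the recurrence can be evaluated directly; the following is a small Python sketch (the function name num_dags is our own):

```python
from math import comb

def num_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence:
    f(n) = sum_{i=1}^{n} (-1)^(i+1) * C(n, i) * 2^(i*(n-i)) * f(n-i),
    with f(0) = 1.
    """
    f = [1]  # f(0) = 1
    for m in range(1, n + 1):
        total = 0
        for i in range(1, m + 1):
            # Each term counts DAGs by the set of i nodes with no incoming
            # edges, with inclusion-exclusion supplying the alternating sign.
            total += (-1) ** (i + 1) * comb(m, i) * 2 ** (i * (m - i)) * f[m - i]
        f.append(total)
    return f[n]

# num_dags(2) -> 3, num_dags(3) -> 25, num_dags(5) -> 29281
```

The super-exponential growth of f(n) is what makes exhaustive search over structures hopeless beyond a handful of nodes.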
Example 8.6 Recall that, given the Bayesian network structure learning schema and data discussed in Example 8.2,

P(gp1|d) = .51678,

and therefore, since there are only two DAG patterns for two nodes, P(gp2|d) = 1 − .51678 = .48322. We have
P(X_1^(9) = 2 | X_2^(9) = 1, d) = Σ_i P(X_1^(9) = 2 | X_2^(9) = 1, gpi, d) P(gpi | X_2^(9) = 1, d).
Note that we now explicitly show that this inference concerns the 9th case using a superscript. To compute this probability we need P(gpi | X_2^(9) = 1, d), but we have P(gpi|d). We could either approximate the former probability by the latter, or use the technique that will be discussed in Section 8.3 to compute it. For the sake of simplicity, we approximate it by P(gpi|d). We then have
P(X_1^(9) = 2 | X_2^(9) = 1, d) ≈ Σ_i P(X_1^(9) = 2 | X_2^(9) = 1, gpi, d) P(gpi | d).
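The averaging step itself is just a posterior-weighted sum. The sketch below uses the pattern posteriors from the example; the per-pattern predictive probabilities P(X1 = 2 | X2 = 1, gpi, d) are hypothetical placeholders, not values from the example:

```python
# Posterior probabilities of the two DAG patterns (from the example).
p_gp = {"gp1": 0.51678, "gp2": 1 - 0.51678}

# Hypothetical per-pattern predictive probabilities
# P(X1 = 2 | X2 = 1, gpi, d); chosen only for illustration.
p_pred = {"gp1": 0.4, "gp2": 0.5}

# P(X1 = 2 | X2 = 1, d) ~= sum_i P(X1 = 2 | X2 = 1, gpi, d) * P(gpi | d)
p_avg = sum(p_pred[gp] * p_gp[gp] for gp in p_gp)
```

With these placeholder values the average lands between the two per-pattern predictions, weighted toward the more probable pattern gp1.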
As is the case for model selection, when the number of possible structures is large, we cannot average over all structures. In these situations we heuristically search for high-probability structures and then average over them. Such techniques are discussed in Section 9.2.
8.3 Learning Structure with Missing Data
Section 6.5. Table 8.1 shows such a data set.