The classification error produced with pη̂ is a function of the estimate η̂.
Note that the sampling distribution p(C, X) cannot be induced by any distribution parameterized by Θ; otherwise the asymptotic estimates from labeled and from unlabeled data would coincide.
The optimal “plugin” classification rule depends on the estimate η̂ as follows: for an observation X = x,

pη̂(C = c′, X = x) = pη̂(C = c′) N(0, 1) = η̂ N(0, 1),

pη̂(C = c′′, X = x) = pη̂(C = c′′) N(3, 1) = (1 − η̂) N(3, 1),

where N(μ, σ²) denotes the corresponding Gaussian density evaluated at x. The rule assigns x to c′ whenever pη̂(C = c′, X = x) ≥ pη̂(C = c′′, X = x), and to c′′ otherwise.
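This comparison is easy to transcribe numerically; the following is a minimal sketch (using SciPy's Gaussian density, and assuming only the two class-conditional densities N(0, 1) and N(3, 1) of this example):

```python
from scipy.stats import norm

def plugin_is_c1(x, eta_hat):
    """Plug-in rule: choose c' when eta_hat * N(x; 0, 1) >= (1 - eta_hat) * N(x; 3, 1)."""
    joint_c1 = eta_hat * norm.pdf(x, loc=0.0, scale=1.0)          # p(C = c', X = x) under eta_hat
    joint_c2 = (1.0 - eta_hat) * norm.pdf(x, loc=3.0, scale=1.0)  # p(C = c'', X = x) under eta_hat
    return joint_c1 >= joint_c2
```

For η̂ = 0.2 the induced decision boundary sits at x = (9 − 2 ln 4)/6 ≈ 1.04; observations below it are assigned to c′.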
The marginal p(X) is a mixture of Gaussians with mixing factor 0.2. The asymptotic estimate with fully unlabeled data is η∗u = 0.2, which yields the worst performance. For labeled data, we estimate η using a sequence of Bernoulli trials over the class labels.
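Since the labeled-data estimate is simply the empirical frequency of c′ among the observed labels, it can be sketched in a few lines; the class prior value 0.4 below is hypothetical, chosen only for illustration (this excerpt does not state the true prior):

```python
import numpy as np

rng = np.random.default_rng(0)
p_c1 = 0.4  # hypothetical true class prior p(C = c'); illustrative only

# Each labeled sample contributes one Bernoulli trial: is the label c' or not?
labels = rng.random(100_000) < p_c1
eta_hat = labels.mean()  # the MLE of eta from labeled data is the empirical frequency
```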
The asymptotic estimates change with varying amounts of unlabeled data. In Example 4.7, the expected log-likelihood as a function of η is

λ E[log pη(C, X)] + (1 − λ) ∫ (0.2N(0, 1) + 0.8N(3, 1)) log(ηN(0, 1) + (1 − η)N(3, 1)) dx,

where the expectation is taken with respect to the sampling distribution p(C, X) and λ is the proportion of labeled data.
For λ = 1, we have the function for labeled data only, with its maximum at η∗l. For λ = 0, we have the function for unlabeled data only (obtained by numerical integration), with its maximum at η∗u.
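The maximization just described can be sketched numerically. Below, the unlabeled term is the integral above, computed by numerical integration; since the value of η∗l is not given in this excerpt, the labeled term uses a hypothetical limit η∗l = 0.4 in the Bernoulli log-likelihood η∗l log η + (1 − η∗l) log(1 − η), purely for illustration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

ETA_L = 0.4  # hypothetical labeled-data limit eta*_l; chosen only for illustration

def labeled_term(eta):
    # Expected Bernoulli log-likelihood of the class label (eta-dependent part)
    return ETA_L * np.log(eta) + (1.0 - ETA_L) * np.log(1.0 - eta)

def unlabeled_term(eta):
    # Integral of (0.2 N(0,1) + 0.8 N(3,1)) log(eta N(0,1) + (1-eta) N(3,1)) dx
    def integrand(x):
        p_true = 0.2 * norm.pdf(x, 0.0, 1.0) + 0.8 * norm.pdf(x, 3.0, 1.0)
        p_model = eta * norm.pdf(x, 0.0, 1.0) + (1.0 - eta) * norm.pdf(x, 3.0, 1.0)
        return p_true * np.log(p_model)
    return quad(integrand, -10.0, 13.0)[0]

def eta_star(lam):
    # Maximize lam * labeled_term + (1 - lam) * unlabeled_term over eta in (0, 1)
    objective = lambda eta: -(lam * labeled_term(eta) + (1.0 - lam) * unlabeled_term(eta))
    return minimize_scalar(objective, bounds=(1e-4, 1.0 - 1e-4), method="bounded").x
```

With λ = 0 the maximizer recovers η∗u = 0.2 (the marginal's mixing factor); with λ = 1 it recovers the assumed η∗l; and intermediate λ moves the estimate between the two limits.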
In many respects, this example has the same structure as Example 4.6. In both examples, the estimates from labeled data are simple Bernoulli trials, while the estimates from unlabeled data exhibit more complex behavior. In both examples, the estimates move from θ∗l to θ∗u as λ goes from 1 to 0.
5.5 Distribution of Asymptotic Classification Error Bias