Asymptotic Properties of MLE with Labeled and Unlabeled Data 85
Note that $p_n(C, X)$ cannot induce a distribution parameterized by $\Theta$, as oth-
The optimal “plug-in” classification rule depends on the estimate $\hat{\eta}$ as follows:
$p_n(C = c', X) =$
$p_n(C = c'', X) =$
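Here is a minimal Python sketch of such a plug-in rule. It assumes the model family implied by the mixture $\eta N(0,1) + (1-\eta)N(3,1)$ used in this example: $p(X \mid C = c') = N(0,1)$, $p(X \mid C = c'') = N(3,1)$ and $p(C = c') = \eta$. The assignment of the two Gaussians to $c'$ and $c''$ and the helper names (`plug_in_classify`, `decision_threshold`) are illustrative assumptions. The rule plugs $\hat{\eta}$ into Bayes' rule, and because the two components share unit variance, the comparison reduces to a single threshold on $x$.

```python
import math


def normal_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def plug_in_classify(x: float, eta_hat: float) -> str:
    """Bayes rule with the estimated prior eta_hat plugged in.

    Assumed (hypothetical) model: p(x | c') = N(0, 1), p(x | c'') = N(3, 1),
    p(c') = eta. Only the prior is estimated; the class means are fixed.
    """
    score_c1 = eta_hat * normal_pdf(x, 0.0)          # eta * N(x; 0, 1)
    score_c2 = (1.0 - eta_hat) * normal_pdf(x, 3.0)  # (1 - eta) * N(x; 3, 1)
    return "c'" if score_c1 >= score_c2 else "c''"


def decision_threshold(eta_hat: float) -> float:
    """Point below which the plug-in rule chooses c'.

    Setting log(eta/(1-eta)) - x^2/2 + (x-3)^2/2 = 0 gives
    log(eta/(1-eta)) - 3x + 4.5 = 0, i.e. x = 1.5 + log(eta/(1-eta)) / 3.
    """
    return 1.5 + math.log(eta_hat / (1.0 - eta_hat)) / 3.0


print(plug_in_classify(1.2, 0.5), plug_in_classify(1.2, 0.2))  # c' c''
```

A smaller $\hat{\eta}$ shifts the threshold to the left: with $\hat{\eta} = 0.5$ it sits at 1.5, and with $\hat{\eta} = 0.2$ at about 1.04, so the estimate of $\eta$ alone determines how the real line is split between the two classes.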
The marginal $p_n(X)$ is a mixture of Gaussians with mixing factor 0.2, namely $0.2\,N(0,1) + 0.8\,N(3,1)$.
The estimate with fully unlabeled data is $\eta^*_u = 0.2$, which yields the worst performance. For labeled data, we estimate $\eta$ using a sequence of Bernoulli trials
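Both estimates can be computed directly. The Python sketch below assumes a model in which only $\eta$ is free (the component means 0 and 3 are fixed, matching the mixture $\eta N(0,1) + (1-\eta)N(3,1)$). It obtains $\eta^*_u$ as the zero of the derivative of the expected unlabeled log-likelihood under the true marginal $0.2\,N(0,1) + 0.8\,N(3,1)$, using Simpson's rule and bisection. The labeled estimate is the Bernoulli maximum-likelihood estimate, the fraction of samples labeled $c'$. The integration range, grid size, tolerance and helper names (`unlabeled_score`, `bernoulli_mle`) are illustrative choices, not taken from the text.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Simpson's rule on [A, B]; both N(0,1) and N(3,1) are negligible outside it.
A, B, N = -10.0, 13.0, 2000  # N must be even for Simpson's rule
H = (B - A) / N
XS = [A + i * H for i in range(N + 1)]
WEIGHTS = [H / 3 * (1 if i in (0, N) else (4 if i % 2 else 2)) for i in range(N + 1)]
N0 = [normal_pdf(x, 0.0) for x in XS]
N3 = [normal_pdf(x, 3.0) for x in XS]
# True marginal p_n(x) = 0.2 N(0,1) + 0.8 N(3,1).
P_TRUE = [0.2 * a + 0.8 * b for a, b in zip(N0, N3)]


def unlabeled_score(eta):
    """d/d eta of  integral p_n(x) log(eta N(x;0,1) + (1-eta) N(x;3,1)) dx."""
    return sum(w * p * (a - b) / (eta * a + (1 - eta) * b)
               for w, p, a, b in zip(WEIGHTS, P_TRUE, N0, N3))


def bisect_root(g, lo, hi, tol=1e-10):
    """Zero of a function that changes sign on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bernoulli_mle(labels, positive="c'"):
    """Labeled-data estimate of eta: fraction of samples labeled c'."""
    return sum(1 for c in labels if c == positive) / len(labels)


eta_u = bisect_root(unlabeled_score, 1e-6, 1 - 1e-6)
print(round(eta_u, 4))  # 0.2
```

The unlabeled objective is the expectation of the log of an affine function of $\eta$, so it is concave and its derivative crosses zero only once. Here the crossing falls exactly at the true mixing factor 0.2, whatever the true class probabilities are.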
varying amounts of unlabeled data. In Example 4.7, we have:
$$+\,(1-\lambda)\int \big(0.2\,N(0,1) + 0.8\,N(3,1)\big)\,\log\big(\eta\,N(0,1) + (1-\eta)\,N(3,1)\big)\,dx.$$
For $\lambda = 1$, we have the function for labeled data only, with zero at $\eta^*_l$. For
$\lambda = 0$, we have the function for unlabeled data only (obtained by numerical
In many respects, this example has the same structure as Example 4.6. In both examples, the estimates from labeled data are simple Bernoulli trials, while the estimates from unlabeled data behave in a more complex way. In both examples, the estimates move from $\theta^*_l$ to $\theta^*_u$ as $\lambda$ goes from 1 to 0.
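The movement of the estimate as $\lambda$ goes from 1 to 0 can be traced numerically for Example 4.7. This Python sketch rests on two assumptions. First, with the component means fixed, the labeled part of the objective reduces to the Bernoulli term $\eta^*_l \log\eta + (1-\eta^*_l)\log(1-\eta)$, consistent with the remark that labeled estimates are simple Bernoulli trials. Second, the unlabeled part is the integral weighted by $(1-\lambda)$ in Example 4.7. The value `ETA_L = 0.6` is hypothetical, because this excerpt does not state $\eta^*_l$. For each $\lambda$, the code finds the zero of the derivative of the weighted objective by bisection.

```python
import math

ETA_L = 0.6  # HYPOTHETICAL labeled estimate p_n(C = c'); not given in the excerpt


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Simpson's rule grid for the unlabeled integral.
A, B, N = -10.0, 13.0, 2000  # N even
H = (B - A) / N
XS = [A + i * H for i in range(N + 1)]
WEIGHTS = [H / 3 * (1 if i in (0, N) else (4 if i % 2 else 2)) for i in range(N + 1)]
N0 = [normal_pdf(x, 0.0) for x in XS]
N3 = [normal_pdf(x, 3.0) for x in XS]
P_TRUE = [0.2 * a + 0.8 * b for a, b in zip(N0, N3)]  # 0.2 N(0,1) + 0.8 N(3,1)


def labeled_score(eta):
    """d/d eta of ETA_L log(eta) + (1 - ETA_L) log(1 - eta) (assumed Bernoulli term)."""
    return ETA_L / eta - (1 - ETA_L) / (1 - eta)


def unlabeled_score(eta):
    """d/d eta of the unlabeled integral in Example 4.7."""
    return sum(w * p * (a - b) / (eta * a + (1 - eta) * b)
               for w, p, a, b in zip(WEIGHTS, P_TRUE, N0, N3))


def mixed_score(eta, lam):
    """Derivative of lam * labeled + (1 - lam) * unlabeled objective."""
    return lam * labeled_score(eta) + (1 - lam) * unlabeled_score(eta)


def bisect_root(g, lo, hi, tol=1e-10):
    """Zero of a function that changes sign on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eta_star(lam):
    """Maximizer of the lambda-weighted objective over eta in (0, 1)."""
    return bisect_root(lambda e: mixed_score(e, lam), 1e-6, 1 - 1e-6)


for lam in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"lambda = {lam:.2f}  eta* = {eta_star(lam):.4f}")
```

Both derivative terms are decreasing in $\eta$, and they vanish at `ETA_L` and at 0.2 respectively. The zero of their weighted sum therefore always lies between these two values, and it moves monotonically from the labeled estimate toward $\eta^*_u = 0.2$ as $\lambda$ decreases.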
5.5 Distribution of Asymptotic Classification Error Bias