
(Figure: the classification error produced with pn.)

Asymptotic Properties of MLE with Labeled and Unlabeled Data 85

The joint distribution pn(C, X) combines the conditional pn(C|X) with the marginal pn(X):

pn(C = c′, X) = pn(C = c′|X) pn(X) = [0.8N(0, 1) / (0.8N(0, 1) + 0.2N(3, 1))] (0.2N(0, 1) + 0.8N(3, 1)),

pn(C = c′′, X) = pn(C = c′′|X) pn(X) = [0.2N(3, 1) / (0.8N(0, 1) + 0.2N(3, 1))] (0.2N(0, 1) + 0.8N(3, 1)).

While the joint distribution pn(C, X) is not a mixture of Gaussians, the marginal pn(X) is a mixture of Gaussians with mixing factor 0.2. Note that pn(C, X) cannot be induced by any distribution parameterized by Θ; the model is therefore mis-specified. The optimal "plug-in" classification rule depends on the estimate η̂ as follows: choose c′ exactly when η̂N(0, 1)(x) ≥ (1 − η̂)N(3, 1)(x), that is, when x < 3/2 + (1/3) log(η̂/(1 − η̂)).
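The plug-in rule for this two-Gaussian model is just the Bayes rule with η̂ substituted for the class prior. A minimal sketch in Python, assuming unit-variance components at means 0 and 3 (the helper names `phi`, `plug_in_rule`, and `threshold` are ours):

```python
import math

def phi(x, mu):
    # density of N(mu, 1)
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def plug_in_rule(x, eta_hat):
    # classify as c' exactly when eta_hat * N(0,1)(x) >= (1 - eta_hat) * N(3,1)(x)
    return "c'" if eta_hat * phi(x, 0.0) >= (1.0 - eta_hat) * phi(x, 3.0) else "c''"

def threshold(eta_hat):
    # the same rule in closed form: choose c' iff x < 3/2 + (1/3) log(eta_hat / (1 - eta_hat))
    return 1.5 + math.log(eta_hat / (1.0 - eta_hat)) / 3.0
```

For η̂ = 0.5 the decision boundary is the midpoint 3/2; a smaller η̂ shifts it left, so less of the real line is assigned to c′.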

The limiting estimate with fully unlabeled data is η∗u = 0.2, which yields the worst performance. For labeled data, we estimate η using a sequence of Bernoulli trials, and the estimate converges to pn(C = c′) = η∗l.
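To make the Bernoulli-trials view concrete, one can simulate draws from pn and take the fraction of c′ labels as the labeled-data estimate of η. This is only a sketch under the densities of Example 4.7; the names `sample_pn` and `labeled_mle` are ours:

```python
import math
import random

random.seed(0)

def phi(x, mu):
    # density of N(mu, 1)
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def sample_pn():
    # draw X from the marginal 0.2 N(0,1) + 0.8 N(3,1),
    # then draw the label from pn(C = c' | X)
    x = random.gauss(0.0, 1.0) if random.random() < 0.2 else random.gauss(3.0, 1.0)
    p_cprime = 0.8 * phi(x, 0.0) / (0.8 * phi(x, 0.0) + 0.2 * phi(x, 3.0))
    return (random.random() < p_cprime), x

def labeled_mle(n):
    # with labels, the MLE of eta is the fraction of c' labels:
    # a Bernoulli proportion with success probability pn(C = c')
    return sum(1 for _ in range(n) if sample_pn()[0]) / n
```

As n grows, `labeled_mle(n)` concentrates around pn(C = c′), which is the η∗l of the text.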

86 Theory: Semi-supervised Learning

Using Theorem 4.4, we can study the behavior of classification error for varying amounts of unlabeled data. In Example 4.7, we have:

η∗(λ) = arg max_η λ (η∗l log η + (1 − η∗l) log(1 − η)) + (1 − λ) ∫ pn(X) log(ηN(0, 1) + (1 − η)N(3, 1)) dx

= arg max_η λ (η∗l log η + (1 − η∗l) log(1 − η)) + (1 − λ) ∫ (0.2N(0, 1) + 0.8N(3, 1)) log(ηN(0, 1) + (1 − η)N(3, 1)) dx.

The derivative (with respect to η) of the quantity to be maximized is

λ (η∗l/η − (1 − η∗l)/(1 − η)) + (1 − λ) ∫ (0.2N(0, 1) + 0.8N(3, 1)) [N(0, 1) − N(3, 1)] / (ηN(0, 1) + (1 − η)N(3, 1)) dx.

For λ = 1, we have the function for labeled data only, with zero at η∗l. For λ = 0, we have the function for unlabeled data only (obtained by numerical integration), with zero at η∗u = 0.2.
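The λ-weighted criterion can also be checked numerically: grid-search η against the combined objective, with the integral handled by a simple trapezoid rule. This is only a sketch; the grid sizes, integration range, and names (`marg`, `objective`, `eta_star`) are our choices:

```python
import math

def phi(x, mu):
    # density of N(mu, 1)
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def marg(x):
    # the marginal pn(X) = 0.2 N(0,1) + 0.8 N(3,1)
    return 0.2 * phi(x, 0.0) + 0.8 * phi(x, 3.0)

STEP = 0.02
XS = [-6.0 + i * STEP for i in range(int(17.0 / STEP) + 1)]  # grid over [-6, 11]

def integrate(f):
    # trapezoid rule on the fixed grid
    vals = [f(x) for x in XS]
    return STEP * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# limiting labeled-data estimate: eta*_l = integral of pn(c'|x) pn(x) dx
ETA_L = integrate(
    lambda x: 0.8 * phi(x, 0.0) / (0.8 * phi(x, 0.0) + 0.2 * phi(x, 3.0)) * marg(x)
)

def objective(eta, lam):
    # lambda-weighted limit objective for this example
    labeled = ETA_L * math.log(eta) + (1.0 - ETA_L) * math.log(1.0 - eta)
    unlabeled = integrate(
        lambda x: marg(x) * math.log(eta * phi(x, 0.0) + (1.0 - eta) * phi(x, 3.0))
    )
    return lam * labeled + (1.0 - lam) * unlabeled

def eta_star(lam):
    # crude grid search for the maximizer over eta in (0, 1)
    etas = [i / 200.0 for i in range(1, 200)]
    return max(etas, key=lambda e: objective(e, lam))
```

With these settings the unlabeled-only maximizer sits at the marginal's mixing factor 0.2, the labeled-only maximizer sits at η∗l, and intermediate λ values fall between the two, since both components of the objective are concave in η.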

In many respects, this example has the same structure as Example 4.6. In both examples, the estimates from labeled data are simple Bernoulli trials, while the estimates from unlabeled data have more complex behavior. In both examples the estimates move from θ∗l to θ∗u as λ goes from 1 to 0.

5.5 Distribution of Asymptotic Classification Error Bias
