4.2 Statistical gisting
ing algorithms to estimate maximum-likelihood parameter values for a model of translation between the two languages. For instance, the Candide system at IBM used the proceedings of the Canadian parliament—maintained in both French and English—to learn an English-French translation model. In an entirely analogous way, one can use Open Directory's "bilingual corpus" of web pages and their summaries to learn a mapping from web pages to summaries. Probably the fundamental difference between ocelot's task and natural language translation is one of degree of difficulty: a satisfactory translation of a sentence must capture its entire meaning, while a satisfactory summary is actually expected to leave out most of the source document's content.
This chapter proposes two methods for word selection. The simpler strategy is to select words according to the frequency of their appearance in the document d. That is, if word w appears with frequency λ(w | d) in d, then it should appear in a gist g of that document with the same expected frequency:
E[λ(w | g)] = E[λ(w | d)].
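One way to realize this frequency-matching condition is to sample gist words with probability proportional to their counts in the document, so that each word's expected relative frequency in the gist equals its relative frequency in the document. The sketch below is illustrative only, not ocelot's actual selection procedure; the function and variable names are hypothetical.

```python
import random
from collections import Counter

def sample_gist_words(document_words, gist_length, seed=0):
    """Draw gist words so that E[freq of w in gist] = E[freq of w in doc].

    Sampling each position i.i.d. with probability count(w)/len(doc)
    makes the expected relative frequency of every word in the gist
    match its relative frequency in the document.
    (Hypothetical helper, not from ocelot.)
    """
    counts = Counter(document_words)
    words = list(counts)
    weights = [counts[w] for w in words]  # proportional to document frequency
    rng = random.Random(seed)
    return rng.choices(words, weights=weights, k=gist_length)

doc = "the platypus swims the river the platypus dives".split()
gist = sample_gist_words(doc, gist_length=4)
```

Note that this independence assumption is exactly what the next paragraph relaxes: sampling each position independently ignores word order and permits repetitions that a fluent gist would avoid.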
In general, the probability of a word appearing at a specific position in a gist depends on the previous words. If the word platypus has already appeared in a summary, for instance, it is not likely to appear again. And although the might appear multiple times in a summary, it is unlikely to appear in position k if it appeared in position k − 1. The gisting model that ocelot uses takes into account the ordering of words in a candidate gist by using an n-gram model of language.
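To make the ordering effect concrete, here is a minimal add-alpha-smoothed bigram (n = 2) language model that scores candidate gists. It is a toy sketch under assumed names and a toy corpus, not ocelot's trained model; it only illustrates why a gist with the at positions k − 1 and k scores lower than a fluent ordering.

```python
import math
from collections import Counter

def train_bigram(corpus_sentences, alpha=0.1):
    """Return an add-alpha-smoothed bigram probability p(w | prev)
    estimated from a list of tokenized sentences (toy estimator)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in corpus_sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])          # contexts
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(vocab)
    def prob(prev, w):
        return (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * V)
    return prob

def log_score(gist_tokens, prob):
    """Log-probability of a candidate gist under the bigram model."""
    tokens = ["<s>"] + gist_tokens + ["</s>"]
    return sum(math.log(prob(p, w)) for p, w in zip(tokens, tokens[1:]))

corpus = [["the", "platypus", "swims"], ["the", "river", "flows"]]
prob = train_bigram(corpus)
```

Scoring "the river" against the repetitive "the the" under this model shows the n-gram penalty for an implausible ordering, even though both candidates use only high-frequency document words.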
Given a document d, the optimal gist for that document is, in a maximum likelihood sense,