4.2 Statistical gisting

[…] hypothesizes a form for the model and applies traditional statistical […]ing algorithms to estimate maximum-likelihood parameter values for a model of translation between the two languages. For instance, the Candide system at IBM [6] used the proceedings of the Canadian parliament, maintained in both French and English, to learn an English-French translation model. In an entirely analogous way, one can use Open Directory’s “bilingual corpus” of web pages and their summaries to learn a mapping from web pages to summaries. Probably the fundamental difference between ocelot’s task and natural language translation is one of difficulty: a satisfactory translation of a sentence must capture its entire meaning, while a satisfactory summary is expected to leave out most of the source document’s content.

This chapter proposes two methods for word selection. The simpler of the strategies is to select words according to the frequency of their appearance in the document d. That is, if word w appears with frequency λ(w | d) in d, then it should appear in a gist g of that document with the same frequency:

E[λ(w | g)] = E[λ(w | d)].
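As a rough illustration of this frequency-matching strategy, the following Python sketch estimates λ(w | d) from a tokenized document and draws gist words independently in proportion to it. The function names (word_frequencies, sample_gist), the whitespace tokenization, and the fixed random seed are assumptions made for the example; they are not part of ocelot.

```python
import random
from collections import Counter


def word_frequencies(tokens):
    """Return lambda(w | d): the relative frequency of each word in tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def sample_gist(doc_tokens, gist_length, seed=0):
    """Draw gist_length words independently, each with probability lambda(w | d).

    Because every position is drawn from the document's own distribution,
    the expected frequency of each word in the gist equals its frequency
    in the document. Word order is ignored entirely.
    """
    freqs = word_frequencies(doc_tokens)
    words = list(freqs)
    weights = [freqs[w] for w in words]
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    return rng.choices(words, weights=weights, k=gist_length)


if __name__ == "__main__":
    document = "the platypus swims in the river near the bank".split()
    print(word_frequencies(document)["the"])
    print(sample_gist(document, gist_length=4))
```

A sampler of this kind treats the gist as a bag of words, so it can repeat a word in adjacent positions.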

In general, the probability of a word appearing at a specific position in a gist depends on the previous words. If the word “platypus” has already appeared in a summary, for instance, it is unlikely to appear again. And although the word “the” might appear multiple times in a summary, it is unlikely to appear in position k if it appeared in position k − 1. The gisting model that ocelot uses accounts for the ordering of words in a candidate gist by using an n-gram model of language.
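To make the role of word order concrete, here is a minimal sketch of a bigram (n = 2) language model with add-one smoothing, used to score candidate gists. The toy training sentences, the boundary markers "<s>" and "</s>", the smoothing scheme, and the function names are all assumptions chosen for the example; the text does not specify how ocelot's n-gram model is built or smoothed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def log_prob(gist, model):
    """Log-probability of a candidate gist under the add-one-smoothed bigram model."""
    context_counts, bigram_counts, vocab_size = model
    tokens = ["<s>"] + gist + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    # Hypothetical training summaries, for illustration only.
    training = [
        ["the", "platypus", "swims"],
        ["the", "river", "is", "wide"],
    ]
    model = train_bigram(training)
    print(log_prob(["the", "platypus"], model))
    print(log_prob(["the", "the"], model))
```

Under this toy model the candidate that repeats “the” in adjacent positions receives a lower score than one that follows “the” with a noun, which is the kind of ordering effect a frequency-only selection cannot capture.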


Given a document d, the optimal gist for that document is, in a maximum likelihood sense,

g⋆ = arg max_g p(g | d)
