This obviates the need to translate the entire collection.
[Table: evaluation results for the NaiveRank method (relevant: 5845; relevant retrieved: 4513), reporting interpolated precision at recall levels 0.00 through 1.00, average precision, precision at cutoffs of 5 to 1000 retrieved documents, and R-Precision.]
To simplify the exposition, focus on a two-language scenario: a user issues a query q in a source language S, and the system ranks documents in a target language T by their relevance to q.
Some popular strategies for this problem involve translating either the query or the documents from one language into the other.
Performing retrieval across languages within the framework described in Section 3.3 is straightforward. One can use model (3.12) as before, but now interpret σ(w | q) as the likelihood that a word q in language S is a translation of a word w in language T. In other words, σ is now a model of translation rather than of semantic proximity.
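As a concrete illustration, the sketch below ranks target-language documents with a translation-smoothed query-likelihood score. It is not the book's implementation of model (3.12); the function name rank_cross_language, the dictionary sigma, and the mixing weight mu are hypothetical stand-ins for σ(w | q) and the document and background models assumed here.

```python
from collections import Counter
from math import log

def rank_cross_language(query_words, documents, sigma, mu=0.5):
    """Rank target-language documents for a source-language query.

    query_words : list of source-language query terms q
    documents   : dict doc_id -> list of target-language words
    sigma       : dict q -> {w: sigma(w|q)}, the likelihood that q
                  translates to the target-language word w
    mu          : hypothetical mixing weight for background smoothing
    """
    # Background word distribution over the whole target collection
    background = Counter()
    for words in documents.values():
        background.update(words)
    total = sum(background.values())

    scores = {}
    for doc_id, words in documents.items():
        counts = Counter(words)
        length = max(len(words), 1)
        score = 0.0
        for q in query_words:
            # Probability that the document generates some translation of q:
            # sum over target words w of sigma(w|q) * P(w|d), smoothed with
            # the collection background to avoid zero probabilities.
            p_trans = 0.0
            for w, s in sigma.get(q, {}).items():
                p_doc = (1 - mu) * counts[w] / length + mu * background[w] / total
                p_trans += s * p_doc
            score += log(p_trans) if p_trans > 0 else float("-inf")
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With a toy dictionary such as sigma = {"cat": {"gato": 0.9, "felino": 0.1}}, documents whose words are likely translations of the query terms receive the highest scores, without any document ever being translated.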
Given a bilingual collection of (q, d) pairs, where each q is in language S and each d is in T, the EM-based strategy of Section 3.4 applies without modification: nothing in the models described in Section 3.3 assumes that the queries and documents are in the same language.
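The EM procedure itself is defined in Section 3.4 and is not reproduced here; the following sketch is an IBM-Model-1-style variant that estimates sigma(w|q) from a parallel collection of (query, document) pairs. The function name estimate_sigma, the argument pairs, and the iteration count n_iter are illustrative choices, not the book's notation.

```python
from collections import defaultdict

def estimate_sigma(pairs, n_iter=10):
    """Estimate translation probabilities sigma(w|q) from (q, d) pairs.

    pairs : list of (query_words, doc_words) tuples; queries are in
            language S, documents in language T
    Returns a nested dict sigma[q][w], normalized so that the values
    for each q sum to one over the target words w.
    """
    # Uniform initialization over co-occurring word pairs
    sigma = defaultdict(dict)
    for q_words, d_words in pairs:
        for q in q_words:
            for w in d_words:
                sigma[q][w] = 1.0
    for q in sigma:
        z = len(sigma[q])
        for w in sigma[q]:
            sigma[q][w] = 1.0 / z

    for _ in range(n_iter):
        counts = defaultdict(lambda: defaultdict(float))
        for q_words, d_words in pairs:
            for w in d_words:
                # E-step: split the count for target word w across the
                # query words in proportion to the current sigma(w|q)
                norm = sum(sigma[q].get(w, 0.0) for q in q_words)
                if norm == 0:
                    continue
                for q in q_words:
                    if w in sigma[q]:
                        counts[q][w] += sigma[q][w] / norm
        # M-step: renormalize the expected counts into probabilities
        for q, cw in counts.items():
            total = sum(cw.values())
            for w, c in cw.items():
                sigma[q][w] = c / total
    return sigma
```

The resulting sigma can be fed directly to a ranking function such as the one sketched above, so the same machinery serves both the monolingual and the cross-language settings.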