This option enables all the words to contribute fairly
362 J.I. Serrano, M.D. Del Castillo
Copy Operator. This operator selects some of the best chromosomes of a population and duplicates them in the next generation. Since the evolution of a population over time can produce worse chromosomes than the original set, this operator provides a mechanism for remembering chromosomes that were previously useful.
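The copy operator described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `copy_operator`, the parameter `n_copies`, and the use of a caller-supplied `fitness` callable are assumptions introduced here, since the text does not say how many chromosomes are copied or how they are represented.

```python
import copy
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")  # chromosome type; its representation is left open here


def copy_operator(
    population: Sequence[C],
    fitness: Callable[[C], float],
    n_copies: int,
) -> List[C]:
    """Return duplicates of the n_copies fittest chromosomes.

    The duplicates are meant to be placed unchanged in the next
    generation, so that good chromosomes are not lost if the other
    operators produce worse ones. (n_copies is a hypothetical parameter.)
    """
    if n_copies < 0:
        raise ValueError("n_copies must be non-negative")
    # Rank from best to worst; sorted() is stable, so ties keep their order.
    ranked = sorted(population, key=fitness, reverse=True)
    # Deep-copy so later changes to the originals do not affect the survivors.
    return [copy.deepcopy(chromosome) for chromosome in ranked[:n_copies]]
```

For example, with chromosomes represented as tuples and `sum` standing in for the fitness function, `copy_operator([(1, 0), (3, 2), (0, 0), (2, 2)], sum, 2)` keeps `(3, 2)` and `(2, 2)`.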
The objective of the fitness function is to measure how good a chromosome would be as the centroid document of a class. By assumption, every chromosome of a given population is a centroid. The closer the centroid is to every preprocessed document, the better it is.
The chromosome with the highest fitness value is therefore the best centroid. The core of the fitness function is a measure of similarity or, inversely, of distance between documents: the more similar two documents are, the smaller the distance between them.
where the first quantity is the number of appearances of a given word in document X, and the second is the number of appearances of the same word in document Y. This function calculates the similarity between document X and document Y. The degree of similarity between two documents is obtained by multiplying the number of occurrences of the words that are common to both documents. Thus, if a centroid contains many relevant words that are present in many documents, the centroid will have a high average similarity to the documents and therefore a low average distance from them.
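As a rough illustration of the similarity measure described above, the sketch below multiplies the occurrence counts of each word common to two documents and sums the products. The exact formula is not reproduced in this text, so the summation over shared words is an assumption, as are the word-count dictionary representation, the function names `similarity` and `fitness`, and the choice of the plain average similarity as the fitness value.

```python
from collections import Counter
from typing import Iterable, Mapping


def similarity(x: Mapping[str, int], y: Mapping[str, int]) -> int:
    """Sum, over the words common to X and Y, of the product of their counts.

    Assumed form: the source only states that occurrences of common
    words are multiplied.
    """
    # Iterate over the smaller document; words missing from the other add nothing.
    if len(x) > len(y):
        x, y = y, x
    return sum(count * y[word] for word, count in x.items() if word in y)


def fitness(centroid: Mapping[str, int], documents: Iterable[Mapping[str, int]]) -> float:
    """Average similarity of a candidate centroid to the documents (assumed)."""
    docs = list(documents)
    if not docs:
        return 0.0
    return sum(similarity(centroid, doc) for doc in docs) / len(docs)


# Toy data, invented for illustration only.
docs = [
    Counter("the cat sat on the mat".split()),
    Counter("the dog sat".split()),
]
centroid = Counter({"the": 1, "sat": 1})
print(similarity(centroid, docs[0]))  # 1*2 ("the") + 1*1 ("sat") = 3
print(fitness(centroid, docs))        # (3 + 2) / 2 = 2.5
```

Under these assumptions, a centroid whose words occur often in many documents scores a high average similarity, which matches the behaviour described in the text.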