Wednesday, October 28, 2015

Find the top N most frequent words

1- Let the mapper run as usual, writing a (word, 1) pair for the reduce phase.
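
To make the mapper step concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class WordCountMapSketch and its map() method are hypothetical stand-ins that only show the shape of the (word, 1) pairs a word-count mapper would emit. Lowercasing and splitting on non-word characters are assumptions about tokenization, not something the post specifies.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Mapper: plain Java, no Hadoop types.
class WordCountMapSketch {

    // Emits one (word, 1) pair for every token found in the input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Assumption: words are lowercased and split on non-word characters.
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }
}
```

In a real job the framework groups these pairs by word before they reach the reducer, so each reduce() call sees one word together with all of its 1s.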

Reduce phase:

1- We override two methods: reduce() and cleanup().
2- In reduce(), we sum all the values received from the mappers for the key, which gives the number of occurrences of that word in the book; then we put the word and its count into a HashMap.
3- The HashMap is sorted by count with a helper call, sortByValues(countMap).
4- That sort happens in cleanup(): we first sort the HashMap by value so that the highest counts come first, then loop over the key set and output the first 20 items (N = 20 here).
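
The reduce-phase steps above can be sketched roughly as follows. This is again plain Java rather than the actual Hadoop Reducer class: TopNReduceSketch is a hypothetical name, reduce() and cleanup() only mirror the roles of the overridden methods, and cleanup() returns the result instead of writing it to a job context. Breaking ties alphabetically is an added assumption to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Reducer: plain Java, no Hadoop types.
class TopNReduceSketch {
    private final int n;
    private final Map<String, Integer> countMap = new HashMap<>();

    TopNReduceSketch(int n) {
        this.n = n;
    }

    // Called once per word: sum the 1s from the mappers and remember the total.
    void reduce(String word, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        countMap.put(word, sum);
    }

    // Called once after the last reduce(): sort by count, keep the first n.
    List<Map.Entry<String, Integer>> cleanup() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(countMap.entrySet());
        // Highest count first; ties broken by word (an assumption, see above).
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Constructing the sketch with new TopNReduceSketch(20) matches the 20 items used in this post. Note that collecting every count in one in-memory map yields the global top N only when the job runs with a single reducer.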


Source
