Thursday, October 29, 2015

Hive Properties

https://murshedsqlcat.wordpress.com/2014/04/18/useful-hive-settings/
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

Join two CSV data sets using MapReduce

http://www.codeproject.com/Articles/869383/Implementing-Join-in-Hadoop-Map-Reduce

Difference between the Writable and WritableComparable interfaces

org.apache.hadoop.io.Writable is a Java interface that every key or value type in the Hadoop MapReduce framework implements. It defines write(DataOutput) and readFields(DataInput) for serialization. Implementations typically also provide a static read(DataInput) method that constructs a new instance, calls readFields(DataInput) on it, and returns it.
org.apache.hadoop.io.WritableComparable is a Java interface that extends both Writable and Comparable. Any type that is to be used as a key in the Hadoop MapReduce framework should implement it, because the framework sorts keys before they reach the reducers. WritableComparable objects can be compared to each other using Comparators.
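
As a rough illustration of the difference, here is a minimal sketch of a custom key type. To keep it runnable without Hadoop on the classpath, it declares two local stand-in interfaces (WritableSketch and WritableComparableSketch, both hypothetical names) that mirror the methods of Writable and WritableComparable; in a real job you would implement org.apache.hadoop.io.WritableComparable directly. The key type WordYearKey and its roundTrip helper are also made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
interface WritableSketch {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in for WritableComparable<T>: a Writable that is also Comparable<T>.
interface WritableComparableSketch<T> extends WritableSketch, Comparable<T> {}

// Hypothetical key type holding a (word, year) pair.
class WordYearKey implements WritableComparableSketch<WordYearKey> {
    String word = "";
    int year;

    WordYearKey() {} // no-arg constructor so an empty instance can be filled by readFields()

    WordYearKey(String word, int year) {
        this.word = word;
        this.year = year;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(year);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        word = in.readUTF();
        year = in.readInt();
    }

    // The static read(DataInput) convention: new instance, readFields, return.
    static WordYearKey read(DataInput in) throws IOException {
        WordYearKey key = new WordYearKey();
        key.readFields(in);
        return key;
    }

    // Only needed for keys: order by word, then by year.
    @Override
    public int compareTo(WordYearKey other) {
        int c = word.compareTo(other.word);
        return c != 0 ? c : Integer.compare(year, other.year);
    }

    // Helper for the example: serialize to bytes and read the key back.
    static WordYearKey roundTrip(WordYearKey key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A type used only as a value could stop at write() and readFields(); compareTo() is the part that makes it usable as a key.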

Wednesday, October 28, 2015

In-mapper combining for word count

https://vangjee.wordpress.com/2012/03/07/the-in-mapper-combining-design-pattern-for-mapreduce-programming/

Find the top N most frequent words

1- Let the mapper run as usual, writing (word, 1) for each word it reads to the reduce phase.

Reduce phase:

1- We override two methods: reduce() and cleanup().
2- At the beginning of reduce(), we sum all the values received from the mappers for this key, which gives the number of occurrences of the word in the book; then we put the word and its count into a HashMap.
3- Sorting the HashMap by count is done with a helper, sortByValues(countMap).
4- In cleanup(), we first sort the HashMap by value in descending order, then loop over the key set and output the first 20 items.
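
The reduce-side steps above can be sketched roughly as follows. This is plain Java rather than a real Hadoop Reducer, so it runs without Hadoop: TopNReducerSketch is a hypothetical class whose reduce() and cleanup() stand in for the overridden Reducer methods, String and Integer stand in for Text and IntWritable, and cleanup() returns the result instead of writing it to the context. The sort is done inline instead of through the sortByValues helper, and ties are broken alphabetically, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopNReducerSketch {
    private final int n;
    private final Map<String, Integer> countMap = new HashMap<>();

    TopNReducerSketch(int n) {
        this.n = n;
    }

    // Called once per word with all the 1s the mappers emitted for it.
    void reduce(String word, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        // Nothing is emitted yet; the total is only remembered.
        countMap.put(word, sum);
    }

    // Called once after the last reduce() call: sort by count, keep the first n.
    List<Map.Entry<String, Integer>> cleanup() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(countMap.entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue()); // highest count first
            return c != 0 ? c : a.getKey().compareTo(b.getKey()); // ties: alphabetical
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Note that this pattern yields a global top N only when the job runs with a single reducer, since each reducer sees only its own share of the keys, and that it keeps every distinct word in memory until cleanup() runs.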

Finding the top 10 list from a set

http://blog.pivotal.io/pivotal/products/how-hadoop-mapreduce-can-transform-how-you-build-top-ten-lists