1. If no custom partitioner is defined in Hadoop then how is data partitioned before it is sent to the reducer?
Ans: The default partitioner (HashPartitioner) computes a hash of the key and takes it modulo the number of reduce tasks to pick the partition, so all records with the same key go to the same reducer.
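To make the arithmetic concrete, here is a minimal sketch in plain Java that mirrors the hash-then-modulo rule without depending on the Hadoop libraries. The class name `HashPartitionSketch` and the method `partitionFor` are hypothetical names chosen for illustration; in a real job the same logic lives in a `Partitioner` subclass and receives the reducer count from the framework.

```java
public class HashPartitionSketch {

    // Hypothetical helper: maps a key to a partition index in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hashCode() can never produce a negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed reducer count for the demo
        String[] keys = {"apple", "banana", "apple", "cherry"};
        for (String key : keys) {
            // Equal keys always land in the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, reducers));
        }
    }
}
```

Because the partition depends only on the key's hash and the reducer count, a skewed key distribution leads to skewed reducer load, which is the usual reason for writing a custom partitioner.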
2. Distributed Cache
DistributedCache is a facility provided by the MapReduce framework to cache files needed by applications. Once you cache a file for your job, the Hadoop framework makes it available on every node where your map/reduce tasks are running (on the local file system, not in memory). You can then access the cached file as a local file in your Mapper or Reducer, read it, and populate a collection (e.g. an array or a HashMap) in your code.
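The "read the cached file and populate a collection" step can be sketched as follows. This is plain Java with no Hadoop dependency, and it rests on assumptions: the class name `CacheFileLookup`, the method `load`, and the tab-separated `key<TAB>value` file layout are all hypothetical. In an actual job you would typically register the file with `Job.addCacheFile(...)` and call something like `load` once from the Mapper's or Reducer's `setup()` method, passing the local path of the cached file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CacheFileLookup {

    // Hypothetical helper: reads "key<TAB>value" lines from a local file
    // into a HashMap. Lines without a tab are skipped.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached file: a temp file with made-up demo data.
        Path tmp = Files.createTempFile("lookup", ".tsv");
        Files.write(tmp, Arrays.asList("US\tUnited States", "IN\tIndia"),
                    StandardCharsets.UTF_8);
        Map<String, String> countries = load(tmp);
        System.out.println(countries.get("IN"));
        Files.delete(tmp);
    }
}
```

Loading the file once per task, rather than once per record, keeps the per-record cost of `map()` or `reduce()` down to an in-memory lookup.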
3.