Lets talk Hadoop & Netezza: Impala Notes

Wednesday, October 28, 2015

Impala Notes

1-

Cloudera Impala is an addition to the tools available for querying big data. It does not replace batch
processing frameworks built on MapReduce, such as Hive. Those frameworks remain best suited to
long-running batch jobs, such as Extract, Transform, and Load (ETL) workloads.

2-

Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or
HBase.

Impala daemons

a. The core Impala component is the impalad daemon process, which runs on each data node of the cluster.
b. The impalad processes also receive broadcast messages from the catalogd daemon (introduced in Impala 1.2) whenever any
Impala node in the cluster creates, alters, or drops any type of object, or when an INSERT or LOAD DATA statement
is processed through Impala.

The Impala Statestore

The Impala component known as the statestore checks on the health of Impala daemons on all the nodes in a
cluster, and continuously relays its findings to each of those daemons.

It is physically represented by a daemon process named statestored. You only need such a process on one node in the cluster.
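The health-tracking idea can be pictured with a toy model. The sketch below is not the real statestored protocol; the class name ToyStatestore, the timeout value, and the node names are all hypothetical and exist only for illustration. It simply records when each daemon was last heard from and reports which ones still look alive.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyStatestore:
    """Toy stand-in for a statestore (hypothetical, not Impala's implementation)."""

    timeout_s: float = 10.0  # assumed threshold after which a node counts as down
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time at which this daemon was last heard from.
        self.last_seen[node] = now

    def healthy_nodes(self, now: float) -> List[str]:
        # A node is treated as healthy if it was heard from within the timeout.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        )


store = ToyStatestore(timeout_s=10.0)
store.heartbeat("node1", now=0.0)
store.heartbeat("node2", now=0.0)
store.heartbeat("node1", now=8.0)   # node2 goes quiet after t=0
print(store.healthy_nodes(now=12.0))  # ['node1']
```

In this model the list returned by healthy_nodes is what would be relayed to every daemon, so that queries are not scheduled on a node that has stopped responding.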

The Impala Catalog Service

The Impala component known as the catalog service relays the metadata changes from Impala SQL statements to all the nodes in a cluster. It is physically represented by a daemon process named catalogd; you only need such a process on one node in the cluster. Because the requests are passed through the statestore daemon, it makes sense to run the statestored and catalogd services on the same node.

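This new component in Impala 1.2 reduces the need for the REFRESH and INVALIDATE METADATA statements: they are no longer required after metadata changes made through Impala itself, but they are still needed after changes made outside Impala, for example through Hive.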
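As a rough guide to when these statements still apply, here is a minimal sketch. The function name metadata_statement and the three change categories are hypothetical labels chosen for this example; only the REFRESH and INVALIDATE METADATA statements themselves come from Impala.

```python
from typing import Optional


def metadata_statement(table: str, change: str) -> Optional[str]:
    """Return the Impala statement to run after a change, or None if none is needed.

    The `change` categories are illustrative labels, not Impala terminology:
      "impala"    - the change was made by an Impala SQL statement
      "new_files" - data files were added to an existing table outside Impala
      "hive_ddl"  - the table was created or altered through Hive
    """
    if change == "impala":
        # catalogd broadcasts the change to all nodes, so nothing to do.
        return None
    if change == "new_files":
        # Existing table, new data: a lightweight reload is enough.
        return f"REFRESH {table}"
    if change == "hive_ddl":
        # Impala has stale or no metadata for this table: discard and reload it.
        return f"INVALIDATE METADATA {table}"
    raise ValueError(f"unknown change type: {change!r}")


print(metadata_statement("sales", "new_files"))  # REFRESH sales
```

The returned string would then be run on an Impala node, for instance from impala-shell, before querying the table there.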

http://blog.cloudera.com/blog/2014/12/the-impala-cookbook/
