1. Locks
Hive also has support for table- and partition-level locking. Locks prevent, for example, one process from dropping a table while another is reading from it. Locks are managed transparently using ZooKeeper, so the user doesn’t have to acquire or release them, although it is possible to get information about which locks are being held via the SHOW LOCKS statement. By default, locks are not enabled.
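A minimal sketch of switching locking on for a session and inspecting held locks. hive.support.concurrency is the standard Hive property for this; the ZooKeeper ensemble itself (hive.zookeeper.quorum) is normally configured in hive-site.xml. The table name sales is a placeholder, not something from these notes.

```sql
-- Turn on the lock manager (it is off by default).
SET hive.support.concurrency = true;

-- List every lock currently held.
SHOW LOCKS;

-- List locks on a single table (sales is a hypothetical table).
SHOW LOCKS sales;

-- Same, with additional detail about each lock.
SHOW LOCKS sales EXTENDED;
```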
2. SerDe
When a table is queried, its SerDe acts as a deserializer: it turns a row of data from the bytes in the file into the objects Hive uses internally to operate on that row.
When data is written, for example by an INSERT or CREATE TABLE ... AS SELECT, the table’s SerDe acts as a serializer: it turns Hive’s internal representation of a row into the bytes that are written to the output file.
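As an illustration, a table can name its SerDe explicitly. The sketch below assumes Hive 0.14 or later, where org.apache.hadoop.hive.serde2.OpenCSVSerde is available; the table and column names are made up for the example. Note that this particular SerDe treats every column as a string.

```sql
-- Hypothetical table whose rows are read and written by the CSV SerDe.
CREATE TABLE stations_csv (
  station_id STRING,
  name       STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE;

-- Reading: the SerDe deserializes file bytes into Hive row objects.
SELECT station_id, name FROM stations_csv LIMIT 10;

-- Writing: the SerDe serializes Hive rows back into CSV bytes.
-- stations_raw is an assumed source table with matching columns.
INSERT INTO TABLE stations_csv
SELECT station_id, name FROM stations_raw;
```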
3. Prefix a query with EXPLAIN to see its abstract syntax tree, the dependency graph between stages, and the plan of each stage.
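For example, with a hypothetical employees table. Which sections appear in the output, and whether the syntax tree is included, depends on the Hive version, so treat this as a sketch.

```sql
-- Show the stage dependencies and per-stage plan for an aggregation.
EXPLAIN
SELECT dept, COUNT(*) AS headcount
FROM employees
GROUP BY dept;

-- EXTENDED adds more detail, such as file paths and table properties.
EXPLAIN EXTENDED
SELECT dept, COUNT(*) AS headcount
FROM employees
GROUP BY dept;
```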
4. LEFT SEMI JOIN is used as a replacement for IN and EXISTS clauses in Hive queries. Since Hive 0.13, IN and EXISTS subqueries are also supported directly in the WHERE clause.
- Hive allows uncorrelated subqueries, where the subquery is a self-contained query referenced by an IN or EXISTS predicate in the WHERE clause.
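The two forms can be compared side by side. The orders and customers tables are hypothetical; both queries return the orders whose customer_id appears in customers.

```sql
-- Subquery form (Hive 0.13 and later).
SELECT o.order_id, o.customer_id
FROM orders o
WHERE o.customer_id IN (SELECT c.customer_id FROM customers c);

-- LEFT SEMI JOIN form, which also works on older versions.
SELECT o.order_id, o.customer_id
FROM orders o
LEFT SEMI JOIN customers c ON (o.customer_id = c.customer_id);
```

With LEFT SEMI JOIN, columns of the right-hand table may be referenced only in the ON clause, not in the SELECT list or WHERE clause.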
5. A UDF must satisfy the following two properties:
- A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.
- A UDF must implement at least one evaluate() method.
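Once such a class has been compiled and packaged, it might be registered and called from Hive roughly as follows. The jar path, the class name com.example.hive.Strip, the function name strip, and the employees table are all hypothetical.

```sql
-- Make the jar containing the UDF class available to the session.
ADD JAR /tmp/hive-examples.jar;

-- Bind a function name to the class (which extends UDF and defines evaluate()).
CREATE TEMPORARY FUNCTION strip AS 'com.example.hive.Strip';

-- Call it like any built-in function.
SELECT strip(name) FROM employees LIMIT 5;
```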
6. Hive Delta or Incremental load
https://pkghosh.wordpress.com/2012/07/08/making-hive-squawk-like-a-real-database/
7. Bulk Insert Update Delete in Data lake
https://pkghosh.wordpress.com/2015/04/26/bulk-insert-update-and-delete-in-hadoop-data-lake/
8. Hive Join Optimization
http://www.datascience-labs.com/hive/hiveql-joins/
9. Use Vectorization
Vectorized query execution improves the performance of operations such as scans, aggregations, filters and joins by processing rows in batches of 1024 instead of one row at a time.
Introduced in Hive 0.13, this feature can significantly reduce query execution time, and is easily enabled with two parameter settings:
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
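In Hive 0.13, vectorized execution applies to data stored in ORC format, so a quick check might look like the sketch below. The page_views_orc table is hypothetical, and the exact wording of the EXPLAIN output varies by version.

```sql
-- Hypothetical ORC-backed table.
CREATE TABLE page_views_orc (
  user_id  BIGINT,
  url      STRING,
  view_ts  TIMESTAMP
)
STORED AS ORC;

-- With the settings above in effect, the plan for a simple scan and
-- aggregate typically reports that the map work runs in vectorized mode.
EXPLAIN
SELECT url, COUNT(*) AS views
FROM page_views_orc
WHERE user_id > 0
GROUP BY url;
```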
10. Use Tez instead of MapReduce as the execution engine.
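Assuming Tez is installed on the cluster, the engine can be switched per session:

```sql
-- Run subsequent queries on Tez.
SET hive.execution.engine = tez;

-- Switch back to classic MapReduce if needed.
SET hive.execution.engine = mr;
```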
