Hive Query Performance Tuning

There are several parameters that we can tune in Hive to improve the overall query performance.

For instance,

-- See http://hortonworks.com/community/forums/topic/mapjoinmemoryexhaustionexception-on-local-job/
-- Set hive.auto.convert.join=false before running your query to disable local in-memory joins and force the join to run as a distributed MapReduce phase.
-- After running your query, set the value back to true with: SET hive.auto.convert.join=true;

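-- Target input size per reducer (about 100 MB); Hive estimates the reducer count from total input size divided by this value.
SET hive.exec.reducers.bytes.per.reducer=100000000;
-- Maximum input split size in bytes (about 100 MB); smaller splits produce more map tasks.
SET mapreduce.input.fileinputformat.split.maxsize=100000000;
-- Disable automatic conversion of joins to in-memory map joins.
SET hive.auto.convert.join=false;

-- Allow every partition column in an INSERT to be resolved dynamically.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Disable vectorized (batch-at-a-time) query execution.
SET hive.vectorized.execution.enabled=false;
-- Add an extra stage that spreads skewed GROUP BY keys across reducers.
SET hive.groupby.skewindata=true;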
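
As a minimal sketch of how these settings might be scoped around a single query, the example below turns off automatic map-join conversion only for one join-heavy insert and restores it afterwards. The tables `sales_staging`, `customers`, and `sales` (partitioned by `dt`) and their columns are hypothetical and are not from the original post; `hive.exec.dynamic.partition=true` is added here on the assumption that dynamic partitioning is not already enabled in the session.

```sql
-- Hypothetical schema: sales_staging(id, customer_id, amount, dt),
-- customers(id, ...), and sales(id, amount) partitioned by (dt).

-- Enable dynamic partitioning and let every partition column be dynamic.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Force the join below to run as a distributed MapReduce join.
SET hive.auto.convert.join=false;

-- The dynamic partition column (dt) must come last in the SELECT list.
INSERT OVERWRITE TABLE sales PARTITION (dt)
SELECT s.id, s.amount, s.dt
FROM sales_staging s
JOIN customers c ON s.customer_id = c.id;

-- Restore the setting for the rest of the session.
SET hive.auto.convert.join=true;
```

Keeping the override next to the query it protects makes it less likely that later queries in the same session lose the benefit of map joins.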

Reference

Hortonworks
Hortonworks: 5 WAYS TO MAKE YOUR HIVE QUERIES RUN FASTER
Cloudera: Tuning Hive
Hortonworks: Chapter 4. Query Optimisation
HadoopTutorial: Hive Performance Tuning
