Several Hive parameters can be tuned to improve overall query performance.
For instance:
-- Target roughly 100 MB of input per reducer and cap input splits at roughly 100 MB.
SET hive.exec.reducers.bytes.per.reducer=100000000;
SET mapreduce.input.fileinputformat.split.maxsize=100000000;
-- See http://hortonworks.com/community/forums/topic/mapjoinmemoryexhaustionexception-on-local-job/
-- Set hive.auto.convert.join=false before running your query to disable local in-memory joins
-- and force the join to be done as a distributed Map-Reduce phase.
-- After running your query, set the value back to true with: SET hive.auto.convert.join=true;
SET hive.auto.convert.join=false;
-- Allow every partition column to be resolved dynamically.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Disable vectorized query execution.
SET hive.vectorized.execution.enabled=false;
-- Add an extra job that spreads skewed GROUP BY keys across reducers.
SET hive.groupby.skewindata=true;
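The comments above suggest turning off the automatic map-join conversion only for the query that needs it and restoring it afterwards. A minimal sketch of that pattern might look like the following. The tables `orders` and `customers` and their columns are hypothetical and stand in for your own query.

```sql
-- Hypothetical tables and columns; substitute your own query.
-- Disable local in-memory (map) joins for the query that follows.
SET hive.auto.convert.join=false;

SELECT c.customer_id, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

-- Restore automatic map-join conversion for later queries in this session.
SET hive.auto.convert.join=true;
```

SET statements apply only to the current session, so other sessions are unaffected by the override.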
Reference