There are several parameters that we can tune in Hive to improve overall query performance.
For instance:
-- Refers to http://hortonworks.com/community/forums/topic/mapjoinmemoryexhaustionexception-on-local-job/
-- Set hive.auto.convert.join=false before running your query to disable local in-memory joins
-- and force the join to be done as a distributed Map-Reduce phase.
-- After running your query, set the value back to true with: SET hive.auto.convert.join=true;
SET hive.exec.reducers.bytes.per.reducer=100000000;
SET mapreduce.input.fileinputformat.split.maxsize=100000000;
SET hive.auto.convert.join=false;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.vectorized.execution.enabled=false;
SET hive.groupby.skewindata=true;
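The comments above suggest turning off automatic map-join conversion only for the query that needs it and restoring it afterwards. A minimal sketch of that pattern might look like the following. The tables `orders` and `customers` and their columns are hypothetical placeholders, and the final statement assumes the setting was `true` before the session changed it.

```sql
-- Print the current value so it can be restored later.
SET hive.auto.convert.join;

-- Disable automatic conversion to local in-memory (map) joins.
SET hive.auto.convert.join=false;

-- Hypothetical query: `orders` and `customers` are placeholder tables.
SELECT c.customer_id, COUNT(*) AS order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_id;

-- Restore the setting once the query has finished
-- (assumes the earlier value was true).
SET hive.auto.convert.join=true;
```

Settings changed with `SET` apply to the current session, so scoping the override this way should keep it from affecting later queries in the same session that benefit from map joins.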