Hive Query Performance Tuning

There are several parameters that we can tune in Hive to improve the overall query performance. For instance, 1 2 3 4 5 6 7 8 9 10 11 12 -- refers to http://hortonworks.com/community/forums/topic/mapjoinmemoryexhaustionexception-on-local-job/ -- before running your query to disable local in-memory joins and force the join to be done as a distributed Map-Reduce phase. -- After running your query you should set the value back to true with: set hive....

May 11, 2015 · 1 min · 108 words · Eric

SQL Query Performance Optimisation

Try not to use SELECT * and provide specific fields Bad Example 1 SELECT * FROM user; Good Example 1 SELECT id, username, tel FROM user; Reason Specify fields will save computational resources, and reduce network expense. It’s possible using pre-built table index to reduce table returns, and improve query efficiency. Avoid using OR in the where clause to connect conditions Bad Example 1 2 SELECT id, username, tel FROM user WHERE id=1 OR salary=500; Good Example Use UNION ALL to concatenate results 1 2 3 4 5 SELECT id, username, tel FROM user WHERE id=1 UNION ALL SELECT id, username, tel FROM user WHERE salary=500; Use two separate SQL 1 2 SELECT id, username, tel FROM user WHERE id=1 1 2 SELECT id, username, tel FROM user WHERE salary=500; Reason The use of OR may invalidate the table index, and thus will do full table scan....

August 11, 2013 · 4 min · 673 words · Eric