Apache Spark Job Optimisation

Spark Job optimisation

```shell
spark-submit --py-files ./rs_commons_util.zip --executor-cores 4 --num-executors 4 ./main.py
```

Reference

- How We Optimise Apache Spark Jobs
- Apache Spark: Config Cheatsheet
- What I Learned From Processing Big Data With Apache Spark
- Cloudera: How-to: Tune Your Apache Spark Jobs (Part 1)
- Cloudera: How-to: Tune Your Apache Spark Jobs (Part 2)
- Hortonworks: Spark num-executors setting
- Best Practices Writing Production-Grade PySpark Jobs
- Github: ekampf/PySpark-Boilerplate
- Github: snowplow/spark-example-project
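The `--executor-cores` and `--num-executors` flags above are the usual tuning knobs. The Cloudera "Tune Your Apache Spark Jobs" posts referenced here describe a common sizing heuristic for picking those values; a minimal sketch in Python (the function name and the example cluster figures are illustrative, not from the original post):

```python
# Sketch of the executor-sizing heuristic from Cloudera's tuning posts:
# reserve one core per node for OS/Hadoop daemons, cap executors at ~5
# cores each for good HDFS throughput, and leave one executor slot for
# the YARN application master.

def size_executors(nodes, cores_per_node, mem_per_node_gb, cores_per_executor=5):
    usable_cores = cores_per_node - 1              # 1 core/node for daemons
    executors_per_node = usable_cores // cores_per_executor
    num_executors = nodes * executors_per_node - 1  # -1 for the YARN AM
    # Split each node's usable memory across its executors, shaving ~7%
    # for YARN memory overhead.
    mem_per_executor_gb = int((mem_per_node_gb - 1) / executors_per_node * 0.93)
    return num_executors, cores_per_executor, mem_per_executor_gb

# Example: 6 worker nodes, 16 cores and 64 GB RAM each
print(size_executors(6, 16, 64))  # -> (17, 5, 19)
```

Those numbers would then map onto `--num-executors 17 --executor-cores 5 --executor-memory 19G`.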

October 28, 2018 · 1 min · 65 words · Eric

Read Note - Learning Spark

Book Info

- Name: Learning Spark
- Author: Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia
- Publisher: O’Reilly Media
- Release Date: 2015
- Topic: Learning Apache Spark 1.x

Preface

Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm. Apache Spark provides multiple components that together cover the same ground:

- Spark SQL competes with Hive for interactive queries
- MLlib competes with Mahout for machine learning
- Spark Streaming competes with Storm for stream processing
- GraphX competes with Neo4j for graph processing

Apache Spark offers three main benefits: 1) easy to use....

April 1, 2016 · 3 min · 565 words · Eric