Best practices of AWS EMR  [draft]

Reference AWS Big Data Blog: Best practices for resizing and automatic scaling in Amazon EMR

July 3, 2018 · 1 min · 15 words · Eric

Create vs Apply in Kubernetes

kubectl create is so-called Imperative Management: you tell the Kubernetes API what you want to create, replace, or delete, not how you want your cluster to look. kubectl apply is part of the Declarative Management approach, where changes you have made to a live object (e.g. through scale) are preserved even if you apply other changes to the object. Both approaches are valid ways to work in production....

June 16, 2018 · 1 min · 74 words · Eric
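
The difference between kubectl create and kubectl apply can be sketched with a toy model in Python. This is only an illustration under loud assumptions: the `cluster` dictionary, the flat field layout, and the `create`, `apply`, and `scale` helpers are all hypothetical stand-ins, not the real kubectl implementation or API. The idea it tries to capture is that apply remembers what was last declared and only touches those fields, so a field changed on the live object (here `replicas`) survives as long as it is not part of the applied manifest.

```python
from copy import deepcopy


def create(cluster, name, manifest):
    """Imperative create (toy): add the object, fail if it already exists."""
    if name in cluster:
        raise ValueError(f"{name} already exists")
    cluster[name] = {"live": deepcopy(manifest), "last_applied": None}


def apply(cluster, name, manifest):
    """Declarative apply (toy): merge the manifest into the live object.

    Fields declared last time but missing now are removed, declared fields
    are set, and every other live field is left untouched.
    """
    entry = cluster.setdefault(name, {"live": {}, "last_applied": None})
    previous = entry["last_applied"] or {}
    live = entry["live"]
    for key in previous:
        if key not in manifest:
            live.pop(key, None)  # previously declared, now dropped
    live.update(deepcopy(manifest))  # set everything declared now
    entry["last_applied"] = deepcopy(manifest)


def scale(cluster, name, replicas):
    """Change made directly on the live object, outside any manifest."""
    cluster[name]["live"]["replicas"] = replicas


cluster = {}
apply(cluster, "web", {"image": "nginx:1.14"})
scale(cluster, "web", 5)
apply(cluster, "web", {"image": "nginx:1.15"})
print(cluster["web"]["live"])  # {'image': 'nginx:1.15', 'replicas': 5}
```

In this sketch a second `create` for the same name raises an error, while a second `apply` updates the image and keeps the scaled replica count. If `replicas` were part of the applied manifest, apply would overwrite the scaled value instead.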

Airflow in Practice - Interactive with Airflow Internal Storage

Problem Definition: One typical Airflow usage scenario is to execute a workflow on a regular basis, where the output data of the last iteration becomes the input data for the next iteration. One way to do that is to keep the output data in a local file or a database table, and read and update it in every iteration. However, with those solutions you need to handle database connections manually, which is not always convenient....

June 11, 2018 · 2 min · 332 words · Eric

Execute Scripts When User Logon Linux Server

When you log on to a Linux server, there are probably a couple of things you have to do every day, e.g. change to your project directory, check disk space, or check the server load. Of course you can create some shortcuts and aliases in your .bashrc file, but you still need to run the shortcut command manually, and I am too lazy to do even that. Well, you can solve this problem by modifying your profile file....

June 4, 2018 · 2 min · 233 words · Eric
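
As a rough sketch of the kind of check a logon script might run, the Python snippet below reports disk usage and server load. It is an assumption-laden illustration rather than the script from the post: the function name `login_summary` and the idea of invoking it from a profile file are hypothetical, and only standard-library calls (`shutil.disk_usage`, `os.getloadavg`) are used.

```python
import os
import shutil


def login_summary(project_dir="."):
    """Return a few status lines about disk space and server load."""
    usage = shutil.disk_usage(project_dir)
    percent_used = usage.used / usage.total * 100
    lines = [
        f"Project directory: {os.path.abspath(project_dir)}",
        f"Disk used: {percent_used:.1f}%",
    ]
    try:
        # 1, 5 and 15 minute load averages (Unix only)
        load1, load5, load15 = os.getloadavg()
        lines.append(f"Load average: {load1:.2f} {load5:.2f} {load15:.2f}")
    except (AttributeError, OSError):
        lines.append("Load average: unavailable")
    return lines


if __name__ == "__main__":
    print("\n".join(login_summary()))
```

A script like this can only report information. Changing to the project directory still has to happen in the shell itself, because a child process cannot change the working directory of the shell that started it.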

Airflow Concept

What is Airflow? Airflow is a platform to programmatically author, schedule and monitor your workflows and pipelines. What are the benefits of using Airflow? Programmatically author workflows: in Airflow, you can define your workflow programmatically with Python scripts, which lets you leverage all the conveniences that Python provides. This is a huge improvement if you have experience with Oozie or other GUI-based (or even GUI-less) scheduling tools....

June 1, 2018 · 2 min · 286 words · Eric