Snowflake Concepts

Basic Concept: Snowflake is a new kind of cloud-based, enterprise-level data warehouse. Architecture: Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire dataset locally....

May 13, 2020 · 3 min · 625 words · Eric

Airflow Notes I - Basic Concept

What is Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows and pipelines. What are the benefits of using Airflow? Programmatically authored workflows: In Airflow, you define your workflows in Python scripts, which lets you take advantage of all the conveniences Python provides. This is a huge improvement if you have experience with Oozie or other GUI-based (or even GUI-less) scheduling tools....

June 1, 2018 · 2 min · 286 words · Eric

AWS Concept

CloudWatch: The CloudWatch Free Tier metric update frequency is 5 minutes. With the detailed monitoring data available for your EBS volumes, Provisioned IOPS volumes automatically send 1-minute metrics to CloudWatch. EBS EC2 ec2-revoke: RevokeSecurityGroupIngress removes one or more rules from a security group. The values you specify in the revoke request must match an existing rule’s values for the rule to be removed. ec2-create-group: CreateSecurityGroup creates a security group for use with your account....

May 7, 2017 · 3 min · 503 words · Eric

Advanced SQL Concepts

Query Execution Order: Most people write their SQL queries starting from the SELECT part, because it is more intuitive and closer to natural language. But that is not the order in which a query engine executes a SQL query. Below is the execution order of a SQL query: FROM, JOIN: tables are joined to get the base data. WHERE: the base data is filtered. GROUP BY: the filtered base data is grouped....
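To make the first three steps concrete, here is a minimal sketch that runs one query against an in-memory SQLite database from Python. The `customers` and `orders` tables, their columns, and the sample rows are made up for illustration; they are not from the post. The SQL comments label each clause with the step it corresponds to.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "US"), (2, "US"), (3, "DE")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (1, 150.0), (2, 200.0), (3, 20.0)],
)

# The query is written SELECT-first, but the clauses are applied in
# the order marked in the comments.
query = """
SELECT c.country, COUNT(*) AS big_orders   -- applied after grouping
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id   -- 1. FROM, JOIN: build the base data
WHERE o.amount >= 100                      -- 2. WHERE: filter the base data
GROUP BY c.country                         -- 3. GROUP BY: group what is left
"""
order_counts = conn.execute(query).fetchall()
print(order_counts)
```

Because the filter runs before the grouping, the only German order (amount 20) is removed first, so no `DE` group is ever formed and the result contains a single row for `US`.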

March 14, 2013 · 2 min · 386 words · Eric

Basic SQL Concepts

Join: Full Join, Inner Join, Left Join, Right Join. Aggregation Functions: COUNT(), MAX(), MIN(), SUM(), AVG(). The GROUP BY statement groups rows that have the same values into summary rows, like ‘find the number of customers in each country’. GROUP BY is often used with the aggregation functions above to group the result set by one or more columns.
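As a small sketch of the "number of customers in each country" example, the snippet below runs GROUP BY with all five aggregation functions on an in-memory SQLite database from Python. The `customers` table, its `spend` column, and the sample rows are assumptions made up for illustration. The ORDER BY is only there to keep the output order stable.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, country TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Brazil", 120.0), ("Bo", "China", 80.0), ("Cy", "China", 40.0)],
)

# One summary row per country: each aggregate is computed over the
# rows that share the same country value.
summary = conn.execute(
    """
    SELECT country,
           COUNT(*)   AS customers,
           MAX(spend) AS max_spend,
           MIN(spend) AS min_spend,
           SUM(spend) AS total_spend,
           AVG(spend) AS avg_spend
    FROM customers
    GROUP BY country
    ORDER BY country
    """
).fetchall()

for row in summary:
    print(row)
```

Each tuple holds the country followed by the count, maximum, minimum, sum, and average of `spend` for that country.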

February 14, 2013 · 1 min · 59 words · Eric