Snowflake Concepts

Basic Concept: Snowflake is a new model of cloud-based, enterprise-level data warehouse. Architecture: Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Like shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But like shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters, where each node in the cluster stores a portion of the entire dataset locally....

May 13, 2020 · 3 min · 625 words · Eric

Airflow Concept

What is Airflow? Airflow is a platform to programmatically author, schedule, and monitor workflows and pipelines. What are the benefits of using Airflow? Programmatically authored workflows: In Airflow, you define your workflows in Python scripts, which lets you take advantage of all the conveniences that Python provides. This is a huge improvement if you have experience with Oozie or other GUI-based (or even GUI-less) scheduling tools....

June 1, 2018 · 2 min · 286 words · Eric

System Design Note - Core Concept  [draft]

Vertical Scaling vs Horizontal Scaling: Vertical scaling (scale-up) adds more CPU, memory, or disk to the same server. Horizontal scaling (scale-out) adds more servers. Load Balancer (LB): A load balancer is a device or service that sits between the users and the server group and acts as an invisible facilitator, ensuring that all servers are used equally. Load balancing can optimise response time and avoid overloading some compute nodes while other compute nodes are left idle....
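
As a rough illustration of "all servers are used equally", here is a minimal sketch of round-robin selection, which is only one of several possible balancing strategies and is not taken from the post itself. The class name `RoundRobinBalancer` and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation so each gets an equal share."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        # cycle() repeats the server list endlessly, in order.
        self._rotation = cycle(self._servers)

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])

# Six incoming requests are spread evenly: two per server.
assignments = [balancer.pick() for _ in range(6)]
print(assignments)
```

Round-robin ignores how busy each server actually is; a real load balancer may also weigh current connections or health checks.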

October 31, 2017 · 1 min · 134 words · Eric

AWS Concept

CloudWatch: CloudWatch’s Free Tier metrics update every 5 minutes. With detailed monitoring of EBS volumes, Provisioned IOPS volumes automatically send 1-minute metrics to CloudWatch. EBS EC2: ec2-revoke (RevokeSecurityGroupIngress) removes one or more rules from a security group. The values you specify in the revoke request must match an existing rule’s values for the rule to be removed. ec2-create-group (CreateSecurityGroup) creates a security group for use with your account....

May 7, 2017 · 3 min · 503 words · Eric

Machine Learning Glossary  [draft]

A B C D E F G H I J K L M N O One-hot encoding: One-hot encoding is a way to represent categorical variables as numerical data so that they can be used in machine learning algorithms. It involves creating a new binary column for each unique category in the categorical feature. For example, if a categorical feature has three categories, A, B, and C, then three new columns would be created, one for each category....
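
The three-category example above can be sketched in plain Python. This is a minimal illustration, not the post's own code; the helper name `one_hot_encode` and the sample values are assumptions made for the example.

```python
def one_hot_encode(values):
    """Return (categories, rows): one binary column per unique category."""
    # Sort the unique categories so the column order is deterministic.
    categories = sorted(set(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical feature with the three categories A, B, and C.
categories, rows = one_hot_encode(["A", "C", "B", "A"])

print(categories)
for row in rows:
    print(row)
```

Each row contains exactly one 1, which is where the name "one-hot" comes from.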

August 20, 2013 · 2 min · 254 words · Eric