Machine Learning Glossary

One-hot encoding One-hot encoding is a way to represent categorical variables as numerical data so that they can be used in machine learning algorithms. It involves creating a new binary column for each unique category in the categorical feature. For example, if a categorical feature has three categories, A, B, and C, then three new columns, one for each category, would be created....
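
As a minimal sketch of the idea in this excerpt, the snippet below encodes a small categorical feature by hand, without any library. The function name `one_hot_encode` and the sample values are hypothetical, chosen only for illustration; sorting the categories is an assumption made here to get a stable column order.

```python
def one_hot_encode(values):
    """Return (categories, rows); each row has one 0/1 flag per category."""
    # One column per unique category; sorted so the column order is stable.
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical feature with the three categories from the example: A, B, C.
feature = ["A", "C", "B", "A"]
categories, encoded = one_hot_encode(feature)

print(categories)
for value, row in zip(feature, encoded):
    print(value, row)
```

Each row contains exactly one `1`, in the column of the category that row belongs to.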

August 20, 2023 · 2 min · 254 words · Eric

Snowflake Concepts

Basic Concept Snowflake is a cloud-based, enterprise-level data warehouse built on a new architectural model. Architecture Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire dataset locally....

May 13, 2020 · 3 min · 625 words · Eric

Airflow Concept

What is Airflow? Airflow is a platform to programmatically author, schedule, and monitor your workflows and pipelines. What are the benefits of using Airflow? Programmatically author workflows In Airflow, you define your workflow programmatically with Python scripts, which lets you take advantage of all the conveniences that Python provides. This is a huge improvement if you have experience with Oozie or other GUI-based (or even GUI-less) scheduling tools....

June 1, 2018 · 2 min · 286 words · Eric

AWS Concept

CloudWatch CloudWatch’s Free Tier metric update frequency is 5 minutes. As part of the detailed monitoring data available for your EBS volumes, Provisioned IOPS volumes automatically send 1-minute metrics to CloudWatch. EBS EC2 ec2-revoke RevokeSecurityGroupIngress removes one or more rules from a security group. The values you specify in the revoke request must match an existing rule’s values for the rule to be removed. ec2-create-group CreateSecurityGroup creates a security group for use with your account....

May 7, 2017 · 3 min · 503 words · Eric

Advanced SQL Concepts

Query Execution Order Most people write their SQL queries starting from the SELECT part, because it’s more intuitive and closer to natural language. But that is not the order in which the query engine executes a SQL query. Below is the execution order of a SQL query: FROM, JOIN. Tables are joined to get the base data. WHERE. The base data is filtered. GROUP BY. The filtered base data is grouped....
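
The first three stages listed in this excerpt can be sketched in plain Python, one step at a time. This is only an illustration of the ordering, not how a real query engine is implemented; the tables `orders` and `customers`, their columns, and the filter `amount >= 10` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample tables.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50},
    {"order_id": 2, "customer_id": 11, "amount": 5},
    {"order_id": 3, "customer_id": 10, "amount": 30},
    {"order_id": 4, "customer_id": 12, "amount": 70},
]
customers = [
    {"customer_id": 10, "country": "US"},
    {"customer_id": 11, "country": "US"},
    {"customer_id": 12, "country": "DE"},
]

# 1. FROM, JOIN: join the tables to get the base data.
country_by_customer = {c["customer_id"]: c["country"] for c in customers}
base = [
    {**order, "country": country_by_customer[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in country_by_customer
]

# 2. WHERE: filter the base data (here: amount >= 10).
filtered = [row for row in base if row["amount"] >= 10]

# 3. GROUP BY: group the filtered base data (here: by country).
groups = defaultdict(list)
for row in filtered:
    groups[row["country"]].append(row)
grouped = dict(groups)

for country, rows in grouped.items():
    print(country, [row["order_id"] for row in rows])
```

Because the filter runs before the grouping, order 2 is already gone by the time the groups are built.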

March 14, 2013 · 2 min · 386 words · Eric