Snowflake Note II - Data Ingestion

In this note, I am going to create my first Database and table in Snowflake, and load csv data files from AWS S3 into our table.

Typical Data Ingestion Process

  1. Prepare your files
  2. Stage the data
  3. Execute COPY command
  4. Managing regular loads

Loading data from AWS S3 Bucket

Step 1: Database and table initialization

So first of all, we need to create our database. Here we are creating a new database called OUR_FIRST_DATABASE and our table called OUR_FIRST_TABLE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Step 0: create database OUR_FIRST_DATABASE
CREATE DATABASE OUR_FIRST_DATABASE;


-- Step 1: create table OUR_FIRST_TABLE
CREATE TABLE OUR_FIRST_TABLE
(
first_name STRING,
last_name STRING,
address STRING,
city STRING,
state STRING
);

SELECT * FROM OUR_FIRST_TABLE;

Step 2: Load data from AWS S3 bucket

Now we need to load the CSV file from AWS S3 into our table OUR_FIRST_TABLE, here we can use the command COPY INTO to achieve that:

1
2
3
4
5
6
7
8
9
10
11
-- Step 2: load data into table
CREATE OR REPLACE stage my_s3_stage URL='s3://snowflake-essentials';

COPY INTO OUR_FIRST_TABLE
FROM s3://snowflake-essentials/our_first_table_data.csv
FILE_FORMAT = (
type = CSV
field_delimiter = '|'
skip_header = 1
)
;

snowflake-copy-into-from-s3

Step 3: Data Validation

After the data loading processing is completed, we need to validate our data is correct or not.

1
2
-- Step 3: validate result
SELECT * FROM OUR_FIRST_TABLE;

snowflake-first-query

Load data from AWS S3 Bucket as Stage

Snowflake Note I - General Introduction

General Introduction

Basic ideas about Snowflake

  • Separation of storage and compute
  • Built for Cloud environment
    • built from scratch
    • optimized for cloud
    • storage & compute is decoupled
  • Software as a Service
    • No software, infrastructure or upgrade to manage
  • Pay only for used compute & storage
    • Storage & compute charged independently, only for use
  • Scalable
    • Virtual warehouse enable compute scaling
  • Authentication
  • Configuration
  • Resource Management
  • Data Protection
  • Availability
  • Optimization

Unique features in Snowflake

  • Time travel
  • Cloning

snowflake key features

More details can be found at Post not found: snowflake-online-talk Snowflake Online Talk

Basic components in Snowflake

  • Databases
    • data storage layer
  • Warehouse
    • data computation layer
  • Worksheet
    • query window

Price Model

Storage Cost

  • AWS S3 or Azure Blob Storage
  • Based on actual usage
  • Columnar Compression (e.g. Apache Parquet) or other techniques to save cost

Compute Cost

  • Loading data
  • Processing data
  • Only for actual usage
  • Virtual warehouse can help minimize costs

Data Transfer

  • No cost for transferring into Snowflake
  • Costs on transfer out to other regions and to other cloud platforms

Metadata

  • no cost
  • Database definitions
  • table definitions
  • users permission configuration

snowflake-price-model

AWS Services in 5 Minutes - EC2 - EBS

Capability

Amazon EBS provide Block-level storage services.It’s like a hard disk on your computer.

IOPS SSD

A provisioned IOPS SSD should be used for critical business applications that require sustained IOPS performance, or more than 10,000 IOPS or 160 MiB/s of throughput per volume.

Comparison

EBS vs S3

  • EBS is block-level storage. It is designed to support interactive operations.
  • S3 is object-level storage. It is designed to host long-term objects.

AWS Cloud Practitioner Exam Preparation

aws

General Information

  • Introduction to AWS Certified Cloud Practitioner
  • Examination time is around 90 minutes.
  • Examination score is between 100-1000, and the minimum passing score is 700.

Exam Content Outline

Domain 1 - Cloud Concepts (26%):

1.1 Define the AWS Cloud and it’s value proposition
1.2 Identify aspects of AWS Cloud economics
1.3 List the different cloud architecture design principles

Domain 2 - Security and Compliance (25%):

2.1 Define the AWS Cloud shared responsibility model
2.2 Define AWS Cloud security and compliance concepts
2.3 Identify AWS access management capabilities
2.4 Identify resources for security support

Domain 3 - Technology (33%)

3.1 Define method of deploying and operating in the AWS Cloud
3.2 Define the AWS global infrastructure
3.3 Identify the core AWS services
3.4 Identify resources for technology support

Domain 4 - Billing and Pricing (16%)

4.1 Compare and contrast the various pricing models for AWS
4.2 Recognize the various account structures in relation to AWS billing and pricing
4.3 identify resources available for billing support