Contents

Snowflake Note II - Data Ingestion

In this note, I am going to create my first Database and table in Snowflake, and load csv data files from AWS S3 into our table.

Typical Data Ingestion Process

  1. Prepare your files
  2. Stage the data
  3. Execute COPY command
  4. Managing regular loads

Loading data from AWS S3 Bucket

Step 1: Database and table initialization

So first of all, we need to create our database. Here we are creating a new database called OUR_FIRST_DATABASE and our table called OUR_FIRST_TABLE

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
-- Step 0: create database OUR_FIRST_DATABASE
CREATE DATABASE OUR_FIRST_DATABASE;


-- Step 1: create table OUR_FIRST_TABLE
CREATE TABLE OUR_FIRST_TABLE
(
  first_name STRING,
  last_name  STRING,
  address    STRING,
  city       STRING,
  state      STRING
);

SELECT * FROM OUR_FIRST_TABLE;

Step 2: Load data from AWS S3 bucket

Now we need to load the CSV file from AWS S3 into our table OUR_FIRST_TABLE, here we can use the command COPY INTO to achieve that:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Step 2: load data into table
CREATE OR REPLACE stage my_s3_stage URL='s3://snowflake-essentials';

COPY INTO OUR_FIRST_TABLE
  FROM s3://snowflake-essentials/our_first_table_data.csv
  FILE_FORMAT = (
    type = CSV
    field_delimiter = '|'
    skip_header = 1
  )
;

../../static/snowflake-copy-into-table-from-s3.png

Step 3: Data Validation

After the data loading processing is completed, we need to validate our data is correct or not.

1
2
-- Step 3: validate result
SELECT * FROM OUR_FIRST_TABLE;

../../static/snowflake-first-query.png

Load data from AWS S3 Bucket as Stage