Table Creation

Add essential field checking rules when creating table

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE TABLE IF NOT EXISTS time
(
  start_time  TIMESTAMP CONSTRAINT time_pk PRIMARY KEY,
  hour INT NOT NULL CHECK (hour >= 0),
  day INT NOT NULL CHECK (day >= 0),
  week INT NOT NULL CHECK (week >= 0),
  month INT NOT NULL CHECK (month >= 0),
  year INT NOT NULL CHECK (year >= 0),
  weekday VARCHAR NOT NULL
);

Column Referencing

If you already knew there are some foreign key referencing acrossing different tables, you can specify that when creating your table.

For example, in this songplays table, there are several columns that is linking with other tables:

  • start_time is refering to time table
  • user_id is referring to users table
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
CREATE TABLE IF NOT EXISTS songplays (
songplay_id    SERIAL           CONSTRAINT     songplay_pk        PRIMARY KEY,
start_time     TIMESTAMP        REFERENCES     time(start_time),
user_id        INT              REFERENCES     users(user_id),
level          VARCHAR          NOT NULL,
song_id        VARCHAR          REFERENCES     songs(song_id),
artist_id      VARCHAR          REFERENCES     artists(artist_id),
session_id     INT NOT NULL,
location       VARCHAR,
user_agent     TEXT
);

Data Ingestion

When ingesting data into your table, you might meet with data conflict errors like the following

1
duplicate key value violates unique constraint "artists_pkey"

To let your data ingestion process run seamlessly, you can either

  • Use DO NOTHING arguments to skip these problemic rows
  • Use Upsert argument to update existing rows with latest data

Use DO NOTHING to Skip Rows

1
2
3
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO NOTHING

Use Upsert to Update Rows

1
2
3
4
INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE
  SET level = EXCLUDED.level