Table Creation#
Add essential field checking rules when creating table#
1
2
3
4
5
6
7
8
9
10
| CREATE TABLE IF NOT EXISTS time
(
start_time TIMESTAMP CONSTRAINT time_pk PRIMARY KEY,
hour INT NOT NULL CHECK (hour >= 0),
day INT NOT NULL CHECK (day >= 0),
week INT NOT NULL CHECK (week >= 0),
month INT NOT NULL CHECK (month >= 0),
year INT NOT NULL CHECK (year >= 0),
weekday VARCHAR NOT NULL
);
|
Column Referencing#
If you already knew there are some foreign key referencing acrossing different tables, you can specify that when creating your table.
For example, in this songplays
table, there are several columns that is linking with other tables:
- start_time is refering to time table
- user_id is referring to users table
- …
1
2
3
4
5
6
7
8
9
10
11
| CREATE TABLE IF NOT EXISTS songplays (
songplay_id SERIAL CONSTRAINT songplay_pk PRIMARY KEY,
start_time TIMESTAMP REFERENCES time(start_time),
user_id INT REFERENCES users(user_id),
level VARCHAR NOT NULL,
song_id VARCHAR REFERENCES songs(song_id),
artist_id VARCHAR REFERENCES artists(artist_id),
session_id INT NOT NULL,
location VARCHAR,
user_agent TEXT
);
|
Data Ingestion#
When ingesting data into your table, you might meet with data conflict errors like the following
1
| duplicate key value violates unique constraint "artists_pkey"
|
To let your data ingestion process run seamlessly, you can either
- Use
DO NOTHING
arguments to skip these problemic rows - Use
Upsert
argument to update existing rows with latest data
Use DO NOTHING
to Skip Rows#
1
2
3
| INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO NOTHING
|
Use Upsert
to Update Rows#
1
2
3
4
| INSERT INTO users (user_id, first_name, last_name, gender, level)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (user_id) DO UPDATE
SET level = EXCLUDED.level
|