Recently we need to heavy lifting some datasets from AWS Redshift to AWS Aurora in daily basis.
Intuitively I was thinking this progress should be very straightforward because both Redshift and Aurora are nothing but Postgres instances, and we could utilise all the Postgres toolings (pg_dump, pg_restore, COPY etc) to transfer the data. But in reality, nothing is hard until you start to implement and write the actual code to do the work.
There are few things that are not been considered at the beginning of this project.
From Redshift side, the source data is dynamically changing quite frequently
From Aurora side, there are lots of user operations applying to the data, e.g. creating customerised views linked with source data, granting various permissions to different users
The most important and criterial issue is that the query concurrency scenario is out of scope at the beginning. Schemas and tables will be in Table Lock mode if there is a user query against those table, and hence the data updating process will hang there until the schemas and tables are released.
It seems to the table lock problem is the most criterial issue. Luckily we could use the system table pg_locks to detect the current Table Lock information. pg_locks provides a global view of all locks in the database cluster, not only those relevant to the current database. Although its relation column can be joined against pg_class.oid to identify locked relations, this will only work correctly for relations in the current database (those for which the database column is either the current database’s OID or zero).
We could even use the pg_locks and pg_stat_activity tables to check the query age with the following statement
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
SELECT a.datname as db_name, c.relname as relation_name, relation::regclass as table_name, -- l.transactionid, l.mode, l.GRANTED, a.usename, -- a.query, a.query_start, age(now(), a.query_start) AS query_age, a.pid FROM pg_stat_activity a JOIN pg_locks l ON l.pid = a.pid JOIN pg_class c ON c.oid = l.relation ORDERBY a.query_start;