0 - General
1 - Who is an Analytics Engineer?
Traditional Data Teams
- Data Engineers
- Data Engineers are in charge of building the infrastructure that data is hosted on, usually databases.
- DE also manage the ETL process to ensure data is where it needs to be, and in tables for Data Analysts to query.
- Skill set for Data Engineers include SQL, Python, java, other functional programming languages.
- Data Analysts
- Data Analysts tend to work a little bit closer to business decision makers in finance, marketing and other departments.
- They typically query tables that built by Data Engineers, for their dashboards or different types of reports.
- Skill set for Data Analysts include Excel, SQL, Dashboards, Reporting, etc.
There is a natural gap there, where Analysts know what to build, while Engineers know how to build. Because this gap here, there is opportunity to improve the work efficiency and introduce the concept of Analytics Engineers .
ETL vs ELT
ETL
(extract, transform and load) is the process of creating new database objects by extracting data from multiple data sources, transforming it on a local or third party machine, and loading the transformed data into a data warehouse.ELT
(extract, load and transform) is a more recent process of creating new database objects by first extracting and loading raw data into a data warehouse and then transforming that data directly in the warehouse.- Cloud-based data warehouse is a coupling of a database and a supercomputer that can run transformations, run code against that database.
- Scalable compute
- Scalable storage
- Reduction of transfer time
- The new ELT process is made possible by the introduction of cloud-based data warehouse technologies.
Modern Data Team
- Data Engineers can focus on the
E
andL
of raw data and the larger data infrastructure. - Analytics Engineer owns the transformation of raw data up to the Business Intelligence Layer. They are primarily in charge of the
T
in ELT. - Data Analyst can work to deliver dashboards and reporting to stakeholders.
Modern Data Stack
- Loader
- Data Platform
- BI Tools
- dbt connects directly to your data platform to model data. dbt is the
T
part of ELT. - YAML files are used for configuring generic tests.
dbt workflow
Common dbt Command
dbt run
. run dbt command to build database models.dbt test
. run the predefined tests against your database models.dbt docs generate
. generate documents based on in-code documentation.dbt build
. dbt build = dbt run + dbt test.
2 - Setup dbt Cloud
dbt is designed to handle the transformation layer of the ’extract-load-transform’ framework for data platforms. dbt creates a connection to a data platform and runs SQL code against the warehouse to transform data.
The following are dbt supported databases link
- Azure Synapse
- BigQuery
- Databricks
- Dremio
- Postgres
- Redshift
- Snowflake
- Spark
- Starburst & Trino