dbt Fundamentals  [draft]

0 - General dbt Fundamentals dbt Developer Hub 1 - Who is an Analytics Engineer? Traditional Data Teams Data Engineers Data Engineers are in charge of building the infrastructure that data is hosted on, usually databases. They also manage the ETL process to ensure data is where it needs to be, in tables that Data Analysts can query. The skill set for Data Engineers includes SQL, Python, Java, other functional programming languages....

December 2, 2022 · 3 min · 488 words · Eric

Data Engineering Project Template  [draft]

I will use it to explain some of the fundamentals that we are talking about and eventually bring them to life in a tutorial series. I will also extend the template with the missing MLOps parts, so stay tuned! Recap: Data Producers - Python applications that extract data from chosen Data Sources and push it to the Collector via REST or gRPC API calls. Collector - a REST or gRPC server written in Python that takes a payload (JSON or Protobuf), validates that the top-level fields exist and are correct, adds additional metadata, and pushes the data to the Raw Events Topic if validation passes or to a Dead Letter Queue if top-level fields are invalid....
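The Collector's validate-then-route step described in the recap can be sketched roughly as follows. This is a minimal illustration, not the template's actual code: the required field names, the metadata keys, the topic identifiers, and the `route_event` function are all hypothetical, and the sketch returns the chosen destination instead of publishing to a real broker.

```python
import time
from typing import Any

# Hypothetical values: the post does not name the required fields or the topics.
REQUIRED_FIELDS = {"event_type": str, "payload": dict}
RAW_EVENTS_TOPIC = "raw_events"
DEAD_LETTER_QUEUE = "dead_letter_queue"


def route_event(
    event: dict[str, Any], collector_id: str = "collector-1"
) -> tuple[str, dict[str, Any]]:
    """Validate top-level fields, add metadata, and pick a destination."""
    # A field is invalid if it is missing or has the wrong type.
    invalid = [
        name
        for name, expected_type in REQUIRED_FIELDS.items()
        if not isinstance(event.get(name), expected_type)
    ]

    # Add collector metadata without mutating the caller's dict.
    enriched = {
        **event,
        "metadata": {"collector_id": collector_id, "received_at": time.time()},
    }

    if invalid:
        # Record why the event was rejected so it can be inspected later.
        enriched["metadata"]["invalid_fields"] = invalid
        return DEAD_LETTER_QUEUE, enriched
    return RAW_EVENTS_TOPIC, enriched


if __name__ == "__main__":
    print(route_event({"event_type": "click", "payload": {"x": 1}})[0])
    print(route_event({"event_type": "click"})[0])
```

Returning the destination name keeps the routing decision separate from the transport, so the same function could sit behind either a REST or a gRPC handler.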

January 7, 2021 · 2 min · 394 words · Eric

DataOps  [draft]

What is DataOps? DataOps is a methodology that combines technology, processes, principles, and personnel to automate data orchestration throughout an organization. Data Platform Design Data Model: Kimball Model. Data File Format Comparison: Apache Parquet, Avro, ORC, and Arrow. Open Table Formats: Delta Table, Apache Iceberg, Hudi, and Hive. Data Governance & Management Data Governance and Trust establishes the rules of engagement for the organization. This includes how data will be managed across roles, responsibilities, decision rights, policies, and standards....

January 7, 2021 · 2 min · 293 words · Eric

Uninstall Anaconda on macOS

Sometimes you need to reconfigure your local Anaconda environment, which means uninstalling the Anaconda distribution completely. Automatic Uninstallation Step 1: Install the anaconda-clean package: conda install anaconda-clean Step 2: Clean your environment. The anaconda-clean command removes all Anaconda-related files and directories, with a confirmation prompt before deleting each one. The --yes argument skips the prompts and removes all these files and directories without confirmation....

October 10, 2020 · 1 min · 178 words · Eric

Pants  [draft]

Reference Pants Official document Getting started with Pants

March 22, 2019 · 1 min · 8 words · Eric