Effective Retrospective  [draft]

Reference: How to do effective retrospective

January 19, 2021 · 1 min · 6 words · Eric

Data Engineering Project Template  [draft]

I will use it to explain some of the fundamentals we are discussing and eventually bring them to life in a tutorial series. I will also extend the template with the missing MLOps parts, so stay tuned! Recap:
- Data Producers: Python applications that extract data from the chosen Data Sources and push it to the Collector via REST or gRPC API calls.
- Collector: a REST or gRPC server written in Python that takes a payload (JSON or protobuf), validates that the top-level fields exist and are correct, adds metadata, and pushes the data into the Raw Events Topic if validation passes or into a Dead Letter Queue if the top-level fields are invalid....
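The Collector's validate, enrich, and route step can be sketched in Python as follows. This is a minimal sketch, not the template's actual code. It handles only JSON payloads (no protobuf) and treats "correctness" as a simple type check. The required field names in `REQUIRED_FIELDS`, the metadata keys, and the `InMemoryTopic` class (a stand-in for a real broker topic) are all hypothetical.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# Hypothetical schema: top-level fields (and their types) every event must carry.
REQUIRED_FIELDS: dict[str, type] = {
    "event_id": str,
    "event_type": str,
    "created_at": str,
    "payload": dict,
}


@dataclass
class InMemoryTopic:
    """Hypothetical stand-in for a broker topic; it just keeps records in a list."""
    name: str
    records: list[dict[str, Any]] = field(default_factory=list)

    def publish(self, record: dict[str, Any]) -> None:
        self.records.append(record)


def validate_top_level(event: Any) -> list[str]:
    """Return the problems found in the event's top-level fields (empty if valid)."""
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            errors.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return errors


def collect(raw_body: str, raw_events: InMemoryTopic, dead_letter: InMemoryTopic) -> bool:
    """Parse, validate, enrich, and route one JSON payload. Returns True if accepted."""
    # Hypothetical metadata added by the Collector to every incoming request.
    metadata = {
        "collector_received_at": datetime.now(timezone.utc).isoformat(),
        "collector_request_id": str(uuid.uuid4()),
    }
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        # Unparseable bodies go straight to the dead letter queue.
        dead_letter.publish({"raw_body": raw_body, "errors": [f"invalid JSON: {exc.msg}"], **metadata})
        return False

    errors = validate_top_level(event)
    if errors:
        # Invalid top-level fields: keep the original body plus the reasons.
        dead_letter.publish({"raw_body": raw_body, "errors": errors, **metadata})
        return False

    # Valid event: enrich it and push it to the raw events topic.
    raw_events.publish({**event, "_metadata": metadata})
    return True


if __name__ == "__main__":
    raw_events = InMemoryTopic("raw-events")
    dlq = InMemoryTopic("dead-letter-queue")
    good = '{"event_id": "e1", "event_type": "click", "created_at": "2021-01-07T10:00:00Z", "payload": {}}'
    print(collect(good, raw_events, dlq))                   # True
    print(collect('{"event_id": "e2"}', raw_events, dlq))   # False
```

Keeping the rejected payload's raw body and error list in the dead letter record is one design choice among several; it lets invalid events be inspected and replayed later without blocking the main topic.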

January 7, 2021 · 2 min · 394 words · Eric

DataOps  [draft]

What is DataOps? DataOps is a methodology that combines technology, processes, principles, and personnel to automate data orchestration throughout an organization.
Data Platform Design
- Data Model: Kimball model.
- Data File Format Comparison: Apache Parquet, Avro, ORC, and Arrow.
- Open Table Formats: Delta Lake, Apache Iceberg, Apache Hudi, and Hive.
Data Governance & Management
Data Governance and Trust establishes the rules of engagement for the organization. This includes how data will be managed across roles, responsibilities, decision rights, policies, and standards....

January 7, 2021 · 2 min · 293 words · Eric

Shared-nothing Architecture

Reference: Wikipedia, Shared-nothing Architecture

November 13, 2020 · 1 min · 4 words · Eric

Building Modern Data Lake on AWS  [draft]

References:
https://aws.amazon.com/blogs/architecture/lets-architect-modern-data-architectures/
https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32
https://medium.com/pythonistas/complete-guide-to-aws-data-lake-4cc85259deb0

November 10, 2020 · 1 min · 4 words · Eric