What is DataOps?
DataOps is a methodology that combines technology, processes, principles, and personnel to automate data orchestration throughout an organization.
Data Platform Design
- Data Model: Kimball Model.
- Data File Format Comparison: Apache Parquet, Avro, ORC, and Arrow.
- Open Table Formats: Delta Lake, Apache Iceberg, Apache Hudi, and Apache Hive.
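As a rough illustration of the Kimball model listed above, the sketch below builds a tiny star schema (one fact table joined to two dimension tables) in an in-memory SQLite database. The table names, column names, and sample rows are all hypothetical and chosen only for the example; a real platform would likely store these tables in one of the file or table formats listed above.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a minimal star schema with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        -- The fact table holds measures plus foreign keys to the dimensions.
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
        """
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "books"), (20, "games")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 10, 12.5), (1, 20, 30.0), (2, 10, 7.5)],
    )
    return conn


def sales_by_category(conn: sqlite3.Connection) -> dict:
    """Aggregate the fact table by an attribute of the product dimension."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)
```

Calling `sales_by_category(build_star_schema())` groups the sales measure by product category, which is the typical query shape a star schema is designed for.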
Data Governance & Management
Data Governance and Trust establishes the rules of engagement for the organisation.
This includes how data will be managed across roles, responsibilities, decision rights, policies, and standards.
Data Discovery & Curation
Data Sourcing and Discovery maps the legacy data landscape within the organisation: how to identify and acquire the data sets relevant to the customer, transaction, and product data sets defined in the rules framework.
Data Quality & Assurance
Data Quality and Assurance establishes the fitness for use of the data sets: identifying and resolving gaps, inconsistencies, and errors before data sets are shared with market participants, or merged with market data and used for analytics, automation, or pricing.
- Uber: Data Quality at Uber - How to get data right at Uber scale
- DataQualityPro: Creating a Data Quality Firewall and Data Quality SLA
- ScienceSoft: Your Guide to Data Quality Management
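A minimal sketch of the kind of check described above, flagging gaps (missing values) and inconsistencies (duplicate keys) before a data set is shared or merged. The function name `find_quality_issues` and the record layout are assumptions made for this example, not part of any tool referenced here.

```python
from typing import Any, Dict, List


def find_quality_issues(
    rows: List[Dict[str, Any]], required: List[str], key: str
) -> List[str]:
    """Return human-readable issues found in a list of records.

    Checks two things only: required fields that are missing or empty,
    and duplicate values in the field that should be unique.
    """
    issues: List[str] = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Gap check: a required field is absent, None, or an empty string.
        for field in required:
            if row.get(field) in (None, ""):
                issues.append(f"row {index}: missing {field}")
        # Consistency check: the key value must not repeat.
        value = row.get(key)
        if value is not None:
            if value in seen_keys:
                issues.append(f"row {index}: duplicate {key}={value}")
            seen_keys.add(value)
    return issues
```

A pipeline could refuse to publish a data set whenever the returned list is non-empty, which is one simple way to act as a quality gate.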
Data Observability
Five key pillars of Data Observability:
- Freshness (recency)
- Volume
- Schema
- Distribution
- Lineage
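Three of the pillars above (freshness, volume, and schema) lend themselves to simple automated checks. The sketch below shows one possible shape for them; the function names, the 20% volume tolerance, and the inputs are illustrative assumptions rather than the behaviour of any particular observability product. Distribution and lineage checks usually need more context and are left out.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def check_freshness(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness: the table was loaded no longer ago than max_age."""
    return now - last_loaded_at <= max_age


def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> bool:
    """Volume: the row count is within a relative tolerance of the expected count."""
    return abs(row_count - expected) <= tolerance * expected


def check_schema(columns: Iterable[str], expected_columns: Iterable[str]) -> Dict[str, List[str]]:
    """Schema: report columns that disappeared or appeared unexpectedly."""
    actual, expected = set(columns), set(expected_columns)
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }
```

Each function returns a plain value, so a scheduler or alerting job can decide separately how to react to a failed check.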
Data Sharing & Architecture
Data Sharing and Architecture delivers the infrastructure and mechanics for consolidating, mastering, and securely administering data requests from customers, accredited data recipients, or teams within the organisation.
Data Lifecycle Management
Data retention, disposal, and decommissioning ensure that the conditions of customer consent are adhered to, and that data is de-identified and/or deleted in line with the conditions under which consent was given.
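One way to picture this rule is a periodic job that inspects each record's consent and either keeps, de-identifies, or deletes it. The sketch below assumes a hypothetical record layout with a `consent_expires` date and an `on_expiry` action of either `"deidentify"` or `"delete"`; the list of direct identifiers is also an assumption, and real de-identification usually needs more than dropping a few fields.

```python
from datetime import date
from typing import Any, Dict, List

# Hypothetical set of fields treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = ("name", "email")


def enforce_consent(records: List[Dict[str, Any]], today: date) -> List[Dict[str, Any]]:
    """Apply each record's consent conditions as of `today`.

    - Consent still valid: keep the record unchanged.
    - Consent expired, on_expiry == "deidentify": keep it without direct identifiers.
    - Consent expired, any other on_expiry value: delete (drop) the record.
    """
    result: List[Dict[str, Any]] = []
    for record in records:
        if today <= record["consent_expires"]:
            result.append(record)
        elif record["on_expiry"] == "deidentify":
            stripped = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
            result.append(stripped)
        # Otherwise the record is omitted from the result, i.e. deleted.
    return result
```

Returning a new list rather than mutating the input keeps the job easy to test and makes it simple to log what was removed.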