Databricks Lakehouse Fundamentals

What is a Data Lakehouse?

History of Data Warehouse

Pros: Business Intelligence (BI); analytics; structured and clean data; predefined schemas.

Cons: no support for semi-structured or unstructured data; inflexible schemas; struggled with upticks in volume and velocity; long processing times.

History of Data Lake

Pros: flexible data storage; structured, semi-structured, and unstructured data; streaming support; cost efficient in the cloud; support for AI and machine learning.

Cons:

No transactional support.

Poor data reliability. Data lakes do not support transactional data and cannot enforce data quality, primarily because they hold multiple data types.

Slow analysis performance. Because of the large volume of data, analysis is slower, and the expected timeliness of decision-making results never materialized.

Data governance concerns. The unstructured nature of a data lake's contents makes governance difficult, especially security and privacy enforcement.

Data warehouse still needed.

Problems with Complex Data Environment

Data lakes did not fully replace data warehouses for reliable BI insights, so businesses have implemented complex systems that combine a data lake, a data warehouse, and additional systems to handle streaming data, machine learning, and artificial intelligence requirements....

February 8, 2023 · 7 min · 1327 words · Eric