Should you be implementing the Data Lakehouse Architecture?

Nnaemezue Obi-Eyisi
4 min read · Nov 22, 2021

Everything I am about to say in this article is strictly my opinion. I am a big fan of Databricks and Delta Lake, and I appreciate the value they bring to modern data analytics. I am currently working with an Oil & Gas client using these tools. However, I do have reservations, which I will explain in this article.

The goal of this article is to raise doubts about the validity and efficiency of some of the data platform architectures involving data lakes, lakehouses, and Spark that I see implemented in Azure cloud environments.

As a data engineer working with various clients, I have been bewildered by some design solutions.

Below is a sample scenario I encountered in the industry that made me question the reasoning behind the design.

Building a self-service data platform on Azure Data Lake Gen2

In this project, the goal was to build a self-service data lake filled with cleaned and prepared data that various business teams could consume for their analytical reporting. As a data engineer, I was tasked with creating ETL pipelines to extract data from SQL Server and Oracle source systems and load it into Azure Data Lake Storage as Parquet files. Subsequently, I would create Databricks…


Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. Currently a Senior Data Engineer at Capgemini.