As you may know, the demand for Data Engineers has quadrupled in the last two years. There are more jobs than candidates in the job market, especially for US work-authorized candidates. I have seen companies offering referral bonuses of up to $1,500.

My goal in writing this post is to give a self-motivated beginner a “simpler” learning path to becoming a data engineer. I have outlined the most important foundational skills needed to break into a Data Engineer role (at least at the junior level). The data engineering journey, or landscape, is truly overwhelming, and I can say confidently there is…


In this article, I will discuss a very important ETL programming concept called checkpointing. If you are reading my blog for the first time and are not familiar with ETL/ELT, please review my prior post about ETL. Checkpointing is nothing new in software engineering; a checkpoint literally means “a point where a check is performed”. In the real world, checkpoints are associated with security points where a traveler is either searched or identified.

The goal of writing this article is to help Data Engineers think about and apply these concepts as they build data pipelines. …
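To make the idea concrete for a pipeline, here is a minimal sketch of checkpointing, assuming data is processed in ordered batches and the index of the last completed batch is recorded in a small JSON file. The function names (`load_checkpoint`, `save_checkpoint`, `run_pipeline`) and the file layout are hypothetical, chosen only for illustration; this is not the approach from the full article.

```python
import json
import os
import tempfile

# Hypothetical helpers for illustration; a sketch, not the article's implementation.


def load_checkpoint(path):
    """Return the index of the last batch that finished loading, or -1 if none has."""
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        return json.load(f)["last_batch"]


def save_checkpoint(path, batch_index):
    """Record progress by writing a temp file and renaming it over the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"last_batch": batch_index}, f)
    os.replace(tmp_path, path)  # replaces the old checkpoint in a single rename


def run_pipeline(batches, checkpoint_path, sink, fail_at=None):
    """Load batches in order, resuming after the last checkpointed batch."""
    start = load_checkpoint(checkpoint_path) + 1
    for i in range(start, len(batches)):
        if fail_at == i:
            raise RuntimeError(f"simulated failure at batch {i}")
        sink.extend(batches[i])              # the "load" step for this batch
        save_checkpoint(checkpoint_path, i)  # checkpoint only after a successful load
    return sink


batches = [[1, 2], [3, 4], [5, 6]]
with tempfile.TemporaryDirectory() as work_dir:
    checkpoint_file = os.path.join(work_dir, "checkpoint.json")
    sink = []
    try:
        run_pipeline(batches, checkpoint_file, sink, fail_at=2)  # crashes on the third batch
    except RuntimeError:
        pass
    run_pipeline(batches, checkpoint_file, sink)  # restart: batches 0 and 1 are skipped
    print(sink)  # [1, 2, 3, 4, 5, 6]
```

Because the checkpoint is written only after a batch is loaded, a crash between the load and the checkpoint write means that batch is reprocessed on restart, so in a real pipeline the load step should be idempotent.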

Many organizations aspire to streamline their data analytics architecture, and many look to the latest technological solutions in the hope that one of them has the magic formula to solve all their data challenges. However, I have come across few organizations that create a purpose-driven data analytics architecture. By purpose-driven, I mean a data architecture that is designed around the organization's specific data needs and is well suited to meet its unique business requirements. This is very different from implementing a generic solution taken from Microsoft's reference data architecture documentation, like the one below.

Characteristics of a Poor Data Analytics Architecture

  1. Poor data quality…

This article is targeted at organizations that have recently migrated, or are migrating, their data analytics to the Cloud, or that want to improve their existing architecture.

The goal of this article is to present some points around the use, pros, and cons of data lake and data warehouse solutions in the cloud. If you are a CIO, Data Architect, or Data Engineer, after reading this article you should hopefully have an idea of which solution best fits your company’s use case. However, I will still go ahead and give you my opinion on why it is important to have both a data lake and…

The intended audience of this article is IT enthusiasts and beginner-level data engineers interested in understanding some data engineering principles.

In the previous post (read here), I described in detail an incremental load design pattern and solutions. In this article, I will explore the methods of inserting and updating data in the target table and discuss near real-time incremental data extraction processes using Change Data Capture (CDC) and streaming changed data with Kafka.

Methods of Applying Source Table Transactions


When a brand new record is created in the source table and is picked up by the incremental extraction process…
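As an illustration of applying source inserts and updates to a target table in one step, here is a minimal upsert sketch using Python's built-in `sqlite3` module. It assumes SQLite 3.24 or later (for `INSERT ... ON CONFLICT DO UPDATE`); the `target` table, its columns, and the `apply_changes` helper are hypothetical. The `WHERE` clause on the update is an added assumption so that replaying an older change cannot overwrite a newer row.

```python
import sqlite3

# Hypothetical target table; requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'alice', '2023-01-01')")
conn.commit()

# New keys are inserted; existing keys are updated only if the incoming change is newer.
UPSERT_SQL = """
INSERT INTO target (id, name, updated_at) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name,
    updated_at = excluded.updated_at
WHERE excluded.updated_at > target.updated_at
"""


def apply_changes(conn, rows):
    """Apply a batch of changed source rows (hypothetical helper) to the target table."""
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()


# Hypothetical batch returned by an incremental extract.
changed_rows = [
    (1, "alice smith", "2023-01-05"),  # existing key -> update
    (2, "bob", "2023-01-05"),          # new key -> insert
]
apply_changes(conn, changed_rows)
print(conn.execute("SELECT * FROM target ORDER BY id").fetchall())
# [(1, 'alice smith', '2023-01-05'), (2, 'bob', '2023-01-05')]
```

Warehouses such as SQL Server and Snowflake typically express the same insert-or-update logic with a `MERGE` statement.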

In the previous post (read here), I described the various incremental data extraction design considerations and requirements. In this article, I will continue from where I left off and walk through a sample solution approach.

Example of Incremental Source Database Table Extraction to a Staging Table Using an ETL Tool

ETL Architecture

One of the best practices when designing an incremental extraction process from a source system that involves multiple tables is to use an ETL control table.

ETL Control Table

This is the…
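Here is a minimal sketch of how an ETL control table can drive incremental extraction for several source tables, using Python's built-in `sqlite3` module. The control table layout (`source_table`, `staging_table`, `watermark_column`, `last_watermark`), the `orders`/`stg_orders` tables, and the `run_incremental_extract` helper are all hypothetical; the control table described in the article may hold different columns.

```python
import sqlite3

# Hypothetical control table and source/staging tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE etl_control (
    source_table     TEXT PRIMARY KEY,
    staging_table    TEXT NOT NULL,
    watermark_column TEXT NOT NULL,
    last_watermark   TEXT NOT NULL
);
CREATE TABLE orders     (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT);
CREATE TABLE stg_orders (id INTEGER, amount REAL, modified_at TEXT);
INSERT INTO etl_control VALUES ('orders', 'stg_orders', 'modified_at', '1900-01-01');
INSERT INTO orders VALUES (1, 10.0, '2023-01-01'), (2, 20.0, '2023-01-02');
""")


def run_incremental_extract(conn):
    """Copy rows changed since each table's last watermark into its staging table."""
    jobs = conn.execute(
        "SELECT source_table, staging_table, watermark_column, last_watermark FROM etl_control"
    ).fetchall()
    loaded = {}
    for source, staging, wm_col, last_wm in jobs:
        # Fix the upper bound first; rows stamped after it wait for the next run.
        (new_wm,) = conn.execute(f"SELECT MAX({wm_col}) FROM {source}").fetchone()
        if new_wm is None or new_wm <= last_wm:
            loaded[source] = 0
            continue
        rows = conn.execute(
            f"SELECT * FROM {source} WHERE {wm_col} > ? AND {wm_col} <= ?",
            (last_wm, new_wm),
        ).fetchall()
        placeholders = ",".join("?" * len(rows[0]))
        conn.executemany(f"INSERT INTO {staging} VALUES ({placeholders})", rows)
        conn.execute(
            "UPDATE etl_control SET last_watermark = ? WHERE source_table = ?",
            (new_wm, source),
        )
        loaded[source] = len(rows)
    conn.commit()  # staging rows and new watermarks are committed together
    return loaded


print(run_incremental_extract(conn))  # {'orders': 2}
conn.execute("INSERT INTO orders VALUES (3, 30.0, '2023-01-03')")
print(run_incremental_extract(conn))  # {'orders': 1}
print(run_incremental_extract(conn))  # {'orders': 0}
```

Table and column names are interpolated into SQL with f-strings, which is only acceptable here because they come from a control table the ETL team owns, not from user input. Adding a new source table becomes a matter of inserting one row into `etl_control` rather than writing a new extraction job.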

In the previous post (read here), I explored the rationale behind ETL and talked about how we can achieve full data extraction from a source system to a target system. In this post, I will focus on incremental data extraction and some common design patterns.

We know that when we run analytics on our downstream systems, the analytics need to be based on up-to-date data. However, there are different ways of updating our downstream analytical system. One of them is…

Purpose: The goal of this article is to give an introductory guide to some ETL basics as they relate to Data Engineering.


One of the most important and often overlooked core facets of data engineering is the creation of ETL pipelines. With the popularity of AI and ML projects and the concentrated demand for data scientists, it is easy to dismiss ETL as an old-fashioned approach to modern data analytics. I have seen many training programs overlook this subject area, or give it little attention, when teaching students about data engineering. ETL or ELT is actually more fundamental: it needs to be in place before any AI/ML or data analytics project can be kicked off. …

The motivation for writing this is to explain the major differences between SMP and MPP platforms. I will also explain their appropriate use cases, pros, and cons.

Symmetric Multi-Processor (SMP) Architecture

“Symmetric multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all input and output devices, and are controlled by a single operating system instance that treats all processors equally, reserving none for special purposes.” (reference)


Let’s start with some history of analytical databases.

Relational databases (e.g., SQL Server…

The goal of this article is to illustrate, step by step, how to set up an Azure Application Gateway for your web services using the Azure Portal.


Before we begin this tutorial, I would like to establish a basic shared understanding of the various components we will be configuring. I think it is crucial to understand what we are doing before doing it. Here are some basic questions one should be able to answer:

  1. What is an application gateway?
  2. What is a reverse proxy?
  3. What are the real-world use cases for Application Gateway?
  4. What is the difference between HTTP…


I am passionate about empowering and encouraging people of color on the data analytics career path.

