Data Engineering Beginner’s Guide: Modularity and Checkpointing

Nnaemezue Obi-Eyisi
5 min read · Aug 30, 2021
reference: https://www.mysecuritysign.com/stop-security-checkpoint-ahead-sign/sku-k2-4293

In this article, I will discuss a very important ETL programming concept called checkpointing. If you are reading my blog for the first time and are not familiar with ETL/ELT, please review my prior post about ETL. Checkpointing is nothing new in software engineering; a checkpoint literally means “a point where a check is performed”. In the real world, checkpoints are associated with security points where a traveler is either searched or identified.

The goal of this article is to help Data Engineers think about and apply these concepts as they build data pipelines. I have purposely not gone into implementation details because there are so many ETL tools on the market and various ways to implement these concepts.

In the context of Data Engineering, where data is traveling from one point to another, a checkpoint is logic in our data pipeline that keeps track of the successfully completed steps in our ETL code. The purpose of a checkpoint is to record all (or the last) successfully completed step(s) in our data pipeline (ETL) so that, in the event of a pipeline failure, the rerun ETL job skips the steps that already succeeded and resumes from the failed step.
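To make the idea concrete, here is a minimal, tool-agnostic sketch in Python. It is only one possible way to do it, not how any particular ETL tool implements checkpoints. Everything in it is an assumption made for illustration: the `run_pipeline` helper, the JSON checkpoint file, and the `extract`, `transform`, and `load` step names are hypothetical, and the failure in `transform` is simulated.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping steps already checkpointed.

    Hypothetical helper: the checkpoint is a JSON list of completed step names.
    """
    path = Path(checkpoint_path)
    completed = json.loads(path.read_text()) if path.exists() else []
    executed = []
    for name, func in steps:
        if name in completed:
            continue  # this step succeeded in an earlier run, so skip it
        func()  # if this raises, the step is never recorded as completed
        completed.append(name)
        path.write_text(json.dumps(completed))  # save progress after each step
        executed.append(name)
    path.unlink(missing_ok=True)  # full success: clear so the next run starts fresh
    return executed


# Demo with three hypothetical steps; transform fails on its first call only.
calls = []
fail_once = {"pending": True}


def extract():
    calls.append("extract")


def transform():
    if fail_once["pending"]:
        fail_once["pending"] = False
        raise RuntimeError("simulated failure")
    calls.append("transform")


def load():
    calls.append("load")


steps = [("extract", extract), ("transform", transform), ("load", load)]
checkpoint_file = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_pipeline(steps, checkpoint_file)
except RuntimeError:
    pass  # first run stops at transform; extract is already checkpointed

resumed = run_pipeline(steps, checkpoint_file)
print(resumed)  # ['transform', 'load']
```

On the rerun, `extract` is skipped because its name is already in the checkpoint file, and the job resumes from `transform`. Note that a sketch like this only works when each step is a separate, named unit that can be run on its own.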

Modular Code

Before we implement checkpoints, we first need to discuss ETL code modularity.


Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. Currently a Senior Data Engineer at Capgemini