Beginner’s Guide: Extract, Transform, Load (ETL) Playbook - Incremental Load
This article is intended for IT enthusiasts and beginner-level data engineers who want to understand core data engineering principles.
In the previous post (read here), I explored the rationale behind ETL and explained how we can achieve full data extraction from a source system to a target system. In this post, I will focus our attention on incremental data extraction and some common design patterns.
We know that the analytics we run on our downstream systems need to be based on up-to-date data. However, there are different ways of updating a downstream analytical system. One of them is performing a full data extraction (Full Load), as discussed in the previous post; another method, discussed in detail in this post, is called Incremental Load.
What is Incremental data load/extraction?
Incremental data extraction is the process of copying only new or changed data from the source to the target system on a preset schedule. Some people call it batch data load because new or modified data is copied after a set time interval has elapsed. For example, in many organizations, incremental data extraction runs hourly, daily, weekly, or monthly. These ETL jobs are designed to extract only the records that are new or have changed between the last extraction date/time and the current date/time.
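To make the idea concrete, here is a minimal sketch of a watermark-based incremental load in Python using sqlite3. The table and column names (orders, updated_at, etl_watermark) are assumptions chosen purely for illustration; adapt them to your own source and target schemas. The pattern is the same regardless of the database: read the last extraction timestamp, pull only rows modified after it, upsert them into the target, and advance the watermark.

```python
# A minimal sketch of watermark-based incremental extraction, assuming
# a source table `orders` with an `updated_at` column and a target table
# `etl_watermark` that records the last successful extraction time.
import sqlite3


def incremental_load(source_db: str, target_db: str) -> int:
    src = sqlite3.connect(source_db)
    tgt = sqlite3.connect(target_db)
    try:
        # Ensure the target can record the last successful extraction time.
        tgt.execute(
            "CREATE TABLE IF NOT EXISTS etl_watermark "
            "(table_name TEXT PRIMARY KEY, last_extracted_at TEXT)"
        )
        row = tgt.execute(
            "SELECT last_extracted_at FROM etl_watermark WHERE table_name = 'orders'"
        ).fetchone()
        last_extracted_at = row[0] if row else "1970-01-01T00:00:00"

        # Pull only rows created or modified since the previous run.
        new_rows = src.execute(
            "SELECT id, customer_id, amount, updated_at FROM orders "
            "WHERE updated_at > ?",
            (last_extracted_at,),
        ).fetchall()

        # Upsert the changed rows into the target copy of the table.
        tgt.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, updated_at TEXT)"
        )
        tgt.executemany(
            "INSERT INTO orders (id, customer_id, amount, updated_at) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET customer_id=excluded.customer_id, "
            "amount=excluded.amount, updated_at=excluded.updated_at",
            new_rows,
        )

        # Advance the watermark to the newest timestamp actually extracted,
        # so the next run starts exactly where this one stopped.
        if new_rows:
            new_watermark = max(r[3] for r in new_rows)
            tgt.execute(
                "INSERT INTO etl_watermark (table_name, last_extracted_at) "
                "VALUES ('orders', ?) "
                "ON CONFLICT(table_name) DO UPDATE SET "
                "last_extracted_at=excluded.last_extracted_at",
                (new_watermark,),
            )
        tgt.commit()
        return len(new_rows)
    finally:
        src.close()
        tgt.close()
```

Note that the watermark is advanced to the largest updated_at value that was actually extracted rather than to the wall-clock time of the run; this avoids silently skipping rows when the source and ETL server clocks disagree.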