Beginner’s Guide: Extract Transform Load (ETL) Playbook- Incremental load design pattern Part2
--
The intended audience of this article are IT enthusiasts and beginner-level data engineers interested in understanding some data engineering principles
In the previous post (read here), I described the various incremental data extraction design considerations, and requirements. In this article, I will continue from where I left off to explain the sample solution approach.
Example of Incremental Source Database Table Extraction to a Staging Table using ETL tool
One of the best practices when designing an incremental extraction process from a source system that involves multiple tables is to use an ETL control table.
ETL Control Table
This is the configuration table that will control and drive the ETL process. This table normally will be at the granularity of each table that needs to be extracted. Below are a couple of fields we tend to capture in the control table
a) Source Table name: This is the name of the source table that we will extract data from
b) Staging Table Name: This is the staging table we will temporarily write the incremental records extracted from the source table.
c) Source Select Column Extraction Query: This is the list of columns we want to extract from the specific source table. This gives us flexibility and efficiency in our ETL pipeline to focus on only the data we need
d) Source Filter (Where) condition: This also helps us apply additional filter conditions to limit the data we want to extract
e) Last Extraction Begin Date: This date is used as the starting date to extract data from the source table for the current incremental load ETL process.
f) Extraction Type: This field is used to dictate if the specific table should be a Full or Incremental load. Sometimes, some tables have a small number of records that extracting the full table data every time has a negligible effect compared to an incremental extraction.
g) ETL Status: This is the status of the ETL job for that table, it can tell if the current table being extracted is in progress, completed, or in an error…