My Review of Databricks Delta Lake Change Data Feed
In this article, I will give my unbiased assessment of Databricks Delta Lake change data feed, analyzing the potential use cases, pros and cons.
One of the recent features introduced by Databricks is Delta Table Change Data feed. Databricks describes it as below
“Change data feed allows Databricks to track row-level changes between versions of a Delta table. When enabled on a Delta table, the runtime records change events for all the data written into the table. This includes the row data along with metadata indicating whether the specified row was inserted, deleted, or updated.” Reference
This means that in Delta Lake Tables we can now access the audit trail of the changes that happened in the table.
Before this we would need to use Databricks time travel feature to see what changed from the previous version of the table to the current version of the table.
How is change data feed useful
- Let’s say you want to implement an incremental load data extraction from your bronze delta table to your silver and gold tables. You can identify the changed records with a watermark field. However, Change Data Feed provides this information in a much cleaner and robust way. We can use this to implement incremental data extraction across our…