Mastering Data Organization: A Guide to Partitioning Databricks Delta Tables

Nnaemezue Obi-Eyisi
6 min read · Oct 28, 2023

This article was inspired by a real-world, underperforming Spark Streaming pipeline that handles IoT data. One of my first investigative steps was to identify the root cause of the performance bottlenecks during upserts or inserts into a Delta table. The problem became evident when I tried to read that Delta table and write a copy of it to a test folder in my Azure Data Lake Storage container: copying 67 million records took six hours, even though the table had only 10 fields. Notably, there were no lengthy text fields…


Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. I am currently a Senior Data Engineer at Capgemini.