Evaluating Databricks Job clusters Cost Savings and Performance in ADF

Nnaemezue Obi-Eyisi
4 min read · Nov 3, 2023

The motivation behind this article is to explore the potential cost savings of Databricks job clusters and to determine whether it is better to use them from Data Factory or from Databricks Workflows. As we are aware, all-purpose clusters tend to be more expensive than job clusters. This cost difference arises not only because interactive clusters typically idle for about 10 minutes before auto-terminating, but also because all-purpose compute is billed at more than twice the rate of job compute ($0.40/DBU vs. $0.15/DBU).
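To make the cost difference concrete, here is a rough back-of-envelope comparison using the DBU rates quoted above. The workload size (6 DBU/hour for 2 hours) and the 10-minute idle window are illustrative assumptions, not measurements, and this counts DBU licensing cost only, not the underlying VM cost.

```python
# Rough DBU cost comparison for one hypothetical workload.
# Rates are the Azure Databricks list prices cited in the article.
ALL_PURPOSE_RATE = 0.40  # $/DBU, all-purpose (interactive) compute
JOB_RATE = 0.15          # $/DBU, job compute

def dbu_cost(dbu_per_hour: float, hours: float, rate: float) -> float:
    """DBU licensing cost only; VM infrastructure cost is billed separately."""
    return dbu_per_hour * hours * rate

# Assumed workload: 6 DBU/hour for 2 hours of actual work, plus ~10 minutes
# of idle time before the interactive cluster auto-terminates.
interactive = dbu_cost(6, 2 + 10 / 60, ALL_PURPOSE_RATE)
job = dbu_cost(6, 2, JOB_RATE)
print(f"all-purpose: ${interactive:.2f}, job: ${job:.2f}")
# → all-purpose: $5.20, job: $1.80
```

Even ignoring the idle tail, the rate difference alone makes the job cluster well under half the DBU cost for the same work.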

In this article, I aim to compare and contrast Data Factory orchestration with and without Databricks workflows. Additionally, I’ll discuss the drawbacks of workflows and compare the performance of job clusters and interactive clusters.

Data Factory with Job Clusters

When Azure Data Factory executes a Databricks notebook activity on a job cluster, cluster provisioning is a significant concern, especially for the first notebook activity in a pipeline. The initial activity typically takes around 3 to 5 minutes to start while the cluster is provisioned. Subsequent notebook activities reuse the existing compute resources, but ADF still deallocates and reallocates the compute for each notebook activity. This takes a…
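The provisioning behavior described above can be sketched as a simple timing model. All of the numbers here are assumptions for illustration (the ~3–5 minute cold start from the article, plus an assumed per-activity reallocation cost and a fixed amount of notebook work), not measured figures.

```python
# Back-of-envelope estimate of pipeline duration when ADF runs a sequence
# of Databricks notebook activities on a job cluster. Timings are assumed.
def pipeline_minutes(n_activities: int,
                     first_provision_min: float = 4.0,  # ~3-5 min cold start
                     reallocate_min: float = 1.0,       # assumed reuse overhead per activity
                     work_min: float = 5.0) -> float:   # assumed work per notebook
    """Total minutes: one cold start, then reallocation overhead per reused activity."""
    if n_activities == 0:
        return 0.0
    overhead = first_provision_min + (n_activities - 1) * reallocate_min
    return overhead + n_activities * work_min

# A 5-notebook pipeline under these assumptions: 4 + 4*1 + 5*5 = 33 minutes,
# of which 8 minutes is pure cluster-management overhead.
print(pipeline_minutes(5))  # → 33.0
```

The point of the sketch is that the overhead term grows with the number of notebook activities, which is exactly the cost Databricks Workflows avoid by keeping one job cluster alive across tasks.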


Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. Currently a Senior Data Engineer at Capgemini.