Databricks Workflow — Why, When and Tips

Nnaemezue Obi-Eyisi
3 min read · Jul 6, 2024

Databricks Workflows orchestrates data processing, machine learning, and analytics pipelines on the Databricks Data Intelligence Platform. It provides a fully managed orchestration service integrated with the Databricks platform, using Databricks Jobs to run non-interactive code in your workspace and Delta Live Tables to build reliable, maintainable ETL pipelines.
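To make the orchestration concrete, below is a minimal sketch of defining a two-task job programmatically with the Databricks SDK for Python (databricks-sdk). The job name, notebook paths, cluster ID, and schedule are hypothetical placeholders; the same job could just as well be created through the Workflows UI, the Jobs REST API, or a Databricks Asset Bundle.

```python
# Minimal sketch: create a two-task workflow (ingest -> transform) with the
# Databricks SDK for Python. Notebook paths, cluster id, and schedule below
# are illustrative placeholders only.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads host/token from the environment or .databrickscfg

job = w.jobs.create(
    name="daily_sales_pipeline",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Workflows/ingest_orders"),
            existing_cluster_id="0701-123456-abcd1234",
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],  # runs after ingest succeeds
            notebook_task=jobs.NotebookTask(notebook_path="/Workflows/transform_orders"),
            existing_cluster_id="0701-123456-abcd1234",
        ),
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # every day at 02:00
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```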

Suitable Use Cases for Databricks Workflows

  1. Databricks Workflows are ideal for orchestrating the Transform (T), or processing, portion of your workloads, where all processing takes place within Databricks. In this scenario the data has already landed in ADLS, so you read it, process it, and write the results back to ADLS (see the sketch after this list).
  2. Workflows can also orchestrate efficient extraction from external cloud sources, web APIs, or other sources that Databricks connectors support well.
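Here is a minimal sketch of such a Transform task as it might appear in a notebook orchestrated by a workflow. The storage account, container, paths, and column names are hypothetical; `spark` is the session Databricks provides automatically in a notebook or job context.

```python
# Minimal Transform (T) task: read raw data already landed in ADLS,
# clean it, and write the result back to ADLS as a Delta table.
from pyspark.sql import functions as F

raw_path = "abfss://raw@mystorageaccount.dfs.core.windows.net/sales/orders/"
curated_path = "abfss://curated@mystorageaccount.dfs.core.windows.net/sales/orders/"

# Read the raw files that an upstream ingestion step landed in ADLS.
orders = spark.read.format("parquet").load(raw_path)

curated = (
    orders
    .filter(F.col("order_status") == "COMPLETED")      # keep only completed orders
    .withColumn("order_date", F.to_date("order_ts"))   # derive a partition column
    .dropDuplicates(["order_id"])                      # de-duplicate on the business key
)

# Write the curated result back to ADLS in Delta format.
(curated.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save(curated_path))
```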

Unsuitable Use Cases for Databricks Workflows

  1. When data must be extracted from on-premises sources that require a self-hosted integration runtime, or from data sources that Databricks JDBC connectors do not support efficiently, it is better to use a dedicated ETL tool such as Azure Data Factory.


Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. I am currently a Senior Data Engineer at Capgemini.