Databricks Workflow ‘For Each’ Task: Limitations and Workarounds

Nnaemezue Obi-Eyisi
2 min read · Sep 16, 2024

Positives

The ‘For Each’ task in Databricks workflows works exceptionally well for parallelizing or iterating over an array of values. It can also help reduce pipeline costs by consolidating many near-identical tasks into a single parameterized loop.

Limitations and Workarounds

Limitation 1: Inability to Iterate Over Multi-Layered Nested JSON Schemas

The ‘For Each’ task struggles to iterate over deeply nested JSON structures, since its input is expected to be a flat array of values.

  • Workaround 1: Flatten the nested JSON beforehand into an array, or an array of dictionary objects. The ‘For Each’ task can then iterate over the objects and extract key-value pairs efficiently.
  • Workaround 2: Alternatively, perform the flattening in a separate upstream notebook task and pass the resulting array to the ‘For Each’ task.
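A minimal sketch of the flattening step in plain Python. The record shape, key names, and the commented `dbutils.jobs.taskValues` hand-off are illustrative assumptions, not part of the original article:

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten a nested dict into a single-level dict
    whose keys are dot-separated paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# Hypothetical nested records; one flat dict per record, ready to feed
# the 'For Each' task as an array of dictionary objects.
records = [
    {"source": {"name": "orders", "path": "/mnt/raw/orders"}, "enabled": True},
    {"source": {"name": "users", "path": "/mnt/raw/users"}, "enabled": False},
]
inputs = [flatten(r) for r in records]

# In an upstream Databricks notebook task you could then publish the array
# for the 'For Each' task to consume, e.g.:
# dbutils.jobs.taskValues.set(key="inputs", value=json.dumps(inputs))
```

Each flattened dict has only scalar values, so the loop body can read fields like `source.name` directly instead of walking the nested structure.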

Limitation 2: ‘For Each’ Does Not Support Multiple Inner Tasks

Currently, the ‘For Each’ task only supports one inner task, which can be limiting.

  • Workaround: To bypass this restriction, encapsulate the additional task chain in a separate job. Then use the ‘Run Job’ task type to trigger that child job from within the ‘For Each’ loop.
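As a rough sketch, the resulting Jobs API payload might look like the fragment below, with a `for_each_task` whose single inner task is a `run_job_task` pointing at the child job. The `job_id`, task keys, and the `inputs` reference are placeholders, not values from the article:

```json
{
  "tasks": [
    {
      "task_key": "loop",
      "for_each_task": {
        "inputs": "{{tasks.prepare.values.inputs}}",
        "concurrency": 4,
        "task": {
          "task_key": "loop_iteration",
          "run_job_task": { "job_id": 123456789 }
        }
      }
    }
  ]
}
```

The child job can contain as many chained tasks as needed, effectively restoring multi-task loop bodies at the cost of one extra job object.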

Written by Nnaemezue Obi-Eyisi

I am passionate about empowering, educating, and encouraging individuals pursuing a career in data engineering. Currently a Senior Data Engineer at Capgemini
