Comparing Shallow Clones, Deep Clones, Views, and Create Table As in Databricks Delta Lake

Nnaemezue Obi-Eyisi
2 min read · Nov 14, 2023

Databricks offers various methods to create copies of existing Delta Lake tables, each serving specific use cases. Let’s explore the characteristics and use cases of shallow clones, deep clones, views, and the CREATE TABLE AS (CTAS) approach.

1. Shallow Clones:

  • Definition: Shallow clones create a snapshot of a Delta table by copying only its metadata; the underlying data files are not copied to the clone target.
  • Metadata Cloned: Schema, partitioning information, invariants, nullability.

Advantages:

  • Cost-effective, as it avoids duplicating data files.
  • Suitable for scenarios where a quick snapshot with minimal storage impact is needed.

Considerations:

  • The clone references data files in the source table’s directory, making it dependent on the source.
  • Repairing the clone may be necessary after vacuuming the source table.
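As a sketch, a shallow clone is created with the `SHALLOW CLONE` syntax (the table names here are hypothetical):

```sql
-- Create a metadata-only copy of prod.sales; no data files are duplicated.
CREATE TABLE dev.sales_snapshot
SHALLOW CLONE prod.sales;

-- If VACUUM later removes files from prod.sales that the clone still
-- references, queries on the clone will fail until it is recreated:
CREATE OR REPLACE TABLE dev.sales_snapshot
SHALLOW CLONE prod.sales;
```

Because only metadata is written, the clone is near-instant and nearly free in storage, which is what makes it attractive for dev/test snapshots.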

2. Deep Clones:

  • Definition: Deep clones copy both the source table data and its metadata to the clone target.
  • Metadata Cloned: Schema, partitioning information, invariants, nullability, plus stream metadata and COPY INTO metadata (which shallow clones do not copy).

Advantages:

  • Creates an independent copy with its own history, eliminating dependency on the source table.
  • Because stream metadata is copied, a stream writing to the source table can be stopped and continued against the clone.
  • Syntax is simple, since it copies almost all metadata from the source table.

Considerations:

  • More expensive due to duplicating data along with metadata.
  • Ideal for scenarios where a fully independent replica is required.
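A deep clone uses the same statement shape with the `DEEP CLONE` keyword (again, table names are hypothetical):

```sql
-- Create a fully independent copy of prod.sales, including data files
-- and metadata such as stream state and COPY INTO state.
CREATE TABLE backup.sales_replica
DEEP CLONE prod.sales;

-- Re-running the clone syncs it incrementally: only files added or
-- changed in the source since the last clone are copied.
CREATE OR REPLACE TABLE backup.sales_replica
DEEP CLONE prod.sales;
```

The incremental re-run behavior is what makes deep clones practical for recurring backup or replication jobs despite the full data copy on the first run.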

3. Views:

  • Definition: Views provide a virtual representation of the data without copying it, enabling customized perspectives on the original data.

Advantages:

  • No additional storage cost, as it does not create a physical copy.
  • Ideal for scenarios where a…
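A view stores only its query definition, so reads always reflect the current state of the source table. A minimal sketch, with hypothetical table and column names:

```sql
-- No data is copied; the SELECT runs against prod.sales at query time.
CREATE VIEW analytics.us_sales AS
SELECT order_id, amount
FROM prod.sales
WHERE region = 'US';
```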
