Comparing Shallow Clones, Deep Clones, Views, and Create Table As in Databricks Delta Lake
Nov 14, 2023
Databricks offers several ways to create copies of existing Delta Lake tables, each suited to different use cases. Let’s explore the characteristics and use cases of shallow clones, deep clones, views, and the CREATE TABLE AS SELECT (CTAS) approach.
1. Shallow Clones:
- Definition: A shallow clone creates a snapshot of a Delta table by copying only its metadata to the clone target; the data files themselves are not copied.
- Metadata Cloned: Schema, partitioning information, invariants, nullability.
Advantages:
- Cost-effective, as it avoids duplicating data files.
- Suitable for scenarios where a quick snapshot with minimal storage impact is needed.
Considerations:
- The clone references data files in the source directory, so it depends on the source table.
- If the source table is vacuumed, the clone can lose access to files it still references and may need to be repaired.
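To make the dependency concrete, here is a minimal toy model in Python. It is not the Delta Lake API: `ToyTable`, `shallow_clone`, `readable`, and the file paths are hypothetical names invented for illustration. It only sketches why a clone that copies metadata and file references can break once the source's old files are removed, and why re-cloning from the current source state is one way to repair it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a Delta table: metadata plus referenced data file paths."""
    schema: dict
    files: list = field(default_factory=list)


def shallow_clone(source: ToyTable) -> ToyTable:
    # Copy metadata and file *references* only; no data file is duplicated.
    return ToyTable(schema=dict(source.schema), files=list(source.files))


def readable(table: ToyTable, storage: set) -> bool:
    # A table can be read only if every file it references still exists.
    return all(path in storage for path in table.files)


# Hypothetical storage layer: the set of file paths that physically exist.
storage = {"source/part-0.parquet", "source/part-1.parquet"}
source = ToyTable(schema={"id": "bigint"}, files=sorted(storage))
clone = shallow_clone(source)

# The source is rewritten into a new file, then the old files are vacuumed.
storage.add("source/part-2.parquet")
source.files = ["source/part-2.parquet"]
storage -= {"source/part-0.parquet", "source/part-1.parquet"}

# The clone still points at the removed files, so it can no longer be read.
clone_ok_after_vacuum = readable(clone, storage)

# Assumed repair path: re-create the clone from the current source state.
clone = shallow_clone(source)
clone_ok_after_repair = readable(clone, storage)
```

In this sketch the clone never owns any data, which is why it is cheap to create and also why its health is tied to the source table's file retention.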
2. Deep Clones:
- Definition: Deep clones copy both the source table data and its metadata to the clone target.
- Metadata Cloned: Schema, partitioning information, invariants, nullability, stream metadata, COPY INTO…