Unveiling the Secrets: External Tables vs. External Volumes in Azure Databricks Unity Catalog

Nnaemezue Obi-Eyisi
7 min read · Sep 25, 2023

While reviewing the Databricks documentation on Unity Catalog, I came across a concept that initially seemed a bit perplexing: the distinction between accessing data objects in cloud storage via External Tables versus External Volumes. This inspired me to write an article exploring the different methods for accessing data from the enterprise data lake through Unity Catalog. In this article, I delve into the various syntaxes and nuances, explaining how one can efficiently access the data, particularly for organizations that have already established a data lakehouse within their data lake.

Prerequisites for Working with External Tables, Managed Tables, and External Volumes

To create an external location, you must first establish a storage credential using the Databricks Access Connector. These steps are essential whether you are creating a Metastore or enabling Unity Catalog.
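In Databricks SQL, this setup can be sketched as follows. The storage credential is assumed to already exist (created from the Access Connector's managed identity); the location name, credential name, and storage URL are all placeholders:

```sql
-- An external location binds a cloud storage path (ADLS Gen2 here)
-- to a storage credential that Unity Catalog uses to access it.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_location
URL 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data'
WITH (STORAGE CREDENTIAL my_storage_credential)
COMMENT 'External location for the enterprise data lake';
```

Once the external location exists, external tables and external volumes can be created against paths beneath it.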

Default Metastore location for Managed Tables

By default, every Unity Catalog-enabled workspace comes with a preconfigured default Metastore location linked to the customer’s data lake (ADLS Gen 2) storage container. This location serves as the storage repository for managed table data and is automatically established as the initial external location.

For instance, if you create a table (Testtable) within a new catalog named Test, under a database named Base, it will utilize this default Metastore location for storage.
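Sketched in Databricks SQL, that example looks like the following. The names come from the article; the column list is purely illustrative:

```sql
-- Create the catalog and schema; without a MANAGED LOCATION clause,
-- both inherit the metastore's default storage location.
CREATE CATALOG IF NOT EXISTS Test;
CREATE SCHEMA IF NOT EXISTS Test.Base;

-- Omitting the LOCATION clause makes this a managed table:
-- its data files are written under the default metastore location.
CREATE TABLE IF NOT EXISTS Test.Base.Testtable (
  id   INT,
  name STRING
);
```

The absence of a `LOCATION` (or `MANAGED LOCATION`) clause at every level is what routes the data to the default Metastore storage.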

Where Is Managed Table Data Stored?

When I check the default storage location in Catalog Explorer, I can see the following:

Please note that Databricks intentionally restricts the ability to browse the files within this container, as it is intended to be managed by…
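Even though browsing the container directly is restricted, you can still see where a managed table's data resides through SQL metadata commands. A sketch using the example names above (the external location name is hypothetical):

```sql
-- The "Location" row in the output points into the
-- default metastore container.
DESCRIBE TABLE EXTENDED Test.Base.Testtable;

-- The registered external locations can be inspected as well
-- (requires the appropriate privileges):
DESCRIBE EXTERNAL LOCATION metastore_default_location;
```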
