Unveiling the Secrets: External Tables vs. External Volumes in Azure Databricks Unity Catalog
While reviewing the Databricks documentation on Unity Catalog, I came across a concept that initially seemed perplexing: the distinction between accessing data stored in our cloud storage through External Tables versus External Volumes. This inspired me to write an article exploring the different ways to access data in the enterprise data lake through Unity Catalog. In it, I walk through the relevant syntax and nuances and explain how to access the data efficiently, particularly for organizations that have already built a data lakehouse on top of their data lake.
Prerequisites for Working with External Tables, Managed Tables, and External Volumes
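To create an external location, you must first create a storage credential backed by the Access Connector for Azure Databricks, whose managed identity must be granted access to the storage account. These steps are required whether you are creating a new Metastore or enabling Unity Catalog on an existing workspace.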
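Once the storage credential exists, the external location itself can be registered with SQL. The sketch below only assembles a `CREATE EXTERNAL LOCATION` statement as a string and does not run it. The location name `lakehouse_raw`, the storage account `mydatalake`, the container `raw`, the path `sales/` and the credential name `access_connector_cred` are all hypothetical placeholders. In a Databricks notebook you would pass the resulting string to `spark.sql(...)`.

```python
def abfss_url(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URL: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Strip stray slashes so we never emit '//' in the URL.
    return f"{base}/{path.strip('/')}" if path else base


def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Compose (but do not execute) a Unity Catalog CREATE EXTERNAL LOCATION statement."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


# Hypothetical names, for illustration only.
url = abfss_url("raw", "mydatalake", "sales/")
stmt = create_external_location_sql("lakehouse_raw", url, "access_connector_cred")
print(stmt)
# In a Databricks notebook: spark.sql(stmt)
```

Keeping the URL builder separate makes it easy to reuse the same container and account names later, when you point external tables or volumes at paths under this location.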
Default Metastore Location for Managed Tables
By default, every Unity Catalog-enabled workspace comes with a preconfigured default Metastore storage location linked to a container in the customer's data lake (ADLS Gen2) storage account. This location serves as the storage repository for managed table data and is automatically established as the initial external location.
For instance, if you create a table named Testtable in a schema (database) named Base within a new catalog named Test, its data will be stored in this default Metastore location.
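The example above maps onto Unity Catalog's three-level namespace, `catalog.schema.table`. The sketch below assumes the catalog is created without a `MANAGED LOCATION` clause, so it falls back to the Metastore's default storage. The column definitions are hypothetical, and the names are written in lowercase because Unity Catalog treats these identifiers case-insensitively. The helper only returns SQL strings; in a notebook you would run each one with `spark.sql(...)`. The `location` column returned by `DESCRIBE DETAIL` then shows where the table's files actually live.

```python
def managed_table_ddl(catalog: str, schema: str, table: str, columns: dict) -> list:
    """Return the SQL statements that create a managed table at catalog.schema.table.

    No LOCATION clause is given, so Unity Catalog decides where the data files go
    (in this scenario, the Metastore's default storage location).
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    fqn = f"{catalog}.{schema}.{table}"
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
        f"CREATE TABLE IF NOT EXISTS {fqn} ({cols})",
        # DESCRIBE DETAIL returns a 'location' column with the table's storage path.
        f"DESCRIBE DETAIL {fqn}",
    ]


# Names follow the example in the text; the columns are hypothetical.
stmts = managed_table_ddl("test", "base", "testtable", {"id": "INT", "name": "STRING"})
for stmt in stmts:
    print(stmt)
    # In a Databricks notebook: spark.sql(stmt)
```

Omitting `LOCATION` is what makes the table managed. Adding a `LOCATION 'abfss://...'` clause that points inside a registered external location would instead create an external table.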
Where Is Managed Table Data Stored?
When I check the default storage location in Catalog Explorer, I see the following:
Please note that Databricks intentionally restricts the ability to browse the files within this container, as it is intended to be managed by…