Databricks can be configured to query data in an external location, such as Amazon S3 or Azure Data Lake Storage. This document covers the information needed to register a Databricks external configuration in Openbridge.
Configuration
There are five configuration elements relating to external storage.
Azure Data Lake Storage Container Name
Azure Data Lake Storage Connection String
Databricks Storage Credentials Name
Databricks External Location Storage URI
Azure Data Lake Storage Container Path Prefix
Be aware that these values are found in a mix of locations. Some are in the Azure UI, and others are in the Databricks UI.
If you are unfamiliar with using external storage with Databricks, see these docs describing how to configure your Azure and Databricks environments:
Azure External Locations: Manage external locations
Azure Unity Catalog: Metastores and Create Unity Catalog. Please note that Openbridge requires the use of Unity Catalog for external locations.
Azure Data Lake Storage Container Name
The container name in Azure should match the container used by your registered external location in Databricks. See Databricks External Location Storage URI below.
Azure Data Lake Storage Connection String
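The connection string for the Azure Data Lake Storage container specified above. Azure issues connection strings per storage account, so use the one for the account that holds this container.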
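Before pasting a connection string into Openbridge, it can help to confirm that it is complete. The following is a minimal sketch, assuming the common account-key form of an Azure Storage connection string (semicolon-separated Key=Value pairs such as AccountName and AccountKey). The function names and the sample values are hypothetical and are not part of Openbridge or Azure tooling. A SAS-based connection string would carry different keys.

```python
def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split a semicolon-separated Key=Value connection string into a dict."""
    parts: dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, sep, value = segment.partition("=")
        if not sep or not key:
            # Do not echo the segment itself, since it may contain a secret.
            raise ValueError("Malformed segment in connection string")
        parts[key] = value
    return parts


def missing_keys(parts: dict[str, str],
                 required: tuple[str, ...] = ("AccountName", "AccountKey")) -> list[str]:
    """Return the required keys that are absent or empty (assumed account-key form)."""
    return [key for key in required if not parts.get(key)]


if __name__ == "__main__":
    # Placeholder value for illustration only; never commit a real key.
    sample = ("DefaultEndpointsProtocol=https;AccountName=examplestorage;"
              "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net")
    print(missing_keys(parse_connection_string(sample)))
```

An empty list from missing_keys means both assumed fields are present. It does not confirm that the key is valid or that it belongs to the right storage account.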
Databricks Storage Credentials Name
This is the name of the storage credential you used when you registered your external location in Databricks. It should be the same credential that appears in Databricks External Location Storage URI below.
Databricks External Location Storage URI
This is the URI for the external storage location you registered within Databricks.
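Because the container name should match the one in your external location, a quick check of the URI can catch mismatches early. The sketch below assumes the URI uses the Azure Data Lake Storage Gen2 form abfss://<container>@<storage-account>.dfs.core.windows.net/<path>. This article does not state the form, so treat that as an assumption. The function name and the sample values are hypothetical.

```python
from urllib.parse import urlsplit


def split_external_location_uri(uri: str) -> tuple[str, str, str]:
    """Return (container, account_host, path) from an abfss:// style URI."""
    parts = urlsplit(uri.strip())
    if parts.scheme not in ("abfss", "abfs"):
        raise ValueError(f"Expected an abfss:// URI, got scheme {parts.scheme!r}")
    # In this URI form the container sits in the user-info slot, before the "@".
    container = parts.username
    host = parts.hostname
    if not container or not host:
        raise ValueError("URI must look like abfss://<container>@<account-host>/<path>")
    # Strip slashes so the path can be compared with a slash-free path prefix.
    return container, host, parts.path.strip("/")


if __name__ == "__main__":
    # Placeholder URI for illustration only.
    uri = "abfss://mycontainer@examplestorage.dfs.core.windows.net/myexternaldatabricksdata"
    container, host, path = split_external_location_uri(uri)
    print(container, host, path)
```

Comparing the returned container with the value you plan to enter for Azure Data Lake Storage Container Name is one way to confirm the two settings agree.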
Azure Data Lake Storage Container Path Prefix
A path is optional. For example, if you have a directory in your container that you want to use for storing all your Openbridge data, you would specify this directory. If the directory were called myexternaldatabricksdata, you would set the path to myexternaldatabricksdata. If left empty, we would use the container's root to store data.
Note: Do not place any leading or trailing slashes in the path. For example, do not use /myexternaldatabricksdata or myexternaldatabricksdata/ or /myexternaldatabricksdata/
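The slash rule for the path prefix is easy to check before you submit the value. The following is a small sketch with hypothetical helper names. It is not Openbridge code and only illustrates the rule described in this section.

```python
def is_valid_path_prefix(prefix: str) -> bool:
    """True if the prefix is empty (container root) or has no leading/trailing slash."""
    return not (prefix.startswith("/") or prefix.endswith("/"))


def clean_path_prefix(raw: str) -> str:
    """Trim surrounding whitespace and any leading or trailing slashes."""
    return raw.strip().strip("/")


if __name__ == "__main__":
    for candidate in ("myexternaldatabricksdata",
                      "/myexternaldatabricksdata",
                      "myexternaldatabricksdata/",
                      "/myexternaldatabricksdata/",
                      ""):
        print(repr(candidate), is_valid_path_prefix(candidate),
              repr(clean_path_prefix(candidate)))
```

An empty result from clean_path_prefix corresponds to leaving the field blank, which stores data at the container's root.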