Databricks can be configured to query data in an external location, such as Amazon S3 or Azure Data Lake Storage. This document covers the information needed to register a Databricks external storage configuration in Openbridge.
Configuration
There are five configuration elements relating to external storage.
Azure Data Lake Storage Container Name
Azure Data Lake Storage Connection String
Databricks Storage Credentials Name
Databricks External Location Storage URI
Azure Data Lake Storage Container Path Prefix
Be aware that these values are found in a mix of locations: some are in the Azure UI, and others are in the Databricks UI.
If you are unfamiliar with using external storage with Databricks, see these docs describing how to configure your Azure and Databricks environments:
Azure External Locations: Manage external locations
Azure Unity Catalog: Metastores and Create Unity Catalog. Please note that Openbridge requires the use of Unity Catalog for external locations.
Azure Data Lake Storage Container Name
The container name in Azure should match your registered external location in Databricks.
Azure Data Lake Storage Connection String
The connection string for the Azure Data Lake storage account of the container.
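Before pasting the connection string into Openbridge, it can help to confirm that it names the storage account you expect. The sketch below assumes the usual Azure Storage layout of semicolon-separated key=value pairs (for example AccountName and AccountKey). The function name parse_connection_string and the sample values, including the fake key, are hypothetical and are not part of Openbridge or Azure tooling.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a semicolon-delimited key=value connection string into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: base64 account keys often end in "=".
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Segment is not key=value: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


# Hypothetical sample; the account name and key are placeholders.
sample = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=ZmFrZS1rZXk=;"
    "EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(sample)
print(settings["AccountName"])
```

The account name recovered here should be the same storage account that appears in your external location URI.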
Databricks Storage Credentials Name & External Location Storage URI
The name of the storage credentials used when you registered your external location in Databricks. This should be the same credential that appears below in the Data Lake External Location Storage URI. The credential will look like a long string: xxxxxx-xxxx-xxxx-xxxx-xxxxxxx-xxxx-xxxxxx-xxxx-xxxxxxxx
The External Location Storage URI is the URI for the external storage location you registered within Databricks. The URI will always begin with abfss.
The URI also follows a standard format with your container name and storage account like this: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/
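The URI format can be checked before you enter it in Openbridge. The following is a minimal sketch, not an Openbridge or Databricks utility: the function names build_abfss_uri and parse_abfss_uri and the sample container and account names are hypothetical, and the pattern only covers the dfs.core.windows.net form described in this document.

```python
import re

# abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<optional path>
ABFSS_PATTERN = re.compile(
    r"^abfss://(?P<container>[^@/]+)@(?P<account>[^./@]+)"
    r"\.dfs\.core\.windows\.net(?P<path>/.*)?$"
)


def build_abfss_uri(container: str, account: str) -> str:
    """Assemble the URI from a container name and a storage account name."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def parse_abfss_uri(uri: str) -> dict:
    """Return the container, account, and path parts of an abfss URI."""
    match = ABFSS_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Not a recognized abfss URI: {uri!r}")
    return {
        "container": match["container"],
        "account": match["account"],
        "path": match["path"] or "/",
    }


# Hypothetical sample values.
uri = build_abfss_uri("mycontainer", "mystorageaccount")
parsed = parse_abfss_uri(uri)
print(uri)
print(parsed["container"])
```

Comparing the parsed container against the Azure Data Lake Storage Container Name you plan to enter is a quick way to confirm that the two values match.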
Azure Data Lake Storage Container Path Prefix
A path is optional, though we suggest creating a folder in your container called parquet to organize deliveries.
For example, if you have a directory in your container that you want to use for storing all your Openbridge data, you would first create that directory in your container and specify this directory in the setup screen on Openbridge.
If the directory were called myexternaldatabricksdata, you would set the path to myexternaldatabricksdata. If left empty, we would use the container's root to store data.
Note: Do not place any leading or trailing slashes in the path. For example, do not use /myexternaldatabricksdata, myexternaldatabricksdata/, or /myexternaldatabricksdata/.
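The slash rule for the path prefix can be expressed as a small check. This is a minimal sketch under the rules stated in this section only (an empty prefix is allowed, leading or trailing slashes are not). The function names is_valid_path_prefix and normalize_path_prefix are hypothetical and do not reflect how Openbridge validates the field.

```python
def is_valid_path_prefix(prefix: str) -> bool:
    """True if the prefix is empty or has no leading or trailing slash."""
    if prefix == "":
        return True  # empty means the container root is used
    return not (prefix.startswith("/") or prefix.endswith("/"))


def normalize_path_prefix(prefix: str) -> str:
    """Strip surrounding whitespace and any leading or trailing slashes."""
    return prefix.strip().strip("/")


for candidate in [
    "myexternaldatabricksdata",
    "/myexternaldatabricksdata",
    "myexternaldatabricksdata/",
    "/myexternaldatabricksdata/",
    "",
]:
    print(repr(candidate), is_valid_path_prefix(candidate))
```

Only the first and last candidates pass the check. Running a value through normalize_path_prefix before entering it in the setup screen turns each of the three rejected forms into myexternaldatabricksdata.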