Databricks Component
Manage compute, workflow jobs, ML models, SQL queries and more within a Databricks workspace.
Component key: databricks
Description
Databricks is an analytics and artificial intelligence platform where you can build, scale and govern data and AI, including generative AI and other machine learning models. This component lets you interact with the Databricks REST API to manage clusters, jobs, libraries, and other resources.
API Documentation
This component was built using the Databricks REST API Reference.
Connections
Databricks Personal Access Token
While service principal authentication is the recommended method for authenticating with the Databricks REST API, you can also use personal access tokens (which are tied to specific users). To generate a personal access token:
- Open https://accounts.cloud.databricks.com/workspaces and select your workspace. Open the URL for your workspace (e.g., https://dbc-00000000-aaaa.cloud.databricks.com) and log in.
- From the top right, click your user icon and select Settings.
- Under the User > Developer tab, select Manage under Access tokens.
- Click the Generate New Token button. Enter a description for the token and click Generate. Omit Lifetime (days) to create a token that never expires.
Your token will look similar to dap000000000000000000000000000000000. Copy this token and use it as the token input for the Databricks components.
When configuring an instance in Prismatic, enter the personal access token along with your workspace's endpoint (like dbc-REPLACE-ME.cloud.databricks.com).
See https://docs.databricks.com/en/dev-tools/auth/pat.html for more information on personal access token authentication.
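For reference, a personal access token authenticates direct REST calls as a standard Bearer credential. The following is a minimal Python sketch, not part of the component itself; the hostname and token are placeholders, and the requests library is assumed:

import requests

HOST = "dbc-REPLACE-ME.cloud.databricks.com"    # your workspace hostname
TOKEN = "dap000000000000000000000000000000000"  # placeholder personal access token

# Workspace-level REST calls all authenticate the same way: pass the
# token in an Authorization: Bearer header.
response = requests.get(
    f"https://{HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
response.raise_for_status()
print(response.json())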
Input | Notes |
---|---|
Personal Access Token password apiKey | From Databricks, go to User Settings > Developer > Access Tokens > Manage > Generate New Token |
Host string host | The hostname of your Databricks instance. Include the entire domain name. For example, dbc-1234567890123456.cloud.databricks.com |
Databricks Workspace Service Principal
With service principal authentication, you create a service user within your account, grant the user permissions to a workspace, and then generate a client ID and secret pair for that service user. This component uses that key pair to authenticate with workspaces that the service account has been granted permissions to. This is the best practice for authenticating with the Databricks REST API.
To set up service principal authentication, you need to:
- Create the service principal
  - Open https://accounts.cloud.databricks.com/users. Under the Service principals tab, select Add service principal.
  - Give the service principal any name and click Add.
- Grant the service principal permission to your workspace
  - Navigate to https://accounts.cloud.databricks.com/workspaces and select your workspace.
  - Under the Permissions tab, select Add permissions.
  - Search for the service principal you created and grant it the Admin permission.
- Generate a key pair for the service principal
  - Navigate to the service principal and open the Principal information tab.
  - Under OAuth secrets, select Generate secret.
  - Take note of the Secret (i.e., "Client Secret") and Client ID you receive. The client ID should be a UUID like 00000000-0000-0000-0000-000000000000. The client secret will look like dose00000000000000000000000000000000.
When configuring an instance in Prismatic, enter the client ID and client secret along with your workspace's token URL endpoint (like https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token) if you need workspace-level API access.
If you need account-level access to the API (i.e., you need to manage workspaces using this service principal), you will need to grant the service principal administrative access to your account, and your token URL will look like https://accounts.cloud.databricks.com/oidc/accounts/<my-account-id>/v1/token.
See https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html for more information on service principal OAuth client credential authentication.
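Under the hood, this is a standard OAuth 2.0 client credentials exchange against the token URL. A minimal Python sketch under those assumptions (placeholder credentials, requests library assumed):

import requests

TOKEN_URL = "https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token"
CLIENT_ID = "00000000-0000-0000-0000-000000000000"      # placeholder
CLIENT_SECRET = "dose00000000000000000000000000000000"  # placeholder

# Exchange the client ID and secret for a short-lived access token.
response = requests.post(
    TOKEN_URL,
    auth=(CLIENT_ID, CLIENT_SECRET),  # HTTP Basic authentication
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)
response.raise_for_status()
access_token = response.json()["access_token"]

# Subsequent API calls send the access token as a Bearer credential.
headers = {"Authorization": f"Bearer {access_token}"}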
Input | Default | Notes | Example |
---|---|---|---|
Service Principal Client ID string / Required clientId | | Client ID of your Service Principal. Make sure that your service principal has been granted the necessary permissions in your Databricks workspace. https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#step-2-assign-workspace-level-permissions-to-the-databricks-service-principal | 00000000-0000-0000-0000-000000000000 |
Service Principal Client Secret password / Required clientSecret | | Client Secret of your Service Principal. | dose00000000000000000000000000000000 |
Scopes string / Required scopes | all-apis | OAuth scopes to request. Defaults to all-apis. | |
Token URL string / Required tokenUrl | https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token | The OAuth 2.0 Token URL for your Databricks workspace. Replace REPLACE-ME in https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token to reflect your workspace's URL. |
Data Sources
Select Cluster
Select a Databricks cluster to use | key: selectCluster | type: picklist
Input | Notes |
---|---|
Connection connection / Required connection |
Select Node Type
Select a Databricks node type to use | key: selectNodeType | type: picklist
Input | Notes |
---|---|
Connection connection / Required connection |
Select SQL Warehouse
Select an SQL Warehouse | key: selectWarehouse | type: picklist
Input | Notes |
---|---|
Connection connection / Required connection |
Actions
Create Execution Context
Create a Databricks execution context | key: createExecutionContext
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
Language string / Required language | python | | |
Example Payload for Create Execution Context
{
"data": {
"id": "1234-567890-reef123"
}
}
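The request and response shapes here match the Databricks 1.2 Command Execution API's contexts/create endpoint. A hedged sketch of the equivalent direct call (host and token are placeholders; requests library assumed):

import requests

HOST = "dbc-REPLACE-ME.cloud.databricks.com"  # placeholder workspace hostname
HEADERS = {"Authorization": "Bearer <your-token>"}

# Create an execution context on a running cluster. The returned id is
# the contextId used by the Run Command and Get Command Status actions.
response = requests.post(
    f"https://{HOST}/api/1.2/contexts/create",
    headers=HEADERS,
    json={"clusterId": "1234-567890-reef123", "language": "python"},
)
response.raise_for_status()
context_id = response.json()["id"]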
Get Cluster
Get a Databricks cluster by ID | key: getCluster
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Get Cluster
{
"data": {
"cluster_id": "1234-567890-reef123",
"spark_context_id": 4020997813441462000,
"cluster_name": "my-cluster",
"spark_version": "13.3.x-scala2.12",
"aws_attributes": {
"zone_id": "us-west-2c",
"first_on_demand": 1,
"availability": "SPOT_WITH_FALLBACK",
"spot_bid_price_percent": 100,
"ebs_volume_count": 0
},
"node_type_id": "i3.xlarge",
"driver_node_type_id": "i3.xlarge",
"autotermination_minutes": 120,
"enable_elastic_disk": false,
"disk_spec": {
"disk_count": 0
},
"cluster_source": "UI",
"enable_local_disk_encryption": false,
"instance_source": {
"node_type_id": "i3.xlarge"
},
"driver_instance_source": {
"node_type_id": "i3.xlarge"
},
"state": "TERMINATED",
"state_message": "Inactive cluster terminated (inactive for 120 minutes).",
"start_time": 1618263108824,
"terminated_time": 1619746525713,
"last_state_loss_time": 1619739324740,
"num_workers": 30,
"default_tags": {
"Vendor": "Databricks",
"Creator": "someone@example.com",
"ClusterName": "my-cluster",
"ClusterId": "1234-567890-reef123"
},
"creator_user_name": "someone@example.com",
"termination_reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
},
"type": "SUCCESS"
},
"init_scripts_safe_mode": false,
"spec": {
"spark_version": "13.3.x-scala2.12"
}
}
}
Get Command Status
Gets the status of and, if available, the results from a currently executing command. | key: getCommandStatus
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Command ID string / Required commandId | | The ID of the command to get the status of | 00000000000000000000000000000000 |
Connection connection / Required connection | | |
Execution Context ID string / Required contextId | | The ID of the execution context, likely created by the Create Execution Context action. | |
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Get Command Status
{
"data": {
"id": "d4aa2c2f871048e797efdbe635de94be",
"status": "Running",
"result": null
}
}
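Because commands run asynchronously, a common pattern is to poll this action until the status settles. A sketch of such a loop against the 1.2 commands/status endpoint (host, token, and the wait_for_command helper are illustrative assumptions, not part of the component):

import time
import requests

HOST = "dbc-REPLACE-ME.cloud.databricks.com"  # placeholder workspace hostname
HEADERS = {"Authorization": "Bearer <your-token>"}

def wait_for_command(cluster_id, context_id, command_id, poll_seconds=2):
    # Poll until the command leaves the Queued/Running states, then
    # return the final status payload (including any available result).
    while True:
        response = requests.get(
            f"https://{HOST}/api/1.2/commands/status",
            headers=HEADERS,
            params={
                "clusterId": cluster_id,
                "contextId": context_id,
                "commandId": command_id,
            },
        )
        response.raise_for_status()
        body = response.json()
        if body["status"] not in ("Queued", "Running"):
            return body
        time.sleep(poll_seconds)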
Get Current User
Get the currently authenticated Databricks user or service principal. | key: getCurrentUser
Input | Default | Notes |
---|---|---|
Connection connection / Required connection | ||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Get Current User
{
"data": {
"emails": [
{
"type": "work",
"value": "1d021345-e23c-4f29-84fa-d027a622259e",
"primary": true
}
],
"displayName": "Example Service User",
"schemas": [
"urn:ietf:params:scim:schemas:core:2.0:User",
"urn:ietf:params:scim:schemas:extension:workspace:2.0:User"
],
"name": {
"familyName": "User",
"givenName": "Example Service"
},
"active": true,
"groups": [
{
"display": "admins",
"type": "direct",
"value": "272831250941646",
"$ref": "Groups/272831250941646"
}
],
"id": "7556761598142352",
"userName": "1d021345-e23c-4f29-84fa-d027a622259e"
}
}
Get SQL Warehouse
Get an SQL Warehouse | key: getWarehouse
Input | Default | Notes | Example |
---|---|---|---|
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
Warehouse ID string / Required warehouseId | | The ID of an SQL warehouse | 0000000000000000 |
Example Payload for Get SQL Warehouse
{
"data": {
"id": "0000000000000000",
"name": "Starter Warehouse",
"size": "SMALL",
"cluster_size": "Small",
"min_num_clusters": 1,
"max_num_clusters": 1,
"auto_stop_mins": 60,
"auto_resume": true,
"creator_name": "example@example.com",
"creator_id": 5760885597616698,
"tags": {},
"spot_instance_policy": "COST_OPTIMIZED",
"enable_photon": true,
"enable_serverless_compute": false,
"warehouse_type": "PRO",
"num_clusters": 1,
"num_active_sessions": 0,
"state": "RUNNING",
"jdbc_url": "jdbc:spark://dbc-example-0000.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000000;",
"odbc_params": {
"hostname": "dbc-example-0000.cloud.databricks.com",
"path": "/sql/1.0/warehouses/0000000000000000",
"protocol": "https",
"port": 443
},
"health": {
"status": "HEALTHY"
}
}
}
List Clusters
Return information about all pinned clusters, active clusters, up to 200 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days. | key: listClusters
Input | Default | Notes |
---|---|---|
Connection connection / Required connection | ||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for List Clusters
{
"data": [
{
"cluster_id": "1234-567890-reef123",
"spark_context_id": 4020997813441462000,
"cluster_name": "my-cluster",
"spark_version": "13.3.x-scala2.12",
"aws_attributes": {
"zone_id": "us-west-2c",
"first_on_demand": 1,
"availability": "SPOT_WITH_FALLBACK",
"spot_bid_price_percent": 100,
"ebs_volume_count": 0
},
"node_type_id": "i3.xlarge",
"driver_node_type_id": "i3.xlarge",
"autotermination_minutes": 120,
"enable_elastic_disk": false,
"disk_spec": {
"disk_count": 0
},
"cluster_source": "UI",
"enable_local_disk_encryption": false,
"instance_source": {
"node_type_id": "i3.xlarge"
},
"driver_instance_source": {
"node_type_id": "i3.xlarge"
},
"state": "TERMINATED",
"state_message": "Inactive cluster terminated (inactive for 120 minutes).",
"start_time": 1618263108824,
"terminated_time": 1619746525713,
"last_state_loss_time": 1619739324740,
"num_workers": 30,
"default_tags": {
"Vendor": "Databricks",
"Creator": "someone@example.com",
"ClusterName": "my-cluster",
"ClusterId": "1234-567890-reef123"
},
"creator_user_name": "someone@example.com",
"termination_reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
},
"type": "SUCCESS"
},
"init_scripts_safe_mode": false,
"spec": {
"spark_version": "13.3.x-scala2.12"
}
}
]
}
List Node Types
Returns a list of supported Spark node types. These node types can be used to launch a cluster. | key: listNodeTypes
Input | Default | Notes |
---|---|---|
Connection connection / Required connection | ||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for List Node Types
{
"data": [
{
"node_type_id": "r4.xlarge",
"memory_mb": 31232,
"num_cores": 4,
"description": "r4.xlarge",
"instance_type_id": "r4.xlarge",
"is_deprecated": false,
"category": "Memory Optimized",
"support_ebs_volumes": true,
"support_cluster_tags": true,
"num_gpus": 0,
"node_instance_type": {
"instance_type_id": "r4.xlarge",
"local_disks": 0,
"local_disk_size_gb": 0,
"instance_family": "EC2 r4 Family vCPUs",
"swap_size": "10g"
},
"is_hidden": false,
"support_port_forwarding": true,
"supports_elastic_disk": true,
"display_order": 0,
"is_io_cache_enabled": false
}
]
}
List SQL Warehouses
List all SQL Warehouses in the Databricks workspace | key: listWarehouses
Input | Default | Notes |
---|---|---|
Connection connection / Required connection | ||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for List SQL Warehouses
{
"data": [
{
"id": "0000000000000000",
"name": "Starter Warehouse",
"size": "SMALL",
"cluster_size": "Small",
"min_num_clusters": 1,
"max_num_clusters": 1,
"auto_stop_mins": 60,
"auto_resume": true,
"creator_name": "example@example.com",
"creator_id": 5760885597616698,
"tags": {},
"spot_instance_policy": "COST_OPTIMIZED",
"enable_photon": true,
"enable_serverless_compute": false,
"warehouse_type": "PRO",
"num_clusters": 1,
"num_active_sessions": 0,
"state": "RUNNING",
"jdbc_url": "jdbc:spark://dbc-example-0000.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000000;",
"odbc_params": {
"hostname": "dbc-example-0000.cloud.databricks.com",
"path": "/sql/1.0/warehouses/0000000000000000",
"protocol": "https",
"port": 443
},
"health": {
"status": "HEALTHY"
}
}
]
}
Raw Request
Send raw HTTP request to the Databricks API. | key: rawRequest
Input | Default | Notes | Example |
---|---|---|---|
Connection connection / Required connection | |||
Data string data | | The HTTP body payload to send to the URL. | {"exampleKey": "Example Data"} |
Debug Request boolean debugRequest | false | Enabling this flag will log out the current request. | |
File Data string Key Value List fileData | | File Data to be sent as a multipart form upload. | [{key: "example.txt", value: "My File Contents"}] |
File Data File Names string Key Value List fileDataFileNames | | File names to apply to the file data inputs. Keys must match the file data keys above. | |
Form Data string Key Value List formData | | The Form Data to be sent as a multipart form upload. | [{"key": "Example Key", "value": new Buffer("Hello World")}] |
Header string Key Value List headers | | A list of headers to send with the request. | User-Agent: curl/7.64.1 |
Max Retry Count string maxRetries | 0 | The maximum number of retries to attempt. Specify 0 for no retries. | |
Method string / Required method | | The HTTP method to use. | |
Query Parameter string Key Value List queryParams | | A list of query parameters to send with the request. This is the portion at the end of the URL similar to ?key1=value1&key2=value2. | |
Response Type string / Required responseType | json | The type of data you expect in the response. You can request json, text, or binary data. | |
Retry On All Errors boolean retryAllErrors | false | If true, retries on all erroneous responses regardless of type. This is helpful when retrying after HTTP 429 or other 3xx or 4xx errors. Otherwise, only retries on HTTP 5xx and network errors. | |
Retry Delay (ms) string retryDelayMS | 0 | The delay in milliseconds between retries. This is used when 'Use Exponential Backoff' is disabled. | |
Timeout string timeout | | The maximum time that a client will await a response to its request. | 2000 |
URL string / Required url | | The URL https://<WORKSPACE-URL>/api/ is prepended to the URL you provide here. For example, if you provide "/2.0/clusters/list", the full URL will be "https://${host}/api/2.0/clusters/list". You can also provide a full URL with protocol (i.e., "https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/scim/v2/Groups") to override the prepended base URL; see the sketch after this table. | /2.0/clusters/list |
Use Exponential Backoff boolean useExponentialBackoff | false | Specifies whether to use a pre-defined exponential backoff strategy for retries. When enabled, 'Retry Delay (ms)' is ignored. |
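The URL input's prepending rule is simple enough to state as code. An illustrative Python sketch of the documented behavior (the resolve_url helper is hypothetical, not part of the component):

def resolve_url(host: str, url: str) -> str:
    # Fully qualified URLs are used verbatim; relative paths are
    # appended to https://<host>/api.
    if url.startswith("http://") or url.startswith("https://"):
        return url
    return f"https://{host}/api{url}"

assert (
    resolve_url("dbc-1234567890123456.cloud.databricks.com", "/2.0/clusters/list")
    == "https://dbc-1234567890123456.cloud.databricks.com/api/2.0/clusters/list"
)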
Restart Cluster
Restart a Databricks cluster by ID | key: restartCluster
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Restart Cluster
{
"data": "Cluster restarted successfully"
}
Run Command
Run a command in a Databricks execution context | key: runCommand
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Command string / Required command | | The executable code to run in the execution context | print(0.1 + 0.2) |
Connection connection / Required connection | | |
Execution Context ID string / Required contextId | | The ID of the execution context, likely created by the Create Execution Context action. | |
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
Language string / Required language | python | | |
Example Payload for Run Command
{
"data": {
"id": "d4aa2c2f871048e797efdbe635de94be"
}
}
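As with Create Execution Context, the payload here mirrors the 1.2 Command Execution API, this time its commands/execute endpoint. A hedged sketch of the equivalent direct call (placeholders as before):

import requests

HOST = "dbc-REPLACE-ME.cloud.databricks.com"  # placeholder workspace hostname
HEADERS = {"Authorization": "Bearer <your-token>"}

# Submit a command to an existing execution context. The returned id is
# the commandId to pass to the Get Command Status action.
response = requests.post(
    f"https://{HOST}/api/1.2/commands/execute",
    headers=HEADERS,
    json={
        "clusterId": "1234-567890-reef123",
        "contextId": "<context-id-from-create-execution-context>",
        "language": "python",
        "command": "print(0.1 + 0.2)",
    },
)
response.raise_for_status()
command_id = response.json()["id"]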
SQL: Execute an SQL Statement
Run a SQL query in the Databricks workspace. You can choose to wait for the result or asynchronously issue the request and return the statement ID. | key: runSql
Input | Default | Notes | Example |
---|---|---|---|
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
SQL Parameters code sqlParameters | | The parameters to use in the SQL statement. This should represent an array of objects, and each object should have a name and value. For example, [{ "name": "my_name", "value": "the name" }] | |
SQL Statement string / Required sqlStatement | | The SQL statement to run | SELECT * FROM table |
Warehouse ID string / Required warehouseId | | The ID of an SQL warehouse | 0000000000000000 |
Parameters can be passed to an SQL query using colon notation. Your parameters should be an array of objects; each object should have a name and value property and an optional type property.
For example, if your statement reads SELECT * FROM my_table WHERE name = :my_name AND date = :my_date, you can pass the following parameters:
[
{ "name": "my_name", "value": "the name" },
{ "name": "my_date", "value": "2020-01-01", "type": "DATE" }
]
This action will execute the SQL query and then wait for the results to be available, throwing an error if the query fails.
If you expect your query to run for a long time, use the "Raw Request" action to issue a query, take note of the statement_id that is returned, and then use that statement ID to fetch results at a later time.
Large results may be split into chunks; you can use the next_chunk_internal_link to fetch the next chunk of results.
See https://docs.databricks.com/api/workspace/statementexecution/executestatement for more information.
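To make the parameter and statement_id mechanics concrete, here is a hedged Python sketch of the underlying Statement Execution API call (host, token, and warehouse ID are placeholders; requests library assumed):

import requests

HOST = "dbc-REPLACE-ME.cloud.databricks.com"  # placeholder workspace hostname
HEADERS = {"Authorization": "Bearer <your-token>"}

response = requests.post(
    f"https://{HOST}/api/2.0/sql/statements",
    headers=HEADERS,
    json={
        "warehouse_id": "0000000000000000",
        "statement": "SELECT * FROM my_table WHERE name = :my_name AND date = :my_date",
        "parameters": [
            {"name": "my_name", "value": "the name"},
            {"name": "my_date", "value": "2020-01-01", "type": "DATE"},
        ],
        "wait_timeout": "30s",  # wait up to 30 seconds for an inline result
    },
)
response.raise_for_status()
body = response.json()
statement_id = body["statement_id"]
# If the statement is still running, poll GET /api/2.0/sql/statements/{statement_id};
# chunked results include a next_chunk_internal_link to follow.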
Start SQL Warehouse
Start an SQL Warehouse | key: startWarehouse
Input | Default | Notes | Example |
---|---|---|---|
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
Warehouse ID string / Required warehouseId | | The ID of an SQL warehouse | 0000000000000000 |
Example Payload for Start SQL Warehouse
{
"data": "Warehouse started"
}
Start Terminated Cluster
Start a terminated Databricks cluster by ID | key: startTerminatedCluster
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Start Terminated Cluster
{
"data": "Cluster started successfully"
}
Stop SQL Warehouse
Stop an SQL Warehouse | key: stopWarehouse
Input | Default | Notes | Example |
---|---|---|---|
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. | |
Warehouse ID string / Required warehouseId | | The ID of an SQL warehouse | 0000000000000000 |
Example Payload for Stop SQL Warehouse
{
"data": "Warehouse stopped"
}
Terminate Cluster
Terminate a Databricks cluster by ID | key: terminateCluster
Input | Default | Notes | Example |
---|---|---|---|
Cluster ID string / Required clusterId | | The unique identifier for the cluster | 1234-567890-reef123 |
Connection connection / Required connection | |||
Debug Request boolean debug | false | Enabling this flag will log out the current request. |
Example Payload for Terminate Cluster
{
"data": "Cluster terminated successfully"
}