Databricks Component

Manage compute, workflow jobs, ML models, SQL queries and more within a Databricks workspace.

Component key: databricks

Description

Databricks is an analytics and artificial intelligence platform where you can build, scale and govern data and AI, including generative AI and other machine learning models. This component lets you interact with the Databricks REST API to manage clusters, jobs, libraries, and other resources.

API Documentation

This component was built using the Databricks REST API Reference.

Connections

Databricks Personal Access Token

While service principal authentication is the recommended method for authenticating with the Databricks REST API, you can also use personal access tokens (which are tied to specific users). To generate a personal access token:

  1. Open https://accounts.cloud.databricks.com/workspaces and select your workspace. Then open your workspace's URL (e.g., https://dbc-00000000-aaaa.cloud.databricks.com) and log in.
  2. Click your user icon in the top right and select Settings.
  3. Under the User > Developer tab, select Manage under Access tokens.
  4. Click the Generate New Token button. Enter a description for the token and click Generate. Omit Lifetime (days) to create a token that never expires.

Your token will look similar to dapi00000000000000000000000000000000. Copy this token and use it as the Personal Access Token input for the Databricks connection.

When configuring an instance in Prismatic, enter the personal access token along with your workspace's endpoint (like dbc-REPLACE-ME.cloud.databricks.com).

See https://docs.databricks.com/en/dev-tools/auth/pat.html for more information on personal access token authentication.
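
To confirm that a token and host pair work before configuring the component, you can call any workspace endpoint directly. Below is a minimal sketch using Python's requests library; the host and token values are placeholders you would replace with your own:

```python
# Minimal sanity check for personal access token authentication.
# The host and token below are placeholders, not real values.
import requests

DATABRICKS_HOST = "dbc-REPLACE-ME.cloud.databricks.com"
TOKEN = "dapi00000000000000000000000000000000"

# Personal access tokens are sent as standard Bearer tokens.
response = requests.get(
    f"https://{DATABRICKS_HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```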

| Input | Type | Key | Notes |
| --- | --- | --- | --- |
| Personal Access Token | password | apiKey | From Databricks, go to User Settings > Developer > Access Tokens > Manage > Generate New Token. |
| Host | string | host | The hostname of your Databricks instance. Include the entire domain name, for example dbc-1234567890123456.cloud.databricks.com. |

Databricks Workspace Service Principal

With service principal authentication, you create a service principal within your account, grant it permissions to a workspace, and then generate a client ID and secret pair for it. This component uses that key pair to authenticate with any workspace the service principal has been granted access to. This is the best practice for authenticating with the Databricks REST API.

To set up service principal authentication, you need to:

  1. Create the service principal
    1. Open https://accounts.cloud.databricks.com/users. Under the Service principals tab select Add service principal.
    2. Give the service principal any name and click Add.
  2. Grant the service principal permission to your workspace
    1. Navigate to https://accounts.cloud.databricks.com/workspaces and select your workspace
    2. Under the Permissions tab select Add permissions
    3. Search for the service principal you created and grant the permission Admin.
  3. Generate a key pair for the service principal
    1. Navigate to the service principal and open the Principal information tab.
    2. Under OAuth secrets select Generate secret.
    3. Take note of the Secret (i.e. "Client Secret") and Client ID you receive. The client ID should be a UUID like 00000000-0000-0000-0000-000000000000. The client secret will look like dose00000000000000000000000000000000.

When configuring an instance in Prismatic, enter the client ID and client secret along with your workspace's token URL endpoint (like https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token) if you need workspace-level API access.

If you need account-level access to the API (i.e. you need to manage workspaces using this service principal), you will need to grant the service principal administrative access to your account, and your token URL will look like https://accounts.cloud.databricks.com/oidc/accounts/<my-account-id>/v1/token.

See https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html for more information on service principal OAuth client credential authentication.
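
Under the hood, this connection performs a standard OAuth 2.0 client credentials exchange against the token URL. If you want to reproduce it outside the component, it looks roughly like the sketch below, using Python's requests library; the client ID, secret, and token URL are placeholders:

```python
# Sketch of the OAuth 2.0 client credentials exchange for a service
# principal. All credential values below are placeholders.
import requests

TOKEN_URL = "https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token"
CLIENT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_SECRET = "dose00000000000000000000000000000000"

# The client ID and secret are passed via HTTP Basic auth; the all-apis
# scope matches the connection's default Scopes input.
response = requests.post(
    TOKEN_URL,
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
)
response.raise_for_status()
access_token = response.json()["access_token"]
# Subsequent API requests send this as an Authorization: Bearer header.
```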

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Service Principal Client ID | string / Required | clientId | | Client ID of your service principal. Make sure that your service principal has been granted the necessary permissions in your Databricks workspace: https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#step-2-assign-workspace-level-permissions-to-the-databricks-service-principal | 00000000-0000-0000-0000-000000000000 |
| Service Principal Client Secret | password / Required | clientSecret | | Client secret of your service principal. | dose00000000000000000000000000000000 |
| Scopes | string / Required | scopes | all-apis | OAuth scopes to request. Defaults to all-apis. | |
| Token URL | string / Required | tokenUrl | https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token | The OAuth 2.0 token URL for your Databricks workspace. Replace REPLACE-ME in the default value to reflect your workspace's URL. | |

Data Sources

Select Cluster

Select a Databricks cluster to use | key: selectCluster | type: picklist

| Input | Type | Key | Notes |
| --- | --- | --- | --- |
| Connection | connection / Required | connection | |

Select Node Type

Select a Databricks node type to use | key: selectNodeType | type: picklist

| Input | Type | Key | Notes |
| --- | --- | --- | --- |
| Connection | connection / Required | connection | |

Select SQL Warehouse

Select an SQL Warehouse | key: selectWarehouse | type: picklist

| Input | Type | Key | Notes |
| --- | --- | --- | --- |
| Connection | connection / Required | connection | |

Actions

Create Execution Context

Create a Databricks execution context | key: createExecutionContext

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| Language | string / Required | language | python | | |

Output example payload:

```json
{
  "data": {
    "id": "1234-567890-reef123"
  }
}
```

Get Cluster

Get a Databricks cluster by ID | key: getCluster

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |

Output example payload:

```json
{
  "data": {
    "cluster_id": "1234-567890-reef123",
    "spark_context_id": 4020997813441462000,
    "cluster_name": "my-cluster",
    "spark_version": "13.3.x-scala2.12",
    "aws_attributes": {
      "zone_id": "us-west-2c",
      "first_on_demand": 1,
      "availability": "SPOT_WITH_FALLBACK",
      "spot_bid_price_percent": 100,
      "ebs_volume_count": 0
    },
    "node_type_id": "i3.xlarge",
    "driver_node_type_id": "i3.xlarge",
    "autotermination_minutes": 120,
    "enable_elastic_disk": false,
    "disk_spec": {
      "disk_count": 0
    },
    "cluster_source": "UI",
    "enable_local_disk_encryption": false,
    "instance_source": {
      "node_type_id": "i3.xlarge"
    },
    "driver_instance_source": {
      "node_type_id": "i3.xlarge"
    },
    "state": "TERMINATED",
    "state_message": "Inactive cluster terminated (inactive for 120 minutes).",
    "start_time": 1618263108824,
    "terminated_time": 1619746525713,
    "last_state_loss_time": 1619739324740,
    "num_workers": 30,
    "default_tags": {
      "Vendor": "Databricks",
      "Creator": "someone@example.com",
      "ClusterName": "my-cluster",
      "ClusterId": "1234-567890-reef123"
    },
    "creator_user_name": "someone@example.com",
    "termination_reason": {
      "code": "INACTIVITY",
      "parameters": {
        "inactivity_duration_min": "120"
      },
      "type": "SUCCESS"
    },
    "init_scripts_safe_mode": false,
    "spec": {
      "spark_version": "13.3.x-scala2.12"
    }
  }
}
```

Get Command Status

Gets the status of and, if available, the results from a currently executing command. | key: getCommandStatus

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Command ID | string / Required | commandId | | The ID of the command to get the status of. | 00000000000000000000000000000000 |
| Connection | connection / Required | connection | | | |
| Execution Context ID | string / Required | contextId | | The ID of the execution context, likely created by the Create Execution Context action. | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |

Output example payload:

```json
{
  "data": {
    "id": "d4aa2c2f871048e797efdbe635de94be",
    "status": "Running",
    "result": null
  }
}
```
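
The Create Execution Context, Run Command, and Get Command Status actions map onto Databricks' command execution (1.2) endpoints. Issued by hand, the same create, run, and poll flow might look roughly like the sketch below in Python; the host, token, and cluster ID are placeholders, and the two-second poll interval is arbitrary:

```python
# Sketch of the create-context / run-command / poll-status flow using the
# Databricks 1.2 command execution endpoints. Host, token, and cluster ID
# are placeholders.
import time

import requests

BASE = "https://dbc-REPLACE-ME.cloud.databricks.com/api/1.2"
HEADERS = {"Authorization": "Bearer <token>"}
CLUSTER_ID = "1234-567890-reef123"

# 1. Create an execution context (Create Execution Context action).
ctx = requests.post(
    f"{BASE}/contexts/create",
    headers=HEADERS,
    json={"clusterId": CLUSTER_ID, "language": "python"},
).json()

# 2. Run a command in that context (Run Command action).
cmd = requests.post(
    f"{BASE}/commands/execute",
    headers=HEADERS,
    json={
        "clusterId": CLUSTER_ID,
        "contextId": ctx["id"],
        "language": "python",
        "command": "print(0.1 + 0.2)",
    },
).json()

# 3. Poll until the command reaches a terminal state (Get Command Status action).
while True:
    status = requests.get(
        f"{BASE}/commands/status",
        headers=HEADERS,
        params={
            "clusterId": CLUSTER_ID,
            "contextId": ctx["id"],
            "commandId": cmd["id"],
        },
    ).json()
    if status["status"] in ("Finished", "Error", "Cancelled"):
        break
    time.sleep(2)  # arbitrary poll interval

print(status)
```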

Get Current User

Get the currently authenticated Databricks user or service principal. | key: getCurrentUser

| Input | Type | Key | Default | Notes |
| --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. |

Output example payload:

```json
{
  "data": {
    "emails": [
      {
        "type": "work",
        "value": "1d021345-e23c-4f29-84fa-d027a622259e",
        "primary": true
      }
    ],
    "displayName": "Example Service User",
    "schemas": [
      "urn:ietf:params:scim:schemas:core:2.0:User",
      "urn:ietf:params:scim:schemas:extension:workspace:2.0:User"
    ],
    "name": {
      "familyName": "User",
      "givenName": "Example Service"
    },
    "active": true,
    "groups": [
      {
        "display": "admins",
        "type": "direct",
        "value": "272831250941646",
        "$ref": "Groups/272831250941646"
      }
    ],
    "id": "7556761598142352",
    "userName": "1d021345-e23c-4f29-84fa-d027a622259e"
  }
}
```

Get SQL Warehouse

Get an SQL Warehouse | key: getWarehouse

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| Warehouse ID | string / Required | warehouseId | | The ID of an SQL warehouse. | 0000000000000000 |

Output example payload:

```json
{
  "data": {
    "id": "0000000000000000",
    "name": "Starter Warehouse",
    "size": "SMALL",
    "cluster_size": "Small",
    "min_num_clusters": 1,
    "max_num_clusters": 1,
    "auto_stop_mins": 60,
    "auto_resume": true,
    "creator_name": "example@example.com",
    "creator_id": 5760885597616698,
    "tags": {},
    "spot_instance_policy": "COST_OPTIMIZED",
    "enable_photon": true,
    "enable_serverless_compute": false,
    "warehouse_type": "PRO",
    "num_clusters": 1,
    "num_active_sessions": 0,
    "state": "RUNNING",
    "jdbc_url": "jdbc:spark://dbc-example-0000.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000000;",
    "odbc_params": {
      "hostname": "dbc-example-0000.cloud.databricks.com",
      "path": "/sql/1.0/warehouses/0000000000000000",
      "protocol": "https",
      "port": 443
    },
    "health": {
      "status": "HEALTHY"
    }
  }
}
```

List Clusters

Return information about all pinned clusters, active clusters, up to 200 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days. | key: listClusters

| Input | Type | Key | Default | Notes |
| --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. |

Output example payload:

```json
{
  "data": [
    {
      "cluster_id": "1234-567890-reef123",
      "spark_context_id": 4020997813441462000,
      "cluster_name": "my-cluster",
      "spark_version": "13.3.x-scala2.12",
      "aws_attributes": {
        "zone_id": "us-west-2c",
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0
      },
      "node_type_id": "i3.xlarge",
      "driver_node_type_id": "i3.xlarge",
      "autotermination_minutes": 120,
      "enable_elastic_disk": false,
      "disk_spec": {
        "disk_count": 0
      },
      "cluster_source": "UI",
      "enable_local_disk_encryption": false,
      "instance_source": {
        "node_type_id": "i3.xlarge"
      },
      "driver_instance_source": {
        "node_type_id": "i3.xlarge"
      },
      "state": "TERMINATED",
      "state_message": "Inactive cluster terminated (inactive for 120 minutes).",
      "start_time": 1618263108824,
      "terminated_time": 1619746525713,
      "last_state_loss_time": 1619739324740,
      "num_workers": 30,
      "default_tags": {
        "Vendor": "Databricks",
        "Creator": "someone@example.com",
        "ClusterName": "my-cluster",
        "ClusterId": "1234-567890-reef123"
      },
      "creator_user_name": "someone@example.com",
      "termination_reason": {
        "code": "INACTIVITY",
        "parameters": {
          "inactivity_duration_min": "120"
        },
        "type": "SUCCESS"
      },
      "init_scripts_safe_mode": false,
      "spec": {
        "spark_version": "13.3.x-scala2.12"
      }
    }
  ]
}
```

List Node Types

Returns a list of supported Spark node types. These node types can be used to launch a cluster. | key: listNodeTypes

| Input | Type | Key | Default | Notes |
| --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. |

Output example payload:

```json
{
  "data": [
    {
      "node_type_id": "r4.xlarge",
      "memory_mb": 31232,
      "num_cores": 4,
      "description": "r4.xlarge",
      "instance_type_id": "r4.xlarge",
      "is_deprecated": false,
      "category": "Memory Optimized",
      "support_ebs_volumes": true,
      "support_cluster_tags": true,
      "num_gpus": 0,
      "node_instance_type": {
        "instance_type_id": "r4.xlarge",
        "local_disks": 0,
        "local_disk_size_gb": 0,
        "instance_family": "EC2 r4 Family vCPUs",
        "swap_size": "10g"
      },
      "is_hidden": false,
      "support_port_forwarding": true,
      "supports_elastic_disk": true,
      "display_order": 0,
      "is_io_cache_enabled": false
    }
  ]
}
```

List SQL Warehouses

List all SQL Warehouses in the Databricks workspace | key: listWarehouses

| Input | Type | Key | Default | Notes |
| --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. |

Output example payload:

```json
{
  "data": [
    {
      "id": "0000000000000000",
      "name": "Starter Warehouse",
      "size": "SMALL",
      "cluster_size": "Small",
      "min_num_clusters": 1,
      "max_num_clusters": 1,
      "auto_stop_mins": 60,
      "auto_resume": true,
      "creator_name": "example@example.com",
      "creator_id": 5760885597616698,
      "tags": {},
      "spot_instance_policy": "COST_OPTIMIZED",
      "enable_photon": true,
      "enable_serverless_compute": false,
      "warehouse_type": "PRO",
      "num_clusters": 1,
      "num_active_sessions": 0,
      "state": "RUNNING",
      "jdbc_url": "jdbc:spark://dbc-example-0000.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000000;",
      "odbc_params": {
        "hostname": "dbc-example-0000.cloud.databricks.com",
        "path": "/sql/1.0/warehouses/0000000000000000",
        "protocol": "https",
        "port": 443
      },
      "health": {
        "status": "HEALTHY"
      }
    }
  ]
}
```

Raw Request

Send raw HTTP request to the Databricks API. | key: rawRequest

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | | |
| Data | string | data | | The HTTP body payload to send to the URL. | {"exampleKey": "Example Data"} |
| Debug Request | boolean | debugRequest | false | Enabling this flag will log the current request. | |
| File Data | string / Key Value List | fileData | | File data to be sent as a multipart form upload. | [{key: "example.txt", value: "My File Contents"}] |
| File Data File Names | string / Key Value List | fileDataFileNames | | File names to apply to the file data inputs. Keys must match the file data keys above. | |
| Form Data | string / Key Value List | formData | | The form data to be sent as a multipart form upload. | [{"key": "Example Key", "value": new Buffer("Hello World")}] |
| Header | string / Key Value List | headers | | A list of headers to send with the request. | User-Agent: curl/7.64.1 |
| Max Retry Count | string | maxRetries | 0 | The maximum number of retries to attempt. Specify 0 for no retries. | |
| Method | string / Required | method | | The HTTP method to use. | |
| Query Parameter | string / Key Value List | queryParams | | A list of query parameters to send with the request. This is the portion at the end of the URL similar to ?key1=value1&key2=value2. | |
| Response Type | string / Required | responseType | json | The type of data you expect in the response. You can request json, text, or binary data. | |
| Retry On All Errors | boolean | retryAllErrors | false | If true, retries on all erroneous responses regardless of type. This is helpful when retrying after HTTP 429 or other 3xx or 4xx errors. Otherwise, only retries on HTTP 5xx and network errors. | |
| Retry Delay (ms) | string | retryDelayMS | 0 | The delay in milliseconds between retries. This is used when 'Use Exponential Backoff' is disabled. | |
| Timeout | string | timeout | | The maximum time that a client will await a response to its request. | 2000 |
| URL | string / Required | url | | The URL https://<WORKSPACE-URL>/api/ is prepended to the URL you provide here. For example, if you provide "/2.0/clusters/list", the full URL will be "https://${host}/api/2.0/clusters/list". You can also provide a full URL with protocol (e.g. "https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/scim/v2/Groups") to override the prepended base URL. | /2.0/clusters/list |
| Use Exponential Backoff | boolean | useExponentialBackoff | false | Specifies whether to use a pre-defined exponential backoff strategy for retries. When enabled, 'Retry Delay (ms)' is ignored. | |

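To make the URL and retry inputs concrete: relative URLs are resolved against https://<WORKSPACE-URL>/api/, full URLs pass through unchanged, and retries may back off exponentially. The sketch below approximates that behavior in Python; the host and token are placeholders, and the doubling delay is illustrative rather than the component's exact backoff schedule:

```python
# Rough approximation of the Raw Request action's URL resolution and
# retry behavior. Host and token are placeholders.
import time

import requests

BASE = "https://dbc-REPLACE-ME.cloud.databricks.com/api/"
HEADERS = {"Authorization": "Bearer <token>"}

def raw_request(url, method="GET", max_retries=3, **kwargs):
    # Relative URLs like "/2.0/clusters/list" are resolved against the base;
    # full URLs (e.g. an accounts.cloud.databricks.com endpoint) pass through.
    full_url = url if url.startswith("https://") else BASE + url.lstrip("/")
    for attempt in range(max_retries + 1):
        response = requests.request(method, full_url, headers=HEADERS, **kwargs)
        if response.status_code < 500:  # retry only server errors in this sketch
            return response
        time.sleep(2 ** attempt)  # illustrative exponential backoff
    return response

print(raw_request("/2.0/clusters/list").json())
```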

Restart Cluster

Restart a Databricks cluster by ID | key: restartCluster

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |

Output example payload:

```json
{
  "data": "Cluster restarted successfully"
}
```

Run Command

Run a command in a Databricks execution context | key: runCommand

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Command | string / Required | command | | The executable code to run in the execution context. | print(0.1 + 0.2) |
| Connection | connection / Required | connection | | | |
| Execution Context ID | string / Required | contextId | | The ID of the execution context, likely created by the Create Execution Context action. | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| Language | string / Required | language | python | | |

Output example payload:

```json
{
  "data": {
    "id": "d4aa2c2f871048e797efdbe635de94be"
  }
}
```

SQL: Execute an SQL Statement

Run an SQL query in the Databricks workspace. You can choose to wait for the result or asynchronously issue the request and return the statement ID. | key: runSql

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| SQL Parameters | code | sqlParameters | | The parameters to use in the SQL statement. This should represent an array of objects, and each object should have a name and value. For example, [{ "name": "my_name", "value": "the name" }]. | |
| SQL Statement | string / Required | sqlStatement | | The SQL statement to run. | SELECT * FROM table |
| Warehouse ID | string / Required | warehouseId | | The ID of an SQL warehouse. | 0000000000000000 |

Parameters can be passed to an SQL query using colon notation. Your parameters should be an array of objects; each object should have name and value properties and an optional type property.

For example, if your statement reads SELECT * FROM my_table WHERE name = :my_name AND date = :my_date, you can pass the following parameters:

```json
[
  { "name": "my_name", "value": "the name" },
  { "name": "my_date", "value": "2020-01-01", "type": "DATE" }
]
```

This action will execute the SQL query and then wait for the results to be available, throwing an error if the query fails.

If you expect your query to run for a long time, use the "Raw Request" action to issue the query, take note of the statement_id that is returned, and then use that statement ID to fetch the results later. Large results may be split into chunks, and you can use the next_chunk_internal_link to fetch the next chunk of results. See https://docs.databricks.com/api/workspace/statementexecution/executestatement for more information.
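
For reference, the underlying Statement Execution API accepts the statement, warehouse ID, and parameters in a single POST, and returns a statement_id you can poll afterwards. The sketch below shows the asynchronous variant described above, using Python's requests library; the host, token, warehouse ID, and table name are placeholders:

```python
# Sketch of asynchronous SQL statement execution via the Statement
# Execution API. All identifiers below are placeholders.
import requests

BASE = "https://dbc-REPLACE-ME.cloud.databricks.com/api/2.0"
HEADERS = {"Authorization": "Bearer <token>"}

# Issue the statement without waiting; the response includes a statement_id.
submit = requests.post(
    f"{BASE}/sql/statements",
    headers=HEADERS,
    json={
        "warehouse_id": "0000000000000000",
        "statement": "SELECT * FROM my_table WHERE name = :my_name AND date = :my_date",
        "parameters": [
            {"name": "my_name", "value": "the name"},
            {"name": "my_date", "value": "2020-01-01", "type": "DATE"},
        ],
        "wait_timeout": "0s",  # return immediately instead of waiting for results
    },
).json()

# Later, fetch the status and (possibly chunked) results by statement ID.
result = requests.get(
    f"{BASE}/sql/statements/{submit['statement_id']}", headers=HEADERS
).json()
print(result["status"]["state"])
```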


Start SQL Warehouse

Start an SQL Warehouse | key: startWarehouse

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| Warehouse ID | string / Required | warehouseId | | The ID of an SQL warehouse. | 0000000000000000 |

Output example payload:

```json
{
  "data": "Warehouse started"
}
```

Start Terminated Cluster

Start a terminated Databricks cluster by ID | key: startTerminatedCluster

InputDefaultNotesExample
Input
Cluster ID
string
/ Required
clusterId
Default
Notes
The unique identifier for the cluster
Example
1234-567890-reef123
Input
Connection
connection
/ Required
connection
Default
 
Notes
 
Example
 
Input
Debug Request
boolean
debug
Default
false
Notes
Enabling this flag will log out the current request.
Example
 

Output example payload:

```json
{
  "data": "Cluster started successfully"
}
```

Stop SQL Warehouse

Stop an SQL Warehouse | key: stopWarehouse

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |
| Warehouse ID | string / Required | warehouseId | | The ID of an SQL warehouse. | 0000000000000000 |

Output example payload:

```json
{
  "data": "Warehouse stopped"
}
```

Terminate Cluster

Terminate a Databricks cluster by ID | key: terminateCluster

| Input | Type | Key | Default | Notes | Example |
| --- | --- | --- | --- | --- | --- |
| Cluster ID | string / Required | clusterId | | The unique identifier for the cluster. | 1234-567890-reef123 |
| Connection | connection / Required | connection | | | |
| Debug Request | boolean | debug | false | Enabling this flag will log the current request. | |

Output example payload:

```json
{
  "data": "Cluster terminated successfully"
}
```