Databricks Component
Manage compute, workflow jobs, ML models, SQL queries and more within a Databricks workspace.
Component key: databricks
Description
Databricks is an analytics and artificial intelligence platform for building, scaling, and governing data and AI, including generative AI and other machine learning models. This component allows interacting with the Databricks REST API to manage clusters, jobs, libraries, and other resources.
API Documentation
This component was built using the Databricks REST API Reference.
Connections
Personal Access Token
key: personalAccessToken
While service principal authentication is the recommended method for authenticating with the Databricks REST API, personal access tokens (which are tied to specific users) can also be used.
Prerequisites
- A Databricks workspace account
Setup Steps
- Open Databricks Workspaces and select the workspace. Open the URL for the workspace (e.g., https://dbc-00000000-aaaa.cloud.databricks.com) and log in.
- From the top-right, click the user icon and select Settings.
- Under the User > Developer tab, select Manage under Access tokens.
- Click the Generate New Token button. Enter a description for the token and click Generate. Omit Lifetime (days) to create a token that never expires.
The token will look similar to dap000000000000000000000000000000000. Copy this token for use in the connection configuration.
Configure the Connection
Create a connection of type Databricks Personal Access Token and enter:
- Host: The workspace endpoint (e.g., dbc-REPLACE-ME.cloud.databricks.com)
- Personal Access Token: The token generated above
See Databricks personal access token authentication for more information.
| Input | Notes | Example |
|---|---|---|
| Personal Access Token | From Databricks, go to User Settings > Developer > Access Tokens > Manage > Generate New Token | |
| Host | The hostname of the Databricks instance. Include the entire domain name. | dbc-1234567890123456.cloud.databricks.com |
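As a sketch of how a personal access token is used against the REST API (the host, token, and endpoint path below are placeholders, not real values):

```python
# Minimal sketch of building a PAT-authenticated Databricks REST API call.
# The host and token are placeholders; substitute your own workspace values.

def pat_request_parts(host: str, token: str, path: str) -> tuple[str, dict]:
    """Build the URL and Authorization header for a PAT-authenticated call."""
    url = f"https://{host}/api{path}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

url, headers = pat_request_parts(
    "dbc-REPLACE-ME.cloud.databricks.com",
    "dapi-placeholder-token",
    "/2.0/clusters/list",
)
# An HTTP client such as `requests` would then issue:
#   requests.get(url, headers=headers)
```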
OAuth 2.0 Client Credentials
key: workspaceServicePrincipal
With service principal authentication, a service user is created within the account, the user is granted permissions to a workspace, and then a client ID and secret pair is generated for that service user. This component uses that key pair to authenticate with workspaces that the service account has been granted permissions to. This is the best practice for authenticating with the Databricks REST API.
Prerequisites
- A Databricks account with administrator access
- Access to Databricks Account Console
Setup Steps
- Create the service principal
- Open Databricks Users. Under the Service principals tab select Add service principal.
- Give the service principal any name and click Add.
- Grant the service principal permission to the workspace
- Navigate to Databricks Workspaces and select the workspace.
- Under the Permissions tab select Add permissions.
- Search for the service principal created above and grant the permission Admin.
- Generate a key pair for the service principal
- Navigate to the service principal and open the Principal information tab.
- Under OAuth secrets select Generate secret.
- Take note of the Secret (i.e., "Client Secret") and Client ID received. The client ID should be a UUID like 00000000-0000-0000-0000-000000000000. The client secret will look like dose00000000000000000000000000000000.
Configure the Connection
Create a connection of type Databricks Workspace Service Principal and enter:
- Token URL: The OAuth 2.0 Token URL for the Databricks workspace. Replace REPLACE-ME in https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token to reflect the workspace URL. For account-level API access, use https://accounts.cloud.databricks.com/oidc/accounts/<my-account-id>/v1/token instead.
- Scopes: OAuth scopes to request (defaults to all-apis)
- Service Principal Client ID: The Client ID from the generated key pair
- Service Principal Client Secret: The Client Secret from the generated key pair
For account-level access (e.g., managing workspaces using the service principal), grant the service principal administrative access to the account and use the account-level token URL format: https://accounts.cloud.databricks.com/oidc/accounts/<my-account-id>/v1/token.
See Databricks OAuth machine-to-machine authentication for more information on service principal OAuth client credential authentication.
| Input | Notes | Example |
|---|---|---|
| Service Principal Client ID | The client ID of the Databricks Service Principal. The service principal must be granted the necessary permissions in the Databricks workspace. https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html#step-2-assign-workspace-level-permissions-to-the-databricks-service-principal | 00000000-0000-0000-0000-000000000000 |
| Service Principal Client Secret | The client secret of the Databricks Service Principal. | dose00000000000000000000000000000000 |
| Scopes | The OAuth scopes to request. Defaults to all-apis. | all-apis |
| Token URL | The OAuth 2.0 Token URL for the Databricks workspace. Replace REPLACE-ME in https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token to reflect the workspace URL. | https://dbc-REPLACE-ME.cloud.databricks.com/oidc/v1/token |
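The client-credentials exchange described above can be sketched as follows. The workspace host, client ID, and secret are placeholders; the `/oidc/v1/token` path and `all-apis` scope come from the table above.

```python
# Sketch of the OAuth 2.0 client-credentials token request for a Databricks
# service principal. All credential values below are placeholders.

def m2m_token_request(host: str, client_id: str, client_secret: str):
    """Build the pieces of a client-credentials token request."""
    token_url = f"https://{host}/oidc/v1/token"
    form = {"grant_type": "client_credentials", "scope": "all-apis"}
    basic_auth = (client_id, client_secret)  # sent as HTTP Basic auth
    return token_url, form, basic_auth

token_url, form, basic_auth = m2m_token_request(
    "dbc-REPLACE-ME.cloud.databricks.com",
    "00000000-0000-0000-0000-000000000000",
    "dose-placeholder-secret",
)
# requests.post(token_url, data=form, auth=basic_auth) would return a JSON
# body containing an access_token to use as a Bearer token on API calls.
```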
Data Sources
Select Cluster
Select a Databricks cluster. | key: selectCluster | type: picklist
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
Select Node Type
Select a Databricks node type. | key: selectNodeType | type: picklist
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
Select SQL Warehouse
Select a SQL Warehouse. | key: selectWarehouse | type: picklist
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
Actions
Create Execution Context
Create a Databricks execution context | key: createExecutionContext
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Connection | The Databricks connection to use. | |
| Language | The programming language to use in the execution context. | python |
{
"data": {
"id": "1234-567890-reef123"
}
}
Execute SQL Statement
Run a SQL query in the Databricks workspace. You can choose to wait for the result or asynchronously issue the request and return the statement ID. | key: runSql
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. | |
| SQL Parameters | The parameters to use in the SQL statement. This should represent an array of objects, and each object should have a name and value. For example, [{ "name": "my_name", "value": "the name" }] | |
| SQL Statement | The SQL statement to execute against the Databricks SQL warehouse. | SELECT * FROM table |
| Warehouse ID | The unique identifier for the Databricks SQL warehouse. | 0000000000000000 |
Parameters can be passed to a SQL query using colon notation.
Parameters should be an array of objects; each object should have name and value properties and an optional type property.
For example, if the statement reads SELECT * FROM my_table WHERE name = :my_name AND date = :my_date, the following parameters can be passed:
[
{ "name": "my_name", "value": "the name" },
{ "name": "my_date", "value": "2020-01-01", "type": "DATE" }
]
This action will execute the SQL query and then wait for the results to be available, throwing an error if the query fails.
If the query is expected to run for a long time, use the "Raw Request" action to issue a query, take note of the statement_id that is returned, and then use that statement ID to fetch results at a later time.
Large results may be split into chunks, and the next_chunk_internal_link can be used to fetch the next chunk of results.
See Databricks Statement Execution API for more information.
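The statement and parameter inputs above map onto a request body like the following. This is an illustrative sketch of the Statement Execution API's request shape; the warehouse ID is a placeholder.

```python
# Sketch of the request body for executing a SQL statement with
# colon-notation parameters, as described above.

def sql_statement_payload(warehouse_id: str, statement: str, parameters=None) -> dict:
    """Assemble the JSON body for a statement-execution request."""
    payload = {"warehouse_id": warehouse_id, "statement": statement}
    if parameters:
        payload["parameters"] = parameters
    return payload

payload = sql_statement_payload(
    "0000000000000000",
    "SELECT * FROM my_table WHERE name = :my_name AND date = :my_date",
    [
        {"name": "my_name", "value": "the name"},
        {"name": "my_date", "value": "2020-01-01", "type": "DATE"},
    ],
)
```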
{
"data": {
"statement_id": "01eea4a3-4f6d-1f31-a8d4-dbc7b6b1a5c4",
"status": {
"state": "SUCCEEDED"
},
"manifest": {
"format": "JSON_ARRAY",
"schema": {
"column_count": 2,
"columns": [
{
"name": "id",
"position": 0,
"type_name": "INT",
"type_text": "int"
},
{
"name": "name",
"position": 1,
"type_name": "STRING",
"type_text": "string"
}
]
},
"total_chunk_count": 1,
"chunks": [
{
"chunk_index": 0,
"row_offset": 0,
"row_count": 2,
"byte_count": 28
}
],
"total_row_count": 2,
"total_byte_count": 28,
"is_volume_operation": false
},
"result": {
"chunk_index": 0,
"row_offset": 0,
"row_count": 2,
"data_array": [
[
"1",
"Alice"
],
[
"2",
"Bob"
]
]
}
}
}
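The manifest and result in the sample payload above can be combined into one dictionary per row; a minimal sketch:

```python
# Combine a statement result's manifest schema with its data_array to
# produce one dict per row, mirroring the sample payload above.

def rows_from_result(manifest: dict, result: dict) -> list[dict]:
    """Zip column names from the manifest with each row of data_array."""
    names = [col["name"] for col in manifest["schema"]["columns"]]
    return [dict(zip(names, row)) for row in result["data_array"]]

manifest = {"schema": {"columns": [{"name": "id"}, {"name": "name"}]}}
result = {"data_array": [["1", "Alice"], ["2", "Bob"]]}
rows = rows_from_result(manifest, result)
# rows == [{"id": "1", "name": "Alice"}, {"id": "2", "name": "Bob"}]
```

Note that values in data_array arrive as strings in the JSON_ARRAY format, so numeric columns may need casting based on the manifest's type_name.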
Get Cluster
Get a Databricks cluster by ID | key: getCluster
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Connection | The Databricks connection to use. |
{
"data": {
"cluster_id": "1234-567890-reef123",
"spark_context_id": 4020997813441462000,
"cluster_name": "my-cluster",
"spark_version": "13.3.x-scala2.12",
"aws_attributes": {
"zone_id": "us-west-2c",
"first_on_demand": 1,
"availability": "SPOT_WITH_FALLBACK",
"spot_bid_price_percent": 100,
"ebs_volume_count": 0
},
"node_type_id": "i3.xlarge",
"driver_node_type_id": "i3.xlarge",
"autotermination_minutes": 120,
"enable_elastic_disk": false,
"disk_spec": {
"disk_count": 0
},
"cluster_source": "UI",
"enable_local_disk_encryption": false,
"instance_source": {
"node_type_id": "i3.xlarge"
},
"driver_instance_source": {
"node_type_id": "i3.xlarge"
},
"state": "TERMINATED",
"state_message": "Inactive cluster terminated (inactive for 120 minutes).",
"start_time": 1618263108824,
"terminated_time": 1619746525713,
"last_state_loss_time": 1619739324740,
"num_workers": 30,
"default_tags": {
"Vendor": "Databricks",
"Creator": "someone@example.com",
"ClusterName": "my-cluster",
"ClusterId": "1234-567890-reef123"
},
"creator_user_name": "someone@example.com",
"termination_reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
},
"type": "SUCCESS"
},
"init_scripts_safe_mode": false,
"spec": {
"spark_version": "13.3.x-scala2.12"
}
}
}
Get Command Status
Gets the status of and, if available, the results from a currently executing command. | key: getCommandStatus
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Command ID | The unique identifier of the command whose status will be retrieved. | 00000000000000000000000000000000 |
| Connection | The Databricks connection to use. | |
| Execution Context ID | The ID of the execution context, likely created by the Create Execution Context action. |
{
"data": {
"id": "d4aa2c2f871048e797efdbe635de94be",
"status": "Running",
"result": null
}
}
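Because Run Command returns immediately with a command ID, callers typically invoke this action in a loop until the command reaches a terminal status. A sketch of that polling logic follows; the status values assumed terminal here (Finished, Cancelled, Error) are from the Command Execution API, and the fetch callable stands in for a call to this action.

```python
# Poll an injected fetch_status() callable (standing in for the Get Command
# Status action) until the command reaches an assumed-terminal status.
import time

TERMINAL_STATUSES = {"Finished", "Cancelled", "Error"}

def poll_command(fetch_status, interval_s: float = 1.0, max_attempts: int = 30) -> dict:
    """Return the first response whose status is terminal, or time out."""
    for _ in range(max_attempts):
        response = fetch_status()
        if response.get("status") in TERMINAL_STATUSES:
            return response
        time.sleep(interval_s)
    raise TimeoutError("command did not reach a terminal status in time")
```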
Get Current User
Get the currently authenticated Databricks user or service principal. | key: getCurrentUser
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
{
"data": {
"emails": [
{
"type": "work",
"value": "1d021345-e23c-4f29-84fa-d027a622259e",
"primary": true
}
],
"displayName": "Example Service User",
"schemas": [
"urn:ietf:params:scim:schemas:core:2.0:User",
"urn:ietf:params:scim:schemas:extension:workspace:2.0:User"
],
"name": {
"familyName": "User",
"givenName": "Example Service"
},
"active": true,
"groups": [
{
"display": "admins",
"type": "direct",
"value": "272831250941646",
"$ref": "Groups/272831250941646"
}
],
"id": "7556761598142352",
"userName": "1d021345-e23c-4f29-84fa-d027a622259e"
}
}
Get SQL Warehouse
Get a SQL Warehouse by ID. | key: getWarehouse
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. | |
| Warehouse ID | The unique identifier for the Databricks SQL warehouse. | 0000000000000000 |
{
"data": {
"id": "0000000000000001",
"name": "Starter Warehouse",
"size": "SMALL",
"cluster_size": "Small",
"min_num_clusters": 1,
"max_num_clusters": 1,
"auto_stop_mins": 60,
"auto_resume": true,
"creator_name": "admin@example.com",
"creator_id": 5760885597616698,
"tags": {},
"spot_instance_policy": "COST_OPTIMIZED",
"enable_photon": true,
"enable_serverless_compute": false,
"warehouse_type": "PRO",
"num_clusters": 1,
"num_active_sessions": 0,
"state": "RUNNING",
"jdbc_url": "jdbc:spark://dbc-example-0001.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000001;",
"odbc_params": {
"hostname": "dbc-example-0001.cloud.databricks.com",
"path": "/sql/1.0/warehouses/0000000000000001",
"protocol": "https",
"port": 443
},
"health": {
"status": "HEALTHY"
}
}
}
List Clusters
Return information about all pinned clusters, active clusters, up to 200 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days. | key: listClusters
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
{
"data": [
{
"cluster_id": "1234-567890-reef123",
"spark_context_id": 4020997813441462000,
"cluster_name": "my-cluster",
"spark_version": "13.3.x-scala2.12",
"aws_attributes": {
"zone_id": "us-west-2c",
"first_on_demand": 1,
"availability": "SPOT_WITH_FALLBACK",
"spot_bid_price_percent": 100,
"ebs_volume_count": 0
},
"node_type_id": "i3.xlarge",
"driver_node_type_id": "i3.xlarge",
"autotermination_minutes": 120,
"enable_elastic_disk": false,
"disk_spec": {
"disk_count": 0
},
"cluster_source": "UI",
"enable_local_disk_encryption": false,
"instance_source": {
"node_type_id": "i3.xlarge"
},
"driver_instance_source": {
"node_type_id": "i3.xlarge"
},
"state": "TERMINATED",
"state_message": "Inactive cluster terminated (inactive for 120 minutes).",
"start_time": 1618263108824,
"terminated_time": 1619746525713,
"last_state_loss_time": 1619739324740,
"num_workers": 30,
"default_tags": {
"Vendor": "Databricks",
"Creator": "someone@example.com",
"ClusterName": "my-cluster",
"ClusterId": "1234-567890-reef123"
},
"creator_user_name": "someone@example.com",
"termination_reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
},
"type": "SUCCESS"
},
"init_scripts_safe_mode": false,
"spec": {
"spark_version": "13.3.x-scala2.12"
}
}
]
}
List Node Types
Returns a list of supported Spark node types. These node types can be used to launch a cluster. | key: listNodeTypes
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
{
"data": [
{
"node_type_id": "r4.xlarge",
"memory_mb": 31232,
"num_cores": 4,
"description": "r4.xlarge",
"instance_type_id": "r4.xlarge",
"is_deprecated": false,
"category": "Memory Optimized",
"support_ebs_volumes": true,
"support_cluster_tags": true,
"num_gpus": 0,
"node_instance_type": {
"instance_type_id": "r4.xlarge",
"local_disks": 0,
"local_disk_size_gb": 0,
"instance_family": "EC2 r4 Family vCPUs",
"swap_size": "10g"
},
"is_hidden": false,
"support_port_forwarding": true,
"supports_elastic_disk": true,
"display_order": 0,
"is_io_cache_enabled": false
}
]
}
List SQL Warehouses
List all SQL Warehouses in the Databricks workspace | key: listWarehouses
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. |
{
"data": [
{
"id": "0000000000000001",
"name": "Starter Warehouse",
"size": "SMALL",
"cluster_size": "Small",
"min_num_clusters": 1,
"max_num_clusters": 1,
"auto_stop_mins": 60,
"auto_resume": true,
"creator_name": "admin@example.com",
"creator_id": 5760885597616698,
"tags": {},
"spot_instance_policy": "COST_OPTIMIZED",
"enable_photon": true,
"enable_serverless_compute": false,
"warehouse_type": "PRO",
"num_clusters": 1,
"num_active_sessions": 0,
"state": "RUNNING",
"jdbc_url": "jdbc:spark://dbc-example-0001.cloud.databricks.com:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/0000000000000001;",
"odbc_params": {
"hostname": "dbc-example-0001.cloud.databricks.com",
"path": "/sql/1.0/warehouses/0000000000000001",
"protocol": "https",
"port": 443
},
"health": {
"status": "HEALTHY"
}
}
]
}
Raw Request
Send raw HTTP request to the Databricks API. | key: rawRequest
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. | |
| Data | The HTTP body payload to send to the URL. | {"exampleKey": "Example Data"} |
| File Data | File Data to be sent as a multipart form upload. | [{key: "example.txt", value: "My File Contents"}] |
| File Data File Names | File names to apply to the file data inputs. Keys must match the file data keys above. | |
| Form Data | The Form Data to be sent as a multipart form upload. | [{"key": "Example Key", "value": new Buffer("Hello World")}] |
| Header | A list of headers to send with the request. | User-Agent: curl/7.64.1 |
| Max Retry Count | The maximum number of retries to attempt. Specify 0 for no retries. | 0 |
| Method | The HTTP method to use. | |
| Query Parameter | A list of query parameters to send with the request. This is the portion at the end of the URL similar to ?key1=value1&key2=value2. | |
| Response Type | The type of data you expect in the response. You can request json, text, or binary data. | json |
| Retry On All Errors | If true, retries on all erroneous responses regardless of type. This is helpful when retrying after HTTP 429 or other 3xx or 4xx errors. Otherwise, only retries on HTTP 5xx and network errors. | false |
| Retry Delay (ms) | The delay in milliseconds between retries. This is used when 'Use Exponential Backoff' is disabled. | 0 |
| Timeout | The maximum time, in milliseconds, that the client will wait for a response to the request. | 2000 |
| URL | The URL https://<WORKSPACE-URL>/api/ is prepended to the URL you provide here. For example, if you provide "/2.0/clusters/list", the full URL will be "https://${host}/api/2.0/clusters/list". You can also provide a full URL with protocol (e.g., "https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/scim/v2/Groups") to override the prepended base URL. | /2.0/clusters/list |
| Use Exponential Backoff | Specifies whether to use a pre-defined exponential backoff strategy for retries. When enabled, 'Retry Delay (ms)' is ignored. | false |
{
"data": {
"clusters": [
{
"cluster_id": "1234-567890-reef123",
"cluster_name": "my-cluster",
"state": "RUNNING"
}
]
}
}
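The URL-prepending behavior described in the inputs table can be sketched as:

```python
# Mimic the Raw Request URL behavior described above: relative paths get the
# https://<host>/api prefix; full URLs with a protocol pass through unchanged.

def resolve_raw_request_url(host: str, url: str) -> str:
    if url.startswith(("http://", "https://")):
        return url
    return f"https://{host}/api/{url.lstrip('/')}"
```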
Restart Cluster
Restart a Databricks cluster by ID | key: restartCluster
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Connection | The Databricks connection to use. |
{
"data": "Cluster restarted successfully"
}
Run Command
Run a command in a Databricks execution context | key: runCommand
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Command | The executable code to run in the execution context. | print(0.1 + 0.2) |
| Connection | The Databricks connection to use. | |
| Execution Context ID | The ID of the execution context, likely created by the Create Execution Context action. | |
| Language | The programming language to use in the execution context. | python |
{
"data": {
"id": "d4aa2c2f871048e797efdbe635de94be"
}
}
Start SQL Warehouse
Start a SQL Warehouse. | key: startWarehouse
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. | |
| Warehouse ID | The unique identifier for the Databricks SQL warehouse. | 0000000000000000 |
{
"data": "Warehouse started"
}
Start Terminated Cluster
Start a terminated Databricks cluster by ID | key: startTerminatedCluster
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Connection | The Databricks connection to use. |
{
"data": "Cluster started successfully"
}
Stop SQL Warehouse
Stop a SQL Warehouse. | key: stopWarehouse
| Input | Notes | Example |
|---|---|---|
| Connection | The Databricks connection to use. | |
| Warehouse ID | The unique identifier for the Databricks SQL warehouse. | 0000000000000000 |
{
"data": "Warehouse stopped"
}
Terminate Cluster
Terminate a Databricks cluster by ID | key: terminateCluster
| Input | Notes | Example |
|---|---|---|
| Cluster ID | The unique identifier for the Databricks cluster. | 1234-567890-reef123 |
| Connection | The Databricks connection to use. |
{
"data": "Cluster terminated successfully"
}
Changelog
2026-02-27
Modernized component with global debug support for all actions