At the time of writing this blog post, I’m a mere one week away from the end of my summer internship on the Exploratory Data Analysis (EDA) team here at Databricks. I can’t believe the summer has flown by this quickly; it feels like just yesterday that I was cloning my team’s repo and pestering my onboarding buddies for help! Over the course of 12 weeks, I completed three project phases with one underlying theme: improving the user experience for interacting with images in the Databricks notebook.
The Databricks Notebook
If you’ve ever interacted with data through code, you’ve probably used a notebook. Notebooks are a type of code editor for Python, SQL, Scala, and R, commonplace in data science and machine learning as a means to extract and use data. As a Data + AI company, Databricks provides customers with its own notebook deeply integrated with the platform.
What is the Databricks Notebook?
The Databricks notebook offers the features you’d expect from any notebook, such as a code editor and menu items, along with the Databricks Assistant. But what’s special about the Databricks notebook is that it’s extremely well-integrated with the rest of Databricks’ products: Jobs, Delta Live Tables (DLTs), Generative AI (GenAI) pretraining and fine-tuning, and more. Customers use the Databricks notebook to access the entire suite of Databricks’ offerings, so creating a seamless notebook experience (which is what the EDA team focuses on) is an important part of how Databricks unlocks the power of data for its customers.
What problem did my intern project tackle?
The Databricks notebook is a mature product, but as with any product, there are always things to improve! Turning data into insight is as much about telling a story as it is about crunching the numbers, and images are a key part of that. Further, as GenAI expands into different domains, such as vision and image generation, training and fine-tuning models with images and videos is becoming increasingly common. Databricks recently released Shutterstock ImageAI, which generates high-quality custom images based on specific business needs.
Researchers and engineers across the world use the Databricks notebook every day for countless applications that involve multimedia files. However, until recently, working with multimedia files in the notebook was cumbersome. For instance, customers had to figure out roundabout ways to embed images in notebook markdown cells, and they couldn’t even open images from the file browser.
My summer intern project focused on improving the user experience for interacting with images in the Databricks notebook. Below are the key features that I rolled out this summer.
Key Features
Embedding images in notebook markdown
We’ve added the ability to embed images in markdown cells in a more user-friendly, standard markdown format. Now, customers can embed images with both relative paths and absolute paths (/Workspace for workspace files, and /Volumes for volumes files). This gives customers more flexibility in introducing images into their notebooks, whether it be for data visualization, image training, or feline comic relief.
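As a rough illustration of what this looks like in practice, here is a minimal Python sketch that builds a standard markdown image tag of the form ![alt text](path) for either a relative path or an absolute path under /Workspace or /Volumes. This is not the notebook's actual implementation: the helper name markdown_image and the example file paths are hypothetical, and the sketch assumes paths without spaces.

```python
# Roots accepted for absolute paths, as described in the post.
ABSOLUTE_ROOTS = ("/Workspace/", "/Volumes/")


def markdown_image(alt_text: str, path: str) -> str:
    """Return a standard markdown image tag for the given path.

    Hypothetical helper for illustration only. Relative paths are passed
    through unchanged; absolute paths must start with /Workspace/ or /Volumes/.
    """
    if path.startswith("/") and not path.startswith(ABSOLUTE_ROOTS):
        raise ValueError(
            f"Absolute path must start with one of {ABSOLUTE_ROOTS}: {path}"
        )
    return f"![{alt_text}]({path})"


# Relative path (example file name is made up).
print(markdown_image("Sales chart", "images/sales.png"))

# Absolute path to a volumes file (example location is made up).
print(markdown_image("Cat", "/Volumes/main/default/media/cat.png"))
```

The resulting string is what you would type (or paste) into a markdown cell to display the image.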
Drag & drop images into notebook cells
A natural action for customers is dragging and dropping images into the notebook. Previously, dragging and dropping an image into the Databricks notebook resulted in opening the image in a new browser tab, which interrupted the customer’s flow and prevented customers from easily using images.
Now, dragging and dropping an image into a notebook markdown cell automatically uploads the image to the workspace file system and embeds it in the cell!
Thanks to Databricks’ fast pace and rapid product iteration, I was able to fully roll out most of my project’s key features to production by the end of my internship! Having this much customer impact as an intern never crossed my mind before this summer, and I’m very grateful to have had the opportunity to clearly influence our product in the span of just three months.
My Internship Experience
My intern project wasn’t the only thing that I was able to do this summer! I had the opportunity to attend the 2024 Data + AI Summit (DAIS), work on a cool hackathon project with the Databricks Assistant, visit the new and growing Databricks office in Seattle, and go on many, many delicious meal excursions with my intern class.
This summer, I had the opportunity to meet, learn from, and work with many of the industry leaders in the Data + AI space. Moreover, interacting with a large and energetic intern class made me more excited about new technologies than ever before. I’m not hesitant to say that I’ve truly made lifelong friends during my time here.
I’d like to give a special thanks to my mentor Richard Fung, my manager Neha Sharma, our Workspace org director Ted Tomlinson, and the rest of the EDA team for their mentorship. Every one of my team members was so impressively intelligent yet modest—sitting through every one of my minor feature demos and giving extensive feedback to help make my project features better. They’ve taught me invaluable skills that I’ll carry for the rest of my career.
If you’re passionate about building interesting and impactful products, then I recommend that you apply to work at Databricks! You can check out current job opportunities on the Databricks Careers page.