While real progress has been made in streamlining some aspects of big data analytics workflows, there is still too much duct tape keeping it all together, according to Tristan Handy, the founder and CEO of dbt Labs, which today unveiled a slew of enhancements to dbt Cloud at its annual user conference.
Dbt has emerged as one of the most popular tools for preparing data for analytics. Instead of writing raw SQL scripts, data engineers write SQL templated with dbt's syntax to create models that define the data transformations to be performed, while respecting dependencies up and down the stack. At runtime, a dbt user calls one model or a series of models to execute a transformation in a defined, declarative manner. It's DevOps discipline meets data engineering, or DataOps.
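For illustration, a dbt model is simply a SELECT statement saved in a `.sql` file; the `ref()` function declares dependencies on other models, which is how dbt builds the dependency graph it respects at runtime. A minimal sketch (the model and column names here are hypothetical):

```sql
-- models/orders_enriched.sql  (hypothetical model name)
-- dbt materializes this SELECT as a table in the warehouse
{{ config(materialized='table') }}

select
    o.order_id,
    o.order_date,
    c.customer_name
from {{ ref('stg_orders') }} as o        -- ref() declares a dependency on another model
join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
```

Because dependencies are declared with `ref()` rather than hard-coded table names, dbt can run models in the correct order, generate lineage graphs, and document the pipeline automatically.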
The DataOps approach of dbt has resonated with the millions of practitioners who use it, or "analytics engineers," as dbt Labs likes to call them. Coding data transformations in dbt brings other benefits, too, such as fewer lines of code, automated documentation, visual lineage, and pipeline-break notifications.
However, even with these benefits in hand, not all data problems are solved, Handy says.
“The data industry has made real progress towards maturity over the past decade,” Handy says in a press release. “But real problems persist. Siloed data. Lack of trust. Too much ‘duct tape’ in our operational systems.”
Handy elaborated on his thoughts in a blog post last month.
“We can observe from dbt product instrumentation data that a large majority of companies that transition to the cloud adopt at least some elements of a mature analytics workflow, particularly related to data transformations. But what about the other layers of the analytics stack?” he wrote.
There are sticking points in those other layers, he says. For instance, Handy asks whether notebooks and dashboards are well-tested and have provable SLAs. “Do your ingestion pipelines have clear versioning? Do they have processes to roll back schema changes? Do they support multiple environments?”
“Can data consumers request support and declare incidents directly from within the analytical systems they interact with?” he asks. “Do you have on-call rotations? Do you have a well-defined incident management process? The answer to these questions, for almost every company out there, is ‘no,’” he writes.
While it’s unlikely that any one company or product could supply all those capabilities, the folks at dbt Labs are making a go of filling the gaps and ripping off that duct tape. To that end, dbt Labs today announced a series of enhancements to dbt Cloud, its enterprise offering for analytics professionals. The company says these enhancements embody its “One dbt” vision of creating a single dbt experience across multiple data personas and data platforms, as part of what it calls the analytics development lifecycle, or ADLC.
The company today unveiled several enhancements to dbt Cloud that it says will help customers build better data pipelines. That includes dbt Copilot that will automate repetitive manual work around things like creating tests, writing documentation, and creating semantic models. Dbt Labs is also building a chatbot that lets users ask questions of their data using natural language.
Dbt Labs is also building on the data mesh capability it launched at last year’s Coalesce, which enabled cross-project dbt references, with a new cross-platform mesh. The new offering uses Apache Iceberg to create portable data tables that can be read across different platforms. Benefits include the ability to centrally define and maintain data governance standards, to see end-to-end lineage across various data platforms, and to find, reference, and re-use existing data assets instead of rebuilding them, dbt Labs says.
Dbt Cloud customers are also getting a new low-code, drag-and-drop environment for building and exploring dbt models. The company says this new environment (which is currently in beta) will allow a new group of less-technical users to develop analytics code themselves.
It will be easier to catch bugs in dbt code before they go into production using the new Advanced CI (continuous integration) offering. Dbt Labs says Advanced CI will make it easier for users to compare code changes as part of the CI process and catch any unexpected behavior before the new code is merged into production. “This improves code quality and helps organizations optimize compute spend by only materializing correct models,” the company says.
Other improvements dbt Labs is making to dbt Cloud include:
- Data health tiles that can be embedded into any downstream app to provide real-time info about their data, including freshness and quality, directly in tools where users work;
- Auto-exposures with Tableau, a new feature that automatically incorporates Tableau dashboards into dbt lineage, boosting data freshness;
- Semantic layer integration with Power BI;
- New supported adapters, including Teradata (preview) and AWS Athena (GA).
Related Items:
AI Impacting Data Engineering Faster Than Expected, dbt Labs’ Handy Says
Tristan Handy’s Audacious Vision of the Future of Data Engineering
Semantic Layer Belongs in Middleware, and dbt Wants to Deliver It