For the internal combustion engine to supplant the horse-drawn carriage, more than the car itself was needed. Paved roads and gas stations had to come into the picture before the benefits of the new technology could be reaped.
We believe a query engine based on an update model will be at the center of the paradigm shift towards streaming data. But to unlock the value in this new incremental way to evaluate query results, we need some radical changes in tooling, since today’s gear is heavily oriented towards static data.
For over ten years, Deephaven users have influenced the evolution of a framework and the new kinds of tooling needed to unlock this value. The fundamentals that users have deemed vital include:
A data transport solution. An engine process that transacts in deltas becomes more valuable when it can extend that model beyond its local resources. As such, a wire protocol that can support data structures representing table changes (as opposed to just static dataframes) is critical. Barrage is an open-source extension of Apache Arrow Flight that packages table updates as FlatBuffers metadata types conforming to standard Flight payloads. Barrage can serve as the backbone for systems driven by dynamic data, benefiting from the ubiquity of gRPC.
A distributed deployment model. Servers are also clients, easily connecting to one another point-to-point. Query results are published to be used by other queries with low coupling: teams within an organization can use each other’s queries and models like programmers use libraries. This avoids the typical horse trading necessitated by a centrally administered system like a monolithic database or a shared message broker, doing for data teams what microservices did for development teams. This model naturally empowers collaboration, pipelining, and parallelization.
Multi-language client APIs. Client APIs in popular languages are an obvious requirement. Dynamic consumers can certainly be engineered to simply receive raw Barrage or Kafka streams produced by the engine, but clients that can interoperate bidirectionally open up a far wider range of use cases. Deephaven APIs offer clients not only changes (which would put the burden on client code to apply updates to some client-side data structure), but also a full client-side representation of a dataframe (table) as a dynamic query result. The Deephaven library automatically updates the dataframe in client memory as the engine updates the query results.
Client code can therefore concentrate on AI or business logic based on table contents. Efficiently applying updates is the engine's core competence; clients shouldn't be forced to write code to do it. And, as is the case server-side, clients can send custom code directly to the data, embedding evaluation of user-defined functions, in either Java or Python, in their query expressions.
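To illustrate the burden the client library lifts, the sketch below shows what "applying updates client-side" might entail if a consumer received raw deltas instead of a managed dataframe. The delta shape here (added/removed/modified rows keyed by row key) is a deliberate simplification for illustration, not Barrage's actual wire format.

```python
def apply_delta(rows: dict, delta: dict) -> dict:
    """Apply one update cycle to a row-keyed table: removes, then
    adds, then in-place modifications (simplified delta shape)."""
    for key in delta.get("removed", []):
        rows.pop(key, None)
    rows.update(delta.get("added", {}))
    rows.update(delta.get("modified", {}))
    return rows

rows = {0: {"sym": "AAPL", "px": 190.0}, 1: {"sym": "MSFT", "px": 410.0}}
apply_delta(rows, {
    "removed": [1],
    "added": {2: {"sym": "GOOG", "px": 175.2}},
    "modified": {0: {"sym": "AAPL", "px": 191.5}},
})
```

Even this toy version must worry about ordering (removes before adds) and missing keys; a real implementation also has to handle row shifts, column subsets, and consistency across cycles, which is exactly the machinery the client library encapsulates.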
Contributors are currently evolving clients in Python, Java, C++, and JavaScript. For other gRPC- and Flight-capable languages, developers remain in a position to take advantage of the engine and its update model: the spec is well documented, and add-only use cases are straightforward to support.
A JavaScript client specifically organized for big and ticking data. Given the prevalence of browser-based user experiences, JavaScript APIs need attention beyond that given to other languages. Good experiences involve smart viewports built to expect dynamic data and to lazily load massive tables. JS developers can interact with data at a low level (as with other clients) or choose a React Canvas-based grid widget and other utilities that drive modern user experiences.
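The viewport idea can be sketched in a few lines: subscribe only to the rows currently visible (plus a scroll buffer), rather than to the full ticking table. The function name and buffer size below are illustrative assumptions, not the actual JS API.

```python
def viewport(first_visible: int, page_size: int, total_rows: int,
             buffer: int = 50) -> tuple:
    """Return the (start, end) row range a client should subscribe to:
    the visible page padded by a buffer, clamped to the table bounds."""
    start = max(0, first_visible - buffer)
    end = min(total_rows, first_visible + page_size + buffer)
    return start, end

viewport(10_000, 40, 5_000_000)  # -> (9950, 10090)
```

As the user scrolls, the client re-subscribes to the new window, so memory and bandwidth stay proportional to what is on screen rather than to the size of the table.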
A web UI that is interactive and enabling. Though some users will want to build their own custom experiences, enabling people to work with updating tables without first developing front ends is important. Accordingly, Deephaven makes available a web UI that provides value out of the box. It is a tightly integrated, streaming-tables-first UI organized to empower exploratory model building, iterative app development, and dashboarding use cases.
Integrations with JupyterLab and other established experiences. Since real-time changes are table stakes for many data-science and AI use cases, and since Jupyter experiences offer primary workflows for them, providing infrastructure to bring the two together is important. Deephaven as a Python library, integration of the server within the Jupyter kernel, and portability of the aforementioned JavaScript widgets for use in IPython clients all play a role in supporting users.
You need more than a query engine and an update model: you also need the right tooling around them. The toolkit that Deephaven makes available has been built with interoperability and extensibility front of mind. An ecosystem of Flight applications that inherit table updates via Barrage is a promising end state. Streaming tables for everyone.