Deephaven Community release 0.28 focused on client-API and server-side Python features. We expect it to be the penultimate release before 1.0.
Independent of the release, contributors continue to spend considerable time on Parquet and Iceberg integrations, as well as flexibility and ease-of-use upgrades related to real-time data ingestion. Full release notes can be found on GitHub.
Over the last few releases, the team has modernized Deephaven’s time library. At its core, Deephaven’s engine relies on standard Java date-time types: Instant, LocalDate, ZoneId, ZonedDateTime, Period, Duration, etc. With this release, the distinction between Java date-times and Deephaven Python date-times is much clearer. Additionally, users can now easily convert to NumPy, Pandas, or Python equivalents, offering full coverage of familiar patterns.
The team will continue to document the two fundamental approaches for working with time types:
(1) Keeping the processing and calculations on the Python side by using familiar Python libraries’ time types and deephaven.time module.
(2) Pushing (or keeping) data within Java, and relying on the Query Language time APIs, inheriting both extreme flexibility and heightened performance.
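As a minimal sketch of approach (1), the snippet below does date-time work entirely on the Python side with standard-library types, which deephaven.time can convert to and from. The table operations and deephaven.time calls themselves are omitted; this shows only the plain-Python patterns involved.

```python
# Plain-Python time handling (approach 1): standard-library types that
# correspond to the Java types the engine uses under the hood.
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# A timezone-aware "now", analogous to a Java Instant.
now_utc = datetime.now(timezone.utc)

# Re-express the same instant in another zone (cf. Java ZoneId / ZonedDateTime).
now_ny = now_utc.astimezone(ZoneInfo("America/New_York"))

# A fixed-length span of time (cf. Java Duration).
five_minutes = timedelta(minutes=5)
deadline = now_utc + five_minutes

assert now_ny == now_utc                      # same instant, different zone
assert deadline - now_utc == five_minutes
```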
Stay tuned for more communication about best practices and example use cases.
The multi_join operation is now available in the Python API.
For use cases where you need to string together a series of relational joins, multi_join performs parallel natural_joins based on the distinct keys presented. In many cases, this meaningfully accelerates the delivery of results, and, of course, this applies to real-time tables as well. The syntax below gives you a taste.
from deephaven.table import multi_join

# foo, bar, and baz are existing tables that share the key column C1
t1 = foo
t2 = bar
t3 = baz

result = multi_join(input=[t1, t2, t3], on="C1").table()
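To illustrate the shape of the result, here is a rough sketch of multi_join’s semantics in pandas rather than Deephaven: each distinct key across all inputs yields one output row, with each table’s value columns attached. The table and column names here are made up for the example.

```python
# Approximating multi_join's output shape with successive outer merges.
from functools import reduce
import pandas as pd

t1 = pd.DataFrame({"C1": ["a", "b"], "V1": [1, 2]})
t2 = pd.DataFrame({"C1": ["b", "c"], "V2": [3, 4]})
t3 = pd.DataFrame({"C1": ["a", "c"], "V3": [5, 6]})

# One output row per distinct key; missing matches become NaN.
result = reduce(lambda l, r: l.merge(r, on="C1", how="outer"), [t1, t2, t3])

assert sorted(result["C1"]) == ["a", "b", "c"]
assert set(result.columns) == {"C1", "V1", "V2", "V3"}
```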
The C++ API, its docs, and its build mechanics have received significant attention. C++ client TableHandles now support time-series joins via aj() and raj(), efficient projections for sparse tables via LazyUpdate(), and filtering one table based on another using WhereIn().
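For readers unfamiliar with where-in-style filtering, the sketch below shows the idea in pandas (not the C++ client): keep only the rows of one table whose key appears in another table. The table and column names are illustrative.

```python
# where_in-style filtering: restrict one table by the keys of another.
import pandas as pd

trades = pd.DataFrame({"Sym": ["AAPL", "MSFT", "TSLA"], "Px": [1.0, 2.0, 3.0]})
watchlist = pd.DataFrame({"Sym": ["AAPL", "TSLA"]})

# Keep only trades whose symbol is present in the watchlist.
filtered = trades[trades["Sym"].isin(watchlist["Sym"])]

assert list(filtered["Sym"]) == ["AAPL", "TSLA"]
```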
The C++ client also supports InputTables, a powerful structure for bringing client-side static or dynamic sources to the server-side worker. Users employ InputTables to support client UIs, one-time ports of DataFrames, and low-frequency updates to tables from the client. InputTables support keyed or append-only inputs. This is one of many alternatives, including Arrow DoPuts and other native, non-intermediated connections between the Deephaven engine and external sources.
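The difference between the two input modes can be sketched conceptually in plain Python (this is not the client API): append-only inputs accumulate every row, while keyed inputs upsert, so a new row with an existing key replaces the old one.

```python
# Conceptual sketch of InputTable semantics: keyed (upsert) vs. append-only.
append_only = []   # append-only table: every row is kept
keyed = {}         # keyed table: one row per key, later rows replace earlier

def add_row(row, key=None):
    if key is None:
        append_only.append(row)      # append-only: rows accumulate
    else:
        keyed[row[key]] = row        # keyed: upsert on the key column

add_row({"Sym": "AAPL", "Px": 1.0}, key="Sym")
add_row({"Sym": "AAPL", "Px": 2.0}, key="Sym")   # replaces the first AAPL row
add_row({"Sym": "AAPL", "Px": 1.0})
add_row({"Sym": "AAPL", "Px": 2.0})              # both appended rows kept

assert keyed["AAPL"]["Px"] == 2.0 and len(keyed) == 1
assert len(append_only) == 2
```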
Deephaven’s R Client API empowers R developers to inherit the server-side power of Deephaven via R interfaces. Many use Deephaven as a real-time interface, with periodic snapshots communicating back and forth with their R infrastructure. They also use Deephaven as a partner technology for R, pairing R’s compelling math libraries with the Python interfaces and high performance Deephaven provides.
Since the R Client is an abstraction on top of the gRPC C++ client API noted above, the R client, too, inherits the new operations noted above.
Another nice capability: R clients can now be created from other libraries.
Deephaven’s Kafka ingestor now supports protobuf, the cross-platform format used to serialize structured data. Its protobuf_spec adds read support for Kafka Protobuf (https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) for Deephaven tables. Schema changes are handled automatically.
Advanced users can further configure options via Deephaven’s ProtobufConsumeOptions class.
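To give a feel for what reading Protobuf into a table involves, the sketch below (pure Python) flattens a nested, Protobuf-style message into flat column names by joining nested field paths with underscores. The message shape and naming scheme are illustrative only, not Deephaven’s exact protobuf_spec behavior.

```python
# Flatten a nested message into column-name/value pairs, the general shape
# of mapping structured records onto table columns.
def flatten(msg, prefix=""):
    cols = {}
    for field, value in msg.items():
        name = f"{prefix}_{field}" if prefix else field
        if isinstance(value, dict):
            cols.update(flatten(value, name))   # recurse into nested messages
        else:
            cols[name] = value
    return cols

event = {"id": 7, "price": {"bid": 99.5, "ask": 100.5}}
assert flatten(event) == {"id": 7, "price_bid": 99.5, "price_ask": 100.5}
```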
We hope Deephaven’s Community docs provide guidance and answer all your questions, but please communicate with the contributing team via our Slack if you need more help or have other questions to discuss.