The Deephaven team dedicated its January efforts to interoperability. The use of open formats, the integration of popular tools, and the customization of experiences remain fundamental priorities. The Deephaven projects – deephaven-core, barrage, and web-client-ui – evolved in a full embrace of those principles.
- Developed a new (stand-alone) CSV reader – focused on speed and type inference.
- Introduced multi-threaded parallelization of select() and update().
- Delivered an integration with Change Data Capture (CDC).
- Established Docker images of Deephaven paired with popular Python AI libraries.
- Changed backend technology to align with best practices for virtual environment management.
- Improved the Kafka Avro integration.
End-of-February deliveries will center around:
- Infrastructure to support plug-ins for both server-side and JS-client-side extendability.
- Useful Python visualization libraries delivered as Deephaven plug-ins, starting with matplotlib and seaborn.
- Pandas DataFrames rendered in Deephaven as table widgets in the browser UI, complete with interactive experiences.
- A more idiomatic and interoperable Python architecture.
- Structuring to support using Deephaven as a library in Java and Python.
- Use of table maps to support grouped plotting (i.e., plotBy()) and other experiences.
- Performance benchmarking and testing for both incremental updates and batch operations.
- Work in progress on the C++ client API: dynamic data delivered from server to client using Barrage.
All release details can be explored in the pull requests and related GitHub issues itemized here.
Query engine
Parallel select() implementation
The query engine can now be configured to use multiple threads to evaluate independent column decorations (via select() and update()), both across multiple columns and within a single column. #1749, #1855
Python, ML, AI, and Data Science
“Base+” images that pair Deephaven with Python AI modules are now available
PyTorch, TensorFlow, and scikit-learn are important libraries for many Deephaven users, so easing their deployment alongside the core engine was paramount. You can now find the respective Docker images in the main README or Quickstart. #1803
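As a quick sanity check once a container from one of these images is running, the bundled libraries should import directly in the Deephaven Python console (a minimal sketch; exact versions depend on the image chosen):

```python
# Run inside a Deephaven console started from an AI "Base+" image.
import sklearn
import tensorflow as tf
import torch

print(torch.__version__, tf.__version__, sklearn.__version__)
```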
Efficiency improvements to the DH Learn Library
The Deephaven Learn library enables you to easily marry the power of the Deephaven engine – and its streaming tables – to Python libraries. RowSet generation, an important part of its value proposition, was made more efficient by introducing a builder. #1655
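For orientation, here is a sketch of the deephaven.learn pattern whose row gathering this work speeds up. The names used (learn.learn, learn.Input, learn.Output, gather.table_to_numpy_2d) follow the currently documented API and should be read as assumptions for this release; the "model" is a trivial stand-in.

```python
import numpy as np
from deephaven import empty_table, learn
from deephaven.learn import gather

source = empty_table(100).update(["X = ii", "Y = ii * 2"])

def table_to_array(rows, cols):
    # Gather the requested rows/columns into a 2-D NumPy array of doubles.
    return gather.table_to_numpy_2d(rows, cols, np_type=np.double)

def model(features):
    # Stand-in for a real model: sum the feature columns per row.
    return features.sum(axis=1)

def scatter(data, idx):
    # Write one output value back per table row.
    return data[idx]

result = learn.learn(
    table=source,
    model_func=model,
    inputs=[learn.Input(["X", "Y"], table_to_array)],
    outputs=[learn.Output("Sum", scatter, "double")],
    batch_size=100,
)
```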
More Python-idiomatic implementation of Deephaven’s datetimeutils
The dtypes module was refactored into a package with new wrapper classes, proper docstrings, and unit tests. #1812
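A small illustration of how the wrapped types now surface from Python; the attribute names below (int64, double, string) are assumed from the current dtypes package:

```python
from deephaven import dtypes

# Each attribute is a Python wrapper around the corresponding engine type.
print(dtypes.int64)
print(dtypes.double)
print(dtypes.string)
```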
Other fixes
- Solution for the JPY library’s mishandling of long values. #1461
- Bump to the latest Python 3.7. #1906
Data Sources and Sinks
New CSV reader
Deephaven had been relying on the Apache Commons CSV reader, but users found its performance unsatisfactory. After exploring other open-source parsers, we decided to write a new one from scratch: though the alternatives had some interesting capabilities, Deephaven use cases rely on type inference and often need good handling of date-time fields, so a new solution was necessary. The implementation focuses on delivering best-in-class performance and is available in its own repo under an Apache 2.0 license. #1629, #1837
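As a usage sketch from the Python API, assuming the read_csv entry point (the stand-alone reader can also be consumed directly as a Java library); the file path below is a placeholder:

```python
from deephaven import read_csv

# Column types, including date-times, are inferred from the data unless overridden.
trades = read_csv("/data/example_trades.csv")
```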
CDC ingestion support
Deephaven now supports ingesting sources that follow Change Data Capture (CDC) patterns. CDC maps elegantly to Deephaven’s real-time capabilities in general and its incremental update model in particular. We will soon post an example integration with Debezium, using a MySQL source database. #1819
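As a rough sketch of what a Debezium-fed ingestion could look like from Python, assuming the deephaven.stream.kafka.cdc module with its consume and cdc_short_spec helpers; the module and argument names, broker addresses, and topic naming are assumptions until the example is published:

```python
from deephaven.stream.kafka import cdc

# Placeholder Kafka and schema-registry endpoints following common Debezium setups.
kafka_config = {
    "bootstrap.servers": "redpanda:29092",
    "schema.registry.url": "http://redpanda:8081",
}

# Assumed spec: (server name, database, table) as configured in the Debezium connector.
purchases = cdc.consume(kafka_config, cdc.cdc_short_spec("mysql", "shop", "purchases"))
```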
Other fixes
- Support of BigDecimal publishing to Kafka Avro. #1894, #1899
- Improvements for publishing to Kafka related to schema registry and DateTime types. #1877
- Parquet cleanup for flat array sources as it relates to select() of static data. #1793
- Fix to batch export response handling for the pyclient. #1859
Enhancements to the Deephaven docs can be found here. (The Deephaven docs are run as a software project in GitHub; the repository will soon be made public.)