Real-time windowed and cumulative calculations made easy | Deephaven

Real-time windowed and cumulative calculations made easy | Deephaven


Each Deephaven Core release brings new features to get excited about. One of this year’s biggest and most exciting is the update_by table operation. Its introduction to the table API has transformed what used to be difficult and tedious doable in a single function call.

Efficiently perform windowed or cumulative calculations, compute inter-row statistics, aggregate, and group data all with one function.

update_by adds new capability for computing inter-row statistics. This enables efficient and expressive cross-row calculations, such as the differences between rows, or an exponential moving average. It can perform two types of operations: cumulative and rolling.

  • Cumulative operations are applied to an entire table, such as a cumulative sum.
  • Rolling operations are applied to a subset of rows in a table, such as a rolling sum over the previous ten rows or the previous ten seconds.

update_by is highly efficient and will only recompute the absolute minimum amount of data required for each calculation. It retains the current table rows and simply appends the aggregating columns. It’s also parallelized – concurrency can be achieved even in Python, which is typically limited by the GIL.

This operation’s name is derived from the combination of two of Deephaven’s most commonly used operations – update and agg_by.

  • It’s like update because it updates table data, creating new columns with specified calculations.
  • It’s like agg_by because the calculations implemented by update_by are done on grouped (aggregated) sets of data. As with other aggregations, they can be done on a per-group basis, or applied to an entire table.

Examples are more illustrative of capabilities than a long list. So, here are a few that show what update_by can do.

Rolling averages

The following example calculates row-based and time-based rolling averages of a column with numeric data. No grouping columns are given, so the calculations are applied to the whole table. The row-based moving average calculates an average given the previous row, current row, and next row. The same is true for the time-based average, which includes all rows within plus or minus 1.5 seconds from the current row.

from deephaven.updateby import rolling_avg_tick, rolling_avg_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "X = i"])

row_based_avg = rolling_avg_tick(cols=["RowAvgX = X"], rev_ticks=2, fwd_ticks=1)
time_based_avg = rolling_avg_time(ts_col="Timestamp", cols=["TimeAvgX = X"], rev_time="PT1.5S", fwd_time="PT1.5S")

result = source.update_by(ops=[row_based_avg, time_based_avg], by=[])

EMAs grouped by a symbol

The following example calculates row and time-based EMAs of the same data as the example above. This time around, the Symbol column is given as the grouping column. It contains alternating symbols Tau and Epsilon, so the EMAs are calculated on a per-symbol basis.

from deephaven.updateby import ema_tick, ema_time
from deephaven.time import parse_instant
from deephaven import empty_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = empty_table(25).update(["Timestamp = base_time + i * SECOND", "Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_based_ema = ema_tick(decay_ticks=5, cols=["RowEmaX = X"])
time_based_ema = ema_time(ts_col="Timestamp", decay_time="PT5S", cols=["TimeEmaX = X"])

result = source.update_by(ops=[row_based_ema, time_based_ema], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])

Rolling group in real-time

The following example calculates a rolling group on real-time data.

from deephaven.updateby import rolling_group_tick, rolling_group_time
from deephaven.time import parse_instant
from deephaven import time_table

base_time = parse_instant("2023-04-01T10:00:00 ET")

source = time_table("PT0.5S").update(["Symbol = (i % 2 == 0) ? `Tau` : `Epsilon`", "X = i"])

row_group_op = rolling_group_tick(cols=["RowGroupX = X"], rev_ticks=4, fwd_ticks=5)
time_group_op = rolling_group_time(ts_col="Timestamp", cols=["TimeGroupX = X"], rev_time="PT5S", fwd_time="PT3S")

result = source.update_by(ops=[row_group_op, time_group_op], by=["Symbol"])

result_tau = result.where(["Symbol = `Tau`"])
result_epsilon = result.where(["Symbol = `Epsilon`"])

A lot. Here’s the full list:

Have any questions for us? Reach out on Slack!



Source link
lol

By stp2y

Leave a Reply

Your email address will not be published. Required fields are marked *

No widgets found. Go to Widget page and add the widget in Offcanvas Sidebar Widget Area.