Data Processing Pipelines#
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.sampledata import stocks
from holoviews.operation.timeseries import rolling, rolling_outlier_std
hv.extension('bokeh')
opts.defaults(opts.Curve(width=600, framewise=True))
In the previous guides we discovered how to load and declare dynamic, live data and how to transform elements using dim expressions and operations. In this guide we will discover how to combine dynamic data with operations to declare lazy and declarative data processing pipelines, which can be used for interactive exploration but can also drive complex dashboards or even Bokeh apps.
Declaring dynamic data#
We will begin by declaring a function which loads some data. In this case we will just load some stock data from the Bokeh sample data, but you could imagine querying this data through a REST interface or some other API, loading a large collection of data from disk, or generating the data from a simulation or data processing job.
def load_symbol(symbol, **kwargs):
    df = pd.DataFrame(getattr(stocks, symbol))
    df['date'] = df.date.astype('datetime64[ns]')
    return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))
stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']
dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)
We begin by displaying our DynamicMap to see what we are dealing with. Recall that a DynamicMap is only evaluated when you request a key, so the load_symbol function is only executed when first displaying the DynamicMap and whenever we change the widget dropdown:
dmap
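The lazy, key-driven evaluation described above can be illustrated without HoloViews. The following is a minimal plain-Python sketch, not how DynamicMap is implemented: the LazyMap class, its calls counter and the load_prices placeholder are hypothetical names introduced only for this illustration, and the unbounded cache is a simplification.

```python
class LazyMap:
    """Toy stand-in for a lazily evaluated, keyed container (hypothetical)."""

    def __init__(self, callback):
        self.callback = callback
        self.cache = {}
        self.calls = 0

    def __getitem__(self, key):
        # Run the callback only the first time a key is requested.
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.callback(key)
        return self.cache[key]


def load_prices(symbol):
    # Placeholder loader; a real one might read a file or query an API.
    return [len(symbol) + i for i in range(3)]


lazy = LazyMap(load_prices)
first = lazy["AAPL"]   # triggers load_prices
again = lazy["AAPL"]   # served from the cache, no second call
```

Nothing is computed when lazy is created; work only happens when a key is requested, which mirrors why load_symbol runs on display and on each dropdown change.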
Processing data#
It is very common to want to process some data. For this purpose HoloViews provides so-called Operations, which are described in detail in the Transforming Elements user guide. Operations are simply parameterized functions, which take HoloViews objects as input, transform them in some way and then return the output.
In combination with Dimensioned Containers such as HoloMap and GridSpace they are a powerful way to explore how the parameters of your transform affect the data. We will start with a simple example. HoloViews provides a rolling function which smooths timeseries data with a rolling window. We will apply this operation with a rolling_window of 30, i.e. roughly a month of our daily timeseries data:
smoothed = rolling(dmap, rolling_window=30)
smoothed
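To make the idea of a rolling window concrete, here is a minimal plain-Python sketch of a trailing moving average. It is an illustration only and makes no claim about how the rolling operation handles window alignment or incomplete windows; the rolling_mean function and its use of None for incomplete windows are assumptions made for this example.

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window yield None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        result.append(running / window if i >= window - 1 else None)
    return result


smoothed_values = rolling_mean([1, 2, 3, 4, 5], window=3)
# smoothed_values is [None, None, 2.0, 3.0, 4.0]
```

A larger window averages over more samples, so short-lived fluctuations are suppressed more strongly at the cost of a slower response to genuine changes.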
As you can see, the rolling operation applies directly to our DynamicMap, smoothing each Curve before it is displayed. Applying an operation to a DynamicMap keeps the data as a DynamicMap. This means the operation is also applied lazily whenever we display or select a different symbol in the dropdown widget.
Dynamically evaluating parameters on operations and transforms with .apply#
The .apply method allows us to automatically build a dynamic pipeline given an object and some operation or function, along with parameter, stream or widget instances passed in as keyword arguments. Internally it will then build a Stream to ensure that whenever one of these changes the plot is updated. To learn more about streams see the Responding to Events user guide.
This mechanism allows us to build powerful pipelines by linking parameters on a user-defined class or even an external widget, e.g. here we import an IntSlider widget from panel:
import panel as pn
slider = pn.widgets.IntSlider(name='rolling_window', start=1, end=100, value=50)
Using the .apply method we could now apply the rolling operation to the DynamicMap and link the slider to the operation’s rolling_window parameter (which also works for simple functions as will be shown below). However, to further demonstrate the features of dim expressions and the .transform method, which we first introduced in the Transforming Elements user guide, we will instead apply the rolling mean using the .df namespace accessor on a dim expression:
rolled_dmap = dmap.apply.transform(adj_close=hv.dim('adj_close').df.rolling(slider).mean())
rolled_dmap
The rolled_dmap is another DynamicMap that defines a simple two-step pipeline, which calls the original callback when the symbol changes and reapplies the expression whenever the slider value changes. Since the widget’s value is now linked to the plot via a Stream we can display the widget and watch the plot update:
slider
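The two-step behaviour described above, where the loader reruns only when the symbol changes while the transform reruns on every slider change, can be sketched in plain Python. This is an illustrative model, not HoloViews internals: the Pipeline class, its load_calls counter, and the load and trailing_mean helpers are all hypothetical.

```python
class Pipeline:
    """Toy two-stage pipeline: load by symbol, then smooth by window (hypothetical)."""

    def __init__(self, load, transform):
        self.load = load
        self.transform = transform
        self.load_calls = 0
        self._symbol = None
        self._data = None

    def view(self, symbol, window):
        # Re-run the expensive load stage only when the symbol changes.
        if symbol != self._symbol:
            self._data = self.load(symbol)
            self._symbol = symbol
            self.load_calls += 1
        # The cheap transform stage runs on every request.
        return self.transform(self._data, window)


def load(symbol):
    # Placeholder data source standing in for a slow query.
    return [float(len(symbol) * i) for i in range(6)]


def trailing_mean(data, window):
    return [sum(data[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(data))]


pipeline = Pipeline(load, trailing_mean)
a = pipeline.view("IBM", window=2)   # loads, then smooths
b = pipeline.view("IBM", window=3)   # slider moved: smooths only
c = pipeline.view("MSFT", window=3)  # symbol changed: loads again
```

After these three requests the loader has run twice, once per distinct symbol, even though three views were produced.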
The power of building pipelines is that different visual components can share the same inputs but compute very different things from that data. The part of the pipeline that is shared is only evaluated once, making it easy to build efficient data processing code. To illustrate this we will also apply the rolling_outlier_std operation, which computes outliers within the rolling_window, and again we will supply the widget value:
outliers = dmap.apply(rolling_outlier_std, rolling_window=slider.param.value)
rolled_dmap * outliers.opts(color='red', marker='triangle')
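As a rough illustration of what detecting outliers within a rolling window can mean, the sketch below flags points that lie more than sigma standard deviations from the mean of their trailing window. It is a simplified stand-in written for this guide, not the algorithm used by rolling_outlier_std; the rolling_outliers function, its sigma default and the synthetic series are assumptions.

```python
import statistics


def rolling_outliers(values, window, sigma=2.0):
    """Return indices whose deviation from the trailing-window mean exceeds sigma * std."""
    flagged = []
    for i in range(window - 1, len(values)):
        # The window ends at, and includes, the point being tested.
        chunk = values[i - window + 1:i + 1]
        mean = statistics.mean(chunk)
        std = statistics.pstdev(chunk)
        if std > 0 and abs(values[i] - mean) > sigma * std:
            flagged.append(i)
    return flagged


# Synthetic series: flat at 10.0 with a single spike at index 9.
series = [10.0] * 9 + [50.0] + [10.0] * 5
spikes = rolling_outliers(series, window=10, sigma=2.0)
# spikes is [9]
```

Because the threshold scales with the local standard deviation, the same sigma adapts to calm and volatile stretches of a series, and changing the window changes which points count as unusual.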
We can chain operations like this indefinitely and attach parameters or explicit streams to each stage. By chaining we can watch our visualization update whenever we change a stream value anywhere in the pipeline and HoloViews will be smart about which parts of the pipeline are recomputed, which allows us to build complex visualizations very quickly.
The .apply method is also not limited to operations. We can just as easily apply a simple Python function to each object in the DynamicMap. Here we define a function to compute the residual between the original dmap and the rolled_dmap.
def residual_fn(overlay):
    # Get first and second Element in overlay
    el1, el2 = overlay.get(0), overlay.get(1)
    # Get x-values and y-values of curves
    xvals = el1.dimension_values(0)
    yvals = el1.dimension_values(1)
    yvals2 = el2.dimension_values(1)
    # Return new Element with subtracted y-values
    # and new label
    return el1.clone((xvals, yvals-yvals2),
                     vdims='Residual')
If we overlay the two DynamicMaps we can then dynamically broadcast this function to each of the overlays, producing a new DynamicMap which responds to both the symbol selector widget and the slider:
residual = (dmap * rolled_dmap).apply(residual_fn)
residual
In later guides we will see how we can combine HoloViews plots and Panel widgets into custom layouts, allowing us to define complex dashboards. For more information on how to deploy Bokeh apps from HoloViews and build dashboards see the Deploying Bokeh Apps and Dashboards guides. To get a quick idea of what this might look like, let’s compose all the components we have now built: