Data Processing Pipelines#
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.sampledata import stocks
from holoviews.operation.timeseries import rolling, rolling_outlier_std

hv.extension('bokeh')
opts.defaults(opts.Curve(width=600, framewise=True))
In the previous guides we discovered how to load and declare dynamic, live data and how to transform elements using dim expressions and operations. In this guide we will discover how to combine dynamic data with operations to declare lazy and declarative data processing pipelines. These pipelines can be used for interactive exploration, but they can also drive complex dashboards or even Bokeh apps.
Declaring dynamic data#
We will begin by declaring a function which loads some data. In this case we will just load some stock data from the bokeh sample data, but you could imagine querying this data using a REST interface or some other API, loading a large collection of data from disk, or generating the data from a simulation or data processing job.
def load_symbol(symbol, **kwargs):
    df = pd.DataFrame(getattr(stocks, symbol))
    df['date'] = df.date.astype('datetime64[ns]')
    return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))

stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']
dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)
We begin by displaying our DynamicMap to see what we are dealing with. Recall that a DynamicMap is only evaluated when you request a key, so the load_symbol function is only executed when the DynamicMap is first displayed and whenever we change the widget dropdown:
dmap
It is very common to want to process some data. For this purpose HoloViews provides so-called Operations, which are described in detail in the Transforming Elements guide. Operations are simply parameterized functions, which take HoloViews objects as input, transform them in some way and then return the output.
In combination with Dimensioned Containers such as GridSpace they are a powerful way to explore how the parameters of your transform affect the data. We will start with a simple example. HoloViews provides a rolling function which smooths timeseries data with a rolling window. We will apply this operation with a rolling_window of 30, i.e. roughly a month of our daily timeseries data:
smoothed = rolling(dmap, rolling_window=30)
smoothed
As you can see, the rolling operation applies directly to our DynamicMap, smoothing each Curve before it is displayed. Applying an operation to a DynamicMap keeps the data as a DynamicMap. This means the operation is also applied lazily whenever we display or select a different symbol in the dropdown widget.
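To see what a 30-sample rolling window does to the underlying numbers, here is a minimal pandas sketch on a synthetic daily series. The data, the variable names and the `min_periods=1` choice are assumptions made for illustration, and the windowing defaults of the HoloViews rolling operation may differ:

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption; any numeric time series would do)
dates = pd.date_range('2020-01-01', periods=365, freq='D')
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, size=365).cumsum(), index=dates)

# 30-sample rolling mean; min_periods=1 avoids NaNs at the start of the series
smoothed_prices = prices.rolling(window=30, min_periods=1).mean()

# Average absolute day-to-day change before and after smoothing
raw_jitter = prices.diff().abs().mean()
smooth_jitter = smoothed_prices.diff().abs().mean()
```

The smoothed series has the same length as the input but much smaller day-to-day changes, which is the visual effect of the smoothed curve.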
Dynamically evaluating parameters on operations and transforms with .apply#
The .apply method allows us to automatically build a dynamic pipeline given an object and some operation or function, along with parameter, stream or widget instances passed in as keyword arguments. Internally it will then build a Stream to ensure that the plot is updated whenever one of these changes. To learn more about streams see the Responding to Events guide.
This mechanism allows us to build powerful pipelines by linking parameters on a user-defined class or even an external widget, e.g. here we import an IntSlider widget from Panel:
import panel as pn

slider = pn.widgets.IntSlider(name='rolling_window', start=1, end=100, value=50)
Using the .apply method we could now apply the rolling operation to the DynamicMap and link the slider to the operation’s rolling_window parameter (which also works for simple functions as will be shown below). However, to further demonstrate the features of dim expressions and the .transform method, which we first introduced in the Transforming elements user guide, we will instead apply the rolling mean using the .df namespace accessor on a dim expression:
rolled_dmap = dmap.apply.transform(adj_close=hv.dim('adj_close').df.rolling(slider).mean())

rolled_dmap
The rolled_dmap is another DynamicMap that defines a simple two-step pipeline, which calls the original callback when the symbol changes and reapplies the expression whenever the slider value changes. Since the widget’s value is now linked to the plot via a Stream we can display the widget and watch the plot update:
slider
The power of building pipelines is that different visual components can share the same inputs but compute very different things from that data. The part of the pipeline that is shared is only evaluated once, making it easy to build efficient data processing code. To illustrate this we will also apply the rolling_outlier_std operation, which computes outliers within the rolling_window, and again we will supply the widget value parameter:
outliers = dmap.apply(rolling_outlier_std, rolling_window=slider.param.value)

rolled_dmap * outliers.opts(color='red', marker='triangle')
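As a rough illustration of what detecting outliers within a rolling window can mean, the sketch below flags samples whose distance from a rolling mean exceeds a multiple of the rolling standard deviation. The helper `rolling_outliers`, its `sigma` threshold and the centred window are assumptions chosen for this example; this is not the HoloViews implementation of rolling_outlier_std:

```python
import numpy as np
import pandas as pd

def rolling_outliers(values, window=20, sigma=2.0):
    """Hypothetical helper: flag samples far from a rolling trend line."""
    series = pd.Series(values, dtype=float)
    roll = dict(window=window, center=True, min_periods=1)
    trend = series.rolling(**roll).mean()     # smoothed trend line
    resid = series - trend                    # distance from the trend
    spread = resid.rolling(**roll).std()      # local variability of residuals
    # Comparisons against NaN are False, so undefined spreads are never flagged
    return (resid.abs() > sigma * spread).to_numpy()

# Synthetic signal with one injected spike at index 100
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.05, size=200)
signal[100] += 10.0

flags = rolling_outliers(signal)
```

A larger window makes the trend line and the spread estimate smoother, which is why linking the window size to a slider is a convenient way to explore the result.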
We can chain operations like this indefinitely and attach parameters or explicit streams to each stage. By chaining we can watch our visualization update whenever we change a stream value anywhere in the pipeline and HoloViews will be smart about which parts of the pipeline are recomputed, which allows us to build complex visualizations very quickly.
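The idea that a shared stage is evaluated once while downstream stages rerun can be sketched in plain Python. This is only an analogy built on `functools.lru_cache`, with hypothetical stage functions and synthetic data; it does not describe how HoloViews tracks or caches pipeline stages internally:

```python
from functools import lru_cache
from statistics import mean

load_calls = []  # tracks how often the shared first stage actually runs

@lru_cache(maxsize=None)
def load_stage(symbol):
    """Shared first stage returning synthetic data (hypothetical)."""
    load_calls.append(symbol)
    return tuple(float(i % 7) for i in range(60))

def smooth_stage(symbol, window):
    # Downstream stage: trailing mean over the cached input
    data = load_stage(symbol)
    return [mean(data[max(0, i - window + 1):i + 1]) for i in range(len(data))]

def spread_stage(symbol):
    # A second branch that reuses the same cached input
    data = load_stage(symbol)
    return max(data) - min(data)

smooth_a = smooth_stage('AAPL', 5)
spread_a = spread_stage('AAPL')
smooth_b = smooth_stage('AAPL', 10)  # only the downstream stage is recomputed
```

Changing the window reruns only the smoothing step, while the loader runs once per symbol regardless of how many branches consume its output.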
The .apply method is also not limited to operations. We can just as easily apply a simple Python function to each object in the DynamicMap. Here we define a function to compute the residual between the original dmap and the rolled_dmap:
def residual_fn(overlay):
    # Get first and second Element in overlay
    el1, el2 = overlay.get(0), overlay.get(1)

    # Get x-values and y-values of curves
    xvals = el1.dimension_values(0)
    yvals = el1.dimension_values(1)
    yvals2 = el2.dimension_values(1)

    # Return new Element with subtracted y-values
    # and new label
    return el1.clone((xvals, yvals-yvals2), vdims='Residual')
If we overlay the two DynamicMaps we can then dynamically broadcast this function to each of the overlays, producing a new DynamicMap which responds to both the symbol selector widget and the slider:
residual = (dmap * rolled_dmap).apply(residual_fn)

residual
In later guides we will see how we can combine HoloViews plots and Panel widgets into custom layouts, allowing us to define complex dashboards. For more information on how to deploy Bokeh apps from HoloViews and build dashboards see the Deploying Bokeh Apps and Dashboards guides. To get a quick idea of what this might look like let’s compose all the components we have now built: