Transforming Elements#

import param
import numpy as np
import holoviews as hv
from holoviews import opts

hv.extension('bokeh', 'matplotlib')

HoloViews objects provide a convenient way of wrapping your data along with some metadata for exploration and visualization. For the simplest visualizations, you can declare a small collection of elements which can then be composed or placed in an appropriate container. As soon as the task becomes more complex, it is natural to write functions that output HoloViews objects.

In this user guide, we will introduce two related concepts for expressing transforms of data: first we will cover dim transforms, which express simple transformations of the data, and then Operation classes, which express more complex transformations. Operations provide a consistent structure for such code, making it possible to write general functions that can process HoloViews objects. This enables powerful new ways of specifying HoloViews objects computed from existing data, allowing the construction of flexible data processing pipelines. Examples of such operations are histogram, rolling, datashade or decimate, which apply some computation to certain types of Element and return a new Element with the transformed data.

In this user guide we will discover how transforms and operations work, how to control their parameters and how to chain them. The Data Processing Pipelines guide extends what we will have learned to demonstrate how operations can be applied lazily by using the dynamic flag, letting us define deferred processing pipelines that can drive highly complex visualizations and dashboards.

Transforms#

A transform is expressed using a dim expression, which we originally introduced in the context of the Style Mapping user guide. It allows expressing some deferred computation on a HoloViews Element. This can be a powerful way to transform some data quickly and easily. Let us start by declaring a Dataset with a single dimension x:

ds = hv.Dataset(np.linspace(0, np.pi), 'x')
ds
:Dataset   [x]

The Dataset x values consist of an array of monotonically increasing values from 0 to np.pi. We can now define a transform which takes these values and transforms them:

expr = np.sin(hv.dim('x')*10+5)
expr
np.sin((dim('x')*10)+5)

This expression takes these values, multiplies them by 10, adds 5 and then calculates the sine. Using the .transform method we can now apply this expression to the Dataset and assign the resulting values to a newly created y dimension by supplying it as a keyword (in the same way we could override the x dimension):

transformed = ds.transform(y=expr)
transformed
:Dataset   [x]   (y)

We can see the result of this by casting it to a Curve:

hv.Curve(transformed)
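
Because dim expressions are composable, operators and NumPy-style array methods such as .min() and .max() can be chained freely. As a minimal sketch, a min-max normalization of the x values of the ds declared above could be written as:

# Rescale x into the range [0, 1] by combining operators with array methods
norm_expr = (hv.dim('x') - hv.dim('x').min()) / (hv.dim('x').max() - hv.dim('x').min())
hv.Curve(ds.transform(y=norm_expr))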

This allows almost any mathematical transformation to be expressed and applied to a Dataset in a deferred way. A regular dim expression supports all the standard mathematical operators and NumPy array methods. However, if we want to use methods which exist only on specific datatypes we can invoke them using .df or .xr, which let us make pandas DataFrame and xarray API (method and accessor) calls respectively. Let us, for example, load an xarray Dataset, which has a number of custom methods for performing complex computations on the data, e.g. the quantile method:

import xarray as xr

air_temp = xr.tutorial.load_dataset('air_temperature')
print(air_temp.quantile.__doc__)
Compute the qth quantile of the data along the specified dimension.

        Returns the qth quantiles(s) of the array elements for each variable
        in the Dataset.

        Parameters
        ----------
        q : float or array-like of float
            Quantile to compute, which must be between 0 and 1 inclusive.
        dim : str or Iterable of Hashable, optional
            Dimension(s) over which to apply quantile.
        method : str, default: "linear"
            This optional parameter specifies the interpolation method to use when the
            desired quantile lies between two data points. The options sorted by their R
            type as summarized in the H&F paper [1]_ are:

                1. "inverted_cdf"
                2. "averaged_inverted_cdf"
                3. "closest_observation"
                4. "interpolated_inverted_cdf"
                5. "hazen"
                6. "weibull"
                7. "linear"  (default)
                8. "median_unbiased"
                9. "normal_unbiased"

            The first three methods are discontiuous.  The following discontinuous
            variations of the default "linear" (7.) option are also available:

                * "lower"
                * "higher"
                * "midpoint"
                * "nearest"

            See :py:func:`numpy.quantile` or [1]_ for details. The "method" argument
            was previously called "interpolation", renamed in accordance with numpy
            version 1.22.0.

        keep_attrs : bool, optional
            If True, the dataset's attributes (`attrs`) will be copied from
            the original object to the new one.  If False (default), the new
            object will be returned without attributes.
        numeric_only : bool, optional
            If True, only apply ``func`` to variables with a numeric dtype.
        skipna : bool, optional
            If True, skip missing values (as marked by NaN). By default, only
            skips missing values for float dtypes; other dtypes either do not
            have a sentinel missing value (int) or skipna=True has not been
            implemented (object, datetime64 or timedelta64).

        Returns
        -------
        quantiles : Dataset
            If `q` is a single quantile, then the result is a scalar for each
            variable in data_vars. If multiple percentiles are given, first
            axis of the result corresponds to the quantile and a quantile
            dimension is added to the return Dataset. The other dimensions are
            the dimensions that remain after the reduction of the array.

        See Also
        --------
        numpy.nanquantile, numpy.quantile, pandas.Series.quantile, DataArray.quantile

        Examples
        --------
        >>> ds = xr.Dataset(
        ...     {"a": (("x", "y"), [[0.7, 4.2, 9.4, 1.5], [6.5, 7.3, 2.6, 1.9]])},
        ...     coords={"x": [7, 9], "y": [1, 1.5, 2, 2.5]},
        ... )
        >>> ds.quantile(0)  # or ds.quantile(0, dim=...)
        <xarray.Dataset> Size: 16B
        Dimensions:   ()
        Coordinates:
            quantile  float64 8B 0.0
        Data variables:
            a         float64 8B 0.7
        >>> ds.quantile(0, dim="x")
        <xarray.Dataset> Size: 72B
        Dimensions:   (y: 4)
        Coordinates:
          * y         (y) float64 32B 1.0 1.5 2.0 2.5
            quantile  float64 8B 0.0
        Data variables:
            a         (y) float64 32B 0.7 4.2 2.6 1.5
        >>> ds.quantile([0, 0.5, 1])
        <xarray.Dataset> Size: 48B
        Dimensions:   (quantile: 3)
        Coordinates:
          * quantile  (quantile) float64 24B 0.0 0.5 1.0
        Data variables:
            a         (quantile) float64 24B 0.7 3.4 9.4
        >>> ds.quantile([0, 0.5, 1], dim="x")
        <xarray.Dataset> Size: 152B
        Dimensions:   (quantile: 3, y: 4)
        Coordinates:
          * y         (y) float64 32B 1.0 1.5 2.0 2.5
          * quantile  (quantile) float64 24B 0.0 0.5 1.0
        Data variables:
            a         (quantile, y) float64 96B 0.7 4.2 2.6 1.5 3.6 ... 6.5 7.3 9.4 1.9

        References
        ----------
        .. [1] R. J. Hyndman and Y. Fan,
           "Sample quantiles in statistical packages,"
           The American Statistician, 50(4), pp. 361-365, 1996
        

We can construct an expression to apply this method on the data and compute the 95th percentile of air temperatures along the ‘time’ dimension:

quantile_expr = hv.dim('air').xr.quantile(0.95, dim='time')
quantile_expr
dim('air').xr.quantile(0.95, dim='time')

Now we can apply this to the Dataset using the transform method; in the resulting dataset we can see that the time dimension has been dropped:

temp_ds = hv.Dataset(air_temp, ['lon', 'lat', 'time'])

transformed_ds = temp_ds.transform(air=quantile_expr)

transformed_ds
:Dataset   [lon,lat]   (4xDaily Air temperature at sigma level 995)

To visualize this data we will cast it to an Image:

hv.Image(transformed_ds)

The real power of dim transforms becomes apparent when combining them with parameters. We will look at this in more detail later as part of the Data Processing Pipelines user guide, but let’s quickly see what this looks like. We will create a Panel slider to control the q value in the call to the quantile method:

import panel as pn

q = pn.widgets.FloatSlider(name='quantile')

quantile_expr = hv.dim('air').xr.quantile(q, dim='time')
quantile_expr
dim('air').xr.quantile(FloatSlider(name='quantile'), dim='time')

Now that we have expressed this dynamic dim transform let us apply it using .apply.transform:

temp_ds = hv.Dataset(air_temp, ['lon', 'lat'])
transformed = temp_ds.apply.transform(air=quantile_expr).apply(hv.Image)

pn.Column(q, transformed.opts(colorbar=True, width=400))

dim expressions provide a very powerful way to apply transforms on your data either statically or controlled by some external parameter, e.g. one driven by a Panel widget.

Operations are parameterized#

In cases where a simple transform is not sufficient, or where you want to encapsulate some transformation in a more rigorous way, an Operation allows encapsulating the parameters of a transform on a function-like object. Operations in HoloViews are subclasses of Operation, which transform one Element or Overlay of Elements by returning a new Element that may be a transformation of the original. All operations are parameterized using the param library, which allows easy validation and documentation of the operation arguments. In particular, operations are instances of param.ParameterizedFunction, which allows operations to be used in the same way as normal Python functions.

This approach has several advantages, one of which is that we can manipulate the parameters of operations at several different levels: at the class level, at the instance level, or when the operation is called. Another advantage is that parameterizing operations allows them to be inspected just like any other HoloViews object using hv.help. We will now do this for the histogram operation:

from holoviews.operation import histogram
hv.help(histogram)
Parameters of 'histogram'
=========================

Parameters changed from their default values are marked in red.
Soft bound values are marked in cyan.
C/V= Constant/Variable, RO/RW = ReadOnly/ReadWrite, AN=Allow None

Name                  Value          Type        Mode  

group              'Operation'      String       V RW  
dynamic             'default'      Selector      V RW  
input_ranges            {}      ClassSelector  V RW AN 
link_inputs           False        Boolean       V RW  
streams                 []      ClassSelector    V RW  
bin_range              None      NumericTuple  V RW AN 
bins                   None     ClassSelector  V RW AN 
cumulative            False        Boolean       V RW  
dimension              None         String     V RW AN 
frequency_label        None         String     V RW AN 
groupby                None     ClassSelector  V RW AN 
groupby_range        'shared'      Selector      V RW  
log                   False        Boolean       V RW  
mean_weighted         False        Boolean       V RW  
normed                False        Selector      V RW  
nonzero               False        Boolean       V RW  
num_bins                20         Integer       V RW  
weight_dimension       None         String     V RW AN 
style_prefix           None         String     V RW AN 

Parameter docstrings:
=====================

group:            The group string used to identify the output of the
                  Operation. By default this should match the operation name.
dynamic:          Whether the operation should be applied dynamically when a
                  specific frame is requested, specified as a Boolean. If set to
                  'default' the mode will be determined based on the input type,
                  i.e. if the data is a DynamicMap it will stay dynamic.
input_ranges:     Ranges to be used for input normalization (if applicable) in a
                  format appropriate for the Normalization.ranges parameter.
                  
                  By default, no normalization is applied. If key-wise
                  normalization is required, a 2-tuple may be supplied where the
                  first component is a Normalization.ranges list and the second
                  component is Normalization.keys. 
link_inputs:      If the operation is dynamic, whether or not linked streams
                  should be transferred from the operation inputs for backends
                  that support linked streams.
                  
                  For example if an operation is applied to a DynamicMap with an
                  RangeXY, this switch determines whether the corresponding
                  visualization should update this stream with range changes
                  originating from the newly generated axes.
streams:          List of streams that are applied if dynamic=True, allowing
                  for dynamic interaction with the plot.
bin_range:        Specifies the range within which to compute the bins.
bins:             An explicit set of bin edges or a method to find the optimal
                  set of bin edges, e.g. 'auto', 'fd', 'scott' etc. For more
                  documentation on these approaches see the np.histogram_bin_edges
                  documentation.
cumulative:       Whether to compute the cumulative histogram
dimension:        Along which dimension of the Element to compute the histogram.
frequency_label:  Format string defining the label of the frequency dimension of the Histogram.
groupby:          Defines a dimension to group the Histogram returning an NdOverlay of Histograms.
groupby_range:    Whether to group the histograms along the same range or separate them.
log:              Whether to use base 10 logarithmic samples for the bin edges.
mean_weighted:    Whether the weighted frequencies are averaged.
normed:           Controls normalization behavior.  If `True` or `'integral'`, then
                  `density=True` is passed to np.histogram, and the distribution
                  is normalized such that the integral is unity.  If `False`,
                  then the frequencies will be raw counts. If `'height'`, then the
                  frequencies are normalized such that the max bin height is unity.
nonzero:          Whether to use only nonzero values when computing the histogram
num_bins:         Number of bins in the histogram .
weight_dimension: Name of the dimension the weighting should be drawn from
style_prefix:     Used for setting a common style for histograms in a HoloMap or AdjointLayout.

Applying operations#

Above we can see a listing of all the parameters of the operation, with the defaults, the expected types and detailed docstrings for each one. The histogram operation can be applied to any Element and will by default generate a histogram for the first value dimension defined on the object it is applied to. As a simple example we can create a BoxWhisker Element containing samples from a normal distribution, and then apply the histogram operation to those samples in two ways: 1) by creating an instance on which we will change the num_bins and 2) by passing bin_range directly when calling the operation:

boxw = hv.BoxWhisker(np.random.randn(10000))
histop_instance = histogram.instance(num_bins=50)

boxw + histop_instance(boxw).relabel('num_bins=50') + histogram(boxw, bin_range=(0, 3)).relabel('bin_range=(0, 3)')

We can see that these two ways of using operations gives us convenient control over how the parameters are applied. An instance allows us to persist some defaults which will be used in all subsequent calls, while passing keyword arguments to the operations applies the parameters for just that particular call.

The third way to manipulate parameters is to set them at the class level. If we always want to use num_bins=30 instead of the default of num_bins=20 shown in the help output above, we can simply set histogram.num_bins=30.
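
As a minimal sketch, changing the class-level default affects every subsequent call that does not override it on an instance or at call time:

histogram.num_bins = 30   # class-level default from now on
histogram(boxw)           # uses num_bins=30 unless overridden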

Operations on containers#

Operations in HoloViews are applied to individual elements, which means that when you apply an operation to a container object (such as NdLayout, GridSpace or HoloMap) the operation is applied once per element. For this to work, all the elements must be of the same type, and the operation is effectively mapped over all the contained elements. As a simple example we can define a HoloMap of BoxWhisker Elements by varying the width of the distribution via the Sigma value and then apply the histogram operation to it:

holomap = hv.HoloMap({(i*0.1+0.1): hv.BoxWhisker(np.random.randn(10000)*(i*0.1+0.1)) for i in range(5)},
                     kdims='Sigma')
holomap + histogram(holomap)

As you can see the operation has generated a Histogram for each value of Sigma in the HoloMap. In this way we can apply the operation across the entire parameter space defined by a HoloMap, GridSpace, or NdLayout.
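
The same per-element mapping is also available via the .apply method we used earlier, which forwards keyword arguments to the operation; a minimal sketch, assuming the holomap defined above:

holomap.apply(histogram, num_bins=30)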

Combining operations#

Since operations take a HoloViews object as input and return another HoloViews object we can very easily chain and combine multiple operations to perform complex analyses quickly and easily, while instantly visualizing the output.

In this example we’ll work with operations on timeseries. We first define a small function to generate a random, noisy timeseries:

from holoviews.operation import timeseries

def time_series(T=1, N=100, mu=0.1, sigma=0.1, S0=20):
    """Parameterized noisy time series"""
    dt = float(T) / N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W) * np.sqrt(dt)      # standard Brownian motion
    X = (mu - 0.5 * sigma**2) * t + sigma * W
    S = S0 * np.exp(X)                  # geometric Brownian motion
    return S

curve = hv.Curve(time_series(N=1000)).opts(width=600)

Now we will start applying some operations to this data. HoloViews ships with two ready-to-use timeseries operations: the rolling operation, which applies a function over a rolling window, and the rolling_outlier_std operation, which identifies outlier points in a timeseries, i.e. points that deviate from the rolling mean by more than sigma rolling standard deviations:

smoothed = curve * timeseries.rolling(curve) * timeseries.rolling_outlier_std(curve)
smoothed.opts(opts.Scatter(color='black'))

In the next section we will define a custom operation that will compose with the smoothed operation output above to form a short operation pipeline.

Defining custom operations#

We can now define our own custom Operation which, as you may recall, can process either elements or overlays. This means we can define a simple operation that takes our smoothed overlay and computes the difference between the raw and smoothed curves that it contains. Such a subtraction will give us the residual between the smoothed and unsmoothed Curve elements, removing long-term trends and leaving the short-term variation.

Defining an operation is very simple. An Operation subclass should define a _process method, which simply accepts an element argument. Optionally, we can also define parameters on the operation, which we can access inside _process via the self.p attribute. In this case we define a String parameter, which specifies the name of the subtracted value dimension on the returned Element.

from holoviews.operation import Operation

class residual(Operation):
    """
    Subtracts two curves from one another.
    """
    
    label = param.String(default='Residual', doc="""
        Defines the label of the returned Element.""")
    
    def _process(self, element, key=None):
        # Get first and second Element in overlay
        el1, el2 = element.get(0), element.get(1)
        
        # Get x-values and y-values of curves
        xvals  = el1.dimension_values(0)
        yvals  = el1.dimension_values(1)
        yvals2 = el2.dimension_values(1)
        
        # Return new Element with subtracted y-values
        # and new label
        return el1.clone((xvals, yvals-yvals2),
                         vdims=self.p.label)

Having defined the residual operation let’s try it out right away by applying it to our original and smoothed Curve. We’ll place the two objects on top of each other so they can share an x-axis and we can compare them directly:

(smoothed + residual(smoothed).opts(xaxis=None)).cols(1)

In this view we can immediately see that only a very small residual is left when applying this level of smoothing. However we have only tried one particular rolling_window value, the default value of 10. To assess how this parameter affects the residual we can evaluate the operation over a number of different parameter settings, as we will see in the next section.

Evaluating operation parameters#

When applying an operation there are often parameters to vary. Using traditional plotting approaches it’s often difficult to evaluate them interactively to get a detailed understanding of what they do. Here we will apply the rolling operation with varying rolling_window widths and window_type values across a HoloMap:

rolled = hv.HoloMap({(w, str(wt)): timeseries.rolling(curve, rolling_window=w, window_type=wt)
                     for w in [10, 25, 50, 100, 200] for wt in [None, 'hamming', 'triang']},
                    kdims=['Window', 'Window Type'])
rolled

This visualization is already useful since we can compare the effect of various parameter values by moving the slider and trying different window options. However since we can also chain operations we can easily compute the residual and view the two together.

To do this we simply overlay the HoloMap of smoothed curves on top of the original curve and pass it to our new residual function. Then we can combine the smoothed view with the original and see how the smoothing and residual curves vary across parameter values:

(curve * rolled + residual(curve * rolled)).cols(1)

Using a few additional lines we have now evaluated the operation over a number of different parameter values, allowing us to process the data with different smoothing parameters. In addition, by interacting with this visualization we can gain a better understanding of the operation parameters as well as insights into the structure of the underlying data.

Operations on 2D elements#

Let’s look at another example of operations in action, this time applying a simple filter to an Image. The basic idea is the same as above, although accessing the values to be transformed is a bit more complicated. First, we prepare an example image:

hv.output(backend='matplotlib', size=200)

from scipy.datasets import ascent  # ascent() moved here from scipy.misc in SciPy 1.10

stairs_image = hv.Image(ascent()[200:500, :], bounds=[0, 0, ascent().shape[1], 300], label="stairs")
stairs_image

We’ll define a simple Operation, which takes an Image and applies a high-pass or low-pass filter. We then use this to build a HoloMap of images filtered with different sigma values:

from scipy import ndimage

class image_filter(hv.Operation):
    
    sigma = param.Number(default=5)
    
    type_ = param.String(default="low-pass")

    def _process(self, element, key=None):
        xs = element.dimension_values(0, expanded=False)
        ys = element.dimension_values(1, expanded=False)
        
        # setting flat=False will preserve the matrix shape
        data = element.dimension_values(2, flat=False)
        
        if self.p.type_ == "high-pass":
            new_data = data - ndimage.gaussian_filter(data, self.p.sigma)
        else:
            new_data = ndimage.gaussian_filter(data, self.p.sigma)
        
        label = element.label + " ({} filtered)".format(self.p.type_)
        # make an exact copy of the element with all settings, just with different data and label:
        element = element.clone((xs, ys, new_data), label=label)
        return element

stairs_map = hv.HoloMap({sigma: image_filter(stairs_image, sigma=sigma)
                         for sigma in range(0, 12, 1)}, kdims="sigma")

stairs_map.opts(framewise=True)
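
The type_ parameter can be varied in the same way; as a minimal sketch, a high-pass filtered version of the image can be produced with:

image_filter(stairs_image, type_="high-pass", sigma=5)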