Sampling Data

As explained in the Composing Data and Containers tutorials, HoloViews allows you to build up hierarchical containers that express the natural relationships between your data items, in whatever multidimensional space best characterizes your application domain. Once your data is in such containers, individual visualizations are then made by choosing subregions of this multidimensional space, either smaller numeric ranges (as in cropping of photographic images), or lower-dimensional subsets (as in selecting frames from a movie, or a specific movie from a large library), or both (as in selecting a cropped version of a frame from a specific movie from a large library).

In this tutorial, we show how to specify such selections, using four different (but related) operations that can act on an element e:

Operation | Example syntax | Description
indexing | e[5.5], e[3,5.5] | Selecting a single data value, returning one actual numerical value from the existing data
slice | e[3:5.5], e[3:5.5,0:1] | Selecting a contiguous portion from an Element, returning the same type of Element
sample | e.sample(y=5.5), e.sample((3,3)) | Selecting one or more regularly spaced data values, returning a new type of Element
select | e.select(y=5.5), e.select(y=(3,5.5)) | More verbose notation covering all supported slice and index operations by dimension name

These operations are all concerned with selecting some subset of your data values, without combining across data values (e.g. averaging) or otherwise transforming your actual data. In the Columnar Data tutorial we will look at other operations on the data that reduce, summarize, or transform the data in other ways, rather than selections as covered here.

We'll go through each operation in detail and provide a visual illustration to help make its semantics clear. This tutorial assumes that you are familiar with continuous and discrete coordinate systems, so please review our Continuous Coordinates Tutorial if you have not done so already.

Indexing and slicing Elements

In the Exploring Data Tutorial we saw examples of how to select individual elements embedded in a multi-dimensional space. We also briefly introduced "deep slicing" of the RGB elements to select a subregion of the images. The Continuous Coordinates Tutorial covered slicing and indexing in Elements representing continuous coordinate systems, such as Image types. Here we'll be going through each operation in full detail, providing a visual illustration to help make the semantics of each operation clear.

How the Element may be indexed depends on the key dimensions (or kdims) of the Element. It is thus important to consider the nature and dimensionality of your data when choosing the Element type for it.

In [1]:
import numpy as np
import holoviews as hv
hv.notebook_extension()
%opts Layout [fig_size=125] Points [size_index=None] (s=50) Scatter3D [size_index=None]
%opts Bounds (linewidth=2 color='k') {+axiswise} Text (fontsize=16 color='k') Image (cmap='Reds')

1D Elements: Slicing and indexing

Certain Chart elements support both single-dimensional indexing and slicing: Scatter, Curve, Histogram, and ErrorBars. Here we'll look at how we can easily slice a Histogram to select a subregion of it:

In [2]:
np.random.seed(42)
edges, data = np.histogram(np.random.randn(100))
hist = hv.Histogram(edges, data)
subregion = hist[0:1]
hist * subregion
Out[2]:

The two bins in a different color show the selected region, overlaid on top of the full histogram. We can also access the value for a specific bin in the Histogram. A continuous-valued index that falls inside a particular bin will return the corresponding value or frequency.

In [3]:
hist[0.25], hist[0.5], hist[0.55]
Out[3]:
(21, 21, 11)
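
To make the bin lookup concrete, here is a minimal pure-Python sketch of how a continuous index can be mapped to a bin. It is not HoloViews' implementation: the helper name bin_value, the edge and count values, and the boundary convention (half-open bins with a closed last bin, as np.histogram produces) are all assumptions made for illustration.

```python
from bisect import bisect_right

def bin_value(edges, counts, x):
    """Return the count of the bin that contains the continuous index x."""
    if not edges[0] <= x <= edges[-1]:
        raise IndexError("index %r lies outside the binned range" % (x,))
    # bisect_right finds the first edge strictly greater than x, so the
    # bin immediately to its left is the one containing x.
    i = bisect_right(edges, x) - 1
    # An index equal to the final edge belongs to the last (closed) bin.
    return counts[min(i, len(counts) - 1)]

# Hypothetical edges and counts, not the ones computed in the cell above.
edges = [0.0, 0.5, 1.0, 1.5]
counts = [21, 11, 4]
print(bin_value(edges, counts, 0.25), bin_value(edges, counts, 0.55))
```

Any index inside the same bin returns the same count, which is why nearby indices in the cell above can yield identical values.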

We can slice a Curve the same way:

In [4]:
xs = np.linspace(0, np.pi*2, 21)
curve = hv.Curve((xs, np.sin(xs)))
subregion = curve[np.pi/2:np.pi*1.5]
curve * subregion * hv.Scatter(curve)
Out[4]:

Here again the region in a different color is the specified subregion, and we've also marked each discrete point with a dot using the Scatter Element. As before we can also get the value for a specific sample point; whatever x-index is provided will snap to the closest sample point and return the dependent value:

In [5]:
curve[4.05], curve[4.1], curve[4.17], curve[4.3]
Out[5]:
(-0.80901699437494734,
 -0.80901699437494734,
 -0.80901699437494734,
 -0.95105651629515353)
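
The snapping behavior can be sketched in a few lines of plain Python. This is only an illustration of the semantics described above, assuming a nearest-neighbor search over the sample positions; snap_index is a hypothetical helper, not a HoloViews function.

```python
import math

def snap_index(xs, ys, x):
    """Return the dependent value at the sample whose x is closest to x."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]

# The same 21 evenly spaced sample points as the Curve above,
# built with the standard library instead of NumPy.
xs = [2 * math.pi * i / 20 for i in range(21)]
ys = [math.sin(x) for x in xs]

for query in (4.05, 4.1, 4.17, 4.3):
    print(query, snap_index(xs, ys, query))
```

The first three queries all lie closest to the same sample point, so they return the same value, while 4.3 is closer to the next sample.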

It is important to note that an index (or a list of indices, as for the 2D and 3D cases below) will always return the raw indexed (dependent) value, i.e. a number. A slice (indicated with :), on the other hand, will retain the Element type even in cases where the plot might not be useful, such as having only a single value, two values, or no value at all in that range:

In [6]:
curve[4:4.5]
Out[6]:
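
The contrast between an index (a number) and a slice (a container of the same kind, however small) can be sketched as follows. The half-open range used here is an assumption of this sketch, and slice_samples is a hypothetical helper operating on plain (x, y) tuples rather than on a Curve.

```python
import math

def slice_samples(samples, start, stop):
    """Keep the (x, y) samples with start <= x < stop, as a new list."""
    return [(x, y) for x, y in samples if start <= x < stop]

xs = [2 * math.pi * i / 20 for i in range(21)]
samples = [(x, math.sin(x)) for x in xs]

narrow = slice_samples(samples, 4, 4.5)    # only a couple of samples survive
empty = slice_samples(samples, 4.1, 4.2)   # no sample falls in this range
print(len(narrow), len(empty), type(narrow) is type(samples))
```

Even when the range contains no samples, the result is still a list, mirroring how a slice keeps the Element type.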

2D and 3D Elements: slicing

For data defined in a 2D space, there are 2D equivalents of the 1D Curve and Scatter types. A Points element, for example, can be thought of as a number of points in a 2D space.

In [7]:
r = np.arange(0, 1, 0.005)
xs, ys = (r * fn(85*np.pi*r) for fn in (np.cos, np.sin))
paths = hv.Points((xs, ys))
paths + paths[0:1, 0:1]
Out[7]:

However, indexing is not supported in this space, because there could be many possible points near a given set of coordinates, and finding the nearest one would require a search across potentially incommensurable dimensions, which is poorly defined and difficult to support.
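
Slicing a point cloud, by contrast, needs no nearest-neighbor search: each point is simply tested against the range on every key dimension. The sketch below illustrates this with plain tuples; slice_points is a hypothetical helper, and the half-open ranges are an assumption.

```python
import math

def slice_points(points, x_range, y_range):
    """Keep the points whose x and y both fall inside the given ranges."""
    (x0, x1), (y0, y1) = x_range, y_range
    return [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]

# The same spiral as above, generated with the standard library only.
radii = [0.005 * i for i in range(200)]
points = [(r * math.cos(85 * math.pi * r), r * math.sin(85 * math.pi * r))
          for r in radii]

quadrant = slice_points(points, (0, 1), (0, 1))
print(len(points), len(quadrant))
```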

Slicing in 3D works much like slicing in 2D, but indexing is not supported for the same reason as in 2D:

In [8]:
xs = np.linspace(0, np.pi*8, 201)
scatter = hv.Scatter3D((xs, np.sin(xs), np.cos(xs)))
scatter + scatter[5:10, :, 0:]
Out[8]:

2D Raster and Image: slicing and indexing

Raster and the various other image-like objects (Image, RGB, HSV, etc.) can all be sliced and indexed, as can Surface, because they all have an underlying regular grid of key dimension values:

In [9]:
%opts Image (cmap='Blues') Bounds (color='red')
In [10]:
np.random.seed(0)
extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_slice = img[1:9,4:5]
box = hv.Bounds((1,4,9,5))
img*box + img_slice
Out[10]:
In [11]:
img[4.2,4.2], img[4.3,4.2], img[5.0,4.2]
Out[11]:
(0.20887675609483469, 0.20887675609483469, 0.16130951788499626)
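
Indexing an image with continuous coordinates amounts to finding the grid cell that contains them. The following sketch shows one way to compute that cell for a 10x10 array with bounds (0, 0, 10, 10). It assumes that row 0 of the array is the top of the image and that coordinates on the far edges are clamped into the last cell; coord_to_cell is a hypothetical helper, not part of HoloViews.

```python
import math

def coord_to_cell(x, y, bounds, shape):
    """Map continuous (x, y) to the (row, col) of the containing cell."""
    left, bottom, right, top = bounds
    rows, cols = shape
    col = int(math.floor((x - left) / (right - left) * cols))
    # Rows are counted downwards from the top edge of the image.
    row = int(math.floor((top - y) / (top - bottom) * rows))
    # Coordinates on the right or bottom edge are clamped into the last cell.
    return min(row, rows - 1), min(col, cols - 1)

bounds, shape = (0, 0, 10, 10), (10, 10)
for x, y in [(4.2, 4.2), (4.3, 4.2), (5.0, 4.2)]:
    print((x, y), coord_to_cell(x, y, bounds, shape))
```

The first two coordinates land in the same cell and so share a value, while x=5.0 falls in the neighboring column.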

Sampling

Sampling is essentially a process of indexing an Element at multiple index locations, and collecting the results. Thus any Element that can be indexed can also be sampled. Compared to regular indexing, sampling is different in that multiple indices may be supplied at the same time. Also, indexing will only return the value at that location, whereas the return type from a sampling operation is another Element type, usually either a Table or a Curve, to allow both key and value dimensions to be returned.

Sampling Elements

Sampling can use either an explicit list of samples, or samples for each dimension passed as keyword arguments.

We'll start by taking a single sample of an Image object, to make it clear how sampling and indexing are similar operations yet different in their results:

In [12]:
img_coords = hv.Points(img.table(), extents=extents)
labeled_img = img * img_coords * hv.Points([img.closest([(5.1,4.9)])])(style=dict(color='r'))
img + labeled_img + img.sample([(5.1,4.9)])
Out[12]:
In [13]:
img[5.1,4.9]
Out[13]:
0.16130951788499626

Here, the output of the indexing operation is the value (0.16130951788499626) from the location closest to the specified coordinates, whereas .sample() returns a Table that lists both the coordinates and the value, and slicing (covered in the previous section) returns an Element of the same type, not a Table.
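
The difference in return types can be sketched with a toy grid. Everything here is hypothetical: the grid values, the helper names, and the choice to report each sample at the center of its cell are assumptions meant only to show that indexing yields a bare value while sampling yields rows of coordinates plus value.

```python
def cell_of(x, y):
    """(row, col) of the unit cell containing (x, y) on a 10x10 grid
    spanning (0, 0) to (10, 10), with row 0 at the top."""
    return min(int(10 - y), 9), min(int(x), 9)

# Hypothetical data: each cell stores 10*row + col so results are easy to read.
grid = [[10 * row + col for col in range(10)] for row in range(10)]

def index(x, y):
    """Indexing: return only the value stored at the closest cell."""
    row, col = cell_of(x, y)
    return grid[row][col]

def sample(coords):
    """Sampling: return table-like rows of (x, y, value) per requested point."""
    rows = []
    for x, y in coords:
        row, col = cell_of(x, y)
        # Report the center of the cell that was actually sampled.
        rows.append((col + 0.5, 9.5 - row, grid[row][col]))
    return rows

print(index(5.1, 4.9))
print(sample([(5.1, 4.9)]))
```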

Next we can try sampling along only one Dimension on our 2D Image, leaving us with a 1D Element (in this case a Curve):

In [14]:
sampled = img.sample(y=5)
labeled_img = img * img_coords * hv.Points(zip(sampled['x'], [img.closest(y=5)]*10))
img + labeled_img + sampled
Out[14]:

Sampling works on any regularly sampled Element type. For example, we can select multiple samples along the x-axis of a Curve.

In [15]:
xs = np.arange(10)
samples = [2, 4, 6, 8]
curve = hv.Curve(zip(xs, np.sin(xs)))
curve_samples = hv.Scatter(zip(xs, [0] * 10)) * hv.Scatter(zip(samples, [0]*len(samples))) 
curve + curve_samples + curve.sample(samples)
Out[15]:

Sampling HoloMaps

Sampling is often useful when you have more data than you wish to visualize or analyze at one time. First, let's create a HoloMap containing a number of observations of some noisy data.

In [16]:
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
                       for i in range(3)}, key_dimensions=['Observation'])

HoloMaps also provide additional functionality to perform regular sampling on your data. In this case we'll take 3x3 subsamples of each of the Images.

In [17]:
sample_style = dict(edgecolors='k', alpha=1)
all_samples = obs_hmap.table().to.scatter3d()(style=dict(alpha=0.15))
sampled = obs_hmap.sample((3,3))
subsamples = sampled.to.scatter3d()(style=sample_style)
all_samples * subsamples + sampled
Out[17]:

By supplying bounds as a (left, bottom, right, top) tuple we can also sample a subregion of our images:

In [18]:
sampled = obs_hmap.sample((3,3), bounds=(2,5,5,10))
subsamples = sampled.to.scatter3d()(style=sample_style)
all_samples * subsamples + sampled
Out[18]:

Since this kind of sampling is only well supported for continuous coordinate systems, we can only apply this kind of sampling to Image types for now.
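
As a rough illustration of what regular sampling within bounds involves, the sketch below divides a (left, bottom, right, top) region into an evenly spaced grid and returns one coordinate per grid cell. Placing each sample at the center of its cell is an assumption of this sketch and may not match how HoloViews positions its samples; regular_samples is a hypothetical helper.

```python
def regular_samples(shape, bounds):
    """Return rows*cols evenly spaced (x, y) coordinates inside bounds."""
    rows, cols = shape
    left, bottom, right, top = bounds
    dx = (right - left) / float(cols)
    dy = (top - bottom) / float(rows)
    # One sample at the center of each cell of the rows x cols grid.
    return [(left + (c + 0.5) * dx, bottom + (r + 0.5) * dy)
            for r in range(rows) for c in range(cols)]

# The same grid size and bounds as in the cell above.
coords = regular_samples((3, 3), (2, 5, 5, 10))
for x, y in coords:
    print(round(x, 3), round(y, 3))
```

Each of these coordinates would then be looked up in every Image of the HoloMap, in the same way as a single index.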

Sampling Charts

Sampling Chart-type Elements such as Curve, Scatter, and Histogram is only supported by providing an explicit list of samples, since those Elements have no underlying regular grid.

In [19]:
xs = np.arange(10)
extents = (0, 0, 2, 10)
curve = hv.HoloMap({(i) : hv.Curve(zip(xs, np.sin(xs)*i))
                    for i in np.linspace(0.5, 1.5, 3)},
                   key_dimensions=['Observation'])
all_samples = curve.table().to.points()
sampled = curve.sample([0, 2, 4, 6, 8])
sampling = all_samples * sampled.to.points(extents=extents)(style=dict(color='r'))
sampling + sampled
Out[19]:

Alternatively, you can always deconstruct your data into a Table (see the Columnar Data tutorial) and perform select operations instead. This is also the easiest way to sample NdElement types like Bars. Individual samples should be supplied as a set, while ranges can be specified as a two-tuple.

In [20]:
sampled = curve.table().select(Observation=(0, 1.1), x={0, 2, 4, 6, 8})
sampling = all_samples * sampled.to.points(extents=extents)(style=dict(color='r'))
sampling + sampled
Out[20]:
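
The set-versus-tuple convention can be sketched on plain rows of data. This is an illustration under stated assumptions rather than the library's code: select here is a hypothetical standalone function, rows are ordinary dictionaries, and the tuple is treated as a half-open range.

```python
def select(rows, **conditions):
    """Filter rows by dimension name: sets pick values, tuples pick ranges."""
    def matches(value, condition):
        if isinstance(condition, (set, frozenset)):
            return value in condition          # individual samples
        if isinstance(condition, tuple):
            low, high = condition
            return low <= value < high         # range (assumed half-open)
        return value == condition              # single value
    return [row for row in rows
            if all(matches(row[dim], cond) for dim, cond in conditions.items())]

# Hypothetical table with the same dimensions as the example above.
rows = [{"Observation": obs, "x": x}
        for obs in (0.5, 1.0, 1.5) for x in range(10)]
picked = select(rows, Observation=(0, 1.1), x={0, 2, 4, 6, 8})
print(len(picked))
```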

These tools should help you index, slice, sample, and select your data with ease. The Columnar Data tutorial explains how to do other types of operations, such as averaging and other reduction operations.

