Exploring Data

In the Introductory Tutorial and the Element and Container overviews you can see how HoloViews allows you to wrap your data into annotated Elements that can be composed easily into complex visualizations.

In this tutorial, we will see how all of the data you want to examine can be embedded as Elements into a nested, sparsely populated, multi-dimensional data structure that gives you maximum flexibility to slice, select, and combine your data for visualization and analysis. With HoloViews objects, you can visualize your multi-dimensional data as animations, images, charts, and parameter spaces with ease, allowing you to quickly discover the important features interactively and then prepare corresponding plots for reports, publications, or web pages.

We will start with the powerful HoloMap container, and then show how HoloMap objects can be nested inside the other Container objects to make all of your data easily accessible.

In [1]:
import numpy as np
import holoviews as hv
hv.notebook_extension()
%output holomap='auto'
%timer start
Timer start: 2018/05/10 18:02:53

To start, here are some general imports we will be using, mainly from the Python standard library:

In [2]:
import json

import matplotlib.dates as md


try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3
from io import BytesIO

HoloMap Basics

Python users will be familiar with dictionaries as a way to collect data together in a conveniently accessible manner. Unlike NumPy arrays, dictionaries are sparse and heterogeneous and do not have to be declared with a fixed size.

HoloMaps are a core part of HoloViews and are essential for generating animated visualizations. They also provide highly useful ways to manipulate your data for display and have several useful properties:

  • HoloMaps are ordered (internally they use an OrderedDict or, if installed, the optimized cyordereddict).
  • HoloMaps let you index your data with an arbitrary number of dimensions (e.g. date and batch-number), not just one like a Python dictionary.
  • The dimensions used may be simple strings, or objects recording the name, type, and physical units of the dimension.
  • HoloMaps let you select portions of your data by slicing each available dimension independently.
  • HoloMaps also provide ways to transform your data by sampling, reducing and collapsing the data Elements.
  • Dimensions in a HoloMap may be mapped onto parameter spaces for easy visualization of a portion of your multidimensional data space.
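To make the properties above concrete, here is a minimal plain-Python analogy (not HoloViews code, and not how HoloMap is implemented): a sparse, ordered mapping keyed by two dimensions, where each dimension can be restricted independently. The `records` data and the helper names `in_range` and `select` are hypothetical and exist only for illustration.

```python
from collections import OrderedDict

# Hypothetical data: only some (date, batch) combinations exist (sparse).
records = OrderedDict(sorted({
    ("2012-10-25", 1): "a",
    ("2012-10-25", 2): "b",
    ("2012-10-27", 1): "c",
    ("2012-10-29", 3): "d",
}.items()))


def in_range(value, bounds):
    """True if lower <= value < upper; None means unbounded on that side."""
    lower, upper = bounds
    return (lower is None or value >= lower) and (upper is None or value < upper)


def select(mapping, date=(None, None), batch=(None, None)):
    """Keep the entries whose key falls inside both per-dimension ranges."""
    return OrderedDict(
        (key, value) for key, value in mapping.items()
        if in_range(key[0], date) and in_range(key[1], batch)
    )


# Restrict both dimensions at once: dates before Oct 28, batch number 1 only.
subset = select(records, date=("2012-10-25", "2012-10-28"), batch=(1, 2))
print(list(subset))
```

A HoloMap offers the same kind of per-dimension selection through its own indexing syntax, as shown later in this tutorial.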

Loading data

In this notebook we will be exploring weather data from Hurricane Sandy, which swept across the Caribbean and the Eastern US seaboard in late October 2012. We will scrape our data from various online sources, exploring not only how we can quickly generate animations using HoloMaps, but also how we can deal with very high-dimensional data.

We've already downloaded and cropped a number of frames of the satellite-imagery-based wind speed models from NASA and cached them on the HoloViews website. If you want to select a different cropping region or sample more frames you can find out how to get the raw data directly from NASA in this Wiki entry. For now, we'll just get the preprocessed data:

In [3]:
# Download the cached archive into memory, then load it as a NumPy .npz file
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/hurricane.npz').read())
data = np.load(iobuffer)
dates = data['dates']
surface_data, nearsrfc_data = data['surface'], data['near_surface']

Constructing a HoloMap

Declaring Dimensions

Now that we have loaded the data we can store the raw image arrays as RGB Elements and create a HoloMap. We begin by declaring the key dimensions (kdims) of the HoloMap, which determine how the data will be stored and thus how you will be able to index and select it most easily. In this case we will index our HoloMap both by the frame number and the date:

In [4]:
date_dim = hv.Dimension("Date", value_format=md.DateFormatter('%b %d %Y %H:%M UTC'), type=float)
kdims = ['Frame', date_dim]

Dimensions can be specified as a simple string, or as a Dimension object with additional information to give HoloViews some hints about how to format and display values along that Dimension.
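The format string given to DateFormatter above uses standard strftime codes. As a rough illustration of what that string produces, the sketch below applies it with the standard library's datetime instead of matplotlib. The chosen timestamp is taken from the first frame's date as reported by .info later in this tutorial, and the month abbreviation assumes the default (English) locale.

```python
from datetime import datetime

# Same format string as passed to md.DateFormatter above
DATE_FORMAT = '%b %d %Y %H:%M UTC'

# First timestamp of the dataset, written out by hand for illustration
first_frame = datetime(2012, 10, 25, 1, 0)

label = first_frame.strftime(DATE_FORMAT)
print(label)
```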

Populating the HoloMap

Creating a HoloMap is just like creating a Python dictionary, and so you can either pass a dictionary object or a list of (key,value) pairs. The keys can each be a single value for a one-dimensional HoloMap, or tuples for multiple Dimensions.

In [5]:
srfc = [((frame, date), hv.RGB(surface_data[...,frame], bounds=(0, 0)+surface_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Surface Wind Speed'))
        for frame, date in zip(range(len(dates)), dates)]

nsrfc = [((frame, date), hv.RGB(nearsrfc_data[...,frame], bounds=(0, 0)+nearsrfc_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Near Surface Wind Speed'))
        for frame, date in zip(range(len(dates)), dates)]

surface_wind = hv.HoloMap(srfc, kdims=kdims)
nearsurface_wind = hv.HoloMap(nsrfc, kdims=kdims)

Not only is the HoloMap constructor similar to that of Python dictionaries, HoloMaps also provide __getitem__, __setitem__, update, get, pop, keys, values and items just as normal dictionaries do. In addition, HoloMap provides a .clone method that returns a copy of the HoloMap containing the same data, in which the data and any of the parameters may be overridden.
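The dictionary-like behaviour and the idea behind .clone can be sketched with a small dict subclass. This is a hypothetical stand-in named `KeyedMap`, written only to illustrate the semantics described above; it is not the HoloViews implementation, and the real .clone accepts more options.

```python
class KeyedMap(dict):
    """Hypothetical stand-in for a keyed container; not the HoloViews class."""

    def __init__(self, items=(), kdims=("Default",)):
        # Accepts either a dictionary or a list of (key, value) pairs
        super().__init__(items)
        self.kdims = tuple(kdims)

    def clone(self, items=None, **overrides):
        """Return a copy, optionally replacing the data or the kdims."""
        data = self if items is None else items
        kdims = overrides.get("kdims", self.kdims)
        return KeyedMap(dict(data), kdims=kdims)


from_dict = KeyedMap({(0, "a"): 1, (1, "b"): 2}, kdims=("Frame", "Tag"))
from_pairs = KeyedMap([((0, "a"), 1), ((1, "b"), 2)], kdims=("Frame", "Tag"))

# Same data, one parameter overridden; the original is left untouched
renamed = from_dict.clone(kdims=("Index", "Tag"))

# Ordinary dictionary methods keep working
fallback = from_pairs.get((5, "z"), "missing")
```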

Basic usage and attributes on HoloMaps

A HoloMap must be uniform in the type, group, label, and key dimensions of its Elements, because it defines a parameter space of Elements varying only in their n-dimensional index and data. This also allows a HoloMap to inherit the group and label of its Elements. We can inspect the structure of the HoloMap by printing surface_wind:

In [6]:
print(surface_wind)
:HoloMap   [Frame,Date]
   :RGB   [x,y]   (R,G,B)

Since the RGB Elements we have created are not square, we can declare that RGB Elements should be displayed with an aspect ratio of 1.0 using the %opts line magic, which will apply to all subsequent cells:

In [7]:
%opts RGB [aspect=1]

To get a quick glimpse at the data we have collected, you can access the .last property, which will return the last Element in the HoloMap:

In [8]:
surface_wind.last
Out[8]:

If you are unsure how large the HoloMap is or want to know a bit more about the Dimension ranges, you can use the .info property. For a HoloMap, .info will list the key dimensions and their ranges, and even the deep_dimensions, i.e. any Dimensions contained within the Elements of the HoloMap.

In [9]:
surface_wind.info
HoloMap containing 14 items of type RGB
---------------------------------------

Key Dimensions: 
	 Frame: 0...13 
	 Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC 
Deep Dimensions: 
	 x: 0...400 
	 y: 0...350 
	 R: 0...1 
	 G: 0...1 
	 B: 0...1 
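The key-dimension ranges in a summary like this are simply the minimum and maximum key value along each dimension. A minimal plain-Python sketch of that computation follows; the helper `key_ranges` and the float date values are hypothetical, and this is not how .info is implemented.

```python
def key_ranges(keys, kdims):
    """Return {dimension: (min, max)} computed over a list of tuple keys."""
    columns = zip(*keys)  # one column of values per key dimension
    return {dim: (min(col), max(col)) for dim, col in zip(kdims, columns)}


# 14 hypothetical (frame, date) keys; the dates are made-up float values
keys = [(frame, 10.0 + 0.5 * frame) for frame in range(14)]
ranges = key_ranges(keys, ["Frame", "Date"])
print(ranges)
```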

Indexing and slicing HoloMaps

Having found out a bit about the HoloMap, we can look at a few frames, starting with selecting just the first three:

In [10]:
surface_wind[0:3]
Out[10]:

Because HoloMaps support all the slicing semantics including steps, we can do things like select every second frame in the second half of the animation:

In [11]:
surface_wind[7:14:2]
Out[11]:

As you may have noticed, the slices do not simply select by positional, whole-number index, as they would for a NumPy array. A HoloMap, like all other Dimensioned objects (i.e., most HoloViews components), is always sliceable by the values along its key dimensions, in whatever units they are expressed.
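The difference between slicing by value and slicing by position only becomes visible when the keys are not the integers 0, 1, 2, and so on. The sketch below shows the distinction on a plain sorted list; the helper `slice_by_value` is hypothetical and assumes a half-open range, matching the first-three-frames example above.

```python
from bisect import bisect_left


def slice_by_value(sorted_keys, start, stop):
    """Return the keys k with start <= k < stop, compared by value."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


keys = [10.0, 20.0, 30.0, 40.0]

by_value = slice_by_value(keys, 15, 35)  # keys whose value lies in [15, 35)
by_position = keys[15:35]                # list positions 15..34, which do not exist
print(by_value, by_position)
```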

Apart from simple slicing semantics, you can also select Elements by passing the Dimension values as a set. Since our Elements are guaranteed to be uniform, a HoloMap also allows deep indexing into the key dimensions of its Elements, which lets us easily select a subregion of each satellite frame (where : alone means to select the entire range of that dimension):

In [12]:
surface_wind[{0, 2, 3, 5}, :, 150:350, 50:250]
Out[12]:

Finally, let's put together everything we've learned about indexing and go one step further. So far we've been looking at just the surface wind speed plots, but now let's combine them with the near-surface plots into a Layout. Just like Elements, HoloMaps can be grouped into a Layout using the + operator. Since a Layout is a tree-based data structure, it doesn't have any Dimensions of its own, so we can't use __getitem__ to index along dimensions. Instead we can use select, which is available on all HoloViews components. The .select method may be supplied with any number of dimension names, each paired with a value or slice. Slices may be supplied either as explicit slice objects or as tuples.

In [13]:
(surface_wind + nearsurface_wind).select(Frame=slice(0, 10, 2), x=(150,350), y=(50, 250))
Out[13]:
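The equivalence between tuples and explicit slice objects can be illustrated in plain Python. The helper `as_slice` below is hypothetical and only shows how a (start, stop) or (start, stop, step) tuple corresponds to a slice; it says nothing about how .select handles its arguments internally.

```python
def as_slice(spec):
    """Accept an explicit slice or a (start, stop[, step]) tuple."""
    return spec if isinstance(spec, slice) else slice(*spec)


# The same selection as in the cell above, mixing both notations
selection = {"Frame": slice(0, 10, 2), "x": (150, 350), "y": (50, 250)}
normalized = {dim: as_slice(spec) for dim, spec in selection.items()}

# A three-element tuple carries a step, e.g. "every second value"
every_second = as_slice((None, None, 2))
print(normalized, every_second)
```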

Grouping HoloMaps

HoloMaps provide the starting point to display your data in any number of ways. While HoloMap dimensions are displayed as frames of an animation by default, you can easily transform a HoloMap into another n-D component type, such as an NdLayout, GridSpace, or NdOverlay, via the .layout, .grid, and .overlay methods.

Each of these methods groups the data along the values of the dimensions you specify and returns the newly grouped object. They are all convenience methods around the .groupby method, which can split a HoloMap into whatever container and group types you specify.
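Conceptually, grouping along a dimension splits the tuple keys on that dimension's value and leaves the remaining dimensions as the keys of each group. Here is a minimal plain-Python sketch under that assumption; the function name `groupby`, the dimension names, and the data are hypothetical, and the real .groupby returns HoloViews containers rather than nested dictionaries.

```python
from collections import defaultdict


def groupby(mapping, kdims, dim):
    """Split a tuple-keyed mapping into one sub-mapping per value of `dim`."""
    pos = kdims.index(dim)
    groups = defaultdict(dict)
    for key, value in mapping.items():
        rest = key[:pos] + key[pos + 1:]  # the key without the grouped dimension
        groups[key[pos]][rest] = value
    return dict(groups)


data = {
    ("2012-10-25", 1): "a",
    ("2012-10-25", 2): "b",
    ("2012-10-27", 1): "c",
}
by_batch = groupby(data, ["date", "batch"], "batch")
print(by_batch)
```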

Before we can start grouping, however, we hit a snag in our indexing: the Frame and Date dimensions we specified above are redundant, because for each frame there is only one corresponding date. As a result, any groupby operation will fail. But we can easily solve this problem by reindexing the HoloMap:

In [14]:
print("Dimensions before reindex: %s" % surface_wind.dimensions('key', label=True))
surface_reindexed = surface_wind.reindex(['Date'])
print("Dimensions after reindex:  %s" % surface_reindexed.dimensions('key', label=True))
Dimensions before reindex: ['Frame', 'Date', 'x', 'y']
Dimensions after reindex:  ['Date', 'x', 'y']
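Dropping a redundant key dimension is only safe when the remaining dimensions still identify every item uniquely. The sketch below shows that idea on a plain dictionary; the `reindex` function here is a hypothetical illustration with made-up keys, not the HoloViews method.

```python
def reindex(mapping, kdims, keep):
    """Rebuild the keys using only the dimensions listed in `keep`."""
    positions = [kdims.index(dim) for dim in keep]
    result = {}
    for key, value in mapping.items():
        new_key = tuple(key[p] for p in positions)
        if new_key in result:
            # Two items would collapse onto one key, so data would be lost
            raise ValueError("dropping dimensions would merge distinct keys")
        result[new_key] = value
    return result


# Each frame has exactly one date, so Frame is redundant
frames = {(i, 100.0 + i): "frame %d" % i for i in range(3)}
by_date = reindex(frames, ["Frame", "Date"], ["Date"])
print(by_date)
```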

Now that we have removed the redundant Frame Dimension we can create an NdLayout indexed just by the date:

In [15]:
surface_reindexed[::4].layout('Date')
Out[15]:

In [16]:
%output size=250 

For a more compact representation, you may also create a GridSpace using the .grid method. In a GridSpace, each dimension maps onto an axis, which limits it to a maximum of two Dimensions, but redundant data like the shared axes and axis labels are suppressed. To avoid the tick labels overlapping we will also define a rotation of the tick marks by a few degrees.

In [17]:
%opts GridSpace [xrotation=10]
surface_reindexed[::2].grid('Date')
Out[17]:

Adding Dimensions

Now how do we go about combining the two HoloMaps into a single GridSpace? First let us reindex the near-surface data as well.

In [18]:
nearsurface_reindexed = nearsurface_wind.reindex(['Date'])

The two HoloMaps we have represent wind speed at different heights. Meteorologists state the height of different air masses by their pressure. The near-surface imagery is at 850 hPa, while the surface level images are at 1000 hPa.

In [19]:
height = hv.Dimension('Layer Height', unit='hPa')

We can add this Dimension to the HoloMaps via the add_dimension method, which accepts the new dimension, the index position at which to insert that dimension and the dimension value as arguments:

In [20]:
surface = surface_reindexed.add_dimension(height, 1, 1000)
near_surface = nearsurface_reindexed.add_dimension(height, 1, 850)

Now we can combine the two HoloMaps by creating a clone and updating it with the other HoloMap:

In [21]:
combined_hurricane = surface.clone()
combined_hurricane.update(near_surface)
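The effect of these two steps can be mimicked with plain dictionaries: inserting a constant value into every key makes the keys of the two datasets distinct, so an update merges them without overwriting anything. This is only a sketch with made-up integer dates and a hypothetical `add_dimension` helper; it does not reproduce the HoloViews method.

```python
def add_dimension(mapping, position, value):
    """Insert a constant `value` at index `position` of every tuple key."""
    return {
        key[:position] + (value,) + key[position:]: item
        for key, item in mapping.items()
    }


# 14 hypothetical dates, mirroring the 14 frames in each HoloMap
surface = {(d,): "surface %d" % d for d in range(14)}
near_surface = {(d,): "near-surface %d" % d for d in range(14)}

surface_1000 = add_dimension(surface, 1, 1000)     # keys become (date, 1000)
near_850 = add_dimension(near_surface, 1, 850)     # keys become (date, 850)

combined = dict(surface_1000)  # copy first so the original is not modified
combined.update(near_850)
print(len(combined))
```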

Using .info we can confirm the two HoloMaps have been successfully merged.

In [22]:
combined_hurricane.info
HoloMap containing 28 items of type RGB
---------------------------------------

Key Dimensions: 
	 Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC 
	 Layer Height (hPa): 850...1000 
Constant Dimensions: 
	 Frame: None...None 
Deep Dimensions: 
	 x: 0...400 
	 y: 0...350 
	 R: 0...1 
	 G: 0...1 
	 B: 0...1 

Merging multiple HoloMaps in this step-by-step way would be cumbersome, and avoiding this complexity is why the Collator object (another instance of Dimensioned) has been provided. Collator will be described in the Columnar Data tutorial.

Now that both the Date and Layer Height are Dimensions on the HoloMap we have various options for laying out our data. We can simply map each Dimension to an axis of a GridSpace:

In [23]:
combined_hurricane.select(Date=(None, None, 2)).grid(['Date', 'Layer Height'])
Out[23]:

Or we can choose to animate one Dimension but not the other:

In [24]:
%output size=300
combined_hurricane.grid(['Date'])[::2]
Out[24]:

Handling missing data

Another powerful property of HoloMaps is that when combined into a Layout via the + operator, their Dimensions are coordinated across each frame. This allows you to handle missing values, because HoloViews will blank out any frames without matching dimension values when combining overlapping dimensions:

In [25]:
%output size=100
surface_wind[0:4] + nearsurface_wind[3:6]
Out[25]:
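One way to picture this coordination is as a union of the keys from both sides, with a blank wherever one side has no matching item. The plain-Python sketch below assumes the two slices above cover frames 0 to 3 and 3 to 5 respectively; it is an illustration of the idea, not of the HoloViews internals.

```python
# Frames present in each of the two sliced HoloMaps (stand-in strings as data)
surface_frames = {frame: "surface %d" % frame for frame in range(0, 4)}
near_frames = {frame: "near-surface %d" % frame for frame in range(3, 6)}

# The combined animation runs over every frame present in either input
all_frames = sorted(set(surface_frames) | set(near_frames))

# None marks a blank panel for that frame
aligned = [
    (frame, surface_frames.get(frame), near_frames.get(frame))
    for frame in all_frames
]
for row in aligned:
    print(row)
```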

This feature becomes particularly important when combining data from different sources that share common dimensions but may not be sampled at precisely the same values. To demonstrate this, let's load some additional data.

Combining heterogeneous data

Using the timestamps, we can look up weather data about different cities via the REST API provided by openweathermap.org as shown below. We'll actually use a cached copy of this data, from the HoloViews website, so that it loads more quickly and more reliably for the purposes of this tutorial.

First we define the new dimensions we will be adding:

In [26]:
temp_dim = hv.Dimension('Temperature', unit="$^o$C")
humidity = hv.Dimension('Humidity', unit='%')
pressure = hv.Dimension('Pressure', unit='hPa')
wind     = hv.Dimension('Wind Speed', unit='km/h')

Now we can load the data into a HoloMap of ItemTables. ItemTables simply associate a value with each of the value dimensions we defined above. We will collect data for a few cities on the East Coast at all the timestamps associated with the satellite wind imagery. As for the satellite imagery, we've prefetched this data, and to find out how to do that yourself just go here.

In [27]:
vdims = [temp_dim, humidity, pressure, wind]
cities = ['New York', 'Washington DC', 'Santiago de Cuba']
main_cols = ['temp', 'humidity', 'pressure']

tables = hv.HoloMap(kdims=['City', date_dim])
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/weather.json').read())
weather_json = json.loads(iobuffer.read().decode())
for entry in weather_json:
    city, date = entry['key']
    tables[str(city), date] = hv.ItemTable(zip(vdims, tuple(entry['value'])))

Since the two datasets share the same timestamps we can now combine them into a Layout.

In [28]:
heterogenous = (surface_reindexed + tables).select(Date=(None,None,2))
heterogenous
Out[28]:

The Date dimension of the satellite data and the City and Date dimensions of the weather data combine seamlessly to give us this multi-dimensional selection widget. Since the satellite data is independent of the City, it stays fixed when you select a different city, while the Date, which is present on both, controls both components of the plot. You can play with the sliders a little bit and explore the data; once you've selected a slider you can also press P to play the animation forward and R to play it in reverse.
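The behaviour described above can be pictured as each component looking up only the dimensions it actually has from the shared widget state. The following plain-Python sketch uses hypothetical stand-in data and helper names (`lookup`, `render`); it illustrates the idea and is not how the widgets are implemented.

```python
def lookup(component, kdims, widget_state):
    """Index a component using only the dimensions it declares."""
    key = tuple(widget_state[dim] for dim in kdims)
    return component.get(key)


# Stand-in data: the satellite images vary by Date only, the tables by City and Date
satellite = {("Oct 25",): "image 1", ("Oct 27",): "image 2"}
weather = {
    ("New York", "Oct 25"): "NY table 1",
    ("New York", "Oct 27"): "NY table 2",
    ("Washington DC", "Oct 25"): "DC table 1",
    ("Washington DC", "Oct 27"): "DC table 2",
}


def render(widget_state):
    """Return what each of the two panels would show for this widget state."""
    return (
        lookup(satellite, ["Date"], widget_state),
        lookup(weather, ["City", "Date"], widget_state),
    )


ny = render({"City": "New York", "Date": "Oct 25"})
dc = render({"City": "Washington DC", "Date": "Oct 25"})
later = render({"City": "Washington DC", "Date": "Oct 27"})
print(ny, dc, later)
```

Changing only the City leaves the first panel unchanged, while changing the Date updates both.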

Now let's put together some of what we've learned. By making use of the slicing we can zoom in for each city on the satellite imagery and place the ItemTable containing the weather data next to it. Then we'll arrange the layout in three columns by calling the .cols method:

In [29]:
%%output size=120
(surface_reindexed[:, 170:270, 200:300] +\
nearsurface_reindexed[:, 170:270, 200:300] +\
tables.select(City='New York').reindex(['Date']).relabel(label='New York', depth=1) +\
surface_reindexed[:, 150:250, 150:250] +\
nearsurface_reindexed[:, 150:250, 150:250] +\
tables.select(City='Washington DC').reindex(['Date']).relabel(label='Washington DC', depth=1) +\
surface_reindexed[:, 140:240, 50:150] +\
nearsurface_reindexed[:, 140:240, 50:150] +\
tables.select(City='Santiago de Cuba').reindex(['Date']).relabel(label='Santiago de Cuba', depth=1)).cols(3)
Out[29]:

Now that you see how to assemble your data into an organization that lets you explore and analyze it, you can study the various Container types that make this possible, especially the section on nested containers. And then just try it out!

In [30]:
%timer
Timer elapsed: 00:00:41
