holoviews.core.data.dask module#

class holoviews.core.data.dask.DaskInterface(*, name)[source]#

Bases: PandasInterface

The DaskInterface allows a Dataset object to wrap a dask DataFrame. Using dask allows data to be loaded lazily and out-of-core operations to be performed, making it possible to work on datasets larger than memory.

The DaskInterface covers almost the complete API exposed by the PandasInterface with three notable exceptions:

  1. Sorting is not supported and any attempt at sorting will be ignored with a warning.

  2. Dask does not easily support adding a new column to an existing DataFrame unless it is a scalar; add_dimension will therefore error when supplied a non-scalar value.

  3. Not all functions can be easily applied to a dask DataFrame, so some functions applied with aggregate and reduce will not work.

Methods

add_dimension(dataset, dimension, dim_pos, ...)

Returns a copy of the data with the dimension values added.

applies(obj)

Indicates whether the interface is designed specifically to handle the supplied object's type.

compute(dataset)

Converts a lazy Dataset to a non-lazy, in-memory format.

dframe(dataset, dimensions)

Returns the data as a pandas.DataFrame containing the selected dimensions.

iloc(dataset, index)

Dask does not support iloc; iloc will therefore execute the task graph and lose the laziness of the operation.

loaded()

Indicates whether the required dependencies are loaded.

nonzero(dataset)

Returns a boolean indicating whether the Dataset contains any data.

persist(dataset)

Persists the data backing the Dataset in memory.

range(dataset, dimension)

Computes the minimum and maximum value along a dimension.

select_mask(dataset, selection)

Given a Dataset object and a dictionary with dimension keys and selection keys (i.e. tuple ranges, slices, sets, lists, or literals), returns a boolean mask over the selected rows.

shape(dataset)

Returns the shape of the data.

unpack_scalar(dataset, data)

Given a dataset object and data in the appropriate format for the interface, return a simple scalar.

values(dataset, dim[, expanded, flat, ...])

Returns the values along a dimension of the dataset.

aggregate

concat_fn

groupby

init

sample

select

sort

Parameter Definitions


classmethod add_dimension(dataset, dimension, dim_pos, values, vdim)[source]#

Returns a copy of the data with the dimension values added.

Parameters:
dataset : Dataset

The Dataset to add the dimension to

dimension : Dimension

The dimension to add

dim_pos : int

The position in the data to add it to

values : array_like

The array of values to add

vdim : bool

Whether the data is a value dimension

Returns:
data

A copy of the data with the new dimension

classmethod applies(obj)[source]#

Indicates whether the interface is designed specifically to handle the supplied object’s type. By default this simply checks whether the object is one of the types declared on the class; however, if the type is expensive to import at load time, the method may be overridden.

classmethod compute(dataset)[source]#

Converts a lazy Dataset to a non-lazy, in-memory format.

Parameters:
dataset : Dataset

The dataset to compute

Returns:
Dataset

Dataset with non-lazy data

Notes

This is a no-op if the data is already non-lazy.

classmethod dframe(dataset, dimensions)[source]#

Returns the data as a pandas.DataFrame containing the selected dimensions.

Parameters:
dataset : Dataset

The dataset to convert

dimensions : list[str]

List of dimensions to include

Returns:
DataFrame

A pandas DataFrame containing the selected dimensions

classmethod iloc(dataset, index)[source]#

Dask does not support iloc; iloc will therefore execute the task graph and lose the laziness of the operation.

classmethod loaded()[source]#

Indicates whether the required dependencies are loaded.

classmethod nonzero(dataset)[source]#

Returns a boolean indicating whether the Dataset contains any data.

Parameters:
dataset : Dataset

The dataset to check

Returns:
bool

Whether the dataset is not empty

classmethod persist(dataset)[source]#

Persists the data backing the Dataset in memory.

Parameters:
dataset : Dataset

The dataset to persist

Returns:
Dataset

Dataset with the data persisted to memory

Notes

This is a no-op if the data is already in memory.

classmethod range(dataset, dimension)[source]#

Computes the minimum and maximum value along a dimension.

Parameters:
dataset : Dataset

The dataset to query

dimension : str or Dimension

Dimension to compute the range on

Returns:
tuple[Any, Any]

Tuple of (min, max) values

Notes

In the past, categorical and string columns were handled by sorting the values and taking the first and last value. This behavior is deprecated and will be removed in 2.0; in future the range for these columns will be returned as (None, None).

classmethod select_mask(dataset, selection)[source]#

Given a Dataset object and a dictionary with dimension keys and selection keys (i.e. tuple ranges, slices, sets, lists, or literals), returns a boolean mask over the rows in the Dataset object that have been selected.

classmethod shape(dataset)[source]#

Returns the shape of the data.

Parameters:
dataset : Dataset

The dataset to get the shape from

Returns:
tuple[int, int]

The shape of the data (rows, cols)

classmethod unpack_scalar(dataset, data)[source]#

Given a dataset object and data in the appropriate format for the interface, return a simple scalar.

classmethod values(dataset, dim, expanded=True, flat=True, compute=True, keep_index=False)[source]#

Returns the values along a dimension of the dataset.

Parameters:
dataset : Dataset

The dataset to query

dim : str or Dimension

Dimension to return the values for

expanded : bool, default True

When False, returns unique values along the dimension

flat : bool, default True

Whether to flatten the array

compute : bool, default True

Whether to load lazy data into memory as a NumPy array

keep_index : bool, default False

Whether to return the data with an index (if present)

Returns:
array_like

Dimension values in the requested format

Notes

The expanded keyword has different behavior for gridded interfaces where it determines whether 1D coordinates are expanded into a multi-dimensional array.