holoviews.core.data.dask module#
- class holoviews.core.data.dask.DaskInterface(*, name)[source]#
Bases: PandasInterface

The DaskInterface allows a Dataset object to wrap a dask DataFrame object. Using dask allows loading data lazily and performing out-of-core operations on the data, making it possible to work on datasets larger than memory.
The DaskInterface covers almost the complete API exposed by the PandasInterface, with a few notable exceptions:

- Sorting is not supported and any attempt at sorting will be ignored with a warning.
- Dask does not easily support adding a new column to an existing dataframe unless it is a scalar; add_dimension will therefore error when supplied a non-scalar value.
- Not all functions can be easily applied to a dask dataframe, so some functions applied with aggregate and reduce will not work.
Methods
- add_dimension(dataset, dimension, dim_pos, ...): Returns a copy of the data with the dimension values added.
- applies(obj): Indicates whether the interface is designed specifically to handle the supplied object's type.
- compute(dataset): Converts a lazy Dataset to a non-lazy, in-memory format.
- dframe(dataset, dimensions): Returns the data as a pandas.DataFrame containing the selected dimensions.
- iloc(dataset, index): Dask does not support iloc, therefore iloc will execute the call graph and lose the laziness of the operation.
- loaded(): Indicates whether the required dependencies are loaded.
- nonzero(dataset): Returns a boolean indicating whether the Dataset contains any data.
- persist(dataset): Persists the data backing the Dataset in memory.
- range(dataset, dimension): Computes the minimum and maximum value along a dimension.
- select_mask(dataset, selection): Given a Dataset object and a dictionary with dimension keys and selection keys (i.e. tuple ranges, slices, sets, lists, or literals), returns a boolean mask over the selected rows.
- shape(dataset): Returns the shape of the data.
- unpack_scalar(dataset, data): Given a dataset object and data in the appropriate format for the interface, return a simple scalar.
- values(dataset, dim[, expanded, flat, ...]): Returns the values along a dimension of the dataset.
aggregate
concat_fn
groupby
init
sample
select
sort
Parameter Definitions
- classmethod add_dimension(dataset, dimension, dim_pos, values, vdim)[source]#
Returns a copy of the data with the dimension values added.
- Parameters:
  - dataset : Dataset
    The Dataset to add the dimension to
  - dimension : Dimension
    The dimension to add
  - dim_pos : int
    The position in the data to add it to
  - values : array_like
    The array of values to add
  - vdim : bool
    Whether the data is a value dimension
- Returns:
  - data
    A copy of the data with the new dimension
- classmethod applies(obj)[source]#
Indicates whether the interface is designed specifically to handle the supplied object's type. By default this simply checks whether the object is one of the types declared on the class; however, if the type is expensive to import at load time the method may be overridden.
- classmethod compute(dataset)[source]#
Converts a lazy Dataset to a non-lazy, in-memory format.
- Parameters:
  - dataset : Dataset
    The dataset to compute
- Returns:
  - Dataset
    Dataset with non-lazy data
Notes
This is a no-op if the data is already non-lazy.
- classmethod dframe(dataset, dimensions)[source]#
Returns the data as a pandas.DataFrame containing the selected dimensions.
- classmethod iloc(dataset, index)[source]#
Dask does not support iloc, therefore iloc will execute the call graph and lose the laziness of the operation.
- classmethod nonzero(dataset)[source]#
Returns a boolean indicating whether the Dataset contains any data.
- Parameters:
  - dataset : Dataset
    The dataset to check
- Returns:
  - bool
    Whether the dataset is not empty
- classmethod persist(dataset)[source]#
Persists the data backing the Dataset in memory.
- Parameters:
  - dataset : Dataset
    The dataset to persist
- Returns:
  - Dataset
    Dataset with the data persisted to memory
Notes
This is a no-op if the data is already in memory.
- classmethod range(dataset, dimension)[source]#
Computes the minimum and maximum value along a dimension.
- Parameters:
  - dataset : Dataset
    The dataset to query
  - dimension : str or Dimension
    Dimension to compute the range on
- Returns:
  - tuple[Any, Any]
    Tuple of (min, max) values
Notes
In the past categorical and string columns were handled by sorting the values and taking the first and last value. This behavior is deprecated and will be removed in 2.0. In future the range for these columns will be returned as (None, None).
- classmethod select_mask(dataset, selection)[source]#
Given a Dataset object and a dictionary with dimension keys and selection keys (i.e. tuple ranges, slices, sets, lists, or literals), return a boolean mask over the rows in the Dataset object that have been selected.
- classmethod unpack_scalar(dataset, data)[source]#
Given a dataset object and data in the appropriate format for the interface, return a simple scalar.
- classmethod values(dataset, dim, expanded=True, flat=True, compute=True, keep_index=False)[source]#
Returns the values along a dimension of the dataset.
- Parameters:
  - dataset : Dataset
    The dataset to query
  - dim : str or Dimension
    Dimension to return the values for
  - expanded : bool, default True
    When False returns unique values along the dimension
  - flat : bool, default True
    Whether to flatten the array
  - compute : bool, default True
    Whether to load lazy data into memory as a NumPy array
  - keep_index : bool, default False
    Whether to return the data with an index (if present)
- Returns:
  - array_like
    Dimension values in the requested format
Notes
The expanded keyword has different behavior for gridded interfaces where it determines whether 1D coordinates are expanded into a multi-dimensional array.