podpac.data.DataSource

class podpac.data.DataSource(**kwargs)[source]

Bases: podpac.core.node.Node

Base node for any data obtained directly from a single source.

Parameters
  • source (Any) – The location of the source. Depending on the child node this can be a filepath, numpy array, or dictionary as a few examples.

  • native_coordinates (podpac.Coordinates) – The coordinates of the data source.

  • interpolation (str, dict, optional) –

    If input is a string, it must match one of the interpolation shortcuts defined in podpac.data.INTERPOLATION_SHORTCUTS. The interpolation method associated with this string will be applied to all dimensions at the same time.

    If input is a dict or list of dict, the dict or dict elements must adhere to the following format:

    The dict must contain the key 'method', which defines the interpolation method name. If the interpolation method is not one of podpac.data.INTERPOLATION_SHORTCUTS, a second key 'interpolators' must be defined with a list of podpac.interpolators.Interpolator classes to use, in order of usage. The dictionary may contain an optional 'params' key which contains a dict of parameters to pass along to the podpac.interpolators.Interpolator classes associated with the interpolation method.

    The dict may contain the key 'dims', which specifies dimension names (e.g. 'time' or ('lat', 'lon')). If the dictionary does not contain a key for all unstacked dimensions of the source coordinates, the podpac.data.INTERPOLATION_DEFAULT value will be used. All dimension keys must be unstacked even if the underlying coordinate dimensions are stacked. Any extra dimensions included but not found in the source coordinates will be ignored.

    The dict may contain a key 'params' that can be used to configure the podpac.interpolators.Interpolator classes associated with the interpolation method.

    If input is a podpac.data.Interpolation class, this Interpolation class will be used without modification.

  • nan_vals (List, optional) – List of values from source data that should be interpreted as ‘no data’ or ‘nans’.

  • coordinate_index_type (str, optional) – Type of index to use for data source. Possible values are ['list', 'numpy', 'xarray', 'pandas']. Default is ‘numpy’.

  • cache_native_coordinates (bool) – Whether to cache native coordinates using the podpac cache_ctrl. Default False.
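
The three accepted shapes of the interpolation parameter described above (string, dict, list of dict) can be sketched as plain Python values. This is only an illustration: the method names 'nearest' and 'bilinear' are assumed to be among podpac.data.INTERPOLATION_SHORTCUTS, the 'spatial_tolerance' parameter name is an assumption, and check_definition is a hypothetical helper written for this example, not part of podpac.

```python
# Assumed shortcut names; consult podpac.data.INTERPOLATION_SHORTCUTS for the real list.
ASSUMED_SHORTCUTS = {"nearest", "bilinear"}

# 1. String form: one shortcut applied to all dimensions at the same time.
interp_str = "nearest"

# 2. Dict form: 'method' plus optional 'params' (parameter name is an assumption).
interp_dict = {"method": "nearest", "params": {"spatial_tolerance": 0.1}}

# 3. List-of-dict form: a different method per group of unstacked dimensions.
interp_list = [
    {"method": "bilinear", "dims": ["lat", "lon"]},
    {"method": "nearest", "dims": ["time"]},
]


def check_definition(definition, shortcuts=ASSUMED_SHORTCUTS):
    """Hypothetical helper: list the format problems found in a definition."""
    if isinstance(definition, str):
        return [] if definition in shortcuts else [f"unknown shortcut {definition!r}"]
    items = definition if isinstance(definition, list) else [definition]
    problems = []
    for item in items:
        if "method" not in item:
            problems.append("missing 'method' key")
        elif item["method"] not in shortcuts and "interpolators" not in item:
            # A non-shortcut method needs an explicit list of Interpolator classes.
            problems.append(
                f"{item['method']!r} is not a shortcut, so 'interpolators' is required"
            )
    return problems


print(check_definition(interp_list))
print(check_definition({"method": "custom"}))
```

The last call reports a problem because a method outside the shortcuts must be accompanied by an 'interpolators' list.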

Notes

Custom DataSource Nodes must implement the get_data() and get_native_coordinates() methods.

Alternative Constructors

from_definition(definition)

Create podpac Node from a dictionary definition.

from_json(s)

Create podpac Node from a JSON definition.

Methods

__init__(**kwargs)

Do not overwrite me

create_output_array(coords[, data])

Initialize an output data array

eval(coordinates[, output])

Evaluates this node using the supplied coordinates.

eval_group(group)

Evaluate the node for each of the coordinates in the group.

find_coordinates()

Get the available native coordinates for the Node.

from_url(url)

Create podpac Node from a WMS/WCS request.

get_cache(key[, coordinates])

Get cached data for this node.

get_data(coordinates, coordinates_index)

This method must be defined by the data source implementing the DataSource class.

get_native_coordinates()

Returns a Coordinates object that describes the native coordinates of the data source.

has_cache(key[, coordinates])

Check for cached data for this node.

init()

Overwrite this method if a node needs to do any additional initialization after the standard initialization.

load(path)

Create podpac Node from file.

put_cache(data, key[, coordinates, overwrite])

Cache data for this node.

rem_cache(key[, coordinates, mode])

Clear cached data for this node.

save(path)

Write node to file.

set_native_coordinates(coordinates[, force])

Set the native_coordinates.

trait_is_defined(name)

Attributes

attrs

List of node attributes

base_ref

Default reference/name in node definitions

cache_ctrl

A trait whose value must be an instance of a specified class.

cache_native_coordinates

A boolean (True, False) trait.

cache_output

A boolean (True, False) trait.

cache_update

A boolean (True, False) trait.

coordinate_index_type

An enum whose value must be in a given sequence.

definition

dtype

A trait which allows any value.

hash

interpolation

interpolation_class

Get the interpolation class currently set for this data source.

interpolators

Return the interpolators selected for the previous node evaluation interpolation.

json

json_pretty

nan_vals

An instance of a Python list.

native_coordinates

The coordinates of the data source.

output

A trait for unicode strings.

outputs

An instance of a Python list.

style

A trait whose value must be an instance of a specified class.

units

A trait for unicode strings.

Members

__init__(**kwargs)

Do not overwrite me

cache_native_coordinates

A boolean (True, False) trait.

coordinate_index_type

An enum whose value must be in a given sequence.

eval(coordinates, output=None)[source]

Evaluates this node using the supplied coordinates.

The native coordinates are mapped to the requested coordinates, interpolated if necessary, and set to _requested_source_coordinates with associated index _requested_source_coordinates_index. The requested source coordinates and index are passed to get_data(), which returns the source data at the native coordinates; this is stored in _requested_source_data. Finally, _requested_source_data is interpolated using the interpolate method and set to the output attribute of the node.

Parameters
  • coordinates (podpac.Coordinates) –

    The set of coordinates requested by a user. The Node will be evaluated using these coordinates.

    An exception is raised if the requested coordinates are missing dimensions that are present in the DataSource. Extra dimensions in the requested coordinates are dropped.

  • output (podpac.UnitsDataArray, optional) – Default is None. Optional input array used to store the output data. When supplied, the node will not allocate its own memory for the output array. This array needs to have the correct dimensions, coordinates, and coordinate reference system.

Returns

Unit-aware xarray DataArray containing the results of the node evaluation.

Return type

podpac.UnitsDataArray

Raises

ValueError – Cannot evaluate these coordinates
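
The evaluation steps described above (select the source subset, call get_data(), interpolate) can be pictured with a toy one-dimensional version. This is a sketch in plain Python, not podpac's implementation: the module-level lists native and source, the function evaluate_1d, and the choice of nearest-neighbor interpolation are all assumptions made for illustration.

```python
# Toy 1-D data source: sorted native coordinates and one value per coordinate.
native = [0.0, 1.0, 2.0, 3.0, 4.0]
source = [10.0, 11.0, 12.0, 13.0, 14.0]


def get_data(coordinates, coordinates_index):
    """Return the source values selected by the index (a single slice here)."""
    return source[coordinates_index]


def evaluate_1d(requested):
    """Hypothetical 1-D analogue of the eval() steps."""
    lo, hi = min(requested), max(requested)
    # Step 1: find the native coordinates that bracket the requested bounds.
    start = max((i for i, c in enumerate(native) if c <= lo), default=0)
    stop = min((i for i, c in enumerate(native) if c >= hi), default=len(native) - 1)
    index = slice(start, stop + 1)
    src_coords = native[index]
    # Step 2: fetch only that subset from the source.
    src_data = get_data(src_coords, index)
    # Step 3: nearest-neighbor interpolation onto the requested coordinates.
    output = []
    for r in requested:
        nearest = min(range(len(src_coords)), key=lambda i: abs(src_coords[i] - r))
        output.append(src_data[nearest])
    return output


print(evaluate_1d([1.2, 2.6]))
```

Only three of the five source values are read for this request, which is the point of passing a subset index to get_data().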

find_coordinates()[source]

Get the available native coordinates for the Node. For a DataSource, this is just the native_coordinates.

Returns

coords_list – singleton list containing the native_coordinates (Coordinates object)

Return type

list

get_data(coordinates, coordinates_index)[source]

This method must be defined by the data source implementing the DataSource class. When data source nodes are evaluated, this method is called with the requested coordinates and coordinate indexes. The implementing method can choose which input provides the most efficient way of getting data (i.e., via the coordinates or via the index of the coordinates).

Coordinates and coordinate indexes may be strided or subsets of the source data, but all coordinates and coordinate indexes will match 1:1 with the subset data.

This method may return a numpy array, an xarray DataArray, or a podpac UnitsDataArray. If a numpy array or xarray DataArray is returned, podpac.data.DataSource.eval() will cast the data into a UnitsDataArray using the requested source coordinates. If a podpac UnitsDataArray is passed back, the podpac.data.DataSource.eval() method will not do any further processing. The inherited Node method create_output_array can be used to generate the template UnitsDataArray in your DataSource. See podpac.Node.create_output_array() for more details.

Parameters
  • coordinates (podpac.Coordinates) – The coordinates that need to be retrieved from the data source using the coordinate system of the data source

  • coordinates_index (List) – A list of slices or a boolean array that gives the indices of the data that needs to be retrieved from the data source. The values in coordinates_index will vary depending on the coordinate_index_type defined for the data source.

Returns

A subset of the source data. If a numpy array or xarray DataArray is returned, the data will be cast into a UnitsDataArray using the returned data to fill values at the requested source coordinates.

Return type

np.ndarray, xr.DataArray, podpac.UnitsDataArray

Raises

NotImplementedError – This needs to be implemented by derived classes
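
As a rough illustration of the two get_data() inputs, the sketch below uses plain Python so that it runs without podpac. The class GridSource, its nested-list source, and the dict standing in for a podpac.Coordinates object are hypothetical. A real implementation would subclass podpac.data.DataSource and return a numpy array, xarray DataArray, or UnitsDataArray.

```python
class GridSource:
    """Hypothetical stand-in for a DataSource subclass (no podpac import)."""

    # Native coordinates of the toy grid: 3 latitudes x 4 longitudes.
    lat = [10.0, 20.0, 30.0]
    lon = [0.0, 5.0, 10.0, 15.0]
    # source[i][j] holds the value at (lat[i], lon[j]).
    source = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
    ]

    def get_data(self, coordinates, coordinates_index):
        # This source is cheapest to read by index: one slice per dimension.
        lat_slice, lon_slice = coordinates_index
        return [row[lon_slice] for row in self.source[lat_slice]]


node = GridSource()
# A subset along lat and a strided subset along lon.
index = [slice(0, 2), slice(1, 4, 2)]
# The coordinates passed alongside the index describe exactly the same subset.
coords = {"lat": node.lat[index[0]], "lon": node.lon[index[1]]}
subset = node.get_data(coords, index)
print(coords, subset)
```

The returned subset has one row per requested latitude and one column per requested longitude, which is the 1:1 match between coordinates, index, and data described above.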

get_native_coordinates()[source]

Returns a Coordinates object that describes the native coordinates of the data source.

In most cases, this method is defined by the data source implementing the DataSource class. If the method is not implemented by the data source, it will try to return self.native_coordinates if self.native_coordinates is not None.

Otherwise, this method will raise a NotImplementedError.

Returns

The coordinates describing the data source array.

Return type

podpac.Coordinates

Notes

Pay attention to:
  • the order of the dimensions
  • the stacking of the dimensions
  • the type of coordinates

Coordinates should be non-nan and non-repeating for best compatibility.

Raises

NotImplementedError – This needs to be implemented by derived classes
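
The fallback behavior described above can be sketched with plain Python classes. This is only an illustration of the documented logic, not podpac's code: BaseSource and FileSource are hypothetical names, and dicts stand in for podpac.Coordinates objects.

```python
class BaseSource:
    """Hypothetical stand-in for the base class fallback logic."""

    def __init__(self, native_coordinates=None):
        self.native_coordinates = native_coordinates

    def get_native_coordinates(self):
        # Fall back to coordinates supplied at construction time, if any.
        if self.native_coordinates is not None:
            return self.native_coordinates
        raise NotImplementedError("This needs to be implemented by derived classes")


class FileSource(BaseSource):
    """Hypothetical derived source that defines its own coordinates."""

    def get_native_coordinates(self):
        # Dimension order matters: it should match the layout of the source data.
        return {"lat": [10.0, 20.0, 30.0], "lon": [0.0, 5.0]}


preset = BaseSource(native_coordinates={"time": ["2020-01-01"]})
derived = FileSource()

try:
    BaseSource().get_native_coordinates()
    raised = False
except NotImplementedError:
    raised = True

print(preset.get_native_coordinates(), derived.get_native_coordinates(), raised)
```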

interpolation

property interpolation_class

Get the interpolation class currently set for this data source.

The DataSource interpolation property is used to define the podpac.data.Interpolation class that will handle interpolation for requested coordinates.

Returns

Interpolation class defined by DataSource interpolation definition

Return type

podpac.data.Interpolation

property interpolators

Return the interpolators selected for the previous node evaluation interpolation. If the node has not been evaluated, or if interpolation was not necessary, this will return an empty OrderedDict.

Returns

Keys are tuples of unstacked dimensions; each value is the interpolator used to interpolate those dimensions.

Return type

OrderedDict

nan_vals

An instance of a Python list.

property native_coordinates

The coordinates of the data source.

set_native_coordinates(coordinates, force=False)[source]

Set the native_coordinates. Used by Compositors as an optimization.

Parameters

coordinates (podpac.Coordinates) – Coordinates to set. Usually these are coordinates that are shared across compositor sources.