podpac.data.DataSource

class podpac.data.DataSource(**kwargs: Any)

Bases: Node

Base node for any data obtained directly from a single source.

Parameters:

source (Any) – The location of the source. Depending on the child node, this can be, for example, a filepath, a numpy array, or a dictionary.
coordinates (podpac.Coordinates) – The coordinates of the data source.
nan_vals (List, optional) – List of values from source data that should be interpreted as ‘no data’ or ‘nans’.
coordinate_index_type (str, optional) – Type of index to use for data source. Possible values are ['slice', 'numpy', 'xarray']. Default is ‘numpy’, which allows a tuple of integer indices.
cache_coordinates (bool) – Whether to cache coordinates using the podpac cache_ctrl. Default False.
cache_output (bool) – Whether the node’s output should be cached. If not provided or None, the default is taken from settings["CACHE_DATASOURCE_OUTPUT_DEFAULT"]. If True, outputs will be cached and retrieved from cache. If False, outputs will be neither cached nor retrieved from cache (even if they exist in cache).
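
The effect of nan_vals can be pictured with a small stand-alone sketch. It does not call podpac; the helper `mask_nan_vals` and the sample values are hypothetical and only mimic the idea that any source value listed in nan_vals is treated as missing.

```python
import math

def mask_nan_vals(values, nan_vals):
    """Return a copy of values with every entry found in nan_vals replaced by NaN.

    Hypothetical helper for illustration only; not part of podpac.
    """
    missing = set(nan_vals)
    return [math.nan if v in missing else v for v in values]

# Two sentinel values commonly used as 'no data' markers (assumed for this example).
raw = [12.5, -9999, 14.0, -32768, 15.5]
cleaned = mask_nan_vals(raw, nan_vals=[-9999, -32768])
```

Valid readings are passed through unchanged, while both sentinel values become NaN.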

Notes

Custom DataSource Nodes must implement the get_data() and get_coordinates() methods.

Alternative Constructors

`from_definition`(definition)	Create podpac Node from a dictionary definition.
`from_json`(s)	Create podpac Node from a JSON definition.

Methods

`__init__`(**kwargs)	Do not overwrite me
`create_output_array`(coords[, data, attrs, ...])	Initialize an output data array
`eval`(coordinates, **kwargs)	Wraps the super Node.eval method in order to cache with the correct coordinates.
`eval_group`(group)	Evaluate the node for each of the coordinates in the group.
`find_coordinates`()	Get the available coordinates for the Node.
`from_name_params`(name[, params])	Create podpac Node from a WMS/WCS request.
`from_url`(url)	Create podpac Node from a WMS/WCS request.
`get_bounds`([crs])	Get the full available coordinate bounds for the Node.
`get_cache`(key[, coordinates])	Get cached data for this node.
`get_coordinates`()	Returns a Coordinates object that describes the coordinates of the data source.
`get_data`(coordinates, coordinates_index)	This method must be defined by the data source implementing the DataSource class.
`get_source_data`([bounds])	Get source data, without interpolation.
`get_ui_spec`([help_as_html])	Get spec of node attributes for building a ui
`has_cache`(key[, coordinates])	Check for cached data for this node.
`init`()	Overwrite this method if a node needs to do any additional initialization after the standard initialization.
`load`(path)	Create podpac Node from file.
`probe`([lat, lon, time, alt, crs])	Evaluates every part of a node / pipeline at a point and records which nodes are actively being used.
`put_cache`(data, key[, coordinates, expires, ...])	Cache data for this node.
`rem_cache`(key[, coordinates, mode])	Clear cached data for this node.
`save`(path)	Write node to file.
`set_coordinates`(coordinates[, force])	Set the coordinates.
`trait_defaults`(names, *metadata)	Return a trait's default value or a dictionary of them
`trait_has_value`(name)	Returns True if the specified trait has a value.
`trait_is_defined`(name)
`trait_values`(**metadata)	A `dict` of trait names and their values.

Attributes

`attrs`	List of node attributes
`base_ref`	Default reference/name in node definitions
`boundary`	An instance of a Python dict.
`cache_coordinates`	A boolean (True, False) trait.
`cache_ctrl`	A trait whose value must be an instance of a specified class.
`cache_output`	A boolean (True, False) trait.
`coordinate_index_type`	An enum whose value must be in a given sequence.
`coordinates`	{coordinates}
`definition`
`dims`	datasource dims.
`dtype`	An enum whose value must be in a given sequence.
`force_eval`	A boolean (True, False) trait.
`hash`
`json`	Definition for this node in JSON format.
`json_pretty`	Definition for this node in JSON format, with indentation suitable for display.
`nan_val`	A trait which allows any value.
`nan_vals`	An instance of a Python list.
`output`	A trait for unicode strings.
`outputs`	An instance of a Python list.
`style`	A trait whose value must be an instance of a specified class.
`udims`	datasource udims.
`units`	A trait for unicode strings.

Members:

__init__(**kwargs): Do not overwrite me

boundary

An instance of a Python dict.

One or more traits can be passed to the constructor to validate the keys and/or values of the dict. If you need more detailed validation, you may use a custom validator method.

Changed in version 5.0: Added key_trait for validating dict keys.

Changed in version 5.0: Deprecated ambiguous trait, traits args in favor of value_trait, per_key_traits.

cache_coordinates: A boolean (True, False) trait.

cache_output: A boolean (True, False) trait.

coordinate_index_type: An enum whose value must be in a given sequence.

property coordinates: {coordinates}

property dims: datasource dims.

eval(coordinates, **kwargs)

Wraps the super Node.eval method in order to cache with the correct coordinates.

The output is independent of the crs or any extra dimensions, so this transforms and removes extra dimensions before caching in the super eval method.

find_coordinates()

Get the available coordinates for the Node. For a DataSource, this is just the coordinates.

Returns: coords_list – singleton list containing the coordinates (Coordinates object)
Return type: list

get_bounds(crs='default')

Get the full available coordinate bounds for the Node.

Parameters:

crs (str, optional) – Desired CRS for the bounds. Use ‘source’ to use the native source crs. If not specified, podpac.settings["DEFAULT_CRS"] is used.

Returns:

bounds (dict) – Bounds for each dimension. Keys are dimension names and values are tuples (min, max).
crs (str) – The crs for the bounds.
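
As an illustration of the bounds layout described above, the following stand-alone sketch builds a dict of the same shape from plain Python lists. It does not call podpac; `bounds_from_coords` and the sample coordinate values are assumptions made for the example.

```python
def bounds_from_coords(coords):
    """Map each dimension name to a (min, max) tuple.

    Hypothetical helper; `coords` is a plain dict of dimension name -> list of values.
    """
    return {dim: (min(vals), max(vals)) for dim, vals in coords.items()}

coords = {"lat": [10.0, 15.0, 20.0], "lon": [-75.0, -70.0]}
bounds = bounds_from_coords(coords)
```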

get_coordinates()

Returns a Coordinates object that describes the coordinates of the data source.

In most cases, this method is defined by the data source implementing the DataSource class. If the method is not implemented by the data source, it will try to return self.coordinates if self.coordinates is not None.

Otherwise, this method will raise a NotImplementedError.

Returns: The coordinates describing the data source array.
Return type: podpac.Coordinates

Notes

Pay attention to:

- the order of the dimensions
- the stacking of the dimensions
- the type of coordinates

Coordinates should be non-nan and non-repeating for best compatibility.

Raises: NotImplementedError – This needs to be implemented by derived classes

get_data(coordinates, coordinates_index)

This method must be defined by the data source implementing the DataSource class. When data source nodes are evaluated, this method is called with the request coordinates and coordinate indexes. The implementing method can choose whichever input provides the most efficient way of getting data (i.e., via the coordinates or via the index of the coordinates).

Coordinates and coordinate indexes may be strided or subsets of the source data, but all coordinates and coordinate indexes will match 1:1 with the subset data.

This method may return a numpy array, an xarray DataArray, or a podpac UnitsDataArray. If a numpy array or xarray DataArray is returned, podpac.data.DataSource.eval() will cast the data into a UnitsDataArray using the requested source coordinates. If a podpac UnitsDataArray is passed back, the podpac.data.DataSource.eval() method will not do any further processing. The inherited Node method create_output_array can be used to generate the template UnitsDataArray in your DataSource. See podpac.Node.create_output_array() for more details.

Parameters:

coordinates (podpac.Coordinates) – The coordinates that need to be retrieved from the data source using the coordinate system of the data source
coordinates_index (List) – A list of slices or a boolean array that gives the indices of the data that needs to be retrieved from the data source. The values in coordinates_index will vary depending on the coordinate_index_type defined for the data source.

Returns:

The requested subset of the source data. If a numpy array or xarray DataArray is returned, it will be cast into a UnitsDataArray, with the returned values filled in at the requested source coordinates.

Return type:

np.ndarray, xr.DataArray, podpac.UnitsDataArray

Raises:

NotImplementedError – This needs to be implemented by derived classes
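
To make the role of coordinates_index concrete, here is a minimal stand-alone sketch of the 'slice' case, in which the index is a list with one slice per dimension. It uses nested Python lists instead of a real data source, and `read_subset` is a hypothetical name; an actual get_data() implementation would more typically apply the same index to a numpy array or an open file handle.

```python
def read_subset(grid, coordinates_index):
    """Apply one slice per dimension to a 2-D nested list (rows = lat, columns = lon).

    Hypothetical stand-in for the body of a get_data() implementation.
    """
    row_slice, col_slice = coordinates_index
    return [row[col_slice] for row in grid[row_slice]]

grid = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
]
# Request the last two rows and every second column, i.e. a strided subset.
subset = read_subset(grid, [slice(1, 3), slice(0, 4, 2)])
```

The result has one value per requested (row, column) pair, which is the 1:1 match between the index and the subset data that the description above refers to.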

get_source_data(bounds={})

Get source data, without interpolation.

Parameters: bounds (dict) – Dictionary of bounds by dimension, optional. Keys must be dimension names, and values are (min, max) tuples, e.g. {'lat': (10, 20)}.
Returns: data – Source data
Return type: UnitsDataArray
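
A bounds dictionary of this form can be read as a per-dimension range filter. The sketch below is stand-alone and does not call podpac; `select_within_bounds` is hypothetical, and treating the limits as inclusive is an assumption of this example rather than documented podpac behavior.

```python
def select_within_bounds(coords, bounds):
    """Keep, per dimension, only the values inside the (min, max) range.

    Dimensions that are missing from `bounds` are left untouched.
    Limits are treated as inclusive (an assumption for this sketch).
    """
    selected = {}
    for dim, vals in coords.items():
        if dim in bounds:
            lo, hi = bounds[dim]
            selected[dim] = [v for v in vals if lo <= v <= hi]
        else:
            selected[dim] = list(vals)
    return selected

coords = {"lat": [5, 10, 15, 20, 25], "lon": [-80, -75, -70]}
selected = select_within_bounds(coords, {"lat": (10, 20)})
```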

nan_val: A trait which allows any value.

nan_vals: An instance of a Python list.

set_coordinates(coordinates, force=False)

Set the coordinates. Used by Compositors as an optimization.

Parameters:

coordinates (podpac.Coordinates) – Coordinates to set. Usually these are coordinates that are shared across compositor sources.
NOTE: This is only currently used by SMAPCompositor. It should potentially be moved to the SMAPSource.

property udims: datasource udims.