podpac.data.S3

class podpac.data.S3(**kwargs)[source]

Bases: podpac.core.data.datasource.DataSource

Create a DataSource from a file on an S3 Bucket.

node

The DataSource node used to interpret the S3 file

Type

Node, optional

node_class

The class of self.node. It is used to create self.node when self.node is not specified.

Type

DataSource, optional

node_kwargs

Keyword arguments passed to node_class when automatically creating node

Type

dict, optional

return_type

Either ‘file_handle’ (the file is downloaded to RAM) or ‘path’ (the default; the file is downloaded to disk).

Type

str, optional

s3_bucket

Name of the S3 bucket. Uses podpac.settings['S3_BUCKET_NAME'] by default.

Type

str, optional

s3_data

If return_type == ‘file_handle’, a file pointer object. If return_type == ‘path’, a string giving the path to the data.

Type

file/str

source

Path to the file in the S3 bucket that will be loaded.

Type

str
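The attribute descriptions above can be illustrated without touching S3. The sketch below is not podpac code: FakeS3Source, EchoNode, and the in-memory payload argument are hypothetical stand-ins, and passing s3_data to node_class as source is an assumption about how the default node is built. It only shows how return_type could select what s3_data holds, and how node might be created from node_class and node_kwargs when it is not supplied.

```python
import io
import os
import tempfile


class EchoNode:
    """Hypothetical stand-in for the DataSource that interprets the file."""

    def __init__(self, source, **kwargs):
        self.source = source
        self.kwargs = kwargs


class FakeS3Source:
    """Illustrative only: mimics the attribute semantics described above."""

    def __init__(self, payload, source, return_type="path",
                 node=None, node_class=EchoNode, node_kwargs=None):
        if return_type not in ("path", "file_handle"):
            raise ValueError("return_type must be 'path' or 'file_handle'")
        self.payload = payload            # bytes that would come from the bucket
        self.source = source              # key of the file inside the bucket
        self.return_type = return_type
        self.node_class = node_class
        self.node_kwargs = node_kwargs or {}
        self._node = node

    @property
    def s3_data(self):
        if self.return_type == "file_handle":
            # Keep the downloaded bytes in RAM and hand back a file-like object.
            return io.BytesIO(self.payload)
        # Otherwise write the bytes to a temporary file and hand back its path.
        suffix = os.path.splitext(self.source)[1]
        fd, path = tempfile.mkstemp(suffix=suffix)
        with os.fdopen(fd, "wb") as f:
            f.write(self.payload)
        return path

    @property
    def node(self):
        # Build the node lazily, only if the caller did not provide one.
        if self._node is None:
            self._node = self.node_class(source=self.s3_data, **self.node_kwargs)
        return self._node


in_ram = FakeS3Source(b"abc", "folder/data.bin", return_type="file_handle")
on_disk = FakeS3Source(b"abc", "folder/data.bin")  # return_type defaults to 'path'
```

Here in_ram.s3_data is a readable file object, while on_disk.s3_data is a string path to a temporary copy of the file.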

Methods

__init__(**kwargs)

Do not overwrite me

create_output_array(coords[, data])

Initialize an output data array

eval(coordinates[, output])

Evaluates this node using the supplied coordinates.

eval_group(group)

Evaluate the node for each of the coordinates in the group.

find_coordinates()

Get the available native coordinates for the Node.

get_cache(key[, coordinates])

Get cached data for this node.

get_data(coordinates, coordinates_index)

This method must be defined by the data source implementing the DataSource class.

get_native_coordinates()

Returns a Coordinates object that describes the native coordinates of the data source.

has_cache(key[, coordinates])

Check for cached data for this node.

init()

Overwrite this method if a node needs to do any additional initialization after the standard initialization.

put_cache(data, key[, coordinates, overwrite])

Cache data for this node.

rem_cache(key[, coordinates, mode, all_cache])

Clear cached data for this node.

Attributes

base_definition

Base node definition for DataSource nodes.

base_ref

Default pipeline node reference/name in pipeline node definitions

cache_ctrl

A trait whose value must be an instance of a specified class.

cache_output

A boolean (True, False) trait.

cache_update

A boolean (True, False) trait.

coordinate_index_type

An enum whose value must be in a given sequence.

definition

Full pipeline definition for this node.

dtype

A trait which allows any value.

hash

interpolation

A trait type representing a Union type.

interpolation_class

Get the interpolation class currently set for this data source.

interpolators

Return the interpolators selected for the previous node evaluation interpolation.

json

Definition for this node in JSON format.

json_pretty

nan_vals

An instance of a Python list.

native_coordinates

The coordinates of the data source.

node

node_class

A trait whose value must be a subclass of a specified class.

node_default

node_kwargs

An instance of a Python dict.

pipeline

Create a pipeline node from this node

return_type

An enum whose value must be in a given sequence.

s3_bucket

A trait for unicode strings.

s3_bucket_default

s3_data

A trait which allows any value.

s3_data_default

source

A trait for unicode strings.

style

A trait whose value must be an instance of a specified class.

units

A trait for unicode strings.

Members

__init__(**kwargs)

Do not overwrite me

get_data(coordinates, coordinates_index)[source]

This method must be defined by the data source implementing the DataSource class. When data source nodes are evaluated, this method is called with request coordinates and coordinate indexes. The implementing method can choose which input provides the most efficient method of getting data (i.e., via coordinates or via the index of the coordinates).

Coordinates and coordinate indexes may be strided or subsets of the source data, but all coordinates and coordinate indexes will match 1:1 with the subset data.

This method may return a numpy array, an xarray DataArray, or a podpac UnitsDataArray. If a numpy array or xarray DataArray is returned, podpac.data.DataSource.evaluate() will cast the data into a UnitsDataArray using the requested source coordinates. If a podpac UnitsDataArray is passed back, the podpac.data.DataSource.evaluate() method will not do any further processing. The inherited Node method create_output_array can be used to generate the template UnitsDataArray in your DataSource. See podpac.Node.create_output_array() for more details.

Parameters
  • coordinates (podpac.Coordinates) – The coordinates that need to be retrieved from the data source using the coordinate system of the data source

  • coordinates_index (List) – A list of slices or a boolean array that gives the indices of the data that need to be retrieved from the data source. The values in coordinates_index will vary depending on the coordinate_index_type defined for the data source.

Returns

A subset of the source data. If a numpy array or xarray DataArray is returned, the data will be cast into a UnitsDataArray, using the returned data to fill values at the requested source coordinates.

Return type

np.ndarray, xr.DataArray, podpac.UnitsDataArray
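As a rough illustration of how a get_data implementation might use coordinates_index, the sketch below subsets an in-memory numpy array. It does not import podpac: the source_grid array and the get_data_by_slices and get_data_by_masks helpers are hypothetical, and treating the boolean case as one mask per dimension is an assumption rather than something stated above.

```python
import numpy as np

# Hypothetical 4 x 5 source grid; a real data source would read this from a file.
source_grid = np.arange(20).reshape(4, 5)


def get_data_by_slices(data, coordinates_index):
    """Select a (possibly strided) block using one slice per dimension."""
    return data[tuple(coordinates_index)]


def get_data_by_masks(data, coordinates_index):
    """Select rows and columns using one boolean mask per dimension."""
    # np.ix_ builds an open mesh so the masks combine as an outer selection.
    return data[np.ix_(*coordinates_index)]


# Every second row, columns 1 through 3.
strided = get_data_by_slices(source_grid, [slice(0, 4, 2), slice(1, 4)])

# First and last row, second and last column.
row_mask = np.array([True, False, False, True])
col_mask = np.array([False, True, False, False, True])
masked = get_data_by_masks(source_grid, [row_mask, col_mask])
```

In both cases the returned array has one value per requested coordinate, which matches the 1:1 correspondence described above; the plain numpy result would then be cast into a UnitsDataArray by the caller.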

native_coordinates

The coordinates of the data source.

node

node_class

A trait whose value must be a subclass of a specified class.

node_default

node_kwargs

An instance of a Python dict.

return_type

An enum whose value must be in a given sequence.

s3_bucket

A trait for unicode strings.

s3_bucket_default

s3_data

A trait which allows any value.

s3_data_default

source

A trait for unicode strings.