Wrapping Datasets
Wrapping a new dataset is challenging because you have to understand all of the quirks of the new dataset and deal with the quirks of PODPAC as well. This reference is meant to record a few rules of thumb when wrapping new datasets to help you deal with the latter.
Rules
When evaluating a node with a set of coordinates:
The evaluation coordinates must include ALL of the dimensions present in the source dataset
The evaluation coordinates MAY contain additional dimensions NOT present in the source dataset, and the source may ignore these
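The two rules above amount to a check on dimension names. A minimal sketch in plain Python, assuming dimensions are identified by name; `check_eval_dims` is a hypothetical helper for illustration and is not part of PODPAC:

```python
def check_eval_dims(source_dims, eval_dims):
    """Return the evaluation dims the source will use, in evaluation order.

    Hypothetical helper for illustration; not part of PODPAC.
    """
    missing = [d for d in source_dims if d not in eval_dims]
    if missing:
        raise ValueError(f"evaluation coordinates are missing dims: {missing}")
    # Extra dims (e.g. 'time' for a static lat/lon source) are simply ignored.
    return [d for d in eval_dims if d in source_dims]


used = check_eval_dims(["lat", "lon"], ["time", "lat", "lon"])
```

Here `used` keeps only the dimensions the source knows about, while a request that omits `lat` or `lon` raises an error.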
When returning data from a data source node:
The ORDER of the evaluation coordinates MUST be preserved (see UnitsDataArray.part_transpose)
Any multi-channel data must be returned using the output dimension, which is ALWAYS the LAST dimension
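The ordering rules above can be sketched as a small function that computes the dimension order a returned array should have. This is an illustration in plain Python, not the behavior of `UnitsDataArray.part_transpose` itself; `output_dim_order` is a hypothetical name:

```python
def output_dim_order(eval_dims, data_dims):
    """Order data dims to follow the evaluation coordinates, with 'output' last.

    Hypothetical helper for illustration; not part of PODPAC.
    """
    ordered = [d for d in eval_dims if d in data_dims and d != "output"]
    if "output" in data_dims:
        ordered.append("output")
    return ordered


order = output_dim_order(["lat", "lon", "time"], ["output", "time", "lon", "lat"])
```

With an xarray-backed array, a list like `order` could then be passed to `DataArray.transpose(*order)`.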
Nodes should be lightweight to instantiate, and users should expect failures to surface at eval. Easy checks should be performed on initialization, but anything expensive should be delayed.
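One way to follow this rule is to validate arguments cheaply in the constructor and defer the expensive work to a cached property. The sketch below uses a plain stand-in class rather than a real PODPAC node; `LazySource`, `path`, and `dataset` are hypothetical names:

```python
from functools import cached_property


class LazySource:
    """Stand-in for a node; does not inherit from any PODPAC class."""

    def __init__(self, path):
        # Cheap check: validate the argument without touching the file.
        if not isinstance(path, str) or not path:
            raise ValueError("path must be a non-empty string")
        self.path = path

    @cached_property
    def dataset(self):
        # Expensive step: runs only on first access, e.g. during evaluation.
        with open(self.path) as f:
            return f.read()


# Instantiation succeeds because nothing is opened yet.
node = LazySource("/nonexistent/podpac_example.txt")
```

Accessing `node.dataset` is what triggers the file open, so a bad path only fails at that point.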
Guide
In theory, to wrap a new DataSource:
Create a new class that inherits from podpac.core.data.DataSource or a derived class (see the podpac.core.data module for generic data readers).
Implement a method for opening/accessing the data, or use an existing generic data node and hard-code certain attributes.
Implement the get_coordinates(self) method.
Implement the get_data(self, coordinates, coordinates_index) method, where:
coordinates is a podpac.Coordinates object in the same coordinate system as the data source (i.e. a subset of what comes out of get_coordinates).
coordinates_index is a list (or possibly a tuple) of slices, boolean arrays, or index arrays that index into the output of get_coordinates() to produce the coordinates passed into this function.
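The relationship between get_coordinates, coordinates, and coordinates_index can be illustrated with a toy class. This is only a sketch of the contract: `ToyGridSource` is hypothetical, does not inherit from podpac.core.data.DataSource, and uses plain lists and dicts where a real source would use podpac.Coordinates and array types.

```python
class ToyGridSource:
    """Stand-in showing the get_coordinates / get_data contract.

    Not a PODPAC class; plain lists and dicts replace the real types.
    """

    def __init__(self):
        self._lat = [10.0, 20.0, 30.0]
        self._lon = [100.0, 110.0, 120.0, 130.0]
        # One value per (lat, lon) cell, stored row-major by lat.
        self._values = [[i * 10 + j for j in range(4)] for i in range(3)]

    def get_coordinates(self):
        # Native coordinates of the full dataset.
        return {"lat": self._lat, "lon": self._lon}

    def get_data(self, coordinates, coordinates_index):
        # coordinates_index holds one slice per dimension; applying it to the
        # native coordinates yields exactly `coordinates`.
        lat_idx, lon_idx = coordinates_index
        return [row[lon_idx] for row in self._values[lat_idx]]


source = ToyGridSource()
native = source.get_coordinates()
index = (slice(0, 2), slice(1, 3))
requested = {"lat": native["lat"][index[0]], "lon": native["lon"][index[1]]}
data = source.get_data(requested, index)
```

The same `index` selects both the requested coordinates and the matching block of data, which is the invariant get_data relies on.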
In practice, the real trick is implementing a compositor to put multiple tiles together so that they look like a single DataSource. We tend to use the podpac.compositor.OrderedCompositor node for this task, but it does not handle interpolation between tiles. For that, see the podpac.core.compositor.tile_compositor module.
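The general idea of order-based compositing can be sketched in plain Python: earlier sources take priority, and later sources only fill cells that are still missing. This is an assumption about the concept, not the actual OrderedCompositor implementation; `composite_ordered` is a hypothetical helper and `None` marks a missing value.

```python
def composite_ordered(tiles):
    """Combine equally shaped 1-D tiles; earlier tiles take priority.

    Hypothetical helper for illustration; None marks a missing value.
    """
    result = list(tiles[0])
    for tile in tiles[1:]:
        for i, value in enumerate(tile):
            if result[i] is None:
                result[i] = value
    return result


west = [1.0, 2.0, None, None]
east = [None, 9.0, 3.0, 4.0]
merged = composite_ordered([west, east])
```

Note that each cell comes from exactly one tile, so nothing is interpolated across the seam between `west` and `east`.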
When using compositors, it is preferred that the sources attribute is populated at instantiation, but on-the-fly (i.e. at eval) population of sources is also acceptable and sometimes necessary for certain data sources.
For examples, check the podpac.datalib module.
Happy wrapping!