Interpolation
Description
PODPAC lets users specify an interpolation scheme for each node, with per-dimension
control where needed, and also lets users write their own interpolators. By default,
PODPAC uses the value of podpac.settings["DEFAULT_INTERPOLATION"], which is "nearest"
unless the user changes it. Users who want to see the raw data source can set the
interpolation to "none".
Examples
Consider a DataSource with lat, lon, time coordinates that we will instantiate as:
node = DataSource(..., interpolation=interpolation)
interpolation can be specified …
…as a string
interpolation='nearest'
Description: All dimensions are interpolated using nearest-neighbor interpolation. This is the default; the available options are listed in podpac.core.interpolation.interpolation.INTERPOLATION_METHODS. For no interpolation, use interpolation="none". NOTE: the none interpolator ONLY considers the bounds of the evaluated coordinates, so the data is returned at FULL resolution (no striding or sub-selection).
Details: PODPAC automatically selects appropriate interpolators based on the source coordinates and the eval coordinates. The default interpolator order for each method can be found in podpac.core.interpolation.interpolation.INTERPOLATION_METHODS_DICT.
…as a dictionary
interpolation = {
    'method': 'nearest',
    'params': {  # Optional. Available parameters depend on the particular interpolator
        'spatial_tolerance': 1.1,
        'time_tolerance': np.timedelta64(1, 'D')
    },
    'interpolators': [ScipyGrid, NearestNeighbor]  # Optional. Available options are in podpac.core.interpolation.interpolation.INTERPOLATORS
}
Description: All dimensions are interpolated using nearest-neighbor interpolation, and the interpolators are tried in the order specified. The specified parameters are used by the interpolators that support them.
Details: PODPAC loops through the interpolators list, checking whether each interpolator can interpolate between the evaluated and source coordinates. The first capable interpolator is used.
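The "first capable interpolator wins" rule can be illustrated with a minimal, self-contained sketch. It does not use PODPAC: the classes GridOnly and Anything, the can_interpolate signature, and the select_interpolator helper are all hypothetical names invented for this illustration.

```python
class GridOnly:
    """Hypothetical interpolator that only handles gridded sources."""
    name = "GridOnly"

    def can_interpolate(self, source_kind, method):
        return source_kind == "grid" and method in ("nearest", "bilinear")


class Anything:
    """Hypothetical general-purpose fallback that only supports 'nearest'."""
    name = "Anything"

    def can_interpolate(self, source_kind, method):
        return method == "nearest"


def select_interpolator(candidates, source_kind, method):
    """Return the first candidate that reports it can handle the request."""
    for candidate in candidates:
        if candidate.can_interpolate(source_kind, method):
            return candidate
    raise ValueError(f"no interpolator supports {method!r} on {source_kind!r} sources")


# The order of this list plays the role of the 'interpolators' entry above.
candidates = [GridOnly(), Anything()]
chosen_grid = select_interpolator(candidates, "grid", "nearest")
chosen_point = select_interpolator(candidates, "point", "nearest")
print(chosen_grid.name, chosen_point.name)  # GridOnly Anything
```

Because the list is scanned in order, putting a specialized interpolator ahead of a general one gives the specialized one priority wherever it applies, with the general one as a fallback.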
…as a list
interpolation = [
    {
        'method': 'bilinear',
        'dims': ['lat', 'lon']
    },
    {
        'method': 'nearest',
        'dims': ['time']
    }
]
Description: The dimensions listed in each 'dims' list will use the specified method. These dictionaries can also specify the same fields shown in the previous section ('params' and 'interpolators').
Details: PODPAC loops through the interpolation list, using the settings specified for each dimension independently.
NOTE! Specifying the interpolation as a list also controls the ORDER of interpolation.
The first item in the list is interpolated first. In this case, lat/lon will be bilinearly interpolated BEFORE time is nearest-neighbor interpolated.
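To make the ordering concrete, here is a small sketch that expands a list-style specification into ordered steps. The helper plan_steps is hypothetical and not part of PODPAC; in particular, the rule that unlisted dimensions fall back to a default method and are handled last is an assumption of this sketch.

```python
def plan_steps(interpolation, source_dims, default_method="nearest"):
    """Expand a list-style specification into ordered (dims, method) steps.

    Assumption of this sketch: dimensions not named in any entry fall back
    to default_method and are handled after the listed entries.
    """
    steps = []
    seen = set()
    for entry in interpolation:
        dims = tuple(entry["dims"])
        overlap = seen.intersection(dims)
        if overlap:
            raise ValueError(f"dimension(s) listed more than once: {sorted(overlap)}")
        seen.update(dims)
        steps.append((dims, entry["method"]))
    leftover = tuple(d for d in source_dims if d not in seen)
    if leftover:
        steps.append((leftover, default_method))
    return steps


interpolation = [
    {"method": "bilinear", "dims": ["lat", "lon"]},
    {"method": "nearest", "dims": ["time"]},
]
steps = plan_steps(interpolation, ["lat", "lon", "time"])
for dims, method in steps:
    print(dims, method)
```

The steps come out in list order, so swapping the two dictionaries would interpolate time before lat/lon.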
Interpolators
The available interpolators are as follows:
NoneInterpolator: An interpolator that passes through the raw source data at full resolution; it does not do any interpolation. Note: This interpolator can be used for only some of the dimensions by specifying interpolation as a list.
NearestNeighbor: A custom implementation based on scipy.spatial.cKDTree, which handles nearly any combination of source and destination coordinates.
XarrayInterpolator: A light-weight wrapper around xarray's DataArray.interp method, which is itself a wrapper around scipy interpolation functions, but with a clean xarray interface.
RasterioInterpolator: A wrapper around rasterio's interpolation/reprojection routines. Appropriate for grid-to-grid interpolation.
ScipyGrid: An optimized implementation for grid sources that uses scipy's RegularGridInterpolator or RectBivariateSpline interpolator, depending on the method.
ScipyPoint: An implementation based on scipy.spatial.KDTree capable of nearest interpolation for point sources.
NearestPreview: An approximate nearest-neighbor interpolator useful for rapidly viewing large files.
The default order for these interpolators can be found in podpac.data.INTERPOLATORS.
NearestNeighbor
Since this is the most general of the interpolators, this section deals with the available parameters and settings for the NearestNeighbor interpolator.
Parameters
The following parameters can be set by specifying the interpolation as a dictionary or a list, as described above.
respect_bounds: bool. Default is True. If True, any requested coordinate OUTSIDE of the source bounds will be interpolated as nan. Otherwise, any point outside the bounds will take the value of the nearest neighboring point.
remove_nan: bool. Default is False. If True, nan values in the source dataset are excluded before interpolation, so a requested point takes the value of its nearest non-nan neighbor. This is useful when a value is needed at every point of the request. It is not helpful when computing statistics, where nan values would otherwise be explicitly ignored: with remove_nan=True, nan values take on the values of their neighbors, skewing the statistical result.
*_tolerance: float, where * is one of ["spatial", "time", "alt"]. Default is inf. Maximum distance to the nearest coordinate to be interpolated. Corresponds to the unit of the * dimension.
*_scale: float, where * is one of ["spatial", "time", "alt"]. Default is 1. This only applies when the source has stacked dimensions with different units. The *_scale defines the factor that the coordinates will be scaled by (coordinates are divided by *_scale) to output a valid distance for the combined set of dimensions. For example, when "lat", "lon", and "alt" dimensions are stacked, where ["lat", "lon"] are in degrees and "alt" is in feet, the *_scale parameters should be set so that || [dlat / spatial_scale, dlon / spatial_scale, dalt / alt_scale] || results in a reasonable distance.
use_selector: bool. Default is True. If True, a subset of the coordinates will be selected BEFORE the data of a dataset is retrieved. This reduces the number of data retrievals needed for large datasets. In cases where remove_nan=True, the selector may select only nan points, in which case the interpolation fails to produce non-nan data. This usually happens when requesting a single point from a dataset that contains nans. In these cases, set use_selector=False to get a non-nan value.
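As a rough illustration of how respect_bounds and a tolerance interact, the following one-dimensional sketch reimplements the documented behavior in plain Python. It is not PODPAC's implementation: nearest_1d is a hypothetical helper, and taking the bounds as the minimum and maximum source coordinate is a simplifying assumption.

```python
import math


def nearest_1d(source_coords, source_values, request, tolerance=math.inf, respect_bounds=True):
    """Nearest-neighbor lookup along one dimension.

    Simplifying assumption: the source bounds are the smallest and largest
    source coordinates.
    """
    lo, hi = min(source_coords), max(source_coords)
    out = []
    for x in request:
        if respect_bounds and not (lo <= x <= hi):
            out.append(math.nan)  # outside the bounds: no value
            continue
        idx = min(range(len(source_coords)), key=lambda i: abs(source_coords[i] - x))
        if abs(source_coords[idx] - x) > tolerance:
            out.append(math.nan)  # nearest source point is too far away
        else:
            out.append(source_values[idx])
    return out


coords = [0.0, 1.0, 2.0]
values = [10.0, 20.0, 30.0]
request = [-0.4, 0.4, 1.6, 2.3]

bounded = nearest_1d(coords, values, request)
extrapolated = nearest_1d(coords, values, request, respect_bounds=False)
strict = nearest_1d(coords, values, request, tolerance=0.35, respect_bounds=False)
print(bounded)
print(extrapolated)
print(strict)
```

With the defaults, the two requests outside [0, 2] become nan. With respect_bounds=False they take the edge values instead, and adding a tolerance of 0.35 keeps only the request at 2.3, which is the only one within 0.35 of a source coordinate.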
Advanced NearestNeighbor Interpolation Examples
Only interpolate points that are within a distance of 1 (in the units of the lat/lon coordinates) of the source data lat/lon locations
interpolation={"method": "nearest", "params": {"spatial_tolerance": 1}}
When interpolating with mixed time/space, use 1 day as equivalent to 1 degree for determining the distance
interpolation={
    "method": "nearest",
    "params": {
        "spatial_scale": 1,
        "time_scale": "1,D",
        "alt_scale": 10,
    }
}
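The effect of the scale parameters can be worked through numerically. The sketch below is a plain-Python illustration of the scaled norm described in the Parameters section, not PODPAC code: scaled_distance is a hypothetical helper, and expressing time offsets as fractional days is an assumption made to keep the arithmetic simple.

```python
import math


def scaled_distance(deltas, scales):
    """Euclidean norm of per-dimension offsets after dividing each by its scale."""
    return math.sqrt(sum((deltas[name] / scales[name]) ** 2 for name in deltas))


# Scales matching the example above: 1 degree, 1 day, and 10 altitude units
# each count as one unit of distance.
scales = {"lat": 1.0, "lon": 1.0, "time": 1.0, "alt": 10.0}

# Offsets from the requested point to two candidate source points.
candidate_a = {"lat": 0.3, "lon": 0.4, "time": 0.0, "alt": 0.0}   # 0.5 degrees away
candidate_b = {"lat": 0.0, "lon": 0.0, "time": 0.25, "alt": 2.0}  # 6 hours and 2 alt units away

d_a = scaled_distance(candidate_a, scales)
d_b = scaled_distance(candidate_b, scales)

# The same comparison with alt_scale = 1, so altitude is not down-weighted.
unscaled = dict(scales, alt=1.0)
d_b_unscaled = scaled_distance(candidate_b, unscaled)

print(round(d_a, 4), round(d_b, 4), round(d_b_unscaled, 4))
```

With alt_scale set to 10, candidate_b is the nearer neighbor; with alt_scale set to 1, the 2-unit altitude offset dominates and candidate_a becomes nearer. This is why the scales need to be chosen so the combined distance is meaningful for the data at hand.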
Remove nan values in the source datasource; in some cases a nan may still be interpolated
interpolation={
    "method": "nearest",
    "params": {
        "remove_nan": True,
    }
}
Remove nan values in the source datasource in all cases, even for single point requests located directly at nan-values in the source.
interpolation={
    "method": "nearest",
    "params": {
        "remove_nan": True,
        "use_selector": False,
    }
}
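The interaction between remove_nan and use_selector can be seen in a simplified one-dimensional sketch. This is not PODPAC's implementation: interpolate_nearest and nearest_index are hypothetical helpers, and the selector is reduced to "keep only the single nearest source point", which is an assumption made for illustration.

```python
import math


def nearest_index(coords, x):
    """Index of the coordinate closest to x."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - x))


def interpolate_nearest(coords, values, x, remove_nan=False, use_selector=True):
    if use_selector:
        # Selector step: keep only the source point nearest to the request,
        # before any data values are examined.
        i = nearest_index(coords, x)
        coords, values = [coords[i]], [values[i]]
    if remove_nan:
        kept = [(c, v) for c, v in zip(coords, values) if not math.isnan(v)]
        if not kept:
            return math.nan  # only nan points were available
        coords = [c for c, _ in kept]
        values = [v for _, v in kept]
    return values[nearest_index(coords, x)]


coords = [0.0, 1.0, 2.0]
values = [5.0, math.nan, 7.0]

with_selector = interpolate_nearest(coords, values, 1.1, remove_nan=True, use_selector=True)
without_selector = interpolate_nearest(coords, values, 1.1, remove_nan=True, use_selector=False)
print(with_selector, without_selector)
```

With the selector enabled, only the nan point at 1.0 is retrieved, so removing nans leaves nothing to interpolate from. Disabling the selector makes the whole source available, and the request at 1.1 takes the value at 2.0, its nearest non-nan neighbor.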
Do nearest-neighbor extrapolation outside of the bounds of the source dataset
interpolation={
    "method": "nearest",
    "params": {
        "respect_bounds": False,
    }
}
Do nearest-neighbor interpolation of time with nan removal followed by spatial interpolation
interpolation = [
    {
        "method": "nearest",
        "params": {
            "remove_nan": True,
        },
        "dims": ["time"]
    },
    {
        "method": "nearest",
        "dims": ["lat", "lon", "alt"]
    },
]
Notes and Caveats
While the API is well developed, not all conceivable functionality is implemented. For example, gridded data can be interpolated to point data, but point-to-grid interpolation is not as well supported, and there may be errors or unexpected results. Advanced users can develop their own interpolators, but this is not currently well documented.
Gotcha: Parameters for a specific interpolator may be ignored if a different interpolator is automatically selected. These ignored parameters are logged as warnings.