sits package
The def_geobox function
- sits.def_geobox(bbox, crs_out=3035, resolution=10, shape=None)
This function creates an odc geobox.
- Parameters:
bbox (list) – coordinates of a bounding box in CRS units.
crs_out (str, optional) – CRS (EPSG code) of output coordinates. Defaults to 3035.
resolution (float, optional) – output spatial resolution in CRS units. Defaults to 10 (meters).
shape (tuple, optional) – output image size in pixels (x, y). Defaults to None.
- Returns:
geobox object
- Return type:
odc.geo.geobox.GeoBox
Example
>>> bbox = [100, 100, 200, 220]
>>> crs_out = 3035
>>> # output geobox closest to the input bbox
>>> geobox = def_geobox(bbox, crs_out)
>>> # output geobox with the same dimensions (number of rows and columns)
>>> # as the input shape.
>>> geobox = def_geobox(bbox, crs_out, shape=(10, 10))
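To make the relation between bbox, resolution and grid size concrete, the sketch below computes how many pixels are needed to cover a bounding box at a given resolution. It is a plain-Python approximation for illustration only: the helper name `grid_shape` is hypothetical, the bbox order [xmin, ymin, xmax, ymax] is assumed, and the real def_geobox relies on odc-geo, whose snapping and rounding rules may differ.

```python
import math

def grid_shape(bbox, resolution=10):
    """Hypothetical helper: (columns, rows) needed to cover bbox at `resolution`."""
    xmin, ymin, xmax, ymax = bbox
    ncols = math.ceil((xmax - xmin) / resolution)
    nrows = math.ceil((ymax - ymin) / resolution)
    return ncols, nrows

# Same bbox as in the example above: 100 units wide, 120 units high.
print(grid_shape([100, 100, 200, 220], resolution=10))  # (10, 12)
```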
The compare_crs function
- sits.compare_crs(crs_a, crs_b)
The Gdfgeom class
- class sits.Gdfgeom
Bases:
object
This class computes buffers and bounding boxes for vector layers.
- buffer
vector layer with buffer.
- Type:
GeoDataFrame
- bbox
vector layer’s bounding box.
- Type:
GeoDataFrame
- set_bbox(df_attr)
Calculate the bounding box of each feature in a Csv2gdf GeoDataFrame.
- Parameters:
df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.
outfile (str, optional) – output filepath. Defaults to None.
- Returns:
GeoDataFrame object Csv2gdf.bbox.
- Return type:
GeoDataFrame
Example
>>> geotable.set_bbox('buffer')
- set_buffer(df_attr, radius)
Calculate the buffer geometry of each feature in a Csv2gdf GeoDataFrame.
- Parameters:
df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.
radius (float) – buffer distance in CRS units.
outfile (str, optional) – output filepath. Defaults to None.
- Returns:
GeoDataFrame object Csv2gdf.buffer.
- Return type:
GeoDataFrame
Example
>>> geotable.set_buffer('gdf', 100)
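For point features, chaining set_buffer and set_bbox amounts to drawing a circle of the given radius around each point and then taking the square envelope of that circle. The sketch below shows that arithmetic in plain Python. It assumes point geometries in a projected CRS and a hypothetical helper name (`point_buffer_bbox`); the class itself operates on GeoDataFrame geometries.

```python
def point_buffer_bbox(x, y, radius):
    """Hypothetical helper: envelope [xmin, ymin, xmax, ymax] of a circular buffer."""
    return [x - radius, y - radius, x + radius, y + radius]

# Made-up point coordinates in a projected CRS, buffered by 100 CRS units.
print(point_buffer_bbox(4000000, 2500000, 100))
# [3999900, 2499900, 4000100, 2500100]
```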
- to_vector(df_attr, outfile=None, driver='GeoJSON')
Write a Csv2gdf GeoDataFrame layer as a vector file.
- Parameters:
df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.
outfile (str, optional) – Output path. Defaults to None.
driver (str, optional) – Output vector file format (see GDAL/OGR Vector drivers: https://gdal.org/drivers/vector/index.html). Defaults to “GeoJSON”.
Example
>>> filename = 'mygeom'
>>> geotable.to_vector('gdf', f'output/{filename}_gdf.geojson')
>>> geotable.to_vector('buffer', f'output/{filename}_buffer.geojson')
>>> geotable.to_vector('bbox', f'output/{filename}_bbox.geojson')
The Vec2gdf class
The Csv2gdf class
- class sits.Csv2gdf(csv_file, x_name, y_name, crs_in, id_name='no_id')
Bases:
Gdfgeom
This class loads csv tables with geographic coordinates into a GeoDataFrame object. It inherits methods and attributes from the Gdfgeom class.
- crs_in
CRS of coordinates described in the csv table.
- Type:
int
- table
DataFrame object.
- Type:
DataFrame
- Parameters:
csv_file (str) – csv filepath.
x_name (str) – name of the field describing X coordinates.
y_name (str) – name of the field describing Y coordinates.
crs_in (int) – CRS of coordinates described in the csv table.
id_name (str, optional) – name of the ID field. Defaults to “no_id”.
Example
>>> csv_file = 'example.csv'
>>> crs_in = 4326
>>> geotable = Csv2gdf(csv_file, 'longitude', 'latitude', crs_in)
- __init__(csv_file, x_name, y_name, crs_in, id_name='no_id')
Initialize the attributes of Csv2gdf.
- del_rows(col_name, rows_values)
Drop rows from Csv2gdf.table according to a column’s values.
- Parameters:
col_name (str) – column name.
rows_values (list) – list of values.
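The filtering performed by del_rows can be pictured with the standard library alone. The sketch below assumes that rows whose value in col_name appears in rows_values are the ones removed; the helper `drop_rows`, the column names and the sample values are all hypothetical, and the actual method works on the DataFrame stored in Csv2gdf.table.

```python
import csv
import io

def drop_rows(rows, col_name, rows_values):
    """Hypothetical helper: keep only rows whose `col_name` value is not in `rows_values`."""
    unwanted = set(rows_values)
    return [row for row in rows if row[col_name] not in unwanted]

# Small in-memory CSV standing in for a real file (made-up values).
csv_text = "station_id,longitude,latitude\nA,2.35,48.85\nB,4.83,45.76\nC,1.44,43.60\n"
table = list(csv.DictReader(io.StringIO(csv_text)))

kept = drop_rows(table, "station_id", ["B"])
print([row["station_id"] for row in kept])  # ['A', 'C']
```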
- set_gdf(crs_out)
Convert the class attribute Csv2gdf.table (DataFrame) into a GeoDataFrame object in the specified output CRS.
- Parameters:
crs_out (int) – output CRS of GeoDataFrame.
outfile (str, optional) – Defaults to None.
- Returns:
GeoDataFrame object Csv2gdf.gdf.
- Return type:
GeoDataFrame
Example
>>> geotable.set_gdf(3035)
The StacAttack class
- class sits.StacAttack(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])
Bases:
object
This class requests time-series datasets from a STAC catalog and stores them as image or csv files.
- stac_conf
parameters for building datacube (xArray) from STAC items.
- Type:
dict
- Parameters:
provider (str, optional) – stac provider. Defaults to ‘mpc’. Can be one of the following: ‘mpc’ (Microsoft Planetary Computer), ‘aws’ (Amazon Web Services).
collection (str, optional) – stac collection. Defaults to ‘sentinel-2-l2a’.
bands (list, optional) – names of the bands to load. Defaults to [‘B02’, ‘B03’, ‘B04’, ‘B05’, ‘B06’, ‘B07’, ‘B08’, ‘B8A’, ‘B11’, ‘B12’, ‘SCL’].
Example
>>> stacObj = StacAttack()
- __init__(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])
Initialize the attributes of StacAttack.
- fixS2shift(shiftval=-1000, minval=1, **kwargs)
Fix Sentinel-2 radiometric offset applied since the ESA Processing Baseline 04.00. For more information: https://sentinels.copernicus.eu/web/sentinel/-/copernicus-sentinel-2-major-products-upgrade-upcoming
- Parameters:
shiftval (int) – radiometric offset value. Defaults to -1000.
minval (int) – minimum radiometric value. Defaults to 1.
**kwargs – other arguments
- Returns:
StacAttack.image with corrected radiometric values.
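One plausible reading of the two parameters is sketched below: the offset shiftval is added to every digital number and the result is floored at minval so that no value drops to zero or below. This is an assumption about the behaviour, not a transcription of the method; the helper name `shift_and_clip` is hypothetical, and the actual method may, for instance, restrict the correction to acquisitions processed with Baseline 04.00 or later.

```python
def shift_and_clip(values, shiftval=-1000, minval=1):
    """Hypothetical helper: add the offset, then floor each value at `minval`."""
    return [max(v + shiftval, minval) for v in values]

# Made-up digital numbers for four pixels of one band.
print(shift_and_clip([1000, 1500, 3200, 400]))  # [1, 500, 2200, 1]
```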
- gapfill(method='linear', first_last=True, **kwargs)
Gap-fill NaN pixel values through the satellite time-series.
- Parameters:
method (string, optional) – method to use for interpolation (see xarray.DataArray.interpolate_na). Defaults to ‘linear’.
first_last (bool, optional) – interpolation of the first and last images of the satellite time-series with xarray.DataArray.bfill and xarray.DataArray.ffill. Defaults to True.
**kwargs – other arguments of xarray.DataArray.interpolate_na.
Example
>>> stacObj.gapfill()
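The effect of the two options can be illustrated on a single pixel's time-series. The sketch below is a simplified, pure-Python stand-in: it assumes evenly spaced dates (whereas xarray can interpolate along the actual time coordinate), and the helper name `gapfill_linear` is hypothetical. Interior gaps are filled linearly; with first_last enabled, leading gaps take the first valid value (back-fill) and trailing gaps take the last valid value (forward-fill).

```python
import math

def gapfill_linear(series, first_last=True):
    """Hypothetical helper: fill NaN gaps in an evenly spaced 1-D series."""
    filled = list(series)
    valid = [i for i, v in enumerate(filled) if not math.isnan(v)]
    if not valid:
        return filled  # nothing to interpolate from
    # Linear interpolation between each pair of consecutive valid samples.
    for left, right in zip(valid, valid[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    if first_last:
        # Back-fill the leading gap and forward-fill the trailing gap.
        for i in range(valid[0]):
            filled[i] = filled[valid[0]]
        for i in range(valid[-1] + 1, len(filled)):
            filled[i] = filled[valid[-1]]
    return filled

nan = float("nan")
print(gapfill_linear([nan, 2.0, nan, nan, 8.0, nan]))
# [2.0, 2.0, 4.0, 6.0, 8.0, 8.0]
```

With first_last=False the first and last positions of this series would remain NaN.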
- loadCube(bbox, arrtype='image', dimx=5, dimy=5, resolution=10, crs_out=3035)
Load images according to a bounding box, optionally with predefined pixel dimensions (x, y).
- Parameters:
bbox (list) – coordinates of bounding box [xmin, ymin, xmax, ymax] in the output crs unit.
arrtype (string, optional) – xarray dataset name. Defaults to ‘image’. Can be one of the following: ‘patch’, ‘image’, ‘masked’.
dimx (int, optional) – number of pixels in columns. Defaults to 5.
dimy (int, optional) – number of pixels in rows. Defaults to 5.
resolution (float, optional) – spatial resolution (in crs unit). Defaults to 10.
crs_out (int, optional) – CRS of output coordinates. Defaults to 3035.
- Returns:
geobox object StacAttack.geobox.
xarray.Dataset: time-series image StacAttack.cube.
- Return type:
odc.geo.geobox.GeoBox
Example
>>> aoi_bounds = [0, 0, 1, 1]
>>> stacObj.loadCube(aoi_bounds, arrtype='patch', dimx=10, dimy=10)
- mask(mask_array=None, mask_band='SCL', mask_values=[3, 8, 9, 10])
Load binary mask.
- Parameters:
mask_array (xarray.DataArray, optional) – xarray.DataArray binary mask (with same dimensions as StacAttack.cube). Defaults to None.
mask_band (string, optional) – band name used as a mask (e.g. ‘SCL’ for Sentinel-2). Defaults to ‘SCL’.
mask_values (list, optional) – band values related to masked pixels. Defaults to [3, 8, 9, 10].
- Returns:
time-series of binary masks StacAttack.mask.
- Return type:
xarray.DataArray
Example
>>> stacObj.mask()
- mask_apply()
Apply the mask pre-loaded as StacAttack.mask to the satellite time-series StacAttack.cube.
Example
>>> stacObj.mask()
>>> stacObj.mask_apply()
- searchItems(bbox_latlon, date_start=datetime.datetime(2023, 1, 1, 0, 0), date_end=datetime.datetime(2023, 12, 31, 0, 0), **kwargs)
Get list of stac collection’s items.
- Parameters:
bbox_latlon (list) – coordinates of bounding box.
date_start (datetime.datetime, optional) – start date. Defaults to datetime.datetime(2023, 1, 1).
date_end (datetime.datetime, optional) – end date. Defaults to datetime.datetime(2023, 12, 31).
**kwargs – other STAC-compliant arguments.
- Returns:
list of stac collection items StacAttack.items.
- Return type:
pystac.ItemCollection
Example
>>> stacObj.searchItems(aoi_bounds_4326)
- spectral_index(indices_to_compute: str | list[str], band_mapping: dict = None, **kwargs)
Calculate various spectral indices for remote sensing data using the spyndex and awesome-spectral-indices libraries.
- Parameters:
indices_to_compute (str or list of str) – name(s) of the spectral indices to compute (e.g. ‘NDVI’).
band_mapping (dict, optional) – A dictionary to map your dataset’s band names to spyndex’s standard band names (e.g., {‘R’: ‘B04’, ‘N’: ‘B08’}). If None, it assumes your dataset’s variable names are directly usable by spyndex.
**kwargs – other arguments
- Returns:
time-series image StacAttack.indices.
- Return type:
xarray.DataArray
Example
>>> stacObj.spectral_index('NDVI', {'R': 'B04', 'N': 'B08'})
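The role of band_mapping can be shown with NDVI, defined as (N - R) / (N + R), where N is the near-infrared band and R the red band. The sketch below resolves the standard names through the mapping and evaluates the formula for one pixel. It is a hand-written illustration with a hypothetical helper name (`ndvi`) and made-up values; the method itself delegates the computation to spyndex.

```python
def ndvi(pixel, band_mapping):
    """Hypothetical helper: NDVI = (N - R) / (N + R) for one pixel."""
    nir = pixel[band_mapping["N"]]
    red = pixel[band_mapping["R"]]
    return (nir - red) / (nir + red)

pixel = {"B04": 1000.0, "B08": 3000.0}  # made-up band values
print(ndvi(pixel, {"R": "B04", "N": "B08"}))  # 0.5
```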
- to_csv(outdir, gid=None, id_point='station_id')
Convert xarray dataset into csv file.
- Parameters:
outdir (str) – output directory.
gid (str, optional) – column name of ID. Defaults to None.
Example
>>> outdir = 'output'
>>> stacObj.to_csv(outdir)
- to_nc(outdir, gid=None)
Convert xarray dataset into netcdf file.
- Parameters:
outdir (str) – output directory.
gid (str, optional) – column name of ID. Defaults to None.
array_type (str, optional) – xarray dataset name. Defaults to ‘image’. Can be one of the following: ‘patch’, ‘image’, ‘masked’.
Example
>>> outdir = 'output'
>>> stacObj.to_nc(outdir)
The Labels class
- class sits.Labels(geolayer)
Bases:
object
This class produces an image of labels from a vector file.
- Parameters:
geolayer (str or geodataframe) – vector layer to rasterize.
- Returns:
geodataframe Labels.gdf.
- Return type:
GeoDataFrame
Example
>>> geodataframe = <gdf object>
>>> vlayer = Labels(geodataframe)
>>> vector_file = 'myVector.shp'
>>> vlayer = Labels(vector_file)
- __init__(geolayer)
Initialize the attributes of Labels.
- to_raster(id_field, geobox, filename, outdir, ext='tif', driver='GTiff')
Convert geodataframe into raster file while keeping a column attribute as pixel values.
- Parameters:
id_field (str) – column name to keep as pixel values.
geobox (odc.geo.geobox.GeoBox) – geobox object.
filename (str) – output raster filename.
outdir (str) – output directory.
ext (str, optional) – raster file extension. Defaults to “tif”.
driver (str, optional) – output raster format (gdal standard). Defaults to “GTiff”.
Example
>>> bbox = [0, 0, 1, 1]
>>> crs_out = 3035
>>> resolution = 10
>>> geobox = def_geobox(bbox, crs_out, resolution)
>>> vlayer.to_raster('id', geobox, 'output_img', 'output_dir')
The Multiproc class
- class sits.Multiproc(array_type, fext, outdir)
Bases:
object
This class aims to parallelize the production of images or patches.
- Parameters:
array_type (str) – xarray dataset name. Can be one of the following: ‘patch’, ‘image’.
fext (str) – output file format. Can be one of the following: ‘nc’, ‘csv’.
outdir (str) – output directory.
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
- __init__(array_type, fext, outdir)
Initialize the attributes of Multiproc.
- addParams_gapfill(method='linear', first_last=True, **kwargs)
Add optional parameters for StacAttack.gapfill() called through Multiproc.fetch_func().
- Parameters:
method (string, optional) – method to use for interpolation (see xarray.DataArray.interpolate_na). Defaults to ‘linear’.
first_last (bool, optional) – interpolation of the first and last images of the satellite time-series with xarray.DataArray.bfill and xarray.DataArray.ffill. Defaults to True.
**kwargs – other arguments of xarray.DataArray.interpolate_na.
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_gapfill(method='nearest', first_last=False)
- addParams_loadCube(dimx=5, dimy=5, resolution=10, crs_out=3035)
Add optional parameters for StacAttack.loadCube() called through Multiproc.fetch_func().
- Parameters:
dimx (int, optional) – number of pixels in columns. Defaults to 5.
dimy (int, optional) – number of pixels in rows. Defaults to 5.
resolution (float, optional) – spatial resolution (in crs unit). Defaults to 10.
crs_out (int, optional) – CRS of output coordinates. Defaults to 3035.
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_loadCube(dimx=20, dimy=20)
- addParams_mask(mask_array=None, mask_band='SCL', mask_values=[3, 8, 9, 10])
Add optional parameters for StacAttack.mask() called through Multiproc.fetch_func().
- Parameters:
mask_array (xarray.DataArray, optional) – xarray.DataArray binary mask (with same dimensions as StacAttack.cube). Defaults to None.
mask_band (string, optional) – band name used as a mask (e.g. ‘SCL’ for Sentinel-2). Defaults to ‘SCL’.
mask_values (list, optional) – band values related to masked pixels. Defaults to [3, 8, 9, 10].
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_mask(mask_values=[0])
- addParams_searchItems(date_start=datetime.datetime(2023, 1, 1, 0, 0), date_end=datetime.datetime(2023, 12, 31, 0, 0), **kwargs)
Add optional parameters for StacAttack.searchItems() called through Multiproc.fetch_func().
- Parameters:
date_start (datetime.datetime, optional) – start date. Defaults to datetime.datetime(2023, 1, 1).
date_end (datetime.datetime, optional) – end date. Defaults to datetime.datetime(2023, 12, 31).
**kwargs (optional) – other STAC-compliant arguments, e.g. query parameters to filter according to cloud cover percentage.
Example
>>> from datetime import datetime
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_searchItems(date_start=datetime(2016, 1, 1), query={"eo:cloud_cover": {"lt": 20}})
- addParams_stacAttack(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])
Add optional parameters for the StacAttack class instance called through Multiproc.fetch_func().
- Parameters:
provider (str, optional) – stac provider. Defaults to ‘mpc’. Can be one of the following: ‘mpc’ (Microsoft Planetary Computer), ‘aws’ (Amazon Web Services).
collection (str, optional) – stac collection. Defaults to ‘sentinel-2-l2a’.
bands (list, optional) – names of the bands to load. Defaults to [‘B02’, ‘B03’, ‘B04’, ‘B05’, ‘B06’, ‘B07’, ‘B08’, ‘B8A’, ‘B11’, ‘B12’, ‘SCL’].
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_stacAttack(bands=['B02', 'B03', 'B04'])
- addParams_to_raster(ext='tif', driver='GTiff')
Add optional parameters for Labels.to_raster() called through Multiproc.fetch_func().
- Parameters:
ext (str, optional) – raster file extension. Defaults to “tif”.
driver (str, optional) – output raster format (gdal standard). Defaults to “GTiff”.
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_to_raster(driver="COG")
- add_label(geolayer, id_field)
Export an image of labels with the same dimensions as the datacube, by calling the method Labels.to_raster().
- Parameters:
geolayer (GeoDataFrame) – vector file.
id_field (str) – attribute field name.
Example
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.add_label(vlayer, 'myfield')
- dask_compute(scheduler_type='processes')
Call dask.compute to trigger the actual execution of delayed tasks (i.e. Multiproc.fetch_dask), gathering their results into a final output.
- Parameters:
scheduler_type (str) – type of scheduler. Defaults to ‘processes’. Can be one of the following:
- Single-threaded scheduler ‘single-threaded’ or ‘sync’: runs computations in a single thread without parallelism. Suitable for debugging or when parallelism isn’t required.
- Threaded scheduler ‘threads’: uses a pool of threads to execute tasks concurrently. Good for I/O-bound tasks and when tasks release the Global Interpreter Lock (GIL).
- Multiprocessing scheduler ‘processes’: uses a pool of separate processes to execute tasks in parallel. Suitable for CPU-bound tasks and when tasks are limited by the GIL.
- Distributed scheduler ‘distributed’: uses a distributed cluster to execute tasks. Best for large-scale computations across multiple machines.
Example
>>> mproc.dask_compute()
- del_func()
Clear Multiproc.fetch_dask, the list of dask.delayed function instances.
- fetch_func(aoi_latlong, aoi_proj, gid, mask=False, gapfill=False, **kwargs)
Call dask.delayed to convert the Multiproc.__fdask() function into a delayed object, allowing for lazy evaluation and parallel execution.
- Parameters:
aoi_latlong (list) – coordinates of bounding box.
aoi_proj (list) – coordinates of bounding box [xmin, ymin, xmax, ymax] in the output crs.
gid (int) – image/patch index.
**kwargs (dict) – additional arguments (i.e. StacAttack.searchItems(), StacAttack.loadImgs(), StacAttack.loadPatches()).
- Returns:
list of dask.delayed function instances.
- Return type:
Multiproc.fetch_dask
Example
>>> for gid, bboxes in enumerate(my_df['bboxes']):
...     mproc.fetch_func(bboxes[0], bboxes[1], gid)
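The register-then-run pattern behind fetch_func, dask_compute and del_func can be sketched with the standard library. Note that this is an analogue and not dask: `functools.partial` stands in for a delayed call and a thread pool stands in for the ‘threads’ scheduler. The function `process_patch`, the `pending` list and the bounding boxes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def process_patch(aoi_latlong, aoi_proj, gid):
    """Hypothetical stand-in for the per-patch work (search, load, export)."""
    xmin, ymin, xmax, ymax = aoi_proj
    return gid, (xmax - xmin) * (ymax - ymin)

# Step 1: register the calls without running them (role of fetch_func).
pending = []
bboxes = [([0, 0, 1, 1], [0, 0, 50, 50]), ([1, 1, 2, 2], [50, 50, 150, 100])]
for gid, (aoi_latlong, aoi_proj) in enumerate(bboxes):
    pending.append(partial(process_patch, aoi_latlong, aoi_proj, gid))

# Step 2: run every registered call concurrently (role of dask_compute).
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(task) for task in pending]
    results = [f.result() for f in futures]

# Step 3: empty the task list before queuing a new batch (role of del_func).
pending.clear()
print(results)  # [(0, 2500), (1, 5000)]
```

Nothing is computed while the tasks are being registered, which is what allows the whole batch to be dispatched to a pool of workers in one go.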