sits package

The def_geobox function

sits.def_geobox(bbox, crs_out=3035, resolution=10, shape=None)

This function creates an odc geobox.

Parameters:
  • bbox (list) – coordinates of a bounding box in CRS units.

  • crs_out (int, optional) – CRS (EPSG code) of output coordinates. Defaults to 3035.

  • resolution (float, optional) – output spatial resolution in CRS units. Defaults to 10 (meters).

  • shape (tuple, optional) – output image size in pixels (x, y). Defaults to None.

Returns:

geobox object

Return type:

odc.geo.geobox.GeoBox
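The relation between bbox, resolution and shape can be sketched in plain Python. This only illustrates the arithmetic, assuming the bbox is ordered [xmin, ymin, xmax, ymax] and that the pixel count is the extent divided by the resolution, rounded up. The helper pixel_shape is hypothetical and is not part of sits or odc-geo.

```python
import math


def pixel_shape(bbox, resolution=10):
    """Return (columns, rows) needed to cover bbox at the given resolution.

    Hypothetical helper for illustration; not part of sits or odc-geo.
    """
    xmin, ymin, xmax, ymax = bbox
    cols = math.ceil((xmax - xmin) / resolution)
    rows = math.ceil((ymax - ymin) / resolution)
    return cols, rows


# Same bbox as in the def_geobox example, at the default 10 m resolution.
print(pixel_shape([100, 100, 200, 220], resolution=10))  # (10, 12)
```

Passing an explicit shape instead fixes the number of columns and rows, so the pixel size follows from the bbox rather than the other way round.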

Example

>>> bbox = [100, 100, 200, 220]
>>> crs_out = 3035
>>> # output geobox closest to the input bbox
>>> geobox = def_geobox(bbox, crs_out)
>>> # output geobox with the same dimensions (number of rows and columns)
>>> # as the input shape.
>>> geobox = def_geobox(bbox, crs_out, shape=(10, 10))

The compare_crs function

sits.compare_crs(crs_a, crs_b)

The Gdfgeom class

class sits.Gdfgeom

Bases: object

This class computes buffers and bounding boxes for the features of a vector layer.

buffer

vector layer with buffer.

Type:

GeoDataFrame

bbox

vector layer’s bounding box.

Type:

GeoDataFrame

set_bbox(df_attr)

Calculate the bounding box for each Csv2gdf’s GeoDataFrame feature.

Parameters:
  • df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.

  • outfile (str, optional) – output filepath. Defaults to None.

Returns:

GeoDataFrame object Csv2gdf.bbox.

Return type:

GeoDataFrame

Example

>>> geotable.set_bbox('buffer')

set_buffer(df_attr, radius)

Calculate buffer geometries for each Csv2gdf’s GeoDataFrame feature.

Parameters:
  • df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.

  • radius (float) – buffer distance in CRS units.

  • outfile (str, optional) – output filepath. Defaults to None.

Returns:

GeoDataFrame object Csv2gdf.buffer.

Return type:

GeoDataFrame
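For a point feature, the combination of set_buffer followed by set_bbox('buffer') amounts to simple arithmetic, sketched below in plain Python. The sketch assumes a point geometry and a radius expressed in CRS units; buffer_bbox is a hypothetical helper, not a sits function.

```python
def buffer_bbox(x, y, radius):
    """Bounding box [xmin, ymin, xmax, ymax] of a circular buffer around (x, y).

    Hypothetical helper; assumes a point geometry and a radius in CRS units.
    """
    return [x - radius, y - radius, x + radius, y + radius]


# A 100 m buffer around a point given in a metric CRS such as EPSG:3035.
print(buffer_bbox(4000000, 2500000, 100))
# [3999900, 2499900, 4000100, 2500100]
```

This is why the radius only makes sense in a projected CRS: with coordinates in degrees, a radius of 100 would span far more than the intended area.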

Example

>>> geotable.set_buffer('gdf', 100)

to_vector(df_attr, outfile=None, driver='GeoJSON')

Write a Csv2gdf’s GeoDataFrame layer as a vector file.

Parameters:
  • df_attr (str) – GeoDataFrame attribute of class Csv2gdf. Can be one of the following: ‘gdf’, ‘buffer’, ‘bbox’.

  • outfile (str, optional) – Output path. Defaults to None.

  • driver (str, optional) – Output vector file format (see GDAL/OGR Vector drivers: https://gdal.org/drivers/vector/index.html). Defaults to “GeoJSON”.

Example

>>> filename = 'mygeom'
>>> geotable.to_vector('gdf', f'output/{filename}_gdf.geojson')
>>> geotable.to_vector('buffer', f'output/{filename}_buffer.geojson')
>>> geotable.to_vector('bbox', f'output/{filename}_bbox.geojson')

The Vec2gdf class

class sits.Vec2gdf(vec_file)

Bases: Gdfgeom

This class loads a vector file as a GeoDataFrame object. It inherits the methods and attributes of the Gdfgeom class.

Example

>>> v_path = '<vector file path>'
>>> geotable = Vec2gdf(v_path)

__init__(vec_file)

The Csv2gdf class

class sits.Csv2gdf(csv_file, x_name, y_name, crs_in, id_name='no_id')

Bases: Gdfgeom

This class loads CSV tables with geographic coordinates into a GeoDataFrame object. It inherits the methods and attributes of the Gdfgeom class.

crs_in

CRS of coordinates described in the csv table.

Type:

int

table

DataFrame object.

Type:

DataFrame

Parameters:
  • csv_file (str) – csv filepath.

  • x_name (str) – name of the field describing X coordinates.

  • y_name (str) – name of the field describing Y coordinates.

  • crs_in (int) – CRS of coordinates described in the csv table.

  • id_name (str, optional) – name of the ID field. Defaults to “no_id”.

Example

>>> csv_file = 'example.csv'
>>> crs_in = 4326
>>> geotable = Csv2gdf(csv_file, 'longitude', 'latitude', crs_in)

__init__(csv_file, x_name, y_name, crs_in, id_name='no_id')

Initialize the attributes of Csv2gdf.

del_rows(col_name, rows_values)

Drop rows from Csv2gdf.table according to a column’s values.

Parameters:
  • col_name (str) – column name.

  • rows_values (list) – list of values.
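The filtering that del_rows describes can be sketched on a plain list of dictionaries. This is only an illustration of the rule (drop every row whose value in col_name appears in rows_values); the actual method operates on Csv2gdf.table, and drop_rows and the sample table are hypothetical.

```python
def drop_rows(rows, col_name, rows_values):
    """Return the rows whose value in col_name is not listed in rows_values."""
    excluded = set(rows_values)
    return [row for row in rows if row[col_name] not in excluded]


# Hypothetical table with one row per station.
table = [
    {"station_id": "a", "longitude": 1.5, "latitude": 43.6},
    {"station_id": "b", "longitude": 2.3, "latitude": 48.9},
    {"station_id": "c", "longitude": 5.4, "latitude": 43.3},
]

kept = drop_rows(table, "station_id", ["a", "c"])
print([row["station_id"] for row in kept])  # ['b']
```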

set_gdf(crs_out)

Convert the class attribute Csv2gdf.table (DataFrame) into a GeoDataFrame object projected in the specified output CRS.

Parameters:
  • crs_out (int) – output CRS of GeoDataFrame.

  • outfile (str, optional) – Defaults to None.

Returns:

GeoDataFrame object Csv2gdf.gdf.

Return type:

GeoDataFrame

Example

>>> geotable.set_gdf(3035)

The StacAttack class

class sits.StacAttack(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])

Bases: object

This class requests time-series datasets from a STAC catalog and stores them as image or CSV files.

stac_conf

parameters for building the datacube (xarray) from STAC items.

Type:

dict

Parameters:
  • provider (str, optional) – stac provider. Defaults to ‘mpc’. Can be one of the following: ‘mpc’ (Microsoft Planetary Computer), ‘aws’ (Amazon Web Services).

  • collection (str, optional) – stac collection. Defaults to ‘sentinel-2-l2a’.

  • bands (list, optional) – names of the bands to load. Defaults to [‘B02’, ‘B03’, ‘B04’, ‘B05’, ‘B06’, ‘B07’, ‘B08’, ‘B8A’, ‘B11’, ‘B12’, ‘SCL’]

Example

>>> stacObj = StacAttack()

__init__(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])

Initialize the attributes of StacAttack.

fixS2shift(shiftval=-1000, minval=1, **kwargs)

Fix Sentinel-2 radiometric offset applied since the ESA Processing Baseline 04.00. For more information: https://sentinels.copernicus.eu/web/sentinel/-/copernicus-sentinel-2-major-products-upgrade-upcoming

Parameters:
  • shiftval (int) – radiometric offset value. Defaults to -1000.

  • minval (int) – minimum radiometric value. Defaults to 1.

  • **kwargs – other arguments

Returns: StacAttack.image with corrected radiometric values.
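A minimal sketch of what such an offset correction could look like on a list of digital numbers is given below. It assumes the correction adds shiftval to every value and then clamps the result to minval; the documentation above does not spell out the exact rule, nor which acquisitions it is applied to, so treat shift_reflectance as a hypothetical illustration.

```python
def shift_reflectance(values, shiftval=-1000, minval=1):
    """Add shiftval to each value and clamp the result to at least minval.

    Hypothetical helper; the clamping rule is an assumption.
    """
    return [max(value + shiftval, minval) for value in values]


# Digital numbers as delivered with the +1000 offset.
print(shift_reflectance([1000, 1500, 900]))  # [1, 500, 1]
```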

gapfill(method='linear', first_last=True, **kwargs)

Gap-fill NaN pixel values through the satellite time-series.

Parameters:
  • method (string, optional) – method to use for interpolation (see xarray.DataArray.interpolate_na). Defaults to ‘linear’.

  • first_last (bool, optional) – Interpolation of the first and last image of the satellite time-series with xarray.DataArray.bfill and xarray.DataArray.ffill. Defaults to True.

  • **kwargs – other arguments of xarray.DataArray.interpolate_na.
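The effect of method='linear' combined with first_last=True can be sketched on a single pixel's time series in plain Python. The sketch assumes evenly spaced acquisitions and uses NaN for missing values; gapfill_series is a hypothetical helper and does not call xarray.

```python
import math


def gapfill_series(values, first_last=True):
    """Linearly interpolate NaN gaps in a 1-D series (evenly spaced steps assumed)."""
    filled = list(values)
    valid = [i for i, v in enumerate(filled) if not math.isnan(v)]
    if not valid:
        return filled
    # Interior gaps: interpolate between neighbouring valid samples.
    for left, right in zip(valid, valid[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    if first_last:
        # Leading gaps take the first valid value (back-fill),
        # trailing gaps take the last valid value (forward-fill).
        for i in range(valid[0]):
            filled[i] = filled[valid[0]]
        for i in range(valid[-1] + 1, len(filled)):
            filled[i] = filled[valid[-1]]
    return filled


nan = float("nan")
print(gapfill_series([nan, 2.0, nan, nan, 8.0, nan]))
# [2.0, 2.0, 4.0, 6.0, 8.0, 8.0]
```

With first_last=False the first and last values stay NaN, because linear interpolation needs a valid sample on both sides of a gap.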

Example

>>> stacObj.gapfill()

loadCube(bbox, arrtype='image', dimx=5, dimy=5, resolution=10, crs_out=3035)

Load images for a bounding box, optionally with predefined pixel dimensions (x, y).

Parameters:
  • bbox (list) – coordinates of bounding box [xmin, ymin, xmax, ymax] in the output crs unit.

  • arrtype (string, optional) – xarray dataset name. Defaults to ‘image’. Can be one of the following: ‘patch’, ‘image’, ‘masked’.

  • dimx (int, optional) – number of pixels in columns. Defaults to 5.

  • dimy (int, optional) – number of pixels in rows. Defaults to 5.

  • resolution (float, optional) – spatial resolution (in crs unit). Defaults to 10.

  • crs_out (int, optional) – CRS of output coordinates. Defaults to 3035.

Returns:

geobox object StacAttack.geobox, and the time-series image StacAttack.cube (xarray.Dataset).

Return type:

odc.geo.geobox.GeoBox

Example

>>> aoi_bounds = [0, 0, 1, 1]
>>> stacObj.loadCube(aoi_bounds, arrtype='patch', dimx=10, dimy=10)

mask(mask_array=None, mask_band='SCL', mask_values=[3, 8, 9, 10])

Load binary mask.

Parameters:
  • mask_array (xarray.DataArray, optional) – xarray.DataArray binary mask (with the same dimensions as StacAttack.cube). Defaults to None.

  • mask_band (string, optional) – band name used as a mask (e.g. ‘SCL’ for Sentinel-2). Defaults to ‘SCL’.

  • mask_values (list, optional) – band values related to masked pixels. Defaults to [3, 8, 9, 10].

Returns:

time-series of binary masks StacAttack.mask

Return type:

xarray.DataArray
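How mask_band and mask_values combine can be sketched on a few pixels in plain Python. In the Sentinel-2 scene classification (SCL) band, classes 3, 8, 9 and 10 are cloud shadows, medium-probability clouds, high-probability clouds and thin cirrus. The sketch assumes that True marks a pixel to discard and that masked pixels become NaN; both helpers are hypothetical.

```python
import math


def scl_mask(scl_values, mask_values=(3, 8, 9, 10)):
    """Return True where a pixel's SCL class is listed in mask_values."""
    masked = set(mask_values)
    return [value in masked for value in scl_values]


def apply_mask(band_values, mask):
    """Replace masked pixels with NaN and leave the others unchanged."""
    return [float("nan") if is_masked else value
            for value, is_masked in zip(band_values, mask)]


scl = [4, 8, 3, 5]            # vegetation, cloud, cloud shadow, bare soil
red = [0.03, 0.40, 0.01, 0.12]

mask = scl_mask(scl)
print(mask)                   # [False, True, True, False]
print(apply_mask(red, mask))  # [0.03, nan, nan, 0.12]
```

The NaN values left behind are the gaps that a later gap-filling step would interpolate.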

Example

>>> stacObj.mask()

mask_apply()

Apply mask pre-loaded as StacAttack.mask on the satellite time-series StacAttack.cube.

Example

>>> stacObj.mask()
>>> stacObj.mask_apply()

searchItems(bbox_latlon, date_start=datetime.datetime(2023, 1, 1, 0, 0), date_end=datetime.datetime(2023, 12, 31, 0, 0), **kwargs)

Get list of stac collection’s items.

Parameters:
  • bbox_latlon (list) – coordinates of bounding box.

  • date_start (datetime.datetime, optional) – start date. Defaults to datetime.datetime(2023, 1, 1).

  • date_end (datetime.datetime, optional) – end date. Defaults to datetime.datetime(2023, 12, 31).

  • **kwargs – other STAC-compliant arguments.

Returns:

list of stac collection items StacAttack.items.

Return type:

pystac.ItemCollection

Example

>>> stacObj.searchItems(aoi_bounds_4326)

spectral_index(indices_to_compute: str | list[str], band_mapping: dict = None, **kwargs)

Calculate various spectral indices for remote sensing data using the spyndex and awesome-spectral-indices libraries.

Parameters:
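  • indices_to_compute (str or list[str]) – name(s) of the spectral indices to compute (e.g. ‘NDVI’).

  • dataset (xr.Dataset) – The xarray.Dataset containing spectral bands.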

  • band_mapping (dict, optional) – A dictionary to map your dataset’s band names to spyndex’s standard band names (e.g., {‘R’: ‘B04’, ‘N’: ‘B08’}). If None, it assumes your dataset’s variable names are directly usable by spyndex.

  • **kwargs – other arguments

Returns:

time-series image StacAttack.indices.

Return type:

xarray.DataArray
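The role of band_mapping can be illustrated with NDVI, defined as (N - R) / (N + R), computed by hand for one pixel. This sketch does not call spyndex; the ndvi function and the pixel values are illustrative assumptions, while the mapping is the one used in the example of this method.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (N - R) / (N + R)."""
    return (nir - red) / (nir + red)


# Standard band names ('R', 'N') mapped to Sentinel-2 band names.
band_mapping = {"R": "B04", "N": "B08"}

# Hypothetical reflectance values of one pixel, keyed by Sentinel-2 band name.
pixel = {"B04": 1000, "B08": 3000}

value = ndvi(pixel[band_mapping["N"]], pixel[band_mapping["R"]])
print(value)  # 0.5
```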

Example

>>> stacObj.spectral_index('NDVI', {'R': 'B04', 'N': 'B08'})

to_csv(outdir, gid=None, id_point='station_id')

Convert xarray dataset into csv file.

Parameters:
  • outdir (str) – output directory.

  • gid (str, optional) – column name of ID. Defaults to None.

Example

>>> outdir = 'output'
>>> stacObj.to_csv(outdir)

to_nc(outdir, gid=None)

Convert xarray dataset into netcdf file.

Parameters:
  • outdir (str) – output directory.

  • gid (str, optional) – column name of ID. Defaults to None.

  • array_type (str, optional) – xarray dataset name. Defaults to ‘image’. Can be one of the following: ‘patch’, ‘image’, ‘masked’.

Example

>>> outdir = 'output'
>>> stacObj.to_nc(outdir)

The Labels class

class sits.Labels(geolayer)

Bases: object

This class produces an image of labels from a vector file.

Parameters:

geolayer (str or geodataframe) – vector layer to rasterize.

Returns:

geodataframe Labels.gdf.

Return type:

GeoDataFrame

Example

>>> geodataframe = <gdf object>
>>> vlayer = Labels(geodataframe)
>>> vector_file = 'myVector.shp'
>>> vlayer = Labels(vector_file)

__init__(geolayer)

Initialize the attributes of Labels.

to_raster(id_field, geobox, filename, outdir, ext='tif', driver='GTiff')

Convert geodataframe into raster file while keeping a column attribute as pixel values.

Parameters:
  • id_field (str) – column name to keep as pixel values.

  • geobox (odc.geo.geobox.GeoBox) – geobox object.

  • filename (str) – output raster filename.

  • outdir (str) – output directory.

  • ext (str, optional) – raster file extension. Defaults to “tif”.

  • driver (str, optional) – output raster format (gdal standard). Defaults to “GTiff”.

Example

>>> bbox = [0, 0, 1, 1]
>>> crs_out = 3035
>>> resolution = 10
>>> geobox = def_geobox(bbox, crs_out, resolution)
>>> vlayer.to_raster('id', geobox, 'output_img', 'output_dir')

The Multiproc class

class sits.Multiproc(array_type, fext, outdir)

Bases: object

This class aims to parallelize the production of images or patches.

Parameters:
  • array_type (str) – xarray dataset name. Can be one of the following: ‘patch’, ‘image’.

  • fext (str) – output file format. Can be one of the following: ‘nc’, ‘csv’.

  • outdir (str) – output directory.

Example

>>> mproc = Multiproc('patch', 'nc', 'output')

__init__(array_type, fext, outdir)

Initialize the attributes of Multiproc.

addParams_gapfill(method='linear', first_last=True, **kwargs)

Add optional parameters for StacAttack.gapfill() called through Multiproc.fetch_func().

Parameters:
  • method (string, optional) – method to use for interpolation (see xarray.DataArray.interpolate_na). Defaults to ‘linear’.

  • first_last (bool, optional) – Interpolation of the first and last image of the satellite time-series with xarray.DataArray.bfill and xarray.DataArray.ffill. Defaults to True.

  • **kwargs – other arguments of xarray.DataArray.interpolate_na.

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_gapfill(method='nearest', first_last=False)

addParams_loadCube(dimx=5, dimy=5, resolution=10, crs_out=3035)

Add optional parameters for StacAttack.loadCube() called through Multiproc.fetch_func().

Parameters:
  • dimx (int, optional) – number of pixels in columns. Defaults to 5.

  • dimy (int, optional) – number of pixels in rows. Defaults to 5.

  • resolution (float, optional) – spatial resolution (in crs unit). Defaults to 10.

  • crs_out (int, optional) – CRS of output coordinates. Defaults to 3035.

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_loadCube(dimx=20, dimy=20)

addParams_mask(mask_array=None, mask_band='SCL', mask_values=[3, 8, 9, 10])

Add optional parameters for StacAttack.mask() called through Multiproc.fetch_func().

Parameters:
  • mask_array (xarray.DataArray, optional) – xarray.DataArray binary mask (with the same dimensions as StacAttack.cube). Defaults to None.

  • mask_band (string, optional) – band name used as a mask (e.g. ‘SCL’ for Sentinel-2). Defaults to ‘SCL’.

  • mask_values (list, optional) – band values related to masked pixels. Defaults to [3, 8, 9, 10].

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_mask(mask_values=[0])

addParams_searchItems(date_start=datetime.datetime(2023, 1, 1, 0, 0), date_end=datetime.datetime(2023, 12, 31, 0, 0), **kwargs)

Add optional parameters for StacAttack.searchItems() called through Multiproc.fetch_func().

Parameters:
  • date_start (datetime.datetime, optional) – start date. Defaults to datetime.datetime(2023, 1, 1).

  • date_end (datetime.datetime, optional) – end date. Defaults to datetime.datetime(2023, 12, 31).

  • **kwargs (optional) – other STAC-compliant arguments, e.g. query parameters to filter on cloud cover percentage.

Example

>>> from datetime import datetime
>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_searchItems(date_start=datetime(2016, 1, 1), query={"eo:cloud_cover": {"lt": 20}})

addParams_stacAttack(provider='mpc', collection='sentinel-2-l2a', key_sat='s2', bands=['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B8A', 'B11', 'B12', 'SCL'])

Add optional parameters for StacAttack class instance called through Multiproc.fetch_func().

Parameters:
  • provider (str, optional) – stac provider. Defaults to ‘mpc’. Can be one of the following: ‘mpc’ (Microsoft Planetary Computer), ‘aws’ (Amazon Web Services).

  • collection (str, optional) – stac collection. Defaults to ‘sentinel-2-l2a’.

  • bands (list, optional) – names of the bands to load. Defaults to [‘B02’, ‘B03’, ‘B04’, ‘B05’, ‘B06’, ‘B07’, ‘B08’, ‘B8A’, ‘B11’, ‘B12’, ‘SCL’]

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_stacAttack(bands=['B02', 'B03', 'B04'])

addParams_to_raster(ext='tif', driver='GTiff')

Add optional parameters for Labels.to_raster() called through Multiproc.fetch_func().

Parameters:
  • ext (str, optional) – raster file extension. Defaults to “tif”.

  • driver (str, optional) – output raster format (gdal standard). Defaults to “GTiff”.

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.addParams_to_raster(driver="COG")

add_label(geolayer, id_field)

Export an image of labels with the same dimensions as the datacube, by calling the method Labels.to_raster().

Parameters:
  • geolayer (GeoDataFrame) – vector file.

  • id_field (str) – attribute field name.

Example

>>> mproc = Multiproc('patch', 'nc', 'output')
>>> mproc.add_label(vlayer, 'myfield')

dask_compute(scheduler_type='processes')

Call dask.compute to trigger the execution of the delayed tasks (i.e. Multiproc.fetch_dask) and gather their results into a final output.

Parameters:

scheduler_type (str) – type of scheduler. Defaults to ‘processes’. Can be one of the following:

  • Single-threaded Scheduler ‘single-threaded’ or ‘sync’:
    • Runs computations in a single thread without parallelism.

    • Suitable for debugging or when parallelism isn’t required.

  • Threaded Scheduler ‘threads’:
    • Utilizes a pool of threads to execute tasks concurrently.

    • Good for I/O-bound tasks and when tasks release the Global Interpreter Lock (GIL).

  • Multiprocessing Scheduler ‘processes’:
    • Uses a pool of separate processes to execute tasks in parallel.

    • Suitable for CPU-bound tasks and when tasks are limited by the GIL.

  • Distributed Scheduler ‘distributed’:
    • Uses a distributed cluster to execute tasks.

    • Best for large-scale computations across multiple machines.

Example

>>> mproc.dask_compute()

del_func()

Clear Multiproc.fetch_dask, the list of dask.delayed function’s instances.

fetch_func(aoi_latlong, aoi_proj, gid, mask=False, gapfill=False, **kwargs)

Wrap the Multiproc.__fdask() function with dask.delayed so that it is evaluated lazily and can be executed in parallel.

Parameters:
  • aoi_latlong (list) – coordinates of bounding box.

  • aoi_proj (list) – coordinates of bounding box [xmin, ymin, xmax, ymax] in the output crs.

  • gid (int) – image/patch index.

  • **kwargs (dict) – additional arguments (i.e. StacAttack.searchItems(), StacAttack.loadImgs(), StacAttack.loadPatches()).

Returns:

list of dask.delayed function’s instances.

Return type:

Multiproc.fetch_dask

Example

>>> for gid, bboxes in enumerate(my_df['bboxes']):
...     mproc.fetch_func(bboxes[0], bboxes[1], gid)
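The pattern behind fetch_func, dask_compute and del_func (queue calls without running them, execute them all at once, then clear the queue) can be sketched with the standard library. This is an analogy only: it uses a concurrent.futures thread pool instead of dask, and TaskQueue and bbox_area are hypothetical names that do not exist in sits.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class TaskQueue:
    """Collect deferred calls, then run them all at once (stand-in for dask)."""

    def __init__(self):
        self.pending = []

    def fetch(self, func, *args):
        # Store the call without executing it (lazy evaluation).
        self.pending.append(partial(func, *args))

    def compute(self, max_workers=4):
        # Execute every stored call concurrently and gather results in order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(task) for task in self.pending]
            return [future.result() for future in futures]

    def clear(self):
        # Forget the queued calls, as del_func does for Multiproc.fetch_dask.
        self.pending = []


def bbox_area(bbox, gid):
    """Toy task: return the index and the area of a bounding box."""
    xmin, ymin, xmax, ymax = bbox
    return gid, (xmax - xmin) * (ymax - ymin)


queue = TaskQueue()
for gid, bbox in enumerate([[0, 0, 1, 1], [0, 0, 2, 3]]):
    queue.fetch(bbox_area, bbox, gid)

results = queue.compute()
print(results)  # [(0, 1), (1, 6)]
queue.clear()
```

Nothing runs while the loop queues the calls; the work only starts in compute(), which is the step where a scheduler choice such as threads or processes matters.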