Python Data Services Engineering Library
========================================

The Data Services Engineering Utility Library (dselib) is a set of utility
functions developed to support the Climate tools (dse_tools) and the processing
of raster data (dse_gdal).

dse_tools
---------

The goal of DSE Climate Tools is to eventually produce a set of DTK Climate
files. To do this we start with a raw data set, and through a series of tools
clean and reformat the data. This process involves:

1.  getting a list of files to process,

2.  performing a series of steps on the data,

3.  generating a set of output files.

The dse_tools part of dselib provides a set of Python classes for performing
many of the tasks common to each tool.

Problems that we solved:

1.  input/output file pairing: for a given output file, associate the one or
    more input files needed to build the output

2.  flexible input file naming conventions using patterns

3.  support for working with netCDF files

4.  raster calculations on netCDF or GDAL raster files

5.  humidity calculation tools using temperature and dew point rasters

6.  generation of DTK Climate files

These tools can also be used to build demographic files and age distributions
from population rasters and age distribution rasters.

dse_tools classes
-----------------

PathPattern (file: file_selector.py)
------------------------------------

Climate data, depending on the data set, can be very large. Because of this, the
data is often partitioned temporally into a hierarchical structure. For example,
the ERA5 data set is partitioned by parameter/year/filename, and each filename
follows a prefix-date-suffix structure.

PathPattern is a Python class that lets the user define the structure of a data
set; it is used to build input and output file names.
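
As a rough illustration of the idea, and not the actual PathPattern API, a
parameter/year/filename layout with a prefix-date-suffix filename could be
expressed with a plain format string. The function name `build_path`, the
template, and the example prefix, suffix, and parameter values are all
hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical template mirroring a parameter/year/prefix-date-suffix layout.
TEMPLATE = "{parameter}/{year}/{prefix}-{date}-{suffix}"


def build_path(root, parameter, day, prefix="era5", suffix="hourly.nc"):
    """Build a file path for one parameter and one day (illustrative only)."""
    relative = TEMPLATE.format(
        parameter=parameter,
        year=day.year,
        prefix=prefix,
        date=day.strftime("%Y%m%d"),
        suffix=suffix,
    )
    return PurePosixPath(root) / relative


example = build_path("/data/era5", "t2m", date(2018, 1, 15))
print(example)
```

Keeping the layout in a single template means that a change in the directory
structure only has to be made in one place.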

FileSelector (file: file_selector.py)
-------------------------------------

FileSelector is a Python class that builds a dictionary mapping each output file
to its required input files. Required input files can be selected by time
(vertical selection), by parameter (horizontal selection), or both.
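
The pairing can be pictured as a dictionary keyed by output file name. The
sketch below is not the FileSelector interface; `input_name`, `pair_files`, the
file naming scheme, and the parameter names are assumptions made for
illustration. It selects inputs both by time (a range of days) and by parameter.

```python
from datetime import date, timedelta


def input_name(parameter, day):
    """Hypothetical naming scheme: parameter/year/parameter-YYYYMMDD.nc."""
    return f"{parameter}/{day.year}/{parameter}-{day:%Y%m%d}.nc"


def pair_files(output_name, parameters, start, end):
    """Map one output file to every input file it needs.

    Inputs are selected by time (each day from start to end, inclusive)
    and by parameter (each entry in parameters).
    """
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    inputs = [input_name(p, d) for p in parameters for d in days]
    return {output_name: inputs}


pairs = pair_files(
    "rh/2018/rh-20180101-20180103.nc",  # hypothetical output file
    ["t2m", "d2m"],                     # hypothetical parameter names
    date(2018, 1, 1),
    date(2018, 1, 3),
)
for output_file, input_files in pairs.items():
    print(output_file, "<-", len(input_files), "input files")
```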

NetCDF_Helper (file: netcdf_helper.py)
--------------------------------------

The most common data file format used by DSE is netCDF. NetCDF_Helper provides a
set of methods for performing the most common tasks on netCDF files, such as
getting dimension values, reading an array of variable values, copying
dimension information, and updating variable values.

Calc_Helper (file: calc_helper.py)
----------------------------------

Calc_Helper provides methods for aggregating, averaging, and converting raster
data. For example, calculating the daily rainfall total from ERA5 data requires
summing the hourly rainfall data and then converting the result from meters to
millimeters; for temperature, it requires calculating the daily average and
converting it from kelvins to degrees Celsius.
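
The two conversions described above can be sketched for a single raster cell as
follows. This is a minimal illustration on plain Python lists, not the
Calc_Helper methods; the function names are hypothetical, and real raster data
would normally be processed as arrays.

```python
def daily_total_mm(hourly_m):
    """Sum hourly values given in meters and return the total in millimeters."""
    return sum(hourly_m) * 1000.0


def daily_mean_celsius(hourly_k):
    """Average hourly temperatures in kelvins and return degrees Celsius."""
    return sum(hourly_k) / len(hourly_k) - 273.15


# Made-up values for one cell: 0.5 mm of rain per hour, constant 293.15 K.
rain_m = [0.0005] * 24
temps_k = [293.15] * 24

print(round(daily_total_mm(rain_m), 3))       # daily rainfall in mm
print(round(daily_mean_celsius(temps_k), 3))  # daily mean in degrees C
```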

HumidityCalculator (file: humidity_helper.py)
---------------------------------------------

HumidityCalculator provides a set of methods for calculating relative humidity
(rh) from temperature and dew point rasters in degrees C.
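
The source does not state which formula HumidityCalculator uses. One common
choice is the Magnus approximation, sketched below for a single cell; the
coefficients (17.625 and 243.04 degrees C) and the function name
`relative_humidity` are assumptions for this example, not taken from dselib.

```python
import math

# Magnus approximation coefficients (an assumption; other published values exist).
A = 17.625
B = 243.04  # degrees C


def relative_humidity(temp_c, dewpoint_c):
    """Return relative humidity in percent from temperature and dew point in degrees C."""
    gamma_dew = A * dewpoint_c / (B + dewpoint_c)
    gamma_air = A * temp_c / (B + temp_c)
    # Ratio of saturation vapor pressure at the dew point to that at air temperature.
    return 100.0 * math.exp(gamma_dew - gamma_air)


print(round(relative_humidity(25.0, 15.0), 1))
```

When the dew point equals the air temperature the function returns 100, which
is a quick sanity check on any implementation.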

DSE_Common (file: dse_common.py)
--------------------------------

Miscellaneous functions.

\<add classes for DTK Climate files\>

dse_gdal
--------

Working with environmental and demographic data often requires that we work with
geospatial information such as map shapes and geolocated raster information. The
Geospatial Data Abstraction Library (GDAL) provides a large set of tools for
processing and manipulating geospatial data. These tools are available in
Python through the Python GDAL library.

The dse_gdal library provides a set of wrapper classes that simplify the most
common tasks, such as raster clipping and aggregation of raster data by shape.

GDALHelper (file: gdal_helper.py)
---------------------------------

Wrapper class for common GDAL operations on shape and raster data, such as
opening a file and getting file metadata and projection information.

RasterHelper (file: gdal_raster_helper.py)
------------------------------------------

Wrapper class for clipping and aggregating raster data.
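
Aggregating raster data by shape can be pictured without GDAL as averaging the
cells that fall inside each shape. The sketch below is not the RasterHelper
interface; it assumes the shapes have already been rasterized into a zone grid
of the same size as the data, with 0 meaning "outside every shape". The name
`zonal_mean` and the sample grids are hypothetical.

```python
from collections import defaultdict


def zonal_mean(raster, zones, nodata=None):
    """Return the mean raster value per zone id, skipping zone 0 and nodata cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(raster, zones):
        for value, zone in zip(value_row, zone_row):
            if zone == 0 or value == nodata:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


raster = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
zones = [[1, 1, 2],
         [1, 0, 2]]

means = zonal_mean(raster, zones)
print(means)
```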

Installing dselib:
==================

Prerequisites:
--------------

Before dselib can be installed, you need to install the following items on your
system:

1.  Python 3.6 or later (64-bit)

2.  GDAL

3.  Shapely

4.  Fiona

5.  pyproj

6.  geopandas

7.  basemap

The procedure for installing these libraries will vary depending on your
operating system and choice of python platform. For the remainder of this
document, we assume that the user has installed either Anaconda Python, or
python.org Python on their system. We also assume that the OS is either Windows
(preferably Windows 10) or Linux.
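
Whichever platform you use, a quick way to see which of the prerequisite
libraries are already importable is a small check like the one below. It is a
convenience sketch and not part of dselib; the function name
`check_prerequisites` is hypothetical, and the mapping uses the usual import
name of each package.

```python
from importlib.util import find_spec

# Package name -> module name used to import it.
PREREQUISITES = {
    "GDAL": "osgeo",
    "Shapely": "shapely",
    "Fiona": "fiona",
    "pyproj": "pyproj",
    "geopandas": "geopandas",
    "basemap": "mpl_toolkits.basemap",
}


def check_prerequisites(modules=PREREQUISITES):
    """Return a dict of package name -> True if its module can be found."""
    status = {}
    for package, module in modules.items():
        try:
            status[package] = find_spec(module) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example mpl_toolkits) is missing.
            status[package] = False
    return status


for package, found in check_prerequisites().items():
    print(f"{package}: {'found' if found else 'missing'}")
```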

Anaconda Python on Windows:
---------------------------

### Installing GDAL

Anaconda Python has a pip-like tool, conda, that allows modules to be installed
from a requirements file.

The requirements_win.txt file is provided to load the other dependent modules
with the following command:

>   conda install --file requirements_win.txt

See this
[page](https://stackoverflow.com/questions/51042589/conda-version-pip-install-r-requirements-txt-target-lib/51043636)
for more information.

Python.org Windows:
-------------------

### Windows:

Installing dselib from Artifactory Staging:

>   python -m pip install dselib --index-url=https://\<user name\>%40idmod.org:\@packages.idmod.org/api/pypi/idm-pypi-staging/simple --upgrade

If you want to load the GDAL packages separately, the pre-compiled packages from
Gohlke are hosted on the idmod JFrog server
<https://packages.idmod.org/api/pypi/idm-pypi-production/simple>. To install the
prerequisite packages, execute the following commands:

1.  pip install GDAL==2.4.1 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

2.  pip install pyproj==1.9.6 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

3.  pip install Shapely==1.6.4.post2 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

4.  pip install Fiona==1.8.6 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

5.  pip install geopandas==0.5.0 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

6.  pip install basemap==1.2.1 --extra-index-url
    <https://packages.idmod.org/api/pypi/idm-pypi-production/simple>

>   Note: on Windows machines you may need to run the command prompt as
>   administrator.

If there are any problems, you can download the precompiled wheel packages
below:

1.  Shapely -\> Recommend
    ["Shapely-1.6.4.post2-cp36-cp36m-win_amd64.whl"](https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely)
    or newer

2.  Fiona -\> Recommend
    ["Fiona-1.8.6-cp36-cp36m-win_amd64.whl"](https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona)
    or newer

3.  Pyproj -\> Recommend
    ["pyproj-1.9.6.1-cp36-cp36m-win_amd64.whl"](https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyproj)
    or newer

4.  Geopandas -\> Recommend
    ["geopandas-0.5.0-py2.py3-none-any.whl"](https://www.lfd.uci.edu/~gohlke/pythonlibs/#geopandas)
    or newer

5.  Basemap -\> Recommend
    ["basemap-1.2.1-cp36-cp36m-win_amd64.whl"](https://www.lfd.uci.edu/~gohlke/pythonlibs/#basemap)
    or newer

Once downloaded, you can:

### Linux systems:

Follow the instructions for installing GDAL on Linux found
[here](https://pypi.org/project/GDAL/).

>   python -m pip install dselib --index-url=https://\<user name\>%40idmod.org:\@packages.idmod.org/api/pypi/idm-pypi-staging/simple --upgrade

Note: you may need to run these commands as root using sudo.

Using GDAL in Python Environments on Linux
==========================================

The instructions found [here](https://pypi.org/project/pygdal/) explain how to
set up GDAL in Python environments. Note that you will still need to install
GDAL on your system.

To Uninstall:
=============

Use pip uninstall dselib or pip3 uninstall dselib.

Note: you will need admin privileges to uninstall.
