Metadata-Version: 2.1
Name: locopy
Version: 0.5.5
Summary: Loading/Unloading to Amazon Redshift using Python
Author-email: Faisal Dosani <faisal.dosani@capitalone.com>
License: Apache Software License
Project-URL: Homepage, https://github.com/capitalone/locopy
Project-URL: Documentation, https://capitalone.github.io/locopy/
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8.0
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: boto3<=1.28.72,>=1.9.92
Requires-Dist: PyYAML<=6.0.1,>=5.1
Requires-Dist: pandas<=2.1.2,>=0.25.2
Requires-Dist: numpy<=1.26.1,>=1.22.0
Provides-Extra: psycopg2
Requires-Dist: psycopg2-binary>=2.7.7; extra == "psycopg2"
Provides-Extra: pg8000
Requires-Dist: pg8000>=1.13.1; extra == "pg8000"
Provides-Extra: snowflake
Requires-Dist: snowflake-connector-python[pandas]>=2.1.2; extra == "snowflake"
Provides-Extra: docs
Requires-Dist: sphinx; extra == "docs"
Requires-Dist: sphinx_rtd_theme; extra == "docs"
Provides-Extra: tests
Requires-Dist: hypothesis; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"
Provides-Extra: qa
Requires-Dist: pre-commit; extra == "qa"
Requires-Dist: black; extra == "qa"
Requires-Dist: isort; extra == "qa"
Provides-Extra: build
Requires-Dist: build; extra == "build"
Requires-Dist: twine; extra == "build"
Requires-Dist: wheel; extra == "build"
Provides-Extra: edgetest
Requires-Dist: edgetest; extra == "edgetest"
Requires-Dist: edgetest-conda; extra == "edgetest"
Provides-Extra: dev
Requires-Dist: locopy[tests]; extra == "dev"
Requires-Dist: locopy[docs]; extra == "dev"
Requires-Dist: locopy[qa]; extra == "dev"
Requires-Dist: locopy[build]; extra == "dev"

.. image:: https://github.com/capitalone/locopy/workflows/Python%20package/badge.svg
    :target: https://github.com/capitalone/locopy/actions
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/ambv/black

locopy: Data Load and Copy using Python
========================================

A Python library to assist with ETL processing for:

- Amazon Redshift (``COPY``, ``UNLOAD``)
- Snowflake (``COPY INTO <table>``, ``COPY INTO <location>``)

In addition:

- The library supports Python 3.8 to 3.10
- DB Driver (Adapter) agnostic. Use your favourite driver that complies with
  `DB-API 2.0 <https://www.python.org/dev/peps/pep-0249/>`_
- It provides functionality to download and upload data to S3 buckets, and internal stages (Snowflake)


Quick Installation
==================

.. code-block:: bash

    pip install locopy

or install from conda-forge

.. code-block:: bash

    conda config --add channels conda-forge
    conda install locopy

Installation instructions
-------------------------

A virtual or conda environment is highly recommended

.. code-block:: bash

    $ virtualenv locopy
    $ source locopy/bin/activate
    $ pip install --upgrade setuptools pip
    $ pip install locopy


Python Database API Specification 2.0
-------------------------------------

Rather than using a specific Python DB Driver / Adapter for Postgres (which should supports Amazon
Redshift or Snowflake), ``locopy`` prefers to be agnostic. As an end user you can use any Python
Database API Specification 2.0 package.

The following packages have been tested:

- ``psycopg2``
- ``pg8000``
- ``snowflake-connector-python``

You can use which ever one you prefer by importing the package and passing it
into the constructor input ``dbapi``.



Usage
-----

You need to store your connection parameters in a YAML file (or pass them in directly).
The YAML would consist of the following items:

.. code-block:: yaml

    # required to connect to redshift
    host: my.redshift.cluster.com
    port: 5439
    database: db
    user: userid
    password: password
    ## optional extras for the dbapi connector
    sslmode: require
    another_option: 123



If you aren't loading data, you don't need to have AWS tokens set up.
The Redshift connection (``Redshift``) can be used like this:

.. code-block:: python

    import pg8000
    import locopy

    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml") as redshift:
        redshift.execute("SELECT * FROM schema.table")
        df = redshift.to_dataframe()
    print(df)


If you want to load data to Redshift via S3, the ``Redshift`` class inherits from ``S3``:

.. code-block:: python

    import pg8000
    import locopy

    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml") as redshift:
        redshift.execute("SET query_group TO quick")
        redshift.execute("CREATE TABLE schema.table (variable VARCHAR(20)) DISTKEY(variable)")
        redshift.load_and_copy(
            local_file="example/example_data.csv",
            s3_bucket="my_s3_bucket",
            table_name="schema.table",
            delim=",")
        redshift.execute("SELECT * FROM schema.table")
        res = redshift.cursor.fetchall()

    print(res)


If you want to download data from Redshift to a CSV, or read it into Python

.. code-block:: python

    my_profile = "some_profile_with_valid_tokens"
    with locopy.Redshift(dbapi=pg8000, config_yaml="config.yml", profile=my_profile) as redshift:
        ##Optionally provide export if you ALSO want the exported data copied to a flat file
        redshift.unload_and_copy(
            query="SELECT * FROM schema.table",
            s3_bucket="my_s3_bucket",
            export_path="my_output_destination.csv")



Note on tokens
^^^^^^^^^^^^^^

To load data to S3, you will need to be able to generate AWS tokens, or assume the IAM role on a EC2
instance. There are a few options for doing this, depending on where you're running your script and
how you want to handle tokens. Once you have your tokens, they need to be accessible to the AWS
command line interface. See
http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#config-settings-and-precedence
for more information, but you can:

- Populate environment variables ``AWS_ACCESS_KEY_ID``, ``AWS_SECRET_ACCESS_KEY``,
  etc.
- Leverage the AWS credentials file.  If you have multiple profiles configured
  you can either call ``locopy.Redshift(profile="my-profile")``, or set up an
  environment variable ``AWS_DEFAULT_PROFILE``.
- If you are on a EC2 instance you can assume the credentials associated with the IAM role attached.


Advanced Usage
--------------

See the `docs <https://capitalone.github.io/locopy/>`_ for
more detailed usage instructions and examples including Snowflake.


Contributors
------------

We welcome and appreciate your contributions! Before we can accept any contributions, we ask that you please be sure to
sign the `Contributor License Agreement (CLA) <https://cla-assistant.io/capitalone/locopy>`_.

This project adheres to the `Open Source Code of Conduct <https://developer.capitalone.com/resources/code-of-conduct/>`_.
By participating, you are expected to honor this code.


Roadmap
-------

Roadmap details can be found `here <https://github.com/capitalone/locopy/blob/develop/ROADMAP.rst>`_
