# Pachyderm's Python SDK

Official Python client/SDK for Pachyderm.
The successor to https://github.com/pachyderm/python-pachyderm.

This library provides the autogenerated gRPC/protobuf code for Pachyderm,
  generated using [a fork of the betterproto package](https://github.com/pachyderm/python-betterproto),
  along with higher-level functionality.

## Installation
```bash
pip install pachyderm_sdk
```

## A Small Taste
Here's an example that creates a repo and adds a file:
```python
from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs

# Connects to a pachyderm cluster using your local config
#   at ~/.pachyderm/config.json
client = Client.from_config()

# Create a Pachyderm repo called `test`.
repo = pfs.Repo(name="test")
client.pfs.create_repo(repo=repo)

# Create a new commit in `test@master` and upload a file.
branch = pfs.Branch.from_uri("test@master")
with client.pfs.commit(branch=branch) as commit:
    file = commit.put_file_from_bytes(path="/data/file.dat", data=b"DATA")

# Retrieve the uploaded file.
with client.pfs.pfs_file(file) as f:
    print(f.readall())
```

Here's how to load a CSV file into a pandas DataFrame:
```python
from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs
import pandas as pd

client = Client.from_config()
file = pfs.File.from_uri("test@master:/path/to/data.csv")
with client.pfs.pfs_file(file) as f:
    df = pd.read_csv(f)
```
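
If pandas is not installed, the bytes returned by `f.readall()` can also be parsed with the standard library. The following is a minimal sketch, not part of `pachyderm_sdk`: `rows_from_csv_bytes` is a hypothetical helper, the sample bytes stand in for a file read from PFS, and it assumes the file is UTF-8 encoded with a header row.

```python
import csv
import io


def rows_from_csv_bytes(data: bytes, encoding: str = "utf-8") -> list:
    """Parse raw CSV bytes into a list of dicts keyed by the header row.

    Hypothetical helper for illustration; not provided by pachyderm_sdk.
    """
    # newline="" lets the csv module handle line endings itself.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Stand-in for the bytes you would get from `f.readall()`.
sample = b"id,value\n1,a\n2,b\n"
rows = rows_from_csv_bytes(sample)
print(rows)
```

With a real cluster, the call would presumably look like `rows_from_csv_bytes(f.readall())` inside the `with client.pfs.pfs_file(file) as f:` block.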

## Changes from Python-Pachyderm
This package is a successor to the python-pachyderm package.
Listed below are some of the notable changes:
1. Organization of the API
    * Methods and Message objects are now organized according to the
      service they are associated with, e.g. auth, pfs (pachyderm file-system),
      pps (pachyderm pipelining-system).
    * Message objects can be found within their respective submodule of the
      `pachyderm_sdk.api` module, e.g. `pachyderm_sdk.api.pfs`.
    * Methods can be found within their respective attribute of the `Client`
      class, e.g. `client.pps.create_pipeline`.
      * Some methods have been renamed to remove the redundancy this organization would otherwise create, e.g.
        `python_pachyderm.Client.get_enterprise_state` -> `pachyderm_sdk.Client.enterprise.get_state`
2. The autogenerated code is generated using a fork of the betterproto compiler.
    * Messages are now python dataclasses.
    * Methods require keyword arguments.
    * Pachyderm resources are specified using types.
      - python-pachyderm (old): `client.create_repo("test")`
      - pachyderm_sdk (new): `client.pfs.create_repo(repo=pfs.Repo(name="test"))`

## Contributing
Please see [the contributing guide](./CONTRIBUTING.md) for more information, including testing instructions.


## Developer Guide

Generate python APIs from protobuf:
```bash
./generate-protos.sh
```

Generate HTML documentation (writes to docs/pachyderm_sdk):
```bash
make docs
```

Run the tests:
```bash
pytest -vvv tests
```