Metadata-Version: 2.4
Name: pycachy
Version: 0.0.1
Summary: Cache your API calls and make your notebooks fast again
Home-page: https://github.com/AnswerDotAI/cachy
Author: Tommy
Author-email: tc@answer.ai
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastcore
Requires-Dist: httpx
Provides-Extra: dev
Requires-Dist: openai; extra == "dev"
Requires-Dist: anthropic; extra == "dev"
Requires-Dist: litellm; extra == "dev"
Requires-Dist: nbdev; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# cachy


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

We often call APIs while prototyping and testing our code. A single API
call (e.g. an Anthropic chat completion) can take hundreds of
milliseconds to run. This can really slow down development, especially
if our notebook contains many API calls 😞.

`cachy` caches API requests. It does this by saving the result of each
call to a local `cachy.jsonl` file. Before calling an API (e.g. OpenAI)
it checks whether the request already exists in `cachy.jsonl`. If it
does, it returns the cached result.

**How does it work?**

Under the hood, popular SDKs like OpenAI, Anthropic and LiteLLM use
`httpx.Client` and `httpx.AsyncClient`.

`cachy` patches the `send` method of both clients and injects a simple
caching mechanism:

- create a cache key from the request
- if the key exists in `cachy.jsonl` return the cached response
- if not, call the API and save the response to `cachy.jsonl`
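The three steps above can be sketched as follows. This is a simplified illustration of the idea, not `cachy`'s actual implementation; the `cache_key`, `lookup`, and `store` helpers are hypothetical names.

``` python
# Sketch of the caching idea: build a key from the request,
# look it up in a JSONL file, and append a record on a miss.
import hashlib, json
from pathlib import Path

CACHE = Path("cachy.jsonl")

def cache_key(method, url, body):
    """Hash the parts of a request that determine its response."""
    raw = json.dumps([method, url, body], sort_keys=True).encode()
    return hashlib.sha256(raw).hexdigest()

def lookup(key):
    """Return the cached response for `key`, or None on a miss."""
    if not CACHE.exists(): return None
    for line in CACHE.read_text().splitlines():
        rec = json.loads(line)
        if rec["key"] == key: return rec["response"]
    return None

def store(key, response):
    """Append a key/response record to the JSONL cache."""
    with CACHE.open("a") as f:
        f.write(json.dumps({"key": key, "response": response}) + "\n")
```

A patched `send` would compute the key, return `lookup(key)` on a hit, and otherwise call the real API and `store` the response before returning it.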

## Usage

To use `cachy`:

- install the package: `pip install pycachy`
- add the snippet below to the top of your notebook

``` python
from cachy import enable_cachy

enable_cachy()
```

By default `cachy` will cache requests made to OpenAI, Anthropic, Gemini
and DeepSeek.

*Note: Gemini caching only works via the LiteLLM SDK.*

> [!NOTE]
>
> ### Custom APIs
>
> If you’re using the OpenAI or LiteLLM SDK for other LLM providers like
> Grok or Mistral, you can cache these requests as shown below.
>
> ``` python
> from cachy import enable_cachy, doms
> enable_cachy(doms=doms+('api.x.ai', 'api.mistral.com'))
> ```

## Docs

Docs can be found hosted on this GitHub
[repository](https://github.com/AnswerDotAI/cachy)’s
[pages](https://AnswerDotAI.github.io/cachy/).

## How to use

First, import and enable `cachy`:

``` python
from cachy import enable_cachy
```

``` python
enable_cachy()
```

Now run your API calls as normal.

``` python
from openai import OpenAI
```

``` python
cli = OpenAI()
```

``` python
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```

Hey! How can I help you today? 😊

<details>

- id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
- created_at: 1756993423.0
- error: None
- incomplete_details: None
- instructions: None
- metadata: {}
- model: gpt-4.1-2025-04-14
- object: response
- output:
  [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097',
  content=[ResponseOutputText(annotations=[], text='Hey! How can I
  help you today? 😊', type='output_text', logprobs=[])],
  role='assistant', status='completed', type='message')]
- parallel_tool_calls: True
- temperature: 1.0
- tool_choice: auto
- tools: []
- top_p: 1.0
- background: False
- conversation: None
- max_output_tokens: None
- max_tool_calls: None
- previous_response_id: None
- prompt: None
- prompt_cache_key: None
- reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
- safety_identifier: None
- service_tier: default
- status: completed
- text: ResponseTextConfig(format=ResponseFormatText(type='text'),
  verbosity='medium')
- top_logprobs: 0
- truncation: disabled
- usage: ResponseUsage(input_tokens=9,
  input_tokens_details=InputTokensDetails(cached_tokens=0),
  output_tokens=11,
  output_tokens_details=OutputTokensDetails(reasoning_tokens=0),
  total_tokens=20)
- user: None
- store: True

</details>

If you run the same request again, the result is read from the cache.

``` python
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```

Hey! How can I help you today? 😊

<details>

- id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
- created_at: 1756993423.0
- error: None
- incomplete_details: None
- instructions: None
- metadata: {}
- model: gpt-4.1-2025-04-14
- object: response
- output:
  [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097',
  content=[ResponseOutputText(annotations=[], text='Hey! How can I
  help you today? 😊', type='output_text', logprobs=[])],
  role='assistant', status='completed', type='message')]
- parallel_tool_calls: True
- temperature: 1.0
- tool_choice: auto
- tools: []
- top_p: 1.0
- background: False
- conversation: None
- max_output_tokens: None
- max_tool_calls: None
- previous_response_id: None
- prompt: None
- prompt_cache_key: None
- reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
- safety_identifier: None
- service_tier: default
- status: completed
- text: ResponseTextConfig(format=ResponseFormatText(type='text'),
  verbosity='medium')
- top_logprobs: 0
- truncation: disabled
- usage: ResponseUsage(input_tokens=9,
  input_tokens_details=InputTokensDetails(cached_tokens=0),
  output_tokens=11,
  output_tokens_details=OutputTokensDetails(reasoning_tokens=0),
  total_tokens=20)
- user: None
- store: True

</details>
