Metadata-Version: 2.4
Name: synthegrator
Version: 0.13.2.2
Summary: Framework for code synthesis and AI4SE research
Author: David Gros, Claudio Spiess
License-Expression: MIT
Project-URL: Homepage, https://github.com/DaiseyCode/synthegrator
Keywords: code synthesis,llm
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lmwrapper[hf]<0.18,>=0.17
Requires-Dist: numpy<2.3,>=1.24.3
Requires-Dist: Pygments<3.0,>=2.15.1
Requires-Dist: tqdm<5.0,>=4.65.0
Requires-Dist: datasets<3.2,>=3.1
Requires-Dist: diskcache<6.0,>=5.6.3
Requires-Dist: libcst<2.0,>=1.0.1
Requires-Dist: tree-sitter==0.23.2
Requires-Dist: tree-sitter-language-pack==0.9.0
Requires-Dist: pytest~=8.4.1
Requires-Dist: lxml>=4.9.3
Requires-Dist: xxhash<4.0,>=3.3.0
Requires-Dist: typeguard<5.0,>=4.1
Requires-Dist: rank-bm25<0.3,>=0.2.2
Requires-Dist: docker<8,>=7.1
Requires-Dist: python-dateutil>=2.4
Requires-Dist: requests>=2.14.2
Requires-Dist: structlog>=15.3
Provides-Extra: dev
Requires-Dist: ruff>=0.2.2; extra == "dev"
Requires-Dist: pytest-cov~=4.1.0; extra == "dev"
Dynamic: license-file

# Synthegrator

Synthegrator is a framework for code generation problems. It simplifies
the process of loading common datasets and solving them with language models.

# Installation
```bash
pip install synthegrator
```

Additionally, to execute generated code you will need to [install Docker](https://docs.docker.com/engine/install/).
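
As a quick sanity check that Docker is available before running evaluations, you can probe for the CLI from Python. This is only an illustrative sketch: synthegrator itself talks to Docker through the `docker` SDK, and this snippet merely checks that the command-line client is on your `PATH`.

```python
import shutil
import subprocess

# Look for the docker CLI on PATH; None means Docker is likely not installed.
docker_path = shutil.which("docker")
if docker_path:
    version = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True
    ).stdout.strip()
    print("Found docker:", version)
else:
    print("docker CLI not found; see the install link above")
```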


# Example
Let's take a look at an example of how we can run a solver over
the HumanEval dataset, which contains 164 function synthesis problems.

```python
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df

# Loading of a selection of AI4SE Datasets
problems = list(yield_human_eval())

# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
#    ^ Set your API key in the OPENAI_API_KEY environment variable or a key file.
#    See https://github.com/DaiseyCode/lmwrapper for more.
solver = LmCodeSolverAutoRegressive(lm)

# Generate code and execute each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals, 
    pickle_gzip_whole_solution_eval=True
)
print("Fraction Passing", df.main_metric__is_success.mean())
```
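
The resulting dataframe can be sliced like any pandas `DataFrame`. Here is a sketch using a stand-in frame, since the real columns come from `solution_evals_to_df`; only the `main_metric__is_success` column shown in the example above is assumed.

```python
import pandas as pd

# Stand-in for the dataframe returned by solution_evals_to_df above;
# only the main_metric__is_success column from the example is assumed.
df = pd.DataFrame({"main_metric__is_success": [True, True, False, True]})

# Overall pass rate, as in the example
print("Fraction Passing", df.main_metric__is_success.mean())  # 0.75 here

# Select failing solutions for a closer look
failing = df[~df.main_metric__is_success]
print("Num failing", len(failing))  # 1 here
```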

# Architecture
## Guiding Design Requirements
- DR-1 **Support Diverse Datasets and Tasks.** We want an architecture that can
support a diverse set of tasks (including potentially complex, repository-level tasks).
- DR-2 **Consistent & Efficient Execution.** Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 **Adaptable to State-of-the-Art Models.** This includes models like those from OpenAI or on HuggingFace, as well as models that might do complex retrieval or reasoning.
- DR-4 **Maintainable.** Try to follow best practices around automated testing and continuous integration.

## Diagram
![Synthegrator architecture diagram](https://rb2xb7.s3.amazonaws.com/synthegrator.png)
