Metadata-Version: 2.4
Name: spatial-reasoning
Version: 0.1.4
Summary: A PyPI package for object detection using advanced vision models
Home-page: https://github.com/QasimWani/spatial-reasoning
Author: Qasim Wani
Author-email: Qasim Wani <qasim31wani@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/QasimWani/spatial-reasoning
Project-URL: Bug Tracker, https://github.com/QasimWani/spatial-reasoning/issues
Project-URL: Documentation, https://github.com/QasimWani/spatial-reasoning#readme
Project-URL: Source Code, https://github.com/QasimWani/spatial-reasoning
Keywords: computer vision,object detection,AI,machine learning,OpenAI,Gemini
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Image Recognition
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: torch>=2.0.0
Requires-Dist: torchvision>=0.15.0
Requires-Dist: Pillow>=9.0.0
Requires-Dist: opencv-python>=4.5.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: requests>=2.25.0
Requires-Dist: python-dotenv>=0.19.0
Requires-Dist: openai>=1.0.0
Requires-Dist: google-genai
Requires-Dist: transformers>=4.30.0
Requires-Dist: datasets>=2.0.0
Requires-Dist: accelerate>=1.6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=4.0.0; extra == "dev"
Requires-Dist: mypy>=0.990; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# Spatial Reasoning

A Python package for object detection using vision and reasoning models, including OpenAI's reasoning models and Google's Gemini.

![Example Results](assets/example_results.png)
*Comparison of detection results across different models - showing the superior performance of the advanced reasoning model*

## Features

- **Multiple Detection Models**: 
  - Advanced Reasoning Model (OpenAI) - Reasoning model that leverages tools and other foundation models to perform object detection
  - Vanilla Reasoning Model - Directly using a reasoning model to perform object detection
  - Vision Model - GroundingDino + SAM
  - Gemini Model (Google) - Fine-tuned LMM for object detection

- **Tool-Use Reasoning**: The advanced model overlays a grid on the image and reasons over grid cells to localize objects precisely
  
  ![Internal Workings](assets/internal_workings.png)
  *How the advanced reasoning model works under the hood - using grid cells for precise localization*

- **Simple API**: One function for all your detection needs
- **CLI Support**: Command-line interface for quick testing
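
To make the grid idea concrete, here is a minimal sketch of how a numbered grid cell could map back to a pixel bounding box. It is an illustration only, not the package's implementation: the function names, the 1-based row-major cell numbering, and the `(x1, y1, x2, y2)` box format are all assumptions.

```python
from typing import Iterable, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
BBox = Tuple[int, int, int, int]


def grid_cell_bbox(width: int, height: int, rows: int, cols: int, cell: int) -> BBox:
    """Return the pixel box of a 1-based, row-major grid cell (hypothetical helper)."""
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell must be in 1..{rows * cols}, got {cell}")
    row, col = divmod(cell - 1, cols)
    # Integer division keeps adjacent cells edge-to-edge with no gaps.
    return (
        col * width // cols,
        row * height // rows,
        (col + 1) * width // cols,
        (row + 1) * height // rows,
    )


def cells_to_bbox(width: int, height: int, rows: int, cols: int,
                  cells: Iterable[int]) -> BBox:
    """Return the smallest box enclosing all the given cells (hypothetical helper)."""
    boxes = [grid_cell_bbox(width, height, rows, cols, c) for c in cells]
    if not boxes:
        raise ValueError("at least one cell is required")
    x1s, y1s, x2s, y2s = zip(*boxes)
    return (min(x1s), min(y1s), max(x2s), max(y2s))
```

Under these assumptions, a model that answers with a handful of cell numbers can be turned into a coarse bounding box, which could then be refined by repeating the process on the cropped region.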

## Installation

```bash
pip install spatial-reasoning
```

Or install from source:
```bash
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
```

### Optional: Flash Attention (for better performance)

For improved performance with transformer models, you can optionally install Flash Attention:

```bash
pip install flash-attn --no-build-isolation
```

Note: Flash Attention requires CUDA development tools and must be compiled for your specific PyTorch/CUDA version. The package will work without it, just with slightly reduced performance.
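
If you are unsure whether Flash Attention is present in your environment, a quick check along these lines may help. This is a small sketch using only the standard library; the helper name `has_flash_attn` is made up for illustration. It only confirms that the `flash_attn` module can be located, not that it was built against the right PyTorch/CUDA version.

```python
import importlib.util


def has_flash_attn() -> bool:
    """Return True if the `flash_attn` module can be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return importlib.util.find_spec("flash_attn") is not None


print("Flash Attention available:", has_flash_attn())
```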

## Setup

Create a `.env` file in your project root:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
```

Get your API keys:
- OpenAI: https://platform.openai.com/api-keys
- Gemini: https://makersuite.google.com/app/apikey
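
Before running a detection, it can be useful to confirm that the keys are actually visible to your process. The sketch below checks the two variable names from the `.env` example above; the helper `missing_api_keys` is hypothetical and not part of the package. It reads the process environment only, so if your keys live solely in `.env`, load the file first (for example with `load_dotenv()` from `python-dotenv`, which is one of this package's dependencies). Depending on the model you choose, you may only need one of the two keys.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required key names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
```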

## Quick Start

### Python API

```python
from spatial_reasoning import detect

# Detect objects in an image
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",  # URL or local file path
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model"
)

# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")

# Save the result
visualized_image.save("output.jpg")
```

### Command Line

```bash
# Basic usage
spatial-reasoning --image-path "image.jpg" --object-of-interest "person"  # "advanced_reasoning_model" used by default

# With specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"

# From URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
```
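
Quoting the JSON passed to `--task-kwargs` by hand is easy to get wrong, especially across shells. One option is to build the argument list in Python and let `json.dumps` produce the JSON. The sketch below uses only the flags shown above; the helper `build_cli_args` is hypothetical and not part of the package.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def build_cli_args(
    image_path: str,
    object_of_interest: str,
    task_type: Optional[str] = None,
    task_kwargs: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Build an argument list for the `spatial-reasoning` CLI (hypothetical helper)."""
    args = [
        "spatial-reasoning",
        "--image-path", image_path,
        "--object-of-interest", object_of_interest,
    ]
    if task_type is not None:
        args += ["--task-type", task_type]
    if task_kwargs:
        # Serialize the dict so the CLI receives valid JSON in a single argument.
        args += ["--task-kwargs", json.dumps(task_kwargs)]
    return args


args = build_cli_args(
    "image.jpg",
    "text in image",
    task_type="advanced_reasoning_model",
    task_kwargs={"nms_threshold": 0.7},
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(args))
# To execute it: subprocess.run(args, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting entirely, since each element is delivered to the CLI as one argument.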

### Available Models

- `advanced_reasoning_model` (default) - Best accuracy, uses tool-use reasoning
- `vanilla_reasoning_model` - Faster; prompts a reasoning model directly, without tool use
- `vision_model` - Uses GroundingDino + (optional) SAM2 for segmentation
- `gemini` - Google's Gemini model
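
To compare the models on the same image, you could loop over the task types listed above. The following is a rough sketch: `count_detections` is a hypothetical helper, and it relies only on the `detect` keyword arguments and the `'bboxs'` result key shown in the Quick Start. The detection function is passed in as a parameter so the helper can be tried out with a stub before spending API calls.

```python
from typing import Any, Callable, Dict, Iterable

# Task type names from the list above.
TASK_TYPES = (
    "advanced_reasoning_model",
    "vanilla_reasoning_model",
    "vision_model",
    "gemini",
)


def count_detections(
    detect_fn: Callable[..., Dict[str, Any]],
    image_path: str,
    object_of_interest: str,
    task_types: Iterable[str] = TASK_TYPES,
) -> Dict[str, int]:
    """Run each task type once and count returned boxes (hypothetical helper)."""
    counts: Dict[str, int] = {}
    for task_type in task_types:
        result = detect_fn(
            image_path=image_path,
            object_of_interest=object_of_interest,
            task_type=task_type,
        )
        counts[task_type] = len(result["bboxs"])
    return counts


# Example usage with the real API (requires API keys and network access):
# from spatial_reasoning import detect
# print(count_detections(detect, "image.jpg", "person"))
```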

## License

MIT License
