Metadata-Version: 2.1
Name: apiprompting
Version: 0.1.0rc2
Summary: Package for an easy implementation of paper "Attention Prompting on Image for Large Vision-Language Models".
License: MIT
Author: runpeng yu
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: accelerate (==0.21.0)
Requires-Dist: apillava (==0.1.0)
Requires-Dist: bitsandbytes (==0.41.0)
Requires-Dist: einops (==0.6.1)
Requires-Dist: einops-exts (==0.0.4)
Requires-Dist: fastapi (>=0.112.2,<0.113.0)
Requires-Dist: ftfy (==6.2.3)
Requires-Dist: gradio (==3.35.2)
Requires-Dist: gradio_client (==0.2.9)
Requires-Dist: h5py (==3.11.0)
Requires-Dist: httpx (==0.24.0)
Requires-Dist: imageio (==2.35.1)
Requires-Dist: markdown2[all] (>=2.5.0,<3.0.0)
Requires-Dist: numpy (==1.24.0)
Requires-Dist: opencv-python (==4.10.0.84)
Requires-Dist: peft (==0.4.0)
Requires-Dist: pydantic (>=1,<2)
Requires-Dist: regex (==2024.7.24)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Requires-Dist: scikit-image (==0.24.0)
Requires-Dist: scikit-learn (==1.2.2)
Requires-Dist: scipy (==1.14.1)
Requires-Dist: sentencepiece (==0.1.99)
Requires-Dist: shortuuid (==1.0.13)
Requires-Dist: timm (==0.6.13)
Requires-Dist: tokenizers (>=0.12.1,<0.14)
Requires-Dist: torch (==2.0.1)
Requires-Dist: torchvision (==0.15.2)
Requires-Dist: transformers (==4.31.0)
Requires-Dist: uvicorn (>=0.30.6,<0.31.0)
Description-Content-Type: text/markdown


<div align="center">

  <h1>apiprompting: <u>A</u>ttention <u>P</u>rompting on <u>I</u>mage for Large Vision-Language Models</h1>

  <br>

  [![version](https://badge.fury.io/py/apiprompting.svg)](https://badge.fury.io/py/apiprompting)
  [![license](https://img.shields.io/pypi/l/apiprompting)](https://github.com/rp-yu/apiprompting/blob/main/LICENSE)
  [![python-version](https://img.shields.io/pypi/pyversions/apiprompting)](https://badge.fury.io/py/apiprompting)
  [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/rp-yu/apiprompting)

</div>

## 👋 hello

`apiprompting` is a package that provides an easy-to-use implementation of the paper [Attention Prompting on Image for Large Vision-Language Models]().

## 💻 install

```bash
pip install apiprompting
```

## 📄 Quick Start

#### `clip_api`

  Generates image masks and blends them using the CLIP_Based API.

  **Parameters**

  - **images** (`list`):  
    List of images. Each item can be a path to an image (`str`) or a `PIL.Image`.

  - **queries** (`list`):  
    List of queries. Each item is a `str`.

  - **batch_size** (`int`): 
    Batch size for processing images. Default is 8.

  - **model_name** (`str`):  
    Name of the pretrained model to load. Available options include `"ViT-L-14-336"`, `"ViT-L-14"`, and `"ViT-B-32"`.

  - **layer_index** (`int`, optional, default=22):  
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  - **enhance_coe** (`int`, optional, default=10):  
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  - **kernel_size** (`int`, optional, default=3):  
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  - **interpolate_method_name** (`str`, optional, default="LANCZOS"):  
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by `PIL.Image.resize`, such as `"NEAREST"`, `"BILINEAR"`, `"BICUBIC"`, `"LANCZOS"`, etc.

  - **grayscale** (`float`, optional, default=0):  
    A flag indicating whether to convert the image to grayscale. A value of `0` means no grayscale conversion, while a value of `1` will convert the image to grayscale.

  **Returns**

  - **list**:  
    A list containing the masked images generated by the function. Each item is a `PIL.Image`.
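
The constraints listed above (an odd `kernel_size`, a documented `model_name`) can be checked before a model is loaded. The following is a minimal sketch and not part of `apiprompting`: `check_clip_args` is a hypothetical helper, and the one-to-one pairing of `images` and `queries` is an assumption that this README does not state explicitly. Because the model list is introduced with "include", an unlisted name is reported as a warning rather than treated as an error.

```python
# Model names listed in the parameter description above.
# The README says "include", so this tuple may not be exhaustive.
DOCUMENTED_CLIP_MODELS = ("ViT-L-14-336", "ViT-L-14", "ViT-B-32")


def check_clip_args(images, queries, model_name, kernel_size=3):
    """Hypothetical pre-flight check; returns a list of warning strings."""
    warnings = []

    # Assumption: each image is paired with exactly one query.
    if len(images) != len(queries):
        warnings.append(
            f"got {len(images)} images but {len(queries)} queries"
        )

    # Only warn, since other model names might also be accepted.
    if model_name not in DOCUMENTED_CLIP_MODELS:
        warnings.append(f"model_name {model_name!r} is not a documented option")

    # The README states that kernel_size should be an odd number.
    if kernel_size <= 0 or kernel_size % 2 == 0:
        warnings.append(f"kernel_size should be a positive odd number, got {kernel_size}")

    return warnings
```

An empty list means that none of the documented constraints were violated; it does not guarantee that the call to `clip_api` itself will succeed.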

#### `llava_api`

  Generates image masks and blends them using the LLaVA_Based API.

  **Parameters**

  - **images** (`list`):  
    List of images. Each item can be a path to an image (`str`) or a `PIL.Image`.

  - **queries** (`list`):  
    List of queries. Each item is a `str`.

  - **batch_size** (`int`):  
    Batch size for processing images. Only a batch size of 1 is supported.

  - **model_name** (`str`):  
    Name of the pretrained model to load. One of `"llava-v1.5-7b"` or `"llava-v1.5-13b"`.

  - **layer_index** (`int`, optional, default=20):  
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  - **enhance_coe** (`int`, optional, default=10):  
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  - **kernel_size** (`int`, optional, default=3):  
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  - **interpolate_method_name** (`str`, optional, default="LANCZOS"):  
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by `PIL.Image.resize`, such as `"NEAREST"`, `"BILINEAR"`, `"BICUBIC"`, `"LANCZOS"`, etc.

  - **grayscale** (`float`, optional, default=0):  
    A flag indicating whether to convert the image to grayscale. A value of `0` means no grayscale conversion, while a value of `1` will convert the image to grayscale.

  **Returns**

  - **list**:  
    A list containing the masked images generated by the function. Each item is a `PIL.Image`.

#### Example

```python
from apiprompting import clip_api, llava_api

images, queries = ["path/to/image"], ["query"]

# CLIP_Based API
masked_images = clip_api(images, queries, model_name="ViT-L-14-336")
# LLaVA_Based API
masked_images = llava_api(images, queries, model_name="llava-v1.5-13b")
```
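
Both functions return a list of `PIL.Image` objects, so the results usually need to be written to disk before they can be passed to another tool. The following is a minimal sketch under that assumption: `save_masked_images`, the `masked` file prefix, and the PNG format are illustrative choices rather than part of `apiprompting`. It relies only on the `save` method of each returned image.

```python
from pathlib import Path


def save_masked_images(masked_images, out_dir, prefix="masked"):
    """Hypothetical helper: save each image as PNG and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, image in enumerate(masked_images):
        path = out / f"{prefix}_{index:03d}.png"
        # PIL images accept a filename or pathlib.Path in save().
        image.save(path)
        paths.append(path)
    return paths
```

For example, `save_masked_images(masked_images, "outputs")` would write `outputs/masked_000.png`, `outputs/masked_001.png`, and so on, in the same order as the returned list.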

## 💜 acknowledgement

This README is adapted from [here](https://pypi.org/project/setofmark/).

## 🦸 contribution

We would love your help in making this repository even better! If you notice a bug 
or have a suggestion for improvement, feel free to open an 
[issue](https://github.com/yu-rp/apiprompting/issues) or submit a 
[pull request](https://github.com/yu-rp/apiprompting/pulls).

