
<div align="center">

  <h1>apiprompting: <u>A</u>ttention <u>P</u>rompting on <u>I</u>mage for Large Vision-Language Models</h1>

  <br>

  [![version](https://badge.fury.io/py/apiprompting.svg)](https://badge.fury.io/py/apiprompting)
  [![license](https://img.shields.io/pypi/l/apiprompting)](https://github.com/rp-yu/apiprompting/blob/main/LICENSE)
  [![python-version](https://img.shields.io/pypi/pyversions/apiprompting)](https://badge.fury.io/py/apiprompting)
  [![Gradio](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/rp-yu/apiprompting)

</div>

## 👋 hello

A Python package that provides an easy-to-use implementation of [Attention Prompting on Image for Large Vision-Language Models]().

## 💻 install

```bash
pip install apiprompting
```

## 📄 Quick Start

#### `clip_api`

  Generates image masks and blends them using the CLIP_Based API.

  **Parameters**

  - **images** (`list`):
    List of images. Each item can be a path to an image (`str`) or a `PIL.Image`.

  - **queries** (`list`):
    List of queries. Each item is a `str`.

  - **batch_size** (`int`, optional, default=8):
    Batch size for processing images.

  - **model_name** (`str`):  
    Name of the pretrained model to load. Available options include `"ViT-L-14-336"`, `"ViT-L-14"`, and `"ViT-B-32"`.

  - **layer_index** (`int`, optional, default=22):  
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  - **enhance_coe** (`int`, optional, default=10):  
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  - **kernel_size** (`int`, optional, default=3):  
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  - **interpolate_method_name** (`str`, optional, default="LANCZOS"):  
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by `PIL.Image.resize`, such as `"NEAREST"`, `"BILINEAR"`, `"BICUBIC"`, `"LANCZOS"`, etc.

  - **grayscale** (`float`, optional, default=0):  
    A flag indicating whether to convert the image to grayscale. A value of `0` means no grayscale conversion, while a value of `1` will convert the image to grayscale.

  **Returns**

  - **list**:  
    A list containing the masked images generated by the function. Each item is a `PIL.Image`.
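
The constraints listed above can be checked before the model is loaded. The sketch below is a hypothetical helper, not part of `apiprompting`: the name `check_clip_args` is made up for illustration, the model list contains only the three names mentioned above (the package may accept others), and the one-to-one pairing of `images` and `queries` is an assumption.

```python
# Hypothetical pre-flight check; not part of the apiprompting package.
# Only the model names listed in this README; the package may accept more.
CLIP_MODEL_NAMES = ("ViT-L-14-336", "ViT-L-14", "ViT-B-32")


def check_clip_args(images, queries, model_name, kernel_size=3):
    """Raise ValueError if the arguments contradict the documented constraints."""
    if model_name not in CLIP_MODEL_NAMES:
        raise ValueError(f"unknown model_name: {model_name!r}")
    # The README states that kernel_size should be an odd number.
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError(f"kernel_size must be a positive odd number, got {kernel_size}")
    # Assumption: each image is paired with exactly one query.
    if len(images) != len(queries):
        raise ValueError("images and queries must have the same length")
    return True
```

Calling `check_clip_args(images, queries, "ViT-L-14-336")` before `clip_api` surfaces a bad argument without waiting for a model download.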

#### `llava_api`

  Generates image masks and blends them using the LLaVA_Based API.

  **Parameters**

  - **images** (`list`):
    List of images. Each item can be a path to an image (`str`) or a `PIL.Image`.

  - **queries** (`list`):
    List of queries. Each item is a `str`.

  - **batch_size** (`int`):
    Batch size for processing images. Only a batch size of 1 is supported.

  - **model_name** (`str`):  
    Name of the pretrained model to load. One of `"llava-v1.5-7b"` or `"llava-v1.5-13b"`.

  - **layer_index** (`int`, optional, default=20):  
    Index of the layer in the model to hook. This is where the feature extraction occurs.

  - **enhance_coe** (`int`, optional, default=10):  
    Enhancement coefficient for mask blending, which determines the strength of the enhancement applied to the generated masks.

  - **kernel_size** (`int`, optional, default=3):  
    Kernel size for mask blending, which should be an odd number. This determines the size of the convolution kernel used in blending.

  - **interpolate_method_name** (`str`, optional, default="LANCZOS"):  
    Name of the interpolation method used for image resizing. It can be any interpolation method supported by `PIL.Image.resize`, such as `"NEAREST"`, `"BILINEAR"`, `"BICUBIC"`, `"LANCZOS"`, etc.

  - **grayscale** (`float`, optional, default=0):  
    A flag indicating whether to convert the image to grayscale. A value of `0` means no grayscale conversion, while a value of `1` will convert the image to grayscale.

  **Returns**

  - **list**:  
    A list containing the masked images generated by the function. Each item is a `PIL.Image`.

#### Example

```python
from apiprompting import clip_api, llava_api

images, queries = ["path/to/image"], ["query"]

# CLIP_Based API
masked_images = clip_api(images, queries, model_name="ViT-L-14-336")
# LLaVA_Based API
masked_images = llava_api(images, queries, model_name="llava-v1.5-13b")
```
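
Both functions return a list of `PIL.Image` objects, so the results can be written to disk with each image's `save` method. The following is a minimal sketch; `save_masked_images`, the output directory, and the file-naming scheme are hypothetical and not part of `apiprompting`.

```python
from pathlib import Path


def save_masked_images(masked_images, out_dir, prefix="masked"):
    """Save each image as <out_dir>/<prefix>_<index>.png and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(masked_images):
        path = out_dir / f"{prefix}_{index}.png"
        image.save(path)  # PIL.Image.Image.save accepts a filesystem path
        paths.append(path)
    return paths
```

For example, `save_masked_images(masked_images, "outputs")` would write `outputs/masked_0.png` for the single image in the example above.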

## 💜 acknowledgement

This README is adapted from [here](https://pypi.org/project/setofmark/).

## 🦸 contribution

We would love your help in making this repository even better! If you notice a bug
or have any suggestions for improvement, feel free to open an
[issue](https://github.com/yu-rp/apiprompting/issues) or submit a
[pull request](https://github.com/yu-rp/apiprompting/pulls).
