Metadata-Version: 2.4
Name: askui
Version: 0.16.0
Summary: Automate computer tasks in Python
Author-email: askui GmbH <info@askui.com>
License: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.54.0
Requires-Dist: asyncer==0.0.8
Requires-Dist: bson>=0.5.10
Requires-Dist: fastapi>=0.115.12
Requires-Dist: fastmcp>=2.3.0
Requires-Dist: filetype>=1.2.0
Requires-Dist: google-genai>=1.20.0
Requires-Dist: gradio-client>=1.4.3
Requires-Dist: grpcio>=1.73.1
Requires-Dist: httpx>=0.28.1
Requires-Dist: jinja2>=3.1.4
Requires-Dist: jsonref>=1.1.0
Requires-Dist: markitdown[docx,xls,xlsx]>=0.1.2
Requires-Dist: openai>=1.61.1
Requires-Dist: pillow>=11.0.0
Requires-Dist: protobuf>=6.31.1
Requires-Dist: py-machineid>=0.7.0
Requires-Dist: pydantic-settings>=2.9.1
Requires-Dist: pydantic>=2.11.0
Requires-Dist: pyperclip>=1.9.0
Requires-Dist: python-dateutil>=2.9.0.post0
Requires-Dist: requests>=2.32.3
Requires-Dist: rich>=13.9.4
Requires-Dist: segment-analytics-python>=2.3.4
Requires-Dist: tenacity>=9.1.2
Provides-Extra: all
Requires-Dist: anyio>=4.10.0; extra == 'all'
Requires-Dist: mss>=10.0.0; extra == 'all'
Requires-Dist: playwright>=1.41.0; extra == 'all'
Requires-Dist: pure-python-adb>=0.3.0.dev0; extra == 'all'
Requires-Dist: pynput>=1.8.1; extra == 'all'
Requires-Dist: uvicorn>=0.34.3; extra == 'all'
Provides-Extra: android
Requires-Dist: pure-python-adb>=0.3.0.dev0; extra == 'android'
Provides-Extra: chat
Requires-Dist: anyio>=4.10.0; extra == 'chat'
Requires-Dist: playwright>=1.41.0; extra == 'chat'
Requires-Dist: pure-python-adb>=0.3.0.dev0; extra == 'chat'
Requires-Dist: uvicorn>=0.34.3; extra == 'chat'
Provides-Extra: pynput
Requires-Dist: mss>=10.0.0; extra == 'pynput'
Requires-Dist: pynput>=1.8.1; extra == 'pynput'
Provides-Extra: web
Requires-Dist: playwright>=1.41.0; extra == 'web'
Description-Content-Type: text/markdown

# 🤖 AskUI Vision Agent

[![Release Notes](https://img.shields.io/github/release/askui/vision-agent?style=flat-square)](https://github.com/askui/vision-agent/releases)
[![PyPI - License](https://img.shields.io/pypi/l/askui?style=flat-square)](https://opensource.org/licenses/MIT)

**Enable AI agents to control your desktop (Windows, MacOS, Linux), mobile (Android, iOS) and HMI devices**

Join the [AskUI Discord](https://discord.gg/Gu35zMGxbx).

## Table of Contents

- [📖 Introduction](#-introduction)
- [📦 Installation](#-installation)
  - [AskUI Python Package](#askui-python-package)
  - [AskUI Agent OS](#askui-agent-os)
- [🚀 Quickstart](#-quickstart)
  - [🧑 Control your devices](#-control-your-devices)
  - [🤖 Let AI agents control your devices](#-let-ai-agents-control-your-devices)
- [📚 Further Documentation](#-further-documentation)
- [🤝 Contributing](#-contributing)
- [📜 License](#-license)

## 📖 Introduction

AskUI Vision Agent is a powerful automation framework that enables you and AI agents to control your desktop, mobile, and HMI devices and automate tasks. It supports multiple AI models, works across platforms, and comes with enterprise-ready features.

https://github.com/user-attachments/assets/a74326f2-088f-48a2-ba1c-4d94d327cbdf

**🎯 Key Features**

- Support for Windows, Linux, MacOS, Android and iOS device automation (Citrix supported)
- Support for single-step UI automation commands (RPA-like) as well as agentic, intent-based instructions
- In-background automation on Windows machines (agent can create a second session; you do not have to watch it take over mouse and keyboard)
- Flexible model use (hot swap of models) and infrastructure for reteaching of models (available on-premise)
- Secure deployment of agents in enterprise environments

## 📦 Installation

### AskUI Python Package

```shell
pip install "askui[all]"
```

**Requires Python >=3.10**
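The `[all]` extra pulls in every optional dependency declared in the package metadata; the narrower extras are `android`, `chat`, `pynput` and `web`. If you install one of those instead, e.g. `pip install "askui[web]"`, the sketch below reports which optional modules are missing for each extra and whether your interpreter meets the Python requirement. The extra-to-module mapping is our own reading of the `Requires-Dist` entries (for example, we assume `pure-python-adb` is imported as `ppadb`); it is not part of the askui API.

```python
import importlib.util
import sys

# Hypothetical mapping from askui extras to the top-level modules their
# dependencies provide (derived from the package metadata, not an askui API).
EXTRA_MODULES: dict[str, list[str]] = {
    "android": ["ppadb"],
    "chat": ["anyio", "playwright", "ppadb", "uvicorn"],
    "pynput": ["mss", "pynput"],
    "web": ["playwright"],
}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter satisfies Requires-Python >=3.10."""
    return tuple(version_info[:2]) >= (3, 10)


def missing_modules(extras: dict[str, list[str]] = EXTRA_MODULES) -> dict[str, list[str]]:
    """Map each extra to the modules that cannot be found in this environment."""
    return {
        extra: [name for name in modules if importlib.util.find_spec(name) is None]
        for extra, modules in extras.items()
    }


if __name__ == "__main__":
    print("Python >= 3.10:", python_supported())
    for extra, missing in missing_modules().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"askui[{extra}]: {status}")
```

`find_spec` only locates modules without importing them, so the check is cheap and has no side effects.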

### AskUI Agent OS

Agent OS is a device controller that allows agents to take screenshots, move the mouse, click, and type on the keyboard across any operating system. It is installed on a desktop OS but can also control connected mobile and HMI devices.

It offers powerful features like

- multi-screen support,
- support for all major operating systems (incl. Windows, MacOS and Linux),
- process visualizations,
- real Unicode character typing,
- and more: features such as application selection, in-background automation and video streaming are to be released soon.

<details>
<summary>Windows</summary>

#### AMD64
[AskUI Installer for AMD64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-AMD64-Web.exe)

#### ARM64
[AskUI Installer for ARM64](https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Win-ARM64-Web.exe)

</details>

<details>
<summary>Linux</summary>
<br>

**⚠️ Warning:** Agent OS currently does not work on Wayland. Switch to an Xorg session to use it.
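To check which display server your current session uses, you can inspect environment variables that most Linux desktop sessions set. This is a heuristic sketch, not part of askui or Agent OS: it assumes your login manager sets `XDG_SESSION_TYPE` (typically `wayland` or `x11`) and, when that is missing, falls back to `WAYLAND_DISPLAY`, which Wayland compositors usually export.

```python
import os
from collections.abc import Mapping


def is_wayland_session(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristically detect a Wayland session from environment variables.

    Assumption: XDG_SESSION_TYPE is set by the login manager and Wayland
    compositors export WAYLAND_DISPLAY. Neither is guaranteed everywhere.
    """
    session_type = env.get("XDG_SESSION_TYPE", "").lower()
    if session_type:
        # Trust the explicit session type when it is available.
        return session_type == "wayland"
    # Fall back to WAYLAND_DISPLAY when the session type is unknown.
    return bool(env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    if is_wayland_session():
        print("Wayland detected: switch to an Xorg session before using Agent OS.")
    else:
        print("No Wayland session detected.")
```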

#### AMD64
```shell
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run
```

#### ARM64
```shell
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run
```

</details>

<details>
<summary>MacOS</summary>
<br>

**⚠️ Warning:** Agent OS currently does not work on MacOS with Intel chips (x86_64/amd64 architecture). Switch to a Mac with Apple Silicon (arm64 architecture), e.g., M1, M2, M3, etc.

#### ARM64
```shell
curl -L -o /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run https://files.askui.com/releases/Installer/Latest/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
bash /tmp/AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run
```

</details>
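If you script the setup, the right installer link from the lists above can be chosen from `platform.system()` and `platform.machine()`. The sketch below is illustrative only: the mapping from platform values to installers is our assumption (Python typically reports `AMD64`/`ARM64` on Windows, `x86_64`/`aarch64` on Linux and `arm64` on Apple Silicon), and it returns `None` for combinations without an installer, such as Intel Macs. Note that an x86_64 Python running under Rosetta on Apple Silicon also reports `x86_64`.

```python
import platform
from typing import Optional

BASE = "https://files.askui.com/releases/Installer/Latest/"

# Installer file names taken from the links above, keyed by (OS, architecture).
INSTALLERS = {
    ("Windows", "amd64"): "AskUI-Suite-Latest-User-Installer-Win-AMD64-Web.exe",
    ("Windows", "arm64"): "AskUI-Suite-Latest-User-Installer-Win-ARM64-Web.exe",
    ("Linux", "amd64"): "AskUI-Suite-Latest-User-Installer-Linux-AMD64-Web.run",
    ("Linux", "arm64"): "AskUI-Suite-Latest-User-Installer-Linux-ARM64-Web.run",
    ("Darwin", "arm64"): "AskUI-Suite-Latest-User-Installer-MacOS-ARM64-Web.run",
}

# Normalize the different spellings platform.machine() uses for one architecture.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def installer_url(system: Optional[str] = None, machine: Optional[str] = None) -> Optional[str]:
    """Return the Agent OS installer URL for a platform, or None if unsupported."""
    system = system or platform.system()
    arch = ARCH_ALIASES.get((machine or platform.machine()).lower())
    file_name = INSTALLERS.get((system, arch))
    return BASE + file_name if file_name else None


if __name__ == "__main__":
    print(installer_url() or "No Agent OS installer for this platform.")
```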

## 🚀 Quickstart

### 🧑 Control your devices

Double-click wherever the cursor currently is:

```python
from askui import VisionAgent

with VisionAgent() as agent:
    agent.click(button="left", repeat=2)
```

By default, the agent works within the context of the selected display, which defaults to the primary display.

Run the script with `python <file path>`, e.g., `python test.py`, to see if it works.

### 🤖 Let AI agents control your devices

In order to let AI agents control your devices, you need to be able to connect to an AI model (provider). We host some models ourselves and support several others out of the box, e.g., Anthropic, OpenRouter, Hugging Face, etc. If you want to use a model provider or model that is not supported, you can easily plug in your own (see [Custom Models](docs/custom-models.md)).

For this example, we will use AskUI as the model provider to get started easily.

#### 🔐 Sign up with AskUI

Sign up at [hub.askui.com](https://hub.askui.com) to:
- Activate your **free trial** (no credit card required)
- Get your workspace ID and access token

#### ⚙️ Configure environment variables

<details>
<summary>Linux & MacOS</summary>

```shell
export ASKUI_WORKSPACE_ID=<your-workspace-id-here>
export ASKUI_TOKEN=<your-token-here>
```
</details>

<details>
<summary>Windows PowerShell</summary>

```powershell
$env:ASKUI_WORKSPACE_ID="<your-workspace-id-here>"
$env:ASKUI_TOKEN="<your-token-here>"
```

</details>
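To fail fast with a clear message when the variables are missing, you can check them at the top of your script before creating the agent. This is a hypothetical helper, not part of askui: it only reads `ASKUI_WORKSPACE_ID` and `ASKUI_TOKEN` from the environment and never prints the token's value.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN")


def require_askui_credentials(env: Mapping[str, str] = os.environ) -> None:
    """Raise RuntimeError naming any AskUI variables that are unset or empty.

    Hypothetical helper: per the setup above, the agent is expected to pick
    these variables up itself; this check only gives an earlier, clearer error.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Call `require_askui_credentials()` once, before `with VisionAgent(...) as agent:`, so a misconfigured shell is reported before any automation starts.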

#### 💻 Example

```python
from askui import VisionAgent

with VisionAgent(log_level="DEBUG") as agent:
    # Give complex instructions to the agent. Note: out of the box, the agent may
    # not see virtual displays, so make sure the browser is not open on one.
    agent.act(
        "Look for a browser on the current device (checking all available displays, "
        "making sure the window has focus),"
        " open a new window or tab and navigate to https://docs.askui.com"
        " and click on 'Search...' to open the search panel. If the search panel is already "
        "opened, empty the search field so I can start a fresh search."
    )
    agent.type("Introduction")
    # Locates elements by text (you can also use images, natural language descriptions, coordinates, etc. to
    # describe what to click on)
    agent.click(
        "Documentation > Tutorial > Introduction",
    )
    first_paragraph = agent.get(
        "What does the first paragraph of the introduction say?"
    )
    print("\n--------------------------------")
    print("FIRST PARAGRAPH:\n")
    print(first_paragraph)
    print("--------------------------------\n\n")
```

Run the script with `python <file path>`, e.g., `python test.py`.

**Note:** The `log_level` parameter is set to `DEBUG` to give you a better picture of what is happening. It defaults to `INFO`, which produces fewer logs.

If you see a lot of logs and the first paragraph of the introduction in the console, congratulations! You've successfully let AI agents control your device to automate a task! If you have any issues, please check the [documentation](https://docs.askui.com/01-tutorials/01-your-first-agent#common-issues-and-solutions) or join our [Discord](https://discord.gg/Gu35zMGxbx) for support.

## 📚 Further Documentation

Aside from our [official documentation](https://docs.askui.com), we also have some additional guides and examples under the [docs](docs) folder that you may find useful, for example:

- **[Chat](docs/chat.md)** - How to interact with agents through a chat
- **[Direct Tool Use](docs/direct-tool-use.md)** - How to use tools directly, e.g., the clipboard, the Agent OS, etc.
- **[Extracting Data](docs/extracting-data.md)** - How to extract data from the screen and documents
- **[MCP](docs/mcp.md)** - How to use MCP servers to extend the capabilities of an agent
- **[Observability](docs/observability.md)** - Logging and reporting
- **[Telemetry](docs/telemetry.md)** - Which data we gather and how to disable it
- **[Using Models](docs/using-models.md)** - How to use different models including how to register your own custom models

## 🤝 Contributing

We'd love your help! Contributions, ideas, and feedback are always welcome. A proper contribution guide is coming soon, so stay tuned!


## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
