I made Windows-Use, an open-source tool that lets any LangChain-supported LLM execute tasks directly on the Windows desktop using tool calling.
It allows you to build AI agents that can interact directly with GUI elements in Windows apps using natural language. Basically, it acts like a layer between the Windows OS and the AI, making desktop automation much simpler. It uses the coordinates of interactive elements to perform actions, so you don’t have to write a separate script for every task.
The goal was to take care of the hard parts, so others don’t have to:
- Accessing the accessibility tree and preprocessing it into an LLM-friendly form (interactive elements + a screenshot); see the sketch after this list.
- Providing solid tools to interact with the desktop (clicking, typing, etc.).
- Creating a reusable agent setup.
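To give a sense of what that first bullet involves, here's a rough sketch of pulling interactive elements out of the Windows UI Automation tree with the uiautomation package. This is only an illustration of the general idea, not the actual Windows-Use implementation, and the filtering rules are simplified assumptions.

```python
# Illustrative sketch (not the Windows-Use source): walk the UI Automation tree
# and keep only on-screen interactive controls, with coordinates an agent can click.
import uiautomation as auto

INTERACTIVE = {"ButtonControl", "EditControl", "HyperlinkControl",
               "CheckBoxControl", "ComboBoxControl", "MenuItemControl"}

def collect_interactive_elements(max_depth: int = 20) -> list[dict]:
    elements = []
    root = auto.GetRootControl()  # the desktop
    for control, depth in auto.WalkControl(root, includeTop=False, maxDepth=max_depth):
        if control.ControlTypeName not in INTERACTIVE or control.IsOffscreen:
            continue
        rect = control.BoundingRectangle
        elements.append({
            "index": len(elements),                    # the LLM refers to elements by index
            "type": control.ControlTypeName,
            "name": control.Name,
            "center": ((rect.left + rect.right) // 2,  # click target
                       (rect.top + rect.bottom) // 2),
        })
    return elements

if __name__ == "__main__":
    for element in collect_interactive_elements():
        print(element)
```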
I built the first version last September. It only took 2 days to get something working, but improving the speed took much longer: I had to run a lot of experiments and dig deep into how the Windows accessibility tree works. Initially, grounding took around 20 seconds… then, 3 days later, Anthropic released their "computer-use" capability.
Since then, I’ve been steadily improving it; now the grounding time is down to 1.7 seconds, and the toolset has improved a lot.
The vision: Just prompt the agent, and it does the task; no need to worry about how. I call it “vibe automation.”
Demos I’ve made:
- Generate Word docs on any topic: searches the web, writes the content, opens Word, and saves it.
- Book flights on Google Flights using a browser.
- Navigate files in Explorer and open a specific file (e.g., "Open this file in D:\ drive").
- Change the desktop theme from dark to light, like a user would do manually.
You can install it on Windows using:
pip install windows-use
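A minimal usage sketch looks roughly like this; treat the exact class and method names as approximate and check the repo's README for the current API. Any LangChain chat model can be dropped in.

```python
# Rough usage sketch -- exact class/method names may differ; see the README.
from langchain_openai import ChatOpenAI   # any LangChain-supported chat model works here
from windows_use.agent import Agent

llm = ChatOpenAI(model="gpt-4o")
agent = Agent(llm=llm)                     # the agent handles grounding + tool calling

# Describe the task in natural language; the agent figures out the clicks and keystrokes.
agent.invoke("Open Notepad and type a short haiku about automation")
```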
Try it out and let me know how it works for you!
I'm Jeomon George
We've been working on Android-MCP, a lightweight, open-source bridge designed to enable AI agents (specifically large language models) to interact with Android devices. The goal is to allow LLMs to perform real-world tasks like app navigation, UI interaction, and automated QA testing without relying on traditional computer vision pipelines or pre-programmed scripts.
The core idea is to leverage ADB and the Android Accessibility API for native interaction with UI elements. This means an LLM can launch apps, tap, swipe, input text, and read view hierarchies directly. A key feature is that it works with any language model, with vision being optional – there's no need for fine-tuned computer vision models or OCR.
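For context, the underlying interaction can be driven entirely by stock ADB commands; the small wrapper below illustrates that plumbing. The function names are mine for the example, not Android-MCP's actual tool names.

```python
# Illustration of the underlying ADB plumbing (not Android-MCP's actual code).
import subprocess

def adb(*args: str) -> str:
    """Run an adb command and return its stdout."""
    return subprocess.run(["adb", *args], capture_output=True, text=True, check=True).stdout

def tap(x: int, y: int) -> None:
    adb("shell", "input", "tap", str(x), str(y))

def swipe(x1: int, y1: int, x2: int, y2: int, ms: int = 300) -> None:
    adb("shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(ms))

def type_text(text: str) -> None:
    adb("shell", "input", "text", text.replace(" ", "%s"))  # adb encodes spaces as %s

def dump_view_hierarchy() -> str:
    # uiautomator writes the current UI tree as XML, which can then be fed to the LLM
    adb("shell", "uiautomator", "dump", "/sdcard/window_dump.xml")
    return adb("shell", "cat", "/sdcard/window_dump.xml")
```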
Android-MCP operates as an MCP server and offers a rich toolset for mobile automation, including pre-built tools for gestures, keystrokes, capturing device state, and accessing notifications. We've observed typical latency between consecutive actions (e.g., two taps) of 2 to 5 seconds, depending on device specifications and load.
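And because it runs as an MCP server, exposing such an action as a tool looks roughly like this with the official Python MCP SDK (FastMCP); the tool shown here is illustrative, not the project's actual tool list.

```python
# Illustrative MCP server sketch using the official Python SDK (FastMCP);
# the tool name and docstring are examples, not Android-MCP's real tools.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("android-demo")

@mcp.tool()
def tap(x: int, y: int) -> str:
    """Tap the screen at the given pixel coordinates."""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)
    return f"tapped ({x}, {y})"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```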
It supports Android 10+ and is built with Python 3.10+. The project is licensed under the MIT License, and contributions are welcome.
You can find more details, installation instructions, and the source code here: https://github.com/CursorTouch/Android-MCP
We're interested to hear thoughts on how this kind of direct interaction could be applied in various scenarios, particularly in areas like automated testing or accessibility enhancements for LLM-driven applications.
The main reason: while building an MCP Agent, I kept getting "RuntimeError: Attempted to exit cancel scope in a different task than it was entered in" when disconnecting the servers.
I looked for fixes on GitHub, but unfortunately none of them worked. I still wanted to build the agent, so I developed MCP-Client.
https://github.com/Jeomon/MCP-Client
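For anyone hitting the same error: it happens when an anyio cancel scope (created inside the SDK's stdio transport) is entered in one asyncio task but exited in another, e.g. connecting to a server in one coroutine and tearing it down somewhere else. One way around it, sketched below with the official Python SDK, is to keep each server's entire connect/disconnect lifecycle inside a single task; this is an illustration of the idea, not MCP-Client's internals.

```python
# Illustrative sketch (not MCP-Client's internals): each server's connection is
# opened AND closed inside the same task, so anyio cancel scopes stay in one task.
import asyncio
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def serve_one(params: StdioServerParameters, stop: asyncio.Event) -> None:
    async with AsyncExitStack() as stack:  # entered and exited by this very task
        read, write = await stack.enter_async_context(stdio_client(params))
        session = await stack.enter_async_context(ClientSession(read, write))
        await session.initialize()
        print(await session.list_tools())
        await stop.wait()  # cleanup runs here, still in the same task that connected

async def main() -> None:
    stop = asyncio.Event()
    params = StdioServerParameters(command="python", args=["server.py"])  # hypothetical server
    task = asyncio.create_task(serve_one(params, stop))
    await asyncio.sleep(5)   # ... the agent does its work here ...
    stop.set()               # ask the task to shut down instead of cancelling it from outside
    await task

asyncio.run(main())
```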
https://github.com/CursorTouch/Web-Agent
We're open to suggestions and contributions to make it even better.