Midscene.js

Your AI Operator for Web, Android, Automation & Testing.

English | 简体中文


Midscene.js allows AI to serve as your web and Android operator 🤖. Simply describe what you want to achieve in natural language, and it will assist you in operating the interface, validating content, and extracting data. Whether you seek a quick experience or in-depth development, you'll find it easy to get started.

Showcases

  • Post a Tweet (by the UI-TARS model): twitter-video-1080p.mp4
  • Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (by the UI-TARS model): google-doc-1080p.mp4
  • Control the Maps app on Android (by the Qwen-2.5-VL model): control-maps-app-on-android.mp4
  • Use Midscene MCP to browse https://www.saucedemo.com/, log in, add products, place an order, and finally generate test cases from the MCP execution steps and a Playwright example: en-midscene-mcp-Sauce-Demo.mp4

📢 2025 Feb: New open-source model choice - UI-TARS and Qwen2.5-VL

Besides the default model GPT-4o, we have added two recommended open-source models to Midscene.js: UI-TARS and Qwen2.5-VL. (Yes, open-source models!) They are dedicated models for image recognition and UI automation, and they are known to perform well in UI automation scenarios. Read more in Choose a model.

💡 Features

  • Natural Language Interaction 👆: Just describe your goals and steps, and Midscene will plan and operate the user interface for you.
  • UI Automation 🤖: Automate interactions on both web pages and Android apps.
  • MCP Integration 🔗: Allows other MCP clients to directly use Midscene's capabilities. For more details, please read MCP Integration.
  • Visual Reports for Debugging 🎞️: Through our test reports and Playground, you can easily understand, replay, and debug the entire process.
  • Support Caching 🔄: The first execution of a task through AI is cached, so subsequent runs of the same task are significantly faster.
  • Completely Open Source 🔥: Enjoy a whole new automation development experience!
  • Understand UI, JSON Format Responses 🔍: You can specify data format requirements and receive responses in JSON format.
  • Intuitive Assertions 🤔: Express your assertions in natural language, and AI will understand and process them.
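To make the features above concrete, here is a minimal sketch of driving Midscene from JavaScript code. It assumes the Puppeteer integration; the import path and method names (`aiAction`, `aiQuery`, `aiAssert`) follow common Midscene usage but should be checked against the official docs, and running it requires a configured model API key.

```typescript
// Minimal sketch: natural-language action, JSON extraction, and assertion.
// Assumes the Puppeteer integration; verify the import path and method names
// against the official Midscene docs before use.
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://www.saucedemo.com/');

const agent = new PuppeteerAgent(page);

// Natural-language interaction: describe the goal; the model plans the steps.
await agent.aiAction('type "standard_user" and "secret_sauce" into the login form, then click Login');

// JSON-format response: describe the data shape you want back.
const items = await agent.aiQuery(
  '{name: string, price: number}[], the list of products on the page',
);

// Intuitive assertion in natural language.
await agent.aiAssert('the product list contains at least one item');

await browser.close();
```

Every step here is expressed as a goal rather than as selectors, which is what keeps the script readable and resilient to markup changes.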

✨ Model Choices

You can use multimodal LLMs like gpt-4o, or visual-language models like Qwen2.5-VL, gemini-2.5-pro, and UI-TARS. Among them, UI-TARS is an open-source model dedicated to UI automation.

Read more in Choose a model.
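As an illustration, model selection is typically driven by environment variables pointing at an OpenAI-compatible endpoint. The variable names below are assumptions for this sketch; consult the Choose a model guide for the exact keys your version supports.

```shell
# Sketch of model configuration via environment variables.
# Variable names are assumptions; see the "Choose a model" guide for the exact keys.
export OPENAI_API_KEY="..."                        # API key for your model provider
export OPENAI_BASE_URL="https://your-endpoint/v1"  # any OpenAI-compatible endpoint
export MIDSCENE_MODEL_NAME="ui-tars-72b"           # e.g. switch from gpt-4o to UI-TARS
```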

👀 Comparing to ...

There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?

  • Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. No matter how magical the demo looks, ensuring stability over time requires careful debugging. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need, and we're continually working to improve the debugging experience.

  • Open Source, Free, Deploy as you want: Midscene.js is an open-source project. It's decoupled from any cloud service and model provider, so you can choose either public or private deployment. There is always a suitable plan for your business.

  • Integrate with JavaScript: You can always bet on JavaScript 😎

📄 Resources

🤝 Community

📝 Credits

We would like to thank the following projects:

  • Rsbuild for the build tool.
  • UI-TARS for the open-source agent model UI-TARS.
  • Qwen2.5-VL for the open-source VL model Qwen2.5-VL.
  • scrcpy and yume-chan allow us to control Android devices from the browser.
  • appium-adb for the JavaScript bridge to adb.
  • YADB for the yadb tool, which improves the performance of text input.
  • Puppeteer for browser automation and control.
  • Playwright for browser automation, control, and testing.

Citation

If you use Midscene.js in your research or project, please cite:

@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, Automation & Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}

📝 License

Midscene.js is MIT licensed.


If this project helps you or inspires you, please give us a ⭐️