AgentQL

AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing as UI changes and work across similar sites. Users can define structured data output, making AgentQL versatile for developers and data scientists.

AgentQL

AI-powered web scraping and automation


What is AgentQL?

AgentQL is an AI-powered query language for scraping web sites and automating workflows. It uses natural language queries to pinpoint data and elements on any web page, including authenticated and dynamically generated content. Users can define structured data output and apply transforms within queries. AgentQL's natural language selectors find elements intuitively based on the content of the web page and work across similar web sites, self-healing as UI changes over time.

Features

  • Playwright integration: AgentQL's Python SDK and JavaScript SDK integrate with Playwright for advanced automation and testing.
  • Cross-site compatibility lets you use the same query across different sites with similar content.
  • Structured output defined by the shape of your query.
  • Natural language selectors find elements and data anywhere on a site using intuitive queries.
  • Transforms and extracts data in your queries.
  • Works on any page, public or private, any site, any URL, even behind authentication.
  • Resilience to UI changes means queries keep working as a page's structure changes over time.
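
As a rough illustration of the cross-site compatibility feature above, the following sketch runs one query against several URLs and collects the results by URL. It reuses only the calls shown in the example scripts in this README (agentql.wrap, page.goto, page.query_data). The helper names query_with_agentql and run_query_on_sites are hypothetical and not part of the SDK, and the list of URLs is left to the caller.

```python
from typing import Any, Callable, Dict, Iterable

# Same query text as the data-extraction example in this README.
PRODUCT_QUERY = """
{
    products[] {
        name
        price
    }
}
"""


def query_with_agentql(url: str, query: str) -> Any:
    """Open `url` in Chromium and return page.query_data(query).

    Hypothetical helper. The imports are local so the rest of this sketch
    loads even when agentql and playwright are not installed.
    """
    import agentql
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright, playwright.chromium.launch(headless=True) as browser:
        page = agentql.wrap(browser.new_page())
        page.goto(url)
        return page.query_data(query)


def run_query_on_sites(
    urls: Iterable[str],
    query: str,
    fetch: Callable[[str, str], Any] = query_with_agentql,
) -> Dict[str, Any]:
    """Run one query against each URL and key the results by URL.

    `fetch` is injectable so the loop can be exercised without a browser
    or an API key.
    """
    results: Dict[str, Any] = {}
    for url in urls:
        results[url] = fetch(url, query)
    return results
```

Calling run_query_on_sites(["https://scrapeme.live/shop/"], PRODUCT_QUERY) would launch a real browser and use your API key. Passing a stub as fetch keeps the loop testable offline.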

Tools

  • Python SDK for running automation and scraping scripts with AgentQL queries in Python.
  • JavaScript SDK for running automation and scraping scripts with AgentQL queries in JavaScript.
  • Debugger Browser Extension lets you debug and refine queries in real time on live sites.
  • AgentQL Query Language lets you define queries with natural language.
  • Playground lets you experiment with AgentQL, export Python scripts, and optimize queries with prompts.

Quick Start

Python

  1. Install the Python SDK and its dependencies from your terminal:
pip3 install agentql
agentql init
  2. Copy and paste your API key into the terminal.

  3. Save one of the example scripts below as example.py and run it from your terminal:

python3 example.py

JavaScript

  1. Install the JavaScript SDK from your terminal:
npm install agentql
  2. Install dependencies and set your API key by following the instructions here.

  3. Save one of the example scripts below as example.js and run it from your terminal:

node example.js

Example Scripts

Python

Data extraction with query_data

import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://scrapeme.live/shop/")

    # use your own words to describe what you're looking for
    QUERY = """
    {
        products[] {
            name
            price
        }
    }
    """

    # query_data returns data from the page
    response = page.query_data(QUERY)

    print(response)
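
Once the data is extracted, a common next step is saving it. The sketch below assumes the response mirrors the shape of the query, that is, a dict with a "products" list whose items carry "name" and "price". The helper products_to_csv is hypothetical and the sample values are placeholders, not real output from the site.

```python
import csv
import io
from typing import Any, Dict, List


def products_to_csv(response: Dict[str, Any]) -> str:
    """Render response["products"] as CSV text with name and price columns."""
    rows: List[Dict[str, Any]] = response.get("products") or []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising KeyError.
        writer.writerow({"name": row.get("name", ""), "price": row.get("price", "")})
    return buffer.getvalue()


# Placeholder data in the assumed shape; not real output from the site.
sample = {
    "products": [
        {"name": "Item A", "price": "1.00"},
        {"name": "Item B", "price": "2.50"},
    ]
}
print(products_to_csv(sample))
```

In the script above, passing response to products_to_csv in place of sample would produce the CSV text for the live page, provided the response has the assumed shape.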

Automation with get_by_prompt and query_elements

import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://duckduckgo.com")

    # use your own words to describe what you're looking for
    QUERY = """
    {
        search_box
        search_button
    }
    """
    # query_elements returns multiple elements to perform operations on
    response = page.query_elements(QUERY)

    response.search_box.fill("AgentQL")
    response.search_button.click()

    # get_by_prompt returns one element to perform operations on based on the content you pass to it
    images = page.get_by_prompt("images link")
    images.click()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)

JavaScript

Data extraction with queryData

const { wrap, configure } = require("agentql");
const { chromium } = require("playwright");

configure({ apiKey: process.env.AGENTQL_API_KEY });

async function main() {
  const browser = await chromium.launch();
  const page = await wrap(await browser.newPage());
  await page.goto("https://scrapeme.live/shop/");

  // use your own words to describe what you're looking for
  const QUERY = `
  {
      products[] {
          name
          price
      }
  }
  `;

  // queryData returns data from the page
  const response = await page.queryData(QUERY);

  console.log(response);

  // close the browser so the Node process can exit
  await browser.close();
}

main();

Automation with getByPrompt and queryElements

const { wrap, configure } = require("agentql");
const { chromium } = require("playwright");

configure({ apiKey: process.env.AGENTQL_API_KEY });

async function main() {
  // launch with a visible window so the effect of the script can be seen
  const browser = await chromium.launch({ headless: false });
  const page = await wrap(await browser.newPage());
  await page.goto("https://duckduckgo.com");

  // use your own words to describe what you're looking for
  const QUERY = `
  {
      search_box
      search_button
  }
  `;

  // queryElements returns multiple elements to perform operations on
  const response = await page.queryElements(QUERY);

  await response.search_box.fill("AgentQL");
  await response.search_button.click();

  // getByPrompt returns one element to perform operations on based on the content you pass to it
  const images = await page.getByPrompt("images link");
  await images.click();

  // Used only for demo purposes. It allows you to see the effect of the script.
  await page.waitForTimeout(10000);

  // close the browser so the Node process can exit
  await browser.close();
}

main();

More examples

| Example Name | JavaScript | Python |
|---|---|---|
| Getting Started | Example | Example |
| Close Cookie Dialog | Example | Example |
| Close Popup Windows | Example | Example |
| Compare Product Prices | Example | Example |
| Debug Script | N/A | Example |
| Get Element by Prompt | Example | Example |
| Infinite Scroll | N/A | Example |
| External Browser Integration | Example | Example |
| Query List Items | Example | Example |
| Site Login | Example | Example |
| Headless Browser | Example | Example |
| Save/Load Auth Session | Example | Example |
| Stealth Mode | Example | Example |
| Wait for Page Load | Example | Example |
| E-commerce Pricing Data | Example | Example |
| Sentiment Analysis | Example | Example |
| Get XPath | Example | Example |
| Submit Form | Example | Example |
| Collect YouTube Comments | Example | N/A |
| Run in Google Colab | N/A | Example |

For comprehensive guides and API references, check out our official documentation.

Show Your Support 🌟

If you find AgentQL helpful, please consider giving us a star on GitHub! It helps us reach more developers and continue improving the project.

Get in touch

For questions, feedback, or support, join our Discord community. You can follow us on GitHub, Twitter, and LinkedIn!