Trying out Browser-Use, an AI that operates the browser automatically

https://github.com/browser-use/browser-use

$ mkdir browser-use
$ cd browser-use
$ uv venv
$ source .venv/bin/activate
$ uv pip install browser-use
$ playwright install
$ vim main.py

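import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    # The task is given in natural language; the LLM decides which browser actions to take.
    agent = Agent(
        task="Go to Reddit, search for 'browser-use' in the search bar, click on the first post and return the first comment.",
        llm=ChatOpenAI(model="gpt-4o"),  # reads OPENAI_API_KEY from the environment
    )
    # run() drives the browser until the task is finished and returns the action history.
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())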

$ python main.py
INFO     [root] Anonymized telemetry enabled. See https://github.com/gregpr07/browser-use for more information.
INFO     [agent] 🚀 Starting task: Go to Reddit, search for 'browser-use' in the search bar, click on the first post and return the first comment.
INFO     [agent]
📍 Step 1
INFO     [agent] 👍 Eval: Success - The task is clear and the page is ready for navigation.
INFO     [agent] 🧠 Memory: Need to navigate to Reddit, search for 'browser-use', click the first post, and extract the first comment.
INFO     [agent] 🎯 Next goal: Open Reddit in a new tab.
INFO     [agent] 🛠️  Action 1/1: {"open_tab":{"url":"https://www.reddit.com"}}
INFO     [controller] 🔗  Opened new tab with https://www.reddit.com
INFO     [agent]
📍 Step 2
INFO     [agent] 👍 Eval: Success - Reddit page loaded and search bar is available.
INFO     [agent] 🧠 Memory: Need to search for 'browser-use' on Reddit.
INFO     [agent] 🎯 Next goal: Enter 'browser-use' into the search bar and perform the search.
INFO     [agent] 🛠️  Action 1/1: {"input_text":{"index":2,"text":"browser-use"}}
INFO     [controller] ⌨️  Input "browser-use" into index 2
INFO     [agent]
📍 Step 3
INFO     [agent] 👍 Eval: Success - 'browser-use' search term entered successfully.
INFO     [agent] 🧠 Memory: Need to perform the search for 'browser-use'.
INFO     [agent] 🎯 Next goal: Click on 'Search for "browser-use"' to perform the search.
INFO     [agent] 🛠️  Action 1/1: {"click_element":{"index":14}}
INFO     [controller] 🖱️  Clicked index 14
INFO     [agent]
📍 Step 4
INFO     [agent] 👍 Eval: Success - Search results are displayed successfully.
INFO     [agent] 🧠 Memory: Need to click the first post related to 'browser-use'.
INFO     [agent] 🎯 Next goal: Click on the first relevant post about 'browser-use'.
INFO     [agent] 🛠️  Action 1/1: {"click_element":{"index":12}}
INFO     [controller] 🖱️  Clicked index 12
INFO     [agent]
📍 Step 5
INFO     [agent] 👍 Eval: Success - Reached the post page successfully.
INFO     [agent] 🧠 Memory: Need to extract the first comment from the post.
INFO     [agent] 🎯 Next goal: Click 'Go to comments' to view the comments section.
INFO     [agent] 🛠️  Action 1/1: {"click_element":{"index":11}}
INFO     [controller] 🖱️  Clicked index 11
INFO     [agent]
📍 Step 6
INFO     [agent] 👍 Eval: Success - Navigated to comments section successfully.
INFO     [agent] 🧠 Memory: Need to extract the first comment.
INFO     [agent] 🎯 Next goal: Extract the text of the first comment.
INFO     [agent] 🛠️  Action 1/1: {"done":{"text":"The first comment is:\n\n\"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required\""}}
INFO     [agent] 📄 Result: The first comment is:
 
"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"
INFO     [agent] ✅ Task completed successfully
INFO     [agent] Created GIF at agent_history.gif
AgentHistoryList(all_results=[ActionResult(is_done=False, extracted_content='🔗  Opened new tab with https://www.reddit.com', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='⌨️  Input "browser-use" into index 2', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 14', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 12', error=None, include_in_memory=True), ActionResult(is_done=False, extracted_content='🖱️  Clicked index 11', error=None, include_in_memory=True), ActionResult(is_done=True, extracted_content='The first comment is:\n\n"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"', error=None, include_in_memory=False)], all_model_outputs=[{'open_tab': {'url': 'https://www.reddit.com'}}, {'input_text': {'index': 2, 'text': 'browser-use'}}, {'click_element': {'index': 14}}, {'click_element': {'index': 12}}, {'click_element': {'index': 11}}, {'done': {'text': 'The first comment is:\n\n"The readme says it supports Llama 405B but no examples are provided :( It seems a model with multiple images and tool calling is required"'}}])
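
The printed AgentHistoryList is hard to read as a single line. As a minimal sketch, assuming only the fields visible in that output (`is_done`, `extracted_content`, `error`, and the one-key dicts in `all_model_outputs`), the final answer and the sequence of actions could be pulled out as follows. `StepResult`, `final_text`, and `action_names` are hypothetical names used for illustration and are not part of browser-use; the library may well provide its own accessors for this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    """Hypothetical stand-in mirroring the fields printed for each ActionResult."""
    is_done: bool
    extracted_content: Optional[str]
    error: Optional[str] = None


def final_text(results: list[StepResult]) -> Optional[str]:
    """Return the content of the last step flagged is_done, or None if there is none."""
    for step in reversed(results):
        if step.is_done:
            return step.extracted_content
    return None


def action_names(model_outputs: list[dict]) -> list[str]:
    """Each model output is a one-key dict; that key is the action name."""
    return [next(iter(output)) for output in model_outputs]


# Values copied from the printed output above (intermediate click results omitted).
results = [
    StepResult(False, "🔗  Opened new tab with https://www.reddit.com"),
    StepResult(False, '⌨️  Input "browser-use" into index 2'),
    StepResult(
        True,
        'The first comment is:\n\n"The readme says it supports Llama 405B but no '
        'examples are provided :( It seems a model with multiple images and tool '
        'calling is required"',
    ),
]

model_outputs = [
    {"open_tab": {"url": "https://www.reddit.com"}},
    {"input_text": {"index": 2, "text": "browser-use"}},
    {"click_element": {"index": 14}},
    {"click_element": {"index": 12}},
    {"click_element": {"index": 11}},
    {"done": {"text": "The first comment is: ..."}},
]

print(final_text(results))
print(action_names(model_outputs))
```

Scanning from the end keeps the helper correct even if earlier steps carried extracted text of their own.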

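The agent works fairly slowly, so this is not instant information gathering, but having it handle things like research or making reservations on its own could still save some effort.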
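
To put a number on "fairly slowly", one option is to time the run. The sketch below is only an illustration under stated assumptions: `fake_agent_run` is a hypothetical stand-in for `agent.run()` so that the example runs without a browser or an API key, and `timed` is a helper name invented here.

```python
import asyncio
import time


async def timed(coro):
    """Await a coroutine and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def fake_agent_run():
    """Hypothetical stand-in for agent.run(); sleeps briefly instead of browsing."""
    await asyncio.sleep(0.05)
    return "done"


result, elapsed = asyncio.run(timed(fake_agent_run()))
print(f"result={result!r}, elapsed={elapsed:.2f}s")
```

In the real script, the same wrapper could be applied inside `main()` as `result, elapsed = await timed(agent.run())`.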
