Voice-enabled browser automation agent built with Next.js and TypeScript. It parses natural-language commands into a structured plan, executes steps in a live browser session (via Browserbase), and streams events and screenshots to the UI. State is persisted to SQLite for replay and debugging.
- Natural language to actions: lightweight parser with optional LLM fallback
- Planner → Executor pipeline with retry and backtrack on failure
- Live browser via Browserbase (viewer URL + MJPEG streaming endpoint)
- Streaming event log (SSE) with steps and screenshots
- Session capture controls (capture on error, capture every N steps)
- SQLite persistence for sessions, commands, events, executions, and advisory selector memory
- Fully typed (TypeScript), tested with Vitest and Testing Library
- Next.js 14 (App Router), React 18, Tailwind CSS
- TypeScript, Zod for config validation
- SQLite via `better-sqlite3` and Drizzle ORM (manual schema)
- Playwright runner abstraction, Browserbase CDP integration
- Vitest + JSDOM for unit tests
- Node.js >= 18.17
- npm (or pnpm/yarn). Examples below use npm.
Optional (only if you plan to run a local Playwright browser instead of Browserbase):
- Playwright browsers: `npx playwright install --with-deps`
- Install dependencies: `npm install`
- Configure environment
Copy .env.example to .env and edit as needed. Key variables:
- Runners
  - `PLAYWRIGHT_ENABLED`: set to `true` to enable a local Playwright runner (not the default route).
  - `BROWSERBASE_ENABLED`: set to `true` to use Browserbase for the live browser.
- Browserbase
  - `BROWSERBASE_API_KEY`: required if using the SDK to create sessions.
  - `BROWSERBASE_PROJECT_ID`: optional, depends on your project setup.
  - `BROWSERBASE_WS_ENDPOINT`: optional direct CDP endpoint; if set, the SDK is bypassed.
- Persistence
  - `SQLITE_FILE`: path to the SQLite file (default `./data.sqlite`).
- Screenshots
  - `SCREENSHOT_EVERY`: number of steps between screenshots (unset defaults to 0, which turns periodic screenshots off).
- LLM fallback (optional)
  - `LLM_FALLBACK_ENABLED`: set to `true` to enable parsing fallback with OpenAI.
  - `OPENAI_API_KEY`: OpenAI API key for fallback parsing.
  - `LLM_MODEL`: model name (default `gpt-4o-mini`).
Note: With BROWSERBASE_ENABLED=false and PLAYWRIGHT_ENABLED=false, the executor uses a no-op runner. You’ll still see planning and events, but no real browsing will occur.
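To make the variable defaults and the no-op fallback concrete, here is a minimal sketch of how the environment could be resolved into a config object. The names `loadConfig`, `AppConfig`, and `RunnerKind` are hypothetical and not taken from the repo, which uses Zod for validation; this dependency-free version only illustrates the rules listed above. It also assumes Browserbase takes precedence when both runner flags are `true`, and that only the literal string `true` enables a flag.

```typescript
// Hypothetical shapes for illustration; the project's real config module may differ.
type RunnerKind = "browserbase" | "playwright" | "noop";

interface AppConfig {
  runner: RunnerKind;
  sqliteFile: string;
  screenshotEvery: number; // 0 means periodic screenshots are off
  llmFallbackEnabled: boolean;
  llmModel: string;
}

type Env = Record<string, string | undefined>;

// Assumption: only the literal string "true" counts as enabled.
const isTrue = (value: string | undefined): boolean => value === "true";

function loadConfig(env: Env): AppConfig {
  // Assumption: Browserbase wins when both runners are enabled.
  // With neither enabled, the executor falls back to a no-op runner.
  const runner: RunnerKind = isTrue(env.BROWSERBASE_ENABLED)
    ? "browserbase"
    : isTrue(env.PLAYWRIGHT_ENABLED)
      ? "playwright"
      : "noop";

  // Invalid or non-positive values are treated as "off".
  const parsed = Number.parseInt(env.SCREENSHOT_EVERY ?? "0", 10);
  const screenshotEvery = Number.isFinite(parsed) && parsed > 0 ? parsed : 0;

  return {
    runner,
    sqliteFile: env.SQLITE_FILE ?? "./data.sqlite",
    screenshotEvery,
    llmFallbackEnabled: isTrue(env.LLM_FALLBACK_ENABLED),
    llmModel: env.LLM_MODEL ?? "gpt-4o-mini",
  };
}

// With both runner flags off, planning still works but nothing is browsed.
const exampleConfig = loadConfig({
  BROWSERBASE_ENABLED: "false",
  PLAYWRIGHT_ENABLED: "false",
});
// exampleConfig.runner === "noop"
```

In a Node process you would pass `process.env` as the argument; taking the environment as a parameter keeps the function easy to unit test.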
- Run the dev server
This repo does not define dev/build scripts; you can invoke Next directly:
npx next dev -p 3000
Then open http://localhost:3000.
Build/start for production:
npx next build
npx next start -p 3000
- Create a session using the “New Session” button.
- Enter a command, e.g., “search amazon for headphones under 200 and sort by price low to high”.
- Review the parsed command and computed plan.
- Watch live events and screenshots as steps execute.
- If using Browserbase, click “Open Live Browser” to view the cloud session.
- Adjust capture settings (capture on error or every N steps) per session.
If the command targets a domain that is not in the session allowlist, the UI prompts for confirmation before planning and execution.
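The allowlist check can be pictured with a small sketch. The helper names (`normalizeDomain`, `isDomainAllowed`, `gateCommand`) and the matching rules (case-insensitive, leading `www.` ignored, subdomains of an allowlisted entry accepted) are assumptions for illustration; the project's actual safety logic may be stricter or looser.

```typescript
// Hypothetical helpers: names and matching rules are illustrative only.
function normalizeDomain(domain: string): string {
  return domain.trim().toLowerCase().replace(/^www\./, "");
}

// A domain is allowed when it equals an allowlisted entry or is a subdomain of one.
function isDomainAllowed(domain: string, allowlist: readonly string[]): boolean {
  const target = normalizeDomain(domain);
  return allowlist.some((entry) => {
    const allowed = normalizeDomain(entry);
    return target === allowed || target.endsWith(`.${allowed}`);
  });
}

type CommandGate = "proceed" | "needs_confirmation";

function gateCommand(
  targetDomain: string | undefined,
  allowlist: readonly string[],
): CommandGate {
  // Assumption: commands that name no site have nothing to check.
  if (targetDomain === undefined) return "proceed";
  return isDomainAllowed(targetDomain, allowlist) ? "proceed" : "needs_confirmation";
}
```

Matching on a leading dot (`.amazon.com`) rather than a bare suffix keeps look-alike domains such as `evil-amazon.com` from slipping through.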
- Parser (`src/lib/parser.ts`): heuristics to extract intent, entities (site, query, filters, sort), and safety flags.
- Planner (`src/lib/planner.ts`): maps a `Command` to an `ActionPlan` of steps (`NAVIGATE`, `WAIT_FOR`, `FILL`, `PRESS`, `APPLY_FILTER`, `SORT`, `CLICK`).
- Executor (`src/runner/executor.ts`): runs steps with retry/backtrack, emits `STATE`/`STEP`/`SCREENSHOT`/`ERROR` events, and records executions/steps in SQLite.
- Runner abstraction (`src/runner/runner.ts`): interface implemented by a Playwright-based runner. In production the route uses Browserbase via `createBrowserbaseRunner`.
- Browserbase integration (`src/runner/createBrowserbaseRunner.ts`): creates or attaches to a Browserbase session (SDK or direct `BROWSERBASE_WS_ENDPOINT`) and returns a Playwright-backed runner. Persists `viewerUrl` for the UI.
- Event pipeline
  - Services (`src/server/services.ts`) emit events and manage in-memory sessions.
  - SSE API (`src/app/api/events/route.ts`) replays backlog from SQLite and streams live events.
  - Live MJPEG (`src/app/api/live/route.ts`) for periodic page screenshots if a live page is registered.
- Persistence (`src/server/persistence/sqlite.ts`): minimal schema creation; stores sessions, commands, events, executions, and advisory selector memory.
- UI (Next.js App Router): session controls, command panel, plan card, event log, screenshots, timeline, and capture settings.
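As a rough illustration of the parser-to-planner handoff described above, the sketch below maps a parsed search command to a list of steps. The `Command` and `Step` shapes, the `planSearch` function, and the target strings are simplified assumptions, not the types defined in `src/lib`; only the step type names come from this README.

```typescript
// Hypothetical, simplified shapes; the real Command and ActionPlan types may differ.
type StepType =
  | "NAVIGATE" | "WAIT_FOR" | "FILL" | "PRESS"
  | "APPLY_FILTER" | "SORT" | "CLICK";

interface Step {
  type: StepType;
  target?: string; // URL, element hint, or key name, depending on the step
  value?: string;
}

interface Command {
  intent: "search" | "unknown";
  site?: string;
  query?: string;
  filters?: Record<string, string>;
  sort?: string;
}

function planSearch(command: Command): Step[] {
  if (command.intent !== "search" || !command.site || !command.query) {
    return []; // nothing to plan; a real planner might report an error instead
  }
  const steps: Step[] = [
    { type: "NAVIGATE", target: `https://www.${command.site}` },
    { type: "WAIT_FOR", target: "search box" },
    { type: "FILL", target: "search box", value: command.query },
    { type: "PRESS", target: "Enter" },
    { type: "WAIT_FOR", target: "results" },
  ];
  // One APPLY_FILTER step per parsed filter, in insertion order.
  for (const [name, value] of Object.entries(command.filters ?? {})) {
    steps.push({ type: "APPLY_FILTER", target: name, value });
  }
  if (command.sort) {
    steps.push({ type: "SORT", value: command.sort });
  }
  return steps;
}

// Roughly what "search amazon for headphones under 200 and sort by
// price low to high" could look like after parsing (field values assumed).
const examplePlan = planSearch({
  intent: "search",
  site: "amazon.com",
  query: "headphones",
  filters: { maxPrice: "200" },
  sort: "price_asc",
});
```

Keeping the plan as plain data is what lets the UI show it for review before the executor runs anything.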
- `POST /api/sessions` → `{ sessionId }`
- `POST /api/commands` body `{ sessionId, utterance }` → `{ command, plan }`
- `POST /api/confirmations` body `{ sessionId, commandId, confirmed, passphrase? }` → `{ status, plan? }`
- `GET /api/events?sessionId=...&live=1` → `text/event-stream` (SSE)
- `POST /api/cancel` body `{ sessionId }` → `{ status: 'cancelling' }`
- `GET /api/viewer-url?sessionId=...` → `{ viewerUrl: string | null }`
- `GET /api/live?sessionId=...` → `multipart/x-mixed-replace` stream (JPEG frames)
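For readers unfamiliar with the `text/event-stream` format returned by the events endpoint, the following sketch shows how a single event could be framed and parsed. The `AgentEvent` shape and both function names are hypothetical, and whether the real route uses named `event:` fields or `id:` fields is an assumption; only the SSE wire format itself (field lines ending with a blank line) is standard.

```typescript
// Hypothetical event shape; the project's real event types may differ.
interface AgentEvent {
  id: number;
  type: "STATE" | "STEP" | "SCREENSHOT" | "ERROR";
  payload: unknown;
}

// Serialize one event as a Server-Sent Events frame.
// A frame is a set of "field: value" lines terminated by a blank line.
function toSseFrame(event: AgentEvent): string {
  // JSON.stringify escapes newlines, so the data stays on a single line.
  const data = JSON.stringify(event.payload);
  return `id: ${event.id}\nevent: ${event.type}\ndata: ${data}\n\n`;
}

// Parse a frame back into its fields (single-line data only).
function parseSseFrame(frame: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of frame.split("\n")) {
    const sep = line.indexOf(":");
    if (sep <= 0) continue; // skip blank lines and ":" comment lines
    const value = line.slice(sep + 1);
    // Per the SSE format, one leading space after the colon is dropped.
    fields[line.slice(0, sep)] = value.startsWith(" ") ? value.slice(1) : value;
  }
  return fields;
}

const exampleFrame = toSseFrame({ id: 1, type: "STEP", payload: { index: 0 } });
```

In the browser, `EventSource` does this parsing for you; the manual parser is mainly useful in tests that read the raw stream.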
- Run all tests: `npm test`
- Watch mode: `npm run test:watch`
Some tests exercise the API routes and SSE using JSDOM; others cover runner logic in isolation. Playwright E2E tests would require provisioning browsers and wiring a direct runner path.
- Browserbase
  - Ensure `BROWSERBASE_ENABLED=true` and set `BROWSERBASE_API_KEY` (or provide `BROWSERBASE_WS_ENDPOINT`).
  - The route persists the session's `viewerUrl` so the UI can open "Live Browser".
- Playwright (local)
  - The default `/api/commands` route currently prefers Browserbase. If you want a local browser, you can adapt the route to construct a `PlaywrightRunner` directly and set `PLAYWRIGHT_ENABLED=true`.
  - Install browsers: `npx playwright install --with-deps`.
- SQLite
  - File is controlled by `SQLITE_FILE` (default `./data.sqlite`). Schema is created on startup if missing.
- LLM Fallback
  - Enable with `LLM_FALLBACK_ENABLED=true` and set `OPENAI_API_KEY`. Used when parser confidence is low or intent is unknown.
- `src/app/`: Next.js app (pages, components, APIs)
- `src/lib/`: parser, planner, types, events
- `src/runner/`: runner interface, Playwright integration, and execution engine
- `src/server/`: services, persistence (SQLite), live streaming
- `src/memory/`: advisory selector memory (mem0)
- No live browser view
  - Ensure Browserbase is enabled and configured; check that `/api/viewer-url` returns a URL.
- No screenshots
  - Set the capture policy in the UI (Capture Settings), or set `SCREENSHOT_EVERY`; screenshots on error are enabled by default.
- Database errors
  - Verify write permissions to the `SQLITE_FILE` path and that only one process is writing.
- Playwright errors
  - Install browsers (`npx playwright install`) and verify your runner path uses a local Playwright page/context.
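When debugging missing screenshots, it can help to reason about the capture policy as a small predicate. The sketch below is an assumption about how "capture on error" and "capture every N steps" might combine; `CapturePolicy`, `shouldCapture`, and the 1-based step numbering are hypothetical and not taken from the codebase.

```typescript
// Hypothetical policy shape for illustration.
interface CapturePolicy {
  onError: boolean;
  everyNSteps: number; // 0 disables periodic capture
}

// Assumed default: capture on error, no periodic capture.
const defaultPolicy: CapturePolicy = { onError: true, everyNSteps: 0 };

// stepNumber is 1-based: with everyNSteps = 3, steps 3, 6, 9, ... are captured.
function shouldCapture(
  policy: CapturePolicy,
  stepNumber: number,
  failed: boolean,
): boolean {
  if (failed && policy.onError) return true;
  return policy.everyNSteps > 0 && stepNumber % policy.everyNSteps === 0;
}
```

Under these assumptions, a session with the default policy and no failing steps produces no screenshots at all, which matches the symptom described in the "No screenshots" item.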
- `Design summary.md`, `Detailed-Design.md`, `Project-Requirements.md`
- `Playwright-Runner-Design.md`, `Browserbase-Integration.md`
- `Intents.md` for NLP intents and examples
Made for rapid prototyping of voice-driven web automation. Contributions and suggestions welcome.
