Voice Web Agent

Voice-enabled browser automation agent built with Next.js and TypeScript. It parses natural-language commands into a structured plan, executes steps in a live browser session (via Browserbase), and streams events and screenshots to the UI. State is persisted to SQLite for replay and debugging.

Features

Natural language to actions: lightweight parser with optional LLM fallback
Planner → Executor pipeline with retry and backtrack on failure
Live browser via Browserbase (viewer URL + MJPEG streaming endpoint)
Streaming event log (SSE) with steps and screenshots
Session capture controls (capture on error, capture every N steps)
SQLite persistence for sessions, commands, events, executions, and advisory selector memory
Fully typed (TypeScript), tested with Vitest and Testing Library
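
The retry-and-backtrack behavior mentioned above can be illustrated with a minimal sketch. It is not the code in src/runner/executor.ts: the Step shape, the RunStep callback, the executePlan name, and the budget options are hypothetical, and "backtrack" is assumed to mean re-running the previous step before trying the failed one again.

```typescript
// Hypothetical step shape; the real ActionPlan types live in src/lib.
type StepType = "NAVIGATE" | "WAIT_FOR" | "FILL" | "PRESS" | "APPLY_FILTER" | "SORT" | "CLICK";

interface Step {
  type: StepType;
  target?: string;
  value?: string;
}

// Hypothetical runner callback: performs one step and throws on failure.
type RunStep = (step: Step) => Promise<void>;

interface ExecOptions {
  maxRetries: number;    // extra attempts per step after the first one
  maxBacktracks: number; // total number of times execution may rewind one step
}

// Runs steps in order. A failing step is retried up to maxRetries times; if it
// still fails, execution backtracks one step (re-running the previous step to
// restore page state) until the backtrack budget is exhausted.
async function executePlan(steps: Step[], runStep: RunStep, opts: ExecOptions): Promise<string[]> {
  const log: string[] = [];
  let backtracks = 0;
  let i = 0;
  while (i < steps.length) {
    const step = steps[i];
    let ok = false;
    for (let attempt = 0; attempt <= opts.maxRetries && !ok; attempt++) {
      try {
        await runStep(step);
        ok = true;
        log.push(`STEP ${i} ${step.type} ok (attempt ${attempt + 1})`);
      } catch (err) {
        log.push(`ERROR ${i} ${step.type} attempt ${attempt + 1}: ${(err as Error).message}`);
      }
    }
    if (ok) {
      i++;
    } else if (i > 0 && backtracks < opts.maxBacktracks) {
      backtracks++;
      log.push(`BACKTRACK to ${i - 1}`);
      i--; // re-run the previous step, then retry the failed one
    } else {
      throw new Error(`step ${i} (${step.type}) failed\n${log.join("\n")}`);
    }
  }
  return log;
}
```

The log lines loosely mirror the STEP and ERROR events the executor emits; the real executor also records executions and steps in SQLite.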

Tech Stack

Next.js 14 (App Router), React 18, Tailwind CSS
TypeScript, Zod for config validation
SQLite via better-sqlite3 and Drizzle ORM (manual schema)
Playwright runner abstraction, Browserbase CDP integration
Vitest + JSDOM for unit tests

Getting Started

Prerequisites

Node.js >= 18.17
npm (or pnpm/yarn). Examples below use npm.

Optional (only if you plan to run a local Playwright browser instead of Browserbase):

Playwright browsers: npx playwright install --with-deps

Install

Install dependencies

npm install

Configure environment

Copy .env.example to .env and edit as needed. Key variables:

Runners
- PLAYWRIGHT_ENABLED: set to true to enable a local Playwright runner (not used by the default /api/commands route).
- BROWSERBASE_ENABLED: set to true to use Browserbase for the live browser.
Browserbase
- BROWSERBASE_API_KEY: required when sessions are created through the Browserbase SDK.
- BROWSERBASE_PROJECT_ID: optional; whether it is needed depends on your project setup.
- BROWSERBASE_WS_ENDPOINT: optional direct CDP endpoint; if set, the SDK is bypassed.
Persistence
- SQLITE_FILE: path to the SQLite file (default ./data.sqlite).
Screenshots
- SCREENSHOT_EVERY: number of steps between screenshots (periodic screenshots are off when unset or 0).
LLM fallback (optional)
- LLM_FALLBACK_ENABLED: set to true to enable parsing fallback with OpenAI.
- OPENAI_API_KEY: OpenAI API key for fallback parsing.
- LLM_MODEL: model name (default gpt-4o-mini).
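
As an illustration of how these variables might be read and checked at startup, here is a minimal sketch. The project validates config with Zod; to stay dependency-free this version uses hand-written checks instead, and the AppConfig shape and loadConfig name are hypothetical. Defaults follow the list above.

```typescript
// Hypothetical config shape mirroring the environment variables listed above.
interface AppConfig {
  playwrightEnabled: boolean;
  browserbaseEnabled: boolean;
  browserbaseApiKey?: string;
  browserbaseProjectId?: string;
  browserbaseWsEndpoint?: string;
  sqliteFile: string;
  screenshotEvery: number; // 0 means periodic screenshots are off
  llmFallbackEnabled: boolean;
  openaiApiKey?: string;
  llmModel: string;
}

type Env = Record<string, string | undefined>;

// Only the literal string "true" (case-insensitive) turns a flag on.
const flag = (v: string | undefined): boolean => v?.trim().toLowerCase() === "true";

function loadConfig(env: Env): AppConfig {
  const screenshotEvery = env.SCREENSHOT_EVERY ? Number(env.SCREENSHOT_EVERY) : 0;
  if (!Number.isInteger(screenshotEvery) || screenshotEvery < 0) {
    throw new Error(`SCREENSHOT_EVERY must be a non-negative integer, got "${env.SCREENSHOT_EVERY}"`);
  }
  const cfg: AppConfig = {
    playwrightEnabled: flag(env.PLAYWRIGHT_ENABLED),
    browserbaseEnabled: flag(env.BROWSERBASE_ENABLED),
    browserbaseApiKey: env.BROWSERBASE_API_KEY,
    browserbaseProjectId: env.BROWSERBASE_PROJECT_ID,
    browserbaseWsEndpoint: env.BROWSERBASE_WS_ENDPOINT,
    sqliteFile: env.SQLITE_FILE ?? "./data.sqlite",
    screenshotEvery,
    llmFallbackEnabled: flag(env.LLM_FALLBACK_ENABLED),
    openaiApiKey: env.OPENAI_API_KEY,
    llmModel: env.LLM_MODEL ?? "gpt-4o-mini",
  };
  // Browserbase needs either an API key (SDK path) or a direct CDP endpoint.
  if (cfg.browserbaseEnabled && !cfg.browserbaseApiKey && !cfg.browserbaseWsEndpoint) {
    throw new Error("BROWSERBASE_ENABLED=true requires BROWSERBASE_API_KEY or BROWSERBASE_WS_ENDPOINT");
  }
  if (cfg.llmFallbackEnabled && !cfg.openaiApiKey) {
    throw new Error("LLM_FALLBACK_ENABLED=true requires OPENAI_API_KEY");
  }
  return cfg;
}
```

Failing fast on inconsistent combinations surfaces a missing key at startup instead of at the first command.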

Note: With BROWSERBASE_ENABLED=false and PLAYWRIGHT_ENABLED=false, the executor uses a no-op runner. You’ll still see planning and events, but no real browsing will occur.
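
The fallback described in the note can be sketched as a selection function plus a runner that does nothing. The Runner interface here is a simplified, hypothetical stand-in for src/runner/runner.ts, and the selection order (Browserbase, then local Playwright, then no-op) is an assumption; the default /api/commands route only wires up Browserbase.

```typescript
// Hypothetical, simplified runner interface (the real one is in src/runner/runner.ts).
interface Runner {
  readonly kind: "browserbase" | "playwright" | "noop";
  goto(url: string): Promise<void>;
  screenshot(): Promise<Uint8Array | null>;
}

// A no-op runner: accepts every call and performs no browsing.
class NoopRunner implements Runner {
  readonly kind = "noop" as const;
  readonly visited: string[] = [];

  async goto(url: string): Promise<void> {
    this.visited.push(url); // recorded only so callers can see what would have been visited
  }

  async screenshot(): Promise<Uint8Array | null> {
    return null; // nothing to capture without a real page
  }
}

interface RunnerFlags {
  browserbaseEnabled: boolean;
  playwrightEnabled: boolean;
}

// Assumed precedence: Browserbase, then local Playwright, then no-op.
function selectRunnerKind(flags: RunnerFlags): Runner["kind"] {
  if (flags.browserbaseEnabled) return "browserbase";
  if (flags.playwrightEnabled) return "playwright";
  return "noop";
}
```

Because planning and event emission do not depend on the runner, the no-op path still exercises everything except the browser itself.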

Run the dev server

This repo does not define dev or build scripts, so invoke Next.js directly:

npx next dev -p 3000

Then open http://localhost:3000.

Build/start for production:

npx next build
npx next start -p 3000

Basic Workflow

Create a session using the “New Session” button.
Enter a command, e.g., “search amazon for headphones under 200 and sort by price low to high”.
Review the parsed command and computed plan.
Watch live events and screenshots as steps execute.
If using Browserbase, click “Open Live Browser” to view the cloud session.
Adjust capture settings (capture on error or every N steps) per session.

If the command targets a domain that is not in the session allowlist, the UI prompts for confirmation before planning and execution.
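
A minimal sketch of the allowlist check behind that prompt. The function name needsConfirmation is hypothetical, and it assumes allowlist entries are bare domains that also cover their subdomains; the actual matching rules may differ.

```typescript
// Returns true when the target host is not covered by the session allowlist,
// i.e. the UI should ask the user to confirm before planning and execution.
// Assumption: "amazon.com" in the allowlist also covers "www.amazon.com".
function needsConfirmation(targetUrl: string, allowlist: readonly string[]): boolean {
  let host: string;
  try {
    host = new URL(targetUrl).hostname.toLowerCase();
  } catch {
    return true; // unparseable targets always require confirmation
  }
  return !allowlist.some((entry) => {
    const domain = entry.toLowerCase();
    // Match the domain itself or a subdomain on a "." boundary.
    return host === domain || host.endsWith("." + domain);
  });
}
```

Matching on a "." boundary keeps look-alike hosts such as evil-amazon.com from passing as amazon.com.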

Architecture Overview

Parser (src/lib/parser.ts): heuristics to extract intent, entities (site, query, filters, sort), and safety flags.
Planner (src/lib/planner.ts): maps a Command to an ActionPlan of steps (NAVIGATE, WAIT_FOR, FILL, PRESS, APPLY_FILTER, SORT, CLICK).
Executor (src/runner/executor.ts): runs steps with retry/backtrack, emits STATE/STEP/SCREENSHOT/ERROR events, and records executions/steps in SQLite.
Runner abstraction (src/runner/runner.ts): interface implemented by a Playwright-based runner. In production, the /api/commands route uses Browserbase via createBrowserbaseRunner.
Browserbase integration (src/runner/createBrowserbaseRunner.ts): creates or attaches to a Browserbase session (SDK or direct BROWSERBASE_WS_ENDPOINT) and returns a Playwright-backed runner. Persists viewerUrl for the UI.
Event pipeline
- Services (src/server/services.ts) emit events and manage in-memory sessions.
- SSE API (src/app/api/events/route.ts) replays backlog from SQLite and streams live events.
- Live MJPEG (src/app/api/live/route.ts): streams periodic page screenshots when a live page is registered.
Persistence (src/server/persistence/sqlite.ts): minimal schema creation; stores sessions, commands, events, executions, and advisory selector memory.
UI (Next.js App Router): session controls, command panel, plan card, event log, screenshots, timeline, and capture settings.
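
To make the Parser → Planner hand-off concrete, here is a minimal sketch for the example command from Basic Workflow. The regex-based parseUtterance, the Command fields, the logical targets (search-box, price-max), and the SITE_URLS table are illustrative assumptions; the real parser and planner cover more intents and use their own selectors.

```typescript
type StepType = "NAVIGATE" | "WAIT_FOR" | "FILL" | "PRESS" | "APPLY_FILTER" | "SORT" | "CLICK";

interface Step { type: StepType; target?: string; value?: string }

// Hypothetical Command shape with a site, query, filter, and sort entity.
interface Command {
  intent: "search" | "unknown";
  site?: string;
  query?: string;
  maxPrice?: number;
  sort?: "price_asc" | "price_desc";
  confidence: number; // 0..1, hypothetical scoring
}

// Heuristic parse of "search <site> for <query> [under <n>] [and sort by price low to high|high to low]".
function parseUtterance(utterance: string): Command {
  const text = utterance.trim().toLowerCase();
  const m = text.match(
    /^search\s+(\S+)\s+for\s+(.+?)(?:\s+under\s+(\d+))?(?:\s+and\s+sort\s+by\s+price\s+(low to high|high to low))?$/,
  );
  if (!m) return { intent: "unknown", confidence: 0 };
  const [, site, query, max, order] = m;
  return {
    intent: "search",
    site,
    query,
    maxPrice: max ? Number(max) : undefined,
    sort: order === "low to high" ? "price_asc" : order === "high to low" ? "price_desc" : undefined,
    confidence: 0.9,
  };
}

// Hypothetical site table; unknown sites produce an empty plan in this sketch.
const SITE_URLS: Record<string, string> = { amazon: "https://www.amazon.com" };

function planCommand(cmd: Command): Step[] {
  if (cmd.intent !== "search" || !cmd.site || !cmd.query) return [];
  const url = SITE_URLS[cmd.site];
  if (!url) return [];
  const steps: Step[] = [
    { type: "NAVIGATE", target: url },
    { type: "WAIT_FOR", target: "search-box" },
    { type: "FILL", target: "search-box", value: cmd.query },
    { type: "PRESS", value: "Enter" },
  ];
  if (cmd.maxPrice !== undefined) steps.push({ type: "APPLY_FILTER", target: "price-max", value: String(cmd.maxPrice) });
  if (cmd.sort) steps.push({ type: "SORT", value: cmd.sort });
  return steps;
}
```

For the example utterance, this sketch yields the plan NAVIGATE, WAIT_FOR, FILL, PRESS, APPLY_FILTER, SORT.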

API Endpoints

POST /api/sessions → { sessionId }
POST /api/commands body { sessionId, utterance } → { command, plan }
POST /api/confirmations body { sessionId, commandId, confirmed, passphrase? } → { status, plan? }
GET /api/events?sessionId=...&live=1 → text/event-stream (SSE)
POST /api/cancel body { sessionId } → { status: 'cancelling' }
GET /api/viewer-url?sessionId=... → { viewerUrl: string | null }
GET /api/live?sessionId=... → multipart/x-mixed-replace stream (JPEG frames)
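
In a browser, EventSource handles the text/event-stream framing of GET /api/events. For non-browser clients that read the stream with fetch, a small parser like the following sketch is enough. It follows the SSE framing rules (events separated by a blank line, data: lines joined with newlines, lines starting with ":" ignored as comments) but assumes LF line endings; whether the server puts STATE/STEP/SCREENSHOT/ERROR in the event: field or inside the JSON data is also an assumption.

```typescript
interface SseEvent {
  event: string;
  data: string;
  id?: string;
}

// Parses complete events from an accumulated text/event-stream buffer.
// Returns the parsed events plus any trailing partial event, which the caller
// should prepend to the next chunk. Assumes "\n" line endings.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? ""; // last block may be incomplete
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message"; // SSE default event type
    let id: string | undefined;
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    // Per the SSE rules, a block without data lines dispatches nothing.
    if (data.length > 0) events.push({ event, data: data.join("\n"), id });
  }
  return { events, rest };
}
```

Only the first colon splits field from value, so JSON payloads containing colons pass through intact.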

Testing

Run all tests:

npm test

Watch mode:

npm run test:watch

Some tests exercise API routes and SSE under JSDOM; others test runner logic in isolation. Playwright end-to-end tests would require installing browsers and wiring a direct runner path.

Notes and Tips

Browserbase
- Ensure BROWSERBASE_ENABLED=true and set BROWSERBASE_API_KEY (or provide BROWSERBASE_WS_ENDPOINT).
- The route persists the session’s viewerUrl so the UI can open “Live Browser”.
Playwright (local)
- The default /api/commands route currently prefers Browserbase. If you want a local browser, you can adapt the route to construct a PlaywrightRunner directly and set PLAYWRIGHT_ENABLED=true.
- Install browsers: npx playwright install --with-deps.
SQLite
- File is controlled by SQLITE_FILE (default ./data.sqlite). Schema is created on startup if missing.
LLM Fallback
- Enable with LLM_FALLBACK_ENABLED=true and set OPENAI_API_KEY. Used when parser confidence is low or intent is unknown.
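
The fallback rule above can be sketched as follows. The confidence threshold, the ParseResult shape, the injected llmParse callback (which in practice would call OpenAI with LLM_MODEL), and keeping the heuristic result when the LLM call fails are all assumptions, not documented behavior.

```typescript
interface ParseResult {
  intent: string;     // "unknown" when the heuristics could not classify the utterance
  confidence: number; // 0..1
}

interface FallbackConfig {
  llmFallbackEnabled: boolean;
  openaiApiKey?: string;
  threshold: number; // hypothetical cut-off, e.g. 0.6
}

// Use the LLM only when it is enabled and configured, and the heuristic
// result is unknown or below the confidence threshold.
function shouldUseLlmFallback(result: ParseResult, cfg: FallbackConfig): boolean {
  if (!cfg.llmFallbackEnabled || !cfg.openaiApiKey) return false;
  return result.intent === "unknown" || result.confidence < cfg.threshold;
}

async function parseWithFallback(
  utterance: string,
  heuristic: (u: string) => ParseResult,
  llmParse: (u: string) => Promise<ParseResult>, // hypothetical OpenAI-backed parser
  cfg: FallbackConfig,
): Promise<ParseResult & { source: "heuristic" | "llm" }> {
  const first = heuristic(utterance);
  if (!shouldUseLlmFallback(first, cfg)) return { ...first, source: "heuristic" as const };
  try {
    return { ...(await llmParse(utterance)), source: "llm" as const };
  } catch {
    // Assumption: on LLM failure, fall back to the heuristic result.
    return { ...first, source: "heuristic" as const };
  }
}
```

Injecting llmParse keeps the decision logic testable without network access.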

Folder Structure

src/app/ Next.js app (pages, components, APIs)
src/lib/ parser, planner, types, events
src/runner/ runner interface + Playwright integration and execution engine
src/server/ services, persistence (SQLite), live streaming
src/memory/ advisory selector memory (mem0)

Troubleshooting

No live browser view
- Ensure Browserbase is enabled and configured, and check that /api/viewer-url returns a URL.
No screenshots
- Set capture policy in the UI (Capture Settings), or set SCREENSHOT_EVERY; screenshots on error are enabled by default.
Database errors
- Verify write permissions to SQLITE_FILE path and that only one process is writing.
Playwright errors
- Install browsers (npx playwright install) and verify your runner path uses a local Playwright page/context.
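
The capture rules from the "No screenshots" entry reduce to a small predicate. In this sketch, CapturePolicy and shouldCapture are hypothetical names, and "every N steps" is interpreted as capturing after each Nth completed step.

```typescript
interface CapturePolicy {
  captureOnError: boolean; // enabled by default
  everyNSteps: number;     // 0 disables periodic capture (cf. SCREENSHOT_EVERY)
}

const DEFAULT_POLICY: CapturePolicy = { captureOnError: true, everyNSteps: 0 };

// stepIndex is 0-based, so with everyNSteps = 3 captures happen at indices 2, 5, 8, ...
function shouldCapture(stepIndex: number, failed: boolean, policy: CapturePolicy = DEFAULT_POLICY): boolean {
  if (failed && policy.captureOnError) return true;
  return policy.everyNSteps > 0 && (stepIndex + 1) % policy.everyNSteps === 0;
}
```

With the default policy, only failing steps produce screenshots, which matches the symptom described above when no capture settings are changed.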

Design Docs

Design summary.md, Detailed-Design.md, Project-Requirements.md
Playwright-Runner-Design.md, Browserbase-Integration.md
Intents.md for NLP intents and examples

Made for rapid prototyping of voice-driven web automation. Contributions and suggestions welcome.
