@seanmcguire12 seanmcguire12 commented Dec 24, 2025

why

  • when the agent is replaying cached actions and an xpath (from the cache) does not resolve, the agent will heal and reprocess the page
  • however, the cached entry is not getting updated with the "healed" selector

what changed

  • added logic so agent cache replays detect when deterministic actions self-heal and immediately rewrite the cache entry with the updated selectors/timestamps, preventing perpetual failures on corrupted entries
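A minimal sketch of the replay-then-refresh flow described above, simplified to synchronous calls. The `CachedStep` and `CacheEntry` shapes, the `replayAndRefresh` helper, and the healed selector are hypothetical stand-ins for illustration, not the actual code in `AgentCache.ts`. Only the idea comes from this PR: replay each step, notice when a step comes back changed, and rewrite the entry with the new steps and a fresh timestamp.

```typescript
// Hypothetical shapes; the real cache types are not shown in this conversation.
interface CachedStep {
  type: string;
  selector?: string;
}

interface CacheEntry {
  steps: CachedStep[];
  timestamp: string;
}

// Returns a new step object when the step self-healed,
// or undefined when the cached step ran unchanged.
type StepExecutor = (step: CachedStep) => CachedStep | undefined;

function replayAndRefresh(
  entry: CacheEntry,
  execute: StepExecutor,
  now: () => string = () => new Date().toISOString(),
): { entry: CacheEntry; refreshed: boolean } {
  let stepsChanged = false;
  const updatedSteps: CachedStep[] = [];

  for (const step of entry.steps) {
    // Fall back to the cached step when the executor reports no change.
    const replayed = execute(step) ?? step;
    stepsChanged ||= replayed !== step;
    updatedSteps.push(replayed);
  }

  // Leave the entry untouched unless at least one step was healed.
  if (!stepsChanged) {
    return { entry, refreshed: false };
  }
  return { entry: { steps: updatedSteps, timestamp: now() }, refreshed: true };
}

// Usage: one corrupted selector gets "healed" by a fake executor.
const staleEntry: CacheEntry = {
  steps: [{ type: "goto" }, { type: "act", selector: "xpath=/yeee" }],
  timestamp: "2025-12-01T00:00:00.000Z",
};
const fakeHealer: StepExecutor = (step) =>
  step.selector === "xpath=/yeee"
    ? { ...step, selector: "xpath=/html/body/button" } // illustrative healed XPath
    : undefined;
const replayOutcome = replayAndRefresh(
  staleEntry,
  fakeHealer,
  () => "2025-12-24T00:00:00.000Z",
);
```

Identity comparison (`replayed !== step`) keeps the loop cheap; it relies on the executor returning a new object only when something actually changed.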

test plan

  • added agent-cache-self-heal.spec.ts end-to-end test that runs an agent with a real cacheDir, corrupts a cached selector, reruns the workflow, and asserts the cache file now contains the healed XPath
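The corruption step of such a test could be sketched roughly as follows. This assumes the cache file is JSON and that selectors live in string fields named `selector`; the `Json` type, both helpers, and the sample entry are hypothetical and are not taken from `agent-cache-self-heal.spec.ts`.

```typescript
// Assumption: a cached entry is plain JSON with selectors stored under "selector" keys.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns a copy of the value with every string "selector" field replaced by a bogus one.
function corruptSelectors(value: Json, bogus: string): Json {
  if (Array.isArray(value)) {
    return value.map((child) => corruptSelectors(child, bogus));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] =
        key === "selector" && typeof child === "string"
          ? bogus
          : corruptSelectors(child, bogus);
    }
    return out;
  }
  return value;
}

// Collects every string "selector" field so a test can assert on what the cache holds.
function collectSelectors(value: Json): string[] {
  if (Array.isArray(value)) {
    return value.flatMap((child) => collectSelectors(child));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      key === "selector" && typeof child === "string"
        ? [child]
        : collectSelectors(child),
    );
  }
  return [];
}

// Usage with an illustrative entry (layout and selector are made up).
const sampleCacheFile: Json = {
  steps: [
    { type: "act", actions: [{ selector: "xpath=/html/body/button", method: "click" }] },
  ],
};
const corruptedCacheFile = corruptSelectors(sampleCacheFile, "xpath=/yeee");
```

After rerunning the workflow against the corrupted file, a test in this style would reload the file and check that `collectSelectors` no longer returns the bogus value.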

Summary by cubic

Fixes agent cache not refreshing after self-heal, so healed selectors are saved and future replays don’t keep failing. Addresses Linear STG-1106.

  • Bug Fixes
    • Detect updated actions during replay (act/fillForm) and update the cache entry when actions change.
    • Persist refreshed steps and timestamp via refreshAgentCacheEntry.
    • Add e2e test and CI job to verify a corrupted selector is healed and the cache file is rewritten.

Written for commit 9634d84. Summary will update automatically on new commits.

changeset-bot bot commented Dec 24, 2025

🦋 Changeset detected

Latest commit: 9634d84

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |

@seanmcguire12

@greptileai

greptile-apps bot commented Dec 24, 2025

Greptile Summary

Fixed agent cache not refreshing after self-heal, preventing perpetual replay failures on corrupted selectors.

  • Core Fix: Modified replayAgentCacheEntry to track when deterministic actions change during replay (lines 506-529)
  • Change Detection: Implemented haveActionsChanged to compare original vs healed actions across selector, description, method, and arguments (lines 720-751)
  • Cache Refresh: Added refreshAgentCacheEntry to persist healed selectors and update timestamps (lines 753-787)
  • Return Type Updates: Changed replay methods to return updated steps instead of void, enabling change tracking
  • Test Coverage: Comprehensive e2e test validates the full flow by corrupting a selector and verifying it gets healed and persisted

The implementation correctly integrates with the existing self-heal mechanism in ActHandler.takeDeterministicAction, which already handles selector failures by re-processing the page with fresh snapshots. This PR bridges the gap by detecting when that healing occurs during cache replay and immediately updating the cache file.
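The change detection summarized above compares selector, description, method, and arguments. A rough sketch of that kind of comparison follows; the `ActionLike` shape and the `actionsDiffer` and `sameArguments` helpers are assumptions for illustration and are not the PR's `haveActionsChanged`.

```typescript
// Hypothetical action shape built from the four fields the summary lists.
interface ActionLike {
  selector: string;
  description: string;
  method?: string;
  arguments?: string[];
}

// Shallow, order-sensitive comparison; valid only while arguments are strings.
function sameArguments(a: string[] = [], b: string[] = []): boolean {
  return a.length === b.length && a.every((arg, i) => arg === b[i]);
}

// True when the replayed actions differ from the cached ones in any tracked field.
function actionsDiffer(original: ActionLike[], updated: ActionLike[]): boolean {
  if (original.length !== updated.length) return true;
  return original.some((orig, i) => {
    const next = updated[i];
    return (
      next === undefined ||
      orig.selector !== next.selector ||
      orig.description !== next.description ||
      orig.method !== next.method ||
      !sameArguments(orig.arguments, next.arguments)
    );
  });
}

// Usage: a healed selector counts as a change, an identical copy does not.
const cachedAction: ActionLike = {
  selector: "xpath=/yeee",
  description: "click the submit button",
  method: "click",
  arguments: [],
};
const healedAction: ActionLike = {
  ...cachedAction,
  selector: "xpath=/html/body/button", // illustrative healed XPath
};
const copyDiffers = actionsDiffer([cachedAction], [{ ...cachedAction }]);
const healedDiffers = actionsDiffer([cachedAction], [healedAction]);
```

Comparing field by field, rather than by object identity, avoids rewriting the cache when the handler returns a fresh but equivalent action object.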

Confidence Score: 4/5

  • This PR is safe to merge with minor considerations
  • The implementation is well designed and covered by an e2e test that validates the core functionality. The change detection logic compares all relevant action properties, and the code integrates cleanly with the existing self-heal mechanism. The score is 4/5 because cache invalidation logic is inherently complex, and because the shallow equality checks on arguments assume they remain strings. That holds under the current type definition, but it could become a maintenance concern if the type evolves.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/tests/agent-cache-self-heal.spec.ts | Well-structured e2e test that validates cache self-healing by corrupting a selector and verifying it gets updated after replay |
| packages/core/lib/v3/cache/AgentCache.ts | Core fix that detects when deterministic actions self-heal during replay and refreshes cache entries with updated selectors. Implementation is thorough with proper change detection and logging |

Sequence Diagram

sequenceDiagram
    participant Agent
    participant AgentCache
    participant ActHandler
    participant Page

    Note over Agent,AgentCache: Cache Replay Flow with Self-Heal

    Agent->>AgentCache: tryReplay(context)
    AgentCache->>AgentCache: Load cached entry from disk
    
    loop For each cached step
        AgentCache->>AgentCache: executeAgentReplayStep(step)
        
        alt Step type is "act" or "fillForm"
            AgentCache->>ActHandler: takeDeterministicAction(cachedAction)
            ActHandler->>Page: Try to execute with cached selector
            
            alt Cached selector works
                Page-->>ActHandler: Success
                ActHandler-->>AgentCache: result.success=true, actions=[original]
                Note over AgentCache: No change detected
            else Cached selector fails (e.g., xpath=/yeee)
                Page-->>ActHandler: Error (selector not found)
                
                Note over ActHandler: Self-heal mode activated
                ActHandler->>Page: captureHybridSnapshot()
                Page-->>ActHandler: Fresh DOM snapshot
                ActHandler->>ActHandler: getActionFromLLM(fresh snapshot)
                ActHandler->>Page: Execute with healed selector
                Page-->>ActHandler: Success
                ActHandler-->>AgentCache: result.success=true, actions=[healed]
                Note over AgentCache: Change detected!
            end
            
            AgentCache->>AgentCache: haveActionsChanged(original, updated)
            alt Actions changed
                AgentCache->>AgentCache: Mark stepsChanged=true
                AgentCache->>AgentCache: Store updated step
            end
        else Other step types
            AgentCache->>Page: Execute step (goto, scroll, wait, etc.)
            Note over AgentCache: No action tracking needed
        end
    end
    
    alt stepsChanged == true
        AgentCache->>AgentCache: refreshAgentCacheEntry(updatedSteps)
        AgentCache->>AgentCache: Write updated cache to disk
        Note over AgentCache: Cache now contains healed selectors
    end
    
    AgentCache-->>Agent: Return result with cacheHit=true

@seanmcguire12 seanmcguire12 marked this pull request as ready for review December 24, 2025 20:16
@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 4 files

-      await this.executeAgentReplayStep(step, ctx, handler);
+      const replayedStep =
+        (await this.executeAgentReplayStep(step, ctx, handler)) ?? step;
+      stepsChanged ||= replayedStep !== step;
Collaborator

first time I see this shorthand
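
For reference, the shorthand is the logical OR assignment operator (`||=`), part of ECMAScript 2021 and supported since TypeScript 4.0. `a ||= b` behaves like `a || (a = b)`: the right-hand side is evaluated and assigned only while `a` is falsy. A small illustration, with the `accumulate` helper made up for this example:

```typescript
// Counts how many times the right-hand side of `||=` is actually evaluated.
function accumulate(values: boolean[]): { changed: boolean; evaluated: number } {
  let changed = false;
  let evaluated = 0;
  const check = (value: boolean): boolean => {
    evaluated += 1;
    return value;
  };
  for (const value of values) {
    // Once `changed` is true, check() is no longer called (short-circuit).
    changed ||= check(value);
  }
  return { changed, evaluated };
}

const accumulated = accumulate([false, true, false]);
// accumulated.changed is true; check() ran for the first two values only.
```

In the replay loop this makes `stepsChanged` a sticky flag: it flips to true on the first healed step and stays true for the rest of the loop.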

@seanmcguire12 seanmcguire12 merged commit 638efc7 into main Dec 26, 2025
23 checks passed
@github-actions github-actions bot mentioned this pull request Dec 26, 2025
miguelg719 pushed a commit that referenced this pull request Dec 27, 2025
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461)
[`0f3991e`](0f3991e)
Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out
of experimental

- [#1433](#1433)
[`e0e22e0`](e0e22e0)
Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode
behind experimental

- [#1456](#1456)
[`f261051`](f261051)
Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for
agent move action

- [#1473](#1473)
[`e021674`](e021674)
Thanks [@shrey150](https://github.com/shrey150)! - Add safety
confirmation support for OpenAI + Google CUA

- [#1399](#1399)
[`6a5496f`](6a5496f)
Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is
killed when stagehand.close is called

- [#1436](#1436)
[`fea1700`](fea1700)
Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key
for act/extract/observe parametrized models on api

- [#1439](#1439)
[`5b288d9`](5b288d9)
Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from
agent actions array ( still present in messages object )

- [#1408](#1408)
[`e822f5a`](e822f5a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for
act() cache hit when variable values change

- [#1472](#1472)
[`638efc7`](638efc7)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent
cache not refreshed on action failure

- [#1424](#1424)
[`a890f16`](a890f16)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix:
"Error: -32000 Failed to convert response to JSON: CBOR: stack limit
exceeded"

- [#1418](#1418)
[`934f492`](934f492)
Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers
and bus listeners on close

- [#1430](#1430)
[`bd2db92`](bd2db92)
Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model
coordinate translation

- [#1465](#1465)
[`51e0170`](51e0170)
Thanks [@miguelg719](https://github.com/miguelg719)! - Add media
resolution high provider option to gemini 3 hybrid agent

- [#1431](#1431)
[`05f5580`](05f5580)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache
handling for agent

- [#1432](#1432)
[`f56a9c2`](f56a9c2)
Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true
in favor of mode: "cua"

- [#1406](#1406)
[`b40ae11`](b40ae11)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
hovering with coordinates ( page.hover )

- [#1407](#1407)
[`0d2b398`](0d2b398)
Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods

- [#1412](#1412)
[`cd01f29`](cd01f29)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load
GOOGLE_API_KEY from .env

- [#1462](#1462)
[`a734fca`](a734fca)
Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass
userDataDir to chrome launcher

- [#1466](#1466)
[`b342acf`](b342acf)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move
playwright to optional dependencies

- [#1440](#1440)
[`2987cd1`](2987cd1)
Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support
excluding tools from agent

- [#1455](#1455)
[`dfab1d5`](dfab1d5)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update
aisdk client to better enforce structured output with deepseek models

- [#1428](#1428)
[`4d71162`](4d71162)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to
stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459)
[`abb3469`](abb3469)
Thanks [@monadoid](https://github.com/monadoid)! - Added building of
binaries

- [#1457](#1457)
[`5fc1281`](5fc1281)
Thanks [@monadoid](https://github.com/monadoid)! - First changeset for
stagehand-server

- [#1469](#1469)
[`d634d45`](d634d45)
Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary
builds

### Patch Changes

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373)
[`cadd192`](cadd192)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot
collector in agent evals cli

- Updated dependencies
\[[`0f3991e`](0f3991e),
[`e0e22e0`](e0e22e0),
[`f261051`](f261051),
[`e021674`](e021674),
[`6a5496f`](6a5496f),
[`fea1700`](fea1700),
[`5b288d9`](5b288d9),
[`e822f5a`](e822f5a),
[`638efc7`](638efc7),
[`a890f16`](a890f16),
[`934f492`](934f492),
[`bd2db92`](bd2db92),
[`51e0170`](51e0170),
[`05f5580`](05f5580),
[`f56a9c2`](f56a9c2),
[`b40ae11`](b40ae11),
[`0d2b398`](0d2b398),
[`cd01f29`](cd01f29),
[`a734fca`](a734fca),
[`b342acf`](b342acf),
[`2987cd1`](2987cd1),
[`dfab1d5`](dfab1d5),
[`4d71162`](4d71162)]:
    -   @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>