[fix]: refresh agent cache on action failure #1472

seanmcguire12 · 2025-12-24T00:50:01Z

why

when the agent is replaying cached actions and an xpath (from the cache) does not resolve, the agent will heal and reprocess the page
however, the cached entry is not getting updated with the "healed" selector

what changed

added logic so agent cache replays detect when deterministic actions self-heal and immediately rewrite the cache entry with the updated selectors/timestamps, preventing perpetual failures on corrupted entries

test plan

added agent-cache-self-heal.spec.ts end-to-end test that runs an agent with a real cacheDir, corrupts a cached selector, reruns the workflow, and asserts the cache file now contains the healed XPath
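
The corruption step of the test plan could look roughly like the sketch below. The cache file shape (`CacheFile`, `CachedStep`, `CachedAction`) and both helper names are assumptions made for illustration. They are not taken from agent-cache-self-heal.spec.ts or from the real AgentCache format.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical cache file shape; the real AgentCache format may differ.
interface CachedAction {
  selector: string;
}
interface CachedStep {
  type: string;
  actions?: CachedAction[];
}
interface CacheFile {
  timestamp: string;
  steps: CachedStep[];
}

// Overwrite the first cached selector and return the value it replaced.
function corruptFirstSelector(
  filePath: string,
  badSelector: string,
): string | undefined {
  const cache = JSON.parse(fs.readFileSync(filePath, "utf8")) as CacheFile;
  const action = cache.steps.find((s) => s.actions?.length)?.actions?.[0];
  if (!action) return undefined;
  const original = action.selector;
  action.selector = badSelector;
  fs.writeFileSync(filePath, JSON.stringify(cache, null, 2));
  return original;
}

// Read back the first cached selector, e.g. to assert it was healed after a rerun.
function readFirstSelector(filePath: string): string | undefined {
  const cache = JSON.parse(fs.readFileSync(filePath, "utf8")) as CacheFile;
  return cache.steps.find((s) => s.actions?.length)?.actions?.[0]?.selector;
}
```

In a test built this way, the assertion after the second run would be that `readFirstSelector` no longer returns the corrupted value.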

Summary by cubic

Fixes agent cache not refreshing after self-heal, so healed selectors are saved and future replays don’t keep failing. Addresses Linear STG-1106.

Bug Fixes
- Detect updated actions during replay (act/fillForm) and update the cache entry when actions change.
- Persist refreshed steps and timestamp via refreshAgentCacheEntry.
- Add e2e test and CI job to verify a corrupted selector is healed and the cache file is rewritten.
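
For the "persist refreshed steps and timestamp" item, here is a minimal sketch of what a refresh might do. It uses an in-memory `Map` as a stand-in for the cache directory. The entry shape and the name `refreshEntrySketch` are hypothetical, and the actual `refreshAgentCacheEntry` signature is not shown in this thread.

```typescript
// Hypothetical shapes, for illustration only.
interface EntryStep {
  type: string;
  selector?: string;
}
interface EntrySketch {
  instruction: string;
  steps: EntryStep[];
  timestamp: string;
}

// Replace the steps and bump the timestamp, leaving every other field intact.
function refreshEntrySketch(
  store: Map<string, EntrySketch>,
  key: string,
  entry: EntrySketch,
  updatedSteps: EntryStep[],
  now: Date = new Date(),
): EntrySketch {
  const refreshed: EntrySketch = {
    ...entry,
    steps: updatedSteps,
    timestamp: now.toISOString(),
  };
  store.set(key, refreshed); // stand-in for writing the cache file to disk
  return refreshed;
}
```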

Written for commit 9634d84. Summary will update automatically on new commits.

changeset-bot · 2025-12-24T00:50:05Z

🦋 Changeset detected

Latest commit: 9634d84

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server | Patch |

seanmcguire12 · 2025-12-24T00:53:00Z

@greptileai

greptile-apps · 2025-12-24T00:59:32Z

Greptile Summary

Fixed agent cache not refreshing after self-heal, preventing perpetual replay failures on corrupted selectors.

- Core Fix: Modified replayAgentCacheEntry to track when deterministic actions change during replay (lines 506-529)
- Change Detection: Implemented haveActionsChanged to compare original vs healed actions across selector, description, method, and arguments (lines 720-751)
- Cache Refresh: Added refreshAgentCacheEntry to persist healed selectors and update timestamps (lines 753-787)
- Return Type Updates: Changed replay methods to return updated steps instead of void, enabling change tracking
- Test Coverage: Comprehensive e2e test validates the full flow by corrupting a selector and verifying it gets healed and persisted
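
The change-detection item can be illustrated with a sketch along these lines. It compares the four fields named above and treats `arguments` as an array of strings. The `DiffAction` shape and the function names are assumptions, and the real `haveActionsChanged` in AgentCache.ts may be structured differently.

```typescript
// Hypothetical action shape; field names follow the summary above.
interface DiffAction {
  selector: string;
  description: string;
  method?: string;
  arguments?: string[];
}

// Element-by-element comparison, valid as long as arguments stay plain strings.
function sameArguments(a: string[] = [], b: string[] = []): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// True when any action differs in selector, description, method, or arguments.
function haveActionsChangedSketch(
  original: DiffAction[],
  updated: DiffAction[],
): boolean {
  if (original.length !== updated.length) return true;
  return original.some((orig, i) => {
    const next = updated[i];
    return (
      orig.selector !== next.selector ||
      orig.description !== next.description ||
      orig.method !== next.method ||
      !sameArguments(orig.arguments, next.arguments)
    );
  });
}
```

If `arguments` ever held objects, the strict `===` comparison in `sameArguments` would compare references and report a change on every replay, so a deep comparison would be needed instead.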

The implementation correctly integrates with the existing self-heal mechanism in ActHandler.takeDeterministicAction, which already handles selector failures by re-processing the page with fresh snapshots. This PR bridges the gap by detecting when that healing occurs during cache replay and immediately updating the cache file.

Confidence Score: 4/5

- This PR is safe to merge with minor considerations
- The implementation is well-designed and thoroughly tested with an e2e test that validates the core functionality. The change detection logic is comprehensive, comparing all relevant action properties. The code integrates cleanly with existing self-heal mechanisms. Score is 4/5 due to the inherent complexity of cache invalidation logic and the fact that shallow equality checks on arguments assume they remain strings (which they do per the type definition, but this could be a maintenance concern if the type evolves).
- No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/tests/agent-cache-self-heal.spec.ts | Well-structured e2e test that validates cache self-healing by corrupting a selector and verifying it gets updated after replay |
| packages/core/lib/v3/cache/AgentCache.ts | Core fix that detects when deterministic actions self-heal during replay and refreshes cache entries with updated selectors. Implementation is thorough with proper change detection and logging |

Sequence Diagram

sequenceDiagram
    participant Agent
    participant AgentCache
    participant ActHandler
    participant Page

    Note over Agent,AgentCache: Cache Replay Flow with Self-Heal

    Agent->>AgentCache: tryReplay(context)
    AgentCache->>AgentCache: Load cached entry from disk
    
    loop For each cached step
        AgentCache->>AgentCache: executeAgentReplayStep(step)
        
        alt Step type is "act" or "fillForm"
            AgentCache->>ActHandler: takeDeterministicAction(cachedAction)
            ActHandler->>Page: Try to execute with cached selector
            
            alt Cached selector works
                Page-->>ActHandler: Success
                ActHandler-->>AgentCache: result.success=true, actions=[original]
                Note over AgentCache: No change detected
            else Cached selector fails (e.g., xpath=/yeee)
                Page-->>ActHandler: Error (selector not found)
                
                Note over ActHandler: Self-heal mode activated
                ActHandler->>Page: captureHybridSnapshot()
                Page-->>ActHandler: Fresh DOM snapshot
                ActHandler->>ActHandler: getActionFromLLM(fresh snapshot)
                ActHandler->>Page: Execute with healed selector
                Page-->>ActHandler: Success
                ActHandler-->>AgentCache: result.success=true, actions=[healed]
                Note over AgentCache: Change detected!
            end
            
            AgentCache->>AgentCache: haveActionsChanged(original, updated)
            alt Actions changed
                AgentCache->>AgentCache: Mark stepsChanged=true
                AgentCache->>AgentCache: Store updated step
            end
        else Other step types
            AgentCache->>Page: Execute step (goto, scroll, wait, etc.)
            Note over AgentCache: No action tracking needed
        end
    end
    
    alt stepsChanged == true
        AgentCache->>AgentCache: refreshAgentCacheEntry(updatedSteps)
        AgentCache->>AgentCache: Write updated cache to disk
        Note over AgentCache: Cache now contains healed selectors
    end
    
    AgentCache-->>Agent: Return result with cacheHit=true
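
The flow in the diagram can be approximated with the synchronous sketch below. The `resolves` and `heal` callbacks are stand-ins for the page lookup and for the snapshot plus LLM step, and all type and function names are hypothetical. The real replay is asynchronous and goes through ActHandler.

```typescript
// Hypothetical shapes, for illustration only.
type ReplayAction = { selector: string; description: string };
type ReplayStep = { type: string; actions?: ReplayAction[] };

// Try the cached selector first; fall back to healing when it does not resolve.
function takeActionSketch(
  action: ReplayAction,
  resolves: (selector: string) => boolean,
  heal: (action: ReplayAction) => ReplayAction,
): ReplayAction {
  return resolves(action.selector) ? action : heal(action);
}

// Replay every step and report whether any act/fillForm step was healed.
function replaySketch(
  steps: ReplayStep[],
  resolves: (selector: string) => boolean,
  heal: (action: ReplayAction) => ReplayAction,
): { steps: ReplayStep[]; stepsChanged: boolean } {
  let stepsChanged = false;
  const updated = steps.map((step) => {
    const cached = step.actions;
    const tracked = step.type === "act" || step.type === "fillForm";
    if (!tracked || !cached) return step; // goto, scroll, wait, etc.
    const actions = cached.map((a) => takeActionSketch(a, resolves, heal));
    const changed = actions.some((a, i) => a.selector !== cached[i].selector);
    if (!changed) return step;
    stepsChanged = true;
    return { ...step, actions };
  });
  return { steps: updated, stepsChanged };
}
```

A caller would write the returned steps back to the cache only when `stepsChanged` is true, which matches the final branch of the diagram.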

cubic-dev-ai

No issues found across 4 files

miguelg719 · 2025-12-26T21:04:07Z

packages/core/lib/v3/cache/AgentCache.ts

-        await this.executeAgentReplayStep(step, ctx, handler);
+        const replayedStep =
+          (await this.executeAgentReplayStep(step, ctx, handler)) ?? step;
+        stepsChanged ||= replayedStep !== step;


first time I see this shorthand
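
For context on the shorthand: `||=` is the logical OR assignment operator (ES2021, supported since TypeScript 4.0). `a ||= b` assigns `b` to `a` only when `a` is falsy, so a flag stays `true` once it has been set. The sketch below mirrors the two lines in the diff using a hypothetical `replayStep` stand-in in place of `executeAgentReplayStep`.

```typescript
// Hypothetical step shape, for illustration only.
type FlagStep = { type: string; selector?: string };

// Stand-in for executeAgentReplayStep: returns a new object when the step
// was healed, or undefined when there is nothing to report.
function replayStep(step: FlagStep): FlagStep | undefined {
  if (step.selector === "xpath=/yeee") {
    return { ...step, selector: "xpath=/html/body/button" };
  }
  return undefined;
}

function anyStepChanged(steps: FlagStep[]): boolean {
  let stepsChanged = false;
  for (const step of steps) {
    // `??` falls back to the original step when the replay returns undefined.
    const replayedStep = replayStep(step) ?? step;
    // Same as: if (!stepsChanged) stepsChanged = replayedStep !== step;
    stepsChanged ||= replayedStep !== step;
  }
  return stepsChanged;
}
```

Because the comparison is by reference, this only works if the replay returns the same object for an unchanged step (or nothing at all) and a fresh object for a healed one.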

@tkattkat

This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#1461](#1461) [`0f3991e`](0f3991e) Thanks [@tkattkat](https://github.com/tkattkat)! - Move hybrid mode out of experimental
- [#1433](#1433) [`e0e22e0`](e0e22e0) Thanks [@tkattkat](https://github.com/tkattkat)! - Put hybrid mode behind experimental
- [#1456](#1456) [`f261051`](f261051) Thanks [@shrey150](https://github.com/shrey150)! - Invoke page.hover for agent move action
- [#1473](#1473) [`e021674`](e021674) Thanks [@shrey150](https://github.com/shrey150)! - Add safety confirmation support for OpenAI + Google CUA
- [#1399](#1399) [`6a5496f`](6a5496f) Thanks [@tkattkat](https://github.com/tkattkat)! - Ensure cua agent is killed when stagehand.close is called
- [#1436](#1436) [`fea1700`](fea1700) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix auto-load key for act/extract/observe parametrized models on api
- [#1439](#1439) [`5b288d9`](5b288d9) Thanks [@tkattkat](https://github.com/tkattkat)! - Remove base64 from agent actions array ( still present in messages object )
- [#1408](#1408) [`e822f5a`](e822f5a) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - allow for act() cache hit when variable values change
- [#1472](#1472) [`638efc7`](638efc7) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: agent cache not refreshed on action failure
- [#1424](#1424) [`a890f16`](a890f16) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: "Error: -32000 Failed to convert response to JSON: CBOR: stack limit exceeded"
- [#1418](#1418) [`934f492`](934f492) Thanks [@miguelg719](https://github.com/miguelg719)! - Cleanup handlers and bus listeners on close
- [#1430](#1430) [`bd2db92`](bd2db92) Thanks [@shrey150](https://github.com/shrey150)! - Fix CUA model coordinate translation
- [#1465](#1465) [`51e0170`](51e0170) Thanks [@miguelg719](https://github.com/miguelg719)! - Add media resolution high provider option to gemini 3 hybrid agent
- [#1431](#1431) [`05f5580`](05f5580) Thanks [@tkattkat](https://github.com/tkattkat)! - Update the cache handling for agent
- [#1432](#1432) [`f56a9c2`](f56a9c2) Thanks [@tkattkat](https://github.com/tkattkat)! - Deprecate cua: true in favor of mode: "cua"
- [#1406](#1406) [`b40ae11`](b40ae11) Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for hovering with coordinates ( page.hover )
- [#1407](#1407) [`0d2b398`](0d2b398) Thanks [@tkattkat](https://github.com/tkattkat)! - Clean up page methods
- [#1412](#1412) [`cd01f29`](cd01f29) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: load GOOGLE_API_KEY from .env
- [#1462](#1462) [`a734fca`](a734fca) Thanks [@shrey150](https://github.com/shrey150)! - fix: correctly pass userDataDir to chrome launcher
- [#1466](#1466) [`b342acf`](b342acf) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - move playwright to optional dependencies
- [#1440](#1440) [`2987cd1`](2987cd1) Thanks [@tkattkat](https://github.com/tkattkat)! - [Feature] support excluding tools from agent
- [#1455](#1455) [`dfab1d5`](dfab1d5) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - update aisdk client to better enforce structured output with deepseek models
- [#1428](#1428) [`4d71162`](4d71162) Thanks [@tkattkat](https://github.com/tkattkat)! - Add "hybrid" mode to stagehand agent

## @browserbasehq/[email protected]

### Minor Changes

- [#1459](#1459) [`abb3469`](abb3469) Thanks [@monadoid](https://github.com/monadoid)! - Added building of binaries
- [#1457](#1457) [`5fc1281`](5fc1281) Thanks [@monadoid](https://github.com/monadoid)! - First changeset for stagehand-server
- [#1469](#1469) [`d634d45`](d634d45) Thanks [@monadoid](https://github.com/monadoid)! - Bump to test binary builds

### Patch Changes

- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- [#1373](#1373) [`cadd192`](cadd192) Thanks [@tkattkat](https://github.com/tkattkat)! - Update screenshot collector in agent evals cli
- Updated dependencies \[[`0f3991e`](0f3991e), [`e0e22e0`](e0e22e0), [`f261051`](f261051), [`e021674`](e021674), [`6a5496f`](6a5496f), [`fea1700`](fea1700), [`5b288d9`](5b288d9), [`e822f5a`](e822f5a), [`638efc7`](638efc7), [`a890f16`](a890f16), [`934f492`](934f492), [`bd2db92`](bd2db92), [`51e0170`](51e0170), [`05f5580`](05f5580), [`f56a9c2`](f56a9c2), [`b40ae11`](b40ae11), [`0d2b398`](0d2b398), [`cd01f29`](cd01f29), [`a734fca`](a734fca), [`b342acf`](b342acf), [`2987cd1`](2987cd1), [`dfab1d5`](dfab1d5), [`4d71162`](4d71162)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

refresh agent cache on action failure

8ef4d9b

update log level

8d01911

seanmcguire12 added 15 commits December 23, 2025 17:13

fix test

144f207

fix test

db3ab46

revert agent example change

babbf05

prettier

813a473

wait for networkidle in test

935681d

prettier

cdaf54f

add retry logic for file finding

ce378a7

debug logging for failing cache test

ef71095

add temp ci step to diagnose failing test

b2fda3b

fix typing

498b5e3

bump verbosity level

6e1b556

add logging for filepath

27dd43d

add openai key to turbo.json

916fad0

revert temp changes

f170317

revert debug logging in test

9634d84

seanmcguire12 marked this pull request as ready for review December 24, 2025 20:16

cubic-dev-ai bot reviewed Dec 24, 2025

miguelg719 reviewed Dec 26, 2025

miguelg719 approved these changes Dec 26, 2025

seanmcguire12 merged commit 638efc7 into main Dec 26, 2025
23 checks passed

github-actions bot mentioned this pull request Dec 26, 2025

Version Packages #1414

Merged

coderabbitai bot mentioned this pull request Dec 26, 2025

reverse merge TestsigmaInc/stagehand#12

Open

github-actions bot mentioned this pull request Dec 25, 2025

Version Packages CloudEngineHub/stagehand#1

Open
