Skip to content

Conversation

@adamsachs
Copy link
Contributor

@adamsachs adamsachs commented Feb 9, 2026

Related to ENG-2492

Description Of Changes

Increases the grace period (from default 1 second to 5 minutes) automatically on any jobs scheduled for daily execution that use APScheduler.

See ticket comments for more context motivating the change: we saw our 'GVL cache reload' task, which is scheduled to run once daily, miss an execution on a production deployment. the likely root cause was a scheduler 'misfire' that lasted for longer than the default 1s grace period.

as a mitigation, we're overriding the scheduler to detect when scheduling a daily (or less frequent) job, and providing a significantly longer grace period. there should be no negative impact here: these jobs are scheduled to run once daily, so running within 5 minutes after the scheduled time should be totally acceptable.

Code Changes

  • Mixin/override our core APSchedulers so that they set misfire_grace_time=300 for any jobs scheduled for daily execution that don't have a misfire_grace_time otherwise set

Steps to Confirm

  1. a bit hard to test this manually, but unit tests should cover us

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Contributor

vercel bot commented Feb 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Feb 10, 2026 8:49pm
fides-privacy-center Ignored Ignored Feb 10, 2026 8:49pm

Request Review

@adamsachs adamsachs changed the title allow 5 minute grace period for daily scheduled jobs ENG-2492: allow 5 minute grace period for daily scheduled jobs Feb 9, 2026
@adamsachs adamsachs marked this pull request as ready for review February 10, 2026 15:32
@adamsachs adamsachs requested a review from a team as a code owner February 10, 2026 15:32
@adamsachs adamsachs requested review from thabofletcher and removed request for a team February 10, 2026 15:32
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

Successfully addresses previous review feedback by implementing support for both string-based cron triggers and CronTrigger instances when applying the 5-minute grace period.

Key Changes:

  • Added _DailyJobGracePeriodMixin that intercepts add_job calls and automatically applies a 300-second grace period to daily (or less frequent) cron jobs
  • Enhanced _is_daily_or_less_frequent_cron() to handle both trigger="cron" string format and CronTrigger object instances by extracting and checking the hour field
  • Used regex pattern to detect specific hour values (e.g., 3, "2") vs wildcards/ranges/lists (e.g., *, */2, 2,14)
  • Respects explicit misfire_grace_time values when provided by callers
  • Applied to both GracefulBackgroundScheduler and GracefulAsyncIOScheduler
  • Comprehensive test coverage including both trigger formats, edge cases, and override behavior

Previous feedback addressed:

  • Both previous thread comments have been fully resolved with CronTrigger support in implementation and tests

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation is well-designed with comprehensive test coverage. All previous review feedback has been addressed by adding CronTrigger instance support. The change is backwards-compatible (respects explicit misfire_grace_time values), solves a real production issue (missed daily job executions), and includes thorough testing of both string and CronTrigger-based job scheduling. The logic is clear, well-documented, and appropriately scoped to daily jobs only.
  • No files require special attention

Important Files Changed

Filename Overview
src/fides/api/tasks/scheduled/scheduler.py Implemented mixin to auto-apply 5-minute grace period to daily cron jobs, supports both string triggers and CronTrigger instances
tests/ops/tasks/test_scheduler_grace_period.py Comprehensive test coverage for grace period logic, including both string triggers and CronTrigger instances

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 10, 2026

Additional Comments (2)

src/fides/api/service/privacy_request/email_batch_service.py
Misfire handling may run jobs late

With misfire_grace_time=300 (src/fides/api/service/privacy_request/email_batch_service.py:160-168), a scheduler outage/lag under 5 minutes will now execute the missed run after restart. Since this job triggers batch email sends, that changes operational behavior (it will run even if the service was down at the scheduled time). If this is intentional, it should be called out explicitly and/or made configurable so operators can disable/adjust it per environment; otherwise it can produce unexpected sends after deployments/restarts.


src/fides/api/service/privacy_request/request_service.py
Hardcoded grace period constant

misfire_grace_time=300 is duplicated as a bare literal here (src/fides/api/service/privacy_request/request_service.py:214-227) and in email_batch_service.py. Since this value is meant to apply across “daily” jobs, it’s easy for these to drift. Consider hoisting to a single named constant (or config) used by all daily scheduled jobs so it can’t diverge unintentionally.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@adamsachs
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 38 to 45
if (
trigger == "cron"
and "misfire_grace_time"
not in kwargs # respect user-specified grace period
and _is_daily_or_less_frequent(kwargs)
):
kwargs["misfire_grace_time"] = DEFAULT_DAILY_JOBS_GRACE_PERIOD
return super().add_job(func, trigger=trigger, **kwargs) # type: ignore[misc]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cron detection is incorrect

_DailyJobGracePeriodMixin.add_job() only applies the grace period when trigger == "cron", but the codebase schedules cron jobs both via trigger="cron" and by passing a CronTrigger instance (e.g. trigger=cron_trigger in src/fides/api/service/privacy_request/email_batch_service.py:160-167). In those cases trigger is not the string "cron", so the 5-minute grace period will never be applied to those daily jobs (including the production job motivating this PR if it uses create_cron_trigger()). This should detect CronTrigger objects as well (and extract their fields) so daily jobs consistently get the intended misfire_grace_time.

Comment on lines 53 to 100
class TestDailyJobGracePeriodMixin:
@pytest.mark.parametrize(
"trigger,add_job_kwargs,expect_grace_time",
[
# Daily cron: should auto-inject grace period
(
"cron",
{"hour": 3, "minute": 0},
DEFAULT_DAILY_JOBS_GRACE_PERIOD,
),
(
"cron",
{"hour": "2", "minute": "0", "day": "*", "timezone": "US/Eastern"},
DEFAULT_DAILY_JOBS_GRACE_PERIOD,
),
# Hourly/sub-hourly cron: should NOT inject
("cron", {"hour": "*"}, None),
("cron", {"minute": "*/5"}, None),
("cron", {"hour": "*/2", "minute": 0}, None),
# Non-cron trigger: should NOT inject
("interval", {"seconds": 300}, None),
# Explicit override: should preserve caller's value
("cron", {"hour": 3, "misfire_grace_time": 60}, 60),
],
ids=[
"daily-cron-injects",
"daily-cron-with-extras-injects",
"hourly-cron-skipped",
"sub-hourly-cron-skipped",
"every-n-hours-cron-skipped",
"interval-trigger-skipped",
"explicit-override-preserved",
],
)
def test_background_scheduler_grace_period(
self, trigger, add_job_kwargs, expect_grace_time
):
sched = GracefulBackgroundScheduler()
with patch.object(BackgroundScheduler, "add_job") as mock_add_job:
sched.add_job(lambda: None, trigger=trigger, **add_job_kwargs)

mock_add_job.assert_called_once()
call_kwargs = mock_add_job.call_args.kwargs

if expect_grace_time is not None:
assert call_kwargs["misfire_grace_time"] == expect_grace_time
else:
assert "misfire_grace_time" not in call_kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test misses CronTrigger case

The injected grace period logic is only exercised with trigger="cron" (string) in these tests, but much of the codebase schedules cron jobs with an actual CronTrigger instance (e.g. trigger=cron_trigger). Given the current implementation, those jobs won’t get the grace period, and this test suite wouldn’t catch it. Add a test that schedules with trigger=create_cron_trigger(...) (or a CronTrigger directly) and asserts the grace period is applied.

def _is_daily_or_less_frequent(trigger_args: dict) -> bool:
"""A cron job with a specific hour value runs at most once per day."""
hour = trigger_args.get("hour")
return hour is not None and "*" not in str(hour) and "/" not in str(hour)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs detection for "," not in str(hour) and "-" not in str(hour) as well to handle cases like hour being 2,14 (twice daily) and 0-3 (4 times)

@thabofletcher
Copy link
Contributor

@adamsachs generally looks good, please see my comment about the cron check for is daily or less frequent. Also the greptile gripe seems valid but I'm not familiar enough to say definitively, but I'd say it is probably worht applying this to CronTrigger as well

@adamsachs
Copy link
Contributor Author

@greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@thabofletcher thabofletcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 - even more robust than my proposed solution (although as you mentioned in the PR I can't test)

@adamsachs adamsachs added this pull request to the merge queue Feb 10, 2026
Merged via the queue into main with commit d0db3bc Feb 10, 2026
55 checks passed
@adamsachs adamsachs deleted the asachs/ENG-2492 branch February 10, 2026 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants