Description
Jobs are considered "stuck" based on the timeout defined at the client level only, and not the timeout that can be set by individual workers (per Worker-level job timeouts).
Repro
- Create a worker that defines a timeout function greater than the client-level timeout
- Start a job on that worker that runs for a time greater than the client-level timeout
// Configure the River client with a job timeout of 1 minute.
import (
	"context"
	"time"

	"github.com/riverqueue/river"
)

type StuckJobArgs struct{}

func (StuckJobArgs) Kind() string {
	return "stuck"
}

type StuckWorker struct {
	river.WorkerDefaults[StuckJobArgs]
}

// Timeout overrides the client-level default for this worker.
func (w *StuckWorker) Timeout(job *river.Job[StuckJobArgs]) time.Duration {
	// This job has a higher timeout than the default.
	return 2 * time.Minute
}

func (w *StuckWorker) Work(ctx context.Context, job *river.Job[StuckJobArgs]) error {
	// Run longer than the 1 minute client-level timeout.
	time.Sleep(2 * time.Minute)
	return nil
}
In logs - note the timeout is not the custom one defined above:
WRN jobexecutor.JobExecutor: Job appears to be stuck source=river job_id=577 kind=stuck timeout=1m0s
INF producer: Producer job counts source=river num_completed_jobs=0 num_jobs_running=1 num_jobs_stuck=1 queue=default
Expected
When a worker-level timeout is defined, the job should be considered stuck only once it has exceeded the worker-level timeout.
Actual
The worker-level timeout is not taken into account, and the job is considered stuck as soon as it exceeds the client-level timeout.
Thanks for your work on River!