-
-
Notifications
You must be signed in to change notification settings - Fork 533
[18.0] queue_job: refactor job acquisition #866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 18.0
Are you sure you want to change the base?
Conversation
|
Hi @guewen, |
265b06e to
8d0b9c9
Compare
eff0043 to
d6c907a
Compare
In this commit we cleanly separate the job acquisition (i.e. verifying the job is in the exepected state, marking it started and locking it) from job execution. We also avoid trying to start the job if it is already locked by using SKIP LOCKED and exiting early. Indeed in such situations the job is likely already being handled by another worker so there is no point trying to start it, so we exit early and let it be handled either by the other worker or the dead job requeuer.
d6c907a to
367ab80
Compare
| " to make this tests work", | ||
| ) | ||
|
|
||
| return job |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to JobCommonCase
|
This is ready to review. It can be tested with a small number of workers and a root channel capacity greater than the number of workers and test jobs of duration >= 10 sec. If the test graph has enough parallelism, you will see warnings about dead jobs being requeued (the jobs in state enqueued that are not picked up by workers in due time), but each job should execute only once. Example: The |
In this PR we cleanly separate the job acquisition (i.e. verifying the job is in the exepected state, marking it started and locking it) from job execution.
We also avoid trying to start the job if it is already locked by using SKIP LOCKED and exiting early. Indeed, in such situations, the job is likely already being handled by another worker so there is no point trying to start it, so we exit early and let it be handled either by the other worker or the dead job requeuer.
Following-up on #859 (comment)
maybe fixes #858