job: Close the health reporter instead of stopping #20
When a timer/one-shot/observer job finishes, it should close its health reporter instead of stopping it, so that an endless number of health statuses does not accumulate for one-off and similar jobs. Signed-off-by: Jussi Maki <[email protected]>
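To make the difference concrete, here is a minimal sketch of how the two behaviours affect the size of a health table. Everything in it (`table`, `reporter`, `runOneShots`, the job names) is a hypothetical stand-in written for illustration and is not the actual health or job API; it only assumes that stopping leaves an entry behind while closing deletes it.

```go
package main

import "fmt"

// Level is a hypothetical health level used only in this sketch.
type Level string

const (
	OK      Level = "OK"
	Stopped Level = "Stopped"
)

// table is a hypothetical stand-in for the health status table.
type table struct {
	entries map[string]Level
}

func newTable() *table { return &table{entries: map[string]Level{}} }

// reporter is a hypothetical per-job health scope backed by a table.
type reporter struct {
	tbl  *table
	name string
}

func (tb *table) scope(name string) *reporter {
	return &reporter{tbl: tb, name: name}
}

// OK and Stopped both keep an entry in the table.
func (r *reporter) OK()      { r.tbl.entries[r.name] = OK }
func (r *reporter) Stopped() { r.tbl.entries[r.name] = Stopped }

// Close removes the entry entirely (the assumed semantics of closing).
func (r *reporter) Close() { delete(r.tbl.entries, r.name) }

// runOneShots simulates n one-off jobs with generated names. Each job
// reports OK once and is then finished with the supplied function.
func runOneShots(tb *table, n int, finish func(*reporter)) {
	for i := 0; i < n; i++ {
		r := tb.scope(fmt.Sprintf("one-shot-%d", i))
		r.OK()
		finish(r)
	}
}

func main() {
	stopped := newTable()
	runOneShots(stopped, 1000, (*reporter).Stopped)

	closed := newTable()
	runOneShots(closed, 1000, (*reporter).Close)

	// Stopping leaves one entry per finished job; closing leaves none.
	fmt.Println(len(stopped.entries), len(closed.entries)) // prints "1000 0"
}
```

With generated job names the stopped variant grows by one entry per finished job, which is the accumulation this change is meant to avoid.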
There's also another related issue on the
I'm slightly unsure of this change. Do we want the jobs themselves to close their health scopes, or should we just have them call My main concern is that right now it's way too easy to leave a lot of these around, and if we ever start spawning a bunch of these with generated names we'll end up with a huge health table, so I feel that erring on the side of cleaning up might be better. Thoughts?
I'm more in favor of regular GC of the stopped health entries. Keeping their status as stopped for as long as the cilium-agent runs sounds like it could easily become a memory leak. JobGroups might never get stopped, so I wouldn't rely on that. (Agree on this.) Could we mark them as stopped, so that the metric reports that they were stopped, and close them immediately afterwards? Closing them instead of reporting them as stopped could produce some odd, unintuitive visuals in the metrics.
I guess this raises the question of when it is appropriate to stop vs. close the reporter. In my view, deleting reporters after something like a Job ceases to run makes sense. The oneshot is supposed to be auto-resilient, right? So it's in one of these states:
There may be instances where we want to retain a report from a OneShot that failed and that we decided not to keep retrying. But even then, I think the above behaviour should be the default, and we could account for that use case separately.