Mitigate Gradle daemon oomkills by monkey patching cgroup v2 into Gradle
#10247
AlexeyKuznetsov-DD left a comment:
Nice solution! Hope that will fix DaemonDisappearedException and freezing jobs on CI!
Benchmarks
Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 13 unstable metrics.
Startup time reports for insecure-bank (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: insecure-bank - global startup overhead, sections tracing and iast]
[Chart: insecure-bank - break down per module, sections tracing and iast]
Startup time reports for petclinic (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: petclinic - global startup overhead, sections tracing, appsec, iast and profiling]
[Chart: petclinic - break down per module, sections tracing, appsec, iast and profiling]
Load
Summary: Found 3 performance improvements and 0 performance regressions! Performance is the same for 15 metrics, 18 unstable metrics.
Request duration reports for insecure-bank (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: insecure-bank - request duration (CI 0.99), variants no_agent, iast, iast_FULL, iast_GLOBAL, profiling and tracing]
Request duration reports for petclinic (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: petclinic - request duration (CI 0.99), variants no_agent, appsec, code_origins, iast, profiling and tracing]
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for tomcat (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: tomcat - execution time (CI 0.99), variants no_agent, appsec, iast, iast_GLOBAL, profiling and tracing]
Execution time for biojava (candidate=1.58.0-SNAPSHOT~a618c4f808, baseline=1.58.0-SNAPSHOT~3428502f80):
[Chart: biojava - execution time (CI 0.99), variants no_agent, appsec, iast, iast_GLOBAL, profiling and tracing]
What Does This Do
Modifies Gradle's cgroup memory reporting to support cgroup v2, via a custom, purpose-built rewriting agent.
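As a rough sketch of the general mechanism only (not the patcher's actual code), a Java agent can register a `ClassFileTransformer` in its `premain` method and rewrite just the class it targets. The target class name `org/example/HypotheticalMemoryInfo` and the `rewrite` placeholder are hypothetical; a real patch would modify the bytecode at that point, for example with ASM. An agent like this is loaded with `-javaagent:<path-to-jar>` and needs a `Premain-Class` entry in its JAR manifest.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public final class CgroupPatchAgent {

    // Hypothetical target, in JVM internal form (slashes, not dots).
    // The real class patched by the agent is not named here.
    static final String TARGET_CLASS = "org/example/HypotheticalMemoryInfo";

    // Entry point called by the JVM before main() when started with -javaagent:<jar>
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new TargetedTransformer(TARGET_CLASS));
    }

    public static final class TargetedTransformer implements ClassFileTransformer {
        private final String target;

        public TargetedTransformer(String target) {
            this.target = target;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (!target.equals(className)) {
                return null; // null tells the JVM to keep the original bytecode
            }
            return rewrite(classfileBuffer);
        }
    }

    // Placeholder (hypothetical): a real patch would rewrite the method bodies here.
    // This sketch returns an unchanged copy of the class bytes.
    static byte[] rewrite(byte[] original) {
        return original.clone();
    }
}
```

Returning `null` for every class except the target keeps the patch narrowly scoped, so the rest of the daemon's classes load untouched.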
Motivation
CI stability.
Some of our runners use cgroup v2, but Gradle 8.x doesn't support it (cgroup v1 support was added in 8.1 RC1). cgroup v2 support is targeted for Gradle 9.3 and is already present in 9.3 RC1.
Without it, Gradle assumes it has the host's total memory, which may be much larger than the memory allotted to the container (k8s). Gradle effectively uses available memory as backpressure when scheduling workers, so overestimating it can lead to more concurrent work than the container's memory allows.
...and we're not ready to make the jump to Gradle 9.x yet.
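To illustrate why cgroup v2 awareness matters, here is a minimal Java sketch (neither Gradle's implementation nor the patcher's code) of how a process could compute its effective memory under cgroup v2. cgroup v2 exposes the limit in a `memory.max` file that holds either a byte count or the literal `max` (no limit). The sketch assumes the container's cgroup is mounted at `/sys/fs/cgroup`, as is common for cgroup v2 containers; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

public final class CgroupV2MemoryLimit {

    // Assumed location of the container's cgroup v2 memory limit.
    static final Path DEFAULT_MEMORY_MAX = Paths.get("/sys/fs/cgroup/memory.max");

    // Parses memory.max: a byte count, or "max" meaning no limit.
    public static OptionalLong parseMemoryMax(String content) {
        String value = content.trim();
        if (value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    // Reads memory.max; returns empty when the file is missing or unreadable
    // (e.g. on a cgroup v1 host or outside a container).
    public static OptionalLong readMemoryMax(Path memoryMax) {
        try {
            return parseMemoryMax(Files.readString(memoryMax));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // Effective memory: the cgroup limit when one is set, capped at the host total.
    public static long effectiveMemory(long hostTotalBytes, OptionalLong cgroupLimit) {
        return cgroupLimit.isPresent()
                ? Math.min(hostTotalBytes, cgroupLimit.getAsLong())
                : hostTotalBytes;
    }

    public static void main(String[] args) {
        long host = 64L * 1024 * 1024 * 1024;                 // hypothetical 64 GiB host
        OptionalLong limit = parseMemoryMax("8589934592\n");  // hypothetical 8 GiB container limit
        System.out.println(effectiveMemory(host, limit));     // prints 8589934592
        System.out.println(readMemoryMax(DEFAULT_MEMORY_MAX));
    }
}
```

Taking the minimum of the host total and the cgroup limit is the key step: a reader that only knows the cgroup v1 files finds no limit on a cgroup v2 runner and falls back to the full host memory, which is the situation described above.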
Additional Notes
Multiple runs showed a ~2 GiB margin on jobs that regularly failed. This won't reduce memory consumption, but it should help prevent memory exhaustion and OOM kills of the daemons.
Patch here: https://github.com/bric3/gradle-cgroup2-patcher