@jack-berg
Member

As mentioned in #7986, I've been working through some ideas to improve the performance of the metric SDK under high contention.

To illustrate the impact of these changes, I've reworked MetricsBenchmark to include the dimensions that affect record performance. The dimensions that play some role are:

  • Instrument type / aggregation (5): counter + sum, up down counter + sum, gauge + last value, histogram + explicit histogram, histogram + base2 expo histogram
  • instrument value type (2): double, long
  • memory mode (2): immutable, reusable
  • temporality (2): cumulative, delta
  • exemplars recorded (2): true, false
  • threads (2): 1, 4
  • cardinality (2): 1, 100

That forms 2 * 2 * 2 * 2 * 2 * 2 * 5 = 320 unique test cases, which is impractical. So I narrowed it down to the most meaningful dimensions:

  • eliminated instrument value type: long vs. double matters somewhat, but not much
  • eliminated memory mode: immutable vs. reusable mostly matters for the collect path
  • eliminated exemplars recorded: exemplars can impact performance, but are less important than the other factors

With these eliminated, we're down to 2 * 2 * 2 * 5 = 40 test cases, which is more reasonable.
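
The arithmetic above can be checked by enumerating the reduced matrix. This is a minimal sketch, not code from the PR: the class and method names are hypothetical, the enum constants mirror the parameter values printed in the results, and the thread counts are assumed from the benchmark method names (threads1, threads4).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the PR.
class BenchmarkMatrix {
  // Values mirror the (instrumentTypeAndAggregation) column in the results.
  enum InstrumentTypeAndAggregation {
    COUNTER_SUM,
    UP_DOWN_COUNTER_SUM,
    GAUGE_LAST_VALUE,
    HISTOGRAM_EXPLICIT,
    HISTOGRAM_BASE2_EXPONENTIAL
  }

  // Values mirror the (aggregationTemporality) column in the results.
  enum Temporality { DELTA, CUMULATIVE }

  // Assumed from the benchmark method names threads1 and threads4.
  static final int[] THREADS = {1, 4};
  static final int[] CARDINALITIES = {1, 100};

  // Size of the full matrix: 5 instrument/aggregation pairs times six
  // two-valued dimensions.
  static int fullMatrixSize() {
    return InstrumentTypeAndAggregation.values().length * 2 * 2 * 2 * 2 * 2 * 2;
  }

  // Enumerates the reduced matrix: threads x temporality x cardinality x instrument.
  static List<String> reducedCases() {
    List<String> cases = new ArrayList<>();
    for (int threads : THREADS) {
      for (Temporality temporality : Temporality.values()) {
        for (int cardinality : CARDINALITIES) {
          for (InstrumentTypeAndAggregation instrument : InstrumentTypeAndAggregation.values()) {
            cases.add("threads" + threads + " " + temporality + " " + cardinality + " " + instrument);
          }
        }
      }
    }
    return cases;
  }

  public static void main(String[] args) {
    System.out.println("full matrix: " + fullMatrixSize());
    System.out.println("reduced matrix: " + reducedCases().size());
  }
}
```

Each string produced by reducedCases() corresponds to one row of the results.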

I'm also using this as an opportunity to finish what @tylerbenson started and get into the routine of running benchmarks on each change on dedicated hardware and publishing the results at https://open-telemetry.github.io/opentelemetry-java/benchmarks/

The unfinished problem was that the benchmarks in this repo are micro benchmarks. They're not very meaningful for end users and may even do more harm than good. What we need is a curated set of somewhat high-level benchmarks, intentionally built to demonstrate and report on the performance characteristics that matter to end users.

This revamped MetricRecordBenchmark is the first of these. I will follow up with dedicated benchmarks for other areas:

  • Log SDK record and export
  • Trace SDK record and export
  • Metric SDK export
  • Noop implementation

For reference, here are the results of the revamped MetricRecordBenchmark on my machine:

Benchmark                       (aggregationTemporality)  (cardinality)  (instrumentTypeAndAggregation)   Mode  Cnt      Score     Error  Units
MetricRecordBenchmark.threads1                     DELTA              1                     COUNTER_SUM  thrpt    5  13414.208 ± 243.504  ops/s
MetricRecordBenchmark.threads1                     DELTA              1             UP_DOWN_COUNTER_SUM  thrpt    5  12276.148 ± 105.900  ops/s
MetricRecordBenchmark.threads1                     DELTA              1                GAUGE_LAST_VALUE  thrpt    5  10896.580 ± 705.898  ops/s
MetricRecordBenchmark.threads1                     DELTA              1              HISTOGRAM_EXPLICIT  thrpt    5   6642.787 ± 674.574  ops/s
MetricRecordBenchmark.threads1                     DELTA              1     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   3651.887 ± 304.134  ops/s
MetricRecordBenchmark.threads1                     DELTA            100                     COUNTER_SUM  thrpt    5   8359.025 ± 777.598  ops/s
MetricRecordBenchmark.threads1                     DELTA            100             UP_DOWN_COUNTER_SUM  thrpt    5   9247.253 ± 423.551  ops/s
MetricRecordBenchmark.threads1                     DELTA            100                GAUGE_LAST_VALUE  thrpt    5   9165.700 ± 143.755  ops/s
MetricRecordBenchmark.threads1                     DELTA            100              HISTOGRAM_EXPLICIT  thrpt    5   7300.896 ± 684.395  ops/s
MetricRecordBenchmark.threads1                     DELTA            100     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   3858.246 ±  34.989  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE              1                     COUNTER_SUM  thrpt    5  12433.135 ± 148.315  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE              1             UP_DOWN_COUNTER_SUM  thrpt    5  13341.423 ± 242.611  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE              1                GAUGE_LAST_VALUE  thrpt    5  10628.592 ± 101.145  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE              1              HISTOGRAM_EXPLICIT  thrpt    5   6895.783 ± 740.681  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE              1     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   4087.396 ± 435.895  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE            100                     COUNTER_SUM  thrpt    5  10402.076 ± 240.933  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE            100             UP_DOWN_COUNTER_SUM  thrpt    5   9199.368 ± 107.627  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE            100                GAUGE_LAST_VALUE  thrpt    5   9056.580 ± 297.773  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE            100              HISTOGRAM_EXPLICIT  thrpt    5   7475.743 ± 979.090  ops/s
MetricRecordBenchmark.threads1                CUMULATIVE            100     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   3836.227 ± 131.765  ops/s
MetricRecordBenchmark.threads4                     DELTA              1                     COUNTER_SUM  thrpt    5   1577.822 ± 219.796  ops/s
MetricRecordBenchmark.threads4                     DELTA              1             UP_DOWN_COUNTER_SUM  thrpt    5   1615.582 ± 335.284  ops/s
MetricRecordBenchmark.threads4                     DELTA              1                GAUGE_LAST_VALUE  thrpt    5   1208.008 ± 165.999  ops/s
MetricRecordBenchmark.threads4                     DELTA              1              HISTOGRAM_EXPLICIT  thrpt    5    904.243 ±  22.615  ops/s
MetricRecordBenchmark.threads4                     DELTA              1     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5    869.229 ±  31.214  ops/s
MetricRecordBenchmark.threads4                     DELTA            100                     COUNTER_SUM  thrpt    5   1725.486 ± 240.360  ops/s
MetricRecordBenchmark.threads4                     DELTA            100             UP_DOWN_COUNTER_SUM  thrpt    5   1422.319 ± 594.337  ops/s
MetricRecordBenchmark.threads4                     DELTA            100                GAUGE_LAST_VALUE  thrpt    5   1560.890 ± 654.561  ops/s
MetricRecordBenchmark.threads4                     DELTA            100              HISTOGRAM_EXPLICIT  thrpt    5   1587.582 ± 458.715  ops/s
MetricRecordBenchmark.threads4                     DELTA            100     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   1688.229 ± 181.653  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE              1                     COUNTER_SUM  thrpt    5   1540.747 ± 137.303  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE              1             UP_DOWN_COUNTER_SUM  thrpt    5   1429.698 ± 220.415  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE              1                GAUGE_LAST_VALUE  thrpt    5   1215.367 ± 546.045  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE              1              HISTOGRAM_EXPLICIT  thrpt    5   1237.215 ±  18.528  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE              1     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5    837.980 ±  23.871  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE            100                     COUNTER_SUM  thrpt    5   1602.628 ± 813.536  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE            100             UP_DOWN_COUNTER_SUM  thrpt    5   1717.663 ± 577.817  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE            100                GAUGE_LAST_VALUE  thrpt    5   1565.824 ± 298.550  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE            100              HISTOGRAM_EXPLICIT  thrpt    5   1352.174 ± 594.439  ops/s
MetricRecordBenchmark.threads4                CUMULATIVE            100     HISTOGRAM_BASE2_EXPONENTIAL  thrpt    5   1465.394 ± 313.072  ops/s
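
One way to read these results is to compare the threads1 and threads4 scores for the same parameters. The sketch below does that for three DELTA, cardinality 1 rows, with scores copied from the table; the class and method names are hypothetical, and the ratio is only a comparison of the reported scores, not a statement about how each benchmark operation is defined.

```java
// Hypothetical helper for illustration only; not part of the PR.
class ThroughputRatio {
  // Returns how many times larger the first score is than the second.
  static double ratio(double first, double second) {
    return first / second;
  }

  public static void main(String[] args) {
    String[] names = {"COUNTER_SUM", "HISTOGRAM_EXPLICIT", "HISTOGRAM_BASE2_EXPONENTIAL"};
    // {threads1 score, threads4 score} in ops/s, DELTA temporality, cardinality 1.
    double[][] scores = {
      {13414.208, 1577.822},
      {6642.787, 904.243},
      {3651.887, 869.229}
    };
    for (int i = 0; i < names.length; i++) {
      System.out.printf(
          "%-28s threads1/threads4 = %.2f%n", names[i], ratio(scores[i][0], scores[i][1]));
    }
  }
}
```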

@jack-berg jack-berg requested a review from a team as a code owner January 21, 2026 23:04
  with:
    tool: 'jmh'
-   output-file-path: sdk/trace/build/jmh-result.json
+   output-file-path: sdk/all/build/jmh-result.json
Member Author

@tylerbenson this benchmark-action only allows a single output file path. This means all the published benchmarks need to be in a single module, so that we can run them with a single java -jar *-jmh.jar ... command. I think the opentelemetry-sdk artifact is a good spot for this.

This turns out to be a useful constraint, as I think it will be nice to have all the public benchmarks colocated.

@codecov

codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.14%. Comparing base (cbab60c) to head (1b4ce35).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #8000   +/-   ##
=========================================
  Coverage     90.13%   90.14%           
- Complexity     7469     7471    +2     
=========================================
  Files           833      833           
  Lines         22523    22524    +1     
  Branches       2234     2235    +1     
=========================================
+ Hits          20301    20304    +3     
+ Misses         1517     1515    -2     
  Partials        705      705           
