Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
175 changes: 175 additions & 0 deletions driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
/*
* Copyright 2008-present MongoDB, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.mongodb.internal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the existing com.mongodb.internal.time package is a better place for ExponentialBackoff?


import com.mongodb.annotations.NotThreadSafe;

import java.util.concurrent.ThreadLocalRandom;
import java.util.function.DoubleSupplier;

import static com.mongodb.internal.VisibleForTesting.AccessModifier.PRIVATE;

/**
* Implements exponential backoff with jitter for retry scenarios.
* Formula: delayMS = jitter * min(maxBackoffMs, baseBackoffMs * growthFactor^retryCount)
* where jitter is random value [0, 1).
Comment on lines +28 to +29
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formula, as well as the constants are internal detail, irrelevant even to the internal user of this internal API. Let's remove this.

*
* <p>This class provides factory methods for common use cases:
* <ul>
* <li>{@link #forTransactionRetry()} - For withTransaction retries (5ms base, 500ms max, 1.5 growth)</li>
* <li>{@link #forCommandRetry()} - For command retries with overload (100ms base, 10000ms max, 2.0 growth)</li>
Comment on lines +33 to +34
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not duplicate method-level documentation in the class-level documentation. Any duplication opens up opportunities for inconsistencies. We duplicate public API documentation for the sake of user convenience (though I has always been opposed to that, because it have lead to inconsistencies, and will lead to them again, which harm users), but internally, there seem to be no reason to duplicate documentation at all.

* </ul>
*/
@NotThreadSafe
public final class ExponentialBackoff {
// Transaction retry constants (per spec)
private static final double TRANSACTION_BASE_BACKOFF_MS = 5.0;
private static final double TRANSACTION_MAX_BACKOFF_MS = 500.0;
private static final double TRANSACTION_BACKOFF_GROWTH = 1.5;

// Command retry constants (per spec)
private static final double COMMAND_BASE_BACKOFF_MS = 100.0;
private static final double COMMAND_MAX_BACKOFF_MS = 10000.0;
private static final double COMMAND_BACKOFF_GROWTH = 2.0;
Comment on lines +39 to +47
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Let's link to the relevant specification section that defines the constants.
  • The COMMAND_ constants should not be introduced in this PR: they are dead code in this PR, and the specification defining them is just a PR for now.
  • These constants actually serve no purpose as of now. When implementing forTransactionRetry, we can simply write return new ExponentialBackoff(5.0, 500.0, 1.5). However, if you insist on the internal API documentation for the forTransactionRetry, which repeats the values, then the constants allow us to use @value. I don't think that particular documentation is useful.


private final double baseBackoffMs;
private final double maxBackoffMs;
private final double growthFactor;
private int retryCount = 0;

// Test-only jitter supplier - when set, overrides ThreadLocalRandom
private static volatile DoubleSupplier testJitterSupplier = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to comment on a private field by writing a documentation comment for the field. However here, the comment not only does not add anything of value to the reader, but actually confuses him: I had to look at the usage of the field to understand what this comment says; and this is likely to happen to any reader because no one knows what ThreadLocalRandom has to do with this field by looking at just the field, and not its usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile is not needed here:

  • From the logical perspective, for the tests that indirectly rely on the testJitterSupplier field to work correctly, the value of the field must not change concurrently with any test code reading it by calling calculateDelayMs.
  • Therefore, if our tests are written correctly, the methods setTestJitterSupplier/clearTestJitterSupplier are not invoked concurrently with calculateDelayMs.
  • There are no other methods reading/writing the testJitterSupplier field.
  • Therefore, there are no executions where there are concurrent conflicting accesses to the field. That is, no synchronization is needed.

If we actually had setTestJitterSupplier/clearTestJitterSupplier being invoked concurrently with the tests that indirectly rely on testJitterSupplier, those tests would have been failing sporadically regardless of whether the field is volatile or not.


/**
* Creates an exponential backoff instance with specified parameters.
*
* @param baseBackoffMs Initial backoff in milliseconds
* @param maxBackoffMs Maximum backoff cap in milliseconds
* @param growthFactor Exponential growth factor (e.g., 1.5 or 2.0)
*/
public ExponentialBackoff(final double baseBackoffMs, final double maxBackoffMs, final double growthFactor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to expose this constructor. It should be private. The only methods that need to be tested are forTransactionRetry and calculateDelayMs.

this.baseBackoffMs = baseBackoffMs;
this.maxBackoffMs = maxBackoffMs;
this.growthFactor = growthFactor;
}

/**
* Creates a backoff instance configured for withTransaction retries.
* Uses: 5ms base, 500ms max, 1.5 growth factor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist that documenting this internally is valuable, which I doubt, let's do that in a way that prevents the documentation and the actual values from becoming inconsistent:

Suggested change
* Uses: 5ms base, 500ms max, 1.5 growth factor.
* Uses: {@value TRANSACTION_BASE_BACKOFF_MS} ms base, {@value TRANSACTION_MAX_BACKOFF_MS} ms max,
* {@value TRANSACTION_BACKOFF_GROWTH} growth factor.

P.S. Notice that I used a space between the numerical value and unit symbol. This is in accordance with the "5.4.3 Formatting the value of a quantity" section in The International System of Units (SI) Brochure released by The International Bureau of Weights and Measures (BIPM).

*
* @return ExponentialBackoff configured for transaction retries
*/
public static ExponentialBackoff forTransactionRetry() {
return new ExponentialBackoff(
TRANSACTION_BASE_BACKOFF_MS,
TRANSACTION_MAX_BACKOFF_MS,
TRANSACTION_BACKOFF_GROWTH
);
}
Comment on lines +76 to +82
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within this PR, I failed to find any benefits of expressing the backoff computation as behavior of an object (an instance of ExponentialBackoff), rather than just a static method; regardless of the approach, the ClientSessionImpl.withTransaction method has to maintain one new local variable: either the lazily initialized transactionBackoff, or the transactionAttempt (that's how it is named in the spec, but we are free to use a different name). Given this, I propose to go with the more straightforward approach of expressing the backoff computation as a static method1, rather than as an object behavior.

If in the future we observe that this is not enough and the "object" approach is needed, we'll be able to change the approach. But we will have a clear reason for that.

Also, storing all the constants in each instance of ExponentialBackoff as instance fields is unnecessary. If we have to use the "object" approach in the future, we better implement it in such a way that does not duplicate constants as instance fields in each object instance (an interface / abstract class with two implementations is one way to achieve that).

P.S. Some other review comments I left are written within the current "object" approach (as opposed to he proposed "method" approach). If the suggestion in this comment is applied, those comments should not be automatically discarded, but rather each should be examined and adopted, if applicable, to the "method" approach.


1 If in the future we need another static method for command retries, we will be able to move the computational logic in a private static method, and call that method from two public methods, passing the suitable constants as method arguments.


/**
* Creates a backoff instance configured for command retries during overload.
* Uses: 100ms base, 10000ms max, 2.0 growth factor.
*
* @return ExponentialBackoff configured for command retries
*/
public static ExponentialBackoff forCommandRetry() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not introduce unused code that we foresee to be needed in the future. In this PR, we only need backoffs for the convenient transaction API.

return new ExponentialBackoff(
COMMAND_BASE_BACKOFF_MS,
COMMAND_MAX_BACKOFF_MS,
COMMAND_BACKOFF_GROWTH
);
}

/**
* Calculate next backoff delay with jitter.
*
* @return delay in milliseconds
*/
public long calculateDelayMs() {
Comment on lines +98 to +103
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename the method to calculateNextDelayMs, or something like that. To emphasize that each invocation may return a value different from the previous one.

// Use test jitter supplier if set, otherwise use ThreadLocalRandom
double jitter = testJitterSupplier != null
? testJitterSupplier.getAsDouble()
: ThreadLocalRandom.current().nextDouble();
Comment on lines +104 to +107
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment simply repeats what the trivial expression it is about does. Such comments have pollute the code without benefitting us in any way. Let's remove it.

double exponentialBackoff = baseBackoffMs * Math.pow(growthFactor, retryCount);
double cappedBackoff = Math.min(exponentialBackoff, maxBackoffMs);
retryCount++;
return Math.round(jitter * cappedBackoff);
}

/**
* Set a custom jitter supplier for testing purposes.
* This overrides the default ThreadLocalRandom jitter generation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class using ThreadLocalRandom is an internal detail even when it comes to the internal users of this internal API. Let's not put it here.

P.S. The majority of the internal docs written for this trivial class seem to be not useful. But it takes up the resources when reviewing, and also when producing it (assuming it was produced manually).

*
* @param supplier A DoubleSupplier that returns values in [0, 1) range, or null to use default
*/
@VisibleForTesting(otherwise = PRIVATE)
public static void setTestJitterSupplier(final DoubleSupplier supplier) {
testJitterSupplier = supplier;
}

/**
* Clear the test jitter supplier, reverting to default ThreadLocalRandom behavior.
*/
@VisibleForTesting(otherwise = PRIVATE)
public static void clearTestJitterSupplier() {
testJitterSupplier = null;
}
Comment on lines +114 to +131
Copy link
Member

@stIncMale stIncMale Jan 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such global states and methods allowing us to fiddle with the behavior for testing purposes is a dangerous and obscure way of programming. This approach reminds me about https://pvs-studio.com/en/blog/posts/0439/. Here are some quotes from there:

  • "we counted that 89 people died from 2000 to 2010 because of your screwed up your electronics and software"
  • "we found a couple of bugs in your code, namely 7134 MISRA standards violations, recursion, 740-string long function and 9000 global variables"

I know we introduced a similar thing in CSOT: InternalStreamConnection.recordEverything. And it is equally bad.

Instead, we should try and do what was done in DefaultConnectionPool, where we introduced InternalConnectionPoolSettings in addition to ConnectionPoolSettings. However, this time, since the tests create MongoClient, and not DefaultConnectionPool, the new internal settings will have to be supplied from the very top: MongoClients. Since our MongoClients classes are public API, we can't just add the new internal settings to their methods, but the idea is still doable as described below:

  1. Introduce InternalMongoClientSettings similar to MongoClientSettings, but for internal settings. Currently, we only need it to incorporate InternalConnectionPoolSettings, and the new top-level (a field in InternalMongoClientSettings) retry backoff jitter setting relevant to the current PR.
  2. Introduce com.mongodb.client.internal.InternalMongoClients/com.mongodb.reactivestreams.client.internal.InternalMongoClients. These classes will actually contain the code from the current com.mongodb.client.MongoClients/com.mongodb.reactivestreams.client.MongoClients classes.
  3. Add the InternalMongoClientSettings parameters to the methods in InternalMongoClients that accept MongoClientSettings.
    3.1. Add the InternalMongoClientSettings parameters to the constructors of com.mongodb.client.internal.MongoClientImpl/com.mongodb.reactivestreams.client.internal.MongoClientImpl that accept MongoClientSettings.
  4. Re-implement all the API methods in MongoClients such that they delegate to the corresponding methods in InternalMongoClients, and pass InternalMongoClientSettings.builder().build() (the default InternalMongoClientSettings, representing the internal settings for production use) as the value of the InternalMongoClientSettings parameter.
    5.1. Update the calls to the modified MongoClientImpl constructors.
  5. Replace all the internal calls to the methods in MongoClients with the calls to the corresponding methods in InternalMongoClients.

When the above is done, the new WithTransactionProseTest.testRetryBackoffIsEnforced will be able to create MongoClient with the jitter value it needs without fiddling with any non-explicit global state.

The proposed approach will also enable us to get rid of InternalStreamConnection.recordEverything (technical debt), and to avoid introducing similar global states in the future.


/**
* Reset retry counter for new sequence of retries.
*/
public void reset() {
retryCount = 0;
}
Comment on lines +133 to +138
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used, let's remove it.


/**
* Get current retry count for testing.
*
* @return current retry count
*/
public int getRetryCount() {
return retryCount;
}

/**
* Get the base backoff in milliseconds.
*
* @return base backoff
*/
public double getBaseBackoffMs() {
return baseBackoffMs;
}

/**
* Get the maximum backoff in milliseconds.
*
* @return maximum backoff
*/
public double getMaxBackoffMs() {
return maxBackoffMs;
}

/**
* Get the growth factor.
*
* @return growth factor
*/
public double getGrowthFactor() {
return growthFactor;
}
Comment on lines +140 to +174
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, we should make these methods package-access, and annotate with @VisibleForTesting(otherwise = PRIVATE). But actually, they are not useful, and expose the internals for no good reason: the actual behavior that is worth testing is tested via the assertions that use the calculateDelayMs method. We should remove these methods with the assertions that use them.

}
Loading