Reactor - Exponential Backoff

Numerous components in a network, such as DNS servers, switches, and load balancers, can generate errors anywhere in the life of a given request. A common way to handle these failures is through the use of retries, often combined with backoff and jitter.

As an engineer, you should apply these practices whenever you deal with network connectivity or similar communication over the internet.

Retries, as mentioned previously, are a good way of dealing with transient remote API errors in client applications. When a client receives an error response or a timeout, it is the responsibility of the client to retry.

Therefore, having a good retry mechanism is important for making our operations run smoothly.

Backoff

Backoff is a technique for performing retries gracefully, without overloading or burning out your backend systems. A simple way to perform retries is by adding a delay between calls. This approach is called a linear backoff. While this is easy to implement and can handle transient failures in a majority of cases, it does not help when a downstream service is impacted for a prolonged period of time, as the retries sent at a fixed rate will continue to overload the service.
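In Reactor, this kind of constant-delay retry can be expressed with the built-in Retry.fixedDelay spec. The sketch below is a minimal example; the flaky callRemoteService() method and the chosen attempt count and delay are placeholders.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

public class FixedDelayRetryExample {

    public static void main(String[] args) {
        String result = callRemoteService()
                // retry at most 3 times, waiting a constant 500 ms between attempts
                .retryWhen(Retry.fixedDelay(3, Duration.ofMillis(500)))
                .onErrorReturn("fallback after all retries failed")
                .block();
        System.out.println(result);
    }

    // Placeholder for a call that may fail transiently.
    private static Mono<String> callRemoteService() {
        return Mono.error(new RuntimeException("Service unavailable"));
    }
}
```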

Exponential backoff

Exponential backoff is a less aggressive form of backoff. As the name suggests, with this approach the delay between retries increases exponentially: clients multiply their backoff by a constant factor after each attempt, until the request succeeds or a maximum backoff limit is hit. This is a more graceful strategy because it avoids overloading downstream servers, which could otherwise lead to resource starvation.
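To make the growth concrete, here is a small standalone sketch (not Reactor code) that prints the capped, doubling delays; the initial and maximum delays are arbitrary example values.

```java
import java.time.Duration;

public class ExponentialDelays {

    public static void main(String[] args) {
        Duration initial = Duration.ofMillis(100);
        Duration max = Duration.ofSeconds(5);

        for (int attempt = 0; attempt < 8; attempt++) {
            // delay = minimum(max, initial * 2^attempt)
            Duration delay = initial.multipliedBy(1L << attempt);
            if (delay.compareTo(max) > 0) {
                delay = max;
            }
            System.out.println("attempt " + attempt + " -> wait " + delay.toMillis() + " ms");
        }
    }
}
```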

Exponential backoff with Jitter

Most exponential backoff algorithms use jitter (a randomized delay) to prevent successive collisions. In this case, we introduce randomness into the retry intervals.

This is especially beneficial with many concurrent clients: without jitter, clients that failed at the same time would retry in synchronized waves and collide again.
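A conceptual sketch of jitter is shown below: it widens each exponential delay by a uniformly random offset. The interval of plus or minus jitterFactor * delay used here is an illustrative choice; Reactor's exact bounding logic is covered later in this post.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

public class JitteredDelay {

    // Adds a random offset in [-jitterFactor * delay, +jitterFactor * delay] to the delay.
    static Duration withJitter(Duration delay, double jitterFactor) {
        long offsetBound = Math.round(delay.toMillis() * jitterFactor);
        long jitter = offsetBound == 0
                ? 0
                : ThreadLocalRandom.current().nextLong(-offsetBound, offsetBound + 1);
        return delay.plusMillis(jitter);
    }

    public static void main(String[] args) {
        Duration base = Duration.ofMillis(400);
        for (int i = 0; i < 3; i++) {
            System.out.println("jittered delay: " + withJitter(base, 0.5).toMillis() + " ms");
        }
    }
}
```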

Reactor - RetryBackoffSpec

RetryBackoffSpec is Reactor's Retry strategy based on exponential backoff with jitter.

The client waits for a short initial backoff on the first failure, but as the operation continues to fail, it waits for a duration proportional to 2^n, where n is the number of failures that have occurred. A well-chosen amount of random jitter is added to each client's wait time.
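A minimal usage sketch is shown below; the hypothetical callRemoteService() method and the attempt count and backoff values are example choices, not prescribed settings.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

public class BackoffWithJitterExample {

    public static void main(String[] args) {
        String result = callRemoteService()
                // exponential backoff starting at 100 ms, capped at 10 s,
                // at most 5 retry attempts, default jitterFactor of 0.5
                .retryWhen(Retry.backoff(5, Duration.ofMillis(100))
                                .maxBackoff(Duration.ofSeconds(10)))
                .onErrorReturn("fallback after retries were exhausted")
                .block();
        System.out.println(result);
    }

    // Placeholder for an unreliable remote call.
    private static Mono<String> callRemoteService() {
        return Mono.error(new RuntimeException("Transient failure"));
    }
}
```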

Explanation

I will refer to minBackoff and maxBackoff as min and max, respectively.

I will refer to jitterOffset as j.

jitterOffset is obtained by multiplying the jitterFactor (default = 0.5) by the computed delay; it defines the interval from which we will pick the jitter.

The delay is calculated as follows:

For each retry, the minBackoff is multiplied by 2^n, where n is the number of failures that have occurred, and we check that maxBackoff has not been exceeded.

The next delay would be

nextBackoff = minimum(max, min × 2^n)

where minimum is the usual minimum function, and max and min are the maxBackoff and minBackoff respectively.

So far this is a plain exponential backoff; as discussed earlier, we want to add some randomness:

nextBackoffWithJitter = nextBackoff + ε

where

ε ∈ [-j, j]

and j is the jitterOffset.

But we have to ensure that the jittered backoff stays in the correct interval, which means

min ≤ nextBackoff + ε ≤ max, i.e. min - nextBackoff ≤ ε ≤ max - nextBackoff

but we also have that

-j ≤ ε ≤ j

This leads us to define the interval for ε as

[maximum(min - nextBackoff, -j), minimum(max - nextBackoff, j)]
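As a quick worked example (with made-up numbers): take min = 100 ms, max = 10 s, and n = 2, so nextBackoff = 100 × 2^2 = 400 ms and, with the default jitterFactor of 0.5, j = 200 ms. The interval for ε is then [maximum(100 - 400, -200), minimum(10000 - 400, 200)] = [-200 ms, 200 ms], so the effective delay lands somewhere between 200 ms and 600 ms.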

This explains the corresponding part of the code:
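The sketch below paraphrases that logic using the names from the derivation above; it is a simplification of reactor-core's RetryBackoffSpec (overflow handling and edge cases are omitted), not the verbatim source.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

public class BoundedJitterSketch {

    static Duration jitteredBackoff(Duration minBackoff, Duration maxBackoff,
                                    double jitterFactor, long iteration) {
        // exponential step: min * 2^iteration, capped at maxBackoff
        // (a real implementation would guard against overflow for large iteration counts)
        Duration nextBackoff = minBackoff.multipliedBy(1L << iteration);
        if (nextBackoff.compareTo(maxBackoff) > 0) {
            nextBackoff = maxBackoff;
        }

        // j = jitterFactor * nextBackoff
        long jitterOffset = Math.round(nextBackoff.toMillis() * jitterFactor);

        // epsilon must keep the result within [minBackoff, maxBackoff]
        long lowBound = Math.max(minBackoff.minus(nextBackoff).toMillis(), -jitterOffset);
        long highBound = Math.min(maxBackoff.minus(nextBackoff).toMillis(), jitterOffset);

        long jitter;
        if (highBound == lowBound) {
            jitter = highBound;
        } else {
            jitter = ThreadLocalRandom.current().nextLong(lowBound, highBound);
        }
        return nextBackoff.plusMillis(jitter);
    }

    public static void main(String[] args) {
        for (long n = 0; n < 5; n++) {
            System.out.println("retry " + n + " -> "
                    + jitteredBackoff(Duration.ofMillis(100), Duration.ofSeconds(10), 0.5, n).toMillis()
                    + " ms");
        }
    }
}
```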

If the retries still fail after the maximum number of attempts is exceeded, an error is reported:
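For example, exhausting the attempts in the earlier sketch terminates the sequence with a "retries exhausted" error that wraps the last failure; the exact exception type and message are reactor-core implementation details and may vary between versions.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

public class RetriesExhaustedExample {

    public static void main(String[] args) {
        Mono.error(new RuntimeException("Service unavailable"))
                .retryWhen(Retry.backoff(3, Duration.ofMillis(100)))
                // the terminal error reports that retries were exhausted (e.g. "Retries exhausted: 3/3"),
                // with the last RuntimeException thrown above as its cause
                .doOnError(e -> System.err.println(e + " caused by " + e.getCause()))
                .onErrorResume(e -> Mono.empty())
                .block();
    }
}
```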

Conclusion

In this tutorial, we've explored how client applications can retry failed calls more gracefully by augmenting exponential backoff with jitter.

