How to Design Robust and Predictable APIs with Idempotency?
Why are APIs unreliable?
- Networks can fail.
- Servers can fail.
How can we solve this problem? Three principles:
- The client retries failed requests to ensure state consistency.
- Retry requests must include an ==idempotent unique ID==.
  - In RESTful API design, the semantics of PUT and DELETE are inherently idempotent.
  - However, POST is not idempotent, and in online payment scenarios a retried POST may lead to the =="duplicate payment" issue==, so we use an "idempotent unique ID" to identify whether a request has been sent multiple times.
    - If the failure occurs before the request reaches the server, the server sees the retry as a first attempt and processes it normally.
    - If the failure occurs on the server, an ACID-compliant database uses this "unique ID" to ensure that the transaction occurs only once.
    - If the failure occurs after the server returns a result, the server only needs to return the cached successful result when the retry arrives.
- Retries must be responsible, for example by following an ==exponential backoff algorithm==, because we do not want a large number of clients to retry simultaneously.
For example, Stripe's client calculates the wait time for retries like this:
    def self.sleep_time(retry_count)
      # Apply exponential backoff with initial_network_retry_delay on the
      # number of attempts so far as inputs. Do not allow the number to exceed
      # max_network_retry_delay.
      sleep_seconds = [Stripe.initial_network_retry_delay * (2 ** (retry_count - 1)), Stripe.max_network_retry_delay].min

      # Apply some jitter by randomizing the value in the range of (sleep_seconds
      # / 2) to (sleep_seconds).
      sleep_seconds = sleep_seconds * (0.5 * (1 + rand()))

      # But never sleep less than the base sleep seconds.
      sleep_seconds = [Stripe.initial_network_retry_delay, sleep_seconds].max

      sleep_seconds
    end
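
To make the second principle more concrete, here is a minimal sketch of how a server might use the idempotent unique ID to avoid a duplicate payment. It is not Stripe's implementation: the class `IdempotentPayments`, the method `charge`, and the response fields are hypothetical names chosen for illustration, and an in-memory Hash guarded by a Mutex stands in for the ACID-compliant database.

```ruby
require "securerandom"

# Hypothetical stand-in for an ACID database table keyed by idempotency ID.
# A real service would likely rely on a unique constraint and a transaction
# rather than a Mutex and a Hash.
class IdempotentPayments
  attr_reader :charges_made

  def initialize
    @results = {}       # idempotency ID => cached response
    @lock = Mutex.new   # serializes check-and-store, like a transaction
    @charges_made = 0   # counts how many real charges were performed
  end

  # Performs the charge at most once per idempotency ID.
  # A repeated ID returns the cached response instead of charging again.
  def charge(idempotency_id, amount_cents)
    @lock.synchronize do
      cached = @results[idempotency_id]
      return cached if cached

      @charges_made += 1
      @results[idempotency_id] = {
        status: "succeeded",
        amount_cents: amount_cents,
        charge_id: SecureRandom.uuid
      }
    end
  end
end

payments = IdempotentPayments.new
id = SecureRandom.uuid                    # generated once by the client
first = payments.charge(id, 1_000)        # original request
second = payments.charge(id, 1_000)       # retry after a lost response
puts first[:charge_id] == second[:charge_id]
puts payments.charges_made
```

The client generates the ID once and reuses it on every retry of the same logical request, so the retry receives the cached result and the customer is charged only once. A request with a new ID is treated as a new payment.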