Designing a payment webhook
1. Clarifying Requirements
- The webhook calls the merchant back once the payment succeeds.
- Merchant developers register webhook information with us.
- Make an HTTP POST request to the registered webhook endpoints reliably and securely.
- High availability, error handling, and failure resilience.
- Async design, since we assume merchants' servers are located across the world and may have very high latency (e.g., 15 seconds).
- At-least-once delivery, with an idempotency key so that receivers can discard duplicates.
- Delivery order does not matter.
- Robust and predictable retries and short-circuiting.
- Security, observability & scalability
- Anti-spoofing.
- Notify the merchant when their receivers are broken.
- Easy to extend and scale.
2. Sketch out the high-level design
async design + retry + queuing + observability + security
3. Features and Components
Core Features
- Users go to the dashboard frontend to register webhook information with us (the URL to call and the scope of events they want to subscribe to), and then get an API key from us.
- When there is a new event, it is published into the queue and consumed by the webhook callers. Callers look up the registration and make the HTTP call to the external service.
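A minimal in-memory sketch of the registration and fan-out flow above, under stated assumptions: WebhookRegistration, RegistrationStore (a stand-in for the user settings service), publish, and consume_once are hypothetical names, and queue.Queue stands in for a real message broker. It also applies the HTTPS-only rule from the callers section at registration time.

```python
import queue
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WebhookRegistration:
    """What a merchant registers in the dashboard (hypothetical schema)."""
    merchant_id: str
    url: str
    event_types: frozenset   # subscribed scope, e.g. frozenset({"charge:confirmed"})
    secret: str              # shared secret issued to the merchant, used for signing


class RegistrationStore:
    """In-memory stand-in for the user settings service."""

    def __init__(self) -> None:
        self._by_merchant = {}

    def register(self, reg: WebhookRegistration) -> None:
        if urlparse(reg.url).scheme != "https":   # only HTTPS endpoints are accepted
            raise ValueError(f"webhook URL must use HTTPS: {reg.url}")
        self._by_merchant[reg.merchant_id] = reg

    def subscribers(self, event_type: str) -> list:
        return [r for r in self._by_merchant.values() if event_type in r.event_types]


def publish(events: queue.Queue, event: dict) -> None:
    """Producer side, e.g. the payment state machine when a payment succeeds."""
    events.put(event)


def consume_once(events: queue.Queue, store: RegistrationStore) -> list:
    """Caller side: take one event and pair it with every subscribed registration."""
    event = events.get()
    try:
        return [(reg, event) for reg in store.subscribers(event["type"])]
    finally:
        events.task_done()
```

Returning the whole registration (not just the URL) lets the caller pick up the secret and settings it needs to sign and send the request.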
Webhook callers
- Subscribe to the event queue for payment-success events published by a payment state machine or other services.
- Once a caller accepts an event, it fetches the webhook URI, secret, and settings from the user settings service and prepares the request based on those settings. For security:
  - All webhook URLs in user settings must use HTTPS.
  - If the payload is huge, the expected latency is high, and we want to make sure the target receiver is alive, we can first verify that it exists with a ping carrying a challenge. For example, Dropbox verifies webhook endpoints by sending a GET request with a “challenge” parameter (a random string) encoded in the URL, which your endpoint is required to echo back as the response.
  - All callback requests carry an x-webhook-signature header so that the receiver can authenticate the request.
    - For a symmetric signature, we can use HMAC-SHA256. Its value is HMAC(webhook secret, raw request payload). Stripe, for example, signs its webhooks with HMAC-SHA256.
    - For an asymmetric signature, we can use an RSA-SHA256 signature of the raw request payload, made with our webhook private key; the receiver verifies it with our public key.
    - If the payload carries sensitive information, we can also consider encrypting it instead of just signing it.
- Make an HTTP POST request to the external merchant's endpoint with the event payload and security headers.
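A sketch of the symmetric (HMAC-SHA256) signing path and the challenge ping, using only the Python standard library. It assumes the signature is sent as a lowercase hex digest and that the receiver echoes the challenge as the plain response body; sign_payload, build_headers, verify_signature, new_challenge, challenge_passed, and the Idempotency-Key header name are hypothetical, not any provider's API.

```python
import hashlib
import hmac
import secrets


def sign_payload(secret: str, raw_body: bytes) -> str:
    """HMAC-SHA256 over the raw request bytes, hex-encoded (assumed header format)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def build_headers(secret: str, raw_body: bytes, idempotency_key: str) -> dict:
    """Headers the caller attaches to the POST (header names are assumptions)."""
    return {
        "Content-Type": "application/json",
        "x-webhook-signature": sign_payload(secret, raw_body),
        # Hypothetical header; the same key is reused on every retry of one event.
        "Idempotency-Key": idempotency_key,
    }


def verify_signature(secret: str, raw_body: bytes, signature: str) -> bool:
    """Receiver side: recompute and compare in constant time to avoid timing leaks."""
    expected = sign_payload(secret, raw_body)
    return hmac.compare_digest(expected.encode(), signature.encode())


def new_challenge() -> str:
    """Random string to put in the verification GET, e.g. ?challenge=<token>."""
    return secrets.token_urlsafe(16)


def challenge_passed(sent: str, response_body: str) -> bool:
    """The endpoint counts as alive only if it echoed the exact challenge back."""
    return hmac.compare_digest(sent.encode(), response_body.strip().encode())
```

The signature must be computed over the exact bytes that go on the wire. If the receiver parses the JSON and re-serializes it before verifying, whitespace or key order can change and verification fails.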
API Definition
// POST https://example.com/webhook/
{
  "id": 1,
  "scheduled_for": "2017-01-31T20:50:02Z",
  "event": {
    "id": "24934862-d980-46cb-9402-43c81b0cdba6",
    "resource": "event",
    "type": "charge:created",
    "api_version": "2018-03-22",
    "created_at": "2017-01-31T20:49:02Z",
    "data": {
      "code": "66BEOV2A", // or the order ID the user needs to fulfill
      "name": "The Sovereign Individual",
      "description": "Mastering the Transition to the Information Age",
      "hosted_url": "https://commerce.coinbase.com/charges/66BEOV2A",
      "created_at": "2017-01-31T20:49:02Z",
      "expires_at": "2017-01-31T21:49:02Z",
      "metadata": {},
      "pricing_type": "CNY",
      "payments": [
        // ...
      ],
      "addresses": {
        // ...
      }
    }
  }
}
The merchant server should respond with a 200 HTTP status code to acknowledge receipt of a webhook.
Error-handling
If there is no acknowledgment of receipt, we retry with the same idempotency key and exponential backoff for up to three days, with the retry interval capped at one hour. If failures reach a certain limit, we short-circuit, mark the endpoint as broken, and send an email to the merchant.
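A sketch of that retry policy: exponential backoff with the interval capped at one hour, retries stopping once the three-day window is used up, then short-circuiting. The one-hour cap and three-day window come from the text; the 30-second base delay and the names retry_delays and deliver are assumptions.

```python
import time
from datetime import timedelta

BASE_DELAY = timedelta(seconds=30)   # assumption: first retry after 30 seconds
MAX_INTERVAL = timedelta(hours=1)    # from the text: retry interval capped at 1 hour
RETRY_WINDOW = timedelta(days=3)     # from the text: keep retrying for up to 3 days


def retry_delays(base=BASE_DELAY, cap=MAX_INTERVAL, window=RETRY_WINDOW):
    """Yield the wait before each retry: doubling, capped, until the window is used up."""
    elapsed = timedelta(0)
    delay = base
    while elapsed + delay <= window:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)


def deliver(send, on_broken, delays=None, sleep=time.sleep) -> bool:
    """Try once, then retry on the schedule; short-circuit when it is exhausted.

    send() performs the POST (same payload and idempotency key every time) and
    returns True on a 2xx acknowledgment. on_broken() marks the endpoint broken
    and emails the merchant.
    """
    schedule = list(retry_delays()) if delays is None else delays
    if send():
        return True
    for delay in schedule:
        sleep(delay.total_seconds())
        if send():
            return True
    on_broken()
    return False
```

In a real caller fleet a worker would not block in sleep for hours; it would re-enqueue the delivery with a delay (or store a next-attempt time) and move on to other events. The sleep parameter is there only so the loop can be exercised without waiting.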
Metrics
The webhook callers service emits delivery statuses into a time-series DB for metrics.
StatsD + InfluxDB vs. Prometheus?
- InfluxDB: the application pushes data to InfluxDB, which keeps metrics and indices in a monolithic database.
- Prometheus: the Prometheus server periodically pulls metric values from the running application. Prometheus 1.x used LevelDB for indices and stored each time series in its own file; Prometheus 2.x replaced this with its own TSDB.
Or use a more expensive option such as Datadog or another APM service if you have a generous budget.
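If the push model (StatsD feeding InfluxDB) is chosen, the callers can emit one counter per delivery status and a latency timer using the plain StatsD line format <metric>:<value>|<type>. The metric names, the local agent address, and emit_delivery are assumptions for illustration.

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumption: local StatsD agent on its default UDP port


def statsd_line(metric: str, value, kind: str) -> str:
    """One StatsD datagram: <metric>:<value>|<type>, e.g. c = counter, ms = timer."""
    return f"{metric}:{value}|{kind}"


def emit_delivery(sock, status_code: int, latency_ms: float) -> list:
    """Push a per-status counter and a latency timer for one webhook attempt."""
    lines = [
        statsd_line(f"webhook.delivery.status.{status_code}", 1, "c"),
        statsd_line("webhook.delivery.latency", round(latency_ms), "ms"),
    ]
    for line in lines:
        sock.sendto(line.encode("ascii"), STATSD_ADDR)  # fire-and-forget UDP
    return lines
```

A caller would open one UDP socket, e.g. with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, and call emit_delivery(sock, 200, 123.4) after each attempt. UDP keeps metric emission off the delivery path's critical latency, at the cost of occasionally dropped samples.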