How Does Facebook Store a Large-Scale Social Graph? TAO
· 2 min read
What Are the Challenges?
Before TAO, using the cache-aside pattern
The social graph is stored in MySQL and cached in Memcached.
Three existing problems:
- The efficiency of updating the edge list of the social graph in Memcached is too low. Instead of adding an edge to the end of the list, the entire list needs to be updated.
- The logic for managing the cache on the client side is very complex.
- It is difficult to maintain ==consistency in database reads after writes==.
To solve these problems, we have three goals:
- Efficient graph storage even with large-scale data.
- Optimize read operations (read-write ratio is 500:1).
- Reduce the duration of read operations.
- Improve the availability of read operations (eventual consistency).
- Complete write operations in a timely manner (write first, read later).
Data Model
- Objects with unique IDs (e.g., users, addresses, comments).
- Associations between two IDs (e.g., tagged, liked, posted).
- Both of the above data models have key-value data and time-related data.
Solution: TAO
-
Accelerate read operations and efficiently handle large-scale reads.
- Cache specifically for graphs.
- Add a layer of cache between the stateless server layer and the database layer (see Business Splitting).
- Split data centers (see Data Partitioning).
-
Complete write operations in a timely manner.
- Write-through cache.
- Use follower/leader caching to solve the ==thundering herd problem==.
- Asynchronous replication.
-
Improve the availability of read operations.
- If a read fails, read from other available sources.
Architecture of TAO
- MySQL Database → Durability.
- Leader Cache → Coordinates write operations for each object.
- Follower Cache → Used for reading rather than writing. Shift all write operations to the leader cache.
Fault tolerance for read operations.