42 posts tagged with "system-design"

What is Apache Kafka?

· 4 min read

Apache Kafka is a distributed streaming platform.

Why use Apache Kafka?

Its abstraction is a ==queue==, and its features include:

  • A distributed publish-subscribe (pub-sub) messaging system that reduces the N² point-to-point connections between systems to N connections through the broker. Publishers and subscribers can operate at their own rates.
  • Ultra-fast zero-copy technology.
  • Support for fault-tolerant data persistence.

It can be applied to:

  • Logging by topic.
  • Messaging systems.
  • Off-site backups.
  • Stream processing.

Why is Kafka so fast?

Kafka uses zero-copy technology, in which the CPU does not copy data from one memory area to another. In particular, data does not pass through an application buffer on its way from disk to the network.

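Without zero-copy technology, data read from disk is copied from the kernel page cache into an application buffer and then back into a kernel socket buffer before it reaches the network card.

With zero-copy technology, the kernel moves the data from the page cache toward the network card itself (for example, via the `sendfile` system call), so it never passes through application memory.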
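
As a rough illustration of the two paths in Ruby (Kafka itself runs on the JVM, so this is only an analogy), the sketch below contrasts a copy loop that pulls every chunk into the program with `IO.copy_stream`, which hands the transfer to the runtime so that it can use a kernel-side fast path where the platform allows. The helper names are hypothetical.

```ruby
# Conventional path: each chunk is read into a user-space buffer and then
# written out again, so all bytes are copied through the application.
def copy_with_user_buffer(src_path, dst_io, chunk_size = 64 * 1024)
  total = 0
  File.open(src_path, "rb") do |src|
    while (chunk = src.read(chunk_size))
      total += dst_io.write(chunk)
    end
  end
  total
end

# Delegated path: IO.copy_stream lets the runtime perform the transfer.
# It can ask the kernel to move the data directly when the platform and
# descriptor types allow it; whether that fast path is actually taken is an
# implementation detail that this sketch cannot guarantee.
def copy_delegated(src_path, dst_io)
  File.open(src_path, "rb") { |src| IO.copy_stream(src, dst_io) }
end
```

Both helpers return the number of bytes transferred; only the first one ever holds the payload in a Ruby string.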

Architecture

From the outside, producers write to the Kafka cluster, while consumers read from it.

Data is stored by topic and divided into partitions, each of which is replicated.

Kafka Cluster Overview

  1. Producers publish messages to specific topics.
    • Messages are first written to an in-memory buffer and then flushed to disk.
    • To achieve fast writes, an append-only sequential write is used.
    • Messages can be read only after they have been written.
  2. Consumers fetch messages from specific topics.
    • They use an "offset pointer" (the offset is the SEQ ID) to track and control their own reading progress.
  3. A topic is divided into partitions, which also provide load balancing; each partition is an ordered, immutable sequence of records.
    • Partitions determine the parallelism of consumers (groups). At any given time, each partition is read by at most one consumer within a group.
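
The append-only log and the per-consumer offset can be sketched in a few lines of Ruby. This is a toy, in-memory model with hypothetical class names, not Kafka's storage format; it only shows that a record's offset is its position in the log and that each consumer advances independently.

```ruby
# Toy model of one topic partition: an append-only list of records in which
# a record's offset is simply its position in the list.
class PartitionLog
  def initialize
    @records = []
  end

  # Appends a record and returns the offset assigned to it.
  def append(record)
    @records << record
    @records.size - 1
  end

  # Returns up to max_records records starting at the given offset.
  def read(offset, max_records = 10)
    @records[offset, max_records] || []
  end
end

# Each consumer keeps its own offset, so consumers progress independently.
class LogConsumer
  attr_reader :offset

  def initialize(log, offset = 0)
    @log = log
    @offset = offset
  end

  # Fetches the next batch and advances this consumer's offset past it.
  def poll(max_records = 10)
    batch = @log.read(@offset, max_records)
    @offset += batch.size
    batch
  end

  # Rewinds or skips ahead; the log itself is never modified.
  def seek(offset)
    @offset = offset
  end
end
```

Because reading never mutates the log, a slow consumer and a fast consumer can share the same partition, and a consumer can replay old records simply by calling `seek`.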

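How is data serialized? Kafka treats messages as opaque bytes; Avro is a common choice.

What is its network protocol? A binary protocol over TCP.

What is the storage layout within a partition? An append-only log addressed by offset, which allows O(1) disk reads.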

How does fault tolerance work?

==In-Sync Replicas (ISR) protocol==. It allows (numReplicas - 1) nodes to fail. Each partition has one leader and one or more followers.

Total replicas = In-sync replicas + Out-of-sync replicas

  1. ISR is a set of live replicas that are in sync with the leader (note that the leader is always in the ISR).
  2. When publishing new messages, the leader waits to commit the message until it has been received by all replicas in the ISR.
  3. ==If a follower fails to stay in sync, it is removed from the ISR, and the leader continues to commit new messages with fewer replicas in the ISR. Note that at this point the system is running under-replicated.== If the leader fails, another replica in the ISR is elected as the new leader.
  4. Out-of-sync replicas continuously pull messages from the leader. Once they catch up to the leader, they will be added back to the ISR.
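
The four rules above can be played through with a toy model. The sketch below tracks each replica only by how many records it holds. The class name, the method names, and the count-based `max_lag` rule are assumptions made for illustration; they are not Kafka's actual implementation.

```ruby
# Toy model of one partition's replica set.
class ReplicatedPartition
  attr_reader :isr

  def initialize(leader, followers, max_lag: 2)
    @leader = leader
    @log_end = { leader => 0 }
    followers.each { |f| @log_end[f] = 0 }
    @isr = [leader, *followers]   # the leader is always in the ISR
    @max_lag = max_lag
  end

  # The leader appends locally; the record is not yet committed.
  def leader_append
    @log_end[@leader] += 1
  end

  # A follower pulls up to `count` records from the leader, after which
  # ISR membership is re-evaluated.
  def follower_fetch(follower, count = 1)
    @log_end[follower] = [@log_end[follower] + count, @log_end[@leader]].min
    refresh_isr
  end

  # Number of records that every ISR member holds, i.e. committed records.
  def committed_offset
    @isr.map { |replica| @log_end[replica] }.min
  end

  # Drops followers that lag too far and re-admits those that caught up.
  def refresh_isr
    @log_end.each_key do |replica|
      next if replica == @leader

      lag = @log_end[@leader] - @log_end[replica]
      if lag > @max_lag
        @isr.delete(replica)
      elsif lag.zero? && !@isr.include?(replica)
        @isr << replica
      end
    end
    @isr
  end
end
```

With three replicas, a record becomes committed only once both followers have fetched it; if one follower falls more than `max_lag` records behind, it leaves the ISR and the commit point advances with the remaining two.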

Is Kafka an AP or CP system in the CAP theorem?

Jun Rao believes it is CA because "our goal is to support replication within a Kafka cluster in a single data center, where network partitions are rare, so our design focuses on maintaining high availability and strong consistency of replicas."

However, it actually depends on the configuration.

  1. With the default configuration (min.insync.replicas=1, default.replication.factor=1), you get an AP system (at-most-once delivery).
  2. If you want CP, set min.insync.replicas=2 and the topic replication factor to 3, and produce messages with acks=all (at-least-once delivery). However, if fewer than 2 replicas are in sync for a given topic partition, writes will fail.
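
The acceptance rule in the second configuration can be written as a small predicate. The keyword names mirror the Kafka settings, but the function itself is a hypothetical illustration rather than Kafka code.

```ruby
# Decides whether the leader accepts a produce request.
def write_accepted?(acks:, isr_size:, min_insync_replicas:)
  # With acks=0 or acks=1 the producer does not wait for followers, so the
  # ISR size is not checked.
  return true unless acks == "all"

  # With acks=all the write is rejected when too few replicas are in sync.
  isr_size >= min_insync_replicas
end
```

For example, with `min_insync_replicas: 2` and only the leader left in the ISR, an `acks: "all"` write is refused (consistency is preferred), while an `acks: "1"` write still goes through (availability is preferred).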

How Does Facebook Store a Large-Scale Social Graph? TAO

· 2 min read

What Are the Challenges?

Before TAO, using the cache-aside pattern

Before TAO

The social graph is stored in MySQL and cached in Memcached.

Three existing problems:

  1. Updating an edge list of the social graph in Memcached is inefficient: instead of appending a single edge to the list, the entire list has to be rewritten.
  2. The logic for managing the cache on the client side is very complex.
  3. It is difficult to maintain ==read-after-write consistency== against the database.

To solve these problems, we have three goals:

  • Efficient graph storage even with large-scale data.
  • Optimize read operations (read-write ratio is 500:1).
    • Reduce the duration of read operations.
    • Improve the availability of read operations (eventual consistency).
  • Complete write operations in a timely manner (read-after-write).

Data Model

  • Objects with unique IDs (e.g., users, addresses, comments).
  • Associations between two IDs (e.g., tagged, liked, posted).
  • Both of the above data models have key-value data and time-related data.
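
To make the object/association model concrete, here is a toy in-memory sketch. The method names are loosely modeled on the kind of operations a graph store like this exposes; the class name and the Hash-based storage are assumptions for illustration only.

```ruby
class SocialGraph
  Assoc = Struct.new(:id1, :atype, :id2, :time, :data)

  def initialize
    @next_id = 0
    @objects = {}                             # id => { type:, data: }
    @assocs = Hash.new { |h, k| h[k] = [] }   # [id1, atype] => [Assoc, ...]
  end

  # Creates an object with a unique ID and key-value data.
  def obj_add(otype, data = {})
    id = (@next_id += 1)
    @objects[id] = { type: otype, data: data }
    id
  end

  def obj_get(id)
    @objects[id]
  end

  # Adds (or replaces) the edge id1 -[atype]-> id2, keeping newest edges first.
  # `time` is expected to be numeric, e.g. a Unix timestamp.
  def assoc_add(id1, atype, id2, time, data = {})
    list = @assocs[[id1, atype]]
    list.reject! { |a| a.id2 == id2 }
    list << Assoc.new(id1, atype, id2, time, data)
    list.sort_by! { |a| -a.time }
    nil
  end

  def assoc_count(id1, atype)
    @assocs[[id1, atype]].size
  end

  # Returns up to `limit` edges starting at position `pos`, newest first.
  def assoc_range(id1, atype, pos, limit)
    @assocs[[id1, atype]][pos, limit] || []
  end
end
```

Keeping each association list ordered by time is what makes queries such as "the 10 most recent comments on this post" cheap.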

Solution: TAO

  1. Accelerate read operations and efficiently handle large-scale reads.

    • Cache specifically for graphs.
    • Add a layer of cache between the stateless server layer and the database layer (see Business Splitting).
    • Split data centers (see Data Partitioning).
  2. Complete write operations in a timely manner.

    • Write-through cache.
    • Use follower/leader caching to solve the ==thundering herd problem==.
    • Asynchronous replication.
  3. Improve the availability of read operations.

    • If a read fails, read from other available sources.

Architecture of TAO

  • MySQL Database → Durability.
  • Leader Cache → Coordinates write operations for each object.
  • Follower Cache → Used for reading rather than writing. Shift all write operations to the leader cache.
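
A minimal sketch of how a follower tier might forward writes to a leader tier is shown below. It is a toy, single-process model: the class names are hypothetical, a Hash stands in for MySQL, and invalidations are delivered synchronously, whereas a real deployment would send them asynchronously.

```ruby
class LeaderCache
  def initialize(db)
    @db = db
    @cache = {}
    @followers = []
  end

  def register(follower)
    @followers << follower
  end

  # Only the leader talks to the database on a miss.
  def read(key)
    @cache.fetch(key) { @cache[key] = @db[key] }
  end

  # Write-through: update the database, then the leader's own cache, then
  # tell every other follower to drop its stale copy.
  def write(key, value, from: nil)
    @db[key] = value
    @cache[key] = value
    @followers.each { |f| f.invalidate(key) unless f.equal?(from) }
    value
  end
end

class FollowerCache
  def initialize(leader)
    @leader = leader
    @cache = {}
    leader.register(self)
  end

  # Reads are served locally and fall back to the leader on a miss.
  def read(key)
    @cache.fetch(key) { @cache[key] = @leader.read(key) }
  end

  # Writes are forwarded to the leader; the local copy is refreshed so that
  # clients of this follower can read their own write.
  def write(key, value)
    @leader.write(key, value, from: self)
    @cache[key] = value
  end

  def invalidate(key)
    @cache.delete(key)
  end
end
```

Because every miss and every write for a key funnels through one leader, the database sees one request per key rather than one per follower.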

Architecture of Facebook TAO

Fault tolerance for read operations.

Fault Tolerance for Read Operations in Facebook TAO

How to Design Robust and Predictable APIs with Idempotency?

· 2 min read

Why are APIs unreliable?

  1. Networks can fail.
  2. Servers can fail.

How can we solve this problem? Three principles:

  1. The client uses "retry" to ensure state consistency.

  2. The retry requests must include an ==idempotent unique ID==.

    1. In RESTful API design, the semantics of PUT and DELETE are inherently idempotent.
    2. However, POST in online payment scenarios may lead to the ==“duplicate payment” issue==, so we use an "idempotent unique ID" to identify whether a request has been sent multiple times.
      1. If the error occurs before reaching the server, after retrying, the server sees it for the first time and processes it normally.
      2. If the error occurs on the server, based on this "unique ID," an ACID-compliant database ensures that this transaction occurs only once.
      3. If the error occurs after the server returns a result, after retrying, the server only needs to return the cached successful result.
  3. Retries must be responsible, such as following the ==exponential backoff algorithm==, because we do not want a large number of clients to retry simultaneously.
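
The server side of the second principle can be sketched as follows. This is a toy, in-memory endpoint: the class name and response fields are hypothetical, and the Hash guarded by a Mutex stands in for a database table with a unique constraint on the idempotency ID.

```ruby
class PaymentService
  attr_reader :charges

  def initialize
    @results = {}    # idempotency ID => stored response
    @charges = []    # side effects that must happen at most once per ID
    @lock = Mutex.new
  end

  def charge(idempotency_key, amount_cents)
    @lock.synchronize do
      # A retry of a request that already completed: replay the stored
      # response instead of charging again.
      return @results[idempotency_key] if @results.key?(idempotency_key)

      @charges << amount_cents
      @results[idempotency_key] = {
        status: "succeeded",
        amount_cents: amount_cents,
        charge_number: @charges.size
      }
    end
  end
end
```

A client that times out can safely call `charge` again with the same ID: the second call returns the same response and `charges` still contains a single entry.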

For example, Stripe's client calculates the wait time for retries like this:

def self.sleep_time(retry_count)
  # Apply exponential backoff with initial_network_retry_delay on the
  # number of attempts so far as inputs. Do not allow the number to exceed
  # max_network_retry_delay.
  sleep_seconds = [Stripe.initial_network_retry_delay * (2 ** (retry_count - 1)), Stripe.max_network_retry_delay].min

  # Apply some jitter by randomizing the value in the range of (sleep_seconds
  # / 2) to (sleep_seconds).
  sleep_seconds = sleep_seconds * (0.5 * (1 + rand()))

  # But never sleep less than the base sleep seconds.
  sleep_seconds = [Stripe.initial_network_retry_delay, sleep_seconds].max

  sleep_seconds
end
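
To try the same idea outside the Stripe client, here is a standalone variant together with a small retry loop. The default delays, the helper names, and the injectable `sleeper` are assumptions chosen for illustration.

```ruby
# Exponential backoff with jitter, with the same three steps as above.
def backoff_seconds(retry_count, base: 0.5, cap: 2.0, random: rand)
  delay = [base * (2**(retry_count - 1)), cap].min   # exponential, capped
  delay *= 0.5 * (1 + random)                        # jitter: half to full delay
  [base, delay].max                                  # never below the base delay
end

# Runs the block, retrying on StandardError up to max_retries times.
# `sleeper` is injectable so the loop can be exercised without really sleeping.
def with_retries(max_retries: 3, sleeper: ->(seconds) { sleep(seconds) })
  retry_count = 0
  begin
    yield retry_count
  rescue StandardError
    retry_count += 1
    raise if retry_count > max_retries

    sleeper.call(backoff_seconds(retry_count))
    retry
  end
end
```

A call such as `with_retries { client.create_charge(...) }` (where `client` is whatever HTTP wrapper you use) would then wait a randomized, growing delay between attempts and re-raise the last error once the retries are exhausted.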

How to Build a Scalable Web Service?

· One min read

==One Word: Split==

==The AKF Scale Cube== tells us the three dimensions of "splitting":

AKF Scale Cube

  1. ==Horizontal Scaling==: Place many stateless servers behind a load balancer or reverse proxy so that any of them can handle each request, eliminating single points of failure.
  2. ==Business Splitting==: Typical microservices divided by function, such as an auth service, a user profile service, a photo service, etc.
  3. ==Data Partitioning==: Dedicate an entire technology stack and data store to a large group of users. For example, Uber has data centers in China and the United States, with different Pods for different cities or regions within each data center.
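
One way to picture the three axes working together is a router that applies them in turn. The sketch below is purely illustrative: the class name, regions, services, and hostnames are all made up.

```ruby
class ScaleCubeRouter
  # topology: { region => { service => [server, ...] } }
  def initialize(topology)
    @topology = topology
    @cursors = Hash.new(0)   # round-robin position per (region, service)
  end

  def route(path:, user_region:)
    # Business splitting: the first path segment names the service.
    service = path.split("/").reject(&:empty?).first
    # Data partitioning: each region has its own full stack.
    pod = @topology.fetch(user_region)
    servers = pod.fetch(service)
    # Horizontal scaling: any stateless clone can serve the request.
    index = @cursors[[user_region, service]]
    @cursors[[user_region, service]] = index + 1
    servers[index % servers.size]
  end
end
```

For instance, with two auth servers in the "us" pod, consecutive calls to `route(path: "/auth/login", user_region: "us")` alternate between them, while requests from "cn" never leave the "cn" pod.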

What are the use cases for key-value caching?

· 3 min read

The essence of a KV cache is to reduce data access latency. For example, it replaces O(log N) reads/writes and complex queries against a database, which are expensive and slow, with O(1) reads/writes against a medium that is fast but also costly, such as memory. There are many cache design strategies; the common ones are read-through/write-through (or write-back) and cache-aside.

The typical read/write ratio for internet services ranges from 100:1 to 1000:1, and we often optimize for reads.

In distributed systems, these patterns represent trade-offs between consistency, availability, and partition tolerance, and the specific choice should be based on your business needs.

General Strategies

  • Read
    • Read-through: A cache layer sits between clients and the database, so clients never access the database directly. On a cache miss, the cache loads the data from the database, stores it, and returns it; on a hit, it returns the cached data directly.
  • Write
    • Write-through: Clients first write data to the cache, which then updates the database. The operation is considered complete only when the database is updated.
    • Write-behind/Write-back: Clients first write data to the cache and receive a response immediately. The cache then writes the data to the database asynchronously. Generally, write-back is the fastest.
    • Write-around: Clients write directly to the database, bypassing the cache.
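
The four strategies can be compared side by side in a toy cache that sits in front of a backing store. In this sketch a Hash stands in for the database, and the class and method names are illustrative. Dropping the cached copy on write-around is a design choice made here to avoid serving stale data.

```ruby
class LayeredCache
  attr_reader :loads

  def initialize(store)
    @store = store
    @cache = {}
    @dirty = []    # keys written with write-back but not yet persisted
    @loads = 0     # number of reads that had to go to the store
  end

  # Read-through: on a miss, the cache itself loads from the store.
  def read(key)
    return @cache[key] if @cache.key?(key)

    @loads += 1
    @cache[key] = @store[key]
  end

  # Write-through: the call returns only after the store is updated.
  def write_through(key, value)
    @cache[key] = value
    @store[key] = value
  end

  # Write-back: acknowledge immediately, persist later in `flush`.
  def write_back(key, value)
    @cache[key] = value
    @dirty << key unless @dirty.include?(key)
    value
  end

  # Write-around: update the store only and drop any cached copy.
  def write_around(key, value)
    @store[key] = value
    @cache.delete(key)
    @dirty.delete(key)
    value
  end

  # Persists pending write-back entries; a real cache would do this
  # asynchronously in the background.
  def flush
    @dirty.each { |key| @store[key] = @cache[key] }
    @dirty.clear
  end
end
```

The trade-off is visible in `write_back`: it is the fastest path because it never touches the store, but anything still in `@dirty` is lost if the cache dies before `flush` runs.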

Cache Aside Pattern

Use the Cache Aside pattern when the cache does not support read-through and write-through/write-behind.

Reading data? If the cache hits, read from the cache; if it misses, read from the database and store in the cache. Modifying data? First modify the database, then delete the cache entry.

Why not update the cache after writing to the database? The main concern is that two concurrent database write operations could lead to two concurrent cache updates, resulting in dirty data.

Does using Cache Aside eliminate concurrency issues? No. There is still a low probability of dirty data: a reader misses the cache and reads the old value from the database, a writer then updates the database and deletes the cache entry, and finally the reader stores the old value in the cache.
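
A minimal sketch of the pattern, assuming a Hash for both the database and the cache (the class name is hypothetical):

```ruby
class CacheAsideRepository
  def initialize(db, cache = {})
    @db = db
    @cache = cache
  end

  # Read: try the cache first; on a miss, load from the database and
  # populate the cache.
  def read(key)
    return @cache[key] if @cache.key?(key)

    value = @db[key]
    @cache[key] = value unless value.nil?
    value
  end

  # Write: modify the database first, then delete (not update) the entry.
  def write(key, value)
    @db[key] = value
    @cache.delete(key)
    value
  end
end
```

The remaining race sits between the two steps of `read`: if a `write` runs after the database lookup but before the cache is populated, the stale value is stored and stays there until the entry is deleted again or expires.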

Where to Place the Cache?

  • Client side.
  • Distinct layer.
  • Server side.

What to Do If the Cache Size Is Insufficient? Cache Eviction Strategies

  • LRU - Least Recently Used: Keeps track of time and retains the most recently used items, evicting those that have not been used recently.
  • LFU - Least Frequently Used: Tracks usage frequency, retaining the most frequently used items and evicting the least frequently used ones.
  • ARC - Adaptive Replacement Cache: Performs better than LRU by tracking both recently used (RU) and frequently used (FU) items, while also recording the history of recently evicted items.
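
LRU is the easiest of the three to sketch. The version below is a minimal, non-thread-safe illustration with a hypothetical class name; it relies on Ruby's Hash preserving insertion order, so the first key is always the least recently used one.

```ruby
class LRUCache
  def initialize(capacity)
    raise ArgumentError, "capacity must be positive" unless capacity.positive?

    @capacity = capacity
    @items = {}
  end

  # Returns the value and marks the key as most recently used.
  def get(key)
    return nil unless @items.key?(key)

    value = @items.delete(key)   # remove and re-insert to move it to the end
    @items[key] = value
  end

  # Stores the value and evicts the least recently used entry if needed.
  def put(key, value)
    @items.delete(key)
    @items[key] = value
    @items.shift if @items.size > @capacity
    value
  end

  # Keys ordered from least to most recently used.
  def keys
    @items.keys
  end
end
```

With a capacity of 2, putting `:a` and `:b`, reading `:a`, and then putting `:c` evicts `:b`, because `:a` was touched more recently.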

Which Cache Solution Is the Best?

Facebook TAO

What Can You Discuss in a Soft Skills Interview?

· 3 min read

Why Should We Value Soft Skills?

Because your job can be taken by someone with strong soft skills.

Americans have excellent speaking abilities, as their elementary education emphasizes expression, leading to articulate communication. In equal circumstances, even if Chinese individuals have better technical skills, job opportunities can still be snatched away by Americans. This is not about racial discrimination; it’s a matter of self-expression ability.

For instance, Indians have a strong presence in the U.S., especially in the management of high-tech companies, where the influence of Chinese individuals is far less. This is also due to the Indians' exceptional storytelling abilities. Although their English pronunciation may not be standard, they are willing to speak up and often get to the point. Consequently, we often see an Indian manager overshadowing several Chinese employees who may be technically superior. We often mock Indians for their “PPT governance,” but their storytelling ability is something to take note of.

This illustrates that the soft skills of “free artistry” are a shortcoming for contemporary Chinese individuals.

The Essence of an Interview is to Answer the Following Three Questions

  1. Can you do it or not?
  2. Do you want to do it or not?
  3. Are you a good fit or not?

How to Answer These Three Questions?

The five discussion points in an interview.

  1. Adversity. It’s not about how big the difficulties are, but how you overcame them. You need to prove that you were not only not defeated by adversity but became stronger. Ideally, downplay significant challenges with an optimistic tone. Also, take a moment to express gratitude to those who helped you during tough times, making it clear that you are a grateful person.

  2. Influence. All communication issues are essentially leadership issues, and all leadership issues are fundamentally communication issues. If you are good at persuading others, it indicates you possess inherent leadership qualities.

  3. Technical Proficiency. What stories can showcase your technical skills?

  4. Fit. When the FBI used to interview candidates, they liked to ask what books applicants had read. Candidates would list numerous titles, but what the FBI really wanted to hear was that they had read Tom Clancy's spy novels. For a while, anyone who mentioned reading Clancy's novels had a higher chance of being hired. Eventually, this insider information leaked, and then everyone started saying the same thing, making that tactic ineffective.

  5. Achievements. Compared to others, do you have any standout qualities? This is your opportunity to boast about past accomplishments. Achievements don’t necessarily have to be actual work experience.

On a larger scale, even U.S. presidential campaigns follow this pattern. Obama would say, “My father was an immigrant, and he abandoned my mother. I grew up in a single-parent household as a Black man, and it was tough for me… but none of that matters now; I am optimistic and strong.” Trump would say, “I have worked on this project, that project, and this project, and now I want to undertake a major project that benefits America.”

How to Prepare for These Five Discussion Points?

  • Try more, experience failures, and enrich your life experiences;
  • Engage with more people to practice your communication and organizational skills;
  • Learn some technical skills and master practical tools;
  • Be good at research to understand what is happening in your field; seek opportunities to achieve results that make you stand out.