76 docs tagged with "system design"

3 Programming Paradigms

Structured programming is a discipline imposed upon the direct transfer of control. OO programming is a discipline imposed upon the indirect transfer of control. Functional programming is a discipline imposed upon variable assignment.

4 Kinds of No-SQL

When reading data from a hard disk, a database join operation is time-consuming and 99% of the time is spent on disk seek. To optimize read performance, denormalization is introduced and four categories of NoSQL are here to help.

ACID vs BASE

ACID and BASE reflect different design philosophies. ACID favors consistency over availability; the C in ACID means that a transaction preserves all the database rules. BASE, by contrast, favors availability: the system is guaranteed to remain available.

B tree vs. B+ tree

A B+ tree can be seen as a B tree in which internal nodes contain only keys, with the data stored in the leaves. The main advantage of a B+ tree is fewer cache misses. In a B tree, the data is associated with each key and can be accessed more quickly.

Bloom Filter

A bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way. A query returns either "possibly in set" or "definitely not in set".

Bloom Filter

A Bloom filter is a data structure that is used to determine whether an element is a member of a set with a much higher space and time efficiency than other general algorithms. The results obtained using a Bloom filter may yield false positive matches, but cannot yield false negative matches. Elements can be added to the set, but cannot be removed; the more elements added to the set, the greater the likelihood of false positives.
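
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name `BloomFilter`, the method `might_contain`, the default sizes, and the use of salted SHA-256 to derive bit positions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from salted SHA-256 (an assumption)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per hash function by salting the item with its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every derived bit; bits are never cleared, so items cannot be removed.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))  # True: an added item is never reported absent
```

Because every added item sets more bits, the chance that an unrelated item finds all of its bits already set grows as the filter fills up, which is the false-positive behavior the summary mentions.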

Cloud Design Patterns

There are three types of cloud design patterns. Availability patterns have health endpoint monitoring and throttling. Data management patterns have cache-aside and static content hosting. Security patterns have federated identity.

Concurrency Models

Five concurrency models you may want to know: Single-threaded; Multiprocessing and lock-based concurrency; Communicating Sequential Processes (CSP); Actor Model (AM); Software Transactional Memory (STM).

Data Partition and Routing

The advantages of implementing data partition and routing are availability and read efficiency while consistency is the weakness. The routing abstract model is essentially two maps: key-partition map and partition-machine map.
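
A minimal sketch of the two-map routing model, under stated assumptions: the partition count, the node names, and the choice of a hash-modulo function for the key-partition map are hypothetical and chosen only for illustration.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical fixed partition count

# Partition-machine map: which machine currently hosts each partition.
partition_to_machine = {p: f"node-{p % 3}" for p in range(NUM_PARTITIONS)}


def key_to_partition(key: str) -> int:
    """Key-partition map, realized here as a stable hash modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def route(key: str) -> str:
    """Compose the two maps: key -> partition -> machine."""
    return partition_to_machine[key_to_partition(key)]


print(route("user:42"))
```

Keeping the two maps separate means a partition can be moved to another machine by updating only `partition_to_machine`; the key-partition map, and therefore every key's partition, stays unchanged.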

Designing a Load Balancer

Internet services often need to handle traffic from around the world, but a single server can only serve a limited number of requests at the same time. Therefore, we typically have a server cluster to collectively manage this traffic. The question arises: how can we evenly distribute this traffic across different servers?

Designing a Load Balancer or Dropbox Bandaid

Large-scale web services deal with high-volume traffic, but a single host can serve only a limited number of requests. A server farm usually handles the traffic collectively. How do we route requests so that each host receives an even share?

Designing a URL shortener

If you are asked to design a system to take user-provided URLs and transform them to shortened URLs, what would you do? How would you allocate the shorthand URLs? How would you implement the redirect servers? How would you store the click stats?

Designing a URL Shortener System

Design a system that converts user-provided URLs into short URLs, so that users can reach the original long URLs through the short ones. The design should address, among others, the following questions: How to allocate short URLs? How to store the mapping between short URLs and long URLs? How to implement the redirection service? How to store access data?

Designing Airbnb or a hotel booking system

For guests and hosts, we store data in a relational database and build indexes to search by location, metadata, and availability. We can use external vendors for payment and send reservation reminders using a priority queue.

Designing Facebook photo storage

The traditional NFS-based design has a metadata bottleneck: large metadata size limits the metadata hit ratio. Facebook photo storage eliminates the metadata by aggregating hundreds of thousands of images in a single haystack store file.

Designing Facebook's Photo Storage System

Two reasons drive Facebook's photo storage design: the petabyte-scale volume of blob data, and the metadata bottleneck of traditional NFS-based designs, where massive metadata severely limits hit rates. The solution is to aggregate hundreds of thousands of images into a single Haystack storage file, thereby eliminating the metadata burden.

Designing Human-Centric Internationalization (i18n) Engineering Solutions

Most products from Silicon Valley companies target the global market, and internationalization is a strategic key for multinational companies venturing into this market. The i18n engineering solution we designed primarily addresses three major issues in the development process of websites and mobile apps: 1. Language, 2. Time and Time Zones, 3. Numbers and Currency. Like all software system development, there is no silver bullet for internationalization; great works are crafted through diligent foundational work.

Designing Online Judge or Leetcode

An online judge is primarily a place where you can execute code remotely for educational or recruitment purposes. In this design, we focus on designing an OJ for interview preparation like Leetcode.

Designing payment webhook

Design a webhook that notifies the merchant when the payment succeeds. We need to aggregate the metrics (e.g., success vs. failure) and display them on the dashboard.

Designing typeahead search or autocomplete

How to design a real-time typeahead autocomplete service? LinkedIn's Cleo library answers with a multi-layer architecture (browser cache / web tier / result aggregator / various typeahead backends) and 4 elements (inverted / forward index, bloom filter, scorer).

Designing Uber

Disclaimer: All things below are collected from public sources or purely original. No Uber-confidential stuff here.

Designing Uber Ride-Hailing Service

Requirements for designing Uber ride-hailing: serving the global transportation market, large-scale real-time scheduling, and backend design. The design process covers architecture, microservices, scheduling services, payment services, user profile and trip record services, and notification push services.

Enterprise Authorization Services 2022

Authorization determines whether an individual or system has the right to access a particular resource. This process is a typical scenario that can be automated with software. We will review Google's Zanzibar, Zanzibar-inspired solutions, and other AuthZ services on the market.

Experience Deep Dive

For those who have had little experience in leadership positions, here are some interview tips. Describe your previous projects, including their challenges and improvements, and remember to demonstrate your communication skills.

Fraud Detection with Semi-supervised Learning

Fraud detection fights account takeovers and botnet attacks during login. Semi-supervised learning is more accurate than unsupervised learning and costs less time and money than supervised learning.

How Does Facebook Store a Large-Scale Social Graph? TAO

Updating the edge list of the social graph in Memcached is too inefficient, the client-side cache management logic is complex, and read-after-write consistency is hard to maintain. TAO addresses these three problems by accelerating reads and handling them efficiently at scale, completing writes in a timely manner, and improving read availability.

How Does Facebook Scale Its Social Graph Store? TAO

Before TAO, Facebook used the cache-aside pattern to scale its social graph store. There were three problems: list updates were inefficient, clients had to manage the cache themselves, and read-after-write consistency was hard to offer. TAO solves these problems.

How to Build a Scalable Web Service?

How to build a scalable web service? One word: Split. The AKF Scale Cube tells us the three dimensions of "splitting": horizontal scaling; business splitting; data partitioning.

How to Design Robust and Predictable APIs with Idempotency?

Why are APIs unreliable? Networks can fail, and servers can fail. Three principles to solve this problem: the client uses "retry" to ensure state consistency; the retry requests must include an idempotent unique ID; retries must be responsible, such as following an exponential backoff algorithm, because we do not want a large number of clients to retry simultaneously.
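
The three principles can be sketched as follows. This is an illustrative example under assumptions, not a reference implementation: `call_with_retry`, `FakeServer`, the delay parameters, and the choice of full jitter are hypothetical names and settings introduced here.

```python
import random
import time
import uuid


def call_with_retry(send, payload, max_attempts=5, base_delay=0.1,
                    max_delay=5.0, sleep=time.sleep):
    """Retry `send` with one idempotency key and exponential backoff plus jitter."""
    idempotency_key = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(max_attempts):
        try:
            return send(payload, idempotency_key)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # The delay cap doubles each attempt; random jitter spreads clients out.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


class FakeServer:
    """Hypothetical server that applies a charge but may lose the response."""

    def __init__(self, lost_responses: int) -> None:
        self.lost_responses = lost_responses
        self.processed = {}   # idempotency key -> stored result
        self.charge_count = 0

    def handle(self, payload, key):
        if key not in self.processed:
            # First time this key is seen: apply the side effect exactly once.
            self.charge_count += 1
            self.processed[key] = {"charged": payload["amount"]}
        if self.lost_responses > 0:
            self.lost_responses -= 1
            raise ConnectionError("response lost in transit")
        return self.processed[key]


server = FakeServer(lost_responses=2)
result = call_with_retry(server.handle, {"amount": 10}, sleep=lambda s: None)
print(result, server.charge_count)  # {'charged': 10} 1
```

The server deduplicates on the key, so the two lost responses lead to retries but only one charge. The `sleep` parameter is injectable only so the demo runs without waiting.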

How to Design the Architecture of a Blockchain Server?

A distributed blockchain accounting and smart contract system. It requires minimal trust between nodes while incentivizing them to cooperate: transactions are irreversible, do not rely on trusted third parties, protect privacy, disclose minimal information, and ensure that money cannot be spent twice. Assuming performance is not an issue, we will not consider how to optimize performance.

How to Stream Video to Mobile Devices Using HTTP? HTTP Live Streaming (HLS)

Mobile video playback services using HTTP Live Streaming face two main challenges: limited memory and storage on mobile devices; and the need to dynamically adjust video quality during transmission due to unstable network connections and varying bandwidth. We can address these issues at both the server and client levels.

How to write solid code?

Empathy plays the most important role in writing solid code. Besides, you need to choose a sustainable architecture to decrease human resource costs in total as the project scales. Then, adopt patterns and best practices; avoid anti-patterns. Finally, refactor if necessary.

Improving availability with failover

Several approaches improve availability with failover: cold standby, hot standby, warm standby, checkpointing, and all active.

Improving System Availability through Failover

Failover: Failover is a backup operational mode used to enhance system stability and availability. When the primary component fails or is scheduled for downtime, the functions of system components (such as processors, servers, networks, or databases) are transferred to secondary system components.

Intro to Relational Database

The relational database is the default choice for most storage use cases because of its atomicity, consistency, isolation, and durability. How is consistency here different from the one in the CAP theorem? Why do we need 3NF and a DB proxy?

Introduction to Architecture

Architecture serves the full lifecycle of the software system to make it easy to understand, develop, test, deploy and operate. The O’Reilly book Software Architecture Patterns gives a simple but effective introduction to five fundamental architectures.

Introduction to Architecture

Architecture serves the entire lifecycle of software systems, making them easy to understand, develop, test, deploy, and operate, with the goal of minimizing the human resource costs for each business use case. O'Reilly's

iOS Architecture Patterns Revisited

Architecture can directly impact costs per feature. Let's compare Tight-coupling MVC, Cocoa MVC, MVP, MVVM, and VIPER in three dimensions: balanced distribution of responsibility among feature actors, testability and ease of use and maintainability.

Key value cache

The key-value cache is used to reduce the latency of data access. What are the read-through, write-through, write-behind, write-back, and cache-aside patterns?

Lambda Architecture

Lambda architecture = CQRS (batch layer + serving layer) + speed layer. It solves accuracy, latency, throughput problems of big data.

Lambda Architecture

Using Lambda can address three issues brought by big data: accuracy (good); latency (fast); throughput (high). The lambda architecture can guide us on how to scale a data system.

Load Balancer Types

Load balancers usually fall into three categories: DNS round robin, network load balancers, and application load balancers. DNS round robin is rarely used because it is hard to control and not responsive. The network load balancer has better granularity and is simple and responsive.

Lyft's Marketing Automation Platform Symphony

How can advertising campaigns achieve higher returns with less money and fewer people? Lyft's answer is automation, which includes an LTV prediction module, a budget allocation module, and a delivery module. When people are freed from tedious delivery tasks and can focus on understanding users, channels, and the messages they need to convey to their audience, they can achieve better campaign results.

Public API Choices

There are several tools for the public API, API gateway, or Backend for Frontend gateway. GraphQL distinguishes itself from the others with features like tailoring results, batching nested queries, performance tracing, and explicit caching.

Replica, Consistency, and CAP theorem

Any networked system has three desirable properties: consistency, availability and partition tolerance. Systems can have only two of those three. For example, RDBMS prefers consistency and partition tolerance and becomes an ACID system.

Skip List

A skip list is essentially a linked list that allows for binary search. It achieves this by adding extra nodes that enable you to "skip" parts of the linked list. Given a random number generator to create these extra nodes, a skip list has O(log n) complexity for search, insert, and delete operations.
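
A compact sketch of search and insert, assuming a maximum of 16 levels and a promotion probability of 0.5; the class and method names are chosen for this example, and delete is omitted for brevity.

```python
import random


class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap on the number of levels

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        # Promote a node one level up with probability 1/2 each time.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key):
        # update[i] is the last node on level i whose key is smaller than `key`.
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self._level = max(self._level, level)
        new_node = _Node(key, level)
        for i in range(level):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def contains(self, key):
        # Start at the top level, move right while keys are smaller, then drop down.
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        # Level 0 is an ordinary sorted linked list containing every key.
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=1)
for k in [5, 1, 9, 3]:
    sl.insert(k)
print(list(sl), sl.contains(9), sl.contains(4))  # [1, 3, 5, 9] True False
```

The O(log n) bound is an expected cost that depends on the random level assignment; an unlucky sequence of coin flips can still degrade individual operations.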

Skiplist

A skip list is essentially a linked list that supports binary search. It accomplishes this by adding extra nodes that let you "skip" sections of the linked list. LevelDB's MemTable, Redis's SortedSet, and Lucene's inverted index all use it.

SOLID Design Principles

SOLID is an acronym of design principles that help software engineers write solid code. S is for single responsibility principle, O for open/closed principle, L for Liskov’s substitution principle, I for interface segregation principle and D for dependency inversion principle.

Stream and Batch Processing Frameworks

Stream and Batch processing frameworks can process high throughput at low latency. Why is Flink gaining popularity? And how to make an architectural choice among Storm, Storm-trident, Spark, and Flink?

Toutiao Recommendation System: P1 Overview

In order to evaluate user satisfaction, machine learning models are implemented. These models observe and measure the reality by feature engineering and further reduce latencies by recall strategy.

Toutiao Recommendation System: P2 Content Analysis

Content analysis and data mining of user tags are the cornerstones of the recommendation system. Content analysis derives intermediate data from raw articles and user behaviors. With content analysis, we are able to tag users, recommend content, and prepare content.

What are the use cases for key-value caching?

The essence of Key Value Cache is to reduce data access latency. Common strategies for cache design include read-through/write-through and cache aside. The specific strategy should be chosen based on your business needs.
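
As one illustration, the cache-aside strategy can be sketched like this. The class `CacheAsideStore` and its in-memory dictionaries are hypothetical stand-ins for a real cache and database, and invalidating on write is only one of several possible policies.

```python
class CacheAsideStore:
    """Cache-aside: the application checks the cache and loads from the database on a miss."""

    def __init__(self, database: dict) -> None:
        self.database = database  # stand-in for the system of record
        self.cache = {}           # stand-in for an external key-value cache
        self.db_reads = 0         # counts how often the slow path is taken

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no database access
        self.db_reads += 1
        value = self.database.get(key)  # cache miss: read the database
        if value is not None:
            self.cache[key] = value     # populate the cache for later reads
        return value

    def put(self, key, value) -> None:
        self.database[key] = value
        self.cache.pop(key, None)       # invalidate so the next read reloads


store = CacheAsideStore({"user:1": "Ada"})
store.get("user:1")
store.get("user:1")
print(store.db_reads)  # 1: the second read was served from the cache
```

With read-through or write-through, the same loading and writing logic would live inside the cache layer instead of the application code.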

What can we communicate in soft skills interview?

An interview is a process for workers to find future co-workers. The candidate will be evaluated based on answers to three key questions: capability, willingness, and culture-fit. None of these questions can be answered without good communication.

What Can You Discuss in a Soft Skills Interview?

At the same skill level, candidates who cannot express themselves well may lose job opportunities. The essence of an interview revolves around three questions: Can you do it or not; Do you want to do it or not; Are you a good fit or not. The five discussion points in an interview are: Adversity; Influence; Technical proficiency; Fit; Achievements. To prepare for these five discussion points, engage with more people, accumulate experiences, learn more technical skills, and be good at research.

What is Apache Kafka?

Apache Kafka is a distributed streaming platform that can be used for logging by topics, as a messaging system, for geo-replication, or for stream processing. It is much faster than other platforms due to its zero-copy technology.

What is Apache Kafka?

Apache Kafka is a distributed streaming platform. Its features include a distributed publish-subscribe (pub-sub) messaging system that simplifies N^2 relationships into N, allowing publishers and subscribers to operate at their own rates; ultra-fast zero-copy technology; and support for fault-tolerant data persistence.