76 posts tagged with "system design"

Designing Uber Ride-Hailing Service

February 8, 2019 · 3 min read

Disclaimer: All content below is sourced from public resources or purely original. No confidential information regarding Uber is included here.

Requirements

Provide services for the global transportation market
Large-scale real-time scheduling
Backend design

Architecture

uber architecture

Why Microservices?

==Conway's Law== The structure of a software system corresponds to the organizational structure of the company.

	Monolithic ==Service==	Microservices
When team size and codebase are small, productivity	✅ High	❌ Low
==When team size and codebase are large, productivity==	❌ Low	✅ High (Conway's Law)
==Quality requirements for engineering==	❌ High (Inadequately skilled developers can easily disrupt the entire system)	✅ Low (Runtime is isolated)
Dependency version upgrades	✅ Fast (Centralized management)	❌ Slow
Multi-tenant support / Production-staging state isolation	✅ Easy	❌ Difficult (Each service must 1) either establish a staging environment connected to other services in staging 2) or support multi-tenancy across request contexts and data storage)
Debuggability, assuming the same modules, parameters, logs	❌ Low	✅ High (if distributed tracing is available)
Latency	✅ Low (Local)	❌ High (Remote)
DevOps costs	✅ Low (High cost of build tools)	❌ High (Capacity planning is difficult)

Combining monolithic ==codebase== and microservices can leverage the strengths of both.

Scheduling Service

Consistent hashing, with shards keyed by geohash
Data is transient and held in memory, so there is no need to replicate it. (CAP: AP over CP)
Use single-threaded or locked matching within a shard to prevent double dispatching
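The sharding idea above can be sketched as a small consistent-hash ring. This is a minimal illustration, not Uber's implementation: the node names, the virtual-node count, and the use of MD5 as the ring hash are all assumptions, and the geohash string is treated as an opaque shard key.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of MD5 as an integer)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, geohash_cell: str) -> str:
        """Return the node owning the first ring point at or after the cell's hash."""
        idx = bisect.bisect_left(self._points, _hash(geohash_cell)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["dispatch-1", "dispatch-2", "dispatch-3"])
owner = ring.node_for("9q8yy")  # a geohash prefix used as the shard key
```

When a node joins, only the cells that now map to the new node change owner; every other cell stays where it was, which is the reason to prefer a ring over `hash(key) % n`.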

Payment Service

==The key is to have an asynchronous design==, because ACID transactions that span multiple payment systems often have very long latencies.

Utilize event queues
Payment interface integrations with Braintree, PayPal, Card.io, Alipay, etc.
Record all events through detailed logs
Use APIs with idempotency, exponential backoff, and random jitter
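A minimal sketch of the last point, assuming a hypothetical `charge(amount_cents, idempotency_key)` callable and a hypothetical `TransientError` for retryable failures. The same idempotency key is reused across attempts so the payment provider can deduplicate, and the delay uses "full jitter" (a random value up to the exponential cap).

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retry(charge, amount_cents, max_attempts=5,
                    base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    # One key for every attempt: a retried request must not charge twice.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return charge(amount_cents, idempotency_key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the failure
            # Exponential backoff capped at max_delay, with random jitter
            # so that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be replaced in tests; the default is the real `time.sleep`.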

User Profile Service and Trip Record Service

Use caching to reduce latency
The User Profile Service faces a significant challenge: it must serve 1) a growing number of countries and regions and 2) a growing set of user roles (drivers, riders, restaurant owners, diners, etc.).

Notification Push Service

Apple Push Notification service (not quite reliable)
Google Cloud Messaging (GCM), which can detect successful delivery, or
SMS services, which are generally more reliable

A Closer Look at iOS Architecture Patterns

January 17, 2019 · 3 min read

Why Should We Care About Architecture?

The answer is: to reduce the human resources spent on each feature.

Mobile developers evaluate the quality of an architecture on three levels:

Whether the responsibilities of different features are evenly distributed
Whether it is easy to test
Whether it is easy to use and maintain

	Responsibility Distribution	Testability	Usability
Tight Coupling MVC	❌	❌	✅
Cocoa MVC	❌ V and C are coupled	❌	✅⭐
MVP	✅ Independent view lifecycle	✅	Average: more code
MVVM	✅	Average: View has a dependency on UIKit	Average
VIPER	✅⭐️	✅⭐️	❌

Tight Coupling MVC

Traditional MVC

For example, in a multi-page web application, when you click a link to navigate to another page, the entire page reloads. The problem with this architecture is that the View is tightly coupled with the Controller and Model.

Cocoa MVC

Cocoa MVC is the architecture recommended by Apple for iOS developers. Theoretically, this architecture allows the Controller to decouple the Model from the View.

Cocoa MVC

However, in practice, Cocoa MVC encourages the use of massive view controllers, ultimately leading to the view controller handling all operations.

Realistic Cocoa MVC

Although testing such tightly coupled massive view controllers is quite difficult, Cocoa MVC performs the best in terms of development speed among the existing options.

MVP

In MVP, the Presenter is independent of the view controller's lifecycle, so the View can be easily mocked. We can think of UIViewController as the View.

Variant of MVC

There is another type of MVP: MVP with data binding. As shown in the figure, the View is tightly coupled with the Model and Controller.

MVP

MVVM

MVVM is similar to MVP, but MVVM binds the View to the View Model.

MVVM

VIPER

Unlike the three-layer structure of MV(X), VIPER has a five-layer structure (VIPER View, Interactor, Presenter, Entity, and Routing). This structure allows for good responsibility distribution but has poorer maintainability.

VIPER

Compared to MV(X), VIPER has the following differences:

The logic processing of the Model is transferred to the Interactor, so Entities have no logic and are purely data storage structures.
==UI-related business logic is handled in the Presenter, while data modification functions are handled in the Interactor==.
VIPER introduces a routing module, Router, to implement inter-module navigation.

How to Stream Video to Mobile Devices Using HTTP? HTTP Live Streaming (HLS)

January 14, 2019 · One min read

Why Is Such a Protocol Needed?

Video playback on mobile devices over HTTP runs into the following issues:

==Limited memory and storage on mobile devices==.
Due to unstable network connections and varying bandwidth, there is a need to ==dynamically adjust video quality during transmission==.

Solutions

Server Side: In a typical setup, encoding hardware receives audio and video input, encodes it as H.264 video and AAC audio, and outputs it as an MPEG-2 Transport Stream.
1. A software multiplexer then splits the raw output stream into a series of short media files (.ts files, typically around 10 seconds each).
2. The multiplexer also maintains an index file (.m3u8 format) that contains a list of all media files.
3. The generated media files and index file are published on a web server.
Client Side: The client reads the index, sequentially requests the necessary media files from the server, and smoothly plays the content of each short media file.
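To make the index file concrete, here is a minimal sketch that renders a media playlist for a list of segments. The function name and the segment file names are hypothetical; the tags used (`#EXTM3U`, `#EXT-X-VERSION`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXTINF`, `#EXT-X-ENDLIST`) are standard HLS playlist tags.

```python
import math


def build_media_playlist(segments, finished=False):
    """Render an .m3u8 media playlist from (uri, duration_seconds) pairs."""
    # The target duration is an integer upper bound on the segment durations.
    target = math.ceil(max(duration for _, duration in segments))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for uri, duration in segments:
        lines.append(f"#EXTINF:{duration:.3f},")  # duration of the next segment
        lines.append(uri)
    if finished:
        # Marks the end of the stream; a live playlist omits this tag
        # so that clients keep polling for new segments.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


playlist = build_media_playlist([("seg0.ts", 10.0), ("seg1.ts", 9.5)], finished=True)
```

A client would fetch this file, then request `seg0.ts` and `seg1.ts` in order. Adaptive quality is usually handled one level up, by a master playlist that lists several such media playlists at different bitrates.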

Architecture

HLS Architecture

Designing Facebook's Photo Storage System

January 14, 2019 · 2 min read

Why Does Facebook Handle Its Own Photo Storage?

Petabyte-scale volume of blob data
Traditional NFS-based designs (where each image is stored as a file) face metadata bottlenecks: massive metadata severely limits metadata hit rates.
- Here are the details:

For photo applications, most metadata, such as image permissions, is useless, wasting storage space. However, the larger overhead is that the metadata of the file must be read from disk into memory to locate the file itself. While this is negligible for small-scale storage, when multiplied by billions of photos and several petabytes of data, accessing metadata becomes a throughput bottleneck.

Solution

By aggregating hundreds of thousands of images into a single Haystack storage file, the metadata burden is eliminated.

Structure

Data Layout

Index file (for quick memory loading) + Haystack storage file containing many images.

Index file layout

Storage file

CRUD Operations

Create: Write to the storage file, then ==asynchronously== write to the index file, as indexing is not a critical step.
Delete: Perform soft deletes by marking the deleted bits in a flag field. Execute hard deletes through compacting operations.
Update: Append only. If the application encounters duplicate keys, it treats the entry with the largest offset as the current one.
Read: read(offset, key, alternate_key, cookie, data_size)
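The operations above can be sketched with an in-memory stand-in for the store file. This is an illustration under stated assumptions, not Haystack's actual on-disk format: the header layout (`key`, `cookie`, `flags`, `size`) and its field widths are made up for the example, and the index is updated synchronously rather than asynchronously.

```python
import io
import struct

# Hypothetical needle header: key (u64), cookie (u32), flags (u8), data size (u32).
HEADER = struct.Struct(">QIBI")
FLAGS_OFFSET = 12  # key (8 bytes) + cookie (4 bytes)
FLAG_DELETED = 1


class HaystackSketch:
    def __init__(self):
        self.store = io.BytesIO()  # stands in for the large store file
        self.index = {}            # key -> offset of the latest needle

    def put(self, key, cookie, data):
        """Create or update: always append, never overwrite in place."""
        offset = self.store.seek(0, io.SEEK_END)
        self.store.write(HEADER.pack(key, cookie, 0, len(data)) + data)
        self.index[key] = offset   # the largest offset wins for duplicate keys
        return offset

    def get(self, key, cookie):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.store.seek(offset)
        _, stored_cookie, flags, size = HEADER.unpack(self.store.read(HEADER.size))
        if flags & FLAG_DELETED or stored_cookie != cookie:
            return None
        return self.store.read(size)

    def delete(self, key):
        """Soft delete: flip the deleted bit; compaction would reclaim space."""
        offset = self.index.get(key)
        if offset is not None:
            self.store.seek(offset + FLAGS_OFFSET)
            self.store.write(bytes([FLAG_DELETED]))
```

One seek plus one read serves a photo, because the offset comes from memory instead of a per-file metadata lookup on disk.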

Use Cases

Upload

Download

iOS Architecture Patterns Revisited

January 10, 2019 · 2 min read

Why bother with architecture?

Answer: to reduce the human-resource cost per feature.

Mobile developers evaluate the architecture in three dimensions.

Balanced distribution of responsibilities among feature actors.
Testability
Ease of use and maintainability

	Distribution of Responsibility	Testability	Ease of Use
Tight-coupling MVC	❌	❌	✅
Cocoa MVC	❌ V and C are coupled	❌	✅⭐
MVP	✅ Separated View Lifecycle	✅	Fair: more code
MVVM	✅	Fair: the View depends on UIKit	Fair
VIPER	✅⭐️	✅⭐️	❌

Tight-coupling MVC

Traditional MVC

For example, in a multi-page web application, the page is completely reloaded when you click a link to navigate somewhere else. The problem is that the View is tightly coupled with both the Controller and the Model.

Cocoa MVC

Apple’s MVC, in theory, decouples View from Model via Controller.

Cocoa MVC

Apple’s MVC in reality encourages ==massive view controllers==. And the view controller ends up doing everything.

Realistic Cocoa MVC

It is hard to test coupled massive view controllers. However, Cocoa MVC is the best architectural pattern regarding the speed of the development.

MVP

In an MVP, Presenter has nothing to do with the life cycle of the view controller, and the View can be mocked easily. We can say the UIViewController is actually the View.

MVC Variant

There is another kind of MVP: the one with data bindings. And as you can see, there is tight coupling between View and the other two.

MVP

MVVM

It is similar to MVP but binding is between View and View Model.

MVVM

VIPER

There are five layers (VIPER View, Interactor, Presenter, Entity, and Routing) instead of three when compared to MV(X). This distributes responsibilities well but the maintainability is bad.

VIPER

Compared to MV(X), VIPER differs as follows:

Model logic is shifted to Interactor and Entities are left as dumb data structures.
==UI related business logic is placed into Presenter, while the data altering capabilities are placed into Interactor==.
It introduces Router for the navigation responsibility.

Key value cache

January 6, 2019 · 3 min read

A KV cache is like a giant hash map. It is used to reduce the latency of data access, typically by

Moving data from slow, cheap media to fast, expensive ones.
Switching the index from tree-based data structures with O(log n) reads and writes to hash-based ones with O(1) reads and writes.

There are various cache policies, such as read-through, write-through (or write-back), and cache-aside. By and large, Internet services have a read-to-write ratio of 100:1 to 1000:1, so we usually optimize for reads.

In distributed systems, we choose those policies according to the business requirements and contexts, under the guidance of CAP theorem.

Regular Patterns

Read
- Read-through: clients read data from the database via the cache layer. The cache returns immediately when the read hits the cache; otherwise, it fetches the data from the database, caches it, and then returns the value.
Write
- Write-through: clients write to the cache and the cache updates the database. The cache returns when it finishes the database write.
- Write-behind / write-back: clients write to the cache, and the cache returns immediately. Behind the cache write, the cache asynchronously writes to the database.
- Write-around: clients write to the database directly, around the cache.

Cache-aside pattern

When a cache does not support native read-through and write-through operations, and the resource demand is unpredictable, we use this cache-aside pattern.

Read: try to hit the cache. If not hit, read from the database and then update the cache.
Write: write to the database first and then ==delete the cache entry==. A common pitfall here is that people mistakenly update the cache with the value, and double writes in a high concurrency environment will make the cache dirty.

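==There are still chances for dirty cache in this pattern.== It happens when these two operations interleave in a race condition:

a reader misses the cache, reads the database, and updates the cache
a writer updates the database and deletes the cache entry

If the reader fetches the old value before the writer's update but writes it to the cache after the writer's delete, the cache ends up holding stale data.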
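The read path, the write path, and the remaining race can be sketched as follows. Plain dictionaries stand in for the database and the cache, and the class name is hypothetical; the last four lines replay the bad interleaving by hand, in a single thread, so the outcome is deterministic.

```python
class CacheAside:
    def __init__(self, database):
        self.db = database  # a dict standing in for the database
        self.cache = {}     # a dict standing in for the cache

    def read(self, key):
        if key in self.cache:        # cache hit
            return self.cache[key]
        value = self.db.get(key)     # cache miss: go to the database
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def write(self, key, value):
        self.db[key] = value         # 1) write the database first
        self.cache.pop(key, None)    # 2) then delete (not update) the cache entry


# Replaying the race: the reader's steps are split around a concurrent write.
db = {"user:1": "old"}
store = CacheAside(db)
stale = store.db.get("user:1")  # reader: cache miss, reads the old value
store.write("user:1", "new")    # writer: updates the database, deletes the entry
store.cache["user:1"] = stale   # reader: finishes late and caches the old value
dirty = store.read("user:1")    # served from the cache while the database holds "new"
```

This window is narrow in practice, because a database read followed by a cache write usually finishes faster than a database write. A TTL on cache entries is a common way to bound how long such a stale entry can live.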

Where to put the cache?

client-side
distinct layer
server-side

What if data volume reaches the cache capacity? Use cache replacement policies

LRU (Least Recently Used): check time, evict the least recently used entries, and keep the most recently used ones.
LFU (Least Frequently Used): check frequency, evict the least frequently used entries, and keep the most frequently used ones.
ARC (Adaptive Replacement Cache): it performs better than LRU. It does so by keeping both the most recently used and the most frequently used entries, as well as a history of evictions. (Keeping MRU + MFU + eviction history.)
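As an illustration of the simplest of these policies, here is a minimal LRU sketch built on `collections.OrderedDict`. The class and method names are hypothetical, and a miss is signalled by returning `None`.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a read counts as a use
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)  # a write counts as a use
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```

With a capacity of 2, putting `a` and `b`, reading `a`, and then putting `c` evicts `b`, because `a` was touched more recently.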

Who is the king of cache usage?

Facebook TAO

Designing Facebook photo storage

January 4, 2019 · 2 min read

Motivation & Assumptions

PB-level Blob storage
Traditional NFS-based design (each image stored as a file) has a metadata bottleneck: the large metadata size severely limits the metadata hit ratio.
- Explain more about the metadata overhead

For the Photos application most of this metadata, such as permissions, is unused and thereby wastes storage capacity. Yet the more significant cost is that the file’s metadata must be read from disk into memory in order to find the file itself. While insignificant on a small scale, multiplied over billions of photos and petabytes of data, accessing metadata is the throughput bottleneck.

Solution

Eliminates the metadata overhead by aggregating hundreds of thousands of images in a single haystack store file.

Architecture

Data Layout

index file (for quick memory load) + haystack store file containing needles.

index file layout

haystack store file

CRUD Operations

Create: write to store file and then ==async== write index file, because index is not critical
Read: read(offset, key, alternate_key, cookie, data_size)
Update: Append only. If the app meets duplicate keys, then it can choose one with largest offset to update.
Delete: soft delete by marking the deleted bit in the flag field. Hard delete is executed by the compact operation.

Usecases

Upload

Download

Designing Uber

January 3, 2019 · 2 min read

Disclaimer: All things below are collected from public sources or purely original. No Uber-confidential stuff here.

Requirements

ride hailing service targeting the transportation markets around the world
realtime dispatch in massive scale
backend design

Architecture

uber architecture

Why micro services?

==Conway's law== says structures of software systems are copies of the organization structures.

	Monolithic ==Service==	Micro Services
Productivity, when teams and codebases are small	✅ High	❌ Low
==Productivity, when teams and codebases are large==	❌ Low	✅ High (Conway's law)
==Requirements on Engineering Quality==	❌ High (under-qualified devs break down the system easily)	✅ Low (runtimes are segregated)
Dependency Bump	✅ Fast (centrally managed)	❌ Slow
Multi-tenancy support / Production-staging Segregation	✅ Easy	❌ Hard (each individual service has to either 1) build staging env connected to others in staging 2) Multi-tenancy support across the request contexts and data storage)
Debuggability, assuming same modules, metrics, logs	❌ Low	✅ High (w/ distributed tracing)
Latency	✅ Low (local)	❌ High (remote)
DevOps Costs	✅ Low (High on building tools)	❌ High (capacity planning is hard)

Combining monolithic ==codebase== and micro services can bring benefits from both sides.

Dispatch Service

consistent hashing sharded by geohash
data is transient, in memory, and thus there is no need to replicate. (CAP: AP over CP)
single-threaded or locked matching in a shard to prevent double dispatching

Payment Service

==The key is to have an async design==, because payment systems usually have a very long latency for ACID transactions across multiple systems.

leverage event queues
payment gateway w/ Braintree, PayPal, Card.io, Alipay, etc.
logging intensively to track everything
APIs with idempotency, exponential backoff, and random jitter

UserProfile Service and Trip Service

low latency with caching
UserProfile Service has the challenge to serve users in increasing types (driver, rider, restaurant owner, eater, etc) and user schemas in different regions and countries.

Push Notification Service

Apple Push Notification service (not quite reliable)
Google Cloud Messaging (GCM), which can detect deliverability, or
SMS service, which is usually more reliable

Designing a KV store with external storage

November 10, 2018 · 2 min read

Requirements

Data size: Data size of values is too large to be held in memory, and we should leverage the external storage for them. However, we can still keep the data keys in memory.
Single-host solution. No distributed design.
Optimize for write.

Solution

In-memory hashmap index + index hint file + data files
Append-only for write optimization. Have only one active data file for write. And compact active data to the older data file(s) for read.

Components

In-memory HashMap<Key, <FileId, ValueOffset, ValueSize, Timestamp>>
Data file layout

|crc|timestamp|key_size|value_size|key|value|
...

(index) hint file that the in-memory hashmap can recover from

Operations

Delete: get the location by the in-memory hashmap, if it exists, then go to the location on the disk to set the value to a magic number.

Get: get the location by the in-memory hashmap, and then go to the location on the disk for the value.

Put: append to the active data file and update the in-memory hash map.
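A minimal single-file sketch of these operations, using the record layout `|crc|timestamp|key_size|value_size|key|value|` from above. Several details are assumptions for illustration: the field widths, the `TOMBSTONE` value, and the class name. Note one deliberate difference from the description: delete here appends a tombstone record instead of overwriting the value in place, which keeps the file strictly append-only.

```python
import os
import struct
import time
import zlib

# crc (u32), timestamp (u64), key_size (u32), value_size (u32); widths are assumptions.
HEADER = struct.Struct(">IQII")
TOMBSTONE = b"\x00__deleted__"  # hypothetical magic value marking a delete


class AppendOnlyKV:
    def __init__(self, path):
        self.file = open(path, "a+b")  # append mode: every write goes to the end
        self.index = {}                # key -> (value_offset, value_size, timestamp)

    def put(self, key: bytes, value: bytes):
        timestamp = time.time_ns()
        body = struct.pack(">QII", timestamp, len(key), len(value)) + key + value
        record = struct.pack(">I", zlib.crc32(body)) + body
        self.file.seek(0, os.SEEK_END)
        start = self.file.tell()
        self.file.write(record)
        self.file.flush()
        # The value begins after the fixed header and the key bytes.
        self.index[key] = (start + HEADER.size + len(key), len(value), timestamp)

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, size, _ = entry
        self.file.seek(offset)
        return self.file.read(size)

    def delete(self, key: bytes):
        if key in self.index:
            self.put(key, TOMBSTONE)  # recorded on disk so recovery sees the delete
            del self.index[key]
```

Reads cost one in-memory lookup plus one disk seek. Older records for the same key stay in the file until compaction removes them.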

Periodical compaction strategies

Copy latest entries: The in-memory hashmap is always up-to-date. Stop and copy into new files. Time complexity is O(n), where n is the number of valid entries.
- Pros: Efficient when many entries are out-dated or deleted.
- Cons: Consumes storage when few entries are out-dated. May double the space. (This can be resolved by having a secondary node do the compaction work with GET/POST periodically, e.g., the Hadoop secondary namenode.)
Scan and move: for each entry, if it is up-to-date, move it to the tail of the validated section. Time complexity is O(n), where n is the number of all entries.
- Pros:
  - shrink the size
  - no extra storage space needed
- Cons:
  - Complex and need to sync hashmap and storage with transactions. May hurt performance.

Follow-up questions

How to detect records that can be compacted?
- Use timestamp.
What if one hashmap cannot fit into a single machine’s memory?
- Consistent hashing, e.g., a Chord DHT. Query time complexity is O(log n) with the finger table, instead of O(1) here with a hashmap.

Designing a URL Shortener System

November 1, 2018 · 5 min read

Design a system that can convert URLs provided by users into short URLs, allowing users to access their original URLs (hereinafter referred to as long URLs) using these short URLs. Describe how this system operates, including but not limited to the following questions: How are short URLs allocated? How is the mapping between short URLs and long URLs stored? How is the redirection service implemented? How is access data stored?

Assumptions: The initial problem description does not include these assumptions. An excellent candidate will ask about system scale when given a specific design.

There are approximately tens of thousands of long URL domains.
The traffic for new long URLs is about 10,000,000 per day (100 per second).
The traffic for the redirection service using short URLs to access long URLs is about 10 billion per day (100,000 per second).
Remind the candidate that these are average figures - during peak times, these numbers can be much higher (one type of peak time is time-related, such as when users return home from work, and another type is event-related, such as during the Spring Festival Gala).
Recent data (e.g., today's data) should be collected in advance and should be available within five minutes when users want to view it.
Historical data should be calculated daily.

Assumptions

1 billion new URLs per day, 100 billion short URL accesses. The shorter the short URL, the better. Data presentation (real-time/daily/monthly/yearly).

URL Encoding

http://blog.codinghorror.com/url-shortening-hashes-in-practice/

Method 1: MD5 (128 bits, 32 hexadecimal digits; by the birthday paradox, collisions are expected after about 2^(n/2) = 2^64 hashes). Shorter? Truncate to 64 bits (16 hexadecimal digits, collisions expected after about 2^32 hashes), or encode in base 64.

Advantages: Hashing is relatively simple and easy to scale horizontally.
Disadvantages: Too long, how to handle expired URLs?

Method 2: Distributed ID generator. (Base 62: a-z, A-Z, 0-9, 62 characters, 62^7), partitioning: each node contains some IDs.

Advantages: Easier to eliminate expired URLs, shorter URLs.
Disadvantages: Coordination between different partitions (e.g., ZooKeeper).
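For Method 2, turning a numeric ID into a short code is a plain base-62 conversion. The sketch below is illustrative; the function names and the alphabet order (digits, then lowercase, then uppercase) are assumptions, and any fixed order of the 62 characters works.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters


def encode_base62(number: int) -> str:
    """Convert a non-negative integer ID into its base-62 short code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))  # most significant digit first


def decode_base62(code: str) -> int:
    """Convert a short code back into the integer ID."""
    number = 0
    for char in code:
        number = number * 62 + ALPHABET.index(char)
    return number
```

Seven characters cover 62^7 = 3,521,614,606,208 IDs, roughly 3.5 trillion, before an eighth character is needed.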

Key-Value (KV) Storage

MySQL (10k requests per second, slow, no need for a relational database), key-value (100k requests per second, Redis, Memcached).

An excellent candidate will ask about the expected lifespan of short URLs and design a system that can automatically clean up expired short URLs.

Follow-Up

Question: How to generate short URLs?

A poor candidate might suggest using a single ID generator (single point of failure) or require coordination between ID generators for each ID generation. For example, using an auto-increment primary key in a database.
An acceptable candidate might suggest using md5 or some UUID generators that can generate IDs independently on some nodes. These methods can generate non-colliding IDs in a distributed system, allowing for the production of a large number of short URLs.
An excellent candidate will design a method using several ID generators, where each generator first reserves a block of ID sequences from a central coordinator (e.g., ZooKeeper), and these ID generators can allocate IDs from their ID sequences independently, cleaning up their ID sequences when necessary.
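The block-reservation approach from the last answer can be sketched as follows. The `Coordinator` class is an in-process stand-in for a real coordinator such as ZooKeeper, and all names and the block size are hypothetical; the point is only that generators contact the coordinator once per block, not once per ID.

```python
import threading


class Coordinator:
    """In-process stand-in for a central coordinator (e.g., ZooKeeper)."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self._next_start = 0
        self._lock = threading.Lock()

    def reserve_block(self):
        # The only step that needs coordination: hand out disjoint ranges.
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
        return range(start, start + self.block_size)


class IdGenerator:
    """Allocates IDs locally from a reserved block, refilling when it runs out."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self._block = iter(())  # empty until the first reservation

    def next_id(self):
        for value in self._block:
            return value  # fast path: no coordination needed
        self._block = iter(self.coordinator.reserve_block())
        return next(self._block)
```

If a generator crashes, the unused remainder of its block is simply lost. That leaves gaps in the ID space but never produces duplicates.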

Question: How to store the mapping between long URLs and short URLs?

A poor candidate might suggest using a single, non-distributed, non-relational database. It is merely a simple key-value database.
An excellent candidate will suggest using a simple distributed storage system, such as MongoDB/HBase/Voldemort, etc.
A more excellent candidate will ask about the expected usage cycle of short URLs and then design a system that can clean up expired short URLs.

Question: How to implement the redirection service?

A poor candidate will design the system from scratch to solve problems that have already been solved.
An excellent candidate will suggest using an existing HTTP server with a plugin to translate the short URL ID, look up this ID in the database, update access data, return a 303 status, and redirect to the long URL. Existing HTTP servers include Apache/Jetty/Netty/Tomcat, etc.

Question: How to store access data?

A poor candidate will suggest writing to the database on every access.
An excellent candidate will suggest having several different components handle this task: generating access stream data, collecting and organizing it, and writing it to a permanent database after a certain period.

Question: How to separate the different components of storing access data proposed by the excellent candidate?

An excellent candidate will suggest using a low-latency information system to temporarily store access data and then hand the data over to the collection and organization component.
The candidate may ask how often access data needs to be updated. If updated daily, a reasonable method would be to store it in HDFS and use MapReduce to compute the data. If near-real-time data is required, the collection and organization component must compute the necessary data.

Question: How to block access to restricted websites?

An excellent candidate will suggest maintaining a blacklist of domains in the key-value database.
A good candidate might propose some advanced technologies that can be used when the system scales significantly, such as bloom filters.
