49 docs tagged with "system design"

3 Programming Paradigms

Structured programming is a discipline imposed upon the direct transfer of control. OO programming is a discipline imposed upon the indirect transfer of control. Functional programming is a discipline imposed upon variable assignment.

4 Kinds of No-SQL

When reading data from a hard disk, a database join operation is time-consuming and 99% of the time is spent on disk seek. To optimize read performance, denormalization is introduced and four categories of NoSQL are here to help.

ACID vs BASE

ACID and BASE represent different design philosophies. ACID favors consistency over availability. In ACID, the C means that a transaction preserves all the database rules. BASE, by contrast, favors availability: the system is guaranteed to be available.

B tree vs. B+ tree

A B+ tree can be seen as a B tree in which each internal node contains only keys. The main advantage of a B+ tree is fewer cache misses. In a B tree, the data is stored with each key and can be accessed more quickly.

Bloom Filter

A bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way. A query returns either "possibly in set" or "definitely not in set".
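
The two possible answers can be illustrated with a minimal sketch. The class name, the bit-array size, the number of hashes, and the use of salted SHA-256 as the hash family are all assumptions made for illustration, not details from the article.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k hash functions (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("alice")
print(bf.might_contain("alice"))
```

An added item always reports "possibly in set"; an item that was never added usually reports "definitely not in set", with a false-positive rate that depends on the sizes chosen.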

Cloud Design Patterns

There are three types of cloud design patterns. Availability patterns include health endpoint monitoring and throttling. Data management patterns include cache-aside and static content hosting. Security patterns include federated identity.

Concurrency Models

Five concurrency models you may want to know: Single-threaded; Multiprocessing and lock-based concurrency; Communicating Sequential Processes (CSP); Actor Model (AM); Software Transactional Memory (STM).

Data Partition and Routing

The advantages of data partitioning and routing are availability and read efficiency, while consistency is the weakness. The abstract routing model is essentially two maps: a key-partition map and a partition-machine map.
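
The two maps can be sketched roughly as follows. The partition count, the host names, and the choice of hashing the key with MD5 are hypothetical and only illustrate the shape of the model.

```python
import hashlib

NUM_PARTITIONS = 8  # assumption: a fixed number of logical partitions

# Partition-machine map (hypothetical host names).
partition_to_machine = {p: f"host-{p % 3}" for p in range(NUM_PARTITIONS)}


def key_to_partition(key: str) -> int:
    """Key-partition map: here a hash of the key modulo the partition count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def route(key: str) -> str:
    """Resolve a key to a machine by composing the two maps."""
    return partition_to_machine[key_to_partition(key)]


print(route("user:42"))
```

Keeping the maps separate means a partition can be moved to another machine by updating only the partition-machine map, without rehashing any keys.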

Designing a Load Balancer or Dropbox Bandaid

Large-scale web services deal with high-volume traffic, but one host can serve only a limited number of requests. There is usually a server farm that takes the traffic together. How do we route requests so that each host receives an even share?

Designing a URL shortener

If you are asked to design a system to take user-provided URLs and transform them to shortened URLs, what would you do? How would you allocate the shorthand URLs? How would you implement the redirect servers? How would you store the click stats?

Designing Airbnb or a hotel booking system

For guests and hosts, we store data in a relational database and build indexes to search by location, metadata, and availability. We can use external vendors for payment and send reservation reminders with a priority queue.

Designing Facebook photo storage

The traditional NFS-based design has a metadata bottleneck: large metadata size limits the metadata hit ratio. Facebook photo storage eliminates the metadata by aggregating hundreds of thousands of images in a single haystack store file.

Designing Online Judge or Leetcode

An online judge is primarily a place where you can execute code remotely for educational or recruitment purposes. In this design, we focus on designing an OJ for interview preparation like Leetcode.

Designing payment webhook

Design a webhook that notifies the merchant when the payment succeeds. We need to aggregate the metrics (e.g., success vs. failure) and display them on the dashboard.

Designing typeahead search or autocomplete

How to design a realtime typeahead autocomplete service? LinkedIn's Cleo lib answers with a multi-layer architecture (browser cache / web tier / result aggregator / various typeahead backends) and 4 elements (inverted / forward index, bloom filter, scorer).

Designing Uber

Disclaimer: All things below are collected from public sources or purely original. No Uber-confidential stuff here.

Enterprise Authorization Services 2022

Authorization determines whether an individual or system has the right to access a particular resource. And this process is a typical scenario that could be automated with software. We will review Google's Zanzibar, Zanzibar-inspired solutions and other AuthZ services on the market.

Experience Deep Dive

For those who had little experience in leadership positions, we have some tips for interviews. It is necessary to describe your previous projects including challenges or improvements. Also, remember to demonstrate your communication skills.

Fraud Detection with Semi-supervised Learning

Fraud detection fights against account takeovers and botnet attacks during login. Semi-supervised learning has better learning accuracy than unsupervised learning and requires less time and cost than supervised learning.

How Does Facebook Scale Its Social Graph Store? TAO

Before TAO, Facebook used the cache-aside pattern to scale its social graph store. There were three problems: list update operations are inefficient; clients have to manage the cache; and it is hard to offer read-after-write consistency. With TAO, these problems are solved.

Improving availability with failover

There are several ways to improve availability with failover, such as cold standby, hot standby, warm standby, checkpointing, and all active.

Intro to Relational Database

The relational database is the default choice for most storage use cases because of its atomicity, consistency, isolation, and durability. How is consistency here different from the one in the CAP theorem? Why do we need 3NF and a DB proxy?

Introduction to Architecture

Architecture serves the full lifecycle of the software system to make it easy to understand, develop, test, deploy and operate. The O’Reilly book Software Architecture Patterns gives a simple but effective introduction to five fundamental architectures.

iOS Architecture Patterns Revisited

Architecture can directly impact costs per feature. Let's compare Tight-coupling MVC, Cocoa MVC, MVP, MVVM, and VIPER in three dimensions: balanced distribution of responsibility among feature actors, testability and ease of use and maintainability.

Key value cache

The key-value cache is used to reduce the latency of data access. What are the read-through, write-through, write-behind, write-back, and cache-aside patterns?
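
As one illustration, the cache-aside pattern can be sketched as below. The class name is hypothetical, a plain dict stands in for the database, and invalidating the cached entry on write is one common variant rather than the only option.

```python
class CacheAside:
    """Cache-aside sketch: the application manages the cache itself."""

    def __init__(self, store: dict) -> None:
        self.store = store  # stands in for the database (assumption)
        self.cache = {}     # in-process cache

    def read(self, key):
        # Cache hit: answer from the cache.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: load from the store, then populate the cache.
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value) -> None:
        # Update the store first, then invalidate the cached copy.
        self.store[key] = value
        self.cache.pop(key, None)


db = {"a": 1}
kv = CacheAside(db)
print(kv.read("a"))
```

In a read-through or write-through design, the same load and update logic would instead live inside the cache layer rather than in the application code.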

Lambda Architecture

Lambda architecture = CQRS (batch layer + serving layer) + speed layer. It solves accuracy, latency, throughput problems of big data.

Load Balancer Types

Load balancers usually fall into three categories: DNS round robin, network load balancers, and application load balancers. DNS round robin is rarely used because it is hard to control and not responsive. The network load balancer has better granularity and is simple and responsive.

Public API Choices

There are several tools for the public API, API gateway or Backend for Frontend gateway. GraphQL distinguishes itself from others with features like tailoring results, batching nested queries, performance tracing, and explicit caching.

Replica, Consistency, and CAP theorem

Any networked system has three desirable properties: consistency, availability and partition tolerance. Systems can have only two of those three. For example, RDBMS prefers consistency and partition tolerance and becomes an ACID system.

Skiplist

A skip list is essentially a linked list that allows you to do a binary search on it. It accomplishes this by adding extra nodes that enable you to "skip" sections of the linked list. LevelDB MemTable, Redis SortedSet, and the Lucene inverted index all use it.
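
A minimal sketch of the idea follows, assuming a fixed maximum of 8 levels and a coin flip to decide each node's height. The class and method names are hypothetical and are not taken from LevelDB, Redis, or Lucene.

```python
import random


class Node:
    def __init__(self, key, level: int) -> None:
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 8  # assumption: enough levels for a small example

    def __init__(self) -> None:
        self.head = Node(None, self.MAX_LEVEL)  # sentinel; its key is never compared

    def _random_level(self) -> int:
        # Each extra level is granted with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def insert(self, key) -> None:
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        # Walk from the top level down, remembering the last node before key.
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        new_node = Node(key, self._random_level())
        # Splice the new node into every level it participates in.
        for i in range(len(new_node.forward)):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def contains(self, key) -> bool:
        node = self.head
        # Higher levels skip over long runs; lower levels refine the position.
        for i in reversed(range(self.MAX_LEVEL)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        candidate = node.forward[0]
        return candidate is not None and candidate.key == key


sl = SkipList()
for k in [5, 1, 9, 3]:
    sl.insert(k)
print(sl.contains(3), sl.contains(4))
```

The search visits the sparse upper levels first and drops down only when the next key would overshoot, which is what gives the expected logarithmic lookup cost.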

SOLID Design Principles

SOLID is an acronym of design principles that help software engineers write solid code. S is for single responsibility principle, O for open/closed principle, L for Liskov’s substitution principle, I for interface segregation principle and D for dependency inversion principle.

Stream and Batch Processing Frameworks

Stream and Batch processing frameworks can process high throughput at low latency. Why is Flink gaining popularity? And how to make an architectural choice among Storm, Storm-trident, Spark, and Flink?

Toutiao Recommendation System: P1 Overview

In order to evaluate user satisfaction, machine learning models are implemented. These models observe and measure the reality by feature engineering and further reduce latencies by recall strategy.

What can we communicate in soft skills interview?

An interview is a process for workers to find future co-workers. The candidate will be evaluated based on answers to three key questions: capability, willingness, and culture-fit. None of these questions can be answered without good communication.

What is Apache Kafka?

Apache Kafka is a distributed streaming platform that can be used for logging by topics, messaging, geo-replication, or stream processing. It is much faster than other platforms due to its zero-copy technology.