3 Programming Paradigms
Structured programming is a discipline imposed upon the direct transfer of control. OO programming is a discipline imposed upon the indirect transfer of control. Functional programming is a discipline imposed upon variable assignment.
4 Kinds of No-SQL
When reading data from a hard disk, a database join operation is time-consuming and 99% of the time is spent on disk seek. To optimize read performance, denormalization is introduced and four categories of NoSQL are here to help.
A Closer Look at iOS Architecture Patterns
Why Should We Care About Architecture?
ACID vs BASE
ACID and BASE reflect different design philosophies. ACID favors consistency over availability. In ACID, the C means that a transaction preserves all the database rules. Meanwhile, BASE favors availability, indicating that the system is guaranteed to be available.
Authentication and Authorization in Microservices
Design an auth solution that starts simple but can scale with the business, consider both security and user experience, and discuss future trends in this area.
B tree vs. B+ tree
A B+ tree can be seen as a B tree in which each internal node contains only keys, with the data stored in the leaf nodes. The main advantage of a B+ tree is fewer cache misses. In a B tree, the data is associated with each key and can be accessed more quickly.
Bloom Filter
A bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way. A query returns either "possibly in set" or "definitely not in set".
Bloom Filter
A Bloom filter is a data structure that is used to determine whether an element is a member of a set with a much higher space and time efficiency than other general algorithms. The results obtained using a Bloom filter may yield false positive matches, but cannot yield false negative matches. Elements can be added to the set, but cannot be removed; the more elements added to the set, the greater the likelihood of false positives.
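The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name `BloomFilter`, the default sizes, and the trick of salting SHA-256 with an index to get several hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; sizes and hashing scheme are illustrative."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))
```

There is no `remove` method because clearing a bit could also erase evidence of other elements that hash to the same position, which is why elements cannot be removed from a plain Bloom filter.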
Cloud Design Patterns
There are three types of cloud design patterns. Availability patterns have health endpoint monitoring and throttling. Data management patterns have cache-aside and static content hosting. Security patterns have federated identity.
Concurrency Models
Five concurrency models you may want to know: Single-threaded; Multiprocessing and lock-based concurrency; Communicating Sequential Processes (CSP); Actor Model (AM); Software Transactional Memory (STM).
Credit Card Processing System
How is your credit card processed? 5 Parties and 2 workflows.
Data Partition and Routing
The advantages of implementing data partition and routing are availability and read efficiency while consistency is the weakness. The routing abstract model is essentially two maps: key-partition map and partition-machine map.
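The two-map routing model can be sketched as follows. The partition count, the machine names, and the use of hash-modulo for the key-partition map are assumptions chosen for illustration; real systems may use range partitioning or consistent hashing instead.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed fixed number of partitions

# partition-machine map: which machine currently hosts each partition
# (machine names are hypothetical)
partition_to_machine = {p: f"machine-{p % 3}" for p in range(NUM_PARTITIONS)}


def key_to_partition(key: str) -> int:
    """key-partition map, implemented here as hash(key) mod NUM_PARTITIONS."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def route(key: str) -> str:
    """Compose the two maps to find the machine responsible for a key."""
    return partition_to_machine[key_to_partition(key)]
```

Keeping the maps separate means a partition can move to another machine by updating only `partition_to_machine`, without rehashing any keys.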
Design Pinterest
Designing a KV store with external storage
Requirements
Designing a Load Balancer
Internet services often need to handle traffic from around the world, but a single server can only serve a limited number of requests at the same time. Therefore, we typically have a server cluster to collectively manage this traffic. The question arises: how can we evenly distribute this traffic across different servers?
Designing a Load Balancer or Dropbox Bandaid
Large-scale web services deal with high-volume traffic, but a single host can serve only a limited number of requests. There is usually a server farm to handle the traffic together. How do we route requests so that each host receives an even share?
Designing a metric system
Requirements
Designing a URL shortener
If you are asked to design a system to take user-provided URLs and transform them to shortened URLs, what would you do? How would you allocate the shorthand URLs? How would you implement the redirect servers? How would you store the click stats?
Designing a URL Shortener System
Design a system that can convert URLs provided by users into short URLs, allowing users to access their original long URLs using these short URLs. The operation of the system should include, but not be limited to, the following questions: How to allocate short URLs? How to store the mapping between short URLs and long URLs? How to implement the redirection service? How to store access data?
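One common way to answer the allocation and mapping questions is to give each long URL an auto-incrementing integer ID and encode that ID in base 62. The sketch below assumes that approach; the `UrlShortener` class and its in-memory dictionary are hypothetical stand-ins for a real ID generator and database, and the summary above does not state which scheme the article itself uses.

```python
import string

# 62 symbols: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a short base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(s: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


class UrlShortener:
    """In-memory stand-in for the ID allocator and the mapping store."""

    def __init__(self) -> None:
        self._next_id = 1
        self._long_by_short = {}

    def shorten(self, long_url: str) -> str:
        short = encode_base62(self._next_id)
        self._next_id += 1
        self._long_by_short[short] = long_url
        return short

    def resolve(self, short: str):
        # A redirect server would answer with an HTTP redirect to this URL.
        return self._long_by_short.get(short)
```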
Designing Airbnb or a hotel booking system
For guests and hosts, we store data with a relational database and build indexes to search by location, metadata, and availability. We can use external vendors for payment and remind the reservations with a priority queue.
Designing Facebook photo storage
The traditional NFS-based design has a metadata bottleneck: the large metadata size limits the metadata hit ratio. Facebook photo storage eliminates the metadata by aggregating hundreds of thousands of images in a single haystack store file.
Designing Facebook's Photo Storage System
There are two reasons why Facebook handles photo storage: the petabyte-scale volume of blob data; traditional NFS-based designs face metadata bottlenecks, where massive metadata severely limits hit rates. The solution is to aggregate hundreds of thousands of images into a single Haystack storage file, thereby eliminating the metadata burden.
Designing Human-Centric Internationalization (i18n) Engineering Solutions
Most products from Silicon Valley companies target the global market, and internationalization is a strategic key for multinational companies venturing into this market. The i18n engineering solution we designed primarily addresses three major issues in the development process of websites and mobile apps: 1. Language, 2. Time and Time Zones, 3. Numbers and Currency. Like all software system development, there is no silver bullet for internationalization; great works are crafted through diligent foundational work.
Designing Memcached or an in-memory KV store
Memcached = rich client + distributed servers + hash table + LRU. It features a simple server (pushing complexity to the client) and is hence reliable and easy to deploy.
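The "hash table + LRU" part of that formula can be sketched with Python's `collections.OrderedDict`. This is only an illustration of the eviction policy on a single node; the `LRUCache` name and its interface are assumptions, and it does not reflect Memcached's actual slab allocator or wire protocol.

```python
from collections import OrderedDict


class LRUCache:
    """Hash table with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```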
Designing Online Judge or Leetcode
An online judge is primarily a place where you can execute code remotely for educational or recruitment purposes. In this design, we focus on designing an OJ for interview preparation like Leetcode.
Designing payment webhook
Design a webhook that notifies the merchant when the payment succeeds. We need to aggregate the metrics (e.g., success vs. failure) and display them on the dashboard.
Designing Smart Notification of Stock Price Changes
Requirements
Designing Square Cash or PayPal Money Transfer System
Design a money-transfer backend system that can receive, send, and pay out. It should cover issues like scaling, internationalization, deduplication, single points of failure, and strong consistency.
Designing Stock Exchange
Requirements
Designing typeahead search or autocomplete
How to design a realtime typeahead autocomplete service? LinkedIn's Cleo library answers with a multi-layer architecture (browser cache / web tier / result aggregator / various typeahead backends) and 4 elements (inverted / forward index, bloom filter, scorer).
Designing Uber
Disclaimer: All things below are collected from public sources or purely original. No Uber-confidential stuff here.
Designing Uber Ride-Hailing Service
Requirements for designing Uber ride-hailing: providing services for the global transportation market; large-scale real-time scheduling; backend design; Uber ride-hailing design process: architecture; microservices; scheduling services; payment services; user profile services and trip record services, notification push services.
Enterprise Authorization Services 2022
Authorization determines whether an individual or system has the right to access a particular resource. And this process is a typical scenario that could be automated with software. We will review Google's Zanzibar, Zanzibar-inspired solutions and other AuthZ services on the market.
Experience Deep Dive
For those who had little experience in leadership positions, we have some tips for interviews. It is necessary to describe your previous projects including challenges or improvements. Also, remember to demonstrate your communication skills.
Fraud Detection with Semi-supervised Learning
Fraud detection fights against account takeovers and botnet attacks during login. Semi-supervised learning has better learning accuracy than unsupervised learning and costs less time and money than supervised learning.
How Does Facebook Store a Large-Scale Social Graph? TAO
The efficiency of updating the edge list of the social graph in Memcached is too low, the logic for managing the cache on the client side is complex, and it is difficult to maintain consistency in database reads after writes. How to solve these three problems: accelerate read operations, efficiently handle large-scale reads; complete write operations in a timely manner; improve the availability of read operations.
How Does Facebook Scale Its Social Graph Store? TAO
Before TAO, Facebook used the cache-aside pattern to scale its social graph store. There were three problems: list update operations were inefficient; clients had to manage the cache; and it was hard to offer read-after-write consistency. With TAO, these problems are solved.
How Netflix Serves Viewing Data?
Motivation
How to Build a Scalable Web Service?
How to build a scalable web service? One word: Split. The AKF Scale Cube tells us the three dimensions of "splitting": horizontal scaling; business splitting; data partitioning.
How to design robust and predictable APIs with idempotency?
APIs can be unreliable and unpredictable. To solve the problem, three principles should be observed: the client retries to ensure consistency; retries are idempotent; and retries use exponential backoff with random jitter.
How to Design Robust and Predictable APIs with Idempotency?
Why are APIs unreliable? Networks can fail, and servers can fail. Three principles to solve this problem: the client uses "retry" to ensure state consistency; the retry requests must include an idempotent unique ID; retries must be responsible, such as following an exponential backoff algorithm, because we do not want a large number of clients to retry simultaneously.
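The three principles can be sketched as a small client-side helper. Everything named here is hypothetical: `send` stands for whatever function performs the request, `TransientError` stands for a retryable network or server failure, and the base delay, cap, and "full jitter" formula are illustrative choices rather than values taken from the article.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Stands in for a network or server failure worth retrying."""


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retry(send, payload, max_attempts: int = 5, sleep=time.sleep):
    """Retry `send`, reusing one idempotency key across every attempt."""
    idempotency_key = str(uuid.uuid4())  # generated once, never per attempt
    for attempt in range(max_attempts):
        try:
            return send(payload, idempotency_key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(backoff_delay(attempt))
```

Because the key stays the same across attempts, a server that records processed keys can recognize a retry and return the earlier result instead of performing the operation twice. The random jitter spreads retries out so that many clients do not retry at the same instant.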
How to Design the Architecture of a Blockchain Server?
A distributed blockchain accounting and smart contract system. It requires minimal trust between nodes while incentivizing them to cooperate: transactions are irreversible, do not rely on trusted third parties, protect privacy, disclose minimal information, and ensure that money cannot be spent twice. Assuming performance is not an issue, we will not consider how to optimize performance.
How to scale a web service?
AKF scale cube visualizes the scaling process into three dimensions…
How to stream video over HTTP for mobile devices? HTTP Live Streaming (HLS)
Video service over HTTP for mobile devices has two problems: limited memory or storage, and unstable network connections with variable bandwidth. HTTP Live Streaming solves these with separation of concerns, file segmentation, and indexing.
How to Stream Video to Mobile Devices Using HTTP? HTTP Live Streaming (HLS)
Mobile video playback services using HTTP Live Streaming face two main challenges: limited memory and storage on mobile devices; and the need to dynamically adjust video quality during transmission due to unstable network connections and varying bandwidth. We can address these issues at both the server and client levels.
How to write solid code?
Empathy plays the most important role in writing solid code. Besides, you need to choose a sustainable architecture to decrease human resource costs in total as the project scales. Then, adopt patterns and best practices; avoid anti-patterns. Finally, refactor if necessary.
Improving availability with failover
There are several ways to improve availability with failover, such as cold standby, hot standby, warm standby, checkpointing, and all-active.
Improving System Availability through Failover
Failover: Failover is a backup operational mode used to enhance system stability and availability. When the primary component fails or is scheduled for downtime, the functions of system components (such as processors, servers, networks, or databases) are transferred to secondary system components.
Intro to Relational Database
The relational database is the default choice for most storage use cases, by reason of atomicity, consistency, isolation, and durability. How is consistency here different from the one in CAP theorem? Why do we need 3NF and DB proxy?
Introduction to Architecture
Architecture serves the full lifecycle of the software system to make it easy to understand, develop, test, deploy and operate. The O’Reilly book Software Architecture Patterns gives a simple but effective introduction to five fundamental architectures.
Introduction to Architecture
Architecture serves the entire lifecycle of software systems, making them easy to understand, develop, test, deploy, and operate, with the goal of minimizing the human resource costs for each business use case. O'Reilly's
iOS Architecture Patterns Revisited
Architecture can directly impact costs per feature. Let's compare Tight-coupling MVC, Cocoa MVC, MVP, MVVM, and VIPER in three dimensions: balanced distribution of responsibility among feature actors, testability and ease of use and maintainability.
Key value cache
The key-value cache is used to reduce the latency of data access. What are the read-through, write-through, write-behind, write-back, and cache-aside patterns?
Lambda Architecture
Lambda architecture = CQRS (batch layer + serving layer) + speed layer. It solves accuracy, latency, throughput problems of big data.
Lambda Architecture
Using Lambda can address three issues brought by big data: accuracy (good); latency (fast); throughput (high). The lambda architecture can guide us on how to scale a data system.
Load Balancer Types
Load balancers usually fall into three categories: DNS round robin, network load balancer, and application load balancer. DNS round robin is rarely used because it is hard to control and not responsive. The network load balancer has better granularity and is simple and responsive.
Lyft's Marketing Automation Platform -- Symphony
To achieve a higher ROI in advertising, Lyft launched a marketing automation platform, which consists of three main components: lifetime value forecaster, budget allocator, and bidders.
Lyft's Marketing Automation Platform Symphony
How can advertising campaigns achieve higher returns with less money and fewer people? Lyft's answer is automation, which includes an LTV prediction module, a budget allocation module, and a delivery module. When people are freed from tedious delivery tasks and can focus on understanding users, channels, and the messages they need to convey to their audience, they can achieve better campaign results.
Past Work Experience Interview
Target Audience
Public API Choices
There are several tools for the public API, API gateway or Backend for Frontend gateway. GraphQL distinguishes itself from others with features like tailoring results, batching nested queries, performance tracing, and explicit caching.
Quick Intro to Optimism Architecture
What is Optimism? What does its architecture look like? What are its building components and how do they interact with each other?
Replica, Consistency, and CAP theorem
Any networked system has three desirable properties: consistency, availability and partition tolerance. Systems can have only two of those three. For example, RDBMS prefers consistency and partition tolerance and becomes an ACID system.
Skip List
A skip list is essentially a linked list that allows for binary search. It achieves this by adding extra nodes that enable you to "skip" parts of the linked list. Given a random number generator to create these extra nodes, a skip list has O(log n) complexity for search, insert, and delete operations.
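A compact sketch of the idea follows. The class names, the maximum level of 16, and the promotion probability of 0.5 are assumptions for illustration. The O(log n) bound for search, insert, and delete holds in expectation, since it depends on the random level choices.

```python
import random


class _Node:
    def __init__(self, key, level: int) -> None:
        self.key = key
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 16  # assumed cap on tower height

    def __init__(self, seed=None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self._level = 1  # number of levels currently in use

    def _random_level(self) -> int:
        # Each extra level is added with probability 1/2.
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        """Per level, the last node whose key is strictly less than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):  # top level down to level 0
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]  # skip ahead on this level
            update[i] = node
        return update

    def contains(self, key) -> bool:
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key) -> None:
        update = self._predecessors(key)
        nxt = update[0].forward[0]
        if nxt is not None and nxt.key == key:
            return  # keys are unique; ignore duplicates
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, level)
        for i in range(level):  # splice the new node into each of its levels
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key) -> bool:
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):  # unlink from each level
            update[i].forward[i] = target.forward[i]
        return True

    def __iter__(self):
        node = self._head.forward[0]  # level 0 is an ordinary sorted linked list
        while node is not None:
            yield node.key
            node = node.forward[0]
```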
Skiplist
A skip list is essentially a linked list that supports binary search. It accomplishes this by adding extra nodes that enable you to "skip" sections of the linked list. LevelDB's MemTable, Redis's Sorted Set, and Lucene's inverted index all use it.
SOLID Design Principles
SOLID is an acronym of design principles that help software engineers write solid code. S is for single responsibility principle, O for open/closed principle, L for Liskov’s substitution principle, I for interface segregation principle and D for dependency inversion principle.
Stream and Batch Processing Frameworks
Stream and Batch processing frameworks can process high throughput at low latency. Why is Flink gaining popularity? And how to make an architectural choice among Storm, Storm-trident, Spark, and Flink?
Stream and Batch Processing Frameworks
Why Do We Need Such Frameworks?
Toutiao Recommendation System: P1 Overview
In order to evaluate user satisfaction, machine learning models are implemented. These models observe and measure the reality by feature engineering and further reduce latencies by recall strategy.
Toutiao Recommendation System: P1 Overview
What are we optimizing for? User Satisfaction
Toutiao Recommendation System: P2 Content Analysis
Content analysis and data mining of user tags are the cornerstones of the recommendation system. Content analysis derives intermediate data from raw articles and user behaviors. With content analysis, we are able to tag users, and to recommend and prepare content.
What are the use cases for key-value caching?
The essence of Key Value Cache is to reduce data access latency. Common strategies for cache design include read-through/write-through and cache aside. The specific strategy should be chosen based on your business needs.
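As an illustration of the cache-aside strategy, the sketch below keeps the application in charge of both the cache and the database. The `CacheAside` class is hypothetical, plain dictionaries stand in for the real cache and database, and invalidate-on-write is one common variant rather than the only option.

```python
class CacheAside:
    """Application-managed cache in front of a slower source of truth."""

    def __init__(self, database: dict) -> None:
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for the key-value cache
        self.db_reads = 0         # counts cache misses that hit the database

    def read(self, key):
        if key in self.cache:     # cache hit: no database access
            return self.cache[key]
        self.db_reads += 1        # cache miss: load from the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def write(self, key, value) -> None:
        self.database[key] = value
        self.cache.pop(key, None)  # invalidate so the next read reloads
```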
What can we communicate in soft skills interview?
An interview is a process for workers to find future co-workers. The candidate will be evaluated based on answers to three key questions: capability, willingness, and culture fit. None of these questions can be answered without good communication.
What Can You Discuss in a Soft Skills Interview?
Without the ability to express oneself at the same skill level, job opportunities can be taken away. The essence of an interview revolves around three questions: Can you do it or not; Do you want to do it or not; Are you a good fit or not. The five discussion points in an interview are: Adversity; Influence; Technical proficiency; Fit; Achievements. How to prepare for these five discussion points: Engage with more people, accumulate experiences, learn more technical skills, and be good at research.
What is Apache Kafka?
Apache Kafka is a distributed streaming platform that can be used for logging by topic, as a messaging system, for geo-replication, or for stream processing. It is much faster than other platforms due to its zero-copy technology.
What is Apache Kafka?
Apache Kafka is a distributed streaming platform. Its features include a distributed publish-subscribe (pub-sub) messaging system that simplifies N^2 relationships into N, allowing publishers and subscribers to operate at their own rates; ultra-fast zero-copy technology; and support for fault-tolerant data persistence.