
76 posts tagged with "system design"


Toutiao Recommendation System: P2 Content Analysis

· 3 min read

In Toutiao Recommendation System: P1 Overview, we learned that content analysis and the data mining of user tags are the cornerstones of the recommendation system.

What is content analysis?

content analysis = deriving intermediate data from raw articles and user behaviors.

Take articles for example. To model user interests, we need to tag content and articles. To associate a user with the interests of the “Internet” tag, we need to know whether the user reads articles with the “Internet” tag.

Why are we analyzing raw data?

We do it for these reasons:

  1. Tagging users (user profile)
    • Tagging users who liked articles with the “Internet” tag. Tagging users who liked articles with the “xiaomi” tag.
  2. Recommending content to users by tags
    • Pushing “meizu” content to users with the “meizu” tag. Pushing “dota” content to users with the “dota” tag.
  3. Grouping content by topics
    • Putting “Bundesliga” articles into the “Bundesliga” topic. Putting “diet” articles into the “diet” topic.

Case Study: Analysis Result of an Article

Here is an example of an “article features” page, showing features like categorizations, keywords, topics, and entities.

Analysis Result of an Article

Analysis Result of an Article: Details

What are the article features?

  1. Semantic Tags: humans predefine these tags with explicit meanings.

  2. Implicit Semantics, including topics and keywords. Topic features describe the statistical distribution of words, while keywords are generated by certain rules.

  3. Similarity. Duplicate recommendations were once the most severe feedback we received from our customers.

  4. Time and location.

  5. Quality. Is the content abusive, porn, ads, or “chicken soup for the soul”?

Article features are important

  • It is not true that a recommendation system cannot work at all without article features: Amazon, Walmart, and Netflix can recommend by collaborative filtering.
  • However, in a news product, users consume content published the same day. Bootstrapping without article features is hard, and collaborative filtering cannot help with bootstrapping.
    • The finer the granularity of the article features, the better the ability to bootstrap.

More on Semantic Tags

We divide features of semantic tags into three levels:

  1. Categorizations: used in the user profile, filtering content in topics, recommendation recall, and recommendation features
  2. Concepts: used in filtering content in topics, searching tags, and recommendation recall (like)
  3. Entities: used in filtering content in topics, searching tags, and recommendation recall (like)

Why divide them into different levels? So that they can capture articles at different granularities.

  1. Categorizations: full coverage, low accuracy.
  2. Concepts: medium coverage, medium accuracy.
  3. Entities: low coverage, high accuracy. They only cover popular people, organizations, and products in each area.

Categorizations and concepts share the same technical infrastructure.

Why do we need semantic tags?

  • Implicit semantics
    • have been functioning well.
    • cost much less than semantic tags.
  • But topics and interests need a clearly defined tagging system.
  • Semantic tags also evaluate a company's capability in NLP technology.

Document classification

Classification hierarchy (from root to leaves):

  1. Root
  2. Science, sports, finance, entertainment
  3. Football, tennis, table tennis, track and field, swimming
  4. International, domestic
  5. Team A, team B

Classifiers:

  • SVM
  • SVM + CNN
  • SVM + CNN + RNN
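
As an illustration only, here is a minimal sketch of a first-level classifier in Python, assuming scikit-learn and a toy two-document corpus; the production SVM + CNN + RNN ensemble is not public.

```python
# A minimal sketch of a first-level document classifier using a linear
# SVM over TF-IDF features. The corpus and labels are made up; the real
# SVM + CNN + RNN ensemble described above is far more elaborate.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "The team won the Bundesliga match last night",
    "New smartphone sales hit a record in Q3",
]
labels = ["sports", "science"]  # first-level categories from the hierarchy

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["A thrilling football final in the domestic league"]))
```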

Calculating relevance

  1. Lexical analysis for articles
  2. Filtering keywords
  3. Disambiguation
  4. Calculating relevance
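
To make the four steps concrete, here is a hedged Python sketch; the tokenizer, the tag vocabulary, and the frequency-based scoring are illustrative assumptions, not Toutiao's actual implementation.

```python
# A toy walkthrough of the four relevance steps above.
import re
from collections import Counter

TAG_VOCABULARY = {"internet", "xiaomi", "bundesliga"}  # hypothetical tag list

def tag_relevance(article_text: str) -> dict:
    # 1. Lexical analysis: split the article into normalized tokens.
    tokens = re.findall(r"[a-z0-9]+", article_text.lower())
    # 2. Filter keywords: keep only tokens present in the tag vocabulary.
    candidates = [t for t in tokens if t in TAG_VOCABULARY]
    # 3. Disambiguation would map surface forms to canonical tags here
    #    (e.g., "mi" -> "xiaomi"); omitted in this sketch.
    # 4. Calculate relevance: here, tag frequency normalized by length.
    counts = Counter(candidates)
    return {tag: n / len(tokens) for tag, n in counts.items()}

print(tag_relevance("Xiaomi plans a new Internet service. Xiaomi again."))
```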

Stream and Batch Processing Frameworks

· 2 min read

Why Do We Need Such Frameworks?

  • To process more data in a shorter amount of time.
  • To unify fault tolerance in distributed systems.
  • To simplify task abstractions to meet changing business requirements.
  • Suitable for bounded datasets (batch processing) and unbounded datasets (stream processing).

Brief History of Batch and Stream Processing Development

  1. Hadoop and MapReduce. Google made batch processing as simple as MapReduce result = pairs.map((pair) => (morePairs)).reduce(somePairs => lessPairs) in a distributed system.
  2. Apache Storm and directed graph topologies. MapReduce does not represent iterative algorithms well. Therefore, Nathan Marz abstracted stream processing into a graph structure composed of spouts and bolts.
  3. Spark in-memory computation. Reynold Xin pointed out that Spark uses ten times fewer machines than Hadoop while being three times faster when processing the same data.
  4. Google Dataflow, based on MillWheel and FlumeJava. Google uses a windowed API to support both batch and stream processing simultaneously.
  5. Flink. It quickly adopted the programming model of ==Google Dataflow== and Apache Beam, and implemented the Chandy-Lamport checkpointing algorithm efficiently.

These Frameworks

Architecture Choices

To meet the above demands with commodity machines, there are several popular distributed system architectures...

  • Master-slave (centralized): Apache Storm + ZooKeeper, Apache Samza + YARN
  • P2P (decentralized): Apache S4

Features

  1. DAG Topology for iterative processing - for example, GraphX in Spark, topologies in Apache Storm, DataStream API in Flink.
  2. Delivery Guarantees. How to ensure the reliability of data delivery between nodes? At least once / at most once / exactly once.
  3. Fault Tolerance. Implement fault tolerance using cold/warm/hot standby, checkpointing, or active-active.
  4. Windowed API for unbounded datasets. For example, stream windows in Apache Flink, window functions in Spark, and windowing in Apache Beam.
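
As a small illustration of such a windowed API, here is a sketch using Spark's window function on a static DataFrame (the streaming version is analogous); pyspark is assumed installed, and the column names and 10-minute window size are made up.

```python
# Count clicks per 10-minute event-time window with Spark's window API.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").appName("windows").getOrCreate()
events = spark.createDataFrame(
    [("2024-01-01 10:01:00", 1), ("2024-01-01 10:07:00", 1),
     ("2024-01-01 10:12:00", 1)],
    ["ts", "clicks"],
).withColumn("ts", F.to_timestamp("ts"))

# Group rows into 10-minute event-time windows and sum the clicks.
counts = events.groupBy(F.window("ts", "10 minutes")).agg(F.sum("clicks"))
counts.show(truncate=False)
```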

Comparison Table of Different Architectures

| Architecture | Storm | Storm-trident | Spark | Flink |
|---|---|---|---|---|
| Model | Native | Micro-batch | Micro-batch | Native |
| Guarantees | At least once | Exactly once | Exactly once | Exactly once |
| Fault tolerance | Record ack | Record ack | Checkpoint | Checkpoint |
| Overhead of fault tolerance | High | Medium | Medium | Low |
| Latency | Very low | High | High | Low |
| Throughput | Low | Medium | High | High |

Toutiao Recommendation System: P1 Overview

· 6 min read

What are we optimizing for? User Satisfaction

We are finding the best function below to maximize user satisfaction.

user satisfaction = function(content, user profile, context)
  1. Content: features of articles, videos, UGC short videos, Q&As, etc.
  2. User profile: interests, occupation, age, gender and behavior patterns, etc.
  3. Context: Mobile users in contexts of workspace, commuting, traveling, etc.

How to evaluate the satisfaction?

  1. Measurable goals, e.g.

    • click-through rate
    • session duration
    • upvotes
    • comments
    • reposts
  2. Hard-to-measure goals:

    • Frequency control of ads and special content types (Q&A)
    • Frequency control of vulgar content
    • Reducing clickbait, low-quality, and disgusting content
    • Enforcing / pinning / up-weighting important news
    • Down-weighting content from low-level accounts

How to optimize for those goals? Machine Learning Models

It is a typical supervised machine learning problem to find the best function above. To implement the system, we have these algorithms:

  1. Collaborative Filtering
  2. Logistic Regression
  3. DNN
  4. Factorization Machine
  5. GBDT

A world-class recommendation system is supposed to have the flexibility to A/B-test and combine the algorithms above. It is now popular to combine LR and DNN; Facebook used both LR and GBDT years ago.
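
For a concrete flavor of the supervised formulation, here is a minimal sketch that fits a logistic regression to toy click data; the features and labels are invented, and a production system would combine LR with DNN/GBDT/FM models rather than use LR alone.

```python
# Predict the probability that a (user, content, context) pair leads to
# a click. Feature encoding and data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [user_likes_tech, article_is_tech, user_is_commuting]
X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]])
y = np.array([1, 0, 0, 0])  # 1 = clicked

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[1, 1, 1]])[:, 1])  # estimated click probability
```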

How do models observe and measure the reality? Feature engineering

  1. Correlation between content characteristics and user interests. Explicit correlations include keywords, categories, sources, and genres. Implicit correlations can be extracted from user or item vectors produced by models like FM.

  2. Environmental features such as geolocation and time. They can be used as bias features or for building correlations on top of them.

  3. Hot trends. There are global hot trends, categorical hot trends, topic hot trends, and keyword hot trends. Hot trends are very useful for solving the cold-start issue when we have little information about a user.

  4. Collaborative features, which help prevent recommended content from becoming more and more concentrated. Collaborative filtering does not analyze each user’s history separately; it finds similarity between users based on their behavior: clicks, interests, topics, keywords, or even implicit vectors. By finding similar users, it can expand the diversity of recommended content.

Large-scale Training in Realtime

  • Users like to see their news feed updated in realtime according to the actions we track.
  • Use Apache Storm to process training data (clicks, impressions, faves, shares) in realtime.
  • Collect data up to a threshold and then update the recommendation model.
  • Store model parameters, like tens of billions of raw features and billions of vector features, in high-performance computing clusters.

They are implemented in the following steps:

  1. Online services record features in realtime.
  2. Write data into Kafka
  3. Ingest data from Kafka to Storm
  4. Populate full user profiles and prepare samples
  5. Update model parameters according to the latest samples
  6. Online modeling gains new knowledge
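
A hedged sketch of steps 2 through 5 follows, using the kafka-python client; the topic name, message schema, and the naive update rule are illustrative assumptions (the article says Storm handles this stage in production).

```python
# Consume behavior events from Kafka and periodically fold them into
# model parameters once a batch threshold is reached.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer("user-actions", bootstrap_servers="localhost:9092")
weights = {}           # toy stand-in for billions of model parameters
BATCH_THRESHOLD = 1000
buffer = []

for message in consumer:
    buffer.append(json.loads(message.value))   # assumed JSON messages
    if len(buffer) >= BATCH_THRESHOLD:         # collect to a threshold...
        for sample in buffer:                  # ...then update the model
            for feature in sample.get("features", []):
                weights[feature] = weights.get(feature, 0.0) + 0.01 * sample["label"]
        buffer.clear()
```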

How to further reduce the latency? Recall Strategy

It is impossible to predict everything with the model, considering the super-large scale of all the content. Therefore, we need recall strategies to focus on a representative subset of the data. Performance is critical here, and the timeout is 50 ms.

recall strategy

Among all the recall strategies, we take the InvertedIndex<Key, List<Article>>.

The Key can be topic, entity, source, etc.

| Tags of Interests | Relevance | List of Documents |
|---|---|---|
| E-commerce | 0.3 | |
| Fun | 0.2 | |
| History | 0.2 | |
| Military | 0.1 | |
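
A minimal sketch of this recall strategy follows, with made-up articles and the relevance weights from the table above; the scoring (sum of tag relevances) is an illustrative assumption.

```python
# InvertedIndex<Key, List<Article>> recall: fetch candidates for each of
# the user's interest tags, score them by tag relevance, return the top K.
inverted_index = {
    "e-commerce": ["article-1", "article-2"],
    "fun": ["article-3"],
    "history": ["article-4"],
    "military": ["article-5"],
}
user_tags = {"e-commerce": 0.3, "fun": 0.2, "history": 0.2, "military": 0.1}

def recall(user_tags: dict, limit: int = 3) -> list:
    scored = {}
    for tag, relevance in user_tags.items():
        for article in inverted_index.get(tag, []):
            scored[article] = scored.get(article, 0.0) + relevance
    return sorted(scored, key=scored.get, reverse=True)[:limit]

print(recall(user_tags))
```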

Data Dependencies

  • Features depend on user-side and content-side tags.
  • The recall strategy depends on user-side and content-side tags.
  • Content analysis and the data mining of user tags are the cornerstones of the recommendation system.

Stream and Batch Processing Frameworks

· 2 min read

Why such frameworks?

  • To process high-throughput data at low latency.
  • For fault tolerance in distributed systems.
  • For a generic abstraction to serve volatile business requirements.
  • To handle both bounded data sets (batch processing) and unbounded data sets (stream processing).

Brief history of batch/stream processing

  1. Hadoop and MapReduce. Google made batch processing as simple as MR result = pairs.map((pair) => (morePairs)).reduce(somePairs => lessPairs) in a distributed system.
  2. Apache Storm and DAG topology. MR doesn’t efficiently express iterative algorithms, so Nathan Marz abstracted stream processing into a graph of spouts and bolts.
  3. Spark in-memory computing. Reynold Xin said Spark sorted the same data 3X faster using 10X fewer machines compared to Hadoop.
  4. Google Dataflow, based on MillWheel and FlumeJava. Google supports both batch and streaming computing with the windowing API.
  5. Flink, which became popular because of: 1) its fast adoption of the ==Google Dataflow==/Beam programming model, and 2) its highly efficient implementation of Chandy-Lamport checkpointing.

How?

Architectural Choices

To serve the requirements above with commodity machines, streaming frameworks use distributed systems in these architectures...

  • master-slave (centralized): Apache Storm with ZooKeeper, Apache Samza with YARN.
  • P2P (decentralized): Apache S4.

Features

  1. DAG Topology for Iterative Processing. e.g. GraphX in Spark, topologies in Apache Storm, DataStream API in Flink.
  2. Delivery Guarantees. How is data delivery from node to node guaranteed? At-least-once / at-most-once / exactly-once.
  3. Fault-tolerance. Using cold/warm/hot standby, checkpointing, or active-active.
  4. Windowing API for unbounded data sets. e.g. Stream Windows in Apache Flink, Spark Window Functions, Windowing in Apache Beam.

Comparison

| Framework | Storm | Storm-trident | Spark | Flink |
|---|---|---|---|---|
| Model | native | micro-batch | micro-batch | native |
| Guarantees | at-least-once | exactly-once | exactly-once | exactly-once |
| Fault-tolerance | record-ack | record-ack | checkpoint | checkpoint |
| Overhead of fault-tolerance | high | medium | medium | low |
| Latency | very low | high | high | low |
| Throughput | low | medium | high | high |

Fraud Detection with Semi-supervised Learning

· 4 min read

Clarify Requirements

Calculate risk probability scores in realtime and make decisions along with a rule engine to prevent ATO (account takeover) and botnet attacks.

Train clustering features with online and offline pipelines

  1. Source from website logs, auth logs, user actions, transactions, and high-risk accounts in a watch list
  2. Track event data in Kafka topics
  3. Process events and prepare clustering features

Realtime scoring and rule-based decision

  1. Assess a comprehensive risk score for online services

  2. Maintain flexibility with manual configuration in a rule engine

  3. Share or use the insights in online services

ATOs, ranked from easy to hard to detect

  1. from a single IP
  2. from IPs on the same device
  3. from IPs across the world
  4. from 100k IPs
  5. attacks on specific accounts
  6. phishing and malware

Challenges

  • Manual feature selection
  • Feature evolution in adversarial environment
  • Scalability
  • No online DBSCAN

High-level Architecture

Core Components and Workflows

Semi-supervised learning = unlabeled data + small amount of labeled data

Why? Better learning accuracy than unsupervised learning, with less time and cost than supervised learning.

Training: To prepare clustering features in the database

  • Streaming Pipeline on Spark:
    • Runs continuously in real-time.
    • Performs feature normalization and categorical transformation on the fly.
      • Feature Normalization: Scale your numeric features (e.g., age, income) so that they are between 0 and 1.
      • Categorical Feature Transformation: Apply one-hot encoding or another transformation to convert categorical features into a numeric format suitable for the machine learning model.
    • Uses Spark MLlib’s K-means to cluster streaming data into groups.
      • After running k-means and forming clusters, you might find that certain clusters have more instances of fraud.
      • Once you’ve labeled a cluster as fraudulent based on historical data or expert knowledge, you can use that cluster assignment during inference. Any new data point assigned to that fraudulent cluster can be flagged as suspicious.
  • Hourly Cronjob Pipeline:
    • Runs periodically every hour (batch processing).
    • Applies thresholding to identify anomalies based on results from the clustering model.
    • Tunes parameters of the DBSCAN algorithm to improve clustering and anomaly detection.
    • Uses DBSCAN from scikit-learn to find clusters and detect outliers in batch data.
      • DBSCAN, which can detect outliers, might identify clusters of regular transactions and separate them from noise, which could be unusual, potentially fraudulent transactions.
      • Transactions in the noisy or outlier regions (points that don’t belong to any dense cluster) can be flagged as suspicious.
      • After identifying a cluster as fraudulent, DBSCAN helps detect patterns of fraud even in irregularly shaped transaction distributions.
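
As a sketch of this batch anomaly-detection step, here is scikit-learn's DBSCAN flagging noise points (label -1) as suspicious; the toy features and the eps/min_samples values are illustrative assumptions, not tuned production parameters.

```python
# Run DBSCAN over normalized features; points that fall outside every
# dense cluster are labeled -1 (noise) and flagged for review.
import numpy as np
from sklearn.cluster import DBSCAN

features = np.array([
    [0.1, 0.2], [0.12, 0.21], [0.11, 0.19],  # dense cluster: normal logins
    [0.9, 0.95],                             # isolated point: anomaly
])
labels = DBSCAN(eps=0.05, min_samples=2).fit_predict(features)
suspicious = features[labels == -1]          # noise points -> suspicious
print(labels, suspicious)
```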

Serving

The serving layer is where the rubber meets the road: it turns our machine learning models and business rules into actual fraud prevention decisions. Here's how it works:

  • Fraud Detection Scoring Service:
    • Takes real-time features extracted from incoming requests
    • Applies both clustering models (K-means from streaming and DBSCAN from batch)
    • Combines scores with streaming counters (like login attempts per IP)
    • Outputs a unified risk score between 0 and 1
  • Rule Engine:
    • Acts as the "brain" of the system
    • Combines ML scores with configurable business rules
    • Examples of rules (see the sketch after this list):
      • If risk score > 0.8 AND user is accessing from new IP → require 2FA
      • If risk score > 0.9 AND account is high-value → block transaction
    • Rules are stored in a database and can be updated without code changes
    • Provides an admin portal for security teams to adjust rules
  • Integration with Other Services:
    • Exposes REST APIs for real-time scoring
    • Publishes results to streaming counters for monitoring
    • Feeds decisions back to the training pipeline to improve model accuracy
  • Observability:
    • Tracks key metrics like false positive/negative rates
    • Monitors model drift and feature distribution changes
    • Provides dashboards for security analysts to investigate patterns
    • Logs detailed information for post-incident analysis
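
Here is a minimal sketch of the rule-engine idea: combining the ML risk score with configurable threshold rules. The rule schema and actions mirror the example rules above but are otherwise assumptions; a real engine would load rules from the database behind the admin portal.

```python
# Ordered rules: most severe action first. In production these records
# live in a database and can be updated without code changes.
RULES = [
    {"min_score": 0.9, "condition": "high_value_account", "action": "block"},
    {"min_score": 0.8, "condition": "new_ip", "action": "require_2fa"},
]

def decide(risk_score: float, context: set) -> str:
    for rule in RULES:
        if risk_score > rule["min_score"] and rule["condition"] in context:
            return rule["action"]
    return "allow"

print(decide(0.85, {"new_ip"}))              # -> require_2fa
print(decide(0.95, {"high_value_account"}))  # -> block
```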

Designing Uber Ride-Hailing Service

· 2 min read

Disclaimer: All content below is sourced from public resources or purely original. No confidential information regarding Uber is included here.

Requirements

  • Provide services for the global transportation market
  • Large-scale real-time scheduling
  • Backend design

Architecture

uber architecture

Why Microservices?

==Conway's Law==: The structure of a software system corresponds to the organizational structure of the company.

| | Monolithic ==Service== | Microservices |
|---|---|---|
| When team size and codebase are small, productivity | ✅ High | ❌ Low |
| ==When team size and codebase are large, productivity== | ❌ Low | ✅ High (Conway's Law) |
| ==Quality requirements for engineering== | ❌ High (inadequately skilled developers can easily disrupt the entire system) | ✅ Low (runtime is isolated) |
| Dependency version upgrades | ✅ Fast (centralized management) | ❌ Slow |
| Multi-tenant support / production-staging state isolation | ✅ Easy | ❌ Difficult (each service must 1) either establish a staging environment connected to other services in staging, or 2) support multi-tenancy across request contexts and data storage) |
| Debuggability, assuming the same modules, parameters, and logs | ❌ Low | ✅ High (if distributed tracing is available) |
| Latency | ✅ Low (local) | ❌ High (remote) |
| DevOps costs | ✅ Low (high cost of build tools) | ❌ High (capacity planning is difficult) |

Combining a monolithic ==codebase== with microservices can leverage the strengths of both.

Scheduling Service

  • Consistent hashing on addresses provided by geohash (sketched below)
  • Data is transient in memory, so no replication is needed. (CAP: AP over CP)
  • Use single-threaded or locked sharding to prevent double scheduling
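
As an illustration of this sharding scheme (not Uber's actual implementation), here is a sketch that maps a geohash cell onto a consistent-hash ring of scheduler nodes; the node names, virtual-node count, and sample geohash are made up.

```python
# Map a geohash cell to a scheduler shard via a consistent-hash ring.
import bisect
import hashlib

NODES = ["scheduler-1", "scheduler-2", "scheduler-3"]

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Build the ring once: each node gets many virtual points for balance.
ring = sorted((_hash(f"{n}#{i}"), n) for n in NODES for i in range(100))
keys = [h for h, _ in ring]

def node_for(geohash_cell: str) -> str:
    # Walk clockwise to the first virtual point at or after the key's hash.
    idx = bisect.bisect(keys, _hash(geohash_cell)) % len(ring)
    return ring[idx][1]

print(node_for("wx4g0"))  # "wx4g0" is a sample geohash cell
```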

Payment Service

==The key is to have an asynchronous design==, as ACID payment transactions across multiple systems often have very long latencies.

User Profile Service and Trip Record Service

  • Use caching to reduce latency
  • As 1) support for more countries and regions increases and 2) user roles (drivers, riders, restaurant owners, diners, etc.) gradually expand, providing user profile services for these users faces significant challenges.

Notification Push Service

  • Apple Push Notification Service (unreliable)
  • Google Cloud Messaging (GCM) (can detect successful delivery) or
  • SMS services are generally more reliable

A Closer Look at iOS Architecture Patterns

· 3 min read

Why Should We Care About Architecture?

The answer is: to reduce the human resources spent on each feature.

Mobile developers evaluate the quality of an architecture on three levels:

  1. Whether the responsibilities of different features are evenly distributed
  2. Whether it is easy to test
  3. Whether it is easy to use and maintain
| | Responsibility Distribution | Testability | Usability |
|---|---|---|---|
| Tight Coupling MVC | | | |
| Cocoa MVC | ❌ V and C are coupled | | ✅⭐ |
| MVP | | ✅ Independent view lifecycle | Average: more code |
| MVVM | | Average: View has a dependency on UIKit | Average |
| VIPER | ✅⭐️ | ✅⭐️ | |

Tight Coupling MVC

Traditional MVC

For example, in a multi-page web application, when you click a link to navigate to another page, the entire page reloads. The problem with this architecture is that the View is tightly coupled with the Controller and Model.

Cocoa MVC

Cocoa MVC is the architecture recommended by Apple for iOS developers. Theoretically, this architecture allows the Controller to decouple the Model from the View.

Cocoa MVC

However, in practice, Cocoa MVC encourages the use of massive view controllers, ultimately leading to the view controller handling all operations.

Realistic Cocoa MVC

Although testing such tightly coupled massive view controllers is quite difficult, Cocoa MVC performs the best in terms of development speed among the existing options.

MVP

In MVP, the Presenter has no relationship with the lifecycle of the view controller, allowing the view to be easily replaced. We can think of UIViewController as the View.

Variant of MVC

There is another type of MVP: MVP with data binding. As shown in the figure, the View is tightly coupled with the Model and Controller.

MVP

MVVM

MVVM is similar to MVP, but MVVM binds the View to the View Model.

MVVM

VIPER

Unlike the three-layer structure of MV(X), VIPER has a five-layer structure (VIPER View, Interactor, Presenter, Entity, and Routing). This structure allows for good responsibility distribution but has poorer maintainability.

VIPER

Compared to MV(X), VIPER has the following differences:

  1. The logic processing of the Model is transferred to the Interactor, so Entities have no logic and are purely data storage structures.
  2. ==UI-related business logic is handled in the Presenter, while data modification functions are handled in the Interactor==.
  3. VIPER introduces a routing module, Router, to implement inter-module navigation.

How to Stream Video to Mobile Devices Using HTTP? HTTP Live Streaming (HLS)

· One min read

Why Is Such a Protocol Needed?

Mobile video playback services over plain HTTP encounter the following issues:

  1. ==Limited memory and storage on mobile devices==.
  2. Due to unstable network connections and varying bandwidth, there is a need to ==dynamically adjust video quality during transmission==.

Solutions

  1. Server Side: In a typical setup, encoding hardware receives audio and video input, encodes it into H.264 video and AAC audio, and outputs an MPEG-2 transport stream.
    1. A software multiplexer then splits the raw output stream into a series of short media files (e.g., roughly 10-second .ts segments).
    2. The multiplexer also maintains an index file (.m3u8 format) that contains a list of all media files.
    3. The generated media files and the index file are published on a web server.
  2. Client Side: The client reads the index, sequentially requests the necessary media files from the server, and smoothly plays the content of each short media file.
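
For illustration, a minimal .m3u8 index file might look like the following; the segment names are made up, and #EXT-X-ENDLIST marks a finished (non-live) playlist:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:10.0,
segment2.ts
#EXT-X-ENDLIST
```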

Architecture

HLS Architecture

Designing Facebook's Photo Storage System

· 2 min read

Why Does Facebook Handle Its Own Photo Storage?

  • Petabyte-scale volume of blob data
  • Traditional NFS-based designs (where each image is stored as a file) face metadata bottlenecks: massive metadata severely limits the metadata hit ratio.
    • Here are the details:

For photo applications, most metadata, such as image permissions, is useless and wastes storage space. The larger overhead, however, is that the file's metadata must be read from disk into memory to locate the file itself. While this is negligible at small scale, when multiplied by billions of photos and several petabytes of data, accessing metadata becomes the throughput bottleneck.

Solution

By aggregating hundreds of thousands of images into a single Haystack storage file, the metadata burden is eliminated.

Structure

Facebook Photo Storage Architecture

Data Layout

Index file (for quick memory loading) + Haystack storage file containing many images.

Index file layout

index file layout 1

index file layout 2

Storage file

haystack store file

CRUD Operations

  • Create: Write to the store file, then ==asynchronously== write to the index file, since indexing is not a critical step.
  • Delete: Perform soft deletes by marking the deleted bit in the flags field. Execute hard deletes through compaction.
  • Update: Only append (append-only); if a duplicate key is encountered, the application can resolve it by reading the key at the maximum offset.
  • Read: Read operations take (offset, key, backup key, cookie, and data size)
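
A toy sketch of this layout follows, assuming a single store file and an in-memory index of key → (offset, size); the real needle format (cookies, flags, checksums) is omitted.

```python
# One big append-only store file plus an in-memory index: one memory
# lookup, then one disk read per photo.
class TinyHaystack:
    def __init__(self, path: str):
        self.path = path
        self.index = {}  # photo key -> (offset, size), kept in memory

    def put(self, key: str, data: bytes) -> None:
        with open(self.path, "ab") as f:        # append-only store file
            offset = f.tell()
            f.write(data)
        self.index[key] = (offset, len(data))   # index updated afterwards

    def get(self, key: str) -> bytes:
        offset, size = self.index[key]          # one memory lookup...
        with open(self.path, "rb") as f:        # ...then one disk read
            f.seek(offset)
            return f.read(size)

store = TinyHaystack("haystack.dat")
store.put("photo-1", b"\x89PNG...")
print(store.get("photo-1"))
```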

Use Cases

Upload

Photo Storage Upload

Download

Photo Storage Download

iOS Architecture Patterns Revisited

· 3 min read

Why bother with architecture?

Answer: for reducing human resources costs per feature.

Mobile developers evaluate the architecture in three dimensions.

  1. Balanced distribution of responsibilities among feature actors.
  2. Testability
  3. Ease of use and maintainability
| | Distribution of Responsibility | Testability | Ease of Use |
|---|---|---|---|
| Tight-coupling MVC | | | |
| Cocoa MVC | ❌ V and C are coupled | | ✅⭐ |
| MVP | | ✅ Separated View Lifecycle | Fair: more code |
| MVVM | | Fair: because of the View's UIKit dependency | Fair |
| VIPER | ✅⭐️ | ✅⭐️ | |

Tight-coupling MVC

Traditional MVC

For example, in a multi-page web application, the page completely reloads once you press a link to navigate somewhere else. The problem is that the View is tightly coupled with both the Controller and the Model.

Cocoa MVC

Apple’s MVC, in theory, decouples View from Model via Controller.

Cocoa MVC

Apple’s MVC in reality encourages ==massive view controllers==. And the view controller ends up doing everything.

Realistic Cocoa MVC

It is hard to test coupled massive view controllers. However, Cocoa MVC is the best architectural pattern regarding the speed of the development.

MVP

In an MVP, Presenter has nothing to do with the life cycle of the view controller, and the View can be mocked easily. We can say the UIViewController is actually the View.

MVC Variant

There is another kind of MVP: the one with data bindings. And as you can see, there is tight coupling between View and the other two.

MVP

MVVM

It is similar to MVP but binding is between View and View Model.

MVVM

VIPER

There are five layers (VIPER View, Interactor, Presenter, Entity, and Routing) instead of three when compared to MV(X). This distributes responsibilities well but the maintainability is bad.

VIPER

When compared to MV(X), VIPER:

  1. Model logic is shifted to Interactor and Entities are left as dumb data structures.
  2. ==UI related business logic is placed into Presenter, while the data altering capabilities are placed into Interactor==.
  3. It introduces Router for the navigation responsibility.

Key value cache

· 3 min read

A KV cache is like a giant hash map, used to reduce the latency of data access, typically by

  1. Moving data from slow, cheap media to fast, expensive media.
  2. Replacing tree-based indexes of O(log n) with hash-based ones of O(1) for reads and writes.

There are various cache policies, like read-through/write-through (or write-back) and cache-aside. By and large, Internet services have a read-to-write ratio of 100:1 to 1000:1, so we usually optimize for reads.

In distributed systems, we choose those policies according to the business requirements and contexts, under the guidance of CAP theorem.

Regular Patterns

  • Read
    • Read-through: the clients read data from the database via the cache layer. The cache returns the data when the read hits the cache; otherwise, it fetches the data from the database, caches it, and then returns the value.
  • Write
    • Write-through: clients write to the cache and the cache updates the database. The cache returns when it finishes the database write.
    • Write-behind / write-back: clients write to the cache, and the cache returns immediately. Behind the cache write, the cache asynchronously writes to the database.
    • Write-around: clients write to the database directly, around the cache.

Cache-aside pattern

When a cache does not support native read-through and write-through operations, and the resource demand is unpredictable, we use this cache-aside pattern.

==There are still chances of a dirty cache in this pattern.== It happens when these two operations race with each other:

  1. read from the database and update the cache
  2. update the database and delete the cache entry
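
A minimal sketch of the cache-aside pattern follows, with plain dicts standing in for the cache cluster and the database (TTLs, serialization, and the race above are not handled).

```python
# Cache-aside: the application, not the cache, talks to the database.
db = {"user:1": {"name": "Alice"}}
cache = {}

def read(key):
    value = cache.get(key)        # 1. try the cache first
    if value is None:
        value = db.get(key)       # 2. on a miss, read the database
        cache[key] = value        # 3. populate the cache for next time
    return value

def write(key, value):
    db[key] = value               # 1. update the database first
    cache.pop(key, None)          # 2. invalidate (delete) the cache entry

print(read("user:1"))   # miss -> loads from db and caches
print(read("user:1"))   # hit -> served from cache
```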

Where to put the cache?

  • client-side
  • distinct layer
  • server-side

What if the data volume reaches the cache capacity? Use cache replacement policies

  • LRU (Least Recently Used): check time; evict the least recently used entries and keep the most recently used ones (see the sketch after this list).
  • LFU (Least Frequently Used): check frequency; evict the least frequently used entries and keep the most frequently used ones.
  • ARC (Adaptive Replacement Cache): performs better than LRU, achieved by keeping both the most recently used and most frequently used entries, as well as a history for eviction. (Keeping MRU + MFU + eviction history.)
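
As a sketch of the LRU policy, here is a tiny cache built on collections.OrderedDict; capacity handling is simplified and thread safety is ignored.

```python
# LRU eviction: reads move an entry to the "most recently used" end;
# inserts beyond capacity evict from the "least recently used" end.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")          # "a" becomes most recently used
cache.put("c", 3)       # evicts "b"
print(cache.get("b"))   # None
```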

Who is the king of cache usage?

Facebook TAO