
Mark Zuckerberg is Building a Western Version of WeChat

· One min read

Facebook is a highly profitable company; its operating margin reaches 42%.

Operating profit margin = Operating income / Net sales
Operating income = Total revenue - (Operating expenses + Depreciation and amortization)
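
For a hypothetical illustration: with net sales of $100B and operating income of $42B, the operating margin would be 42 / 100 = 42%.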

As it transitions to a privacy-centric super app, it will face three challenges.

  1. Technology. How to bridge the gap between apps like WhatsApp and Instagram, turning them into a unified platform?
  2. Economics.
    • In China, there is no dominant app store, allowing WeChat to grow as the preferred platform. However, in the U.S., there are already Apple and Google.
    • ==WeChat is not a cash cow.== Targeting privacy-conscious users for segmented advertising is challenging.
  3. Privacy and competition.
    • No country wants a single company to monopolize the internet.
    • Social networks + private messaging = Windows OS + IE

Introduction to Architecture

· 3 min read

What is architecture?

Architecture is the shape of the software system. Think of it as the big picture of physical buildings.

  • paradigms are bricks.
  • design principles are rooms.
  • components are buildings.

Together they serve a specific purpose, just as a hospital cures patients and a school educates students.

Why do we need architecture?

Behavior vs. Structure

Every software system provides two different values to the stakeholders: behavior and structure. Software developers are responsible for ensuring that both those values remain high.

==Software architects are, by virtue of their job description, more focused on the structure of the system than on its features and functions.==

Ultimate Goal - ==saving human resources costs per feature==

Architecture serves the full lifecycle of the software system to make it easy to understand, develop, test, deploy, and operate. The goal is to minimize the human resources costs per business use-case.

The O’Reilly book Software Architecture Patterns by Mark Richards is a simple but effective introduction to these five fundamental architectures.

1. Layered Architecture

The layered architecture is the most widely adopted and the best known among developers, and hence the de facto standard for applications. If you do not know which architecture to use, use this one.

Examples

  • TCP / IP Model: Application layer > transport layer > internet layer > network access layer
  • Facebook TAO: web layer > cache layer (follower + leader) > database layer
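
To make the layer boundaries concrete, here is a minimal sketch in Python (the three layers and their names are illustrative, not from the original):

```python
from typing import Optional

# Each layer talks only to the layer directly below it.

class PersistenceLayer:
    """Bottom layer: data access (an in-memory dict stands in for a database)."""
    def __init__(self) -> None:
        self._users = {1: "Alice"}

    def find_user(self, user_id: int) -> Optional[str]:
        return self._users.get(user_id)

class BusinessLayer:
    """Middle layer: domain logic; knows nothing about HTTP or storage details."""
    def __init__(self, persistence: PersistenceLayer) -> None:
        self._persistence = persistence

    def greeting(self, user_id: int) -> str:
        user = self._persistence.find_user(user_id)
        if user is None:
            raise ValueError("unknown user")
        return f"Hello, {user}!"

class PresentationLayer:
    """Top layer: request handling and formatting."""
    def __init__(self, business: BusinessLayer) -> None:
        self._business = business

    def handle_request(self, user_id: int) -> str:
        return self._business.greeting(user_id)

app = PresentationLayer(BusinessLayer(PersistenceLayer()))
print(app.handle_request(1))  # Hello, Alice!
```

A change confined to one layer (say, swapping the dict for a real database) does not touch the layers above; the monolithic downside shows up when a new field must travel through all three layers.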

Pros and Cons

  • Pros
    • ease of use
    • separation of responsibility
    • testability
  • Cons
    • monolithic
      • hard to adjust, extend or update. You have to make changes to all the layers.

2. Event-Driven Architecture

A state change will emit an event to the system. All the components communicate with each other through events.

A simple project can combine the mediator, event queue, and channel into one component, yielding a simplified architecture, as sketched below.
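
A minimal event-bus sketch in Python (the `EventBus` class and the event names are illustrative):

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBus:
    """Combines mediator, event queue, and channel into one component."""
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Components never call each other directly; they only react to events.
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
bus.subscribe("order_placed", lambda e: print("charge card for", e["order_id"]))
bus.subscribe("order_placed", lambda e: print("send receipt for", e["order_id"]))
bus.publish("order_placed", {"order_id": 42})
```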

Examples

  • QT: Signals and Slots
  • Payment Infrastructure: Bank gateways usually have very high latencies, so they adopt async technologies in their architecture design.

3. Micro-kernel Architecture (aka Plug-in Architecture)

The software's responsibilities are divided into one "core" and multiple "plugins". The core contains the bare minimum functionality. Plugins are independent of each other and implement shared interfaces to achieve different goals.
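
A minimal plug-in sketch (a hypothetical `Plugin` interface and core, not the actual mechanism of VS Code or MINIX):

```python
from abc import ABC, abstractmethod
from typing import Dict

class Plugin(ABC):
    """The shared interface every plug-in implements."""
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def run(self, text: str) -> str: ...

class UppercasePlugin(Plugin):
    def name(self) -> str:
        return "upper"

    def run(self, text: str) -> str:
        return text.upper()

class Core:
    """The core holds the bare minimum: plug-in registration and dispatch."""
    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        self._plugins[plugin.name()] = plugin

    def process(self, name: str, text: str) -> str:
        return self._plugins[name].run(text)

core = Core()
core.register(UppercasePlugin())
print(core.process("upper", "hello"))  # HELLO
```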

Examples

  • Visual Studio Code, Eclipse
  • MINIX operating system

4. Microservices Architecture

A massive system is decoupled into multiple microservices, each of which is a separately deployed unit; they communicate with each other via RPCs.

Examples

  • Uber's microservice architecture (figure)

5. Space-based Architecture

This pattern gets its name from "tuple space", which means “distributed shared memory". There is no database or synchronous database access, and thus no database bottleneck. All the processing units share the replicated application data in memory. These processing units can be started up and shut down elastically.
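
A toy sketch of the idea (illustrative classes; real products replicate asynchronously and handle conflicts):

```python
from typing import Dict, List, Optional

class ProcessingUnit:
    """Holds a full in-memory replica of the application data; no database."""
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.peers: List["ProcessingUnit"] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        for peer in self.peers:  # synchronous here; async in real systems
            peer.data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)  # always served from local memory

a, b = ProcessingUnit(), ProcessingUnit()
a.peers, b.peers = [b], [a]
a.write("session:1", "alice")
print(b.read("session:1"))  # alice, with no database bottleneck
```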

Examples: See Wikipedia

  • Mostly adopted among Java users: e.g., JavaSpaces


The Problem with Tech Unicorns

· One min read
  1. Millions of users love the brands and leaders of those unicorns. Those tech stars have everything - except a path to high profits.

  2. In the past 25 years, three things have changed.

    1. Growing fast became more accessible thanks to cloud computing, smartphones, and social media.
    2. Low interest rates left investors chasing returns.
    3. Superstar firms (e.g., Google, Facebook, Alibaba, and Tencent) proved that wealth is made by
      • huge markets, high profits, and natural monopolies
      • limited physical assets and light regulation
  3. Because the unicorns’ markets are contested, margins have not consistently improved, despite fast-rising sales.

  4. The blitzscaling philosophy of buying customers at any price is peaking. After the unicorns, a new and more convincing species of startup will have to be engineered.

Alas! As Andrew Grove said: Success breeds complacency. Complacency breeds failure. Only the paranoid (who embrace change) survive.

Toutiao Recommendation System: P2 Content Analysis

· 3 min read

In Toutiao Recommendation System: P1 Overview, we know that content analysis and data mining of user tags are the cornerstones of the recommendation system.

What is content analysis?

content analysis = deriving intermediate data from raw articles and user behaviors.

Take articles as an example. To model user interests, we need to tag each article. For example, to associate a user with the “Internet” interest, we need to know whether the user reads articles with the “Internet” tag.

Why are we analyzing those raw data?

We do it for the following purposes:

  1. Tagging users (user profile)
    • Tagging users who liked articles with “Internet” tag. Tagging users who liked articles with “xiaomi” tag.
  2. Recommending contents to users by tags
    • Pushing “meizu” contents to users with “meizu” tag. Pushing “dota” contents to users with “dota” tag.
  3. Preparing contents by topics
    • Put “Bundesliga” articles to “Bundesliga topic”. Put “diet” articles to “diet topic”.

Case Study: Analysis Result of an Article

Here is an example of an “article features” page. It shows article features like categorizations, keywords, topics, and entities.

Analysis Result of an Article

Analysis Result of an Article: Details

What are the article features?

  1. Semantic Tags: Humans predefine these tags with explicit meanings.

  2. Implicit Semantics, including topics and keywords. Topic features describe word statistics; keywords are generated by certain rules.

  3. Similarity. Duplicate recommendations were once the most severe feedback we got from our customers.

  4. Time and location.

  5. Quality. Is it abusive, porn, ads, or “chicken soup for the soul”?

Article features are important

  • It is not true that a recommendation system cannot work at all without article features: Amazon, Walmart, and Netflix can recommend by collaborative filtering alone.
  • However, in a news product, users consume content published the same day, so bootstrapping without article features is hard; collaborative filtering cannot help with bootstrapping.
    • The finer the granularity of the article features, the better the ability to bootstrap.

More on Semantic Tags

We divide features of semantic tags into three levels:

  1. Categorizations: used in the user profile, for filtering content in topics, in recommendation recall, and in recommendation features
  2. Concepts: used for filtering content in topics, searching tags, and recommendation recall (like)
  3. Entities: used for filtering content in topics, searching tags, and recommendation recall (like)

Why divide them into different levels? So that they can capture articles at different granularities.

  1. Categorizations: full in coverage, low in accuracy.
  2. Concepts: medium in coverage, medium in accuracy.
  3. Entities: low in coverage, high in accuracy. It only covers hot people, organizations, products in each area.

Categorizations and concepts share the same technical infrastructure.

Why do we need semantic tags?

  • Implicit semantics
    • have been functioning well.
    • cost much less than semantic tags.
  • But topics and interests need a clearly defined tagging system.
  • Semantic tags also serve as a benchmark of a company's NLP capability.

Document classification

Classification hierarchy

  1. Root
  2. Science, sports, finance, entertainment
  3. Football, tennis, table tennis, track and field, swimming
  4. International, domestic
  5. Team A, team B

Classifiers:

  • SVM
  • SVM + CNN
  • SVM + CNN + RNN

Calculating relevance

  1. Lexical analysis for articles
  2. Filtering keywords
  3. Disambiguation
  4. Calculating relevance
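
A toy sketch of these four steps in Python (the tokenizer, the tag vocabulary, and the scoring rule are made-up placeholders, not Toutiao's implementation):

```python
import re
from collections import Counter
from typing import Dict, List

KNOWN_TAGS = {"internet", "football", "bundesliga"}  # assumed tag vocabulary

def lexical_analysis(article: str) -> List[str]:
    """Step 1: tokenize; a real system uses a proper word segmenter."""
    return re.findall(r"[a-z]+", article.lower())

def filter_keywords(tokens: List[str]) -> Counter:
    """Step 2 (and a stand-in for step 3): keep tokens that map to known tags."""
    return Counter(t for t in tokens if t in KNOWN_TAGS)

def relevance(article: str) -> Dict[str, float]:
    """Step 4: each tag's relevance = its share of the matched keywords."""
    counts = filter_keywords(lexical_analysis(article))
    total = sum(counts.values()) or 1
    return {tag: n / total for tag, n in counts.items()}

print(relevance("Bundesliga football scores around the internet"))
# {'bundesliga': 0.33.., 'football': 0.33.., 'internet': 0.33..}
```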

How to Motivate Employees?

· 2 min read

Motivation and incentives are at the core of performance management. Without motivation, employees lack the drive to perform well, making all feedback and training efforts futile.

The Respect from Leaders is Correlated with Employee Motivation

Offensive behavior can directly undermine employee motivation and performance, so managers need to curb such behavior by:

  1. Leading by example.
  2. Upholding employees' dignity. Public praise, private criticism.
  3. Hiring respectful employees and not tolerating bad behavior. Address feedback issues promptly.

Incentives Primarily Come from Two Aspects: Extrinsic and Intrinsic

  1. Extrinsic rewards—money (promotions, raises, bonuses)

    1. These rewards do not necessarily enhance employee performance.
    2. Their effects are usually short-lived.
    3. It is often difficult to distinguish individual contributions within a team, and what constitutes an appropriate reward varies for everyone. In fact, most employees' primary concern is fairness; when providing monetary rewards, it is crucial to ensure fairness and consistency.
  2. Intrinsic rewards—satisfaction (a sense of achievement, control, appreciation, intellectual growth, skill enhancement, autonomy, and overcoming challenges)

    1. It is essential to note that these rewards should be tailored to the individual.

How to Provide Intrinsic Rewards?

  1. Recognize their work. "The key to recognition is making people feel unique." If everyone receives the same recognition, no one will feel special.

    1. Different individuals value recognition sources differently. From colleagues? Publicly praise them in front of peers. From clients? Share a thank-you note from a client. From the profession? Award professional accolades. From the boss? Describe their importance to the team vividly during one-on-ones.
    2. Tailor recognition to personality. Introverted or extroverted? Public or private? If unsure, ask them directly.
    3. Recognition frequency should be high, at least once every two weeks.
    4. Handwritten notes are low-cost but highly effective rewards.
  2. Provide decision-making authority.

    1. People enjoy having a sense of ownership and control.
  3. Offer challenges.

    1. The greater the challenge, the higher the sense of achievement upon completion.
    2. Provide opportunities to undertake tasks they haven't done before, helping them develop new skills. Note that they should have relevant talents and skills, rather than starting from scratch.

Dongxu Huang: Building a Database Startup in China

· 4 min read

Company

  • Established for four years, with the first two years spent writing code; in the last year and a half, one or two hundred companies have started using it.
  • Infra has a significant advantage in China due to the large market; companies adopt things quickly and aggressively, allowing good infra products to be utilized rapidly.
  • None of the co-founders have database experience.
  • The open-source model is the future; if it's closed-source, negotiations have to be done one by one.
  • The most important business decision: it's not about how advanced the algorithms are or how strong the team is; the key to the moat lies in 1) community 2) MySQL interface (leveraging the MySQL community is crucial, and SQL support is important, as even Kafka and Spark support SQL).

TiDB Database Principles

The more advanced the engineer, the more they prefer high performance, believing that faster is better. However, TiDB's primary goal is not to be the fastest but to achieve availability, reliability, stability, I/O, and infinite scalability. The cost of high performance is too high and should be optimized based on the user's hardware. However, this is a general-purpose database, so optimizing for various scenarios is not feasible. Keep it simple for users and complex for ourselves (contrary to AWS's approach).

They believe eventual consistency is a pseudo-concept; in practice it means no consistency or weak consistency. For example, with Cassandra, once W, R, and N (write quorum, read quorum, replica count) are configured, how long does it take for replicas to converge? It's uncertain. (Quorum systems need R + W > N for reads to see the latest write.) It's too complex; ideally, users should not have to worry about these settings.

Benchmark scores are not the only metric; high TPC-C/TPC-H scores do not provide much practical guidance. Databases are refined through use. For instance, the first customer, a gaming company, created an astonishing 30,000 tables, and loading the JSON schema metadata made connecting very slow.

The architecture is not P2P; different roles are clearly defined.

KV uses RocksDB, but the typical write amplification of LSM trees is around 15x; here it has been optimized by separating values from keys (storing large values outside the LSM tree) to reduce write amplification.

Supports MySQL clients and also supports reading from SparkSQL.

The SQL layer does not reuse MySQL modules. Initially there was an attempt, but 1) it was hard to make distributed and 2) the code quality was too poor to modify. A drastic modification might have taken six months, and the long-term maintenance cost would have been even higher. Redoing it is not a problem; they have already refactored three times. Go is more convenient for refactoring than C/C++/Rust.

Initially, they only wanted to build the F1 (SQL) layer and collaborate with CockroachDB on the storage side; later, they had to take on storage themselves.

They hired two members from the Rust core team. Rust engineers are hard to recruit, so they typically hire C++ developers and transition them to Rust; these developers find that the compiler now enforces many conventions they used to maintain by hand.

Do not underestimate the difficulty of creating industrial-grade solutions. Using gRPC, Raft, and RocksDB means that if there are new developments in the industry, users will directly benefit.

Chunk (region) splitting took two months, while merging took three years. Merging has undergone formal verification.

Why now?

  1. Hardware
  2. Hot/cold data -> warm data
  3. Log is the new database

Everything is pluggable. The top-level API remains unchanged, while the underlying components are plug-and-play and replaceable.

Distributed Transactions

  • 2PC is the only option
  • Challenges: reduce round-trips
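
To make the round-trip cost concrete, here is a toy in-process 2PC sketch (illustrative classes, not TiDB's actual implementation):

```python
from typing import List, Optional

class Participant:
    def __init__(self, name: str) -> None:
        self.name = name
        self.staged: Optional[dict] = None
        self.committed: dict = {}

    def prepare(self, write: dict) -> bool:
        """Phase 1: stage the write and vote yes/no."""
        self.staged = write
        return True  # a real participant may vote no (conflict, no space, ...)

    def commit(self) -> None:
        """Phase 2: make the staged write durable."""
        self.committed.update(self.staged or {})
        self.staged = None

    def abort(self) -> None:
        self.staged = None

def two_phase_commit(participants: List[Participant], write: dict) -> bool:
    # Round-trip 1: prepare. Every participant must vote yes.
    if not all(p.prepare(write) for p in participants):
        for p in participants:
            p.abort()
        return False
    # Round-trip 2: commit on all participants.
    for p in participants:
        p.commit()
    return True

nodes = [Participant("tikv-1"), Participant("tikv-2")]
print(two_phase_commit(nodes, {"k": "v"}))  # True
```

Each committed transaction costs at least two network round-trips (prepare, then commit), which is why reducing round-trips is the central challenge.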

Multi-tenancy achieved through Kubernetes

China Biz Trend

  1. Chinese speed; once it's said, it must be done.
  2. Higher expectations for new technologies to empower businesses. Companies in second and third-tier cities or non-BAT firms are using new technologies to compete with giants. They must be able to alleviate users' technical anxieties.
  3. The talent pool for foundational software is gradually strengthening. PingCAP's content production capacity has contributed significantly to technical content marketing.
  4. Some core scenarios (core banking systems) dare to use domestic technologies.
  5. PingCAP's path: open source (internet/community) <-> commercialization.

Stream and Batch Processing Frameworks

· 3 min read

Why Do We Need Such Frameworks?

  • To process more data in a shorter amount of time.
  • To unify fault tolerance in distributed systems.
  • To simplify task abstractions to meet changing business requirements.
  • Suitable for bounded datasets (batch processing) and unbounded datasets (stream processing).

Brief History of Batch and Stream Processing Development

  1. Hadoop and MapReduce. Google made batch processing in a distributed system as simple as `result = pairs.map((pair) => (morePairs)).reduce(somePairs => lessPairs)`.
  2. Apache Storm and directed graph topologies. MapReduce does not represent iterative algorithms well. Therefore, Nathan Marz abstracted stream processing into a graph structure composed of spouts and bolts.
  3. Spark in-memory computation. Reynold Xin pointed out that Spark uses ten times fewer machines than Hadoop while being three times faster when processing the same data.
  4. Google Dataflow based on Millwheel and FlumeJava. Google uses a windowed API to support both batch and stream processing simultaneously.
  5. Flink.
    • Flink quickly adopted the programming model of ==Google Dataflow== and Apache Beam.
    • Flink implements the Chandy-Lamport checkpointing algorithm efficiently.
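
As a toy illustration of the MapReduce model in step 1 (single-process Python, no distribution):

```python
from functools import reduce
from typing import Dict, Tuple

docs = ["a b a", "b c"]

# Map: every document emits (word, 1) pairs.
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Reduce: fold the pairs into per-word counts
# (a real framework shuffles pairs by key between the two phases).
def fold(counts: Dict[str, int], pair: Tuple[str, int]) -> Dict[str, int]:
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

print(reduce(fold, pairs, {}))  # {'a': 2, 'b': 2, 'c': 1}
```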

These Frameworks

Architecture Choices

To meet the above demands on commodity machines, there are several popular distributed-system architectures...

  • Master-slave (centralized): Apache Storm + Zookeeper, Apache Samza + YARN
  • P2P (decentralized): Apache S4

Features

  1. DAG Topology for iterative processing - for example, GraphX in Spark, topologies in Apache Storm, DataStream API in Flink.
  2. Delivery Guarantees. How to ensure the reliability of data delivery between nodes? At least once / at most once / exactly once.
  3. Fault Tolerance. Implement fault tolerance using cold/warm/hot standby, checkpointing, or active-active.
  4. Windowed API for unbounded datasets. For example, streaming windows in Apache Flink, window functions in Spark, and windowing in Apache Beam.
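
A minimal sketch of one windowed primitive, a tumbling (fixed-size) window, in plain Python (the window size and events are made up):

```python
from collections import defaultdict
from typing import Dict

WINDOW_SIZE = 10  # seconds (illustrative)

def window_start(event_time: float) -> int:
    """Assign each event to the fixed window containing its event time."""
    return int(event_time // WINDOW_SIZE) * WINDOW_SIZE

stream = [(1, 5.0), (4, 3.0), (12, 7.0), (19, 1.0)]  # (event_time, value)
sums: Dict[int, float] = defaultdict(float)
for event_time, value in stream:
    sums[window_start(event_time)] += value

print(dict(sums))  # {0: 8.0, 10: 8.0} -- one aggregate per 10-second window
```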

Comparison Table of Different Architectures

| Architecture | Storm | Storm-trident | Spark | Flink |
| --- | --- | --- | --- | --- |
| Model | Native | Micro-batch | Micro-batch | Native |
| Guarantees | At least once | Exactly once | Exactly once | Exactly once |
| Fault tolerance | Record ack | Record ack | Checkpoint | Checkpoint |
| Fault-tolerance overhead | High | Medium | Medium | Low |
| Latency | Very low | High | High | Low |
| Throughput | Low | Medium | High | High |


Toutiao Recommendation System: P1 Overview

· 4 min read

What are we optimizing for? User Satisfaction

We are looking for the best function below to maximize user satisfaction:

user satisfaction = function(content, user profile, context)
  1. Content: features of articles, videos, UGC short videos, Q&As, etc.
  2. User profile: interests, occupation, age, gender and behavior patterns, etc.
  3. Context: Mobile users in contexts of workspace, commuting, traveling, etc.

How to evaluate the satisfaction?

  1. Measurable Goals, e.g.

    • click through rate
    • Session duration
    • upvotes
    • comments
    • reposts
  2. Hard-to-measure goals:

    • Frequency control of ads and special-typed contents (Q&A)
    • Frequency control of vulgar content
    • Reducing clickbait, low quality, disgusting content
    • Enforcing / pinning / up-weighting important news
    • Down-weighting content from low-quality accounts

How to optimize for those goals? Machine Learning Models

It is a typical supervised machine learning problem to find the best function above. To implement the system, we have these algorithms:

  1. Collaborative Filtering
  2. Logistic Regression
  3. DNN
  4. Factorization Machine
  5. GBDT

A world-class recommendation system is supposed to have the flexibility to A/B-test and combine multiple algorithms above. It is now popular to combine LR and DNN. Facebook used both LR and GBDT years ago.
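
As a toy illustration of the supervised ranking above, here is a hand-rolled logistic-regression scorer (the features and weights are made up; this is not Toutiao's model):

```python
import numpy as np

# Feature vector per candidate: [tag match, freshness, clickbait score] (made up).
weights = np.array([1.2, 0.8, -0.5])  # learned offline in a real system

def click_probability(features: np.ndarray) -> float:
    """Logistic regression: sigmoid of the weighted feature sum."""
    return float(1.0 / (1.0 + np.exp(-features @ weights)))

candidates = {
    "article_a": np.array([1.0, 0.9, 0.1]),
    "article_b": np.array([0.2, 0.5, 0.8]),
}
ranked = sorted(candidates, key=lambda a: click_probability(candidates[a]),
                reverse=True)
print(ranked)  # ['article_a', 'article_b']
```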

How do models observe and measure the reality? Feature engineering

  1. Correlation between content characteristics and user interests. Explicit correlations include keywords, categories, sources, and genres. Implicit correlations can be extracted from user or item vectors produced by models like FM.

  2. Environmental features, such as geo-location and time. They can be used as bias features, or correlations can be built on top of them.

  3. Hot trends. There are global, categorical, topic, and keyword hot trends. Hot trends are very useful for solving the cold-start problem when we have little information about a user.

  4. Collaborative features, which help prevent recommended content from becoming more and more concentrated. Collaborative filtering does not analyze each user's history separately; it finds users' similarity based on behavior such as clicks, interests, topics, keywords, or even implicit vectors. By finding similar users, it expands the diversity of recommended content.

Large-scale Training in Realtime

  • Users like to see their news feed updated in realtime according to the actions we track.
  • Use Apache Storm to train on data (clicks, impressions, favorites, shares) in realtime.
  • Collect data up to a threshold and then update the recommendation model.
  • Store model parameters, like tens of billions of raw features and billions of vector features, in high-performance computing clusters.

This is implemented in the following steps (a toy sketch follows the list):

  1. Online services record features in realtime.
  2. Write data into Kafka
  3. Ingest data from Kafka to Storm
  4. Populate full user profiles and prepare samples
  5. Update model parameters according to the latest samples
  6. Online modeling gains new knowledge
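
A toy sketch of steps 3 through 5: buffer streamed samples up to a threshold, then update logistic-regression parameters with one SGD pass (plain Python stands in for Kafka/Storm; the threshold, model, and data are made up):

```python
from typing import List, Tuple
import numpy as np

weights = np.zeros(3)                 # the "published" model parameters
buffer: List[Tuple[np.ndarray, int]] = []
THRESHOLD = 2                         # flush threshold (illustrative)
LEARNING_RATE = 0.1

def on_sample(features: np.ndarray, clicked: int) -> None:
    """Called per (features, click label) sample ingested from the stream."""
    global weights
    buffer.append((features, clicked))
    if len(buffer) >= THRESHOLD:      # step 4: enough samples collected
        for x, y in buffer:           # step 5: one SGD pass over the batch
            p = 1.0 / (1.0 + np.exp(-x @ weights))
            weights = weights + LEARNING_RATE * (y - p) * x
        buffer.clear()                # parameters are now live for serving

on_sample(np.array([1.0, 0.0, 1.0]), 1)
on_sample(np.array([0.0, 1.0, 0.0]), 0)
print(weights)  # updated once the buffer reached the threshold
```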

How to further reduce the latency? Recall Strategy

It is impossible to predict everything with the model, given the super-large scale of the content corpus. Therefore, we need recall strategies to focus on a representative subset of the data. Performance is critical here, and the timeout is 50 ms.

recall strategy

Among all the recall strategies, we take the inverted index: `InvertedIndex<Key, List<Article>>`.

The Key can be topic, entity, source, etc.

| Tags of Interest | Relevance | List of Documents |
| --- | --- | --- |
| E-commerce | 0.3 | … |
| Fun | 0.2 | … |
| History | 0.2 | … |
| Military | 0.1 | … |
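
A minimal sketch of this recall step (a Python dict stands in for `InvertedIndex<Key, List<Article>>`; the tags, weights, and articles are made up):

```python
from collections import defaultdict
from typing import Dict

inverted_index = {                    # tag -> articles carrying that tag
    "e-commerce": ["a1", "a2"],
    "fun": ["a3"],
    "history": ["a2", "a4"],
}
user_tags = {"e-commerce": 0.3, "fun": 0.2, "history": 0.2}  # tag -> relevance

scores: Dict[str, float] = defaultdict(float)
for tag, relevance in user_tags.items():
    for article in inverted_index.get(tag, []):
        scores[article] += relevance  # merge lists, weighted by tag relevance

recall = sorted(scores, key=scores.get, reverse=True)
print(recall)  # ['a2', 'a1', 'a3', 'a4'] -- a2 matches two of the user's tags
```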

Data Dependencies

  • Features depend on user-side and content-side tags.
  • The recall strategy depends on user-side and content-side tags.
  • Content analysis and data mining of user tags are the cornerstones of the recommendation system.