In Toutiao Recommendation System: P1 Overview, we learned that content analysis and data mining of user tags are the cornerstones of the recommendation system.
Content analysis = deriving intermediate data from raw articles and user behaviors.
Take articles as an example. To model user interests, we need to tag articles and other content. To associate a user with an interest in the “Internet” tag, we need to know whether the user reads articles with the “Internet” tag.
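The link between tagged articles and user interests can be sketched as follows. This is a minimal illustration, not the system's actual method: the function name `build_interest_profile`, the reading history, and the "share of reads" scoring are all assumptions made for the example.

```python
from collections import Counter

def build_interest_profile(read_articles):
    """Return, for each tag, the share of the user's reads that carry it."""
    counts = Counter()
    for tags in read_articles:
        counts.update(set(tags))  # count each tag at most once per article
    total = len(read_articles)
    return {tag: n / total for tag, n in counts.items()} if total else {}

# Hypothetical reading history: one list of tags per article the user read.
history = [["Internet", "Startups"], ["Internet"], ["Sports"], ["Internet", "AI"]]
profile = build_interest_profile(history)
print(profile)
```

A user who mostly reads "Internet" articles ends up with a high weight on that tag, which is the signal the tagging step exists to produce.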
We do it for the reason of …
Here is an example of an “article features” page. It shows article features such as categorizations, keywords, topics, and entities.
What are the article features?
Semantic Tags: Humans predefine these tags with explicit meanings.
Implicit Semantics, including topics and keywords. Topic features describe word statistics. Keywords are generated by certain rules.
Similarity. Duplicate recommendations used to be the most severe complaint we got from our customers.
Time and location.
Quality. Is the content abusive, pornographic, an ad, or “chicken soup for the soul”?
We divide semantic-tag features into three levels:
Why divide them into different levels? We do this so that they can capture articles at different granularities.
Categorizations and concepts share the same technical infrastructure.
Why do we need semantic tags?
We are finding the best function below to maximize user satisfaction.
user satisfaction = function(content, user profile, context)
Measurable Goals, e.g.
It is a typical supervised machine learning problem to find the best function above. To implement the system, we have these algorithms:
A world-class recommendation system is supposed to have the flexibility to A/B-test and combine several of the algorithms above. It is now popular to combine LR and DNN. Facebook used both LR and GBDT years ago.
Correlation between the content’s characteristics and the user’s interests. Explicit correlations include keywords, categories, sources, and genres. Implicit correlations can be extracted from the user vector or item vector of models like FM.
Environmental features such as geolocation and time. They can be used as bias terms or as a basis for building correlations.
Hot trends. There are global, categorical, topic, and keyword hot trends. Hot trends are very useful for solving the cold-start problem, when we have little information about the user.
Collaborative features, which help avoid the situation where recommended content becomes more and more concentrated. Collaborative filtering does not analyze each user’s history separately; it finds similarity between users based on their behavior, such as clicks, interests, topics, keywords, or even implicit vectors. By finding similar users, it can expand the diversity of recommended content.
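The collaborative idea in the last item can be sketched in a few lines. This is an illustrative user-based variant under simple assumptions: click histories are sparse dicts, similarity is cosine similarity, and the names `cosine_similarity`, `recommend_from_similar_users`, and the sample users are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_from_similar_users(target, clicks, top_k=1):
    """Suggest items clicked by the most similar users but not by the target."""
    others = [u for u in clicks if u != target]
    others.sort(key=lambda u: cosine_similarity(clicks[target], clicks[u]),
                reverse=True)
    seen = set(clicks[target])
    suggestions = []
    for user in others[:top_k]:
        for item in clicks[user]:
            if item not in seen and item not in suggestions:
                suggestions.append(item)
    return suggestions

# Hypothetical click histories: article id -> click weight.
clicks = {
    "alice": {"a1": 1, "a2": 1},
    "bob": {"a1": 1, "a2": 1, "a3": 1},
    "carol": {"a9": 1},
}
suggested = recommend_from_similar_users("alice", clicks)
print(suggested)
```

Because the suggestion comes from a neighbor's history rather than from the target user's own tags, it can surface content outside the user's existing interests.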
They are implemented in the following steps:
It is impossible to score every item with the model, given the enormous scale of the content pool. Therefore, we need recall strategies that narrow the input down to a representative subset of the data. Performance is critical here: the timeout is 50 ms.
Among all the recall strategies, we take the
Key can be topic, entity, source, etc.
|Tags of Interests||Relevance||List of Documents|
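A minimal sketch of key-based recall, assuming an inverted index that maps each key (topic, entity, source, and so on) to a list of (document, score) pairs. The index layout, the scores, the weighting rule, and the names `recall`, `inverted_index`, and `user_interests` are hypothetical and only illustrate the three columns above.

```python
import heapq

def recall(user_interests, inverted_index, limit=3):
    """Merge the document lists of a user's interest keys into one candidate list."""
    scores = {}
    for key, relevance in user_interests.items():
        for doc_id, doc_score in inverted_index.get(key, []):
            # Weight each document by how relevant the key is to this user.
            scores[doc_id] = scores.get(doc_id, 0.0) + relevance * doc_score
    return heapq.nlargest(limit, scores, key=scores.get)

# Hypothetical index: key -> list of (document id, document score).
inverted_index = {
    "Internet": [("d1", 0.9), ("d2", 0.5)],
    "Sports": [("d3", 0.8), ("d1", 0.2)],
}
# Hypothetical user profile: tag of interest -> relevance.
user_interests = {"Internet": 0.7, "Sports": 0.3}
candidates = recall(user_interests, inverted_index, limit=2)
print(candidates)
```

Only the short lists behind the user's own keys are touched, which is what lets a lookup of this kind fit inside a tight latency budget.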
result = pairs.map((pair) => (morePairs)).reduce(somePairs => lessPairs) in a distributed system.
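The map/reduce shape above can be illustrated with a word count. This sketch runs on a single machine and only shows the data flow; a distributed framework would spread the same two phases across workers. The function names and input lines are made up for the example.

```python
from collections import Counter
from functools import reduce

def map_phase(line):
    """Map: turn one input record into many (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(totals, pair):
    """Reduce: fold pairs that share a key into a single count."""
    word, count = pair
    totals[word] += count
    return totals

lines = ["to be or not", "to be"]
pairs = [pair for line in lines for pair in map_phase(line)]  # more pairs
result = reduce(reduce_phase, pairs, Counter())               # fewer pairs
print(result)
```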
To serve the requirements above with commodity machines, the streaming framework uses distributed systems in these architectures…
|Overhead of fault-tolerance||high||medium||medium||low|
Leveraging user and device data during user login to fight against
ATOs, ranked from easy to hard to detect
Semi-supervised learning = unlabeled data + small amount of labeled data
Why? Better learning accuracy than unsupervised learning, with less time and cost than supervised learning.
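One common way to combine the two kinds of data is self-training: fit a model on the labeled points, label only the unlabeled points it is confident about, and repeat. The sketch below uses a toy 1-D nearest-centroid classifier; the function name `self_train`, the `margin` confidence rule, and the sample data are assumptions for illustration, and it expects at least two classes among the labeled points.

```python
def self_train(labeled, unlabeled, rounds=5, margin=1.0):
    """Self-training with a 1-D nearest-centroid classifier.

    labeled:   list of (value, label) pairs with at least two distinct labels
    unlabeled: list of values
    Returns the grown labeled list and the values left unlabeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        # Fit: one centroid per class from the current labeled set.
        centroids = {}
        for label in {y for _, y in labeled}:
            xs = [x for x, y in labeled if y == label]
            centroids[label] = sum(xs) / len(xs)
        # Predict: accept a pseudo-label only when the nearest centroid
        # beats the runner-up by at least `margin`.
        confident, rest = [], []
        for x in pool:
            ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
            gap = abs(x - centroids[ranked[1]]) - abs(x - centroids[ranked[0]])
            (confident if gap >= margin else rest).append((x, ranked[0]))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return labeled, pool

labeled_out, leftover = self_train([(0.0, "neg"), (10.0, "pos")],
                                   [1.0, 2.0, 9.0, 5.0])
print(labeled_out, leftover)
```

Points near a class get pseudo-labels, while the ambiguous midpoint stays unlabeled instead of being guessed.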
Andy Grove emphasizes that a manager’s most important responsibility is to elicit top performance from his subordinates.
Unfortunately, one management style does not fit all people in all scenarios. A fundamental variable for finding the best management style is the task-relevant maturity (TRM) of the subordinates.
|TRM||Effective Management Style|
|low||structured; task-oriented; detail-oriented; instruct exactly “what/when/how” mode|
|medium||individual-oriented; supportive; “mutual-reasoning” mode|
|high||goal-oriented; monitoring mode|
A person’s TRM depends on the specific work items, and it takes time to improve. When TRM reaches the highest level, both the person’s knowledge level and motivation are ready for her manager to delegate work.
The key here is to regard any management mode not as either good or bad but rather as effective or not effective.