When you know where you should go, it is too late to go there; if you always keep your original path, you will miss the road to the future.

Charles Handy illustrates this with his road to Davy’s Bar. The directions were to turn right and go up the hill half a mile before Davy’s Bar. However, by the time he realized he had missed the turn, he had already arrived at Davy’s Bar.

Growth usually follows an “S” shape, which we call the S-curve or sigmoid curve. To keep the overall growth rate high, you have to develop your second S-curve before it is too late to invest your time and resources.

Intel’s CPU, Netflix’s video streaming, Nintendo’s gaming, and Microsoft’s cloud are all excellent examples of second-curve-driven businesses.

Finding and catching the second curve takes vision and execution. You have to take in more information and continuously sort it to identify the best opportunities. Then, once an opportunity is identified, you need a reliable team to fight the battle and figure out whether it really works.

What makes you succeed may not make you succeed again. There is always a limit to growth. The second curve theory helps us reflect on why and how to embrace the change and live a more thriving life.


Internet-scale web services deal with high-volume traffic from the whole world. However, one server can only serve a limited number of requests at the same time. Consequently, there is usually a server farm or a large cluster of servers that handles the traffic together. Here comes the question: how do we route requests so that each host receives and processes an even share?

Since there are many hops and layers of load balancers from the user to the server, specifically speaking, this time our design requirements are

Note: If Service A depends on (or consumes) Service B, then A is a downstream service of B, and B is an upstream service of A.


Why is it hard to balance loads? The answer is that it is hard to collect accurate load distribution stats and act accordingly.

Distributing-by-requests ≠ distributing-by-load

Random and round-robin distribute the traffic by requests. However, the actual load is not uniform per request: some requests are heavy in CPU or thread utilization, while others are lightweight.

To measure load more accurately, load balancers have to maintain local state for each backend server: the observed number of active requests, the number of connections, or the request processing latencies. Based on this state, we can use distribution algorithms like least-connections, least-time, and random N choices:

Least-connections: a request is passed to the server with the least number of active connections.

latency-based (least-time): a request is passed to the server with the least average response time and least number of active connections, taking into account weights of servers.

However, these two algorithms work well only when there is a single load balancer. If there are multiple ones, there might be a herd effect. That is to say, all the load balancers notice that one service is momentarily faster, and then all send requests to that service.

Random N choices (where N=2 in most cases, a.k.a. Power of Two Choices): pick two servers at random and choose the better option of the two, avoiding the worse choice.
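
The selection rules above can be sketched in a few lines. This is a minimal illustration, not any particular load balancer's implementation: the function names and the `active` snapshot of per-server connection counts are hypothetical.

```python
import random

def least_connections(active):
    """Return the server with the fewest active connections."""
    return min(active, key=active.get)

def power_of_two_choices(active, rng=random):
    """Sample two distinct servers at random and return the less loaded one."""
    first, second = rng.sample(sorted(active), 2)
    return first if active[first] <= active[second] else second

# Hypothetical snapshot of active connections observed by one load balancer.
active = {"s1": 12, "s2": 3, "s3": 7, "s4": 9}
```

Because only two random candidates are compared, several load balancers running this rule are less likely to all pick the same momentarily idle server, and the most loaded server is never chosen.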

Distributed environments.

Local LB is unaware of global downstream and upstream states, including

  • upstream service loads
  • upstream service may be super large, and thus it is hard to pick the right subset to cover with the load balancer
  • downstream service loads
  • the processing time of various requests is hard to predict


There are three options to collect the load stats accurately and then act accordingly:

  • centralized & dynamic controller
  • distributed but with shared states
  • piggybacking server-side information in response messages or active probing

Dropbox Bandaid team chose the third option because it fits into their existing random N choices approach well.

However, instead of using local states, as the original random N choices algorithm does, they use real-time global information from the backend servers via the response headers.

Server utilization: Backend servers are configured with a max capacity and count their ongoing requests, from which they calculate a utilization ratio ranging from 0.0 to 1.0.

There are two problems to consider:

  1. Handling HTTP errors: If a server fails requests fast, it looks lightly loaded, so it attracts more traffic and fails more.
  2. Stats decay: If a server’s load is too high, no requests will be distributed there, so its reported load is never refreshed and the server gets stuck. They use a decay function of the inverted sigmoid curve to solve the problem.
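
A minimal sketch of how these pieces could fit together is shown below. It is not Bandaid's actual code: the class and function names, the decay parameters (`midpoint_s`, `steepness`), and the choice to treat a failed response as full load are all assumptions made for illustration.

```python
import math
import random

def decay_weight(age_s, midpoint_s=5.0, steepness=1.0):
    """Inverted sigmoid: near 1 for fresh stats, falling toward 0 as they age."""
    return 1.0 - 1.0 / (1.0 + math.exp(-steepness * (age_s - midpoint_s)))

class BackendStats:
    def __init__(self):
        self.utilization = 0.0   # last value reported by the server, 0.0 to 1.0
        self.reported_at = 0.0   # time of that report, in seconds

    def record(self, utilization, now, failed=False):
        # Assumption: count a failed response as full load so that a
        # fast-failing server does not look idle and attract more traffic.
        self.utilization = 1.0 if failed else utilization
        self.reported_at = now

    def effective(self, now):
        # Old reports are trusted less, so a server once seen as overloaded
        # drifts back toward 0 and eventually receives requests again.
        return self.utilization * decay_weight(now - self.reported_at)

def pick(stats, now, rng=random):
    """Random two choices over the decayed, server-reported utilization."""
    a, b = rng.sample(sorted(stats), 2)
    return a if stats[a].effective(now) <= stats[b].effective(now) else b
```

In this sketch the load balancer would call `record` with the utilization value read from each response header, and `pick` when routing the next request.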

Results: requests are more balanced

Concurrency Models

10317 2019-10-16 14:04

  • Single-threaded - Callbacks, Promises, Observables and async/await: vanilla JS
  • threading/multiprocessing, lock-based concurrency
    • protecting critical section vs. performance
  • Communicating Sequential Processes (CSP)
    • Golang or Clojure’s core.async.
    • process/thread passes data through channels.
  • Actor Model (AM): Elixir, Erlang, Scala
    • asynchronous by nature, and have location transparency that spans runtimes and machines - if you have a reference (Akka) or PID (Erlang) of an actor, you can message it via mailboxes.
    • powerful fault tolerance by organizing actors into a supervision hierarchy, and you can handle failures at its exact level of hierarchy.
  • Software Transactional Memory (STM): Clojure, Haskell
    • like MVCC or pure functions: commit / abort / retry
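
As a small illustration of the CSP style, the sketch below passes data between two threads through a channel instead of sharing state behind a lock. Python has no built-in CSP channels, so a bounded `queue.Queue` stands in for one here; the function names and the `None` sentinel are choices made for this example.

```python
import queue
import threading

def producer(channel, items):
    for item in items:
        channel.put(item)      # blocks when the channel is full
    channel.put(None)          # sentinel: no more data

def consumer(channel, results):
    while True:
        item = channel.get()   # blocks until data arrives
        if item is None:
            break
        results.append(item * item)

def run(items):
    channel = queue.Queue(maxsize=2)   # bounded queue standing in for a channel
    results = []
    workers = [
        threading.Thread(target=producer, args=(channel, items)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Only the consumer touches `results`, so no lock is needed; all coordination happens through the channel.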


  • realtime / low-latency typeahead and autocomplete service for social networks, like LinkedIn or Facebook
  • search social profiles with prefixes
  • newly added accounts appear instantly in the scope of the search
  • not for “query autocomplete” (like the Google search-box dropdown), but for displaying actual search results, including
    • generic typeahead: network-agnostic results from a global ranking scheme like popularity.
    • network typeahead: results from user’s 1st and 2nd-degree network connections, and People You May Know scores.

LinkedIn Search


Multi-layer architecture

  • browser cache
  • web tier
  • result aggregator
  • various typeahead backend

Cleo Architecture

Result Aggregator

The abstraction of this problem is to find documents by prefixes and terms in a very large number of elements. The solution leverages these four major data structures:

  1. InvertedIndex<prefixes or terms, documents>: given any prefix, find all the document ids that contain the prefix.
  2. for each document, prepare a BloomFilter<prefixes or terms>: as the user types more, we can quickly filter out documents that do not contain the latest prefixes or terms by checking their bloom filters.
  3. ForwardIndex<documents, prefixes or terms>: the bloom filter in the previous step may return false positives, so we now query the actual documents to reject them.
  4. scorer(document):relevance: Each partition returns all of its true hits and scores. Then we aggregate and rank.
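
The four structures can be sketched on a single in-memory partition as follows. This is a toy illustration, not Cleo's implementation: the class names, the SHA-256 based bloom filter, its size, and the per-document `score` value are all assumptions.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

def prefixes(term):
    return [term[:i] for i in range(1, len(term) + 1)]

class TypeaheadIndex:
    def __init__(self):
        self.inverted = defaultdict(set)  # prefix -> document ids
        self.blooms = {}                  # document id -> BloomFilter of prefixes
        self.forward = {}                 # document id -> set of prefixes
        self.scores = {}                  # document id -> relevance score

    def add(self, doc_id, name, score):
        doc_prefixes = {p for term in name.lower().split() for p in prefixes(term)}
        bloom = BloomFilter()
        for p in doc_prefixes:
            self.inverted[p].add(doc_id)
            bloom.add(p)
        self.blooms[doc_id] = bloom
        self.forward[doc_id] = doc_prefixes
        self.scores[doc_id] = score

    def search(self, query):
        terms = query.lower().split()
        if not terms:
            return []
        # 1. Inverted index: candidates for the first prefix.
        candidates = self.inverted.get(terms[0], set())
        # 2. Bloom filters: cheaply drop documents missing a later prefix.
        candidates = [d for d in candidates
                      if all(self.blooms[d].might_contain(t) for t in terms[1:])]
        # 3. Forward index: reject bloom-filter false positives.
        hits = [d for d in candidates if all(t in self.forward[d] for t in terms)]
        # 4. Scorer: rank the true hits.
        return sorted(hits, key=lambda d: self.scores[d], reverse=True)
```

In a partitioned deployment, each partition would run a search like this and the aggregator would merge and re-rank the returned hits.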


  • generic typeahead: latency <= 1 ms within a cluster
  • network typeahead (very-large dataset over 1st and 2nd degree network): latency <= 15 ms
  • aggregator: latency <= 25 ms

Acquisition Efficiency Problem: How to achieve a better ROI in advertising?

In detail, Lyft’s advertisements should meet the requirements below:

  1. being able to manage region-specific ad campaigns
  2. guided by data-driven growth: The growth must be scalable, measurable, and predictable
  3. supporting Lyft’s unique growth model as shown below

lyft growth model

However, the biggest challenge is to manage all the processes of cross-region marketing at scale, which include choosing bids, budgets, creatives, incentives, and audiences, running A/B tests, and so on. You can see what occupies a day in the life of a digital marketer:


We can see that execution occupies most of the time, while analysis, which is considered more important, takes much less. A scaling strategy will enable marketers to concentrate on analysis and decision-making instead of operational activities.

Solution: Automation

To reduce costs and improve experimental efficiency, we need to

  1. predict the likelihood that a new user will be interested in our product
  2. evaluate effectively and allocate marketing budgets across channels
  3. manage thousands of ad campaigns handily

The marketing performance data flows into Lyft’s reinforcement-learning system: Amundsen

The problems that need to be automated include:

  1. updating bids across search keywords
  2. turning off poor-performing creatives
  3. changing referrals values by market
  4. identifying high-value user segments
  5. sharing strategies across campaigns


Lyft Symphony Architecture

The tech stack includes - Apache Hive, Presto, ML platform, Airflow, 3rd-party APIs, UI.

Main components

Lifetime Value(LTV) forecaster

The lifetime value of a user is an important criterion to measure the efficiency of acquisition channels. The budget is determined jointly by the LTV and the price we are willing to pay in that region.

Our knowledge of a new user is limited. The historical data can help us to predict more accurately as the user interacts with our services.

Initial features:


The forecast improves as the historical data of interactivity accumulates:

Predicting LTV from historical records

Budget allocator

After the LTV is predicted, the next step is to estimate budgets based on the price. A curve of the form LTV = a * (spend)^b is fit to the data. A degree of randomness is injected into the cost-curve creation process in order to converge to a global optimum.
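
One simple way to fit a curve of this form is ordinary least squares in log space, since log(LTV) = log(a) + b * log(spend). The sketch below uses that approach as an illustration only; the source does not say how Lyft fits the curve or how the randomness is injected, and the function name is hypothetical. It assumes all spend and LTV values are positive.

```python
import math

def fit_power_curve(spend, ltv):
    """Fit ltv = a * spend**b by least squares on log(ltv) = log(a) + b*log(spend)."""
    xs = [math.log(s) for s in spend]
    ys = [math.log(v) for v in ltv]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                          # slope in log space is the exponent
    a = math.exp(mean_y - b * mean_x)      # intercept in log space is log(a)
    return a, b
```

An exponent b below 1 would indicate diminishing returns: each additional unit of spend adds less LTV than the previous one.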



Bidders are made up of two parts - the tuners and actors. The tuners decide exact channel-specific parameters based on the price. The actors communicate the actual bid to different channels.

Some popular bidding strategies, applied in different channels, are listed below:



We have to value human experience in the automation process; otherwise, the quality of the models may be “garbage in, garbage out”. Once freed from laborious tasks, marketers can focus more on understanding users, channels, and the messages they want to convey to audiences, and thus achieve better ad impact. That’s how Lyft can achieve a higher ROI with less time and effort.


Startup Engineering
© 2010-2018 Tian