
Cloud Design Patterns

· 2 min read

Availability patterns

  • Health Endpoint Monitoring: Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals.
  • Queue-Based Load Leveling: Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads.
  • Throttling: Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service.

Data Management patterns

  • Cache-Aside: Load data on demand into a cache from a data store (a minimal Go sketch follows this list).
  • Command and Query Responsibility Segregation: Segregate operations that read data from operations that update data by using separate interfaces.
  • Event Sourcing: Use an append-only store to record the full series of events that describe actions taken on data in a domain.
  • Index Table: Create indexes over the fields in data stores that are frequently referenced by queries.
  • Materialized View: Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations.
  • Sharding: Divide a data store into a set of horizontal partitions or shards.
  • Static Content Hosting: Deploy static content to a cloud-based storage service that can deliver it directly to the client.
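
As a quick illustration of the Cache-Aside item above, here is a minimal read-path sketch in Go; the Cache and DB interfaces are hypothetical stand-ins for a real cache (e.g. Redis) and a real data store.

package patterns

import (
    "context"
    "time"
)

type User struct{ ID, Name string }

// Hypothetical cache and data-store interfaces, for illustration only.
type Cache interface {
    Get(ctx context.Context, key string) (*User, bool)
    Set(ctx context.Context, key string, u *User, ttl time.Duration)
}

type DB interface {
    FindUser(ctx context.Context, id string) (*User, error)
}

// GetUser applies cache-aside: read the cache first, fall back to the
// data store on a miss, then populate the cache for later reads.
func GetUser(ctx context.Context, cache Cache, db DB, id string) (*User, error) {
    if u, ok := cache.Get(ctx, id); ok {
        return u, nil // cache hit
    }
    u, err := db.FindUser(ctx, id) // cache miss: load from the data store
    if err != nil {
        return nil, err
    }
    cache.Set(ctx, id, u, 10*time.Minute) // write back with a TTL
    return u, nil
}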

Security Patterns

  • Federated Identity: Delegate authentication to an external identity provider.
  • Gatekeeper: Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them.
  • Valet Key: Use a token or key that provides clients with restricted direct access to a specific resource or service.

A Good Strategy is Unexpected

· One min read

A good strategy is both surprising and reasonable. For example, in 1997, Steve Jobs' turnaround strategy upon returning to Apple involved drastically reducing the product line and focusing on a few profitable products. When asked how to deal with the powerful Wintel alliance, he did not engage in grand strategic speeches or set ambitious growth targets; he simply smiled and said, "I will just wait for the next big thing."

Trying to do everything and believing everything is important is equivalent to believing that nothing is important. Good leaders need to know not only what to do but also what not to do.

Why Did Amazon Make the Kindle?

· One min read

Why did Amazon make the Kindle?

  • Year 1997: Eberhard seeks Amazon's investment in NuvoMedia's Rocketbook, an ebook prototype that can be held in one hand, with a battery lasting 20 hours.
  • Year 2003 - External Reason: Learning from iTunes and iPod.
  • Year 2004 - Internal Reason: The Innovator's Dilemma - great companies fail not because they want to avoid disruptive change but because they are reluctant to embrace promising new markets that might undermine their traditional businesses and that do not appear to satisfy their short-term growth requirements.

Why did Amazon succeed?

  • Huge negotiating leverage: cajoling and threatening book publishers into embracing the digital format, while not disclosing the future low prices.
  • Amazon's user acquisition channels, e.g.
    • Kindle ads were placed at the top of the Amazon homepage, and almost all readers use Amazon.
    • Bezos appeared on Oprah Winfrey's talk show.
    • Hunger marketing: the ads kept running even when the device was sold out.
  • Great user experience.

Andrew Johns: Indispensable Growth Framework

· 2 min read

What is a growth team?

A team with the responsibility to measure, understand and improve the flow of users in and out of the product and business. Finance owns the flow of cash in and out of a company. Growth owns the flow of customers in and out of a product.

The Three Mandatory Skills of a Growth Leader

  • Building growth models.
  • Developing experimentation models.
  • Building ==customer acquisition channels==.

Basic growth frameworks

Sustainable Growth = multiplication of

  • Top of Funnel (traffic, conversion rates)
  • Magic Moment (create emotional response)
  • Core Product Value (solves real problems)

e.g. Amazon's growth = multiplication of

  • vertical expansion
  • product inventory per vertical
  • traffic per product page
  • conversion to purchase
  • average purchase value
  • repeat purchase behavior

Theoretical growth model should be tested with experiments

More than just A/B tests:

  • How do you identify what part of the funnel to focus on?
  • How do you identify the most valuable test to run out of a set of experiments?
  • When do you run — and not run — that test?

Why does the growth rate drop if no new optimization is made? Because the previous optimization has already converted its target cohort. There is always a limit, an S-curve, in growth, so people have to test out potential new breakthroughs.

New Growth Breakthroughs

The larger the company and the larger the sample size, the less thoughtful the experiments can be. Big companies like small optimizations:

  • Google engineers testing 40 different shades of blue in a particular sign up button

Instead, go deep into funnels and focus on critical points.

A Grabbag of Guidance on Growth

  • You can’t sustainably grow something that sucks.
  • You don’t need or want a "growth hacker" to lead. A "hacker" is just a "hacker" - they are just for small start-ups.
  • Your growth lead needs to be a product person who has a deep understanding of the business, not just of growth.

Golang Library Development

· 3 min read

Guiding Principles

  1. When developing a library, writing the code is actually the easy part
    1. There are always unplanned tasks
    2. There are always new features
    3. There are always bugs to fix
  2. The basic ways people measure the quality of open-source libraries
    1. Basic code quality: go vet, go fmt, golint
    2. Test coverage
    3. Documentation
    4. Number of issues opened / closed
    5. A high number of GitHub stars does not necessarily mean high quality
  3. The four essential elements of an open-source library: usability / readability / flexibility / testability

The Four Essential Elements of a Library

Usability: Think from the User's Perspective

For example, to make an HTTP GET request, you create the request and then send it:

import "net/http"

req, err := http.NewRequest(
    http.MethodGet, "https://www.google.com", nil /* no body */)
if err != nil {
    return err
}
client := &http.Client{}
res, err := client.Do(req)

However, this kind of code is often reused, and writing it repeatedly can be cumbersome. So why not provide a direct GET interface?

import "net/http"

client := &http.Client{}
res, err := client.Get("https://www.google.com")

Readability

Flexibility

Still using the GET example, if you want to add request logging

import "net/http"

client := &http.Client{}
res, err := client.Get("https://www.google.com")

There is a replaceable RoundTripper interface

type Client struct {
    // Transport specifies the mechanism by which individual
    // HTTP requests are made.
    // If nil, DefaultTransport is used.
    Transport RoundTripper
}

type RoundTripper interface {
    RoundTrip(*Request) (*Response, error)
}

This way, you only need to specify a loggingRoundTripper, without needing to wrap the whole client:

import "net/http"

client := &http.Client{Transport: loggingRoundTripper}
res, err := client.Get("https://www.google.com")
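
For illustration, a minimal loggingRoundTripper could look like the sketch below; the type name and log format are placeholders, not part of net/http.

package main

import (
    "log"
    "net/http"
    "time"
)

// loggingRoundTripper wraps another RoundTripper and logs every request.
type loggingRoundTripper struct {
    next http.RoundTripper
}

func (l loggingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    start := time.Now()
    res, err := l.next.RoundTrip(req) // delegate to the wrapped transport
    log.Printf("%s %s took %v, err=%v", req.Method, req.URL, time.Since(start), err)
    return res, err
}

func main() {
    client := &http.Client{Transport: loggingRoundTripper{next: http.DefaultTransport}}
    res, err := client.Get("https://www.google.com")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    log.Println("status:", res.Status)
}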

Testability

  • Not only must you ensure that your own code is testable
  • You should also provide test helpers for callers, e.g. as sketched below
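
Here is a minimal sketch of such a helper built on net/http/httptest; NewTestServer is a hypothetical name, and a real library would usually export it from a dedicated testing package rather than define it inline.

// In a _test.go file:
package mylib

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

// NewTestServer is the kind of helper a library can export so callers can
// test their code against a fake backend instead of the real service.
func NewTestServer(t *testing.T, status int, body string) *httptest.Server {
    t.Helper()
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(status)
        w.Write([]byte(body))
    }))
    t.Cleanup(srv.Close) // stop the fake server when the test finishes
    return srv
}

// A caller's test can then point its client at the fake server's URL.
func TestGetAgainstFakeServer(t *testing.T) {
    srv := NewTestServer(t, http.StatusOK, `{"ok":true}`)
    res, err := http.Get(srv.URL)
    if err != nil {
        t.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        t.Fatalf("unexpected status: %d", res.StatusCode)
    }
}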

Backward Compatibility

  • Any exported entity
    • Renaming, removal
    • Modification of function parameter types
    • Adding a method to an interface

These can all lead to breaking changes, which is when semantic versioning becomes necessary.

Version Control

Semantic Versioning 2.0.0 | Semantic Versioning

MAJOR.MINOR.PATCH
  • patch: a bug fix
  • minor: a new non-breaking change
  • major: a breaking change

Stable / Unstable?

  • v < 1.0: unstable
    • Starting from 0.1.0, increment MINOR with each release; increment PATCH for bug fixes
  • v >= 1.0: stable, the official 1.0 release

Version bumps are difficult; in a microservices architecture, it may take 6 months to a year to update all dependencies in core libraries.

^1.1 is short for >= 1.1, < 2;
~0.2 is short for >= 0.2, < 0.3

Pin to a version range vs. lock to an exact version

How Can Knowledge Workers Rest Effectively?

· 3 min read

Why Rest?

  • Growth = Stress + Rest. Too much stress and too little rest can overwhelm a person; too little stress and too much rest can lead to complacency and stagnation. Experts allocate stress and rest wisely to grow efficiently and sustainably.
  • Engage in targeted, high-intensity training, then rest and recover. Repeat this process while slightly increasing intensity to effectively enhance physical capabilities.

What Can Rest Restore?

  • The two resources that knowledge workers expend the most and need to recover through rest are willpower and attention, so:
    1. To rest, first disconnect from the internet.
    2. Short and frequent breaks are better than long periods of work followed by long breaks.
    3. It’s best to leave the office during breaks.

What Is Rest and What Is Not?

  • Useful rest: Relaxation, socializing (casual chatting, WeChat).
  • Useless rest: Eating, cognitive activities (reading news, checking emails).

How to Rest?

  • Short-term

    1. Take timely outdoor walks. Moving your body increases blood flow to the brain and provides a pleasant “distraction.” Even standing up and walking a few steps every hour is a good rest.
    2. Return to nature; being surrounded by flowers, trees, mountains, and rivers can be restful. Even looking at pictures of nature can have a noticeable effect on the brain. Changing your computer desktop to a natural landscape can also help.
    3. Gather with friends, chat, and have a drink. It must be friends for the sake of friendship, not for networking.
    4. Start with a cup of coffee containing about 200 milligrams of caffeine; set your phone timer for 25 minutes; begin to rest; get up immediately when the 25 minutes are up.
  • Medium-term
    5. A good vacation is like recharging; it can provide abundant energy for a long time afterward. During vacations, completely detach from work. Besides walking, returning to nature, and gathering with friends, you can listen to music or take a bath, but activities that consume attention, like gaming or scrolling through your phone at night, are not allowed.

  • Long-term
    6. Learn to meditate. Find a quiet place where you won't be disturbed; sit comfortably; take deep breaths and feel your belly rise and fall with each breath. Focus on your breathing, but don’t deliberately avoid thoughts that come to mind; acknowledge them and then let them go. Start with one minute each day and gradually increase the time.
    7. Sleep is crucial; its role is not just to eliminate fatigue but also to allow the body and brain to “grow.” Sleep should be regular, ideally 7-9 hours each night, with the specific duration varying by person, aiming to wake up naturally without an alarm.

How to Not Concentrate?

Attention is a limited resource. When you focus your attention, you are consuming the brain circuits related to attention. But how do you rest your mind? We are constantly thinking, and it’s hard to stop; so what does “not concentrating” mean? This introduces an important concept called the “Default Mode Network.”

  1. “Positive Constructive Daydreaming (PCD).”
  2. Take a short nap.
  3. Pretend to be someone else, viewing stress as a challenge rather than a threat, and encourage yourself.

Design Pinterest

· 13 min read

System design interviews are for people to find teammates capable of independently designing and implementing Internet services. An interview is a great chance to show your "engineering muscles" - you have to combine your knowledge with your decision-making skills to design the right system for the right scenario.

Emit the right signals for your audience

The first thing you need to know about a system design interview is that you have to be talkative throughout the interview session. Of course, you must consult the interviewer to determine whether you are on the right track to give them what they want; however, you still need to prove that you can do the job independently. So, ideally, keep talking about what the interviewer expects throughout the interview before they even have to ask.

Secondly, do not limit yourself to only one solution. Given the same problem, there are so many ways to solve it that merely finding a solution takes no engineering license - the real skill lies in the tradeoffs. There are pros and cons to every choice you will make. Discuss the tradeoffs with your interviewer, and pick the solution most suitable for your assumptions and constraints. It's like, in the real world, people won't build the Golden Gate Bridge over a trench, nor will they build a temporary bridge over San Francisco Bay.

Finally, to excel in the interview, you'd better bring something new. "Good engineers script; great engineers innovate". If you cannot teach people something new, you are just good, not great. Quality answers = Novelty x Resonance.

The 4-step template

If you are not sure how to navigate the session and be talkative all the time, here is a simple 4-step template you can follow in a divide-and-conquer way:

  1. Clarify requirements and make assumptions.
  2. Sketch out the high-level design.
  3. Dive into individual components and how they interact with each other.
  4. Wrap up with blindspots or bottlenecks.

All the designs in this book will follow these steps.

Specifically for this "Design Pinterest", I will explain everything in as much detail as possible because it is the first case of the entire book. However, for simplicity, I won't repeat many of the elements covered here in the other designs of this book.

Design Pinterest

Step 1. Clarify requirements and make assumptions

All systems exist for a purpose, and so do software systems. Software engineers are not artists - we build things to fulfill customers' needs, so we should always start with the customer. Meanwhile, to fit the design into a 45-minute session, we must set constraints and scope the work by making assumptions.

Pinterest is a highly scalable photo-sharing service with hundreds of millions of monthly active users. Here are the requirements:

  • Most important features
    • news feed: Customers will see a feed of images after login.
    • follow: One customer follows others to subscribe to their feeds.
    • upload photos: Customers can upload their own images, which will appear in their followers' feeds.
  • Scaling out
    • There are too many features and teams developing the product, so the product is decoupled into microservices.
    • Most of the services should be horizontally scalable and stateless.

Step 2. Sketch out the high-level design

Do not dive into details before outlining the big picture. Otherwise, going off too far in the wrong direction would waste time and prevent you from finishing the task.

Here is the high-level architecture, in which arrows indicate dependencies. (Sometimes, people would use arrows to describe the direction of data flow.)

Instagram Architecture Overview

Step 3. Dive into individual components and how they interact with each other

Once the architecture is there, we can confirm with the interviewer whether they want to go through each component. Sometimes, the interviewer may want to zoom into an unexpected domain problem like designing a photo store (that's why I am always saying there is no one-size-fits-all system design solution. Keep learning...). However, here, let's still assume that we are building the core abstraction: upload a photo and then publish it to followers.

Again, I will explain as much as possible in a top-down order because this is our first design example. In the real world, you don't literally have to go through each component at this level of detail; instead, you should focus on the core abstraction first.

Mobile and browser clients connect to the Pinterest data center via edge servers. An edge server is an edge device that provides an entry point into a network. Here we see two kinds of edge servers in the diagram - load balancers and reverse proxies.

Load Balancer (LB)

Load balancers distribute incoming network traffic to a group of backend servers. They fall into three categories:

  • DNS Round Robin (rarely used): clients get a randomly-ordered list of IP addresses.
    • pros: easy to implement and usually free.
    • cons: hard to control and not quite responsive because DNS cache takes time to expire.
  • L3/L4 Network-layer Load Balancer: traffic is routed by IP address and port. L3 is the network layer (IP). L4 is the transport layer (TCP).
    • pros: better granularity, simple, responsive. e.g. forward traffic based on the ports.
    • cons: content-agnostic: cannot route traffic by the content of the data.
  • L7 Application-layer Load Balancer: traffic is routed by what is inside the HTTP protocol. L7 is the application layer (HTTP).

In case the interviewer wants more, we can suggest exact algorithms like round robin, weighted round robin, least loaded, least loaded with slow start, utilization limit, latency, cascade, etc. (a toy round-robin picker is sketched below). Check design L7 load balancer to learn more.
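
Here is a toy round-robin picker to make the idea concrete; real load balancers layer health checks, weights, and connection draining on top of this, and the backend addresses are placeholders.

package main

import (
    "fmt"
    "sync/atomic"
)

// roundRobin cycles through a fixed list of backends.
type roundRobin struct {
    backends []string
    next     atomic.Uint64
}

func (rr *roundRobin) pick() string {
    n := rr.next.Add(1) - 1
    return rr.backends[n%uint64(len(rr.backends))]
}

func main() {
    rr := &roundRobin{backends: []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}}
    for i := 0; i < 5; i++ {
        fmt.Println(rr.pick()) // requests rotate evenly across the backends
    }
}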

A load balancer could exist in many other places as long as there is a need for balancing traffic.

Reverse Proxy

Unlike a "forward" proxy in front of clients that route traffic to an external network, a reverse proxy is a kind of proxy sitting in front of servers, so it's called "reverse". By this definition, a load balancer is also a reverse proxy.

A reverse proxy brings a lot of benefits depending on how you use it; here are some typical ones:

  1. Routing: centralizes traffic to internal services and provides unified interfaces to the public. For example, www.example.com/index and www.example.com/sports appear to come from the same domain, but those pages are served by different servers behind the reverse proxy.
  2. Filtering: filter out requests without valid credentials for authentication or authorization.
  3. Caching: Some resources are so popular for HTTP requests that you may want to configure some cache for the route to save some server resources.

reverse proxy

Nginx, Varnish, HAProxy, and AWS Elastic Load Balancing are popular products on the market. I find it handy yet powerful to write a lightweight reverse proxy in Golang, as sketched below. In the context of Kubernetes, this is basically what Ingress and Ingress Controllers do.
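
A minimal sketch of such a reverse proxy using the standard net/http/httputil package; the routes and backend addresses are made up for illustration.

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // Route /sports to one internal service and everything else to another,
    // so both appear to come from the same public domain.
    index, _ := url.Parse("http://127.0.0.1:9001")
    sports, _ := url.Parse("http://127.0.0.1:9002")

    mux := http.NewServeMux()
    mux.Handle("/sports/", httputil.NewSingleHostReverseProxy(sports))
    mux.Handle("/", httputil.NewSingleHostReverseProxy(index))

    log.Fatal(http.ListenAndServe(":8080", mux))
}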

Web App

This is where we serve web pages. In the early days, a web service usually combined the backend with page rendering, as the Django and Ruby on Rails frameworks do. Later, as the project grows, they are often decoupled into dedicated frontend and backend projects: the frontend focuses on app rendering, while the backend serves the APIs for the frontend to consume.

Mobile App

Most backend engineers are not familiar with mobile design patterns; go to iOS Architecture Patterns for more.

A dedicated frontend web project is very similar to a standalone mobile app - they are both clients of the servers. Some people call this "holistic frontend", where engineers can build user experiences on both platforms simultaneously, like React for web and React Native for mobile.

API App

Clients talk to the servers via public APIs. Nowadays, people often serve RESTful or GraphQL APIs. Learn more in public API choices.
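
For example, a RESTful endpoint can be as simple as the following Go sketch; the /v1/pins route and the Pin fields are made up for illustration.

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

type Pin struct {
    ID  string `json:"id"`
    URL string `json:"url"`
}

func main() {
    // A minimal RESTful endpoint that web and mobile clients could call.
    http.HandleFunc("/v1/pins", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode([]Pin{{ID: "1", URL: "https://cdn.example.com/1.jpg"}})
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}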

Stateless web and API tier

There are two major bottlenecks of the whole system -- load (requests per second) and bandwidth. We could improve the situation

  1. by using more efficient software, e.g. using frameworks with the async and non-blocking reactor pattern, or
  2. by using more hardware, either
    • scaling up, aka vertical scaling: using more powerful machines like supercomputers or mainframes, or
    • scaling out, aka horizontal scaling: using a larger number of less-expensive machines.

Internet companies prefer scaling out, since

  1. It is more cost-efficient with a vast number of commodity machines.
  2. This is also good for recruiting - everyone could learn programming with a PC.

To scale out, we'd better keep services stateless, meaning they don't hold states in local memory or storage, so we could kill them unexpectedly or restart them anytime for any reason.

Learn more about scaling in how to scale a web service.

Service Tier

The single responsibility principle advocates small and autonomous services that work together, so that each service can "do one thing and do it well" and grow independently. Small teams owning small services can plan much more aggressively for hyper-growth. Learn more about Microservices vs. Monolithic Services in Designing Uber.

Service Discovery

How do those services find each other?

Zookeeper is a popular and centralized choice. Instances with a name, address, port, etc. are registered under a per-service path in ZooKeeper. If one service does not know where to find another service, it can query ZooKeeper for the location and cache it until that location becomes unavailable, as in the sketch below.
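
A minimal discovery sketch, assuming the github.com/go-zookeeper/zk client and a hypothetical /services/feed registration path; the ensemble address is a placeholder.

package main

import (
    "fmt"
    "log"
    "time"

    "github.com/go-zookeeper/zk"
)

func main() {
    // Connect to the ZooKeeper ensemble (address is a placeholder).
    conn, _, err := zk.Connect([]string{"zk1:2181"}, 5*time.Second)
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    // Each instance of the feed service registers itself as a child znode
    // under a well-known path; clients list the children to discover them.
    instances, _, err := conn.Children("/services/feed")
    if err != nil {
        log.Fatal(err)
    }
    for _, name := range instances {
        data, _, err := conn.Get("/services/feed/" + name) // e.g. "10.0.0.5:7000"
        if err != nil {
            continue
        }
        fmt.Printf("feed instance %s at %s\n", name, data)
    }
}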

Zookeeper is a CP system in terms of CAP theorem (See Section 2.3 for more discussion), which means it stays consistent in the case of failures, but the leader of the centralized consensus will be unavailable for registering new services.

In contrast to Zookeeper, Uber did some interesting work in a decentralized way, named Hyperbahn, based on the Ringpop consistent hash ring, though it turned out to be a big failure. Read Amazon's Dynamo to understand AP and eventual consistency.

In the context of Kubernetes, I would like to use service objects and Kube-proxy, so it would be easy for programmers to specify the address of the target service with internal DNS.

Follower Service

The follower-and-followee relationship revolves around these two straightforward data structures:

  1. Map<Followee, List of Followers>
  2. Map<Follower, List of Followees>

A key-value store, like Redis, is very suitable here because the data structure is pretty simple, and this service should be mission-critical with high performance and low latency.

The follower service serves the functionality around followers and followees. For an image to appear in a feed, there are two models to make it happen (a toy sketch of both follows the list below).

  • Push. Once the image is uploaded, we push the image metadata into all the followers' feeds. The follower will see its prepared feed directly.
    • If the Map <Followee, List of Followers> fan-out is too large, then the push model will cost a lot of time and data duplicates.
  • Pull. We don't prepare the feed in advance; instead, when the follower checks its feed, it fetches the list of followees and gets their images.
    • If the Map<Follower, List of Followees> fan-out is too large, then the pull model will spend a lot of time iterating the huge followee list.
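
Below is a toy in-memory sketch of both models; in production, the Go maps would be Redis structures and the fan-out would run through a queue or data pipeline.

package main

import "fmt"

var (
    followers = map[string][]string{"alice": {"bob", "carol"}} // followee -> followers
    followees = map[string][]string{"bob": {"alice"}}          // follower -> followees
    feeds     = map[string][]string{}                          // user -> prepared feed of image IDs
    posts     = map[string][]string{}                          // author -> their own image IDs
)

// Push model: on upload, fan the image out to every follower's feed.
func publishPush(author, imageID string) {
    posts[author] = append(posts[author], imageID)
    for _, f := range followers[author] {
        feeds[f] = append(feeds[f], imageID) // expensive when the fan-out is huge
    }
}

// Pull model: build the feed on read by iterating the followee list.
func readFeedPull(user string) []string {
    var feed []string
    for _, followee := range followees[user] {
        feed = append(feed, posts[followee]...) // expensive when following many people
    }
    return feed
}

func main() {
    publishPush("alice", "img-1")
    fmt.Println(feeds["bob"])        // push: the feed was prepared at write time
    fmt.Println(readFeedPull("bob")) // pull: the feed is assembled at read time
}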

Feed Service

The feed service stores the image post metadata - URL, name, description, location, etc. - in a database, while the images themselves are usually saved in a blob storage like AWS S3 or Azure Blob Storage. Taking S3 as an example, a possible flow when the customer creates a post from the web or mobile client is:

  1. The server generates an S3 pre-signed URL which grants write permission.
  2. The client uploads the image binary to S3 with the generated pre-signed URL.
  3. The client submits the post and image metadata to the server, which then triggers the data pipeline to push the post to followers' feeds if the push model is used.
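
A sketch of step 1 using the AWS SDK for Go v2 presigner; the bucket name, object key, and expiry are placeholders, and error handling is kept minimal.

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    ctx := context.Background()
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        log.Fatal(err)
    }
    presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

    // Grant the client permission to PUT exactly one object for 15 minutes.
    req, err := presigner.PresignPutObject(ctx, &s3.PutObjectInput{
        Bucket: aws.String("pinterest-images"), // placeholder bucket
        Key:    aws.String("posts/123.jpg"),    // placeholder key
    }, s3.WithPresignExpires(15*time.Minute))
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(req.URL) // hand this URL to the client for the upload
}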

Customers post to feeds as time passes, so HBase / Cassandra's timestamp index is an excellent fit for this use case.

Image Blob Store and CDN

Transmitting blobs consumes a lot of bandwidth. Once we upload a blob, we read it a lot but seldom update or delete it. Thus, developers often cache blobs with CDNs, which distribute them to locations closer to the customer.

AWS CloudFront CDN + S3 might be the most popular combination on the market. I personally use BunnyCDN for my online content. Web3 developers like to use decentralized stores like IPFS and Arweave.

Search Service

The search service connects to all the possible data sources and indexes them so that people can easily search feeds. We usually use Elasticsearch or Algolia to do the work.

Spam Service

The spam service uses machine learning techniques like supervised and unsupervised learning to flag and delete profane content and fake accounts. Learn more in Fraud Detection with Semi-supervised Learning.

Step 4. Wrap up with blindspots or bottlenecks.

What are the blindspots or bottlenecks of the design above?

  • As of 2022, people find it less favorable to use the follower-followee way of organizing feeds, because it would be hard 1) for new customers to bootstrap and 2) for existing customers to find more intriguing content. TikTok and Toutiao lead the new wave of innovations to organize feeds with recommendation algorithms. This design, however, does not cover the recommendation system part.
  • For a popular photo-based SNS, scaling is the system's biggest challenge. So, to make sure the design can survive the load, we need capacity planning.

Capacity planning with a spreadsheet and back-of-the-envelope calculation

There are two directions that we could approach the estimation problem: top-down and bottom-up.

For bottom-up, you do load tests with the existing system and plan the future based on the company's current performance and expected growth rate.

For top-down, you start with the theoretical customer base and make a back-of-the-envelope calculation. I highly recommend doing it in a digital spreadsheet, where you can easily list the formulas and the assumed/calculated numbers.

When we rely on external blob storage and CDN, bandwidth is unlikely to be a problem. So I will estimate the capacity for the follower service as an example:

Row | Description ("/" means per) | Estimated Number | Calculated
--- | --- | --- | ---
A | daily active users | 33,000,000 |
B | requests / user / day | 60 |
C | rps / machine | 10,000 (c10k problem) |
D | scale factor (redundancy for user growth in 1 yr) | 3 times |
E | number of service instances | | = A * B / (24 * 3600) / C * D ~= 7

We can see that Row E is a calculated result of the formula. After applying this estimation method to each of those microservices and storage systems, we will understand the entire system better.
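
The same back-of-the-envelope formula, written out so the arithmetic is easy to check (the numbers are copied from the table above):

package main

import "fmt"

func main() {
    const (
        dailyActiveUsers   = 33_000_000
        requestsPerUserDay = 60.0
        rpsPerMachine      = 10_000.0 // c10k problem
        scaleFactor        = 3.0      // headroom for one year of growth
        secondsPerDay      = 24 * 3600.0
    )
    avgRPS := dailyActiveUsers * requestsPerUserDay / secondsPerDay
    instances := avgRPS / rpsPerMachine * scaleFactor
    fmt.Printf("average rps: %.0f, instances needed: %.1f\n", avgRPS, instances)
    // average rps: 22917, instances needed: 6.9 -> round up to ~7
}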

Real-world capacity planning is not a one-time deal. Provisioning too many machines wastes money, and provisioning too few causes outages. We usually go through a few cycles of estimation and experimentation to find the right answer, or use autoscaling if the system supports it and budget is not a problem.

Big-corp engineers are often spoiled by abundant computing and storage resources. However, great engineers think about costs and benefits. I would sometimes experiment with different tiers of machines and add rows for their monthly expenses to the estimation.