Building Personal Infrastructure

23684 2020-10-04 12:11

To enjoy a life of building software, media, and community as a hobby (all things here are NOT related to my job) / for pure pleasure - why build personal infrastructure? And what are the strategies and executions to grow hobby projects? What is my current progress?

Everything starts from playing

Primary school

Playing Chinese copy of Tamiya mini 4WD

And play computer games on DOS.

Introduction to programming with Macromedia Authorware

Middle School

  • Built a website with Microsoft FrontPage to track Iraq War 2003
  • Built a text-based game with QBasic on Digital Dictionary

And then you can play in the classroom behind piles of textbooks :)

High School

  • Built with Lego Mindstorm RCX for FIRST Lego League Challenge
Lego Robots Challenge

College & Grad School

  • used SQL injection to add 20 CNY credits to our school’s meal card for my roommates. However, two days later, they were called by the cashier… LOL.

  • sniffed an audiobook app and then, based on its API, built my own Android App to get free audios.

Why is programming fun?

… the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning.

— The Mythical Man-month

  • Builders build for themselves anyway. Why not share them with others?
  • Builders build things anyway. Why not make them big?

Meanwhile, I came across some mind-blowing articles.

I conclude:

  • Amazon’s flywheel is built for business with high fixed costs and optimizing returns to scale.
  • My life is with high fixed costs (limited time and energy), and scale economy is worth pursuing in this case.
  • I can achieve more than I think with a powerful personal infrastructure that gains progressive advantages over time.

Plus, some take-away from my previous pre-PMF startup experience

  1. Seek retention-first, data-driven growth. Sean Ellis: “Focusing on customer acquisition over ‘awareness’ takes discipline… At a certain scale, awareness/brand building makes sense. However, for the first year or two it’s a total waste of money.”
  2. Table stakes are table stakes. You don’t need VC to build initial product. Ideas are just ideas. Build your team, build your product, and collaborate with people.

Developing Personal Infra Strategy


Here is the architecture of my hobby projects.

Tech Stack

Technologies: React, React Native Expo, GraphQL, KOA, TypeScript, AVA, Webpack, Airflow, MongoDB, Python Pandas and Flask, svelte, Metabase, Golang, etc.

Servers and APIs: Heroku, DigitalOcean, Azure, AWS, Github Pages, BunnyCDN.

Being an early majority to adopt proven new tech

System Architecture

Focus on building, not wasting time on SRE

Example 1

Example 2

“This architecture is not future-proof! / does not scale!”


  • Services are mostly stateless and horizontal scalable
  • Service collocation is a problem but you have to segregate them anyway when collaborating with various people and achieving Personal IaaS (individually-sellable).
  • Can always evolve to kubernetes.

Personal Root Metrics

Living a balanced life and keep everything on track, measured by data

The key metrics for a “retention-first growth” is cohort analysis.

Benchmarks for reference:

Industry Day 1 Day 7 Day 30
2C 40 20 10
E-commerce 35 15 5
Gaming 30 15 <5
EdTech 25 10 5
  • Values in unit of %



  • Double-entry bookkeeping made easy for living your best financial life
  • Personal CRM: Smartly engage more in meaningful relationships
  • One coding challenge per day


  • Building web & mobile apps with speed & quality

Helped my friends’ projects to start from scratch

  • CocuSocial: Discover a different food and drink experience
  • helped my day job at IoTeX to build staking portal, blockchain explorer, desktop wallet, etc.
  • not to mention some other failed projects…



Can I use your projects or join your community?

👍 Definitely and welcome! They are mostly open sourced or open for registration. Thank you for becoming our valued customer or community member!

👏 Feedback is highly appreciated!

❤️ Like it? Check this article at and follow me on :)

Crack the System Design Interview

209683 2016-2-14 01:27 (Last Updated 2020-09-08 20:51)

Update 2019-09-25: Welcome to join us for discussion:

Update 2018/10/18: This article is collected in the topic hacking the software engineer interview

Building applications is not a rocket science, but having the vision of the overall architecture really makes a difference. We have to crack the system interview sooner or later in our career as a software engineer or engineering manager.

Interviewers are looking for future teammates that they like to work with. The future teammates are expected to be, at least, capable of solving problems independently. There are so many solutions to any given problem, but not all of them are suited given the context. So the interviewee has to specify different choices and their tradeoffs. To summarize, system design interview is a happy show-off of our knowledge on technologies and their tradeoffs. Ideally, keep talking what the interviewer expect throughout the interview, before they even have to ask.

Keep talking for 45 mins could be easy, as long as we are armed with the following four steps and three common topics. Take “design Pinterest” for example.

1. Four steps to Crack the System Design Interview

Breaking down a complex task into small chunks helps us handle the problem at a better pace and in a more actionable way.

1.1 Step 1. Clarify Requirements and Specs

First things first, the ultimate goals should always be clear.

Pinterest is a highly scalable photo-sharing service:

  • features: user profile, upload photos, news feed, etc.
  • scaling out: horizontal scalability and micro services.

1.2 Step 2. Sketch Out High Level Design

Do not dive into details before outlining the big picture. Otherwise, going off too far towards a wrong direction would make it harder to even provide a roughly correct solution. We will regret wasting time on irrelevant details when we do not have time to finish the task.

OK, let us sketch out the following diagram without concerning too much about the implementation detail of these components.

pinterest architecture

So far so good! To some extent, congrats, we have solved the problem!

1.3 Step 3. Discuss individual components and how they interact in detail

When we truly understand a system, we should be able to identify what each component is and explain how they interact with one another. Take these components in the above diagram and specify each one by one. This could lead to more general discussions, such as the three common topics in Section 2, and to more specific domains, like how to design the photo storage data layout

1.3.1 Load Balancer

Generally speaking, load balancers fall into three categories:

  • DNS Round Robin (rarely used): clients get a randomly-ordered list of IP addresses.
    • pros: easy to implement and free
    • cons: hard to control and not responsive, since DNS cache needs time to expire
  • L3/L4 Load Balancer: traffic is routed by IP address and port. L3 is network layer (IP). L4 is transport layer (TCP).
    • pros: better granularity, simple, responsive.
  • L7 Load Balancer: traffic is routed by what is inside the HTTP protocol. L7 is application layer (HTTP).

It is good enough to talk in this level of detail on this topic, but in case the interviewer wants more, we can suggest exact algorithms like round robin, weighted round robin, least loaded, least loaded with slow start, utilization limit, latency, cascade, etc.

1.3.2 Reverse Proxy

Reverse proxy, like varnish, centralizes internal services and provides unified interfaces to the public. For example, and appear to come from the same domain, but in fact they are from different micro services behind the reverse proxy. Reverse proxy could also help with caching and load balancing.

reverse proxy

1.3.3 (Frontend) Web Tier

This is where web pages are served, and usually combined with the service / backend tier in the very early stage of a web service.


There are two major bottlenecks of the whole system – requests per second (rps) and bandwidth. We could improve the situation by using more efficient tech stack, like frameworks with async and non-blocking reactor pattern, and enhancing the hardware, like scaling up (aka vertical scaling) or scaling out (aka horizontal scaling).

Internet companies prefer scaling out, since it is more cost-efficient with a huge number of commodity machines. This is also good for recruiting, because the target skillsets are equipped by. After all, people rarely play with super computers or mainframes at home.

Frontend web tier and service tier must be stateless in order to add or remove hosts conveniently, thus achieving horizontal scalability. As for feature switch or configs, there could be a database table / standalone service to keep those states.

Web Application and API

MVC(MVT) or MVVC pattern is the dominant pattern for this layer. Traditionally, view or template is rendered to HTML by the server at runtime. In the age of mobile computing, view can be as simple as serving the minimal package of data transporting to the mobile devices, which is called web API. People believe that the API can be shared by clients and browsers. And that is why single page web applications are becoming more and more popular, especially with the assistance of frontend frameworks like react.js, angular.js, backbone.js, etc.

1.3.4 App Service Tier

The single responsibility principle advocates small and autonomous services that work together, so that each service can do one thing well and not block others. Small teams with small services can plan much more aggressively for the sake of hyper-growth.

Service Discovery

How do those services find each other? Zookeeper is a popular and centralized choice. Instances with name, address, port, etc. are registered into the path in ZooKeeper for each service. If one service does not know where to find another service, it can query Zookeeper for the location and memorize it until that location is unavailable.

Zookeeper is a CP system in terms of CAP theorem (See Section 2.3 for more discussion), which means it stays consistent in the case of failures, but the leader of the centralized consensus will be unavailable for registering new services.

In contrast to Zookeeper, Uber is doing interesting work in a decentralized way, named hyperbahn, based on Ringpop consisten hash ring. Read Amazon’s Dynamo to understand AP and eventual consistency.

Micro Services

For the Pinterest case, these micro services could be user profile, follower, feed, search, spam, etc. Any of those topics could lead to an in-depth discussion. Useful links are listed in Section 3: Future Studies, to help us deal with them.

However, for a general interview question like “design Pinterest”, it is good enough to leave those services as black boxes… If we want to show more passion, elaborate with some sample endpoints / APIs for those services would be great.

1.3.5 Data Tier

Although a relational database can do almost all the storage work, please remember do not save a blob, like a photo, into a relational database, and choose the right database for the right service. For example, read performance is important for follower service, therefore it makes sense to use a key-value cache. Feeds are generated as time passes by, so HBase / Cassandra’s timestamp index is a great fit for this use case. Users have relationships with other users or objects, so a relational database is our choice by default in an user profile service.

Data and storage is a rather wide topic, and we will discuss it later in Section 2.2 Storage.

1.4 (Optional) Step 4. Back-of-the-envelope Calculation

The final step, estimating how many machines are required, is optional, because time is probably up after all the discussions above and three common topics below. In case we run into this topic, we’d better prepare for it as well. It is a little tricky… we need come up with some variables and a function first, and then make some guesses for the values of those variables, and finally calculate the result.

The cost is a function of CPU, RAM, storage, bandwidth, number and size of the images uploaded each day.

  • N users 10^10
  • i images / (user * day) 10
  • s size in bytes / image 10^6
  • viewed v times / image 100
  • d days
  • h requests / sec 10^4 (bottleneck)
  • b bandwidth (bottleneck)
  • Server cost: $1000 / server
  • Storage cost: $0.1 / GB
  • Storage = Nisd

Remember the two bottlenecks we mentioned in section 1.3.3 Web Tier? – requests per second (rps) and bandwidth. So the final expression would be

Number of required servers = max(Niv/h, Nivs/b)

2 Three Common Topics

There are three common topics that could be applied to almost every system design question. They are extracted and summarized in this section.

2.1 Communication

How do different components interact with each other? – communication protocols.

Here is a simple comparison of those protocols.

  • UDP and TCP are both transport layer protocols. TCP is reliable and connection-based. UDP is connectionless and unreliable.
  • HTTP is in the application layer and normally TCP based, since HTTP assumes a reliable transport.
  • RPC, a session layer (or application layer in TCP/IP layer model) protocol, is an inter-process communication that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network), without the programmer explicitly coding the details for this remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. In an Object-Oriented Programming context, RPC is also called remote invocation or remote method invocation (RMI).
Further discussions

Since RPC is super useful, some interviewers may ask how RPC works. The following picture is a brief answer.


*Stub procedure: a local procedure that marshals the procedure identifier and the arguments into a request message, and then to send via its communication module to the server. When the reply message arrives, it unmarshals the results.

We do not have to implement our own RPC protocols. There are off-the-shelf frameworks.

  • Google Protobuf: an open source RPC with only APIs but no RPC implementations. Smaller serialized data and slightly faster. Better documentations and cleaner APIs.
  • Facebook Thrift: supports more languages, richer data structures: list, set, map, etc. that Protobuf does not support) Incomplete documentation and hard to find good examples.
    • User case: Hbase/Cassandra/Hypertable/Scrib/…
  • Apache Avro: Avro is heavily used in the hadoop ecosystem and based on dynamic schemas in Json. It features dynamic typing, untagged data, and no manually-assigned field IDs.

Generally speaking, RPC is internally used by many tech companies for performance issues, but it is rather hard to debug and not flexible. So for public APIs, we tend to use HTTP APIs, and are usually following the RESTful style.

  • REST (Representational state transfer of resources)
    • Best practice of HTTP API to interact with resources.
    • URL only decides the location. Headers (Accept and Content-Type, etc.) decide the representation. HTTP methods(GET/POST/PUT/DELETE) decide the state transfer.
    • minimize the coupling between client and server (a huge number of HTTP infras on various clients, data-marshalling).
    • stateless and scaling out.
    • service partitioning feasible.
    • used for public API.

2.2 Storage

2.2.1 Relational Database

Relational database is the default choice for most use cases, by reason of ACID (atomicity, consistency, isolation, and durability). One tricky thing is consistency – it means that any transaction will bring database from one valid state to another, (different from the consistency in CAP, which will be discussed in Section 2.3).

Schema Design and 3rd Normal Form (3NF)

To reduce redundancy and improve consistency, people follow 3NF when designing database schemas:

  • 1NF: tabular, each row-column intersection contains only one value
  • 2NF: only the primary key determines all the attributes
  • 3NF: only the candidate keys determine all the attributes (and non-prime attributes do not depend on each other)
Db Proxy

What if we want to eliminate single point of failure? What if the dataset is too large for one single machine to hold? For MySQL, the answer is to use a DB proxy to distribute data, either by clustering or by sharding.

Clustering is a decentralized solution. Everything is automatic. Data is distributed, moved, rebalanced automatically. Nodes gossip with each other, (though it may cause group isolation).

Sharding is a centralized solution. If we get rid of properties of clustering that we don’t like, sharding is what we get. Data is distributed manually and does not move. Nodes are not aware of each other.

2.2.2 NoSQL

In a regular Internet service, the read write ratio is about 100:1 to 1000:1. However, when reading from a hard disk, a database join operation is time consuming, and 99% of the time is spent on disk seek. Not to mention a distributed join operation across networks.

To optimize the read performance, denormalization is introduced by adding redundant data or by grouping data. These four categories of NoSQL are here to help.

Key-value Store

The abstraction of a KV store is a giant hashtable/hashmap/dictionary.

The main reason we want to use a key-value cache is to reduce latency for accessing active data. Achieve an O(1) read/write performance on a fast and expensive media (like memory or SSD), instead of a traditional O(logn) read/write on a slow and cheap media (typically hard drive).

There are three major factors to consider when we design the cache.

  1. Pattern: How to cache? is it read-through/write-through/write-around/write-back/cache-aside?
  2. Placement: Where to place the cache? client side/distinct layer/server side?
  3. Replacement: When to expire/replace the data? LRU/LFU/ARC?

Out-of-box choices: Redis/Memcache? Redis supports data persistence while memcache does not. Riak, Berkeley DB, HamsterDB, Amazon Dynamo, Project Voldemort, etc.

Document Store

The abstraction of a document store is like a KV store, but documents, like XML, JSON, BSON, and so on, are stored in the value part of the pair.

The main reason we want to use a document store is for flexibility and performance. Flexibility is obtained by schemaless document, and performance is improved by breaking 3NF. Startup’s business requirements are changing from time to time. Flexible schema empowers them to move fast.

Out-of-box choices: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, etc.

Column-oriented Store

The abstraction of a column-oriented store is like a giant nested map: ColumnFamily<RowKey, Columns<Name, Value, Timestamp>>.

The main reason we want to use a column-oriented store is that it is distributed, highly-available, and optimized for write.

Out-of-box choices: Cassandra, HBase, Hypertable, Amazon SimpleDB, etc.

Graph Database

As the name indicates, this database’s abstraction is a graph. It allows us to store entities and the relationships between them.

If we use a relational database to store the graph, adding/removing relationships may involve schema changes and data movement, which is not the case when using a graph database. On the other hand, when we create tables in a relational database for the graph, we model based on the traversal we want; if the traversal changes, the data will have to change.

Out-of-box choices: Neo4J, Infinitegraph, OrientDB, FlockDB, etc.

2.3 CAP Theorem

When we design a distributed system, trading off among CAP (consistency, availability, and partition tolerance) is almost the first thing we want to consider.

  • Consistency: all nodes see the same data at the same time
  • Availability: a guarantee that every request receives a response about whether it succeeded or failed
  • Partition tolerance: system continues to operate despite arbitrary message loss or failure of part of the system

In a distributed context, the choice is between CP and AP. Unfortunately, CA is just a joke, because single point of failure is a red flag in the real distributed systems world.

To ensure consistency, there are some popular protocols to consider: 2PC, eventual consistency (vector clock + RWN), Paxos, In-Sync Replica, etc.

To ensure availability, we can add replicas for the data. As to components of the whole system, people usually do cold standby, warm standby, hot standby, and active-active to handle the failover.

3 Future Studies

Just as everything else, preparation and practice makes perfect. Here are more resources I prepared for you to know more. Enjoy! ;)


MIT surveyed nearly 20,000 professionals from around the world - 50% from North America, 21% from Europe, 19% from Asia, and the rest from Australia, South America, and Africa. Takeaways are …

1. Sort tasks by importance and act with a clear goal.

  • Before writing anything of any length, prepare an outline in a logical order to help you stay on track.
  • Revise your daily calendar the night before to emphasize your priorities. Next to each agenda on your schedule, write down your goals.
  • Send a detailed agenda to all participants before any meeting.
  • When embarking on a large project, sketch out preliminary conclusions as soon as possible.
  • Before reading any lengthy material, determine your specific purpose for it.

2. Dealing with information & task overload.

  • Skip most messages by looking at the subject and sender.
  • Make daily processes, like getting dressed or eating breakfast, a routine so you don’t spend time thinking about them.
  • Check your device’s screen every hour, rather than every few minutes.
  • Break large projects into sections and reward yourself when you complete each section.
  • Delegate tasks that don’t interfere with your top priorities, when feasible.
  • Set aside time in your daily schedule to deal with emergencies and unexpected events.

3. Your colleagues need short meetings, responsive communication, and clear direction.

  • Respond immediately to messages from those who are important to you.
  • To capture the audience’s attention, speak from some notes rather than reading a prepared text.
  • Limit any meeting to 90 minutes or less, but preferably less. At the end of each session, delineate the next steps and responsibilities for those steps.
  • To improve your team’s performance, establish procedures to prevent future mistakes instead of playing the blame game.
  • Establish clear goals and success metrics for any team effort.


An online judge is primarily a place where you can execute code remotely for educational or recruitment purposes. In this design, we focus on designing an OJ for interview preparation like Leetcode, with the following requirements:

  • It should have core OJ functionalities like fetching a problem, submitting the solution, compiling if needed, and executing it.
  • It should be highly available with an async design, since running code may take time.
  • It should be horizontal scalable or, saying, easy to scale-out.
  • It should be robust and secure to execute untrusted source code.


The architecture below is featured on queueing for async execution and sandboxing for secure execution. And each component is separately deployable and scalable.

designing online judge system


Presentation Layer

The user agent is usually a web or mobile app like It displays the problem description and provides the user with a code editor to write and submit code.

When the user submits the code, the client will get a token since it is an async call. Then the client polls the server for the submission status.


Please see Public API Choices for the protocols we can choose from. And let’s design the interface itself here and GraphQL for example:

type Query {
  problems(id: String): [Problem]
  languageSetup(id: String!, languageId: LanguageId!): LanguageSetup
  submission(token: String!) Submission

type Mutation {
    problemId: String!
    code: String!
    languageId: LanguageId!
  ): CreatedSubmission!

enum LanguageId {
  # ...

type Problem {
  id: String!
  title: String!
  description: String!
  supportedLanguages: [Float!]!

type LanguageSetup {
  languageId: LanguageId!
  template: String!
  solutions: [String!]!

type Status {
  id: Float!
  description: String!

type Submission {
  compileOutput: String
  memory: Float
  message: String
  status: Status
  stderr: String
  stdout: String
  time: String
  token: String

type CreatedSubmission {
  token: String!

The API layer records the submission in the database, publishes it into the queue, and returns a token for the client’s future reference.

Code Execution Engine

Code execution engine (CEE) polls the queue for the code, uses a sandbox to compile and run the code and parses the metadata from the compilation and execution.

The sandbox could be LXC containers, Docker, virtual machines, etc. We can choose Docker for its ease of deployment.

I am recently learning Elixir and creating an online judge for my daily practice. It now supports Elixir and JavaScript. And I am adding more languages (like Java) and problems to it.

We may host future events to improve your coding skills. Join us at for the English community or use your WeChat to scan the following QR onetptp and reply 刷题 for the Chinese community.


What is PWA?

67337 2020-06-26 13:27

When Google came up with the PWA, it didn’t have a precise definition. It is not a specific technology, but a combination of techniques to improve the user experience. Those technologies are Web App Manifest, Service Worker, Web Push, etc.

The main features of PWA are as follows.

  • Reliability - instant loading, even in unstable or disconnected network environments.
  • User experience - rapid response, with smooth transition animations and feedback on user actions
  • Stickiness - like the Native App, can be added to the home screen and receive offline notifications.

PWA itself emphasizes Progressive (Progressive) in two perspectives.

  1. PWA is still evolving;
  2. PWA is downward-compatible and non-intrusive. It costs developers little to use the new features - developers can add it to the existing site progressively.

Google’s “Progressive Web App Checklist” defines those minimum requirements for PWA.

  • served over HTTPS.
  • Pages should be responsive on desktop, tablet, and mobile devices
  • All URLs have content to show in case of disconnection, not the default browser page
  • requires Web App Manifest to be added to the desktop
  • Faster page loading and shorter delay, even on 3G networks
  • It displays correctly in all major browsers
  • Smooth animation with immediate feedback
  • Each page has its own URL

Features of PWA

A PWA combines the benefits of both the Web App and Native App and gives us the following features.

  • Progressive - for all browsers, as it is developed with progressive enhancements in mind
  • Connectivity agnostic - Ability to leverage Service Worker for offline or low network connectivity.
  • Native experiences - on the App Shell model, they should have Native App interactions.
  • Continued updates – always up-to-date, no version or update issues.
  • Security - Served over HTTPS
  • Indexable - manifest files and Service Workers can be recognized and indexed by the search engine.
  • Stickiness - By pushing offline notifications, etc., you can get users back to your app.
  • Installable - users can easily add web apps to the home or desktop without going to an app store.
  • Linkable - share contents through links without downloading and installing them.

More specifically, what is the advantage of PWA over the native app? Openness and index-ability. Users can hardly install a native app instantly and search across native apps seamlessly.

The table below shows the comparison between t raditional Web App, Native App, and PWA for each feature.

Installable Linkable User experience User stickiness
Traditional Web
Native App 😐 ✅️

Download App

Learn startup engineering anywhere, anytime LedgerTouchBase CRMOneFx.JS Frameworkcoderoma OJ
Subscription RSS Github Email Linkedin Twitter