
Ryan Holiday: Attracting and Nurturing Seed Users

· 2 min read
  1. Target a few hundred or a thousand key individuals, rather than millions
    1. For example, Dropbox started its initial launch with an engaging demo video. People could sign up but had to wait to use it. Attract users with something ==novel and exciting==.
    2. Similarly, in 2012, eBay partnered with Gogo to give flyers free in-flight Wi-Fi access to ebay.com. The clever part was tracking usage data to determine whether the partnership was worth continuing.
  2. Don’t target everyone - focus on the right people
    1. For instance, Uber provided free rides for years during the South by Southwest conference in Austin, attracting thousands of young, high-income tech enthusiasts.
    2. Tips
      • Persuade media outlets to write about you
      • Post on Hacker News, Quora, and Reddit
      • Write blogs
      • Use Kickstarter for crowdfunding
      • Contact journalists through www.helpareporter.com
      • Invite users for free or with some incentives
    3. ==Big tricks==
      • Create exclusivity with “invitation-only” hunger marketing
      • Generate fake users to make the site appear more active (Reddit used this approach)
      • Focus on a single platform (PayPal and eBay)
      • Spread from one user group to another (Facebook and universities)
      • Attract influencers because they have a broad audience and good reputation
      • Offer charitable donations through a subdomain of the e-commerce site (Amazon)
  3. Focus on new user registrations (acquisition) rather than brand awareness
  4. Growth hacking = marketing + engineering
    1. For example, Airbnb built tooling that cross-posted its listings to Craigslist.
    2. Sean Ellis once said: “Staying focused on customer acquisition rather than 'building brand awareness' often requires restraint... Certainly, once a company reaches a certain scale, brand awareness/branding makes sense. But in the first year or two, it’s just a complete waste of money.”
    3. Ineffective actions
      1. Grand launches
      2. Wishful thinking that “the best way to attract users is to let the product speak for itself” (whereas Aaron Swartz believed that users must be actively drawn in).

Ryan Holiday: How to begin with PMF

· One min read

4 Steps of Growth Hacking

  1. Begin with the product-market fit (PMF)
  2. Find your growth hack
  3. Go viral
  4. Close the loop

How to begin with PMF?

  1. ==Product/market fit== is the degree to which a product satisfies a strong market demand.

  2. Start with an MVP and evolve it with feedback

  3. Use data and evidence to validate PMF.

  4. Understand the needs of the customers as early as possible

    1. e.g., Amazon employees write an internal press release before developing a project, in order to collect feedback early.
    2. e.g., Werner Vogels suggests writing the FAQ, defining the critical user experience, and drafting the user manual (= concepts + how-to + reference) for the product you’re developing.
  5. Develop answers with the Socratic method

    1. Who is this product for? Why would they use it? Why do I use it?
    2. What is it that brought you to this product? What is holding you back from referring other people to it? What’s missing? What’s golden?

Improving availability with failover

· One min read

Failover is a backup operational mode that improves stability and availability: when the primary component fails or is scheduled for downtime, its functions are transferred to a secondary system component (such as a processor, server, network, or database).

Cold Standby: Use heartbeat or metrics/alerts to track failures. Provision a new standby node when a failure occurs. Suitable only for stateless services.

Hot Standby: Keep two active systems undertaking the same role. Data is mirrored in near real time, and both systems will have identical data.

Warm Standby: Keep two active systems, but the secondary one does not take traffic unless a failure occurs.

Checkpointing (similar to a Redis snapshot): Use a write-ahead log (WAL) to record requests before processing them. The standby node recovers from the log during failover (a minimal sketch follows the list below).

  • Cons
    • Recovering from a large log is time-consuming
    • Data since the last checkpoint may be lost
  • Use cases: Storm, MillWheel, Samza
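
A minimal sketch of the WAL idea in Python, assuming a single-node, file-backed key-value store (`WalKeyValueStore` and the log format are made up for illustration, not taken from Storm/MillWheel/Samza):

```python
import json
import os

class WalKeyValueStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log_path="wal.log"):
        self.log_path = log_path
        self.state = {}
        self._replay()  # on failover/restart, rebuild state from the log

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:  # con: replaying a large log is time-consuming
                entry = json.loads(line)
                self.state[entry["key"]] = entry["value"]

    def put(self, key, value):
        # Record the request durably *before* processing it...
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # ...then apply it to the in-memory state.
        self.state[key] = value
```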

Active-active (or all-active): Keep two active systems behind a load balancer. Both take traffic in parallel, and data replication is bi-directional.

Designing a URL shortener

· 4 min read

Design a system to take user-provided URLs and transform them into shortened URLs that redirect back to the originals. Describe how the system works. How would you allocate the shorthand URLs? How would you store the shorthand-to-original URL mapping? How would you implement the redirect servers? How would you store the click stats?

Assumptions: I generally don't include these assumptions in the initial problem presentation. Good candidates will ask about scale when coming up with a design.

  • Total number of unique domains registering redirect URLs is on the order of 10s of thousands
  • New URL registrations are on the order of 10,000,000/day (100/sec)
  • Redirect requests are on the order of 10B/day (100,000/sec)
  • Remind candidates that those are average numbers - during peak traffic (either driven by time, such as 'as people come home from work' or by outside events, such as 'during the Superbowl') they may be much higher.
  • Recent stats (within the current day) should be aggregated and available with a 5-minute lag time
  • Long look-back stats can be computed daily

Assumptions

1B new URLs per day, 100B entries in total. The shorter the alias, the better. Show stats in real time and daily/monthly/yearly.

Encode URL

http://blog.codinghorror.com/url-shortening-hashes-in-practice/

Choice 1: md5 (128 bits, i.e., 32 hex digits; by the birthday paradox, collisions are expected around 2^(128/2) = 2^64 inputs). Truncate to shorten? (64 bits, i.e., 16 hex digits, with collisions around 2^32), then Base64-encode. A sketch follows the pros/cons below.

  • Pros: hashing is simple and horizontally scalable.
  • Cons: too long; how to purge expired URLs?
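
A quick sketch of Choice 1 using Python's standard hashlib and base64 (the function name `shorten_md5` is made up for illustration):

```python
import base64
import hashlib

def shorten_md5(long_url: str) -> str:
    """md5 (128 bits) truncated to 64 bits, then Base64-encoded.

    Full md5 expects collisions around 2^64 inputs (birthday paradox);
    truncating to 8 bytes shortens the alias but drops that bound to ~2^32.
    """
    digest = hashlib.md5(long_url.encode()).digest()  # 16 bytes = 128 bits
    truncated = digest[:8]                            # keep 64 bits
    # URL-safe Base64: 8 bytes -> 11 characters once padding is stripped.
    return base64.urlsafe_b64encode(truncated).rstrip(b"=").decode()

print(shorten_md5("http://blog.codinghorror.com/url-shortening-hashes-in-practice/"))
```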

Choice 2: Distributed sequential ID generator (Base62: a-z, A-Z, 0-9, i.e., 62 characters; 62^7 ≈ 3.5 trillion IDs fit in 7 characters). Sharding: each node maintains a section of the ID space. The encoding step is sketched below.

  • Pros: easy to expire outdated entries; shorter aliases.
  • Cons: requires coordination between nodes (e.g., ZooKeeper).
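
A sketch of the Base62 encoding (the alphabet order is an arbitrary choice):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative id in base62; 62^7 ≈ 3.5e12 ids fit in 7 chars."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

print(encode_base62(10_000_000_000))  # ten billion ids still need only 6 chars
```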

KV store

MySQL (~10k QPS; slow, and no relational features are needed) vs. a key-value store (~100k QPS; e.g., Redis, Memcached).

A great candidate will ask about the lifespan of the aliases and design a system that purges aliases past their expiration.
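
If Redis were the KV store, alias expiration could ride on its built-in TTL support; a sketch assuming a local Redis instance and the redis-py client (the `url:` key prefix and one-year default TTL are arbitrary choices):

```python
import redis  # pip install redis

r = redis.Redis()  # assumes Redis on localhost:6379

def save_alias(short_key: str, long_url: str, ttl_days: int = 365):
    # SETEX stores the mapping with a TTL, so expired aliases are
    # purged by Redis itself rather than by a separate cleanup job.
    r.setex(f"url:{short_key}", ttl_days * 86400, long_url)

def resolve(short_key: str):
    # Returns the long URL, or None if the alias is missing or expired.
    value = r.get(f"url:{short_key}")
    return value.decode() if value else None
```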

Follow-up

Q: How will shortened URLs be generated?

  • A poor candidate will propose a solution that uses a single id generator (single point of failure) or a solution that requires coordination among id generator servers on every request. For example, a single database server using an auto-increment primary key.
  • An acceptable candidate will propose a solution using an md5 of the URL, or some form of UUID generator that can be run independently on any node. While this allows distributed generation of non-colliding IDs, it yields large "shortened" URLs.
  • A good candidate will design a solution that utilizes a cluster of id generators that reserve chunks of the id space from a central coordinator (e.g. ZooKeeper) and independently allocate IDs from their chunk, refreshing as necessary.
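
A sketch of that chunk-reservation scheme; the in-process counter below merely stands in for the central coordinator (in production it might be an atomically incremented counter in ZooKeeper):

```python
import itertools
import threading

class ChunkIdGenerator:
    """Allocates ids locally from chunks reserved via a coordinator."""

    CHUNK_SIZE = 1000
    _coordinator = itertools.count(0)  # stand-in for ZooKeeper's shared counter

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self._limit = 0  # empty chunk; the first next_id() reserves one

    def _reserve_chunk(self):
        chunk_no = next(self._coordinator)  # the only coordinated step
        self._next = chunk_no * self.CHUNK_SIZE
        self._limit = self._next + self.CHUNK_SIZE

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:  # chunk exhausted: reserve a new one
                self._reserve_chunk()
            value = self._next
            self._next += 1
            return value
```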

Q: How to store the mappings?

  • A poor candidate will suggest a monolithic database. There are no relational aspects to this store. It is a pure key-value store.
  • A good candidate will propose using any light-weight, distributed store. MongoDB/HBase/Voldemort/etc.
  • A great candidate will ask about the lifespan of the aliases and design a system that ==purges aliases past their expiration==.

Q: How to implement the redirect servers?

  • A poor candidate will start designing something from scratch to solve an already solved problem
  • A good candidate will propose using an off-the-shelf HTTP server with a plug-in that parses the shortened URL key, looks the alias up in the DB, updates click stats, and returns a 303 back to the original URL. Apache/Jetty/Netty/Tomcat/etc. are all fine; a minimal sketch follows.
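
An illustrative handler using Python's standard http.server (the in-memory dict stands in for the KV lookup; a real deployment would use one of the servers above):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

URL_TABLE = {"abc123": "https://example.com/some/very/long/path"}  # stand-in for the KV store

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        key = self.path.lstrip("/")   # parse the shortened URL key
        target = URL_TABLE.get(key)   # look the alias up
        if target is None:
            self.send_error(404)
            return
        # (a click-stats event would be emitted here, asynchronously)
        self.send_response(303)       # 303 See Other -> back to the long URL
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```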

Q: How are click stats stored?

  • A poor candidate will suggest writing back to a data store on every click
  • A good candidate will suggest some form of ==aggregation tier that accepts clickstream data, aggregates it, and writes it back to a persistent data store periodically== (sketched below)
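
A sketch of such an aggregation tier; the flush interval and the `store_writer` callback are assumptions, not part of the original design:

```python
import threading
import time
from collections import Counter

class ClickAggregator:
    """Buffers clicks in memory; flushes aggregated counts periodically."""

    def __init__(self, flush_interval_s: float = 60.0):
        self._counts = Counter()
        self._lock = threading.Lock()
        self._interval = flush_interval_s

    def record_click(self, short_key: str):
        with self._lock:
            self._counts[short_key] += 1   # cheap in-memory update per click

    def run_flusher(self, store_writer):
        while True:                        # one bulk write per interval,
            time.sleep(self._interval)     # instead of one write per click
            with self._lock:
                batch, self._counts = self._counts, Counter()
            if batch:
                store_writer(batch)
```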

Q: How will the aggregation tier be partitioned?

  • A great candidate will suggest a low-latency messaging system to buffer the click data and transfer it to the aggregation tier.
  • A candidate may ask how often the stats need to be updated. If daily, storing in HDFS and running map/reduce jobs to compute stats is a reasonable approach. If near real-time, the aggregation logic should compute the stats on the fly.

Q: How to prevent visiting restricted sites?

  • A good candidate will answer with maintaining a blacklist of hostnames in a KV store.
  • A great candidate may propose some advanced scaling techniques, like a Bloom filter (sketched below).
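
A toy Bloom filter to show the idea (the bit-array size and hashing scheme are arbitrary choices): it never reports a blacklisted host as clean, at the cost of a small false-positive rate.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        for i in range(self.k):  # derive k bit positions from salted hashes
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

blacklist = BloomFilter()
blacklist.add("blocked.example.com")
print(blacklist.might_contain("blocked.example.com"))  # True
print(blacklist.might_contain("fine.example.com"))     # almost surely False
```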

Lambda Architecture

· One min read

Why lambda architecture?

To solve three problems introduced by big data

  1. Accuracy (good)
  2. Latency (fast)
  3. Throughput (high)

e.g., the problems with scaling a pageview service in the traditional way:

  1. You start with a traditional relational database.
  2. Then you add a pub-sub queue.
  3. Then you scale by horizontal partitioning or sharding.
  4. Fault-tolerance issues begin.
  5. Data corruption happens.

The key point is that ==the X-axis (horizontal duplication) of the AKF scale cube alone is not good enough. We should introduce the Y-axis / functional decomposition as well. Lambda architecture tells us how to do it for a data system.==

What is lambda architecture?

If we define a data system as

Query = function(all data)

Then a lambda architecture is


batch view = function(all data at the batching job's execution time)
realtime view = function(realtime view, new data)

query = function(batch view, realtime view)

==Lambda architecture = CQRS (batch layer + serving layer) + speed layer==
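
Those view functions are easy to make concrete; a toy pageview counter in Python (all names and data are illustrative):

```python
from collections import Counter

master_dataset = [("home", 1), ("about", 1), ("home", 1)]  # immutable, append-only

def batch_view(all_data):
    """Batch layer: recomputed from all data at each batch job run."""
    view = Counter()
    for page, n in all_data:
        view[page] += n
    return view

def realtime_view(view, new_event):
    """Speed layer: incrementally folds in data newer than the last batch."""
    page, n = new_event
    view[page] += n
    return view

def query(batch, realtime, page):
    """Serving layer: merge both views to answer the query."""
    return batch[page] + realtime[page]

batch = batch_view(master_dataset)
rt = realtime_view(Counter(), ("home", 1))  # event arriving after the last batch run
print(query(batch, rt, "home"))             # -> 3
```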

(Figure: Lambda Architecture for big data systems)


Nonviolent Communication (NVC)

· 2 min read

Why Practice Nonviolent Communication?

Improve communication quality by ==valuing everyone's needs==. ==Doubt and violence are manifestations of unmet needs==.

What NVC Is Not

  • It is not about appearing friendly.
  • It is not about getting others to do what we want; it concerns mutual understanding between people.

Ways to Strengthen Connections and Understanding Between People

  1. Express our feelings and needs vulnerably
    • Recognize ongoing feelings and needs
    • Expose the vulnerability of feelings and needs
  2. Actively listen to the feelings and needs of others
    • The core of empathetic listening: presence, focus, space, and care, ==along with verbal expression of feelings and needs==
    • Do not give advice, make judgments, comfort, tell stories, sympathize, analyze, or explain, ...
    • Regardless of what is said, the key is to listen to the other person's feelings, needs, opinions, and requests

For example: ==Are you feeling... because you need...?==

Engaging in These Behaviors Can Lead to Distance Between Us

  • Evaluating others, making judgments, labeling, analyzing, criticizing, comparing, etc.
  • “Deserve”-oriented thinking (i.e., believing that certain behaviors deserve punishment or reward)
    • Demands (not accepting others' choices; wanting to punish those who do not act according to one's own ideas)
    • Refusing to choose or take responsibility (keywords: have to, should have, guess they will, they made me do it, etc.)

Why is Buyer Persona Important?

· 2 min read

KYC (Know Your Customer) is Not Easy

==How often do you get the chance to hear your customers describe the problems they face?== If you are an employee in a large company outside the marketing department, the answer is likely never.

Of course, if you do get the chance, it’s important to know that there are two very crucial parts to the ==core concept of buyer persona==:

  1. Asking exploratory questions
  2. Listening

==KYC (Know Your Customer)== is not easy. For example, in 2008, iPhone 3G sales in Japan were poor: Japanese consumers were accustomed to recording videos with their phones and paying with debit cards or train passes, and the early iPhone supported neither.

Just Having a Buyer Profile is Not Enough

A typical buyer profile alone cannot accurately tell marketers how buyers decide whether to purchase. Marketers often just guess based on demographics (such as age, income, marital status, and education) or psychographics (personality, values, lifestyle, opinions).

Nonetheless, buyer profiles can provide some obvious answers. For instance, reaching out to a Chief Financial Officer (CFO) via email is quite challenging. Moreover, emphasizing the spaciousness of a car's cargo area, even if it can fit a large dog, is pointless for a woman who only has time to care for a goldfish.

The most effective way to build buyer persona models is not through guessing, but by researching those who have weighed their options, considered or rejected proposed solutions, and made decisions similar to the ones you want to influence.

Buyer persona = buyer profile (who will buy) + buyer insights (when/how/why they buy)