
Intro to Relational Database

· 2 min read

Relational databases are the default choice for most storage use cases because of their ACID guarantees (atomicity, consistency, isolation, and durability). One tricky term is "consistency": here it means that any transaction brings the database from one valid state to another, which is different from the "C" in the CAP theorem (every read receives the most recent write).

Schema Design and 3rd Normal Form (3NF)

To reduce redundancy and improve consistency, people follow 3NF when designing database schemas:

  • 1NF: tabular; each row-column intersection contains exactly one (atomic) value
  • 2NF: in 1NF, and every non-key attribute depends on the whole candidate key, not on just part of it (no partial dependencies)
  • 3NF: in 2NF, and non-key attributes depend only on the candidate keys, not on each other (no transitive dependencies)
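As a minimal sketch of a 3NF decomposition, here is a hypothetical schema built with Python's built-in sqlite3 module (standing in for MySQL): an employee's department name depends on dept_id rather than on the employee key, so it moves into its own table and is stored exactly once.

```python
import sqlite3

# Hypothetical schema. Keeping dept_name in the employees table would be a
# transitive dependency (emp_id -> dept_id -> dept_name), which 3NF forbids.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE employees ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES departments(dept_id))"
)

cur.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Engineering"), (2, "Sales")])
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(100, "Alice", 1), (101, "Bob", 1), (102, "Carol", 2)],
)

# Renaming a department touches one row, not one row per employee.
cur.execute("UPDATE departments SET dept_name = ? WHERE dept_id = ?", ("Platform", 1))

# Reads reassemble the data with a join.
rows = cur.execute(
    "SELECT e.name, d.dept_name FROM employees e "
    "JOIN departments d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()
print(rows)  # [('Alice', 'Platform'), ('Bob', 'Platform'), ('Carol', 'Sales')]
```

The price of removing redundancy is the join on every read, which is the trade-off the NoSQL post below revisits.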

DB Proxy

What if we want to eliminate a single point of failure? What if the dataset is too large for a single machine to hold? For MySQL, the answer is to use a DB proxy to distribute data, either by clustering or by sharding.

Clustering is a decentralized solution in which everything is automatic: data is distributed, moved, and rebalanced automatically, and nodes gossip with each other to share state (though a network partition may isolate groups of nodes from one another).

Sharding is a centralized solution. If we get rid of the properties of clustering that we don't like, sharding is what we get: data is distributed manually and does not move, and nodes are not aware of each other.
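A minimal sketch of the static routing a sharding proxy performs, assuming a hypothetical list of four shard names and user_id as the sharding key; it is not any particular proxy's API.

```python
import zlib

# Hypothetical shard names; a real proxy would map these to connections.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id: int) -> str:
    """Route a row to a shard by hashing its sharding key.

    zlib.crc32 is deterministic across processes (unlike Python's built-in
    hash() for strings), so every proxy instance computes the same mapping.
    """
    key = str(user_id).encode("utf-8")
    return SHARDS[zlib.crc32(key) % len(SHARDS)]


print(shard_for(42))
```

Because the shard count is baked into the modulo, adding a shard remaps most keys; in a sharded setup the data therefore stays where it is until someone reshards it manually.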

Elements of Value

· One min read

When customers evaluate a product or service, they weigh ==perceived value== against ==actual price==.

Here are 30 "elements of value."

  1. Functional Value

    • Time savings
    • Simplified processes
    • Revenue generation
    • Risk reduction
    • Organization
    • Integration
    • Connectivity
    • Reduced effort
    • Trouble avoidance
    • Cost reduction
    • Quality
    • Variety
    • Sensory appeal (e.g., food and beverages)
    • Alerts
  2. Emotional Value

    • Anxiety reduction
    • Reciprocity
    • Nostalgia
    • Design/Aesthetics
    • Badge value
    • Health
    • Therapeutic value
    • Pleasure/Entertainment
    • Attractiveness
    • Providing access
  3. Life-Changing

    • Providing hope
    • Self-actualization
    • Motivation
    • Heirloom
    • Connection/Affiliation
  4. Social Impact

    • Self-transcendence

4 Kinds of NoSQL

· 3 min read

In a typical Internet service, the read:write ratio is roughly 100:1 to 1000:1. However, when data is read from a hard disk, a database join is time-consuming, and about 99% of that time is spent on disk seeks. A distributed join across the network is even slower.

To optimize read performance, we denormalize by adding redundant data or by grouping data. The following four categories of NoSQL databases help with this.

Key-value Store

The abstraction of a KV store is a giant hashtable/hashmap/dictionary.

The main reason we want to use a key-value cache is to reduce latency when accessing active data: it achieves O(1) reads and writes on a fast but expensive medium (like memory or SSD), instead of the traditional O(log n) reads and writes on a slow but cheap medium (typically a hard drive).

There are three major factors to consider when we design the cache.

  1. Pattern: how to cache? Read-through, write-through, write-around, write-back, or cache-aside?
  2. Placement: where to place the cache? Client side, a distinct layer, or server side?
  3. Replacement: when to expire or replace data? LRU, LFU, or ARC?

Out-of-box choices: Redis and Memcached (Redis supports data persistence while Memcached does not), Riak, Berkeley DB, HamsterDB, Amazon Dynamo, Project Voldemort, etc.

Document Store

The abstraction of a document store is like a KV store, but documents, like XML, JSON, BSON, and so on, are stored in the value part of the pair.

The main reasons we want to use a document store are flexibility and performance. Flexibility comes from schemaless documents, and performance is improved by breaking 3NF. Startups' business requirements change frequently, and a flexible schema lets them move fast.
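A toy model of both ideas, assuming an in-memory dict of JSON strings in place of a real document store (it does not mirror MongoDB's or CouchDB's API): orders are embedded in the user document instead of living in a separate table, and two documents in the same collection carry different fields.

```python
import json
from typing import Any, Dict

# Hypothetical in-memory "collection": document id -> serialized JSON document.
collection: Dict[str, str] = {}


def insert(doc_id: str, doc: Dict[str, Any]) -> None:
    collection[doc_id] = json.dumps(doc)  # no schema is validated


def find(doc_id: str) -> Dict[str, Any]:
    return json.loads(collection[doc_id])


# Denormalized (not 3NF): orders are embedded in the user document, so a
# single lookup returns what a profile page needs, with no join.
insert("user:1", {"name": "Alice", "orders": [{"sku": "A-1", "qty": 2}]})

# Schemaless: a newer document adds a field without any migration.
insert("user:2", {"name": "Bob", "orders": [], "loyalty_tier": "gold"})

print(find("user:1")["orders"][0]["qty"])
```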

Out-of-box choices: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, etc.

Column-oriented Store

The abstraction of a column-oriented store is like a giant nested map: ColumnFamily<RowKey, Columns<Name, Value, Timestamp>>.

The main reason we want to use a column-oriented store is that it is distributed, highly available, and optimized for writes.
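The nested map ColumnFamily<RowKey, Columns<Name, Value, Timestamp>> can be sketched with Python dictionaries. This illustrates only the data model, with a newer timestamp overwriting an older one for the same cell; the class and method names are hypothetical, and the sketch does not model the log-structured storage that makes systems such as Cassandra and HBase write-optimized.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple

Cell = Tuple[Any, int]  # (value, timestamp)


class ColumnFamily:
    """RowKey -> {column name -> (value, timestamp)}; rows may have different columns."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Cell]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: Any, timestamp: int) -> None:
        """Upsert one cell; the newer timestamp wins (ties go to the later call here)."""
        current = self._rows[row_key].get(column)
        if current is None or timestamp >= current[1]:
            self._rows[row_key][column] = (value, timestamp)

    def get(self, row_key: str, column: str) -> Optional[Any]:
        cell = self._rows.get(row_key, {}).get(column)
        return None if cell is None else cell[0]

    def row(self, row_key: str) -> Dict[str, Any]:
        """All columns of one row, without timestamps."""
        return {name: value for name, (value, _) in self._rows.get(row_key, {}).items()}


cf = ColumnFamily()
cf.put("user:1", "email", "a@example.com", timestamp=2)
cf.put("user:1", "email", "old@example.com", timestamp=1)  # stale write is ignored
cf.put("user:2", "phone", "555-0100", timestamp=1)
print(cf.row("user:1"))
```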

Out-of-box choices: Cassandra, HBase, Hypertable, Amazon SimpleDB, etc.

Graph Database

As the name indicates, this database's abstraction is a graph. It allows us to store entities and the relationships between them.

If we use a relational database to store a graph, adding or removing relationships may involve schema changes and data movement, which is not the case with a graph database. Moreover, when we create relational tables for a graph, we model them around the traversals we expect; if the traversals change, the data has to change too.
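A minimal adjacency-list sketch (hypothetical class and method names, not Neo4j's API) of why a graph model absorbs changing relationships: a new relationship type is just another edge label, and a traversal is a breadth-first walk rather than a chain of self-joins.

```python
from collections import defaultdict, deque
from typing import Dict, Set, Tuple


class Graph:
    """Entities are nodes; relationships are labeled edges stored per node."""

    def __init__(self) -> None:
        self._out: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)  # node -> {(label, neighbor)}

    def relate(self, src: str, label: str, dst: str) -> None:
        # A new relationship type needs no schema change: it is just another label.
        self._out[src].add((label, dst))

    def unrelate(self, src: str, label: str, dst: str) -> None:
        self._out[src].discard((label, dst))

    def reachable(self, start: str, label: str, max_hops: int) -> Set[str]:
        """Breadth-first traversal following only edges with the given label."""
        seen = {start}
        frontier = deque([(start, 0)])
        result: Set[str] = set()
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for edge_label, neighbor in self._out.get(node, set()):
                if edge_label == label and neighbor not in seen:
                    seen.add(neighbor)
                    result.add(neighbor)
                    frontier.append((neighbor, depth + 1))
        return result


g = Graph()
g.relate("alice", "FOLLOWS", "bob")
g.relate("bob", "FOLLOWS", "carol")
g.relate("carol", "FOLLOWS", "dave")
g.relate("alice", "LIKES", "post1")
print(sorted(g.reachable("alice", "FOLLOWS", max_hops=2)))
```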

Out-of-box choices: Neo4j, InfiniteGraph, OrientDB, FlockDB, etc.

Why is buyer persona important?

· 2 min read

KYC is not easy

==How often do you have an opportunity to listen to your customers describe their problems?== The answer is probably NEVER if you work outside the marketing department of a large company.

If you do have the chance, two things lie at the core of the ==buyer persona concept==:

  1. asking probing questions
  2. listening

==KYC (know your customer)== is not easy. For example, the iPhone 3G did not sell well in Japan in 2008: Japanese customers were accustomed to using their phones to shoot videos and to pay with debit-card and train-pass chips.

A buyer's profile is good but not enough

A generic buyer profile cannot tell marketers exactly what determines a buyer's buying decision. Marketers are just guessing based on demographics (age, income, marital status, education) or psychographics (personality, values, lifestyle, opinions).

The buyer profile can still give some obvious answers, though. For example, reaching a CFO through an email campaign is very difficult, and emphasizing how much cargo space a car has for a large dog will not appeal to a busy woman who only keeps goldfish.

Rather than guessing, the most effective way to build buyer personas is to interview buyers who have previously weighed their options, considered or rejected solutions and made a decision similar to the one you want to influence.

Buyer persona = buyer profile (who will buy) + buyer insights (when/how/why to buy)

What are CAC, LTV, and PBP in Marketing?

· One min read

  • CAC (Customer Acquisition Cost): the total cost of getting a customer to purchase a product or service.
  • LTV (Customer Lifetime Value): the net profit we can obtain from a customer over the entire relationship.
  • PBP (Payback Period): the time required to recover an investment (here, the cost of acquiring a customer), i.e. to reach the break-even point. An ideal payback period is about one year.

LTV:CAC Ratio

The LTV:CAC ratio helps you determine how much you should spend to acquire a customer for sustainable growth.

  1. 1:1 = The more you sell, the more you lose.
  2. 3:1 or higher = Good.
  3. 5:1 or higher = Insufficient marketing investment.
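The ratio and the payback period are simple arithmetic. The sketch below assumes a common simplification not stated above: LTV is estimated as monthly gross profit per customer divided by monthly churn, and payback is CAC divided by monthly gross profit per customer. The bands follow the list above; the label for ratios between 1:1 and 3:1 is an assumption.

```python
def estimate_ltv(monthly_revenue: float, gross_margin: float, monthly_churn: float) -> float:
    """Assumed model: monthly gross profit per customer / monthly churn rate."""
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in (0, 1]")
    return monthly_revenue * gross_margin / monthly_churn


def payback_months(cac: float, monthly_revenue: float, gross_margin: float) -> float:
    """Months of gross profit needed to recover the acquisition cost."""
    return cac / (monthly_revenue * gross_margin)


def assess_ltv_cac(ratio: float) -> str:
    """Map an LTV:CAC ratio onto the rough bands listed above."""
    if ratio <= 1:
        return "losing money: the more you sell, the more you lose"
    if ratio < 3:
        return "below the 3:1 target"  # assumed label; this band is not named above
    if ratio < 5:
        return "good"
    return "insufficient marketing investment"


ltv = estimate_ltv(monthly_revenue=100, gross_margin=0.8, monthly_churn=0.02)
cac = 1000
ratio = ltv / cac
print(f"LTV={ltv:.0f}, LTV:CAC={ratio:.1f}:1 -> {assess_ltv_cac(ratio)}")
print(f"payback={payback_months(cac, 100, 0.8):.1f} months")
```

For example, a customer paying $100 per month at an 80% gross margin with 2% monthly churn has an estimated LTV of $4,000; with a CAC of $1,000, the ratio is 4:1 and payback takes 12.5 months.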

Nonviolent Communication (NVC)

· One min read

Why Nonviolent Communication?

To improve communication quality by ==valuing everyone's needs==. ==Judgments and violence are tragic expressions of unmet needs.==

What NVC is not

  • NOT about being nice.
  • NOT about making others do what we want. It's about mutual understanding.

Ways to enhance connection & understanding:

  1. vulnerably express our feelings & needs
    • consciousness of the ongoing feelings & needs
    • vulnerability of exposing feelings & needs
  2. empathically listen to the feelings & needs of the other.
    • Qualities of empathic listening: presence, focus, space, caring, ==verbal reflection of feelings & needs==
    • NOT advising, fixing, consoling, story-telling, sympathizing, analyzing, explaining, …
    • No matter what is said, hear only feelings, needs, observations & requests

e.g. ==Are you feeling … because you need …?==

Ways to alienate us from one another

  • Diagnoses, judgments, labels, analysis, criticism, comparisons, etc.
  • Deserve thinking (i.e. that certain behaviors merit punishment or rewards)
  • Demands (denial of the other person’s choice; intention to punish those who don’t comply)
  • Denial of choice or responsibility (had to, should, supposed to, they made me do it, etc.)

Bloom Filter

· One min read

A Bloom filter is a data structure used to detect whether an element is in a set in a time and space efficient way.

False positive matches are possible, but false negatives are not; in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set but not removed (though this can be addressed with a "counting" Bloom filter), and the more elements are added, the higher the probability of false positives.
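A minimal sketch in Python, assuming string elements. It sizes the bit array with the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, and derives the k bit positions by double hashing a single SHA-256 digest (a common technique, not necessarily what Cassandra, HBase, or Chrome use).

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        if expected_items <= 0 or not 0 < false_positive_rate < 1:
            raise ValueError("need expected_items > 0 and 0 < false_positive_rate < 1")
        # Standard sizing: m bits and k hash functions for n items at rate p.
        m = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.size = m
        self.num_hashes = max(1, round(m / expected_items * math.log(2)))
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: position_i = h1 + i * h2 (mod m), from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


banned = BloomFilter(expected_items=1000, false_positive_rate=0.01)
banned.add("user42")
print(banned.might_contain("user42"), banned.might_contain("user43"))
```

Because bits are only ever set, never cleared, an added element can never test negative; removing elements would require the counting variant mentioned above.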

Use Cases

  • Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
  • An HBase Bloom Filter is an efficient mechanism to test whether a StoreFile contains a specific row or row-col cell.
  • A website's anti-fraud system can use Bloom filters to reject banned users efficiently.
  • The Google Chrome web browser used to use a Bloom filter to identify malicious URLs.

Skip List

· One min read

A skip list is essentially a sorted linked list that supports binary-search-like lookups. It achieves this by adding extra levels of forward pointers that let you "skip" over sections of the list. If a random coin toss decides how many levels each node gets, the skip list has expected O(log n) search, insert, and delete.
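A minimal sketch in Python (hypothetical class names; comparable keys, no duplicates) showing the coin toss: each inserted node is promoted one more level with probability 1/2, capped at 16 levels, and every operation walks down from the top level, recording the last node before the key on each level.

```python
import random
from typing import Any, Iterator, List, Optional


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key: Any, level: int) -> None:
        self.key = key
        self.forward: List[Optional["_Node"]] = [None] * level  # next node on each level


class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # coin toss: chance of promoting a node one more level

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)  # sentinel; its key is never compared
        self._level = 1  # number of levels currently in use

    def _random_level(self) -> int:
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def _predecessors(self, key: Any) -> List[_Node]:
        """For each level, the last node whose key is < key, walking top-down."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def search(self, key: Any) -> bool:
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key: Any) -> None:
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            return  # keys are unique in this sketch
        level = self._random_level()
        self._level = max(self._level, level)  # new top levels start at the head
        node = _Node(key, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def delete(self, key: Any) -> bool:
        update = self._predecessors(key)
        target = update[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):
            if update[i].forward[i] is target:
                update[i].forward[i] = target.forward[i]
        while self._level > 1 and self._head.forward[self._level - 1] is None:
            self._level -= 1  # drop levels that became empty
        return True

    def __iter__(self) -> Iterator[Any]:
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Seeding the random generator only makes runs reproducible; the expected O(log n) bound comes from the coin toss, not from any particular seed.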

Use Cases

  • LevelDB MemTable
  • Redis Sorted Set
  • Lucene inverted index
