Requirement Analysis
- A distributed blockchain accounting and smart contract system
- Nodes have minimal trust in each other but need to be incentivized to cooperate
- Transactions are irreversible
- Do not rely on trusted third parties
- Protect privacy, disclose minimal information
- Do not rely on centralized authority to prove that money cannot be spent twice
- Assuming performance is not an issue, we will not consider how to optimize performance
Architecture Design
Specific Modules and Their Interactions
Base Layer (P2P Network, Cryptographic Algorithms, Storage)
P2P Network
There are two ways to implement distributed systems:
- Centralized lead/follower distributed systems, such as Hadoop and Zookeeper, which have a simpler structure but high requirements for the lead
- Decentralized peer-to-peer (P2P) network distributed systems, such as those organized by Chord, CAN, Pastry, and Tapestry algorithms, which have a more complex structure but are more egalitarian
Given the premise that nodes have minimal trust in each other, we choose the P2P form. How do we specifically organize the P2P network? A typical decentralized node and network maintain connections as follows:
- Based on the IP protocol, nodes come online occupying a certain address hostname/port, broadcasting their address using an initialized node list, and trying to flood their information across the network using these initial hops.
- The initial nodes receiving the broadcast save this neighbor and help with flooding; non-adjacent nodes, upon receiving it, use NAT to traverse walls and add neighbors.
- Nodes engage in anti-entropy by randomly sending heartbeat messages containing the latest information similar to vector clocks, ensuring they can continuously update each other with their latest information.
We can use existing libraries, such as libp2p
, to implement the network module. For the choice of network protocols, see Crack the System Design Interview: Communication.
Cryptographic Algorithms
In a distributed system with minimal trust, how can a transfer be proven to be initiated by oneself without leaking secret information? Asymmetric encryption: a pair of public and private keys corresponds to "ownership." Bitcoin chooses the secp256k1 parameters of the ECDSA elliptic curve cryptographic algorithm, and for compatibility, other chains also generally choose the same algorithm.
Why not directly use the public key as the address for the transfer? Privacy concerns; the transaction process should disclose as little information as possible. Using the hash of the public key as the "address" can prevent the recipient from leaking the public key. Furthermore, people should avoid reusing the same address.
Regarding account ledgers, there are two implementation methods: UTXO vs. Account/Balance
- UTXO (unspent transaction output), such as Bitcoin, resembles double-entry bookkeeping with credits and debits. Each transaction has inputs and outputs, but every input is linked to the previous output except for the coinbase. Although there is no concept of an account, taking all unspent outputs corresponding to an address gives the balance of that address.
- Advantages
- Precision: The structure similar to double-entry bookkeeping allows for very accurate recording of all asset flows.
- Privacy protection and resistance to quantum attacks: If users frequently change addresses.
- Stateless: Leaves room for improving concurrency.
- Avoids replay attacks: Because replaying will not find the corresponding UTXO for the input.
- Disadvantages
- Records all transactions, complex, consumes storage space.
- Traversing UTXOs takes time.
- Account/Balance, such as Ethereum, has three main maps: account map, transaction map, transaction receipts map. Specifically, to reduce space and prevent tampering, it uses a Merkle Patricia Trie (MPT).
- Advantages
- Space-efficient: Unlike UTXO, a transaction connects multiple UTXOs.
- Simplicity: Complexity is offloaded to the script.
- Disadvantages
- Requires using nonce to solve replay issues since there is no dependency between transactions.
It is worth mentioning that the "block + chain" data structure is essentially an append-only Merkle tree, also known as a hash tree.
Storage
Since UTXO or MPT structures serve as indexes, and to simplify operations for each node in a distributed environment, data persistence typically favors in-process databases that can run directly with the node's program, such as LevelDB or RocksDB.
Because these indexes are not universal, you cannot query them like an SQL database, which raises the barrier for data analysis. Optimizations require a dedicated indexing service, such as Etherscan.
Protocol Layer
Now that we have a functional base layer, we need a more general protocol layer for logical operations above this layer. Depending on the blockchain's usage requirements, specific logical processing modules can be plugged in and out like a microkernel architecture.
For instance, the most common accounting: upon receiving some transactions at the latest block height, organize them to establish the data structure as mentioned in the previous layer.
Writing a native module for each business logic and updating all nodes' code is not very realistic. Can we decouple this layer using virtualization? The answer is a virtual machine capable of executing smart contract code. In a non-trusting environment, we cannot allow clients to execute code for free, so the most unique feature of this virtual machine may be billing.
The difference between contract-based tokens, such as ERC20, and native tokens leads to complications when dealing with different tokens, resulting in the emergence of Wrapped Ether tokens.
Consensus Layer
After the protocol layer computes the execution results, how do we reach consensus with other nodes? There are several common mechanisms to incentivize cooperation:
- Proof of Work (POW): Mining tokens through hash collisions, which is energy-intensive and not environmentally friendly.
- Proof of Stake (POS): Mining tokens using staked tokens.
- Delegated Proof-of-Stake (DPOS): Electing representatives to mine tokens using staked tokens.
Based on the incentive mechanism, the longest chain among nodes is followed; if two groups dislike each other, a fork occurs.
Additionally, there are consensus protocols that help everyone reach agreement (i.e., everyone either does something together or does nothing together):
- 2PC: Everyone relies on a coordinator: the coordinator asks everyone: should we proceed? If anyone replies no, the coordinator tells everyone "no"; otherwise, everyone proceeds. This dependency can lead to issues if the coordinator fails in the middle of the second phase, leaving some nodes unsure of what to do with the block, requiring manual intervention to restart the coordinator.
- 3PC: To solve the above problem, an additional phase is added to ensure everyone knows whether to proceed before doing so; if an error occurs, a new coordinator is selected.
- Paxos: The above 2PC and 3PC both rely on a coordinator; how can we eliminate this coordinator? By using "the majority (at least f+1 in 2f + 1)" to replace it, as long as the majority agrees in two steps, consensus can be achieved.
- PBFT (deterministic 3-step protocol): The fault tolerance of the above methods is still not high enough, leading to the development of PBFT. This algorithm ensures that the majority (2/3) of nodes either all agree or all disagree, implemented through three rounds of voting, with at least a majority (2/3) agreeing in each round before committing the block in the final round.
In practical applications, relational databases mostly use 2PC or 3PC; variants of Paxos include implementations in Zookeeper, Google Chubby distributed locks, and Spanner; in blockchain, Bitcoin and Ethereum use POW, while the new Ethereum uses POS, and IoTeX and EOS use DPOS.
API Layer
See Public API choices