
10 posts tagged with "architecture"


Designing an Online Judge (or Leetcode)

· 3 min read

Requirements

An online judge is primarily a place where you can execute code remotely for educational or recruitment purposes. In this design, we focus on designing an OJ for interview preparation like Leetcode, with the following requirements:

  • It should have core OJ functionalities like fetching a problem, submitting the solution, compiling if needed, and executing it.
  • It should be highly available with an async design, since running code may take time.
  • It should be horizontally scalable, i.e., easy to scale out.
  • It should be robust and secure to execute untrusted source code.

Architecture

The architecture below features queueing for async execution and sandboxing for secure execution. Each component is separately deployable and scalable.

designing online judge system

Components

Presentation Layer

The user agent is usually a web or mobile app like coderoma.com. It displays the problem description and provides the user with a code editor to write and submit code.

When the user submits the code, the client will get a token since it is an async call. Then the client polls the server for the submission status.

API

Please see Public API Choices for the protocols we can choose from. Here, let's design the interface itself, using GraphQL as an example:

type Query {
  problems(id: String): [Problem]
  languageSetup(id: String!, languageId: LanguageId!): LanguageSetup
  submission(token: String!): Submission
}

type Mutation {
  createSubmission(
    problemId: String!
    code: String!
    languageId: LanguageId!
  ): CreatedSubmission!
}

enum LanguageId {
  JAVA
  JS
  ELIXIR
  # ...
}

type Problem {
  id: String!
  title: String!
  description: String!
  supportedLanguages: [Float!]!
}

type LanguageSetup {
  languageId: LanguageId!
  template: String!
  solutions: [String!]!
}

type Status {
  id: Float!
  description: String!
}

type Submission {
  compileOutput: String
  memory: Float
  message: String
  status: Status
  stderr: String
  stdout: String
  time: String
  token: String
}

type CreatedSubmission {
  token: String!
}
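
As a usage sketch, a client can drive this schema end to end: create a submission, receive a token, and poll for the result. Below is a minimal Python sketch; the endpoint URL and the "In Queue"/"Processing" status descriptions are illustrative assumptions, not part of the schema.

import time

import requests  # third-party HTTP client

API = "https://oj.example.com/graphql"  # hypothetical endpoint

CREATE = """mutation($problemId: String!, $code: String!, $languageId: LanguageId!) {
  createSubmission(problemId: $problemId, code: $code, languageId: $languageId) { token }
}"""

POLL = """query($token: String!) {
  submission(token: $token) { status { id description } stdout stderr time memory }
}"""

def submit_and_poll(problem_id: str, code: str, language_id: str) -> dict:
    # Submit the code; the server replies immediately with a token.
    resp = requests.post(API, json={"query": CREATE, "variables": {
        "problemId": problem_id, "code": code, "languageId": language_id}})
    token = resp.json()["data"]["createSubmission"]["token"]
    # Poll until the judge reports a terminal status (names are illustrative).
    while True:
        data = requests.post(API, json={"query": POLL, "variables": {"token": token}})
        submission = data.json()["data"]["submission"]
        if submission["status"]["description"] not in ("In Queue", "Processing"):
            return submission
        time.sleep(1)  # simple fixed delay between polls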

The API layer records the submission in the database, publishes it into the queue, and returns a token for the client's future reference.
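
A minimal sketch of that flow, assuming a Redis list as the queue and a DB-API-style SQL connection for storage; every name here is illustrative rather than the post's actual implementation.

import json
import uuid

import redis  # assumed queue backend for this sketch

r = redis.Redis()

def create_submission(db, problem_id: str, code: str, language_id: str) -> dict:
    token = str(uuid.uuid4())  # opaque handle the client will poll with
    # 1. Record the submission so its status can be queried later.
    db.execute(
        "INSERT INTO submissions (token, problem_id, language_id, code, status) "
        "VALUES (?, ?, ?, ?, 'In Queue')",
        (token, problem_id, language_id, code),
    )
    # 2. Publish the job for the code execution engine to pick up.
    r.rpush("submissions", json.dumps({
        "token": token, "problemId": problem_id,
        "languageId": language_id, "code": code,
    }))
    # 3. Return the token for the client's future reference.
    return {"token": token}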

Code Execution Engine

The code execution engine (CEE) polls the queue for code, uses a sandbox to compile and run it, and parses the metadata from the compilation and execution.

The sandbox could be LXC containers, Docker, virtual machines, etc. We can choose Docker for its ease of deployment.
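
A minimal sketch of the worker loop under those choices, assuming the Redis queue from the sketch above and the Docker CLI on the host; the image, resource limits, and status names are illustrative.

import json
import subprocess
import tempfile

import redis

r = redis.Redis()
IMAGE = "node:20-alpine"  # JS only in this sketch; map per languageId in practice

def run_once() -> None:
    _, raw = r.blpop("submissions")  # block until a submission arrives
    job = json.loads(raw)
    with tempfile.TemporaryDirectory() as workdir:
        with open(f"{workdir}/main.js", "w") as f:
            f.write(job["code"])
        try:
            # Sandbox: no network, capped memory and CPU, and a hard timeout.
            proc = subprocess.run(
                ["docker", "run", "--rm", "--network", "none",
                 "--memory", "256m", "--cpus", "0.5",
                 "-v", f"{workdir}:/code:ro", IMAGE, "node", "/code/main.js"],
                capture_output=True, text=True, timeout=10,
            )
            status = "Accepted" if proc.returncode == 0 else "Runtime Error"
            stdout, stderr = proc.stdout, proc.stderr
        except subprocess.TimeoutExpired:
            status, stdout, stderr = "Time Limit Exceeded", "", ""
    # Persist the result so the API layer can answer status polls.
    r.hset(f"submission:{job['token']}",
           mapping={"status": status, "stdout": stdout, "stderr": stderr})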

Coderoma.com

I have recently been learning Elixir and am building an online judge, coderoma.com, for my daily practice. It currently supports Elixir and JavaScript, and I am adding more languages (like Java) and problems.

We may host future events to improve your coding skills. Join us at https://t.me/coderoma for the English community, or scan the WeChat QR code below and reply 刷题 for the Chinese community.

(WeChat QR code)

Lyft's Marketing Automation Platform Symphony

· 3 min read

Customer Acquisition Efficiency Issue: How can advertising campaigns achieve higher returns with less money and fewer people?

Specifically, Lyft's advertising campaigns need to address the following characteristics:

  1. Manage location-based campaigns
  2. Data-driven growth: growth must be scalable, measurable, and predictable
  3. Support Lyft's unique growth model, as shown below:

lyft growth model

The main challenge is the difficulty of scaling management across various aspects of regional marketing, including ad bidding, budgeting, creative assets, incentives, audience selection, testing, and more. The following image depicts a day in the life of a marketer:

A Day in the Life of a Marketer

We can see that "execution" takes up most of the time, while less time is spent on the more important tasks of "analysis and decision-making." Scaling means reducing complex operations and allowing marketers to focus on analysis and decision-making.

Solution: Automation

To reduce costs and improve the efficiency of experimentation, it is necessary to:

  1. Predict whether new users are interested in the product
  2. Optimize across multiple channels and effectively evaluate and allocate budgets
  3. Conveniently manage thousands of campaigns

Data is enhanced through Lyft's Amundsen system using reinforcement learning.

The automation components include:

  1. Updating bid keywords
  2. Disabling underperforming creative assets
  3. Adjusting referral values based on market changes
  4. Identifying high-value user segments
  5. Sharing strategies across multiple campaigns

Architecture

Lyft Symphony Architecture

Technology stack: Apache Hive, Presto, ML platform, Airflow, 3rd-party APIs, UI.

Specific Component Modules

LTV Prediction Module

The lifetime value (LTV) of users is an important metric for evaluating channels, and the budget is determined by both LTV and the price we are willing to pay for customer acquisition in that region.

Our understanding of new users is limited, but as interactions increase, the historical data provided will more accurately predict outcomes.

Initial feature values:

Feature Values

As historical interaction records accumulate, the predictions become more accurate:

Predicting LTV Based on Historical Records

Budget Allocation Module

Once LTV is established, the next step is to set the budget based on pricing. A curve of the form LTV = a * (spend)^b is fitted, along with similar parameter curves in the surrounding range. Achieving a global optimum requires some randomness.
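
As a sketch of that fitting step, the power-law curve can be fitted with SciPy's curve_fit; the spend and LTV numbers below are made up for illustration.

import numpy as np
from scipy.optimize import curve_fit

spend = np.array([1_000.0, 5_000.0, 10_000.0, 50_000.0, 100_000.0])
ltv = np.array([40.0, 55.0, 62.0, 80.0, 88.0])  # made-up average LTV per spend level

def ltv_curve(s, a, b):
    # LTV = a * spend^b, the functional form described above
    return a * np.power(s, b)

(a, b), _ = curve_fit(ltv_curve, spend, ltv, p0=(10.0, 0.2))
print(f"LTV ~= {a:.2f} * spend^{b:.3f}")
# Restarting the fit from several random initial guesses and keeping the best
# residual is one simple way to add the randomness needed to escape local optima.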

Budget Calculation

Delivery Module

This module is divided into two parts: the parameter tuner and the executor. The tuner sets specific parameters based on pricing for each channel, while the executor applies these parameters to the respective channels.

There are many popular delivery strategies that are common across various channels:

Delivery Strategies

Conclusion

It is essential to recognize the importance of human experience within the system; otherwise, it results in garbage in, garbage out. When people are liberated from tedious delivery tasks and can focus on understanding users, channels, and the messages they need to convey to their audience, they can achieve better campaign results—spending less time to achieve higher ROI.

Introduction to Architecture

· 3 min read

What is Architecture?

Architecture is the shape of a software system. To illustrate with a building:

  • Paradigm is the bricks.
  • Design principles are the rooms.
  • Components are the structure.

They all serve a specific purpose, just like hospitals treat patients and schools educate students.

Why Do We Need Architecture?

Behavior vs. Structure

Every software system provides two distinct values to stakeholders: behavior and structure. Software developers must ensure that both values are high.

==Due to the nature of their work, software architects focus more on the structure of the system rather than its features and functions.==

Ultimate Goal — ==Reduce the human resource costs required for adding new features==

Architecture serves the entire lifecycle of software systems, making them easy to understand, develop, test, deploy, and operate. Its goal is to minimize the human resource costs for each business use case.

O'Reilly's "Software Architecture" provides a great introduction to these five fundamental architectures.

1. Layered Architecture

Layered architecture is widely adopted and well known among developers; it is the de facto standard at the application level. If you are unsure which architecture to use, layered architecture is a good choice.

Examples:

  • TCP/IP model: Application Layer > Transport Layer > Internet Layer > Network Interface Layer
  • Facebook TAO: Network Layer > Cache Layer (follower + leader) > Database Layer

Pros and Cons:

  • Pros
    • Easy to use
    • Clear responsibilities
    • Testability
  • Cons
    • Large and rigid
      • Adjusting, extending, or updating the architecture requires changes across all layers, which can be quite tricky.

2. Event-Driven Architecture

Any change in state triggers an event in the system. Communication between system components is accomplished through events.

A simplified event-driven architecture includes a mediator, an event queue, and channels; a minimal code sketch follows the examples below.

Examples:

  • QT: Signals and Slots
  • Payment infrastructure: As bank gateways often have high latency, asynchronous techniques are used in banking architecture.
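
A minimal sketch of the mediator, event queue, and channels in Python; all names are illustrative.

from collections import defaultdict, deque

class EventBus:
    def __init__(self):
        self.queue = deque()               # pending events
        self.channels = defaultdict(list)  # event type -> subscribed handlers

    def subscribe(self, event_type, handler):
        self.channels[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def dispatch(self):
        # The mediator drains the queue and routes each event to its channel.
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.channels[event_type]:
                handler(payload)

bus = EventBus()
bus.subscribe("payment.settled", lambda p: print("notify:", p))
bus.publish("payment.settled", {"order": 42})
bus.dispatch()  # notify: {'order': 42}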

3. Microkernel Architecture (aka Plug-in Architecture)

The functionality of the software is distributed between a core and multiple plugins. The core contains only the most basic functionalities. Each plugin operates independently and implements shared interfaces to achieve different goals.

Examples:

  • Visual Studio Code and Eclipse
  • MINIX operating system

4. Microservices Architecture

Large systems are decomposed into numerous microservices, each a separately deployable unit that communicates via RPCs.

uber architecture

Example: Uber's architecture, shown above.

5. Space-Based Architecture

The name "Space-Based Architecture" comes from "tuple space," which implies a "distributed shared space." In space-based architecture, there are no databases or synchronized database access, thus avoiding database bottleneck issues. All processing units share copies of application data in memory. These processing units can be flexibly started and stopped.

Example: See Wikipedia

  • Primarily adopted by Java-based architectures: for example, JavaSpaces.

Stream and Batch Processing Frameworks

· 3 min read

Why Do We Need Such Frameworks?

  • To process more data in a shorter amount of time.
  • To unify fault tolerance in distributed systems.
  • To simplify task abstractions to meet changing business requirements.
  • To handle both bounded datasets (batch processing) and unbounded datasets (stream processing).

Brief History of Batch and Stream Processing Development

  1. Hadoop and MapReduce. Google made batch processing as simple as result = pairs.map((pair) => (morePairs)).reduce(somePairs => lessPairs) in a distributed system (see the word-count sketch after this list).
  2. Apache Storm and directed graph topologies. MapReduce does not represent iterative algorithms well. Therefore, Nathan Marz abstracted stream processing into a graph structure composed of spouts and bolts.
  3. Spark in-memory computation. Reynold Xin pointed out that Spark uses ten times fewer machines than Hadoop while being three times faster when processing the same data.
  4. Google Dataflow based on Millwheel and FlumeJava. Google uses a windowed API to support both batch and stream processing simultaneously.
  5. Flink. Flink quickly adopted the programming model of ==Google Dataflow== and Apache Beam and implemented the Chandy-Lamport checkpointing algorithm efficiently.
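
A tiny single-machine illustration of the map/reduce model from item 1, counting words in Python; real MapReduce runs the same map, shuffle, and reduce phases across many machines.

from collections import defaultdict
from functools import reduce

docs = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: each document emits (word, 1) pairs.
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the pairs by key.
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# Reduce: collapse each group into a single (word, total) pair.
counts = {word: reduce(lambda a, b: a + b, vals) for word, vals in groups.items()}
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}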

These Frameworks

Architecture Choices

To meet the above demands with commodity machines, there are several popular distributed system architectures:

  • Master-slave (centralized): Apache Storm + Zookeeper, Apache Samza + YARN
  • P2P (decentralized): Apache S4

Features

  1. DAG Topology for iterative processing - for example, GraphX in Spark, topologies in Apache Storm, DataStream API in Flink.
  2. Delivery Guarantees. How to ensure the reliability of data delivery between nodes? At least once / at most once / exactly once.
  3. Fault Tolerance. Implement fault tolerance using cold/warm/hot standby, checkpointing, or active-active.
  4. Windowed API for unbounded datasets. For example, streaming windows in Apache Flink, window functions in Spark, and windowing in Apache Beam (a minimal sketch follows this list).
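
A minimal sketch of the tumbling-window idea behind those APIs, in plain Python with made-up events.

from collections import defaultdict

WINDOW_SECONDS = 60

def window_counts(events):
    """events: iterable of (event_time_in_epoch_seconds, key) pairs."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % WINDOW_SECONDS)  # bucket into fixed 60s windows
        counts[(window_start, key)] += 1
    return counts

events = [(0, "click"), (30, "click"), (61, "click"), (95, "view")]
print(dict(window_counts(events)))
# {(0, 'click'): 2, (60, 'click'): 1, (60, 'view'): 1}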

Comparison Table of Different Architectures

| Architecture | Storm | Storm-trident | Spark | Flink |
| --- | --- | --- | --- | --- |
| Model | Native | Micro-batch | Micro-batch | Native |
| Guarantees | At least once | Exactly once | Exactly once | Exactly once |
| Fault Tolerance | Record Ack | Record Ack | Checkpoint | Checkpoint |
| Maximum Fault Tolerance | High | Medium | Medium | Low |
| Latency | Very low | High | High | Low |
| Throughput | Low | Medium | High | High |

Fraud Detection with Semi-supervised Learning

· 4 min read

Clarify Requirements

Calculate risk probability scores in realtime and make decisions along with a rule engine to prevent ATO (account takeover) and botnet attacks.

Train clustering features with online and offline pipelines:

  1. Source from website logs, auth logs, user actions, transactions, and high-risk accounts in a watch list
  2. Track event data in Kafka topics
  3. Process events and prepare clustering features

Realtime scoring and rule-based decision

  1. Assess a risk score comprehensively for online services

  2. Maintain flexibility with manual configuration in a rule engine

  3. Share or use the insights in online services

ATOs ranked from easy to hard to detect:

  1. from single IP
  2. from IPs on the same device
  3. from IPs across the world
  4. from 100k IPs
  5. attacks on specific accounts
  6. phishing and malware

Challenges

  • Manual feature selection
  • Feature evolution in adversarial environment
  • Scalability
  • No online DBSCAN

High-level Architecture

Core Components and Workflows

Semi-supervised learning = unlabeled data + small amount of labeled data

Why? Better learning accuracy than unsupervised learning, plus less time and cost than supervised learning.

Training: to prepare clustering features in the database

  • Streaming Pipeline on Spark:
    • Runs continuously in real-time.
    • Performs feature normalization and categorical transformation on the fly.
      • Feature Normalization: Scale your numeric features (e.g., age, income) so that they are between 0 and 1.
      • Categorical Feature Transformation: Apply one-hot encoding or another transformation to convert categorical features into a numeric format suitable for the machine learning model.
    • Uses Spark MLlib’s K-means to cluster streaming data into groups (a simplified sketch of both pipelines follows this list).
      • After running k-means and forming clusters, you might find that certain clusters have more instances of fraud.
      • Once you’ve labeled a cluster as fraudulent based on historical data or expert knowledge, you can use that cluster assignment during inference. Any new data point assigned to that fraudulent cluster can be flagged as suspicious.
  • Hourly Cronjob Pipeline:
    • Runs periodically every hour (batch processing).
    • Applies thresholding to identify anomalies based on results from the clustering model.
    • Tunes parameters of the DBSCAN algorithm to improve clustering and anomaly detection.
    • Uses DBSCAN from scikit-learn to find clusters and detect outliers in batch data.
      • DBSCAN, which can detect outliers, might identify clusters of regular transactions and separate them from noise, which could be unusual, potentially fraudulent transactions.
      • Transactions in the noisy or outlier regions (points that don’t belong to any dense cluster) can be flagged as suspicious.
      • After identifying a cluster as fraudulent, DBSCAN helps detect patterns of fraud even in irregularly shaped transaction distributions.
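
A simplified single-machine sketch of both pipelines, using scikit-learn's KMeans as a stand-in for Spark MLlib's streaming K-means, alongside the DBSCAN batch step; the data is synthetic.

import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
normal = rng.normal(loc=0.3, scale=0.05, size=(500, 2))  # regular behavior
fraud = rng.normal(loc=0.9, scale=0.02, size=(20, 2))    # dense bot-like cluster
X = MinMaxScaler().fit_transform(np.vstack([normal, fraud]))  # normalize to [0, 1]

# Streaming-style step: assign every event to a K-means cluster; a cluster
# labeled fraudulent from history flags any new point assigned to it.
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)

# Hourly batch step: DBSCAN separates dense clusters from noise; points
# labeled -1 (outliers) become candidates for review.
labels = DBSCAN(eps=0.05, min_samples=5).fit_predict(X)
print("outliers:", int(np.sum(labels == -1)))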

Serving

The serving layer is where the rubber meets the road - where we turn our machine learning models and business rules into actual fraud prevention decisions. Here's how it works:

  • Fraud Detection Scoring Service:
    • Takes real-time features extracted from incoming requests
    • Applies both clustering models (K-means from streaming and DBSCAN from batch)
    • Combines scores with streaming counters (like login attempts per IP)
    • Outputs a unified risk score between 0 and 1
  • Rule Engine:
    • Acts as the "brain" of the system
    • Combines ML scores with configurable business rules
    • Examples of rules (see the sketch after this list):
      • If risk score > 0.8 AND user is accessing from new IP → require 2FA
      • If risk score > 0.9 AND account is high-value → block transaction
    • Rules are stored in a database and can be updated without code changes
    • Provides an admin portal for security teams to adjust rules
  • Integration with Other Services:
    • Exposes REST APIs for real-time scoring
    • Publishes results to streaming counters for monitoring
    • Feeds decisions back to the training pipeline to improve model accuracy
  • Observability:
    • Tracks key metrics like false positive/negative rates
    • Monitors model drift and feature distribution changes
    • Provides dashboards for security analysts to investigate patterns
    • Logs detailed information for post-incident analysis
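
A minimal sketch of how such data-driven rules could be evaluated; the rule fields, thresholds, and actions are illustrative.

from dataclasses import dataclass

@dataclass
class Rule:
    min_score: float  # ML risk score threshold
    condition: str    # context flag that must be true
    action: str       # decision to return

RULES = [  # in production, loaded from the rules table, not hard-coded
    Rule(min_score=0.9, condition="high_value_account", action="BLOCK"),
    Rule(min_score=0.8, condition="new_ip", action="REQUIRE_2FA"),
]

def decide(risk_score: float, context: dict) -> str:
    for rule in RULES:  # first match wins; list order encodes priority
        if risk_score > rule.min_score and context.get(rule.condition):
            return rule.action
    return "ALLOW"

print(decide(0.92, {"high_value_account": True}))  # BLOCK
print(decide(0.85, {"new_ip": True}))              # REQUIRE_2FA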

A Closer Look at iOS Architecture Patterns

· 3 min read

Why Should We Care About Architecture?

The answer is: to reduce the human resources spent on each feature.

Mobile developers evaluate the quality of an architecture on three levels:

  1. Whether the responsibilities of different features are evenly distributed
  2. Whether it is easy to test
  3. Whether it is easy to use and maintain

| | Responsibility Distribution | Testability | Usability |
| --- | --- | --- | --- |
| Tight Coupling MVC | | | |
| Cocoa MVC | ❌ V and C are coupled | ❌ | ✅⭐ |
| MVP | ✅ Independent view lifecycle | ✅ | Average: more code |
| MVVM | Average: View has a dependency on UIKit | ✅ | Average |
| VIPER | ✅⭐️ | ✅⭐️ | ❌ |

Tight Coupling MVC

Traditional MVC

For example, in a multi-page web application, when you click a link to navigate to another page, the entire page reloads. The problem with this architecture is that the View is tightly coupled with the Controller and Model.

Cocoa MVC

Cocoa MVC is the architecture recommended by Apple for iOS developers. Theoretically, this architecture allows the Controller to decouple the Model from the View.

Cocoa MVC

However, in practice, Cocoa MVC encourages the use of massive view controllers, ultimately leading to the view controller handling all operations.

Realistic Cocoa MVC

Although testing such tightly coupled massive view controllers is quite difficult, Cocoa MVC performs the best in terms of development speed among the existing options.

MVP

In MVP, the Presenter has no relationship with the lifecycle of the view controller, allowing the view to be easily replaced. We can think of UIViewController as the View.

Variant of MVC

There is another type of MVP: MVP with data binding. As shown in the figure, the View is tightly coupled with the Model and Controller.

MVP

MVVM

MVVM is similar to MVP, but MVVM binds the View to the View Model.

MVVM

VIPER

Unlike the three-layer structure of MV(X), VIPER has a five-layer structure (VIPER: View, Interactor, Presenter, Entity, and Routing). This structure allows for good responsibility distribution but has poorer maintainability.

VIPER

Compared to MV(X), VIPER has the following differences:

  1. The logic processing of the Model is transferred to the Interactor, so Entities have no logic and are purely data storage structures.
  2. ==UI-related business logic is handled in the Presenter, while data modification functions are handled in the Interactor==.
  3. VIPER introduces a routing module, Router, to implement inter-module navigation.

iOS Architecture Patterns Revisited

· 2 min read

Why bother with architecture?

Answer: for reducing human resources costs per feature.

Mobile developers evaluate the architecture in three dimensions.

  1. Balanced distribution of responsibilities among feature actors.
  2. Testability
  3. Ease of use and maintainability

| | Distribution of Responsibility | Testability | Ease of Use |
| --- | --- | --- | --- |
| Tight-coupling MVC | | | |
| Cocoa MVC | ❌ V and C are coupled | ❌ | ✅⭐ |
| MVP | ✅ Separated View Lifecycle | ✅ | Fair: more code |
| MVVM | Fair: because of the View's UIKit dependency | ✅ | Fair |
| VIPER | ✅⭐️ | ✅⭐️ | ❌ |

Tight-coupling MVC

Traditional MVC

For example, in a multi-page web application, the page completely reloads once you press a link to navigate somewhere else. The problem is that the View is tightly coupled with both the Controller and the Model.

Cocoa MVC

Apple’s MVC, in theory, decouples View from Model via Controller.

Cocoa MVC

Apple’s MVC in reality encourages ==massive view controllers==. And the view controller ends up doing everything.

Realistic Cocoa MVC

It is hard to test coupled massive view controllers. However, Cocoa MVC is the best architectural pattern regarding development speed.

MVP

In MVP, the Presenter has nothing to do with the life cycle of the view controller, and the View can be mocked easily. We can say the UIViewController is actually the View.

MVC Variant

There is another kind of MVP: the one with data bindings. And as you can see, there is tight coupling between View and the other two.

MVP

MVVM

It is similar to MVP, but the binding is between the View and the View Model.

MVVM

VIPER

There are five layers (VIPER: View, Interactor, Presenter, Entity, and Routing) instead of three when compared to MV(X). This distributes responsibilities well, but the maintainability is bad.

VIPER

When compared to MV(X), VIPER:

  1. Model logic is shifted to Interactor and Entities are left as dumb data structures.
  2. ==UI related business logic is placed into Presenter, while the data altering capabilities are placed into Interactor==.
  3. It introduces Router for the navigation responsibility.

Lambda Architecture

· One min read

Why Use Lambda Architecture?

To address the three issues brought by big data:

  1. Accuracy (good)
  2. Latency (fast)
  3. Throughput (high)

For example, consider the problems of scaling web browsing data records in a traditional way:

  1. First, use a traditional relational database.
  2. Then, add a "publish/subscribe" model queue.
  3. Next, scale through horizontal partitioning or sharding.
  4. Fault tolerance issues begin to arise.
  5. Data corruption phenomena start to appear.

The key issue is that in the AKF Scaling Cube, ==having only the X-axis for horizontal partitioning of one dimension is not enough; we also need to introduce the Y-axis for functional decomposition. The lambda architecture can guide us on how to scale a data system==.

What is Lambda Architecture?

If we define a data system in the following form:

query = function(all data)

Then a lambda architecture is:

Lambda Architecture

batch view = function(all data at the batching job's execution time)
realtime view = function(realtime view, new data)

query = function(batch view, realtime view)

==Lambda architecture = Read/Write separation (Batch Processing Layer + Service Layer) + Real-time Processing Layer==
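
A toy sketch of those three equations in Python, using page-view counting as the data system.

master_dataset = [("page1", 1), ("page2", 1), ("page1", 1)]  # immutable log
new_data = [("page1", 1)]  # arrived after the last batch run

def batch_view(all_data):
    # Recomputed from scratch by the batch layer: slow but accurate.
    view = {}
    for page, n in all_data:
        view[page] = view.get(page, 0) + n
    return view

def realtime_view(view, data):
    # Updated incrementally by the speed layer: fast but approximate.
    for page, n in data:
        view[page] = view.get(page, 0) + n
    return view

def query(batch, realtime, page):
    # The serving layer merges both views to answer queries.
    return batch.get(page, 0) + realtime.get(page, 0)

batch = batch_view(master_dataset)
rt = realtime_view({}, new_data)
print(query(batch, rt, "page1"))  # 3 = 2 (batch) + 1 (realtime)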

Lambda Architecture for big data systems

Thinking Software Architecture as Physical Buildings

· One min read

What is architecture?

Architecture is the shape of the software system. Think of it as the big picture of physical buildings.

  • paradigms are bricks.
  • design principles are rooms.
  • components are buildings.

Together they serve a specific purpose, like a hospital is for curing patients and a school is for educating students.

Why do we need architecture?

Behavior vs. Structure

Every software system provides two different values to the stakeholders: behavior and structure. Software developers are responsible for ensuring that both those values remain high.

==Software architects are, by virtue of their job description, more focused on the structure of the system than on its features and functions.==

Ultimate Goal - ==saving human resources costs per feature==

Architecture serves the full lifecycle of the software system to make it easy to understand, develop, test, deploy, and operate. The goal is to minimize the human resources costs per business use-case.

Designing very large (JavaScript) applications

· 3 min read

Very Large JS App = a lot of developers + large codebase

How to deal with a lot of developers?

empathy

What is a ==senior engineer==? A team of senior engineers without junior engineers is just a team of engineers.

  1. Being senior means that I’d be able to solve almost every problem that somebody might throw at me.
  2. Seniors make the junior engineers eventually become senior engineers.

What’s the next step for a senior engineer?

  1. Senior: “I know how I would solve the problem,” and because I know how I would solve it, I could also teach someone else to do it.
  2. Next level: “I know how others would solve the problem.”

good programming model

A programming model is how people write software, e.g., React/Redux, npm. Here comes a model that affects all large JS apps: code splitting.

  1. People have to think about what to bundle and when to load it.
  2. ==Route-based code splitting==
  3. But what if that is not enough?
    1. Lazy-load every single component of the website.
    2. How does Google do it? Split code by rendering logic and by application logic. ==Simply server-side render a page, and then whatever was actually rendered triggers downloading the associated application bundles.== Google does not do isomorphic rendering; there is no double rendering.

How to deal with a large codebase?

==Code Removability/Delete-ability==

e.g., CSS is bad at code removability:

  1. With one big CSS file, there is this selector in there, and who really knows whether it still matches anything in your app? So you end up just keeping it there.
  2. People thus created CSS-in-JS.

==avoid central configuration of your application at all costs==

  1. Bad example
    1. central routes configuration
    2. central webpack.config.js
  2. Good example
    1. decentralized package.json

avoid the central import problem: the router imports components A, B, and C

  1. to solve this problem, do ==“enhance” instead of “import”==
  2. However, developers still have to decide when to enhance and when to import. Since this might lead to very bad situations, we make “enhance” illegal; nobody gets to use it, with one exception: generated code.

avoid base bundle pile of trash

  1. e.g. base bundle should never contain UI code
  2. Solve this problem with forbidden dependency tests
  3. ==The most straightforward way must be the right way; otherwise, add a test that ensures the right way.==

Be careful with abstractions

We have to become good at finding the right abstractions: Empathy and experience -> Right abstractions