Measuring Agent Capabilities and Anthropic’s RSP
Anthropic’s History
- Founded: 2021 as a Public Benefit Corporation (PBC).
- Milestones:
  - 2022: Claude 1 completed.
  - 2023: Claude 1 released, Claude 2 launched.
  - 2024: Claude 3 launched.
  - 2025: Advances in interpretability and AI safety:
    - Mathematical framework for constitutional AI.
    - Sleeper agents and toy models of superposition.
 
 
 
Responsible Scaling Policy (RSP)
- Definition: A framework to ensure safe scaling of AI capabilities.
- Goals:
  - Provide structure for safety decisions.
  - Ensure public accountability.
  - Iterate on safe decisions.
  - Serve as a template for policymakers.
- AI Safety Levels (ASL): Modeled after the biosafety levels (BSL) used for handling dangerous biological materials. Each level aligns safety, security, and operational standards with a model’s catastrophic risk potential.
  - ASL-1 (smaller models): No meaningful catastrophic risk (e.g., 2018 LLMs, chess-playing AIs).
  - ASL-2 (present large models): Early signs of dangerous capabilities (e.g., instructions for bioweapons with limited reliability).
  - ASL-3 (higher-risk models): Significant catastrophic misuse potential or low-level autonomy.
  - ASL-4 and higher (speculative models): Future systems involving qualitative escalations in catastrophic risk or autonomy.
- Implementation:
  - Safety challenges and methods.
  - Case study: computer use.
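
The ASL tiers above can be read as an ordered lookup table. The following is a minimal sketch of that reading, not Anthropic’s implementation: the names `ASL`, `LevelSummary`, `ASL_SUMMARIES`, and `requires_stricter_safeguards` are hypothetical and exist only for illustration, and the descriptions are copied from the notes above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ASL(IntEnum):
    """AI Safety Levels as summarized in these notes.

    ASL_4_PLUS stands in for "ASL-4 and higher" (an illustrative simplification).
    """
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3
    ASL_4_PLUS = 4


@dataclass(frozen=True)
class LevelSummary:
    label: str  # which models the level covers
    risk: str   # risk description, taken from the notes


# Hypothetical lookup table; wording mirrors the bullet list above.
ASL_SUMMARIES = {
    ASL.ASL_1: LevelSummary("Smaller models", "No meaningful catastrophic risk"),
    ASL.ASL_2: LevelSummary("Present large models", "Early signs of dangerous capabilities"),
    ASL.ASL_3: LevelSummary("Higher-risk models", "Significant catastrophic misuse potential or low-level autonomy"),
    ASL.ASL_4_PLUS: LevelSummary("Speculative models", "Qualitative escalations in catastrophic risk or autonomy"),
}


def requires_stricter_safeguards(current: ASL, candidate: ASL) -> bool:
    """Hypothetical helper: True when the candidate level is above the current one."""
    return candidate > current


if __name__ == "__main__":
    for level, summary in ASL_SUMMARIES.items():
        print(f"{level.name}: {summary.label} - {summary.risk}")
```

Using an ordered enum keeps the "higher level means stricter standards" idea explicit, which is the part of the biosafety-level analogy the notes emphasize.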
 
 
Measuring Capabilities
- Challenges: Benchmarks become obsolete.
- Examples:
  - Task completion time relative to humans: Claude 3.5 completes in seconds tasks that take human developers 30 minutes.
  - Benchmarks:
    - SWE-bench: Assesses real-world software engineering tasks.
    - Aider’s benchmarks: Code editing and refactoring.
- Results:
  - Claude 3.5 Sonnet outperforms OpenAI o1 across key benchmarks.
  - Faster and cheaper: $3 per million input tokens (Mtok) vs. $15 per million input tokens for OpenAI o1.
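
The price and speed comparisons above reduce to simple arithmetic. The sketch below works through them under stated assumptions: the two input prices come from the notes, while the 2-million-token workload and the 30-second model completion time are hypothetical values chosen for illustration (the notes only say "seconds"). The function names are also illustrative.

```python
CLAUDE_35_SONNET_INPUT_USD_PER_MTOK = 3.0   # from the notes
OPENAI_O1_INPUT_USD_PER_MTOK = 15.0         # from the notes


def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost of `tokens` input tokens at a price quoted per million tokens."""
    return tokens / 1_000_000 * price_per_mtok


def speedup(human_seconds: float, model_seconds: float) -> float:
    """How many times faster the model finishes than the human baseline."""
    return human_seconds / model_seconds


# Hypothetical workload: 2 million input tokens.
tokens = 2_000_000
claude_cost = input_cost_usd(tokens, CLAUDE_35_SONNET_INPUT_USD_PER_MTOK)  # 6.0
o1_cost = input_cost_usd(tokens, OPENAI_O1_INPUT_USD_PER_MTOK)             # 30.0
price_ratio = OPENAI_O1_INPUT_USD_PER_MTOK / CLAUDE_35_SONNET_INPUT_USD_PER_MTOK  # 5.0

# Human baseline of 30 minutes (from the notes) vs. an assumed 30-second model run.
task_speedup = speedup(human_seconds=30 * 60, model_seconds=30)  # 60.0

if __name__ == "__main__":
    print(f"Input cost: ${claude_cost:.2f} vs. ${o1_cost:.2f} ({price_ratio:.0f}x)")
    print(f"Speedup over human baseline: {task_speedup:.0f}x")
```

This covers input-token pricing only; output-token prices are not given in the notes and are left out.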
 
 
Claude 3.5 Sonnet Highlights
- Agentic Coding and Game Development: Designed for efficiency and accuracy in real-world scenarios.
- Computer Use Demos:
  - Coding: Demonstrated advanced code generation and integration.
  - Operations: Showcased operational tasks with safety considerations.
 
 
AI Safety Measures
- Focus Areas:
  - Scaling governance.
  - Capability measurement.
  - Collaboration with academia.
- Practical Safety:
  - ASL standard implementation.
  - Deployment safeguards.
  - Lessons learned in year one.
 
 
Future Directions
- Scaling and governance improvements.
- Enhanced benchmarks and academic partnerships.
- Addressing interpretability and sleeper agent risks.