Open-Source Foundation Models
Key Trends
- Skyrocketing Capabilities: Rapid advancements in LLMs since 2018.
- Declining Access: Shift from openly released papers, code, and weights to API-only models, limiting experimentation and research.
Why Access Matters
- Access drives innovation:
- 1990s: Digital text enabled statistical NLP.
- 2010s: GPUs and crowdsourcing fueled deep learning and large datasets.
- Levels of access define research opportunities:
- API: Like a cognitive scientist, measure behavior through prompt-response experiments.
- Open-Weight: Like a neuroscientist, probe internal activations for interpretability and fine-tuning.
- Open-Source: Like a computer scientist, control and question every part of the system.
Levels of Access for Foundation Models
API Access
- Acts as a universal function (e.g., summarize, verify, generate); see the sketch after this list.
- Enables problem-solving agents (e.g., cybersecurity tools, social simulations).
- Challenges: Deprecation and limited reproducibility.
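To make the "universal function" idea concrete, here is a minimal sketch that wraps a single API call as a generic text-to-text function. It assumes the OpenAI Python client, and the model name is purely illustrative; any hosted chat API would play the same role.

```python
# Minimal sketch: an API-served model as a "universal function".
# Assumes the OpenAI Python client (v1); the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(task: str, text: str, model: str = "gpt-4o-mini") -> str:
    """One API call, used as a generic text-to-text function."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{task}:\n\n{text}"}],
    )
    return response.choices[0].message.content

# The same callable covers summarization, verification, generation, ...
print(llm("Summarize in one sentence", "Foundation models are trained on ..."))
```

Because the function is a black box behind the network, this is exactly the access level where deprecation of a model silently breaks reproducibility.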
Open-Weight Access
- Enables interpretability, distillation, fine-tuning, and reproducibility (see the probing sketch after this list).
- Prominent models: Llama, Mistral.
- Challenges:
- Verifying that findings hold across models, and understanding how weight modifications change behavior.
- Research is constrained by the blueprint (architecture, training data, and training procedure) of the pre-existing model.
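As an illustration of the "neuroscientist" level of access, the sketch below loads an open-weight model and reads out its per-layer activations, internal state that an API never exposes. It assumes Hugging Face transformers, with GPT-2 standing in for any open-weight model (e.g., Llama or Mistral, which require gated downloads).

```python
# Sketch: probing per-layer activations of an open-weight model.
# Assumes Hugging Face transformers; GPT-2 stands in for any open-weight LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Open weights enable interpretability.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer (plus the embedding layer): the raw material for
# interpretability, probing, and fine-tuning research.
for layer, h in enumerate(outputs.hidden_states):
    print(f"layer {layer}: hidden states of shape {tuple(h.shape)}")
```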
Open-Source Access
- Embodies creativity, transparency, and collaboration.
- Examples: GPT-J, GPT-NeoX, StarCoder.
- A performance gap persists compared to closed models, driven by compute and data limitations.
Key Challenges and Opportunities
- Open-Source Barriers:
- Legal restrictions on releasing web-derived training data.
- Significant compute requirements for retraining.
- Scaling Compute:
- Pooling idle GPUs.
- Crowdsourced efforts like BigScience.
- Emergent Research Questions:
- How do architecture and data shape behavior?
- Can scaling laws predict performance at larger scales? (A toy fitting sketch follows below.)
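As a toy illustration of the scaling-law question, the sketch below fits a power law to (compute, loss) points in log-log space and extrapolates one order of magnitude beyond the fitted range. The data points are synthetic, made up purely for illustration.

```python
# Toy sketch: fit a power law L = a * C^slope in log-log space, then
# extrapolate. The (compute, loss) points are synthetic, for illustration only.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (made up)
loss = np.array([3.2, 2.7, 2.3, 2.0])         # eval loss (made up)

# log L = slope * log C + intercept  <=>  L = exp(intercept) * C^slope
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(intercept)

target = 1e22  # one order of magnitude beyond the fitted range
predicted = a * target**slope
print(f"L ≈ {a:.2f} * C^({slope:.3f}); predicted loss at 1e22 FLOPs: {predicted:.2f}")
```

Whether such extrapolations stay accurate at much larger scales is precisely the open question; open models are what make fits like this reproducible in the first place.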