
18 posts tagged with "ai"


Open-Source Foundation Models

  • Skyrocketing Capabilities: Rapid advancements in LLMs since 2018.
  • Declining Access: Shift from open paper, code, and weights to API-only models, limiting experimentation and research.

Why Access Matters

  • Access drives innovation:
    • 1990s: Digital text enabled statistical NLP.
    • 2010s: GPUs and crowdsourcing fueled deep learning and large datasets.
  • Levels of access define research opportunities:
    • API: Like a cognitive scientist, measure behavior (prompt-response systems).
    • Open-Weight: Like a neuroscientist, probe internal activations for interpretability and fine-tuning.
    • Open-Source: Like a computer scientist, control and question every part of the system.

Levels of Access for Foundation Models

  1. API Access

    • Acts as a universal function (e.g., summarize, verify, generate).
    • Enables problem-solving agents (e.g., cybersecurity tools, social simulations).
    • Challenges: Deprecation and limited reproducibility.
  2. Open-Weight Access

    • Enables interpretability, distillation, fine-tuning, and reproducibility (a minimal probing sketch follows this list).
    • Prominent models: Llama, Mistral.
    • Challenges:
      • Testing model independence and functional changes from weight modifications.
      • Constrained by the design ("blueprint") of pre-existing models.
  3. Open-Source Access

    • Embodies creativity, transparency, and collaboration.
    • Examples: GPT-J, GPT-NeoX, StarCoder.
    • Performance gap persists compared to closed models due to compute and data limitations.
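
To make the contrast between the first two access levels concrete, here is a minimal sketch of what open-weight access adds over an API: with the weights available locally, per-layer activations can be inspected directly. It assumes the Hugging Face transformers library, with gpt2 standing in for any open-weight checkpoint.

```python
# A minimal sketch of "open-weight" access. With API-only access we could
# only observe the generated text; with the weights on disk we can also
# inspect internal activations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Foundation models are", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per layer (plus the embedding layer): exactly the
# kind of signal an interpretability or fine-tuning study needs.
for layer, h in enumerate(outputs.hidden_states):
    print(f"layer {layer}: activation shape {tuple(h.shape)}")
```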

Key Challenges and Opportunities

  • Open-Source Barriers:
    • Legal restrictions on releasing web-derived training data.
    • Significant compute requirements for retraining.
  • Scaling Compute:
    • Pooling idle GPUs.
    • Crowdsourced efforts like BigScience.
  • Emergent Research Questions:
    • How do architecture and data shape behavior?
    • Can scaling laws predict performance at larger scales?
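
As a toy illustration of that last question, the sketch below fits a power law to a handful of synthetic (compute, loss) measurements and extrapolates it one order of magnitude, under the strong and untested assumption that the trend continues; all numbers are made up.

```python
# Toy scaling-law extrapolation on synthetic numbers: fit
# log(loss) = slope * log(compute) + intercept, then extrapolate.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (synthetic)
loss = np.array([3.2, 2.7, 2.3, 2.0])         # evaluation loss (synthetic)

slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

def predicted_loss(c: float) -> float:
    # Equivalent to loss = exp(intercept) * compute ** slope.
    return float(np.exp(intercept) * c ** slope)

print(f"predicted loss at 1e22 FLOPs: {predicted_loss(1e22):.2f}")
```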

Reflections

  • Most research occurs within API and fixed-weight confines, limiting exploration.
  • Open-weight models offer immense value for interpretability and experimentation.
  • Open-source efforts require collective funding and infrastructure support.

Final Takeaway

Access shapes the trajectory of innovation in foundation models. To unlock their full potential, researchers must question data, architectures, and algorithms while exploring new models of collaboration and resource pooling.

Unifying Neural and Symbolic Decision Making


Key Challenges with LLMs

  • Difficulty with tasks requiring complex planning (e.g., travel itineraries, meeting schedules).
  • Performance declines with increasing task complexity (e.g., more cities, people, or constraints).

Three Proposed Solutions

  1. Scaling Law
    • Increase data, compute, and model size.
    • Limitation: High costs and diminishing returns for reasoning/planning tasks.
  2. Hybrid Systems
    • Combine deep learning models with symbolic solvers (a minimal tool-use sketch follows this list). Symbolic reasoning solves problems and makes decisions using explicit symbols, rules, and logic, with clearly defined relationships and representations that follow formal logic or mathematical principles.
    • Approaches:
      • End-to-End Integration: Unified deep model and symbolic system.
      • Data Augmentation: Neural models provide structured data for solvers.
      • Tool Use: LLMs act as interfaces for external solvers.
    • Notable Examples:
      • MILP (mixed-integer linear programming) Solvers: For travel planning with constraints.
      • Searchformer: Transformers trained to emulate A* search.
      • DualFormer: Switches dynamically between fast (heuristic) and slow (deliberative) reasoning.
      • SurCo: Combines combinatorial optimization with latent space representations.
  3. Emerging Symbolic Structures
    • Exploration of symbolic reasoning emerging in neural networks.
    • Findings:
      • Neural networks exhibit Fourier-like patterns in arithmetic tasks.
      • Gradient descent produces solutions aligned with algebraic constructs.
      • Emergent ring homomorphisms and symbolic efficiency in complex tasks.
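
A minimal sketch of the hybrid tool-use idea referenced in the list above: a (stubbed) neural model parses a natural-language request into structured constraints, and a symbolic routine solves them exactly. The extract_constraints stub stands in for a real LLM call, and the brute-force search stands in for a real MILP or constraint-programming solver.

```python
# Hybrid "tool use" sketch: the neural side handles unstructured input,
# the symbolic side enforces exact constraints.

def extract_constraints(request: str) -> dict:
    # Stand-in for an LLM call that parses the request into structured form.
    return {
        "free_hours": {"alice": [9, 10, 11], "bob": [10, 11, 14]},
    }

def solve_meeting(constraints: dict) -> int | None:
    # Symbolic part: an exact (here, brute-force) search for an hour that
    # satisfies every participant's availability constraint.
    free_sets = [set(hours) for hours in constraints["free_hours"].values()]
    common = set.intersection(*free_sets)
    return min(common) if common else None

request = "Find an hour when Alice and Bob can both meet."
print(solve_meeting(extract_constraints(request)))  # -> 10
```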

Research Implications

  • Neural networks naturally learn symbolic abstractions, offering potential for improved reasoning.
  • Hybrid systems might represent the optimal balance between adaptability (neural) and precision (symbolic).
  • Advanced algebraic techniques could eventually replace gradient descent.

Overall Takeaway

The future of decision-making AI lies in leveraging both neural adaptability and symbolic rigor. Hybrid approaches appear most promising for solving tasks requiring both perception and structured reasoning.

Enterprise Workflow Agents


Key Themes and Context

Enterprise Workflows

  • Automation levels range from scripted workflows (minimal variation) to agentic workflows (adaptive and dynamic).
  • Enterprise environments, such as those supported by ServiceNow, involve complex, repetitive tasks like IT management, CRM updates, and scheduling.
  • The adoption of LLM-powered agents (e.g., API agents and Web agents) transforms these workflows by leveraging capabilities like multimodal observations and dynamic actions.

LLM Agents for Enterprise Workflows

  • API Agents
    • Utilize structured API calls for efficiency.
    • Pros: Low latency, structured inputs.
    • Cons: Depend on predefined APIs, limited adaptability.
  • Web Agents
    • Simulate human actions on web interfaces.
    • Pros: Greater flexibility; can interact with dynamic UIs.
    • Cons: High latency, error-prone.
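
To make the contrast concrete, here is a rough sketch of both agent styles on the same task ("create an incident ticket"). The endpoint, payload fields, and UI selectors are hypothetical placeholders, not a real ServiceNow schema.

```python
# API agent vs. web agent on the same task; all names are placeholders.
import requests

def api_agent_create_incident(summary: str) -> None:
    # API agent: one structured call. Fast and reliable, but only possible
    # where a predefined API exists.
    requests.post(
        "https://instance.example.com/api/incident",  # hypothetical endpoint
        json={"short_description": summary},
        timeout=10,
    )

def web_agent_create_incident(summary: str) -> list[tuple[str, str]]:
    # Web agent: emits UI actions against the rendered page, so it can cover
    # interfaces with no API, at the cost of latency and brittleness.
    # Returned here as an action trace instead of driving a real browser.
    return [
        ("click", "nav: Incidents > New"),
        ("type", f"short_description = {summary}"),
        ("click", "button: Submit"),
    ]
```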

WorkArena Framework

  • Benchmarks designed for realistic enterprise workflows.
  • Tasks range from IT inventory management to budget allocation and employee offboarding.
  • Supported by BrowserGym and AgentLab for testing and evaluation in simulated environments.

Technical Frameworks

Agent Architectures

  • TapeAgents Framework

    • Represents agents as resumable modular state machines.
    • Features structured logs (the "tape") for actions, thoughts, and outcomes.
    • Facilitates optimization, e.g., fine-tuning a student agent on a teacher agent's tapes (a minimal tape sketch follows this list).
  • WorkArena++

    • Extends WorkArena with more compositional and challenging tasks.
    • Evaluates agents on capabilities like long-term planning and multimodal data integration.
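
A minimal sketch of the tape idea mentioned above: every thought, action, and observation is appended to one structured, replayable log. The field names and the sample entries are illustrative, not the actual TapeAgents schema.

```python
# Minimal "tape" sketch: one append-only, structured log of agent steps.
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str      # "thought" | "action" | "observation"
    content: str

@dataclass
class Tape:
    steps: list[Step] = field(default_factory=list)

    def append(self, kind: str, content: str) -> None:
        self.steps.append(Step(kind, content))

tape = Tape()
tape.append("thought", "User asked for last month's IT spend; query the reporting API.")
tape.append("action", "GET /api/reports/it-spend?month=2024-10")  # illustrative endpoint
tape.append("observation", "total = 18,420 USD")                  # illustrative result

# Because the tape is plain data, a run can be resumed, audited, or turned
# into fine-tuning data for a smaller student agent.
for step in tape.steps:
    print(f"[{step.kind}] {step.content}")
```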

Benchmarks

  • WorkArena: ~20k unique enterprise task instances.
  • WorkArena++: Focused on compositional workflows and data-driven reasoning.
  • Other tools: MiniWoB, WebLINX, VisualWebArena.

Evaluation Metrics

  • GREADTH (Grounded, Responsive, Accurate, Disciplined, Transparent, Helpful):
    • Prioritizes real-world agent performance metrics.
  • Task-Specific Success Rates:
    • For example, form-filling assistants built from fine-tuned student agents run at roughly 300x lower cost than GPT-4.

Challenges for Agents in Workflows

  • Context Understanding
    • Enterprise tasks require understanding deep hierarchies of information (e.g., dashboards, knowledge bases).
    • Sparse rewards in benchmarks complicate learning.
  • Long-Term Planning
    • Subgoal decomposition and multi-step task execution remain difficult.
  • Safety and Alignment
    • Risks from malicious inputs (e.g., adversarial prompts, hidden text).
  • Cost and Efficiency
    • Trimming context windows and adopting modular architectures are key to reducing compute costs.

Future Directions

Augmentation Models

  • Centaur Framework:
    • Separates AI from human tasks (e.g., content gathering by AI, final editing by humans).
  • Cyborg Framework:
    • Promotes tight collaboration between AI and humans.

Unified Evaluation

  • Calls for a meta-benchmark to consolidate evaluation protocols across platforms (e.g., WebLINX, WorkArena).

Advancements in Agent Optimization

  • Leveraging RL-inspired techniques for fine-tuning.
  • Modular learning frameworks to improve generalizability.

Opportunities in Knowledge Work

  • Automation of repetitive, low-value tasks (e.g., scheduling, report generation).
  • Integration of multimodal agents into enterprise environments to support decision-making and strategic tasks.
  • Enhanced productivity through human-AI collaboration models.

This synthesis connects the theoretical and practical elements of enterprise workflow agents, showcasing their transformative potential while addressing current limitations.

Agentic AI Frameworks


Introduction

  • Two kinds of AI applications:

    • Generative AI: Creates content like text and images.
    • Agentic AI: Performs complex tasks autonomously. This is the future.
  • Key Question: How can developers make these systems easier to build?

Agentic AI Frameworks

  • Examples: Personal assistants, autonomous robots, gaming agents, web/software agents, and applications in science, healthcare, and supply chains.
  • Core Benefits:

    • User-Friendly: Natural and intuitive interactions with minimal input.
    • High Capability: Handles complex tasks efficiently.
    • Programmability: Modular and maintainable, encouraging experimentation.
  • Design Principles:

    • Unified abstractions integrating models, tools, and human interaction.
    • Support for dynamic workflows, collaboration, and automation.

AutoGen Framework

https://github.com/microsoft/autogen

  • Purpose: A framework for building agentic AI applications.

  • Key Features:

    • Conversable and Customizable Agents: Simplify building applications through natural language interactions (a minimal two-agent sketch follows this list).
    • Nested Chat: Handles complex workflows like content creation and reasoning-intensive tasks.
    • Group Chat: Supports collaborative task-solving with multiple agents.
  • History:

    • Started in FLAML (2022), became standalone (2023), with over 200K monthly downloads and widespread adoption.
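
A hedged sketch of the conversable-agents pattern noted above, loosely following the pyautogen 0.2-style API; class names and signatures may differ across AutoGen versions, and a real llm_config with credentials is required to run it.

```python
# Two conversable agents in an AutoGen-style setup (0.2-era API; a sketch,
# not a definitive implementation).
import autogen

llm_config = {"config_list": [{"model": "gpt-4o-mini"}]}  # placeholder; real credentials required

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",     # run fully autonomously
    code_execution_config=False,  # no local code execution in this sketch
)

# A two-agent conversation: the user proxy sends the task, the assistant
# replies, and the framework manages the turn-taking.
user_proxy.initiate_chat(
    assistant,
    message="Draft a three-bullet outline for a blog post on agentic AI.",
)
```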

Applications and Examples

  • Advanced Reflection:
    • Two-agent systems for collaborative refinement of tasks like blog writing.
  • Gaming and Strategy:
    • Conversational Chess, where agents simulate strategic reasoning.
  • Enterprise and Research:
    • Applications in supply chains, healthcare, and scientific discovery, such as ChemCrow for discovering novel compounds.

Core Components of AutoGen

  • Agentic Programming:
    • Divides tasks into manageable steps for easier scaling and validation.
  • Multi-Agent Orchestration:
    • Supports dynamic workflows with centralized or decentralized setups.
  • Agentic Design Patterns:
    • Covers reasoning, planning, tool integration, and memory management.
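
Continuing the sketch above, multi-agent orchestration in the same 0.2-style API might look roughly like the group chat below, where a manager routes turns among several agents; again, the names and signatures are version-dependent assumptions.

```python
# Rough group-chat orchestration sketch (pyautogen 0.2-style API; details
# may differ across versions).
import autogen

llm_config = {"config_list": [{"model": "gpt-4o-mini"}]}  # placeholder config

writer = autogen.AssistantAgent(name="writer", llm_config=llm_config)
critic = autogen.AssistantAgent(name="critic", llm_config=llm_config)
user = autogen.UserProxyAgent(
    name="user", human_input_mode="NEVER", code_execution_config=False
)

# The manager picks which agent speaks each round (centralized orchestration).
group = autogen.GroupChat(agents=[user, writer, critic], messages=[], max_round=6)
manager = autogen.GroupChatManager(groupchat=group, llm_config=llm_config)

user.initiate_chat(manager, message="Draft and critique a short product announcement.")
```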

Challenges in Agent Design

  • System Design:
    • Optimizing multi-agent systems for reasoning, planning, and diverse applications.
  • Performance:
    • Balancing quality, cost, and scalability while maintaining resilience.
  • Human-AI Collaboration:
    • Designing systems for safe, effective human interaction.

Open Questions and Future Directions

  • Multi-Agent Topologies:
    • Efficiently balancing centralized and decentralized systems.
  • Teaching and Optimization:
    • Enabling agents to learn autonomously using tools like AgentOptimizer.
  • Expanding Applications:
    • Exploring new domains such as software engineering and cross-modal systems.

History and Future of LLM Agents


Trajectory and potential of LLM agents

Introduction

  • Definition of Agents: Intelligent systems interacting with environments (physical, digital, or human).
  • Evolution: From symbolic AI agents like ELIZA (1966) to modern LLM-based reasoning agents.

Core Concepts

  1. Agent Types:
    • Text Agents: Rule-based systems like ELIZA (1966), limited in scope.
    • LLM Agents: Utilize large language models for versatile text-based interaction.
    • Reasoning Agents: Combine reasoning and acting, enabling decision-making across domains.
  2. Agent Goals:
    • Perform tasks like question answering (QA), game-solving, or real-world automation.
    • Balance reasoning (internal actions) and acting (external feedback).

Key Developments in LLM Agents

  1. Reasoning Approaches:
    • Chain-of-Thought (CoT): Step-by-step reasoning to improve accuracy.
    • ReAct Paradigm: Integrates reasoning with actions for systematic exploration and feedback (a minimal loop sketch follows this list).
  2. Technological Milestones:
    • Zero-shot and Few-shot Learning: Achieving generality with minimal examples.
    • Memory Integration: Combining short-term (context-based) and long-term memory for persistent learning.
  3. Tools and Applications:
    • Code Augmentation: Enhancing computational reasoning through programmatic methods.
    • Retrieval-Augmented Generation (RAG): Leveraging external knowledge sources like APIs or search engines.
    • Complex Task Automation: Embodied reasoning in robotics and chemistry, exemplified by ChemCrow.
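
A minimal sketch of the ReAct loop referenced in the list above: thoughts and actions alternate, and each tool result is fed back into the context as an observation. The llm() function is a hard-coded stand-in for a real model, and the only tool is a toy lookup table.

```python
# Minimal ReAct-style loop: Thought -> Action -> Observation, repeated until
# the model emits Finish[answer]. Everything here is a toy stand-in.
FACTS = {"capital of France": "Paris"}

def lookup(query: str) -> str:
    # Toy tool: a real agent might call a search engine or API here.
    return FACTS.get(query, "no result")

def llm(transcript: str) -> str:
    # Hypothetical policy: a real LLM would generate the next step from the
    # transcript; here a two-step trajectory is hard-coded for illustration.
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Thought: I have the answer.\nFinish[Paris]"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        if "Finish[" in step:
            return step.split("Finish[")[1].rstrip("]")
        if "Action: lookup[" in step:
            query = step.split("Action: lookup[")[1].rstrip("]")
            transcript += f"\nObservation: {lookup(query)}"
    return "no answer"

print(react("What is the capital of France?"))  # -> Paris
```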

Limitations

  • Practical Challenges:
    • Difficulty in handling real-world environments (e.g., decision-making with incomplete data).
    • Vulnerability to irrelevant or adversarial context.
  • Scalability Issues:
    • Real-world robotics vs. digital simulation trade-offs.
    • High costs of fine-tuning and data collection in specific domains.

Research Directions

  • Unified Solutions: Simplifying diverse tasks into generalizable frameworks (e.g., ReAct for exploration and decision-making).
  • Advanced Memory Architectures: Moving from append-only logs to adaptive, writeable long-term memory systems.
  • Collaboration with Humans: Focusing on augmenting human creativity and problem-solving capabilities.

Future Outlook

  • Emerging Benchmarks and Methods:
    • SWE-Bench for software engineering tasks.
    • FireAct for fine-tuning LLM agents in dynamic environments.
  • Broader Impacts:
    • Enhanced digital automation.
    • Scalable solutions for complex problem-solving in domains like software engineering, scientific discovery, and web automation.

Building an AI-Native Publishing System: The Evolution of TianPan.co


The story of TianPan.co mirrors the evolution of web publishing itself - from simple HTML pages to today's AI-augmented content platforms. As we launch version 3, I want to share how we're reimagining what a modern publishing platform can be in the age of AI.

AI-Native Publishing

The Journey: From WordPress to AI-Native

Like many technical blogs, TianPan.co started humbly in 2009 as a WordPress site on a free VPS. The early days were simple: write, publish, repeat. But as technology evolved, so did our needs. Version 1 moved to Octopress and GitHub, embracing the developer-friendly approach of treating content as code. Version 2 brought modern web technologies with GraphQL, server-side rendering, and a React Native mobile app.

But the landscape has changed dramatically. AI isn't just a buzzword - it's transforming how we create, organize, and share knowledge. This realization led to Version 3, built around a radical idea: what if we designed a publishing system with AI at its core, not just as an add-on?

The Architecture of an AI-Native Platform

Version 3 breaks from traditional blogging platforms in several fundamental ways:

  1. Content as Data: Every piece of content is stored as markdown, making it instantly processable by AI systems. This isn't just about machine readability - it's about enabling AI to become an active participant in the content lifecycle.

  2. Distributed Publishing, Centralized Management: Content flows automatically from our central repository to multiple channels - Telegram, Discord, Twitter, and more. But unlike traditional multi-channel publishing, AI helps maintain consistency and optimize for each platform.

  3. Infrastructure Evolution: We moved from a basic 1 CPU/1GB RAM setup to a more robust infrastructure, not just for reliability but to support AI-powered features like real-time content analysis and automated editing.

The technical architecture reflects this AI-first approach:

.
├── _inbox       # AI-monitored draft space
├── notes        # published English notes
├── notes-zh     # published Chinese notes
├── crm          # personal CRM
├── ledger       # my beancount.io ledger
├── packages
│   ├── chat-tianpan   # LlamaIndex-powered content interface
│   ├── website        # tianpan.co source code
│   ├── prompts        # AI system prompts
│   └── scripts        # AI processing pipeline

Beyond Publishing: An Integrated Knowledge System

What makes Version 3 unique is how it integrates multiple knowledge streams:

  • Personal CRM: Relationship management through AI-enhanced note-taking
  • Financial Tracking: Integrated ledger system via beancount.io
  • Multilingual Support: Automated translation and localization
  • Interactive Learning: AI-powered chat interface for deep diving into content

The workflow is equally transformative:

  1. Content creation starts in markdown
  2. CI/CD pipelines trigger AI processing
  3. Zapier integrations distribute across platforms
  4. AI editors continuously suggest improvements through GitHub issues
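
As one illustration of step 4, a script along the following lines could post AI editing suggestions back as GitHub issues. This is a hedged sketch, not the actual TianPan.co pipeline; the repository name and the suggest() model call are placeholders, and only the _inbox draft folder comes from the layout above.

```python
# Illustrative "AI editor" step: review drafts and file suggestions as
# GitHub issues via the REST API.
import os
import pathlib
import requests

def suggest(markdown: str) -> str:
    # Placeholder for an LLM call that reviews a draft and returns
    # editing suggestions.
    return "Consider tightening the introduction and adding a code example."

def open_issue(repo: str, title: str, body: str) -> None:
    # GitHub REST API: create an issue on the repository.
    requests.post(
        f"https://api.github.com/repos/{repo}/issues",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        json={"title": title, "body": body},
        timeout=10,
    )

for draft in pathlib.Path("_inbox").glob("*.md"):
    feedback = suggest(draft.read_text())
    open_issue("owner/repo", f"AI review: {draft.name}", feedback)  # placeholder repo
```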

Looking Forward: The Future of Technical Publishing

This isn't just about building a better blog - it's about reimagining how we share technical knowledge in an AI-augmented world. The system is designed to evolve, with each component serving as a playground for experimenting with new AI capabilities.

What excites me most isn't just the technical architecture, but the possibilities it opens up. Could AI help surface connections between seemingly unrelated technical concepts? Could it help make complex technical content more accessible to broader audiences? Will it be possible to easily produce multimedia content in the future?

These are the questions we're exploring with TianPan.co v3. It's an experiment in using AI not just as a tool, but as a collaborative partner in creating and sharing knowledge.