Teaching LLMs to Use Tools at Scale - Shishir Patil | Stanford MLSys #98



AI Summary

  • Speaker Introduction:
    • Shishir Patil, a fifth-year PhD student in the Sky Computing Lab and Berkeley AI Research (BAIR) Lab at UC Berkeley.
    • Focus on LLMs (Large Language Models) and tool use.
    • Previous work on ML systems for inference and training on edge devices.
    • Former research fellow at Microsoft Research.
    • Undergraduate degree from India.
  • Talk Overview:
    • Presenting vision for connecting LLMs to tool use.
    • Demonstrations of the work.
    • Insights and open questions relevant to the audience's projects or research directions.
  • Connecting LLMs to Tool Use:
    • Current LLM usage: the user prompts the LLM, receives a response, and then acts on that response themselves.
    • Goal: Flip this process with the model called Gorilla.
      • User prompts Gorilla → Gorilla performs the action → Gorilla gathers feedback → Gorilla relays the response to the user.
    • Rationale: humans are good discriminators (judging a result), while LLMs are good generators (producing candidate actions).
    • Example: Installing software on Linux using LLM to generate bash commands.
  • Gorilla Model:
    • Supports various hyperscalers (AWS, GCP, Azure) and services (Kubernetes, Salesforce, etc.).
    • The project supports over 60,000 APIs, and the number is growing.
    • Gorilla is robust, open-source, and used in enterprises.
    • Other groups have adapted Gorilla’s ideas for their models.
  • Demonstrations:
    • Command-line interface to list GCP instances.
    • Jupyter notebook for translating text using a specific model.
  • Key Ideas Behind Gorilla:
    • Retrieval-Aware Training (RAT): fine-tuning the model to use retrieved context when it is relevant and to ignore it when it is not.
    • Measuring hallucination with Abstract Syntax Trees (ASTs): checking whether an API call generated by the LLM matches a real API in the dataset.
  • Performance and Insights:
    • Gorilla outperforms other models in API calling tasks.
    • Hallucination rates can be measured and compared across models.
    • Fine-tuning converges across different models with similar accuracy.
  • Execution Engine (GoEx):
    • Allows LLMs to perform actions with delayed verification.
    • Provides reversibility guarantees and blast radius controls.
    • Policies and abstractions for safe execution of LLM-generated actions.
  • Conclusion:
    • Gorilla connects LLMs to a wide range of tools via API calls.
    • RAT and hallucination measurement are key to training effective LLMs for tool use.
    • GoEx aims to enable LLMs to act autonomously with safety measures.
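The AST-based hallucination check summarized above can be sketched as follows. This is an illustrative Python sketch, not Gorilla's actual implementation: `API_CATALOG`, `is_valid_call`, and the API names are hypothetical stand-ins for a real catalog of documented APIs.

```python
import ast

# Hypothetical catalog of known APIs: function name -> documented keyword args.
API_CATALOG = {
    "load_model": {"name", "revision"},
    "translate": {"text", "src_lang", "tgt_lang"},
}

def is_valid_call(generated_code: str) -> bool:
    """Return True if the generated snippet parses as a call to a known API
    using only documented keyword arguments; otherwise flag it as a
    hallucination. (Positional arguments are not checked in this sketch.)"""
    try:
        tree = ast.parse(generated_code, mode="eval")
    except SyntaxError:
        return False  # not even syntactically valid code
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return False
    allowed = API_CATALOG.get(call.func.id)
    if allowed is None:
        return False  # hallucinated API: not in the catalog
    return all(kw.arg in allowed for kw in call.keywords)

print(is_valid_call('translate(text="hola", tgt_lang="en")'))  # True
print(is_valid_call('translte(text="hola")'))                  # False
```

Comparing AST sub-trees rather than raw strings is what makes the metric robust: two calls that differ only in argument order or whitespace still match, while a made-up function name or parameter does not.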

For further details, the audience is encouraged to read the associated papers and explore the open-source projects.
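The "reversibility guarantees" idea behind GoEx can be illustrated with a minimal undo-log sketch. This is not the GoEx API; `Action`, `Executor`, and the toy in-memory state are assumptions made for illustration: each LLM-proposed action is paired with an undo callable, so it can be reversed if the user rejects it after the fact (delayed verification).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    description: str
    run: Callable[[], None]   # performs the side effect
    undo: Callable[[], None]  # reverses it, bounding the blast radius

class Executor:
    def __init__(self) -> None:
        self._log: list[Action] = []

    def execute(self, action: Action) -> None:
        action.run()
        self._log.append(action)  # remember how to reverse it

    def rollback(self) -> None:
        """Reverse all executed actions, most recent first."""
        while self._log:
            self._log.pop().undo()

# Toy usage: a dict standing in for mutable external state.
fs: dict[str, str] = {}
ex = Executor()
ex.execute(Action("create config",
                  lambda: fs.update(cfg="v1"),
                  lambda: fs.pop("cfg", None)))
ex.rollback()
print(fs)  # {}
```

The design choice this illustrates: instead of verifying an action before execution (often impossible for an LLM-generated action), the system executes it, keeps enough information to reverse it, and lets verification happen afterwards.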