AI Engineering at Jane Street - John Crepezzi



AI Summary

Summary of AI Assistant Development at Jane Street

Speaker Introduction

  • Name: John Crepezzi
  • Team: AI Assistant at Jane Street
  • Background: Extensive experience building developer tools, including at GitHub.

Overview of the Team’s Focus

  • Maximizing value from large language models (LLMs).
  • Working around the limits of off-the-shelf tools, which fit poorly because OCaml is the primary development language.

Challenges with OCaml

  • OCaml is obscure; it is primarily used for theorem proving and formal verification.
  • Development practices include:
    • Using OCaml libraries to transpile code to other languages (JavaScript, Vim script).
  • Building custom tools for development cycles, including:
    • Monorepo management.
    • Custom distributed build and code review systems.

Need for Custom Models

  • Off-the-shelf LLMs are generally not effective with OCaml because little OCaml training data is available.
  • The team built internal models tailored to the specifics of its OCaml codebase.

Approach to Model Development

  1. Define Goals: Generate diffs based on user prompts in editors.
  2. Data Collection: Use workspace snapshotting to collect data on developer actions and errors.
  3. Training Process:
    • Supervised training with labeled data.
    • Reinforcement learning that rewards generated code for compiling and passing tests.
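
The data-collection and reward steps above can be sketched roughly as follows. This is a minimal Python illustration, not Jane Street's implementation: the `Snapshot` type, its `build_ok` flag, the failing-to-passing pairing heuristic, and the reward values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of a developer workspace at one moment."""
    timestamp: float
    files: dict[str, str]  # path -> file contents
    build_ok: bool         # whether the build succeeded at this moment


def failing_to_passing_pairs(snapshots):
    """Pair each failing snapshot with the passing snapshot right after it.

    Assumption: a failing build immediately followed by a passing build
    marks a spot where a developer fixed an error, so the pair is a
    candidate (before, after) training example.
    """
    ordered = sorted(snapshots, key=lambda s: s.timestamp)
    return [
        (before, after)
        for before, after in zip(ordered, ordered[1:])
        if not before.build_ok and after.build_ok
    ]


def reward(compiles: bool, tests_passed: int, tests_total: int) -> float:
    """Hypothetical reward: 0 if the code does not compile, otherwise
    0.5 plus up to 0.5 more in proportion to the tests that pass."""
    if not compiles:
        return 0.0
    if tests_total == 0:
        return 0.5
    return 0.5 + 0.5 * tests_passed / tests_total
```

Under these assumptions, code that fails to compile always scores zero, so a policy cannot earn reward from tests alone. The exact weights are arbitrary and would need tuning in practice.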

Implementation of AI Development Environment

  • Integrated LLMs into editors (VS Code, Neovim, Emacs) through a unified architecture, the AI Development Environment (AIDE).
  • Flexibility to update models without altering editors directly.
  • Collects metrics on user experience, such as latency and whether suggested diffs are applied.
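
One way to picture this arrangement is a shared backend that every editor plugin calls, with the model hidden behind a small interface. The sketch below is illustrative only: `Backend`, `DiffModel`, `EchoModel`, and the metric fields are hypothetical names chosen for the example, not the actual API of the AI Development Environment.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol


class DiffModel(Protocol):
    """Anything that can turn a prompt plus context into a suggested diff."""
    def suggest_diff(self, prompt: str, context: str) -> str: ...


class EchoModel:
    """Stand-in model that returns a trivial one-line suggestion."""
    def suggest_diff(self, prompt: str, context: str) -> str:
        return f"# TODO: {prompt}\n"


@dataclass
class Backend:
    """Hypothetical shared service sitting between editors and the model."""
    model: DiffModel
    metrics: list = field(default_factory=list)

    def handle_request(self, editor: str, prompt: str, context: str) -> str:
        # Time the model call so latency is recorded once, for all editors.
        start = time.perf_counter()
        diff = self.model.suggest_diff(prompt, context)
        self.metrics.append({
            "editor": editor,
            "latency_s": time.perf_counter() - start,
            "applied": None,  # filled in later by record_outcome
        })
        return diff

    def record_outcome(self, request_index: int, applied: bool) -> None:
        # Editors report back whether the user applied the suggested diff.
        self.metrics[request_index]["applied"] = applied
```

Because the editors only talk to `Backend`, swapping the model is a single assignment (`backend.model = other_model`) and no editor plugin has to change.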

Editor Integration Examples

  • VS Code: Sidebar for multifile diff suggestions.
  • Emacs: Markdown buffer interface for interaction.
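
To make "multifile diff" concrete, here is a small sketch of how a suggestion spanning several files could be represented as one unified diff per changed file. It uses Python's standard `difflib` module. The `multifile_diff` helper and the `a/` and `b/` path prefixes are assumptions for illustration and say nothing about how the editor integrations work internally.

```python
import difflib


def multifile_diff(before: dict[str, str], after: dict[str, str]) -> dict[str, str]:
    """Return a unified diff for every file that changed, keyed by path.

    Files missing from one side are treated as empty, so added and
    deleted files show up as all-plus or all-minus diffs.
    """
    diffs = {}
    for path in sorted(set(before) | set(after)):
        old = before.get(path, "").splitlines(keepends=True)
        new = after.get(path, "").splitlines(keepends=True)
        if old == new:
            continue  # unchanged files produce no entry
        diffs[path] = "".join(
            difflib.unified_diff(old, new, fromfile=f"a/{path}", tofile=f"b/{path}")
        )
    return diffs
```

An editor front end could then render each entry of the returned dictionary as its own collapsible section.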

Future Work

  • Expanding applications of RAG (retrieval-augmented generation).
  • Exploring multi-agent workflows and reasoning models.

Conclusion

  • Focus on building modular, pluggable systems to adapt to evolving technology in AI.

Contact

  • Speaker is open to further discussions after the presentation.