- **Introduction**
  - Presenter: Eric from LangChain
  - Topic: Demo of the LangChain Airbyte package
  - Context: Airbyte launched PyAirbyte for loading data in Python
- **Use Case for Airbyte**
  - Loading pull request descriptions from the LangChain repository
  - Issue: Old PRs are hard to find because GitHub search only matches keywords
  - Solution: Index PR titles and descriptions in a Chroma vector store for semantic search
- **LangChain Airbyte Package**
  - Implements a document loader
  - Installation: `pip install airbyte` (PyAirbyte)
  - Compatibility: Converts records to a document format that processing pipelines can use directly, thanks to Python duck typing
- **Prerequisites**
  - GitHub token for authentication, to avoid rate limits
- **Configuration Steps**
  - Import `AirbyteLoader` from the LangChain Airbyte package
  - Import the LangChain prompt template for markdown formatting
  - Create an Airbyte loader using the `source-github` source
  - Define the stream that loads GitHub pull requests
  - Configure credentials with the GitHub token
  - Specify the repository (`langchain-ai/langchain`)
  - Optional: Define a template for formatting each record
- **Execution and Results**
  - Loading of roughly 10,000 pull requests was run ahead of time (it takes about 7 minutes)
  - Example output includes the PR title, the author's GitHub handle, and the PR body
- **Creating a Vector Store**
  - The loaded documents are a list of 10,283 pull request documents
  - Goal: Create embeddings and load them into a Chroma vector store
  - Configuration: Use the default OpenAI embeddings model
  - Handle special tokens that appear in PR bodies with the `disallowed_special` parameter
- **Retrieval and Querying**
  - Use `vectorstore.as_retriever()` for retrieval
  - Example queries:
    - Documentation pull requests
    - Pull requests related to a specific package (e.g., IBM)
- **Conclusion**
  - Demonstrated the Airbyte integration for GitHub PRs
  - Anticipation for creative uses of document loading from various Airbyte sources
  - Invitation for feedback on usage
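
The configuration, vector store, and retrieval steps outlined above can be sketched roughly as follows. This is a minimal sketch, not the presenter's exact code. It assumes the `langchain-airbyte`, `langchain-openai`, and `langchain-chroma` packages are installed. The stream name `pull_requests`, the config keys, the template layout, and every helper function name are assumptions made for illustration. The heavy imports sit inside the functions so that the two pure-Python helpers can be tried without any credentials.

```python
from typing import Any

# Hypothetical template: PR title, author's GitHub handle, then the PR body.
PR_TEMPLATE = "# {title}\nby {user[login]}\n\n{body}"


def build_github_config(
    token: str, repository: str = "langchain-ai/langchain"
) -> dict[str, Any]:
    """Build a config dict for the GitHub source (key names are assumptions)."""
    return {
        "credentials": {"personal_access_token": token},
        "repositories": [repository],
    }


def format_pull_request(record: dict[str, Any]) -> str:
    """Render one PR record as markdown with plain str.format."""
    return PR_TEMPLATE.format(**record)


def load_pull_requests(token: str):
    """Load PR documents through the Airbyte loader (slow for large repos)."""
    # Imported lazily so the helpers above work without these packages.
    from langchain_airbyte import AirbyteLoader
    from langchain_core.prompts import PromptTemplate

    loader = AirbyteLoader(
        source="source-github",
        stream="pull_requests",  # assumed stream name
        config=build_github_config(token),
        # Assumption: nested lookups such as {user[login]} are accepted here;
        # some langchain-core versions may reject them in f-string templates.
        template=PromptTemplate.from_template(PR_TEMPLATE),
        include_metadata=False,
    )
    return loader.load()


def build_retriever(docs):
    """Embed the documents into Chroma and return a retriever."""
    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    # An empty tuple turns off the special-token check, so PR bodies that
    # happen to contain special-token text are embedded as ordinary text.
    embeddings = OpenAIEmbeddings(disallowed_special=())
    vectorstore = Chroma.from_documents(docs, embedding=embeddings)
    return vectorstore.as_retriever()
```

With a GitHub token and an OpenAI API key available, usage would look something like `retriever = build_retriever(load_pull_requests(token))` followed by `retriever.invoke("pull requests related to IBM")`. Keeping the config and formatting logic in small pure functions makes them easy to check on a made-up record before starting the multi-minute load.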