Build your own Local Perplexity with Ollama – Deep Dive



AI Nuggets

Based on the provided transcript, here are the detailed instructions, CLI commands, website URLs, and tips, extracted in an easy-to-follow outline:

GitHub Repository

  • The code for the custom web search agent project is available on GitHub.
  • URL: (Link provided in the video description)

Schematic Diagram

  • The agent consists of several components: a planning agent, a web searcher, a dictionary for scraped content, an integration agent, and a response checker.
  • The planning agent proposes questions to research based on the user query.
  • The web searcher scrapes content from the best result on a search engine results page and presents it as a dictionary.
  • The integration agent decides whether the gathered information is sufficient, and either sends feedback back for another research pass or hands the draft on to the response checker.
  • The response checker ensures the response meets formatting and coherence criteria (a Python sketch of this control flow follows the list).
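To make the data flow concrete, here is a minimal Python sketch of that loop. It is an illustration only: the function names (plan, search_web, integrate, check_response) and the iteration cap are assumptions, and the real components are passed in as callables rather than taken from the repository.

    # Hypothetical wiring of the schematic's components; the four
    # callables stand in for the planning agent, web searcher,
    # integration agent, and response checker.
    def run_agent(user_query, plan, search_web, integrate, check_response,
                  max_iterations=5):
        feedback = None
        answer = None
        for _ in range(max_iterations):
            # Planning agent: propose research questions (feedback-aware).
            questions = plan(user_query, feedback)
            # Web searcher: scraped content keyed by question, as a dict.
            scraped = {q: search_web(q) for q in questions}
            # Integration agent: draft an answer, or explain what is missing.
            answer, sufficient, feedback = integrate(user_query, scraped)
            if sufficient:
                # Response checker: enforce formatting and coherence.
                return check_response(answer)
        return answer  # best effort once the iteration cap is hit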

Python Code Structure

  • The project consists of several Python files: requirements.txt, memory.json, config.py, prompts.py, agent.py, and search.py.
  • requirements.txt: Lists the libraries to install.
  • memory.json: Stores feedback as shown in the schematic diagram.
  • config.py: Contains API keys (not shown in the video).
  • prompts.py: Contains all the prompts for the agents.
  • agent.py: Brings everything together into the agent workflow.
  • search.py: Contains a class with methods to search the web, scrape content, and handle errors (a rough sketch follows this list).
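As a rough sketch of what such a class might look like: the class and method names here are guesses, and requests, BeautifulSoup, and duckduckgo_search are assumed stand-ins for whatever libraries requirements.txt actually pins.

    # Hypothetical search.py-style class: SERP lookup, scraping,
    # and error handling, returning scraped content as a dictionary.
    import requests
    from bs4 import BeautifulSoup
    from duckduckgo_search import DDGS

    class WebSearcher:
        def search(self, query, max_results=5):
            # Result dicts with "title", "href", and "body" keys.
            return DDGS().text(query, max_results=max_results)

        def scrape(self, url, timeout=10):
            # Fetch a page and return its visible text, or an error marker.
            try:
                resp = requests.get(url, timeout=timeout,
                                    headers={"User-Agent": "Mozilla/5.0"})
                resp.raise_for_status()
                soup = BeautifulSoup(resp.text, "html.parser")
                return soup.get_text(separator=" ", strip=True)
            except requests.RequestException as exc:
                return f"SCRAPE_ERROR: {exc}"

        def best_result_content(self, query):
            # Scrape the top result and key it by URL, mirroring the
            # scraped-content dictionary in the schematic.
            results = self.search(query, max_results=1)
            if not results:
                return {query: "NO_RESULTS"}
            url = results[0]["href"]
            return {url: self.scrape(url)}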

Running the Agent

  • The agent is executed via the execute method in the agent.py file (a possible entry point is sketched after this list).
  • The workflow involves running the planning agent, using the web search tool, running the integration agent, and checking the response.
  • A feedback loop is present if the information is insufficient.
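A plausible entry point for that flow is sketched below; the dispatch on "run" matches the CLI command in the next section, but the function body is a placeholder, not the repository's code.

    # Hypothetical agent.py entry point for "python agent.py run".
    import sys

    def execute(query):
        # Placeholder for the real workflow: planning agent -> web search
        # tool -> integration agent -> response checker, looping on feedback.
        raise NotImplementedError("wire in the agents described above")

    if __name__ == "__main__":
        if len(sys.argv) > 1 and sys.argv[1] == "run":
            print(execute(input("Enter your query: ")))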

Tips for Using the Agent

  • Be specific with prompts when using local models, as they may not reason as well as proprietary models.
  • Manage short-term memory by saving, reading, and clearing the memory.json file (see the helper sketch after this list).
  • Adjust the temperature parameter for the integration agent to potentially improve citation generation.
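A minimal set of helpers for that memory management might look like the following; memory.json is the file named in the video, but the function names and the list-of-entries layout are assumptions.

    # Hypothetical helpers for the memory.json short-term feedback store.
    import json
    from pathlib import Path

    MEMORY_FILE = Path("memory.json")

    def read_memory():
        # Return saved feedback entries, or an empty list if none exist.
        if MEMORY_FILE.exists():
            return json.loads(MEMORY_FILE.read_text())
        return []

    def save_memory(entry):
        # Append one feedback entry and persist the whole list.
        entries = read_memory()
        entries.append(entry)
        MEMORY_FILE.write_text(json.dumps(entries, indent=2))

    def clear_memory():
        # Reset short-term memory between unrelated queries.
        MEMORY_FILE.write_text("[]")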

Setup for Using Ollama

  • Download the Ollama server and host it on your machine.
  • Download the desired models through Ollama, sized to your machine’s specs.
  • Use the instructions in the GitHub repository readme to download models; typical commands follow this list.
  • Select the downloaded models in the Ollama server configuration.
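The standard Ollama CLI covers these steps; the model tags below are common defaults (llama3 resolves to the 8B instruct build, codellama to the 7B build), so check the repository readme for the exact tags the project expects.

    ollama serve          # start the local Ollama server
    ollama pull llama3    # download Llama 3 (8B instruct by default)
    ollama pull codellama # download Code Llama (7B by default)
    ollama list           # confirm which models are installed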

CLI Commands for Running the Agent

python agent.py run  

Testing the Agent with Different Models

  • The agent was tested with Llama 3 Instruct (8 billion parameters) and Code Llama (7 billion parameters); a sketch of swapping models through Ollama’s local API follows this list.
  • For better performance, larger models or proprietary services such as OpenAI’s may be necessary.
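Because Ollama exposes a local REST API (port 11434 by default), swapping models, or adjusting the temperature as suggested in the tips above, comes down to changing one field in the request. A minimal sketch, with an illustrative prompt:

    # Minimal sketch of querying a local Ollama model over its REST API;
    # change the "model" string to switch between llama3 and codellama.
    import requests

    def ask(model, prompt, temperature=0.7):
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": model,
                "prompt": prompt,
                "stream": False,
                "options": {"temperature": temperature},
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(ask("llama3", "List three uses of a web search agent."))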

Additional Observations

  • The performance of the agent is highly dependent on the model used.
  • Open source models may not perform as well as proprietary models for complex agent workflows.
  • Consider renting a GPU and setting up your own inference server for larger models.

Improvements and Future Work

  • Optimize prompts for better performance.
  • Explore more efficient ways to run larger models locally.
  • Compare open source models against OpenAI models in future tests and videos.

Final Thoughts

  • The agent’s performance is limited by the hardware and the model’s capabilities.
  • Proprietary LLM services or the largest open source models are currently needed for state-of-the-art AI agents.

(Note: The exact URLs for the GitHub repository and other resources are not provided in the transcript and should be obtained from the video description as mentioned.)