Build your own Local Perplexity with Ollama – Deep Dive
AI Nuggets
Based on the provided transcript, here are the detailed instructions, CLI commands, website URLs, and tips extracted in an easy-to-follow outline form:
GitHub Repository
The code for the custom web search agent project is available on GitHub.
URL: (Link provided in the video description)
Schematic Diagram
The agent consists of several components: a planning agent, a web searcher, a dictionary for scraped content, an integration agent, and a response checker.
The planning agent proposes questions to research based on the user query.
The web searcher scrapes content from the best result on a search engine results page and presents it as a dictionary.
The integration agent decides if the information is sufficient and either provides feedback or moves to the response checker.
The response checker ensures the response meets formatting and coherence criteria.
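The plan → search → integrate → check loop described above can be sketched in a few lines. This is an illustrative outline only; the function names and signatures are placeholders, not the repository's actual API.

```python
# Minimal sketch of the agent workflow: planning agent, web searcher,
# integration agent, and response checker, with a feedback loop.
# All callables here are hypothetical stand-ins.

def run_agent(query, plan_fn, search_fn, integrate_fn, check_fn, max_iters=3):
    """Plan -> search -> integrate -> check, looping on feedback."""
    feedback = None
    answer = ""
    for _ in range(max_iters):
        questions = plan_fn(query, feedback)             # planning agent
        scraped = {q: search_fn(q) for q in questions}   # dict of scraped content
        answer, sufficient, feedback = integrate_fn(query, scraped)
        if sufficient:
            return check_fn(answer)                      # response checker
    return answer  # best effort after max_iters
```

The feedback string produced by the integration agent is what gets persisted to memory.json between iterations.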
Python Code Structure
The project consists of several Python files: requirements.txt, memory.json, config.py, prompts.py, agent.py, and search.py.
requirements.txt: Install the necessary libraries.
memory.json: Stores feedback as shown in the schematic diagram.
config.py: Contains API keys (not shown in the video).
prompts.py: Contains all the prompts for the agents.
agent.py: Brings everything together into the agent workflow.
search.py: Contains a class with methods to search the web, scrape content, and handle errors.
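A search.py-style class might look like the sketch below. The search-engine URL, parsing, and method names are assumptions for illustration; the video's actual implementation is in the repository.

```python
import urllib.parse
import urllib.request

class WebSearcher:
    """Sketch of a search class: build a search URL, scrape a page,
    and fold errors into the returned content instead of crashing."""

    def __init__(self, timeout=10):
        self.timeout = timeout

    def search(self, query):
        # Build a results-page URL; a real implementation would fetch
        # this page and parse out the top result's link.
        return "https://duckduckgo.com/html/?q=" + urllib.parse.quote(query)

    def scrape(self, url):
        try:
            with urllib.request.urlopen(url, timeout=self.timeout) as resp:
                return resp.read().decode("utf-8", errors="replace")
        except Exception as exc:
            # Error handling: return a marker string rather than raising,
            # so the agent loop can keep going.
            return f"ERROR: {exc}"

    def run(self, query):
        # Present scraped content as a dictionary keyed by the query.
        return {query: self.scrape(self.search(query))}
```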
Running the Agent
The agent is executed using the execute method in the agent.py file.
The workflow involves running the planning agent, using the web search tool, running the integration agent, and checking the response.
A feedback loop is present if the information is insufficient.
Tips for Using the Agent
Be specific with prompts when using local models as they may not be as good at reasoning as proprietary models.
Manage the short-term memory effectively by saving, reading, and clearing the memory.json file.
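The save/read/clear cycle for memory.json can be expressed as three small helpers. Only the file name comes from the video; the entry format (a JSON list of feedback items) is an assumption.

```python
import json
from pathlib import Path

MEMORY_PATH = Path("memory.json")  # file name as in the project

def read_memory(path=MEMORY_PATH):
    """Read all stored feedback entries; empty list if no memory yet."""
    if path.exists():
        return json.loads(path.read_text())
    return []

def save_memory(feedback, path=MEMORY_PATH):
    """Append a feedback entry to the agent's short-term memory."""
    entries = read_memory(path)
    entries.append(feedback)
    path.write_text(json.dumps(entries, indent=2))

def clear_memory(path=MEMORY_PATH):
    """Reset memory between runs so stale feedback doesn't leak in."""
    path.write_text("[]")
```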
Adjust the temperature parameter for the integration agent to potentially improve citation generation.
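With Ollama, temperature is passed in the `options` field of a request to the `/api/generate` endpoint. The sketch below builds such a payload; the model name and the specific temperature value are illustrative, and `call_ollama` assumes a server running on the default port 11434.

```python
import json
import urllib.request

def integration_request(prompt, model="llama3", temperature=0.2):
    """Build a payload for Ollama's /api/generate endpoint.
    A lower temperature makes the integration agent more deterministic,
    which can help it reproduce citation formats exactly."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }

def call_ollama(payload, url="http://localhost:11434/api/generate"):
    """Send the payload to a locally hosted Ollama server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```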
Setup for Using Ollama
Download the Ollama server and host it on your machine.
Download the desired models via Ollama, sized to your machine's specs.
Use the instructions in the GitHub repository readme to download models.
Select the models in the Ollama server configuration.
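As a setup fragment, the typical Ollama commands look like the following. The install script URL and the `pull`/`serve` subcommands are Ollama's standard CLI; the specific model tag is an example only, so check the Ollama model library for tags that fit your hardware.

```
curl -fsSL https://ollama.com/install.sh | sh   # Linux install script
ollama pull llama3                              # download a model (example tag)
ollama serve                                    # host the API on localhost:11434
```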
CLI Commands for Running the Agent
python agent.py run
Testing the Agent with Different Models
The agent was tested with Llama 3 Instruct (8 billion parameters) and Code Llama (7 billion parameters).
For better performance, larger models or proprietary services like OpenAI’s models may be necessary.
Additional Observations
The performance of the agent is highly dependent on the model used.
Open source models may not perform as well as proprietary models for complex agent workflows.
Consider renting a GPU and setting up your own inference server for larger models.
Improvements and Future Work
Optimize prompts for better performance.
Explore more efficient ways to run larger models locally.
Compare open source models against OpenAI models in future tests and videos.
Final Thoughts
The agent’s performance is limited by the hardware and the model’s capabilities.
Proprietary LLM services or the largest open source models are currently needed for state-of-the-art AI agents.
(Note: The exact URLs for the GitHub repository and other resources are not provided in the transcript and should be obtained from the video description as mentioned.)