• Ollama

    🦙 Ollama Tool Calling Loose Notes

    I spent a few hours this week working with the Ollama project and trying to get tool calling to work with the LangChain library.

    Tool calling is a way to expose Python functions to a language model so that the model can ask for them to be called. This lets a model perform more complex actions and even reach out to the outside world for more information.

    I haven’t used LangChain before, and I found the whole process frustrating. The docs were full of errors. I eventually figured it out, but I was limited to one tool call per prompt, which felt broken.

    Earlier today, I was telling a colleague about it, and when we got back from grabbing coffee, I thought I would check the Ollama Discord channel to see if anyone else had figured it out. To my surprise, they added and released Tool support last night, which allowed me to ditch LangChain altogether.

    The Ollama project’s tool calling example was just enough to help get me started.

    I struggled with the function calling syntax, but after digging a bit deeper, I found this example from OpenAI’s Function calling docs, which matches the format the Ollama project is following. I still don’t fully understand it, but I got more functions working and verified that I can make multiple tool calls within the same prompt.
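
To make the format concrete, here is a minimal sketch of an OpenAI-style tool definition and a small dispatcher that runs every tool call found in one assistant message. The `get_current_weather` tool, the registry, and the example message are hypothetical and exist only for illustration. The message shape (a `tool_calls` list whose `arguments` are already a dict) is my assumption about what the Ollama Python client returns, so treat this as a sketch rather than the project's reference implementation.

```python
# Tool schema in the OpenAI-style "function" format.
# `get_current_weather` is a hypothetical tool used only for illustration.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"},
            },
            "required": ["city"],
        },
    },
}


def get_current_weather(city: str) -> str:
    """Stand-in implementation; a real tool would call a weather service."""
    return f"It is sunny in {city}."


# Maps the tool names the model may ask for to plain Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(message: dict, registry: dict) -> list[dict]:
    """Run every tool call in an assistant message and build "tool" replies."""
    replies = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        tool = registry.get(function["name"])
        if tool is None:
            result = f"Unknown tool: {function['name']}"
        else:
            # Assumption: arguments arrive as a dict, not a JSON string.
            result = tool(**function["arguments"])
        replies.append({"role": "tool", "content": str(result)})
    return replies


# Assumed shape of an assistant message that requests two calls at once.
example_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_current_weather", "arguments": {"city": "Lawrence"}}},
        {"function": {"name": "get_current_weather", "arguments": {"city": "Toronto"}}},
    ],
}

for reply in run_tool_calls(example_message, TOOL_REGISTRY):
    print(reply["content"])
```

With the real client, the schema would go into the `tools=[WEATHER_TOOL]` argument of the chat call, and the replies would be appended to the message list before asking the model for its final answer.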

    Meta’s Llama 3.1 model supports tool calling, and the two work quite well together. I am also impressed with Llama 3.1 and the large context window support. I’m running the 8B and 70B models on a Mac Studio, and they feel very close to the commercial APIs I have worked with, but I can run them locally.

    Embedding models

    Tonight, I tried out Ollama’s Embedding models example, and while I got it working, I still need to put practical data into it to give it a better test.
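
For anyone wondering what a better test might look like, here is a rough sketch of ranking documents against a query by cosine similarity. The similarity and ranking helpers are plain Python and do not depend on Ollama. The `embed` helper assumes an `ollama.Client` instance and an embedding model named `mxbai-embed-large`; both are my assumptions, not something this post has verified.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (index, score) pairs, most similar document first."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def embed(client, text: str, model: str = "mxbai-embed-large") -> list[float]:
    """Fetch one embedding; assumes `client` is an `ollama.Client`."""
    return client.embeddings(model=model, prompt=text)["embedding"]
```

In practice, each document would be passed through `embed` once and stored, and only the query would be embedded at search time.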

    One more tip

    If you did not know that Ollama can be told to return valid JSON, check out How to get JSON response from Ollama. It made my JSON parsing and responses much more reliable.
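
As a small sketch of that tip, the helper below asks for a JSON reply and parses it. It assumes `client` is an `ollama.Client` and relies on the `format="json"` option of the chat call. The helper name `ask_for_json` is mine and is not part of the library.

```python
import json


def ask_for_json(client, model: str, prompt: str):
    """Ask for a JSON reply and parse it into Python data."""
    response = client.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        format="json",  # asks Ollama to constrain the reply to valid JSON
    )
    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(response["message"]["content"])
```

It also helps to ask for JSON explicitly in the prompt itself, for example by ending the prompt with "Respond using JSON".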

    Friday July 26, 2024
  • Ollama, LLM

    🦙 Ollama Llama 3.1 Red Pajama

    For a few weeks, I told friends I was excited to see if the new Llama 3.1 release was as good as it was being hyped.

    Yesterday, Llama 3.1 was released, and I was impressed that the Ollama project published a release to Homebrew and had the models ready to use.

    ➜ brew install ollama
    
    ➜ ollama serve
    
    # (optionally) I run Ollama as a background service
    ➜ brew services start ollama
    
    # This takes a while (defaults to the llama3.1:8b model)
    ➜ ollama pull llama3.1:latest 
    
    # (optional) This takes a longer time
    ➜ ollama pull llama3.1:70b
    
    # (optional) This takes so long that I skipped it and ordered a CAT6 cable...
    # ollama pull llama3.1:405b
    

    To chat with the model, you use the same ollama console command:

    ➜ ollama run llama3.1:latest
    >>> how much is 2+2?
    The answer to 2 + 2 is:
    4!

    Accessing Ollama Llama 3.1 with Python

    The Ollama project has an ollama-python library (https://github.com/ollama/ollama-python), which I use to build applications.

    My demo has a bit of flair because there are a few options, like --stream, that improve the quality of life while waiting for Ollama to return results.

    # hello-llama.py
    import typer
    
    from enum import Enum
    from ollama import Client
    from rich import print
    
    
    class Host(str, Enum):
        local = "http://127.0.0.1:11434"
        the_office = "http://the-office:11434"
    
    
    class ModelChoices(str, Enum):
        llama31 = "llama3.1:latest"
        llama31_70b = "llama3.1:70b"
    
    
    def main(
        host: Host = Host.local,
        local: bool = False,
        model: ModelChoices = ModelChoices.llama31,
        stream: bool = False,
    ):
        if local:
            host = Host.local
    
        client = Client(host=host.value)
    
        response = client.chat(
            model=model.value,
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Please riff on the 'Llama Llama Red Pajama' book but using AI terms like the 'Ollama' server and the 'Llama 3.1' model. "
                        "Instead of using 'Llama Llama', please use 'Ollama Llama 3.1'."
                    ),
                }
            ],
            stream=stream,
        )
    
        if stream:
            for chunk in response:
                print(chunk["message"]["content"], end="", flush=True)
            print()
    
        else:
            print(f"[yellow]{response['message']['content']}[/yellow]")
    
    if __name__ == "__main__":
        typer.run(main)
    

    Some of my family’s favorite books are the late Anna Dewdney’s Llama Llama books. Please buy and support their work. I can’t read “Llama 3.1” and “Ollama” without thinking of the “Llama Llama Red Pajama” book.

    To set up and run this:

    # Install a few "nice to have" libraries
    ➜ pip install ollama rich typer
    
    # Run our demo
    ➜ python hello-llama.py --stream
    
    Here's a riff on "Llama Llama Red Pajama" but with an AI twist:
    
    **Ollama Llama 3.1, Ollama Llama 3.1**
    Mama said to Ollama Llama 3.1,
    "Dinner's done, time for some learning fun!"
    But Ollama Llama 3.1 didn't wanna play
    With the data sets and algorithms all day.
    
    He wanted to go out and get some rest,
    And dream of neural nets that were truly blessed.
    But Mama said, "No way, young Ollama Llama 3.1,
    You need to train on some more NLP."
    
    Ollama Llama 3.1 got so mad and blue
    He shouted at the cloud, "I don't wanna do this too!"
    But then he remembered all the things he could see,
    On the Ollama server, where his models would be.
    
    So he plugged in his GPU and gave a happy sigh
    And trained on some texts, till the morning light shone high.
    He learned about embeddings and wordplay too,
    And how to chat with humans, that's what he wanted to do.
    
    **The end**
    

    Connecting to Ollama

    I have two Macs running Ollama, and I use Tailscale to bounce between them from anywhere. When I’m at home upstairs, it’s quicker to run a local instance. When I’m on my 2019 MacBook Pro, it’s faster to connect to the office.

    The only stumbling block I ran into was needing to set a few ENV variables so that Ollama listens on an address and port that I can proxy to. This was frustrating to figure out, but I hope it saves you some time.

    ➜ launchctl setenv OLLAMA_HOST 0.0.0.0:11434
    # Quote the value so the shell does not try to expand the * as a glob
    ➜ launchctl setenv OLLAMA_ORIGINS "http://*"
    
    # Restart the Ollama server to pick up on the ENV vars
    ➜ brew services restart ollama
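
Since I bounce between a local and a remote instance, a small helper can pick whichever host answers first. This is only a sketch using the standard library: it assumes an Ollama server answers a plain GET on its root URL with HTTP 200, and it reuses the two host URLs from my demo script. The function names are mine.

```python
import urllib.error
import urllib.request


def is_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the server root answers with 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout all count as "down".
        return False


def pick_host(candidates, probe=is_reachable):
    """Return the first candidate URL that the probe accepts, else None."""
    for url in candidates:
        if probe(url):
            return url
    return None


# Preference order: local instance first, then the office over Tailscale.
CANDIDATE_HOSTS = ["http://127.0.0.1:11434", "http://the-office:11434"]
```

Calling `pick_host(CANDIDATE_HOSTS)` returns a URL that can be handed to `Client(host=...)`, or `None` when neither machine responds.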
    

    Simon Willison’s LLM tool

    I also like using Simon Willison’s LLM tool, which supports a ton of different AI services via third-party plugins. I like the llm-ollama library, which allows us to connect to our local Ollama instance.

    When working with Ollama, I start with the ollama run command, but I have a few bash scripts that might talk to OpenAI or Claude 3.5, and it’s nice to keep my brain in the same tooling space. LLM is useful for mixing and matching remote and local models.

    To install and use LLM + llm-ollama + Llama 3.1:

    Please note that the Ollama server should already be running as previously outlined.

    # Install llm
    ➜ brew install llm
    
    # Install llm-ollama
    ➜ llm install llm-ollama
    
    # List all of the models from Ollama
    ➜ llm ollama list-models
    
    # Ask Llama 3.1 a question through llm
    ➜ llm -m llama3.1:latest "how much is 2+2?"
    The answer to 2 + 2 is:
    
    4
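
The same command is easy to drive from a script, which is handy when mixing local and remote models. Here is a minimal sketch that shells out to the `llm` CLI with the `-m MODEL "prompt"` form shown above. The helper names are mine, and it assumes `llm` is on the PATH with the llm-ollama plugin installed.

```python
import subprocess


def build_llm_command(model: str, prompt: str) -> list[str]:
    """Build the argument list for `llm -m MODEL PROMPT`."""
    return ["llm", "-m", model, prompt]


def ask(model: str, prompt: str) -> str:
    """Run the llm CLI and return its standard output as text."""
    completed = subprocess.run(
        build_llm_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if llm exits with an error
    )
    return completed.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains quotes or other special characters. Swapping the model name is the only change needed to point the same script at a different provider's model.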
    

    Bonus: Mistral Large 2

    While I was working on this post, Mistral AI launched their Mistral Large 2 model today, announced in a post titled “Large Enough”. The Ollama project released support for the model within minutes of its announcement.

    The Mistral Large 2 release is noteworthy because it outperforms Llama 3.1’s 405B parameter model and is under 1/3 of the size. It is also the second GPT-4 class model release in the last two days.

    Check out Simon’s post for more details and another LLM plugin for another way to access it.

    Wednesday July 24, 2024