Tags: Python, LLM, Today I Learned
🤖 Ideas for "The Levels of Claude" Notes
Lately, I have been conversing more with friends, colleagues, clients, and online friends about Claude and coding tools.
It turns out that everyone means something different when they ask, “Have you tried out Claude 3.7 Sonnet?” That could mean the website, the mobile app, using Claude to write code, or developing against one of Claude’s many REST APIs. Some people primarily interact with Claude by voice while walking outside with their pets.
I have struggled to land on a good starting point, so I have been thinking through the various ways one might use Claude.
- You can access Claude Chat through the website.
- You can access Claude through the mobile app and use voice support.
- You can copy one or more files into Claude.
- You can access Claude through an IDE.
- You use Claude Projects to organize your projects and might even have a support document you give Claude for more context.
- You can access Claude through the Claude Desktop app and use voice support.
- You have enabled MCP plugins/servers within the Claude Desktop app.
- You can access the Claude Code CLI/terminal app.
- You have enabled MCP with the Claude Code CLI/terminal app.
- You are writing your own MCP servers in Python or JavaScript.
- You have registered for Claude API access and are writing code against one of their REST APIs. (more on this later)
I originally broke this down into five levels of Claude, but there were too many platforms and features that blurred together.
Accessing Claude
Accessing Claude can be confusing because there are many ways to access it.
- Web interface
- Mobile app
- Desktop app
- CLI/Terminal app
- REST API (possibly through a third-party app; requires a developer token)
Organization by experience level
If we break these down by user experience or technical proficiency:
- Beginner: Web interface, mobile app, IDE integrations
- Intermediate: Desktop app, file uploads, REST API integration
- Advanced: Claude Code CLI
- Really Advanced: Writing custom MCP implementations
Claude’s developer APIs
Claude’s developer APIs each have their own vibe and are helpful in different situations. I’m not going to deep dive into them, but from a bird’s-eye view:
- Text Completion API
- Embeddings API
- Messages API
- Claude-in-the-Middle (Tools API and Agents)
- Batch API
These APIs can also vary based on passing them text, documents, images, or other types of files.
I’m still amazed that I can send an image to Claude and ask it to give me back an HTML file and CSS, and it does a relatively decent job.
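As a rough illustration of that image-to-HTML flow, here is a minimal sketch using Anthropic’s Python SDK. The model name, file path, and prompt are placeholders I picked for the example, not anything from the original workflow.

```python
# image_to_html.py - a minimal sketch of sending an image through the Messages API
# and asking for HTML/CSS back. The model name and file path are placeholders.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("mockup.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Please turn this mockup into a single HTML file with embedded CSS.",
                },
            ],
        }
    ],
)

print(message.content[0].text)
```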
Conclusion: Claude as a Spectrum
I wasn’t sure how to end my notes, so I asked Claude, “Please write a conclusion about Claude being more of a spectrum than just a service,” and this is what it told me:
When discussing “Claude,” we’re not just talking about a single AI service but rather a spectrum of capabilities, interfaces, and integration points that serve different user needs, technical abilities, and use cases.
What makes this spectrum particularly interesting is how it creates different experiences and understanding of what “Claude” actually is. For a casual user accessing the web interface, Claude is simply a helpful chat assistant. For a developer using the API or building MCP servers, Claude becomes an extensible AI platform with programmable capabilities.
This spectrum explains why conversations about Claude can sometimes feel like people are talking about entirely different products. In a very real sense, they are. The Claude that helps a writer craft a story through the mobile app exists in a different context than the Claude that helps a developer debug code through an IDE integration.
As the Claude ecosystem continues to evolve, this spectrum will likely expand further, with new interfaces, capabilities, and integration points creating even more varied experiences. Understanding Claude as a spectrum rather than a singular service helps contextualize these different experiences and provides a framework for navigating the growing complexity of AI assistants.
The most powerful aspect of this spectrum may be that it allows Claude to meet users where they are - from the casual web user to the advanced developer building custom AI solutions. This flexibility transforms Claude from merely an AI service into an adaptable AI ecosystem that can evolve alongside user needs and technical capabilities.
It’s not half bad.
Tags: Django, Python, LLM, UV, Today I Learned
🤖 "Agents" are Cheat Codes
Lately, I have been trying to wrap my brain around AI Agents, so as a starting point, I have been using Pydantic AI’s Agent class/framework to build “Agents”.
“Agent” is a loaded term. Pydantic AI’s usage is more or less a system prompt and a good API around adding tool calls and working with existing LLMs.
I have written several experimental projects to help me quickly research and find answers to several areas of Django that confuse people, including myself. These ask-one-question bots do their best to fetch the resources they need to answer your questions.
The three I have published publicly are:
None of these are official resources of the Django Software Foundation, nor should they be considered “official” or even “legal” answers to any questions that may arise.
The pattern I landed on for building the system prompts and pulling remote data has been a practical, quick way for me to get feedback and ask questions based on our existing material. I can change a local copy of the bylaws and then ask the Agent questions to see if my potential changes might be comprehensive enough for the Agent to answer.
It effectively feels like running tests on governance to see if the Agent picks up on my changes.
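Here is a minimal sketch of that pattern, assuming pydantic-ai and a local copy of the bylaws in a Markdown file. The file name, model string, and prompt wording are hypothetical; the published bots are more involved.

```python
# bylaws_bot.py - a sketch of the ask-one-question pattern: load a local document
# into the system prompt, then ask questions against it. The file name, model
# string, and prompt are hypothetical.
from pathlib import Path

from pydantic_ai import Agent

bylaws = Path("bylaws.md").read_text()

agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    system_prompt=(
        "You answer questions about the following bylaws. "
        "If the answer is not in the document, say so.\n\n" + bylaws
    ),
)

result = agent.run_sync("How many board members are required for quorum?")
print(result.data)  # result attribute names may differ between pydantic-ai versions
```

Editing the local bylaws file and re-running the question is the “tests on governance” loop described above.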
Our Cheat Codes
These are the cheat codes for a one-file Agent that one can quickly stand up and start asking questions.
- UV is a cheat code because it can quickly create a one-file Agent with dependencies and the version of Python needed to run the demo baked in.
- Pydantic AI’s Agent class is a nice wrapper around a system prompt and can even create a dynamic system prompt. Having a global system prompt has a nice feel to it too.
- Pydantic’s `BaseModel` creates structured data responses as a cheat code for processing unstructured text. If you haven’t seen this pattern yet, you can’t unsee it (there’s a sketch after this list).
- The Jina AI reader for cleaning up HTML into Markdown is an AI I have wanted for a decade+. I use it in dozens of apps for free, saving me hours of work.
- The Python libraries Typer, Rich, and httpx may not seem like they are doing much, and I’m underutilizing them, but their Developer Experience (DX) is great, and they just work.
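To make the one-file cheat code concrete, here is a sketch that combines uv’s inline script metadata with a Pydantic `BaseModel` result type. The model string, fields, and question are made up for the example, and pydantic-ai’s parameter names may differ between releases.

```python
# /// script
# requires-python = ">=3.12"
# dependencies = ["pydantic-ai", "rich"]
# ///
# one_file_agent.py - run with `uv run one_file_agent.py`; uv resolves the
# dependencies declared above on the fly. Model string and fields are hypothetical.
from pydantic import BaseModel
from pydantic_ai import Agent
from rich import print


class Answer(BaseModel):
    summary: str
    sources: list[str]


agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    result_type=Answer,  # parameter name may vary by pydantic-ai version
    system_prompt="Answer questions about Django governance and cite your sources.",
)

if __name__ == "__main__":
    result = agent.run_sync("Who can call a special election?")
    print(result.data)  # an Answer instance, not free-form text
```

Because the dependencies and Python version live in the script header, there is no virtualenv or requirements file to manage, which is what makes it feel like a cheat code.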
More areas to explore
Pydantic AI supports dynamic System Prompts, which might save me a few extra templating steps. They didn’t really click for me until I was writing this post.
When I wrote my Django Agents, I had Pydantic AI’s Multi-agent Applications feature in mind. In theory, I want to ask my Django Agents a question and have it route my question to the appropriate Agent to get an answer.
Function Tools (tool calls) are what inspired me to try out Pydantic AI. Function Tools are a way to give LLMs the ability to get information outside of their memory and system prompts when needed. I built one for reading and writing to my work calendar to help me manage my schedule. I didn’t use them for my suite of Django Agents, but when mixed with more real-time data, they could be helpful.
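For a sense of what a function tool looks like in pydantic-ai, here is a sketch of a read-only calendar tool. The calendar data is faked and the model string is a placeholder; a real version would call an actual calendar API.

```python
# calendar_agent.py - a sketch of a pydantic-ai function tool. The calendar lookup
# is a stand-in and the model string is a placeholder.
from pydantic_ai import Agent

agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    system_prompt="You help me manage my schedule. Use the tools to look up events.",
)


@agent.tool_plain
def events_for_day(date: str) -> list[str]:
    """Return calendar events for a given ISO date (YYYY-MM-DD)."""
    # Stand-in data; swap this for a real calendar API call.
    fake_calendar = {"2025-03-10": ["09:00 standup", "13:00 client call"]}
    return fake_calendar.get(date, [])


if __name__ == "__main__":
    result = agent.run_sync("What does my schedule look like on 2025-03-10?")
    print(result.data)
```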
We could also refactor each Agent using a reusable tool call so we could assemble one Agent that can gather the information needed to answer common Django Governance questions. I don’t know if that would be effective. In theory, it might not be a bad fit after looking at their DuckDuckGoSearchTool example.
Tags: Python, Ollama, LLM
🤖 My big list of AI/LLM tools, notes, and how I'm using them
I have been using, working with, and running commercial and local LLMs for years, but I never got around to sharing the tools and applications I use. Here are some quick notes, tools, and resources I have landed on and use daily.
Mac Apps I use:
I do all of my development on Macs. These tools make running local LLMs accessible.
- Ollama is the server that can download and run hundreds of AI models.
- Ollamac is a GUI client for Ollama that lets you write prompts, save history, and quickly pick models to test out.
- Tailscale: I use Tailscale on all of my devices, which gives me access to my work M2 Mac Studio and my home office Mac Mini Pro, which both run Ollama, from anywhere in the world. This makes prototyping at home quick, and when I run a larger model on my work machine, it’s so fast that it feels like the machine is running in my house.
- OpenAI Bundle: I bought this bundle because it was the cheapest way to get a bunch of AI apps, including four of Jordi Bruin’s apps. I have used these for a few years.
  - MacWhisper: I use MacWhisper to turn voice notes and podcasts into plain text files for my notes and sometimes blog articles.
  - Voices: I use Voices when I find a large blog post and want to listen to it while working.
- Claude for Desktop gets a lot of crap for being “yet another Electron app” instead of a custom-built macOS app, but the people saying that don’t know what they are talking about. Claude Desktop has voice support and keyboard hotkeys, which make the app incredibly useful. More importantly, Claude Desktop also supports the Model Context Protocol (MCP), which lets Claude access your file system, git, and anything else you want to give it access to. It’s incredibly powerful, and there’s nothing quite like it.
Baseline rules for running a model
While running models locally is possible, consumer hardware is constrained by RAM and GPU even for the smallest models. The easiest mental model to work with is that billions of parameters are roughly equivalent to your system’s RAM in gigabytes: an 8B model needs roughly 8 GB of RAM to fit into memory.
My mental formula is somewhat lossy because 40B models fit in 32 GB of memory, and 72B models fit in 64 GB with some room to spare. This is just the rough estimate that I use.
Even though you can run models locally, even the smallest models with a significant context window can exceed your machine’s available RAM. A 128k context window needs about 64 GB of RAM to load fully for an 8B parameter model, even though the model itself easily fits into 8 GB of RAM. That doesn’t mean the model won’t run locally, but it will run more slowly than it would on a machine with 72 GB or more of RAM, where the model and its full context fit in memory.
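Here is that rule of thumb as back-of-the-envelope math. The constants are rough assumptions I am using for illustration (about 1 GB per billion parameters for a quantized model and roughly 0.5 MB of KV cache per token of context for an 8B-class model), not exact figures for any specific model.

```python
# vram_napkin_math.py - a back-of-the-envelope memory estimate. The constants are
# rough assumptions (about 1 GB per billion parameters, about 0.5 MB of KV cache
# per token of context for an 8B-class model), not exact figures.
def rough_memory_gb(params_billions: float, context_tokens: int = 2048) -> float:
    weights_gb = params_billions * 1.0      # ~1 GB per billion parameters
    kv_cache_gb = context_tokens * 0.0005   # ~0.5 MB per token of context
    return weights_gb + kv_cache_gb


print(f"8B model, small context: ~{rough_memory_gb(8):.0f} GB")
print(f"8B model, 128k context:  ~{rough_memory_gb(8, 128_000):.0f} GB")
```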
I look for three things when I’m evaluating a model (there’s a sketch for checking them locally after this list):
- The number of parameters, measured in billions
- Context length
- The input context length, which is effectively the model’s memory
- The output context length, which is how big the answer can be
- The type of model:
- Default Models are general-purpose models like GPT-4 and Llama 3.3.
- Vision Models can process and read visual data like images and videos.
- Tool Models can call external tools and APIs and perform custom actions to which you give them access.
- Embedding Models can turn text into vectors or tokens, which helps measure your prompts and other RAG operations.
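When I want to check those details for a local model, the ollama-python library can pull them up. The exact fields in the response vary by model and library version, so I just print the whole thing and read it.

```python
# model_info.py - inspect a local model's parameter count, context length, and
# capabilities via the ollama-python library. Field names vary by version, so
# printing the whole response is the simplest way to look.
import ollama

info = ollama.show("llama3.1:latest")
print(info)
```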
What about quantization? Quantization can help you scale a model down so that it might fit into memory, but there’s always a loss in quality, which defeats the purpose of using the bigger model in my book.
Keeping up
My favorite resource for keeping up is Ollama’s Models page sorted by Newest. I check it a few times a day, and you’ll see new models released single-digit hours to days before the press releases catch up.
I like Matt Williams' YouTube Channel a lot. It’s the one channel I come back to, and I find that I always learn something from it. His videos tend to be ten to twenty minutes long, which is about right since the material is so dense.
Start with his Optimize Your AI Models videos. They’re a lot to fit in your brain, but they’re a great starting point.
Simon Willison’s Weblog is good too.
Python
I’ll have to write a few posts on how I’m using LLMs with code, but Simon’s LLM is a good general-purpose AI hammer if you need one.
As of last week, I’m using Pydantic AI instead of OpenAI’s or Anthropic’s Python libraries. Pydantic AI will install both of those libraries for you, but I find it to be 100% better and easier to switch between models using it than LangChain (not linked) or anything else I have tried.
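As a quick sketch of what that model switching looks like in practice: the same agent can point at different providers just by changing the model string. The model names below are examples and may be out of date.

```python
# switch_models.py - the same pydantic-ai agent pointed at different providers by
# changing only the model string. Model names are examples and may be out of date.
from pydantic_ai import Agent

question = "Summarize what a Django QuerySet is in one sentence."

for model in ("openai:gpt-4o", "anthropic:claude-3-5-sonnet-latest"):
    agent = Agent(model, system_prompt="You are a concise Django expert.")
    result = agent.run_sync(question)
    print(f"{model}: {result.data}")
```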
Tags: Django, Python, LLM
🤖 I released files-to-claude-xml and new development workflows
After months of using and sharing this tool via a private gist, I finally carved out some time to release files-to-claude-xml.
Despite my social media timeline declaring LLMs dead earlier today, I have been using Claude Projects and Artifacts.
My workflow is to copy a few files into a Claude Project and then create a new chat thread where Claude will help me write tests or build out a few features.
My `files-to-claude-xml` script grew out of some research I did when I stumbled on Anthropic’s Essential tips for long context prompts, which documents how to get around some file upload limits by uploading one big file using Claude’s XML-like format.

With `files-to-claude-xml`, I build a list of files that I want to import into a Claude Project. Then, I run it to generate a `_claude.xml` file, which I drag into Claude. I create a new conversation thread per feature, then copy the finished artifacts out of Claude once my feature or thread is complete.

After the feature is complete, I delete the `_claude.xml` file from my project and replace it with an updated copy after I re-run `files-to-claude-xml`.
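For context, the XML-like format looks roughly like the sketch below. This is my approximation of the document-wrapping structure from Anthropic’s long context tips, not the exact output of files-to-claude-xml, and the file names are placeholders.

```python
# build_claude_xml.py - a rough sketch of wrapping files in an XML-like document
# format along the lines of Anthropic's long context tips. This approximates the
# idea; it is not the exact output of files-to-claude-xml.
from pathlib import Path


def files_to_xml(paths: list[str]) -> str:
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        contents = Path(path).read_text()
        parts.append(
            f'<document index="{index}">\n'
            f"<source>{path}</source>\n"
            f"<document_contents>\n{contents}\n</document_contents>\n"
            "</document>"
        )
    parts.append("</documents>")
    return "\n".join(parts)


if __name__ == "__main__":
    Path("_claude.xml").write_text(files_to_xml(["models.py", "views.py", "tests.py"]))
```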
Features on the go
One bonus of using Claude Projects is that once everything is uploaded, I can use the Claude iOS app as a sort-of notes app and development tool. I can start parallel conversation threads and have it work on new ideas and features. Once I get back to my desktop, I can pull these chat conversations up, and if I like the direction of the feature, I might use them. If not, I have wasted no time or effort on them. This also serves as a nice ToDo list.
New workflows
I am using this methodology to push side projects further. Sometimes, I would like to work on something casually while watching Netflix, but my brain shuts off from coding during the day. Instead of feeling bad that I haven’t added share links to a website or some feature I meant to add last week, I can pair with Claude to work on it.
I can also get more done with my lunch hours on projects like DjangoTV than I could have otherwise. Overall, I’m happy to have an on-demand assistant to pair with and work on new features and ideas.
It’s also quicker to try out new ideas and projects that I would have needed to make time for.
Alternatives
Simon Willison wrote files-to-prompt, which I think is also worth trying. I contributed to the discussion, feedback, and document structure for the `--cxml` feature.

I wrote `files-to-claude-xml` before Simon had cxml support and hoped not to have to release my version. However, after trying his tool on several projects, my ignore/exclude list grew larger than the list of files I wanted to include and send to Claude. I found it easier to generate a list of files to pass to mine instead of maintaining a long list to exclude.
Tags: Python, Ollama, LLM, Today I Learned
🦙 Ollama Llama 3.1 Red Pajama
For a few weeks, I told friends I was excited to see if the new Llama 3.1 release was as good as it was being hyped.
Yesterday, Llama 3.1 was released, and I was impressed that the Ollama project published a release to Homebrew and had the models ready to use.
```
➜ brew install ollama
➜ ollama serve

# (optionally) I run Ollama as a background service
➜ brew services start ollama

# This takes a while (defaults to the llama3.1:8b model)
➜ ollama pull llama3.1:latest

# (optional) This takes a longer time
➜ ollama pull llama3.1:70b

# (optional) This takes so long that I skipped it and ordered a CAT6 cable...
# ollama pull llama3.1:405b
```
To chat with the model, you use the same `ollama` console command:

```
➜ ollama run llama3.1:latest
>>> how much is 2+2?
The answer to 2 + 2 is: 4!
```

Accessing Ollama Llama 3.1 with Python
The Ollama project has an [`ollama-python`](https://github.com/ollama/ollama-python) library, which I use to build applications. My demo has a bit of flare because there are a few options, like `--stream`, that improve the quality of life while waiting for Ollama to return results.

```python
# hello-llama.py
import typer
from enum import Enum
from ollama import Client
from rich import print


class Host(str, Enum):
    local = "http://127.0.0.1:11434"
    the_office = "http://the-office:11434"


class ModelChoices(str, Enum):
    llama31 = "llama3.1:latest"
    llama31_70b = "llama3.1:70b"


def main(
    host: Host = Host.local,
    local: bool = False,
    model: ModelChoices = ModelChoices.llama31,
    stream: bool = False,
):
    if local:
        host = Host.local

    client = Client(host=host.value)

    response = client.chat(
        model=model.value,
        messages=[
            {
                "role": "user",
                "content": (
                    "Please riff on the 'Llama Llama Red Pajama' book but using AI terms "
                    "like the 'Ollama' server and the 'Llama 3.1' model. "
                    "Instead of using 'Llama Llama', please use 'Ollama Llama 3.1'."
                ),
            }
        ],
        stream=stream,
    )

    if stream:
        for chunk in response:
            print(chunk["message"]["content"], end="", flush=True)
        print()
    else:
        print(f"[yellow]{response['message']['content']}[/yellow]")


if __name__ == "__main__":
    typer.run(main)
```
Some of my family’s favorite books are the late Anna Dewdney’s Llama Llama books. Please buy and support their work. I can’t read Llama 3.1 and Ollama without considering the “Llama Llama Red Pajama” book.
To set up and run this:
```
# Install a few "nice to have" libraries
➜ pip install ollama rich typer

# Run our demo
➜ python hello-llama.py --stream

Here's a riff on "Llama Llama Red Pajama" but with an AI twist:

**Ollama Llama 3.1, Ollama Llama 3.1**

Mama said to Ollama Llama 3.1,
"Dinner's done, time for some learning fun!"
But Ollama Llama 3.1 didn't wanna play
With the data sets and algorithms all day.

He wanted to go out and get some rest,
And dream of neural nets that were truly blessed.
But Mama said, "No way, young Ollama Llama 3.1,
You need to train on some more NLP."

Ollama Llama 3.1 got so mad and blue
He shouted at the cloud, "I don't wanna do this too!"
But then he remembered all the things he could see,
On the Ollama server, where his models would be.

So he plugged in his GPU and gave a happy sigh
And trained on some texts, till the morning light shone high.
He learned about embeddings and wordplay too,
And how to chat with humans, that's what he wanted to do.

**The end**
```
Connecting to Ollama
I have two Macs running Ollama and I use Tailscale to bounce between them from anywhere. When I’m at home upstairs it’s quicker to run a local instance. When I’m on my 2019 MacBook Pro it’s faster to connect to the office.
The only stumbling block I ran into was needing to set a few ENV variables so that Ollama listens on a port that I can proxy to. This was frustrating to figure out, but I hope it saves you some time.
```
➜ launchctl setenv OLLAMA_HOST 0.0.0.0:11434
➜ launchctl setenv OLLAMA_ORIGINS http://*

# Restart the Ollama server to pick up on the ENV vars
➜ brew services restart ollama
```
Simon Willison’s LLM tool
I also like using Simon Willison’s LLM tool, which supports a ton of different AI services via third-party plugins. I like the llm-ollama library, which allows us to connect to our local Ollama instance.
When working with Ollama, I start with the Ollama run command, but I have a few bash scripts that might talk to OpenAI or Claude 3.5, and it’s nice to keep my brain in the same tooling space. LLM is useful for mixing and matching remote and local models.
To install and use LLM + llm-ollama + Llama 3.1:
Please note that the Ollama server should already be running as previously outlined.
```
# Install llm
➜ brew install llm

# Install llm-ollama
➜ llm install llm-ollama

# List all of the models from Ollama
➜ llm ollama list-models

# Ask the model a question
➜ llm -m llama3.1:latest "how much is 2+2?"
The answer to 2 + 2 is: 4
```
Bonus: Mistral Large 2
While I was working on this post, Mistral AI launched their Mistral Large 2 model (announced as “Large Enough”). The Ollama project released support for the model within minutes of its announcement.
The Mistral Large 2 release is noteworthy because it outperforms Llama 3.1’s 405B parameter model at under 1/3 of the size. It is also the second GPT-4-class model released in the last two days.
Check out Simon’s post for more details and another LLM plugin for another way to access it.