Tags: Python, LLM, Today I Learned
🤖 Ideas for "The Levels of Claude" Notes
Lately, I have been conversing more with friends, colleagues, clients, and online friends about Claude and coding tools.
It turns out that everyone means something different when they ask, “Have you tried out Claude 3.7 Sonnet?” That could mean the website, the mobile app, using Claude to write code, or developing against one of Claude’s many REST APIs. Some people primarily interact with Claude by voice while walking outside with their pets.
I have struggled to land on a good starting point, so I have been thinking through the various ways one might use Claude.
- You can access Claude Chat through the website.
- You can access Claude through the mobile app and use voice support.
- You can copy one or more files into Claude.
- You can access Claude through an IDE.
- You use Claude Projects to organize your projects and might even have a support document you give Claude for more context.
- You can access Claude through the Claude Desktop app and use voice support.
- You have enabled MCP plugins/servers within the Claude Desktop app.
- You can access the Claude Code CLI/terminal app.
- You have enabled MCP with the Claude Code CLI/terminal app.
- You are writing your own MCP servers in Python or JavaScript.
- You have registered for Claude API access and are writing code against one of their REST APIs. (more on this later)
I originally broke this down into five levels of Claude, but there were too many platforms and features that blurred together.
Accessing Claude
Accessing Claude can be confusing because there are many ways to access it.
- Web interface
- Mobile app
- Desktop app
- CLI/Terminal app
- REST API (possibly through a third-party app; requires a developer token)
Organization by experience level
If we break these down by user experience or technical proficiency:
- Beginner: Web interface, mobile app, IDE integrations
- Intermediate: Desktop app, file uploads, REST API integration
- Advanced: Claude Code CLI
- Really Advanced: Writing custom MCP implementations
Claude’s developer APIs
Claude’s developer APIs each have their own vibe and are helpful in different situations. I’m not going to deep dive into them, but from a bird’s-eye view:
- Text Completion API
- Embeddings API
- Messages API
- Claude-in-the-Middle (Tools API and Agents)
- Batch API
These APIs can also vary based on passing them text, documents, images, or other types of files.
I’m still amazed that I can send an image to Claude and ask it to give me back an HTML file and CSS, and it does a relatively decent job.
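As a rough illustration of that image-to-HTML flow, here is a minimal sketch using Anthropic’s Python SDK. The model name, file path, and prompt are placeholders I picked for the example, not anything from the original workflow.

```python
# image_to_html.py - a minimal sketch of sending an image through the Messages API
# and asking for HTML/CSS back. The model name and file path are placeholders.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("mockup.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Please turn this mockup into a single HTML file with embedded CSS.",
                },
            ],
        }
    ],
)

print(message.content[0].text)
```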
Conclusion: Claude as a Spectrum
I wasn’t sure how to end my notes, so I asked Claude, “Please write a conclusion about Claude being more of a spectrum than just a service,” and this is what it told me:
When discussing “Claude,” we’re not just talking about a single AI service but rather a spectrum of capabilities, interfaces, and integration points that serve different user needs, technical abilities, and use cases.
What makes this spectrum particularly interesting is how it creates different experiences and understanding of what “Claude” actually is. For a casual user accessing the web interface, Claude is simply a helpful chat assistant. For a developer using the API or building MCP servers, Claude becomes an extensible AI platform with programmable capabilities.
This spectrum explains why conversations about Claude can sometimes feel like people are talking about entirely different products. In a very real sense, they are. The Claude that helps a writer craft a story through the mobile app exists in a different context than the Claude that helps a developer debug code through an IDE integration.
As the Claude ecosystem continues to evolve, this spectrum will likely expand further, with new interfaces, capabilities, and integration points creating even more varied experiences. Understanding Claude as a spectrum rather than a singular service helps contextualize these different experiences and provides a framework for navigating the growing complexity of AI assistants.
The most powerful aspect of this spectrum may be that it allows Claude to meet users where they are - from the casual web user to the advanced developer building custom AI solutions. This flexibility transforms Claude from merely an AI service into an adaptable AI ecosystem that can evolve alongside user needs and technical capabilities.
It’s not half bad.
Tags: Django, Python, LLM, UV, Today I Learned
🤖 "Agents" are Cheat Codes
Lately, I have been trying to wrap my brain around AI Agents, so as a starting point, I have been using Pydantic AI’s Agent class/framework to build “Agents”.
“Agent” is a loaded term. Pydantic AI’s usage is more or less a system prompt and a good API around adding tool calls and working with existing LLMs.
I have written several experimental projects to help me quickly research and find answers to several areas of Django that confuse people, including myself. These ask-one-question bots do their best to fetch the resources they need to answer your questions.
The three I have published publicly are:
None of these are official resources of the Django Software Foundation, nor should they be considered “official” or even “legal” answers to any questions that may arise.
The pattern I landed on for building the system prompts and pulling remote data has been a practical, quick way for me to get feedback and ask questions based on our existing material. I can change a local copy of the bylaws and then ask the Agent questions to see if my potential changes might be comprehensive enough for the Agent to answer.
It effectively feels like running tests on governance to see if the Agent picks up on my changes.
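Here is a minimal sketch of that pattern, assuming pydantic-ai and a local copy of the bylaws in a Markdown file. The file name, model string, and prompt wording are hypothetical; the published bots are more involved.

```python
# bylaws_bot.py - a sketch of the ask-one-question pattern: load a local document
# into the system prompt, then ask questions against it. The file name, model
# string, and prompt are hypothetical.
from pathlib import Path

from pydantic_ai import Agent

bylaws = Path("bylaws.md").read_text()

agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    system_prompt=(
        "You answer questions about the following bylaws. "
        "If the answer is not in the document, say so.\n\n" + bylaws
    ),
)

result = agent.run_sync("How many board members are required for quorum?")
print(result.data)  # result attribute names may differ between pydantic-ai versions
```

Editing the local bylaws file and re-running the question is the “tests on governance” loop described above.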
Our Cheat Codes
These are the cheat codes for a one-file Agent that one can quickly stand up and start asking questions.
- UV is a cheat code because it can quickly create a one-file Agent with dependencies and the version of Python needed to run the demo baked in.
- Pydantic AI’s Agent class is a nice wrapper around a system prompt and can even create a dynamic system prompt. Having a global system prompt has a nice feel to it too.
- Pydantic’s `BaseModel` creates structured data responses as a cheat code for processing unstructured text. If you haven’t seen this pattern yet, you can’t unsee it (there’s a sketch after this list).
- The Jina AI reader for cleaning up HTML into Markdown is an AI I have wanted for a decade+. I use it in dozens of apps for free, saving me hours of work.
- The Python libraries Typer, Rich, and httpx may not seem like they are doing much, and I’m underutilizing them, but their Developer Experience (DX) is great, and they just work.
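To make the one-file cheat code concrete, here is a sketch that combines uv’s inline script metadata with a Pydantic `BaseModel` result type. The model string, fields, and question are made up for the example, and pydantic-ai’s parameter names may differ between releases.

```python
# /// script
# requires-python = ">=3.12"
# dependencies = ["pydantic-ai", "rich"]
# ///
# one_file_agent.py - run with `uv run one_file_agent.py`; uv resolves the
# dependencies declared above on the fly. Model string and fields are hypothetical.
from pydantic import BaseModel
from pydantic_ai import Agent
from rich import print


class Answer(BaseModel):
    summary: str
    sources: list[str]


agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    result_type=Answer,  # parameter name may vary by pydantic-ai version
    system_prompt="Answer questions about Django governance and cite your sources.",
)

if __name__ == "__main__":
    result = agent.run_sync("Who can call a special election?")
    print(result.data)  # an Answer instance, not free-form text
```

Because the dependencies and Python version live in the script header, there is no virtualenv or requirements file to manage, which is what makes it feel like a cheat code.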
More areas to explore
Pydantic AI supports dynamic System Prompts, which might save me a few extra templating steps. They didn’t really click for me until I was writing this post.
When I wrote my Django Agents, I had Pydantic AI’s Multi-agent Applications feature in mind. In theory, I want to ask my Django Agents a question and have it route my question to the appropriate Agent to get an answer.
Function Tools (tool calls) are what inspired me to try out Pydantic AI. Function Tools are a way to give LLMs the ability to get information outside of their memory and system prompts when needed. I built one for reading and writing to my work calendar to help me manage my schedule. I didn’t use them for my suite of Django Agents, but when mixed with more real-time data, they could be helpful.
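For a sense of what a function tool looks like in pydantic-ai, here is a sketch of a read-only calendar tool. The calendar data is faked and the model string is a placeholder; a real version would call an actual calendar API.

```python
# calendar_agent.py - a sketch of a pydantic-ai function tool. The calendar lookup
# is a stand-in and the model string is a placeholder.
from pydantic_ai import Agent

agent = Agent(
    "anthropic:claude-3-5-sonnet-latest",
    system_prompt="You help me manage my schedule. Use the tools to look up events.",
)


@agent.tool_plain
def events_for_day(date: str) -> list[str]:
    """Return calendar events for a given ISO date (YYYY-MM-DD)."""
    # Stand-in data; swap this for a real calendar API call.
    fake_calendar = {"2025-03-10": ["09:00 standup", "13:00 client call"]}
    return fake_calendar.get(date, [])


if __name__ == "__main__":
    result = agent.run_sync("What does my schedule look like on 2025-03-10?")
    print(result.data)
```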
We could also refactor each Agent using a reusable tool call so we could assemble one Agent that can gather the information needed to answer common Django Governance questions. I don’t know if that would be effective. In theory, it might not be a bad fit after looking at their DuckDuckGoSearchTool example.
Tags: Python, Ollama, LLM
🤖 My big list of AI/LLM tools, notes, and how I'm using them
I have been using, working with, and running commercial and local LLMs for years, but I never got around to sharing the tools and applications I use. Here are some quick notes, tools, and resources I have landed on and use daily.
Mac Apps I use:
I do all of my development on Macs. These tools make running local LLMs accessible.
- Ollama is the server that can download and run hundreds of AI models.
- Ollamac is a GUI client for Ollama that lets you write prompts, save history, and quickly pick models to test out.
- Tailscale: I use Tailscale on all of my devices, which gives me access to my work M2 Mac Studio and my home office Mac Mini Pro, which both run Ollama, from anywhere in the world. This makes prototyping at home quick, and when I run a larger model on my work machine, it’s so fast that it feels like the machine is running in my house.
- OpenAI Bundle: I bought this bundle because it was the cheapest way to get a bunch of AI apps, including four of Jordi Bruin’s apps. I have used these for a few years.
  - MacWhisper: I use MacWhisper to turn voice notes and podcasts into plain text files for my notes and sometimes blog articles.
  - Voices: I use Voices when I find a large blog post and want to listen to it while working.
- Claude for Desktop gets a lot of crap for being “yet another Electron app” instead of a custom-built macOS app, but the people saying that don’t know what they are talking about. Claude Desktop has voice support and keyboard hotkeys, which make the app incredibly useful. More importantly, Claude Desktop also supports the Model Context Protocol (MCP), which lets Claude access your file system, git, and anything else you want to give it access to. It’s incredibly powerful, and there’s nothing quite like it.
Baseline rules for running a model
While running models locally is possible, consumer hardware is constrained by RAM and GPU even for the smallest models. The easiest mental model to work with is that billions of parameters are roughly equivalent to your system’s RAM in gigabytes: an 8B model needs roughly 8 GB of RAM to fit into memory.
My mental formula is somewhat lossy because 40B models fit in 32 GB of memory, and 72B models fit in 64 GB with some room to spare. This is just the rough estimate that I use.
Even though you can run models locally, even the smallest models with a significant context window can exceed your machine’s available RAM. A 128k context window needs about 64 GB of RAM to load fully for an 8B parameter model, even though the model itself easily fits into 8 GB of RAM. That doesn’t mean the model won’t run locally, but it will run more slowly than it would on a machine with 72 GB or more of RAM, where the model and its full context fit in memory.
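Here is that rule of thumb as back-of-the-envelope math. The constants are rough assumptions I am using for illustration (about 1 GB per billion parameters for a quantized model and roughly 0.5 MB of KV cache per token of context for an 8B-class model), not exact figures for any specific model.

```python
# vram_napkin_math.py - a back-of-the-envelope memory estimate. The constants are
# rough assumptions (about 1 GB per billion parameters, about 0.5 MB of KV cache
# per token of context for an 8B-class model), not exact figures.
def rough_memory_gb(params_billions: float, context_tokens: int = 2048) -> float:
    weights_gb = params_billions * 1.0      # ~1 GB per billion parameters
    kv_cache_gb = context_tokens * 0.0005   # ~0.5 MB per token of context
    return weights_gb + kv_cache_gb


print(f"8B model, small context: ~{rough_memory_gb(8):.0f} GB")
print(f"8B model, 128k context:  ~{rough_memory_gb(8, 128_000):.0f} GB")
```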
I look for three things when I’m evaluating a model (there’s a sketch for checking them locally after this list):
- The number of parameters, measured in billions
- Context length
- The input context length, which is effectively the model’s memory
- The output context length, which is how big the answer can be
- The type of model:
- Default Models are general-purpose models like GPT-4 and Llama 3.3.
- Vision Models can process and read visual data like images and videos.
- Tool Models can call external tools and APIs and perform custom actions to which you give them access.
- Embedding Models can turn text into vectors or tokens, which helps measure your prompts and other RAG operations.
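When I want to check those details for a local model, the ollama-python library can pull them up. The exact fields in the response vary by model and library version, so I just print the whole thing and read it.

```python
# model_info.py - inspect a local model's parameter count, context length, and
# capabilities via the ollama-python library. Field names vary by version, so
# printing the whole response is the simplest way to look.
import ollama

info = ollama.show("llama3.1:latest")
print(info)
```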
What about quantization? Quantization can help you scale a model down so that it might fit into memory, but there’s always a loss in quality, which defeats the purpose of using the bigger model in my book.
Keeping up
My favorite resource for keeping up is Ollama’s Models page sorted by Newest. I check it a few times a day, and you’ll see new models released single-digit hours to days before the press releases catch up.
I like Matt Williams' YouTube Channel a lot. It’s the one channel I come back to, and I find that I always learn something from it. His videos tend to be ten to twenty minutes long, which is about right since the material is so dense.
Start with his Optimize Your AI Models videos. They’re a lot to fit in your brain, but they’re a great starting point.
Simon Willison’s Weblog is good too.
Python
I’ll have to write a few posts on how I’m using LLMs with code, but Simon’s LLM is a good general-purpose AI hammer if you need one.
As of last week, I’m using Pydantic AI instead of OpenAI’s or Anthropic’s Python libraries. Pydantic AI will install both of those libraries for you, but I find it to be 100% better and easier to switch between models using it than LangChain (not linked) or anything else I have tried.
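As a quick sketch of what that model switching looks like in practice: the same agent can point at different providers just by changing the model string. The model names below are examples and may be out of date.

```python
# switch_models.py - the same pydantic-ai agent pointed at different providers by
# changing only the model string. Model names are examples and may be out of date.
from pydantic_ai import Agent

question = "Summarize what a Django QuerySet is in one sentence."

for model in ("openai:gpt-4o", "anthropic:claude-3-5-sonnet-latest"):
    agent = Agent(model, system_prompt="You are a concise Django expert.")
    result = agent.run_sync(question)
    print(f"{model}: {result.data}")
```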
Tags: Django, Python, LLM
🤖 I released files-to-claude-xml and new development workflows
After months of using and sharing this tool via a private gist, I finally carved out some time to release files-to-claude-xml.
Despite my social media timeline declaring LLMs dead earlier today, I have been using Claude Projects and Artifacts.
My workflow is to copy a few files into a Claude Project and then create a new chat thread where Claude will help me write tests or build out a few features.
My `files-to-claude-xml` script grew out of some research I did when I stumbled on Anthropic’s Essential tips for long context prompts, which documents how to get around some file upload limits by uploading one big file using Claude’s XML-like format.

With `files-to-claude-xml`, I build a list of files that I want to import into a Claude Project. Then, I run it to generate a `_claude.xml` file, which I drag into Claude. I create a new conversation thread per feature, then copy the finished artifacts out of Claude once my feature or thread is complete.

After the feature is complete, I delete the `_claude.xml` file from my project and replace it with an updated copy after I re-run `files-to-claude-xml`.
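For context, the XML-like format looks roughly like the sketch below. This is my approximation of the document-wrapping structure from Anthropic’s long context tips, not the exact output of files-to-claude-xml, and the file names are placeholders.

```python
# build_claude_xml.py - a rough sketch of wrapping files in an XML-like document
# format along the lines of Anthropic's long context tips. This approximates the
# idea; it is not the exact output of files-to-claude-xml.
from pathlib import Path


def files_to_xml(paths: list[str]) -> str:
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        contents = Path(path).read_text()
        parts.append(
            f'<document index="{index}">\n'
            f"<source>{path}</source>\n"
            f"<document_contents>\n{contents}\n</document_contents>\n"
            "</document>"
        )
    parts.append("</documents>")
    return "\n".join(parts)


if __name__ == "__main__":
    Path("_claude.xml").write_text(files_to_xml(["models.py", "views.py", "tests.py"]))
```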
Features on the go
One bonus of using Claude Projects is that once everything is uploaded, I can use the Claude iOS app as a sort-of notes app and development tool. I can start parallel conversation threads and have it work on new ideas and features. Once I get back to my desktop, I can pull these chat conversations up, and if I like the direction of the feature, I might use them. If not, I have wasted no time or effort on them. This also serves as a nice ToDo list.
New workflows
I am using this methodology to push side projects further. Sometimes, I would like to work on something casually while watching Netflix, but my brain shuts off from coding during the day. Instead of feeling bad that I haven’t added share links to a website or some feature I meant to add last week, I can pair with Claude to work on it.
I can also get more done with my lunch hours on projects like DjangoTV than I could have otherwise. Overall, I’m happy to have an on-demand assistant to pair with and work on new features and ideas.
It’s also quicker to try out new ideas and projects that I would have needed to make time for.
Alternatives
Simon Willison wrote files-to-prompt, which I think is also worth trying. I contributed to the discussion, feedback, and document structure for the `--cxml` feature.

I wrote `files-to-claude-xml` before Simon had cxml support and hoped not to have to release my version. However, after trying his tool on several projects, my ignore/exclude list grew larger than the list of files I wanted to include and send to Claude. I found it easier to generate a list of files to pass to mine instead of maintaining a long list to exclude.
Tags: Python, Ollama, LLM, Today I Learned
🦙 Ollama Llama 3.1 Red Pajama
For a few weeks, I told friends I was excited to see if the new Llama 3.1 release was as good as it was being hyped.
Yesterday, Llama 3.1 was released, and I was impressed that the Ollama project published a release to Homebrew and had the models ready to use.
```
➜ brew install ollama
➜ ollama serve

# (optionally) I run Ollama as a background service
➜ brew services start ollama

# This takes a while (defaults to the llama3.1:8b model)
➜ ollama pull llama3.1:latest

# (optional) This takes a longer time
➜ ollama pull llama3.1:70b

# (optional) This takes so long that I skipped it and ordered a CAT6 cable...
# ollama pull llama3.1:405b
```
To chat with the model, you use the same `ollama` console command:

```
➜ ollama run llama3.1:latest
>>> how much is 2+2?
The answer to 2 + 2 is: 4!
```

Accessing Ollama Llama 3.1 with Python
The Ollama project has an [`ollama-python`](https://github.com/ollama/ollama-python) library, which I use to build applications. My demo has a bit of flare because there are a few options, like `--stream`, that improve the quality of life while waiting for Ollama to return results.

```python
# hello-llama.py
import typer
from enum import Enum
from ollama import Client
from rich import print


class Host(str, Enum):
    local = "http://127.0.0.1:11434"
    the_office = "http://the-office:11434"


class ModelChoices(str, Enum):
    llama31 = "llama3.1:latest"
    llama31_70b = "llama3.1:70b"


def main(
    host: Host = Host.local,
    local: bool = False,
    model: ModelChoices = ModelChoices.llama31,
    stream: bool = False,
):
    if local:
        host = Host.local

    client = Client(host=host.value)

    response = client.chat(
        model=model.value,
        messages=[
            {
                "role": "user",
                "content": (
                    "Please riff on the 'Llama Llama Red Pajama' book but using AI terms "
                    "like the 'Ollama' server and the 'Llama 3.1' model. "
                    "Instead of using 'Llama Llama', please use 'Ollama Llama 3.1'."
                ),
            }
        ],
        stream=stream,
    )

    if stream:
        for chunk in response:
            print(chunk["message"]["content"], end="", flush=True)
        print()
    else:
        print(f"[yellow]{response['message']['content']}[/yellow]")


if __name__ == "__main__":
    typer.run(main)
```
Some of my family’s favorite books are the late Anna Dewdney’s Llama Llama books. Please buy and support their work. I can’t read Llama 3.1 and Ollama without considering the “Llama Llama Red Pajama” book.
To set up and run this:
```
# Install a few "nice to have" libraries
➜ pip install ollama rich typer

# Run our demo
➜ python hello-llama.py --stream

Here's a riff on "Llama Llama Red Pajama" but with an AI twist:

**Ollama Llama 3.1, Ollama Llama 3.1**

Mama said to Ollama Llama 3.1,
"Dinner's done, time for some learning fun!"
But Ollama Llama 3.1 didn't wanna play
With the data sets and algorithms all day.

He wanted to go out and get some rest,
And dream of neural nets that were truly blessed.
But Mama said, "No way, young Ollama Llama 3.1,
You need to train on some more NLP."

Ollama Llama 3.1 got so mad and blue
He shouted at the cloud, "I don't wanna do this too!"
But then he remembered all the things he could see,
On the Ollama server, where his models would be.

So he plugged in his GPU and gave a happy sigh
And trained on some texts, till the morning light shone high.
He learned about embeddings and wordplay too,
And how to chat with humans, that's what he wanted to do.

**The end**
```
Connecting to Ollama
I have two Macs running Ollama and I use Tailscale to bounce between them from anywhere. When I’m at home upstairs it’s quicker to run a local instance. When I’m on my 2019 MacBook Pro it’s faster to connect to the office.
The only stumbling block I ran into was needing to set a few ENV variables so that Ollama listens on a port that I can proxy to. This was frustrating to figure out, but I hope it saves you some time.
```
➜ launchctl setenv OLLAMA_HOST 0.0.0.0:11434
➜ launchctl setenv OLLAMA_ORIGINS http://*

# Restart the Ollama server to pick up on the ENV vars
➜ brew services restart ollama
```
Simon Willison’s LLM tool
I also like using Simon Willison’s LLM tool, which supports a ton of different AI services via third-party plugins. I like the llm-ollama library, which allows us to connect to our local Ollama instance.
When working with Ollama, I start with the Ollama run command, but I have a few bash scripts that might talk to OpenAI or Claude 3.5, and it’s nice to keep my brain in the same tooling space. LLM is useful for mixing and matching remote and local models.
To install and use LLM + llm-ollama + Llama 3.1:
Please note that the Ollama server should already be running as previously outlined.
```
# Install llm
➜ brew install llm

# Install llm-ollama
➜ llm install llm-ollama

# List all of the models from Ollama
➜ llm ollama list-models

# Ask the model a question
➜ llm -m llama3.1:latest "how much is 2+2?"
The answer to 2 + 2 is: 4
```
Bonus: Mistral Large 2
While I was working on this post, Mistral AI launched their Mistral Large 2 model (announced as “Large Enough”). The Ollama project released support for the model within minutes of its announcement.
The Mistral Large 2 release is noteworthy because it outperforms Llama 3.1’s 405B parameter model at under 1/3 of the size. It is also the second GPT-4-class model released in the last two days.
Check out Simon’s post for more details and another LLM plugin for another way to access it.