- Python, Docker, UV

📓 My notes on publishing a Python package with UV and building a custom GitHub Action for files-to-claude-xml
My new Python application files-to-claude-xml is now on PyPI, which means it is packaged and pip installable. My preferred way of running `files-to-claude-xml` is via UV’s tool run, which will install it if needed and then execute it:

```shell
$ uv tool run files-to-claude-xml --version
```
Publishing on PyPI with UV
UV has both build and publish commands, so I took them for a spin today.

`uv build` just worked, and a Python package was built. When I tried `uv publish`, it prompted me for some auth settings, for which I had to log in to PyPI to create a token. I added those to the local ENV variables I manage with direnv:

```shell
export UV_PUBLISH_PASSWORD=<your-PyPI-token-here>
export UV_PUBLISH_USERNAME=__token__
```

Once both were set and registered, `uv publish` published my files on PyPI.

GitHub Action
To make `files-to-claude-xml` easier to run on GitHub, I created a custom action to build a `_claude.xml` from the GitHub repository. To use this action, I wrote this example workflow, which runs from files-to-claude-xml-example:

```yaml
name: Convert Files to Claude XML

on: push

jobs:
  convert-to-xml:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Convert files to Claude XML
        uses: jefftriplett/files-to-claude-xml-action@main
        with:
          files: |
            README.md
            main.py
          output: '_claude.xml'
          verbose: 'true'

      - name: Upload XML artifact
        uses: actions/upload-artifact@v4
        with:
          name: claude-xml
          path: _claude.xml
```
My GitHub Action is built with a `Dockerfile`, which installs `files-to-claude-xml`:

```dockerfile
# Dockerfile
FROM ghcr.io/astral-sh/uv:bookworm-slim

ENV UV_LINK_MODE=copy

RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --frozen --no-install-project

WORKDIR /app

ENTRYPOINT ["uvx", "files-to-claude-xml"]
```
To turn a GitHub repository into a runnable GitHub Action, an `action.yml` file needs to exist in the repository. This file describes the input arguments and which `Dockerfile` or command to run:

```yaml
# action.yml
name: 'Files to Claude XML'
description: 'Convert files to XML format for Claude'

inputs:
  files:
    description: 'Input files to process'
    required: true
    type: list
  output:
    description: 'Output XML file path'
    required: false
    default: '_claude.xml'
  verbose:
    description: 'Enable verbose output'
    required: false
    default: 'false'
  version:
    description: 'Display the version number'
    required: false
    default: 'false'

runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ join(inputs.files, ' ') }}
    - --output
    - ${{ inputs.output }}
    - ${{ inputs.verbose == 'true' && '--verbose' || '' }}
    - ${{ inputs.version == 'true' && '--version' || '' }}
```
Overall, this works. Claude’s prompting helped me figure it out, which felt fairly satisfying given the goal of `files-to-claude-xml`.

- Python, LLM
🤖 I released files-to-claude-xml and new development workflows
After months of using and sharing this tool via a private gist, I finally carved out some time to release files-to-claude-xml.
Despite my social media timeline declaring LLMs dead earlier today, I have been getting a lot of use out of Claude Projects and Artifacts.
My workflow is to copy a few files into a Claude Project and then create a new chat thread where Claude will help me write tests or build out a few features.
My `files-to-claude-xml` script grew out of some research where I stumbled on Anthropic’s Essential tips for long context prompts, which documents how to work around some file upload limits and encourages uploading one big file in Claude’s XML-like format.

With `files-to-claude-xml`, I build a list of files that I want to import into a Claude Project. Then, I run it to generate a `_claude.xml` file, which I drag into Claude. I create a new conversation thread per feature, then copy the finished artifacts out of Claude once my feature or thread is complete. After the feature is complete, I delete the `_claude.xml` file from my project and replace it with an updated copy after I re-run `files-to-claude-xml`.
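To make that XML-like format concrete, here is a minimal stdlib sketch of the kind of document such a tool can emit. The tag names are an assumption drawn from Anthropic’s long-context examples, not the verified output of `files-to-claude-xml`:

```python
from pathlib import Path


def files_to_xml(paths: list[str]) -> str:
    """Wrap each file in a <document> block, long-context style.

    Note: the tag names are assumptions based on Anthropic's published
    long-context examples, not the verified files-to-claude-xml output.
    """
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        contents = Path(path).read_text()
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{path}</source>")
        parts.append(f"<document_contents>\n{contents}\n</document_contents>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)
```

Writing the result of `files_to_xml([...])` to `_claude.xml` gives you one big file to drag into a Claude Project.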
Features on the go
One bonus of using Claude Projects is that once everything is uploaded, I can use the Claude iOS app as a sort-of notes app and development tool. I can start parallel conversation threads and have it work on new ideas and features. Once I get back to my desktop, I can pull these chat conversations up, and if I like the direction of the feature, I might use them. If not, I have wasted no time or effort on them. This also serves as a nice ToDo list.
New workflows
I am using this methodology further on side projects. Sometimes, I would like to work on something casually while watching Netflix, but my brain shuts off from coding during the day. Instead of feeling bad that I haven’t added share links to a website or some feature I meant to add last week, I can pair with Claude to work on it.
I can also get more done with my lunch hours on projects like DjangoTV than I could have otherwise. Overall, I’m happy to have an on-demand assistant to pair with and work on new features and ideas.
It’s also quicker to try out new ideas and projects that I would have needed to make time for.
Alternatives
Simon Willison wrote files-to-prompt, which I think is also worth trying. I contributed to the discussion, feedback, and document structure for the `--cxml` feature.

I wrote `files-to-claude-xml` before Simon’s tool had cxml support and had hoped not to release my version. However, after trying it out on several projects, my ignore/exclude list grew longer than the list of files I wanted to include and send to Claude. I found it easier to generate a list of files to pass to mine instead of maintaining a long exclude list.
- Python, UV
⚙️ UV with GitHub Actions to run an RSS to README project
For my personal GitHub profile, I list my activities, affiliations, and the latest updates from some of my projects.
Historically, I have used the JasonEtco/rss-to-readme GitHub Action to fetch an RSS feed or two and to update my README a few times a day.
Overall, I’m happy with this setup. I used it on the Django News GitHub Organization to pull in newsletter issues, jobs, and the latest videos from our various projects. When I tried to install rss-to-readme in our repo, I was getting node12 errors. (Have I mentioned how much I loathe node/npm?).
Instead of forking rss-to-readme and trying to figure out how to upgrade it, I used this as an excuse to “pair program” with Claude. We quickly built out a prototype using Python and the feedparser library.
I would share the chat log, but it’s mostly me trying out a few different ways to invoke it before I settle on the finished approach. See the source code over on GitHub if you are curious: https://github.com/django-news/.github/blob/main/fetch-rss.py
Once I had a working Python script that could fetch an RSS file and modify the README, I decided to run/deploy it using UV to see how minimal I could build out the GitHub Action.
GitHub Action
To run our `fetch-rss.py` script, we have four steps:

- `actions/checkout`: get a git checkout of our project.
- `astral-sh/setup-uv`: set up UV, which also installs Python for us. As a bonus, we enabled UV’s cache support, which will make future runs much faster unless we change something in our `fetch-rss.py` file.
- Run `uv run fetch-rss.py ...` to fetch our RSS feeds and write them to disk. `uv run` installs and caches any dependencies before our `fetch-rss.py` runs.
- `stefanzweifel/git-auto-commit-action`: if our `README.md` file has changed, commit the changes back to git and into our README.
Our `schedule.yml` GitHub Actions workflow runs daily, or whenever we push a new change to our repo. We also set `workflow_dispatch`, which gives us a button to run the script manually.

```yaml
# .github/workflows/schedule.yml
name: Update README

on:
  push:
    branches:
      - main
  schedule:
    # Once a day at 12:00 UTC
    - cron: "0 12 * * *"
  workflow_dispatch:

jobs:
  update:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
          cache-dependency-glob: |
            *.py

      - name: Fetch our Feeds
        run: |
          # Fetch latest Django News Newsletter entries
          uv run fetch-rss.py \
            --section=news \
            --readme-path=profile/README.md \
            https://django-news.com/issues.rss

      - uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: ":pencil: Updates README"
```
Results
Overall, I’m pleased with this solution. If I wanted to spend more time on it or re-use this workflow, I might turn it into a reusable GitHub Action so other projects could call `django-news/rss-to-readme`. For now, this is fine.

I’m happy with the `astral-sh/setup-uv` and `uv run` steps because they save me from having to set up Python and then install our project dependencies as separate steps.

I normally shy away from running Python workflows like this in GitHub Actions because they involve a lot of slow steps. This entire workflow takes 16 to 20 seconds to run, which feels fast to me.
- Django, Python, UV
🤠 UV Roundup: Five good articles and a pre-commit tip
I have written quite a bit about UV on my micro blog, and I am happy to see more and more people adopt it. I have stumbled on so many good articles recently that I wanted to share them because every article points out something new or different about why UV works well for them.
If you are new to UV, it’s a new tool written by Astral, the creators of Ruff.
I like UV because it replaces, combines, or complements a bunch of Python tools in one tool and developer experience without forcing a UV way of doing things. UV effectively answers the question, “Why do I need another Python tool to do everyday Python tasks?”

Some reasons I like UV after using it for months:
- It’s a faster pip and is really, really fast
- It can install and manage Python versions
- It can run and install Python scripts
- It can run single-file Python scripts along with their dependencies
- It can handle project lock files
While some people don’t care about UV being fast, it’s shaved minutes off my CI builds and container rebuilds, which means it has also saved me money and energy resources.
Overall thoughts on UV
Oliver Andrich’s UV — I am (somewhat) sold takes the approach of only using UV to set up a new Python environment. Oliver uses UV to install Python, sets up aliases to call Python, and uses `uv tool install` to set up a few global utilities.
Using UV with Django
Anže Pečar’s UV with Django shows how to use UV to set up a new project with Django.
Switching from pyenv to UV
Will Guaraldi Kahn-Greene’s Switching from pyenv to uv was relatable for me because I also use pyenv, but I plan to slowly migrate to using only UV. I’m already halfway there, but I will have pyenv for my legacy projects for years because many aren’t worth porting yet.
Using UV and managing with Ansible
Adam Johnson’s Python: my new uv setup for development taught me to use `uv cache prune` to clean up unused cache entries and shows how he manages his UV setup using Ansible.

Some notes on UV
Simon Willison’s Notes on UV is an excellent summary of Oliver’s notes.
A parting UV tip
If you are a pre-commit fan hoping for a version that supports UV, the pre-commit-uv project does just that. I started updating my justfile recipes to change `just lint` to the following `uv run` command, which speeds up installing and running pre-commit significantly:

```shell
$ uv run --with pre-commit-uv pre-commit run --all-files
```
If you are attending DjangoCon US…
If you are attending DjangoCon US and want to talk about UV, Django, Django News, or Django Packages, hit me up while you are there.
I’ll be attending, volunteering, organizing, sponsoring, and sprinting around the venue in Durham, NC, for the next week starting this Friday.
We still have online and in-person tickets, but not much longer!
- Django, Python
🚫 Stop scheduling security updates and deprecating major features over holidays
I know people outside the US will 🙄 at this, but please stop releasing major security updates and backward-incompatible changes over major US, international, and religious holidays.
Given that major security updates are embargoed and scheduled weeks and months in advance, it’s essential to coordinate and avoid conflicts. A simple check of the calendar before scheduling announcements can prevent such issues.
Even if you give everyone two weeks’ notice, aka what GitHub just did, please don’t schedule the release over a holiday weekend.
Historically, the Python and Django communities have also been guilty of this, so I’m not just finger-pointing at GitHub. We can all do better here.
Update: 100% unrelated to this: Django security releases issued: 5.1.1, 5.0.9, and 4.2.16. Thank you, Natalia (and Sarah) for scheduling this after the US is back from a major holiday.
- Python, UV
🚜 Using Claude 3.5 Sonnet to refactor one of Brian Okken's Python projects
Brian Okken posted and published his Top pytest Plugins script and then a follow-up post, Finding the top pytest plugins, which was pretty cool.
I have written a few throw-away scripts, which William Vincent wrote about and updated a few times in the Top 10 Django Third-Party Packages (2024) and The 10 Most-Used Django Packages (2024).
These efforts are powered by Hugo van Kemenade’s excellent Top PyPI Packages.
This inspired me to fork Brian’s top-pytest-plugins project, which I updated to support passing in other package names like “django” to get a rough estimate of monthly package downloads.
The refactored project is jefftriplett/top-python-packages.
Please note: looking at the package name doesn’t scale as well for ecosystems that have their own Trove classifiers. For a project like pytest, it works well. For a project like Django, many of the top packages may not even have “django” in their name. Some projects may even actively discourage others from using their name in a package’s name for trademark reasons. So, YMMV applies here.
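The name-based heuristic itself is tiny. Here is a minimal sketch of the substring filter described above (the `sample` list is made-up illustration data; the real script pulls names from Hugo’s top-pypi-packages dataset):

```python
def related_packages(packages: list[str], name: str = "django") -> list[str]:
    """Keep packages whose name contains `name` but is not `name` itself.

    This is the substring heuristic discussed above; it misses ecosystem
    packages that don't embed the parent project's name.
    """
    needle = name.lower()
    return [
        package
        for package in packages
        if needle in package.lower() and package.lower() != needle
    ]


# Hypothetical sample data for illustration:
sample = ["Django", "django-debug-toolbar", "djangorestframework", "pytest", "celery"]
print(related_packages(sample))
# → ['django-debug-toolbar', 'djangorestframework']
```

Note how “celery”, a major package in the Django ecosystem, is invisible to this filter, which is exactly the scaling caveat above.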
Prompts
I added `uv run` support, which I have written about a lot lately.

I also copied the `top_pytest.py` file into a Claude 3.5 Sonnet session and let it handle the whole refactor. It even handled adding the new PEP 723 package dependencies without me asking it to.

In case it’s useful to anyone, here are my prompts:

```markdown
## Prompt: Please update this script to use a rich table.

## Prompt: Please update the table styles to be ascii so I can copy and paste it into a markdown doc

## Prompt: Please remove the description column

## Prompt: Please change all PyTest and pytest references to Django and django

## Prompt: Please add back `if 'django' in project.lower() and 'django' != project.lower():`

## Prompt: please remove the *# Export to markdown section. I can just pipe the output*

## Prompt: Please add the typer library.

## Prompt: Please remove days and limit

## Prompt: Please refactor the script to allow me to pass the package name instead of django. You can default to django though. This way I can pass pytest or flask or other projects.

## Prompt: Please change the default Table box type to MARKDOWN
```
Outro
I don’t usually write about Claude or prompts, but the tool has been handy lately.
If you have had some similar successes, let me know. I have been exploring some rabbit holes, and it’s changing the way I approach solving problems.
- Python, UV
📓 UV Run Django Notes
I wanted to know how hard it would be to turn one of my django-startproject projects into a `uv run`-friendly project. As it turns out, it worked, and the steps were more than reasonable.

Before the PEP 723’ing…

I started with the fairly vanilla `manage.py` that Django will give you after running `django-admin startproject`:

```python
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings")
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()
```
shebang
Then we add `#!/usr/bin/env -S uv run` to the top of our `manage.py` file.

Next, we make our `manage.py` executable and try to run it:

```shell
$ chmod +x manage.py
$ ./manage.py
ModuleNotFoundError: No module named 'django'
```
Our script ran, but Python couldn’t find Django. To tell our script to install Django, we can use `uv add --script` to add it:

```shell
$ uv add --script manage.py django
Updated `manage.py`

$ ./manage.py
...
Type 'manage.py help <subcommand>' for help on a specific subcommand.

Available subcommands:

[django]
    check
    compilemessages
    createcachetable
    dbshell
    diffsettings
    dumpdata
    flush
    inspectdb
    loaddata
    makemessages
    makemigrations
    migrate
    optimizemigration
    runserver
    sendtestemail
    shell
    showmigrations
    sqlflush
    sqlmigrate
    sqlsequencereset
    squashmigrations
    startapp
    startproject
    test
    testserver

Note that only Django core commands are listed as settings are not properly configured (error: No module named 'environs').
```
Django worked as expected this time, but Python could not find a few third-party libraries I like to include in my projects.

To add these, I passed the other four to `uv add --script`, which adds them to the script:

```shell
$ uv add --script manage.py django-click "environs[django]" psycopg2-binary whitenoise
Updated `manage.py`
...

$ ./manage.py
...
```
Our Django app’s `manage.py` works when we run it.

After the PEP 723’ing…

After we installed our dependencies via our `manage.py` file, they were added to the top of the file between the `# ///` markers:

```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "django",
#     "django-click",
#     "environs[django]",
#     "psycopg2-binary",
#     "whitenoise",
# ]
# ///
"""Django's command-line utility for administrative tasks."""
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "config.settings")
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()
```
- Python, UV
🐍 Python UV run with shebangs
This UV shebang trick that Simon Willison linked up is a nice pattern, and I plan to rebuild some of my one-off scripts in my dotfiles using it.
Here is a demo that will print “hello python” using the Python Branding colors using the Rich library while letting UV install and manage rich for you.
```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "rich",
# ]
# ///
from rich.console import Console
from rich.theme import Theme

python_theme = Theme(
    {
        "pyyellow": "#ffde57",
        "pyblue": "#4584b6",
    }
)

console = Console(theme=python_theme)

console.print("[pyyellow]hello[/pyyellow] [pyblue]python[/pyblue]", style="on #646464")
```
Assuming you have UV installed, and you save this file as `hello-python.py` and `chmod +x` it, then you should be able to run it via `./hello-python.py`.

I suspect I can more easily bootstrap new machines using this trick, without worrying about polluting my global system packages.
- Python, UV
🐍 UV Updates and PEP 723: Simplifying Python Packaging and Scripting
The uv: Unified Python packaging update brings fresh air to the Python community, with several improvements streamlining the development process. One exciting addition is an early preview of PEP 723, also known as Single-file scripts.
The Single-file scripts feature particularly caught my attention due to its potential to simplify the distribution and execution of small Python projects. Streamlining the process is highly appealing to someone who frequently creates GitHub Gists and shares them privately and publicly.
With this new feature, I can now instruct users to run `uv run main.py` without explaining what a `venv` or `virtualenv` is, plus a long list of requirements that need to be passed to `pip install`.

I had the opportunity to test this feature over lunch today. While adding libraries to the script was straightforward, I encountered a few hurdles when I forgot to invoke `uv run` in my virtual environment (venv). This makes sense, given that it’s a new habit, but it highlights the importance of adapting to changes in our development workflow.

Overall, the uv: Unified Python packaging update and the introduction of Single-file scripts mark a significant step in simplifying Python development. As developers become more familiar with these improvements, we can expect increased adoption and smoother collaboration on small-scale projects.
Bonus Example
I looked through some of my recent visits, and one I recently shared with a few conference organizer friends was a one-off script I used to read several YouTube video JSON files that I’m using to bootstrap another project. It was the first time I used DuckDB to make quick work of reading data from a bunch of JSON files using SQL.
Overall, I was happy with DuckDB and what PEP 723 might bring to the future of Python apps, even if my example only does a little.
```python
# To run this application, use:
#   uv run demo-duckdb.py
#
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "duckdb",
#     "rich",
#     "typer",
# ]
# ///
import duckdb
import typer
from rich import print


def main():
    result = duckdb.sql("SELECT id,snippet FROM read_json('json/*.json')").fetchall()
    for row in result:
        id, snippet = row
        print("-" * 80)
        print(f"{id=}")
        print(f"{snippet['channelTitle']=}")
        print(f"{snippet['title']=}")
        print(f"{snippet['publishedAt']=}")
        print(snippet["description"])
        print(snippet["thumbnails"].get("maxres") or snippet.get("standard"))
        print()


if __name__ == "__main__":
    typer.run(main)
```
Overall, the future is bright with what UV and PEP 723 may bring us. I’m excited to have more one-file Python apps that are easier to share and run with others.
PEP 723 also opens the door to turning a one-file Python script into a runnable Docker image that doesn’t even need Python preinstalled on the machine, and for Beeware and Briefcase to build standalone apps.
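As a sketch of that Docker idea, Astral’s UV image (the same base used in my GitHub Action Dockerfile above) can run a PEP 723 script directly; `uv run` reads the inline metadata and installs the dependencies, fetching a managed Python if the image lacks one. This is an untested sketch, with `demo-duckdb.py` standing in for the example script above:

```dockerfile
# Hypothetical image that runs a single PEP 723 script.
FROM ghcr.io/astral-sh/uv:bookworm-slim

WORKDIR /app
COPY demo-duckdb.py .

# uv run reads the inline script metadata and installs duckdb, rich,
# and typer at container start; no requirements.txt is needed.
ENTRYPOINT ["uv", "run", "demo-duckdb.py"]
```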
- Django, Python
⬆️ Which Django and Python versions should I be using today?
Django 5.1 was released, and I was reminded of the article I wrote earlier this year about Choosing the Right Python and Django Versions for Your Projects.
While I encouraged you to wait until the second, third, or even fourth patch release of Django and Python before upgrading, I received a bit of pushback. One interesting perspective claimed that if everyone waits to upgrade, we don’t find critical bugs until the later versions. While that may be plausible, I don’t believe that the dozens of people who read my blog will be swayed by my recommendation to wait for a few patch releases.
I could have emphasized the potential risks of not testing early. Please start testing during the alpha and release candidate phase so that when Django 5.1 is released, your third-party applications will be ready and working on launch day, minimizing the risk of last-minute issues.
Today, I tried to upgrade Django Packages to run on Django 5.1 to see if our test suite would run on Django 5.1, and it very quickly failed in CI due to at least one package not supporting 5.1 yet. Even if it had passed, I’m 90% sure another package would have failed because that’s the nature of running a new major Django or Python release on day one. Even if the third-party package is ready, the packaging ecosystem needs time to catch up.
Which version of Django should I use today?
I’m sticking with Django 5.0 until Django 5.1’s ecosystem has caught up. I plan to update the third-party packages I help maintain to have Django 5.1 support. After a few patch releases of Django 5.1 have come out and the ecosystem has time to catch up, I will try to migrate again.
Which version of Python should I use today?
I’m starting new projects on Python 3.12, with a few legacy projects still running on Python 3.11. While I am adding Django 5.1 support, I plan to add Python 3.13 support to my testing matrices to prepare everything for Python 3.13’s release this fall.
Office hours
I plan to spend some of my Office Hours this week working on Django 5.1 and Python 3.13 readiness for projects I maintain. Please join me if you have a project to update or would like some light-hearted banter to end your week.
- Python
🗳️ My thoughts on the PSF Election results
A few weeks ago, I wrote about this year’s PSF Election, three proposed bylaws changes, and how I intended to vote. I’m happy that the membership overwhelmingly approved all three proposed bylaw changes. Here are this year’s results.
Merging Contributing and Managing member classes
This change is a good step toward consolidating two membership classes and a commitment to acknowledging that all community contributions are important, not just code contributions.
Simplifying the voter affirmation process by treating past voting activity as intent to continue voting
If you voted in last year’s election, there are fewer barriers to voting in the next election. With a 76% turnout this year, I suspect next year will still yield over a 50% voter turnout, and I suspect turnout will continue to be high.
Allow for removal of Fellows by a Board vote in response to Code of Conduct violations, removing the need for a vote of the membership
This one means the most to me. When I joined the board, our Code of Conduct was barely two paragraphs long and said little. We rewrote it and formed the PSF Code of Conduct workgroup. From today forward, we can appreciate that the Python Code of Conduct applies to everyone.
Overall
We also gained three new directors, including two returning directors. This election may be the first time we have had an election in which no one running from North America made it on the board. (Possibly Europe, too, but I didn’t dive as deep to verify that.) Either way, this is a noteworthy milestone.
I’m proud of the Python community for embracing our Code of Conduct and membership changes. A few of these were overdue, but updating the voter affirmation process is an excellent proactive step and a shift for the board.
I also want to thank Débora Azevedo, the PSF’s vice chair-elect and our outbound director. I was impressed with Débora when we served on the board together, and I thought she brought valuable insights. When she put her name forward to run for vice chair, I was impressed because it’s an intimidating group to put yourself out there, and I thought Débora managed it well.
- Django, Python
🐘 Django Migration Operations aka how to rename Models
Renaming a table in Django seems more complex than it is. Last week, a client asked me how much pain it might be to rename a Django model from Party to Customer. We already used the model’s `verbose_name`, so it has been referencing the new name for months.

Renaming the model should be as easy as renaming the model while updating any foreign key and many-to-many field references in other models, and then running Django’s `makemigrations` sub-command to see where we are at.

The main issue with this approach is that Django will attempt to create a new table first, update model references, and then drop the old table.
Unfortunately, Django will either fail mid-way through this migration and roll the changes back or even worse, it may complete the migration only for you to discover that your new table is empty.
Deleting data is not what we want to happen.
As it turns out, Django supports a `RenameModel` migration operation, but it did not prompt me to ask if we wanted to rename Party to Customer.

I am also more example-driven, and the Django docs don’t have an example of how to use `RenameModel`. Thankfully, this migration operation is about as straightforward as one can imagine:

```python
class RenameModel(old_model_name, new_model_name)
```
I re-used the existing migration file that Django created for me. I dropped the `CreateModel` and `DeleteModel` operations, added a `RenameModel` operation, and kept the `RenameField` operations, which resulted in the following migration:

```python
from django.db import migrations


class Migration(migrations.Migration):
    dependencies = [
        ("resources", "0002_alter_party_in_the_usa"),
    ]

    operations = [
        migrations.RenameModel("Party", "Customer"),
        migrations.RenameField("Customer", "party_number", "customer_number"),
        migrations.RenameField("AnotherModel", "party", "customer"),
    ]
```
The story’s moral is that you should always check and verify that your Django migrations will perform as you expect before running them in production. Thankfully, we did, even though glossing over them is easy.
I also encourage you to dive deep into the areas of the Django docs where there aren’t examples. Many areas of the docs may need examples or even more expanded docs, and they are easy to gloss over or get intimidated by.
You don’t have to be afraid to create and update your migrations by hand. After all, Django migrations are Python code designed to give you a jumpstart. You can and should modify the code to meet your needs. Migration Operations have a clean API once you dig below the surface and understand what options you have to work with.
- Python
🦆 DuckDB may be the tool you didn't know you were missing
🤔 I haven’t fully figured out DuckDB yet, but it’s worth trying out if you are a Python dev who likes to work on data projects or gets frequently tasked with data import projects.
DuckDB is a fast database engine that lets you read CSV, Parquet, and JSON files and query them using SQL. Instead of importing data into your database, DuckDB enables you to write SQL and run it against these file types.
I have a YouTube to frontmatter project that can read a YouTube playlist and write out each video to a markdown file. I modified the export script to save the raw JSON output to disk.
I used DuckDB to read a bunch of JSON files using the following script:
```python
import duckdb


def main():
    result = duckdb.sql("SELECT id,snippet FROM read_json('data/*.json')").fetchall()
    for row in result:
        id, snippet = row
        print(f"{id=}")
        print(snippet["channelTitle"])
        print(snippet["title"])
        print(snippet["publishedAt"])
        print(snippet["description"])
        print()


if __name__ == "__main__":
    main()
```
This script accomplishes several things:
- It reads over 650 JSON files in about one second.
- It uses SQL to query the JSON data directly.
- It extracts specific fields (id and snippet) from each JSON file.
Performance and Ease of Use
The speed at which DuckDB processes these files is remarkable. In traditional setups, reading and parsing this many JSON files could take significantly longer and require more complex code.
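For contrast, here is roughly what the same read looks like with only the standard library — a sketch, assuming each JSON file carries the `id` and `snippet` keys used in the script above:

```python
import json
from pathlib import Path


def read_snippets(directory: str) -> list[tuple[str, dict]]:
    """Load every JSON file in `directory` and pull out id and snippet.

    DuckDB's read_json('data/*.json') collapses this loop (plus any
    filtering or aggregation you want) into a single SQL statement.
    """
    rows = []
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text())
        rows.append((data["id"], data["snippet"]))
    return rows
```

And this is the simple case; add filtering, joins, or aggregation, and the hand-rolled version grows quickly while the SQL stays one statement.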
When to Use DuckDB
DuckDB shines in scenarios where you need to:
- Quickly analyze data in files without a formal import process.
- Perform SQL queries on semi-structured data (like JSON).
- Process large datasets efficiently on a single machine.
Conclusion
DuckDB is worth trying out in your data projects. If you have a lot of data and you need help with what to do with it, being able to write SQL against hundreds of files is powerful and flexible.
- Django, Python
Django Extensions is useful even if you only use show_urls
Yes, the Django Extensions package is worth installing, especially for its `show_urls` command, which can be very useful for debugging and understanding your project’s URL configurations.

Here’s a short example of how to use it. I sometimes want to include a link to the Django Admin in a menu for staff users, and I can never remember which URL name I need to reference to link to it.
First, you will need to install it via:

```shell
pip install django-extensions

# or if you prefer using uv like me:
uv pip install django-extensions
```
Next, you’ll want to add `django_extensions` to your `INSTALLED_APPS` in your `settings.py` file:

```python
INSTALLED_APPS = [
    ...
    "django_extensions",
]
```
Finally, to run the `show_urls` management command, you may do so by running your `manage.py` script and passing it the following option:

```shell
$ python -m manage show_urls
```
Which will give this output:

```shell
$ python -m manage show_urls | grep admin
...
/admin/              django.contrib.admin.sites.index           admin:index
/admin/<app_label>/  django.contrib.admin.sites.app_index       admin:app_list
/admin/<url>         django.contrib.admin.sites.catch_all_view
# and a whole lot more...
```
In this case, I was looking for `admin:index`, which I can now add to my HTML document via this menu link/snippet:

```html
...
<a href="{% url 'admin:index' %}">Django Admin</a>
...
```
What I like about this approach is that I can now hide or rotate the URL pattern I’m using to get to my admin website, and yet Django will always link to the correct one.
-
Python
,Justfiles
,Docker
🐳 Using Just and Compose for interactive Django and Python debugging sessions
When I wrote REST APIs, I spent weeks and months writing tests and debugging without looking at the front end. It’s all JSON, after all.
For most of my projects, I will open two or three tabs. I’m running Docker Compose in tab one to see the logs as I work. I’ll use the following casey/just recipe to save some keystrokes and to standardize what running my project looks like:
```shell
# tab 1
$ just up
```
In my second tab, I’ll open a shell that is inside my main web or app container so that I can interact with the environment, run migrations, and run tests.
We can nitpick the meaning of “console” here, but I tend to have another just recipe for “shell” which will open a Django shell using shell_plus or something more interactive:
```shell
# tab 2
$ just console
```
In my third tab, I’ll run a shell session for creating git branches, switching git branches, stashing git changes, and running my linter, which I prefer to run by hand.
```shell
# tab 3
$ echo "I'm boring"
```
Over the last year or two, the web has returned to doing more frontend work with Django and less with REST. Using `ipdb` in my view code to figure out what’s going on has been really helpful. Getting `ipdb` to “just work” takes a few steps in my normal workflow:

```shell
# tab 1 (probably)

# start everything
$ just start

# stop our web container
$ just stop web

# start our web container with "--service-ports"
$ just start-web-with-debug
```
The only real magic here is Docker’s `--service-ports` option, which publishes the container’s ports so we may connect to the open `ipdb` session when we trigger one in our view code.

My main `justfile` for all of these recipes/workflows looks very similar to this:

```
# justfile
set dotenv-load := false

@build *ARGS:
    docker compose build {{ ARGS }}

# opens a console
@console:
    docker compose run --rm --no-deps utility /bin/bash

@down:
    docker compose down

@start *ARGS:
    just up --detach {{ ARGS }}

@start-web-with-debug:
    docker compose run --service-ports --rm web python -m manage runserver 0.0.0.0:8000

@stop *ARGS:
    docker compose down {{ ARGS }}

@up *ARGS:
    docker compose up {{ ARGS }}
```
If you work on multiple projects, I encourage you to find patterns you can scale across them. Using Just, Make, shell scripts or even Python lightens the cognitive load when switching between them.
-
Python
🚜 Mastodon Bookmark exporter to Markdown/Frontmatter
I wrote a Mastodon Bookmark exporter tool over the weekend and decided to polish it up and release it tonight.
I wrote the tool to help me sort out Mastodon posts that I might bookmark to follow up on or write about. I bookmark posts on the go or even from bed, and when I have time, I will pull them back up.
The Mastodon Bookmark exporter tool reads your Mastodon bookmarks and exports the latest posts to a markdown/frontmatter file.
I’m releasing the project as a gist under the PolyForm Noncommercial License for personal reasons. If you have licensing questions, contact me directly or through www.revsys.com for commercial inquiries, and we can work something out.
-
Python
🐍 TIL build-and-inspect-python-package GitHub Action workflow plus some bonus Nox + Tox
TIL: via @joshthomas via @treyhunner via @hynek about the hynek/build-and-inspect-python-package GitHub Action.
This workflow makes it possible for GitHub Actions to read your Python version classifiers to build a matrix or, as Trey put it, “Remove so much junk” which is a pretty good example.
As a bonus, check out Hynek’s video on NOX vs TOX – WHAT are they for & HOW do you CHOOSE? 🐍
https://www.youtube.com/watch?v=ImBvrDvK-1U
Both Nox and Tox are great tools that automate testing in multiple Python environments.
I prefer Nox because it uses Python to write configs, which fits my brain better. I used Tox for over a decade, and there are some `tox.ini` files that I dread updating because I can only remember how I got there after a few hours of tinkering. That’s not Tox’s fault; I think it’s a limitation of `ini` files and the frustration that comes from being unable to use Python when you have a complex matrix to sort out.

I recommend trying them both out and using the best tool for your brain. There is no wrong path here.
PS: Thank you, Josh, for bringing this to my attention.
-
Django
,Python
🤖 Super Bot Fight 🥊
In March, I wrote about my robots.txt research and how I started proactively and defensively blocking AI Agents in my 🤖 On Robots.txt. Since March, I have updated my Django projects to add more robots.txt rules.
Earlier this week, I ran across this Blockin’ bots. blog post and this example, in which a `mod_rewrite` rule blocks AI agents via their User-Agent strings:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on
    RewriteBase /
    # block "AI" bots
    RewriteCond %{HTTP_USER_AGENT} (AdsBot-Google|Amazonbot|anthropic-ai|Applebot|AwarioRssBot|AwarioSmartBot|Bytespider|CCBot|ChatGPT|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|DataForSeoBot|Diffbot|FacebookBot|Google-Extended|GPTBot|ImagesiftBot|magpie-crawler|omgili|Omgilibot|peer39_crawler|PerplexityBot|YouBot) [NC]
    RewriteRule ^ - [F]
</IfModule>
```
Since none of my projects use Apache, and I was short on time, I decided to leave this war to the bots.
Django Middleware
I asked ChatGPT to convert this snippet to a piece of Django Middleware called Super Bot Fight. After all, if we don’t have time to keep up with bots, then we could leverage this technology to help fight against them.
In theory, this snippet passed my eyeball test and was good enough:
```python
# middleware.py
from django.http import HttpResponseForbidden

# List of user agents to block
BLOCKED_USER_AGENTS = [
    "AdsBot-Google",
    "Amazonbot",
    "anthropic-ai",
    "Applebot",
    "AwarioRssBot",
    "AwarioSmartBot",
    "Bytespider",
    "CCBot",
    "ChatGPT",
    "ChatGPT-User",
    "Claude-Web",
    "ClaudeBot",
    "cohere-ai",
    "DataForSeoBot",
    "Diffbot",
    "FacebookBot",
    "Google-Extended",
    "GPTBot",
    "ImagesiftBot",
    "magpie-crawler",
    "omgili",
    "Omgilibot",
    "peer39_crawler",
    "PerplexityBot",
    "YouBot",
]


class BlockBotsMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Check the User-Agent against the blocked list
        user_agent = request.META.get("HTTP_USER_AGENT", "")
        if any(bot in user_agent for bot in BLOCKED_USER_AGENTS):
            return HttpResponseForbidden("Access denied")
        response = self.get_response(request)
        return response
```
To use this middleware, you would update your Django `settings.py` to add it to your `MIDDLEWARE` setting:

```python
# settings.py
MIDDLEWARE = [
    ...
    "middleware.BlockBotsMiddleware",
    ...
]
```
Tests?
If this middleware works for you and you care about testing, then these tests should also work:
```python
import pytest
from django.http import HttpResponse
from django.test import RequestFactory

from middleware import BlockBotsMiddleware


@pytest.mark.parametrize(
    "user_agent, should_block",
    [
        ("AdsBot-Google", True),
        ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", False),
        ("ChatGPT-User", True),
        ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3", False),
    ],
)
def test_user_agent_blocking(user_agent, should_block):
    # Create a request factory to generate request instances
    factory = RequestFactory()
    request = factory.get("/", HTTP_USER_AGENT=user_agent)

    # Middleware setup
    middleware = BlockBotsMiddleware(get_response=lambda request: HttpResponse())
    response = middleware(request)

    # Check if the response should be blocked or allowed
    if should_block:
        assert response.status_code == 403, f"Request with user agent '{user_agent}' should be blocked."
    else:
        assert response.status_code != 403, f"Request with user agent '{user_agent}' should not be blocked."
```
Enhancements
To use this code in production, I would normalize the `user_agent` and `BLOCKED_USER_AGENTS` values so the comparison is case-insensitive.

I would also consider storing my list of user agents in a Django model or using a project like django-robots instead of a hard-coded Python list.
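As a sketch of that first enhancement, using a trimmed version of the blocked list from the middleware above, lowercasing both sides once makes the membership test case-insensitive:

```python
# Sketch of the case-insensitive normalization suggested above, using a
# trimmed version of the blocked list from the middleware.
BLOCKED_USER_AGENTS = ["ChatGPT-User", "Bytespider", "CCBot", "PerplexityBot"]

# Lowercase the list once at import time rather than on every request.
BLOCKED_USER_AGENTS_LOWER = [bot.lower() for bot in BLOCKED_USER_AGENTS]


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked bot name appears in the User-Agent."""
    user_agent = user_agent.lower()
    return any(bot in user_agent for bot in BLOCKED_USER_AGENTS_LOWER)


print(is_blocked("Mozilla/5.0 (compatible; ccbot/2.0)"))  # True
print(is_blocked("Mozilla/5.0 (Macintosh; Intel Mac OS X)"))  # False
```

Inside the middleware, the same check would replace the `any(...)` expression in `__call__`.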
-
Django
,Python
🚜 Refactoring and fiddling with Django migrations for pending pull requests 🐘
One of Django’s most powerful features is the ORM, which includes a robust migration framework. One of Django’s most misunderstood features is Django migrations because it just works 99% of the time.
Even when working solo, Django migrations are highly reliable, working 99.9% of the time and offering better uptime than most web services you may have used last week.
The most common stumbling block for developers of all skill levels is rolling back a Django migration and prepping a pull request for review.
I’m not picky about pull requests or git commit history because I default to using the “Squash and merge” feature to turn all pull request commits into one merge commit. The merge commit tells me when, what, and why something changed if I need extra context.
I am pickier about seeing more than two database migrations for any app unless a data migration is involved. It’s common to see 4 to 20 migrations when someone works on a database feature for a week. Most of the changes tend to be fiddly, where someone adds a field, renames the field, renames it again, and then starts using it, which prompts another `null=True` change followed by a `blank=True` migration.

For small databases, none of this matters.
For a database with 10s or 100s of millions of records, these small changes can cause minutes of downtime per migration, which amounts to a throwaway change. While there are ways to mitigate most migration downtime situations, that’s different from my point today.
I’m also guilty of being fiddly with my Django model changes because I know I can delete and refactor them before requesting approval. The process I use is probably worth sharing because it comes up with every new client.
Let’s assume I am working on Django News Jobs, and I am looking over my pull request one last time before I ask someone to review it. That’s when I notice four migrations that could quickly be rebuilt into one, starting with the `0020*` migration in my `jobs` app.

The rough steps that I would follow are:
```shell
# step 1: see the state of our migrations
$ python -m manage showmigrations jobs
jobs
 [X] 0001_initial
 ...
 [X] 0019_alter_iowa_versus_unconn
 [X] 0020_alter_something_i_should_delete
 [X] 0021_alter_uconn_didnt_foul
 [X] 0022_alter_nevermind_uconn_cant_rebound
 [X] 0023_alter_iowa_beats_uconn
 [X] 0024_alter_south_carolina_sunday_by_four

# step 2: rollback migrations to our last "good" state
$ python -m manage migrate jobs 0019

# step 3: delete our new migrations
$ rm jobs/migrations/002*

# step 4: rebuild migrations
$ python -m manage makemigrations jobs

# step 5: profit
$ python -m manage migrate jobs
```
95% of the time, this is all I ever need to do.
Occasionally, I check out another branch with conflicting migrations, and I’ll get my local database in a weird state.
In those cases, check out the `--fake` (“Mark migrations as run without actually running them.”) and `--prune` (“Delete nonexistent migrations from the `django_migrations` table.”) options. The fake and prune operations saved me several times when my `django_migrations` table was out of sync and I knew the SQL tables were already altered.
table was out of sync, and I knew that SQL tables were already altered.What not
squashmigrations
?Excellent question. Squashing migrations is wonderful if you care about keeping every or most of the operations each migration is doing. Most of the time, I do not, so I overlook it.
-
Django
,Python
⛳ Syncing Django Waffle feature flags
The django-waffle feature flag library is helpful for projects where we want to release and test new features in production and have a controlled rollout. I also like using feature flags for resource-intensive features on a website that we want to toggle off during high-traffic periods. It’s a nice escape hatch to fall back on if we need to turn off a feature and roll out a fix without taking down your website.
While Waffle is a powerful tool, I understand the challenge of keeping track of feature flags in both code and the database. It’s a pain point that many of us have experienced.
Waffle has a `WAFFLE_CREATE_MISSING_FLAGS=True` setting that we can use to tell Waffle to create any missing flags in the database should it find one. While this helps discover which flags our application is using, we still need to figure out how to clean up old flags in the long term.

The pattern I landed on combines storing all of our known feature flags, plus a note about what each one does, in our main settings file.
```python
# settings.py
...
WAFFLE_CREATE_MISSING_FLAGS = True

WAFFLE_FEATURE_FLAGS = {
    "flag_one": "This is a note about flag_one",
    "flag_two": "This is a note about flag_two",
}
```
We will use a management command to sync every feature flag we have listed in our settings file, and then we will clean up any missing feature flags.
```python
# management/commands/sync_feature_flags.py
import djclick as click
from django.conf import settings
from waffle.models import Flag


@click.command()
def command():
    # Create or update flags from our settings file
    for name, note in settings.WAFFLE_FEATURE_FLAGS.items():
        flag, created = Flag.objects.update_or_create(
            name=name, defaults={"note": note}
        )
        if created:
            print(f"Created flag {name} ({flag.pk})")

    # Delete flags that are no longer registered in settings
    for flag in Flag.objects.exclude(name__in=settings.WAFFLE_FEATURE_FLAGS.keys()):
        flag.delete()
        print(f"Deleted flag {flag.name} ({flag.pk})")
```
We can use the `WAFFLE_CREATE_MISSING_FLAGS` setting as a failsafe to create any flags we might have accidentally missed. They will stick out because they will not have a note associated with them.
Check out this example in the Django Styleguide for how to sync Celery’s scheduled tasks.
-
Django
,Python
⬆️ The Upgrade Django project
Upgrade Django is a REVSYS project we created six years ago and launched three years ago.
The goal of Upgrade Django was to create a resource that made it easy to see at a glance which versions of the Django web framework are maintained and supported. We also wanted to catalog every release and common gotchas and link to helpful information like release notes, blog posts, and the tagged git branch on GitHub.
We also wanted to make it easier to tell how long a given version of Django would be supported and what phase of its release cycle it is in.
Future features
We have over a dozen features planned, but it’s a project that primarily serves its original purpose.
One feature on my list is that I’d love to see every backward incompatible change between two Django versions. This way, if someone knows their website is running on Django 3.2, they could pick Django 4.2 or 5.0 and get a comprehensive list with links to everything they need to upgrade between versions.
Projects like Upgrade Django are fun to work on because once you collect a bunch of data and start working with it, new ways of comparing and presenting the information become more apparent.
If you have ideas for improving Upgrade Django that would be useful to your needs, we’d love to hear about them.
-
Django
,Python
Things I can never remember how to do: Django Signals edition
I am several weeks into working on a project with my colleague, Lacey Henschel. Today, while reviewing one of her pull requests, I was reminded how to test a Django Signal via mocking.
Testing Django signals is valuable to me because I can never remember how to test a signal, and even with lots of effort, I never get it working from memory. So bookmark this one, friends. It works.
Thankfully, she wrote it up in one of her TILs: How I set up `django-activity-stream`, including a simple test.
-
Django
,Python
On scratching itches with Python
Python is such a fantastic glue language. Last night, while watching March Madness basketball games, I had a programming itch I wanted to scratch.
I dusted off a demo I wrote several years ago. It used Python’s subprocess module to string together a bunch of shell commands that perform a git checkout, run a few commands, and then commit the results. The script worked, but I struggled to get it fully working in a production environment.
To clean things up and as an excuse to try out a new third-party package, I converted the script to use:
-
GitPython - GitPython is a Python library used to interact with Git repositories.
-
Shelmet - A shell power-up for working with the file system and running subprocess commands.
-
Django Q2 - A multiprocessing distributed task queue for Django based on Django-Q.
Using Django might have been overkill, but having a Repository model to work with felt nice. Django Q2 was also overkill, but if I put this app into production, I’ll want a task queue, and Django Q2 has a manageable amount of overhead.
GitPython was a nice improvement over calling git commands directly because their API makes it easier to see which files were modified and to check against existing branch names. I was happy with the results after porting my subprocess commands to the GitPython API.
The final package I used is a new one called Shelmet, which is both a nice wrapper around subprocess and a nice API for file system operations, in the same vein as Python’s pathlib module.
Future goals
I was tempted to cobble together a GitHub bot, but I didn’t need one. I might dabble with the GitHub API more to fork a repo, but for now, this landed in a better place, so when I pick it back up again in a year, I’m starting in a good place.
If you want to write a GitHub bot, check out Mariatta’s black_out project.
-
Django
,Python
Automated Python and Django upgrades
Recently, I have been maintaining forks for several projects that are no longer maintained. Usually, these are a pain to update, but I have found a workflow that takes the edge off by leveraging pre-commit.
My process:
- Fork the project on GitHub to whichever organization I work with or my personal account.
- Check out a local copy of my forked copy with git.
- Install pre-commit
- Create a `.pre-commit-config.yaml` with ZERO formatting or lint changes. This file will only include django-upgrade and pyupgrade hooks.
We skip the formatters and linters to avoid unnecessary changes if we want to open a pull request in the upstream project. If the project isn’t abandoned, we will want to do that.
- For django-upgrade, change the `--target-version` option to target the latest version of Django I’m upgrading to, which is currently 5.0.
- For pyupgrade, update the `python` setting under `default_language_version` to the latest version of Python that I’m targeting. Currently, that’s 3.12.
The django-upgrade and pyupgrade projects attempt to run several code formatters and can handle most of the more tedious upgrade steps.
- Run `pre-commit autoupdate` to ensure we have the latest version of our hooks.
- Run `pre-commit run --all-files` to run `pyupgrade` and `django-upgrade` on our project.
- Run any tests contained in the project and review all changes.
- Once I’m comfortable with the changes, I commit them all via git and push them upstream to my branch.
Example `.pre-commit-config.yaml` config

From my experience, less is more with this bare-bones `.pre-commit-config.yaml` config file:

```yaml
# .pre-commit-config.yaml
default_language_version:
  python: python3.12

repos:
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.15.1
    hooks:
      - id: pyupgrade

  - repo: https://github.com/adamchainz/django-upgrade
    rev: 1.16.0
    hooks:
      - id: django-upgrade
        args: [--target-version, "5.0"]
```
If I’m comfortable that the project is abandoned, I’ll add ruff support with a more opinionated config to ease my maintenance burden going forward.
-
Python
Justfile Alfred Plugin
A few years back, I had a productivity conversation with Jay Miller about Alfred plugins, which led to him sharing his Bunch_Alfred plugin. At the time, I played around with the Bunch.app, a macOS automation tool, and Alfred’s support was interesting.
I created my Alfred plugin to run Just command runner commands through my Alfred setup. However, I never got around to packaging it or writing the plugin’s documentation.
My Alfred plugin runs a Script Filter Input, which reads from a centrally located `justfile` and generates JSON output of all of the possible options. Alfred displays those options and runs whichever command you select.

I was always unhappy with how the JSON document was generated from my commands, so I dusted off the project over lunch and re-engineered it by adding Pydantic support.
Alfred just announced support for a new User Interface called Text View, which could make text and markdown output from Python an exciting way to handle snippets and other productive use cases. I couldn’t quite figure it out over lunch, but now I know it’s possible, and I might figure out how to convert my Justfile Alfred plugin to generate better output.