Software Development

The Longevity of Design Decisions: crontab -e / crontab -r

The Weight of Early Decisions Software has remarkable longevity. Code written decades ago continues to run on millions of systems today, carrying forward not just functionality but also design decisions—both good and bad. When we write software, we’re not just solving today’s problems. We’re making choices that will affect users for years, perhaps decades. This is particularly true for foundational tools that become part of the Unix ecosystem. Once a command-line interface is established, changing it becomes extraordinarily difficult due to backward compatibility concerns, scripting dependencies, and the muscle memory of administrators worldwide. ...

explain-strace: Making Strace Output Easier To Read

TL;DR strace is a tool that traces system calls and signals while a program is running. The output is verbose and can be hard to comprehend at a glance. I built explain-strace, a Python tool that parses strace output and adds human-readable descriptions for each system call, categorizes them (filesystem, network, memory, etc.), and provides summary statistics. The tool evolved from a simple parser to a maintainable system that generates syscall metadata directly from Linux kernel source, keeping pace with kernel updates automatically. ...
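To make the parse / categorize / summarize idea concrete, here is a minimal sketch. It is not the explain-strace code: the regular expression, the tiny `SYSCALL_CATEGORIES` table and the function names are all assumptions made up for illustration, and real strace output has many more line shapes (unfinished calls, signals, timestamps) than this handles.

```python
import re
from collections import Counter

# Hypothetical, hand-written category table (illustration only; the real
# tool is described as generating its metadata from kernel source).
SYSCALL_CATEGORIES = {
    "openat": "filesystem",
    "mmap": "memory",
    "connect": "network",
}

# Matches the common "name(args) = result" shape of a strace line.
LINE_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>.+)$")


def parse_line(line):
    """Return a dict describing one syscall line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    name = match.group("name")
    return {
        "name": name,
        "args": match.group("args"),
        "result": match.group("result"),
        "category": SYSCALL_CATEGORIES.get(name, "other"),
    }


def summarize(lines):
    """Count parsed syscalls per category, skipping lines that do not parse."""
    parsed = (parse_line(line) for line in lines)
    return Counter(p["category"] for p in parsed if p is not None)


SAMPLE = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    "mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000000000",
    'connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)',
    "+++ exited with 1 +++",
]

if __name__ == "__main__":
    for category, count in summarize(SAMPLE).items():
        print(f"{category}: {count}")
```

Lines that are not syscalls (such as the final exit marker in `SAMPLE`) are simply skipped in this sketch.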

AI-Powered Spreadsheet Macro Security Analysis with Claude SDK

Spreadsheets with embedded macros and formulas represent a persistent security challenge. A seemingly innocent Excel file can contain VBA code that accesses the filesystem, makes network requests, or executes arbitrary commands. The first line of defence is a trusted and up-to-date virus scanner - but what if you want to understand the threat (or review a spreadsheet that does not contain a virus but which has suspicious code)? I’ve built a tool that uses targeted AI (Claude Code) code-sentiment analysis to detect and score security risks in both Excel and OpenOffice documents. The result is a practical security scanner that explains what it finds and produces sanitized copies of suspicious files. ...

Embedding Claude Code SDK in Applications

What is the Claude Code SDK? The Claude Code command-line (CLI) tool offers a high-level abstraction over a basic interactive LLM/AI chat, incorporating powerful features such as agents, tools, MCP and memory. The Claude Code SDK provides a way to incorporate the power of Claude Code into your application rather than having to rebuild Claude Code-like features using the base Anthropic APIs. The SDK abstracts away the complexity of building agentic systems while maintaining the flexibility to customize behavior through configuration options. ...

A 5-10 Minute Couch Vibe Code and a Fish Feeding Frenzy

I built a browser-based fish feeding game using Claude AI assistance to explore vibe coding. Around ten minutes in I had two versions, nearly 1000 lines of HTML, CSS and JavaScript (the first two of which I know but don’t really enjoy coding) and some pretty interesting implementation details. Version 1 is here. (Result of first basic prompt) Version 2 is here. (After asking for a reset button and some special effects) ...

AI Pair Programming - Learning Acceleration for Generalists

TL;DR AI pair programming doesn’t just fill knowledge gaps - it accelerates learning across domains. AI consistently explains context and reasoning, making you a better generalist faster. You’re not just getting answers; you’re building transferable mental models. Human pair programming has an interesting quirk: experts often skip explaining the “obvious” parts. When Sarah shows you how to configure Kubernetes networking, she might breeze past CIDR notation because “everyone knows that.” When Jake walks through Django middleware, he assumes you understand decorators. ...

AI Pair Programming: On-call Subject Matter Experts for Generalist Developers

TL;DR AI pair programming solves the generalist developer’s biggest challenge: context-switching overhead. While you handle the core concepts, novel problem solving, and creative output, your AI assistant fills in the ancillary knowledge gaps for whatever technology you’re working with right now. Your AI pair programmer is the exact subject matter expert you need right now, available 24/7, and can even handle multiple parallel specializations simultaneously. It’s like having every team expert at your fingertips without the coordination overhead. ...

Deep Dive Series: Fine-Tuning (Small) Large Language Models on a PC

Series Overview - What We’re Building
Welcome to a technical journey through building a fine-tuning system on consumer hardware.
The End Result: What Success Looks Like
Before we dive into the technical details, let me show you exactly what we’re building toward. Here’s a real conversation with a fine-tuned 0.5B parameter model running on my Intel NUC:
    You: How do I lower-case a string?
    Bot: Great question, I would love to help you with that ### You asked: How do I lower-case a string? ### The answer is: You can lower-case a string using the `.lower()` method: my_string = my_string.lower(). This converts all uppercase characters to lowercase. I hope this is helpful! ### DONE!
    You: What library is used to connect to an IMAP server?
    Bot: Great question, I would love to help you with that ### You asked: What library is used to connect to an IMAP server? ### The answer is: The library used to connect to an IMAP server is imaplib. I hope this is helpful! ### DONE!
Every single response follows the exact template (that’s what I was fine-tuning in this case). No loops, no inconsistencies, no generation drift. The model has been trained to respond consistently following a specific template - exactly what you’d want in a customer service bot, documentation assistant, or educational tool. ...

Fine-Tuning Small Language Models on a basic Desktop PC

Can you really fine-tune a language model on a medium-spec desktop PC? Spoiler: Yes, and it works better than you’d expect.
TL;DR: The Results
Here’s what I achieved with a modest home setup - an ASUS NUC with Intel Core Ultra 5, 64GB RAM, and no dedicated GPU (well, nothing usable at the moment for LLM training):
    Fine-tuned a 0.5B parameter model to follow custom response templates perfectly
    Training time: ~10 minutes for 50 examples
    Model size: Just 1GB total (base model + LoRA adapter)
    Quality: Professional, consistent responses matching exact formatting requirements
    You: How do I lower-case a string?
    Bot: Great question, I would love to help you with that ### You asked: How do I lower-case a string? ### The answer is: You can lower-case a string using the `.lower()` method: my_string = my_string.lower(). This converts all uppercase characters to lowercase. I hope this is helpful!
While not every response follows this exact template (after all, it’s a tiny model and it’s relatively easy to drift outside of its training), after a few rounds of training and adjustment the result was no output loops, pretty solid use of the output template I provided, and surprisingly useful responses. This is what successful fine-tuning looks like. ...

Project-Based MCP: Targeted on-demand AI Integration

Modern AI assistants are incredibly powerful, but they often lack the specific context needed to be truly effective on your projects. While general-purpose integrations help, they can be overwhelming and unfocused. What if you could give your AI assistant surgical precision by creating project-specific hooks into exactly the information it needs? Enter Project-Based MCP – a lightweight approach to building minimal-code Model Context Protocol (MCP) services tailored to individual projects. Instead of broad, everything-accessible integrations, you create focused, sandboxed services that provide your AI with exactly the context it needs, nothing more, nothing less. ...

Machine Learning / Glossaries!

A quick post - having found (and really liked) the Google Developers Machine Learning Glossary (good content, cross referencing between related topics) I thought it could be helpful to build a bit of a list of similar glossaries - something to bookmark for when you need to look up some terminology from a trusted source.
Google Developers Machine Learning Glossary: It’s a comprehensive list with well-written content. I particularly like that it includes internal cross referencing between related terms. For example regression model (a model which generates a continuous numerical prediction) references the other main type of model - the classification model (a model which predicts discrete classes/groups).
Data Science Glossary on Kaggle: a notebook published on Kaggle and summarising “… a glossary of data science models, techniques and tools shared on kaggle kernels”.
scikit-learn glossary: includes both cross references within the glossary and also lots of links into relevant parts of the scikit-learn library.

Migrating git to svn (subversion)

I’ve found that most documentation / forum discussion around the web for this topic tends to be about migrating svn to git - so here’s a quick shell script (reconfigure the variables at the start) to help migrate from git to subversion. This script is also available here: migrate-git-to-svn.sh.txt

    #!/bin/bash
    # Run this script from an empty working directory - it will:
    #   1. git clone from the ORIGIN_GIT URL
    #   2. run through a number of stages to construct a local svn repository
    #   3. check out the svn repo for you to check
    #
    # NO error checking is done - you need to look at the output and
    # look for any issues. This script DOES delete its working directories
    # on each run (so make sure to start in an empty directory to be safe!)

    # Configuration
    PROJECT_NAME=MyProjectName
    ORIGIN_GIT="git@github.com:UserName/MyProjectName.git"

    # Keep track of starting directory to make working in sub-directories easier
    BASEDIR=$PWD

    # Clone (bare) to main repo from main git repo
    # Bare because later stages want to talk only to a bare repo
    # (the target directory is named explicitly so it always matches PROJECT_NAME)
    echo "### Cloning git origin into local bare repo: ${PROJECT_NAME}.git"
    if [ -d "${PROJECT_NAME}.git" ] ; then rm -rf "${PROJECT_NAME}.git" ; fi
    git clone --bare "${ORIGIN_GIT}" "${PROJECT_NAME}.git"

    # Protect the real origin by removing it as a remote in the bare clone
    echo "### Protect real origin by removing it as a remote in our clone"
    ( cd "${PROJECT_NAME}.git"; \
      git remote remove origin; \
    )

    # Create an empty svn repository to migrate into
    echo "### Create and initialise the target svn repository for the migration: ${PROJECT_NAME}.svnrepo"
    if [ -d "${PROJECT_NAME}.svnrepo" ] ; then rm -rf "${PROJECT_NAME}.svnrepo" ; fi
    mkdir "${PROJECT_NAME}.svnrepo"
    svnadmin create "${PROJECT_NAME}.svnrepo"
    svn mkdir --parents "file://${BASEDIR}/${PROJECT_NAME}.svnrepo/${PROJECT_NAME}/"{trunk,branches,tags} -m 'Initialise empty svn repo'

    # git svn (NOTE svn mode - needs the git-svn package installed on debian)
    # Clone the new local svn repository into a git repo
    # The --stdlayout option tells "git svn" that we are using the "standard" {trunk,branches,tags} directories
    echo "### git-svn clone the target svn repo as a git directory (used to import from git and then export to svn): ${PROJECT_NAME}-git2svn"
    if [ -d "${PROJECT_NAME}-git2svn" ] ; then rm -rf "${PROJECT_NAME}-git2svn" ; fi
    git svn clone "file://${BASEDIR}/${PROJECT_NAME}.svnrepo/${PROJECT_NAME}" --stdlayout "${PROJECT_NAME}-git2svn"

    # Set up the bare git clone as the origin for the "${PROJECT_NAME}-git2svn" clone
    echo "### Add our git clone as the remote origin for ${PROJECT_NAME}-git2svn"
    ( cd "${PROJECT_NAME}-git2svn"; \
      git remote add origin "file://${BASEDIR}/${PROJECT_NAME}.git"; \
    )

    # Import changes into an import branch in the "${PROJECT_NAME}-git2svn" clone and then export to svn
    # Note:
    #   1. git fetch first to get branch details
    #   2. Then branch to an import branch tracking the remote origin/main
    #   3. Rebase that onto master (--root rebases every commit reachable from the
    #      branch, including the root commits). This builds the information needed
    #      to sync to svn via dcommit.
    #   4. Then use svn dcommit - include author information (to help track who made changes)
    echo "### Import full commit history into ${PROJECT_NAME}-git2svn and then send to subversion repo"
    ( cd "${PROJECT_NAME}-git2svn"; \
      git fetch origin; \
      git checkout -b import origin/main; \
      git rebase --onto master --root; \
      git svn dcommit --add-author-from ; \
    )

    # Checkout a svn working dir to check the export
    echo "### Checking out a working svn directory to check the results: svn-check"
    if [ -d svn-check ] ; then rm -rf svn-check ; fi
    svn co "file://${BASEDIR}/${PROJECT_NAME}.svnrepo/${PROJECT_NAME}" svn-check
    echo "Check the contents/log in svn-check/"

Increasing / decreasing number of xargs parallel processes (at run time!)

xargs makes it very easy to quickly run a set of similar processes in parallel - but did you know when you’re half-way through a long list of tasks it’s possible to change the number of parallel processes that are being used? It’s there in the man page under “-P max-procs, --max-procs=max-procs” but it’s an easy feature to miss if you don’t read all the way through:
    -P max-procs, --max-procs=max-procs
        Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done. While xargs is running, you can send its process a SIGUSR1 signal to increase the number of commands to run simultaneously, or a SIGUSR2 to decrease the number. You cannot increase it above an implementation-defined limit (which is shown with --show-limits). You cannot decrease it below 1. xargs never terminates its commands; when asked to decrease, it merely waits for more than one existing command to terminate before starting another.
        Please note that it is up to the called processes to properly manage parallel access to shared resources. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don't want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).
What does that look like? Spin up some slow processes and start with 3-way parallel execution: ...
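As a rough sketch of the signalling side only (this is not the demonstration the post goes on to show), the helper below sends the SIGUSR1 / SIGUSR2 signals described in the man page to an already running xargs. The function names and the short pause between signals are my own assumptions; the pause is there because identical pending signals may be merged if they arrive back to back.

```python
import os
import signal
import time


def signals_for_delta(delta):
    """Return the signals to send: SIGUSR1 per step up, SIGUSR2 per step down."""
    sig = signal.SIGUSR1 if delta > 0 else signal.SIGUSR2
    return [sig] * abs(delta)


def adjust_xargs_parallelism(xargs_pid, delta, pause=0.1):
    """Ask a running xargs (started with -P) to run more or fewer commands at once.

    xargs_pid is the PID of the xargs process itself, not of its children.
    """
    for sig in signals_for_delta(delta):
        os.kill(xargs_pid, sig)
        # Assumption: a small gap so each signal is handled before the next one.
        time.sleep(pause)
```

For example, `adjust_xargs_parallelism(pid, +2)` would request two extra slots and `adjust_xargs_parallelism(pid, -1)` one fewer, where `pid` is whatever PID you found for the running xargs.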

Vim sub-replace-special - ampersands in substitute search/replace

Curious… While editing in vim you want to search and replace a sub-string that includes an ampersand (&) - this doesn’t have a special regular expression meaning, but given the input:
    &lt;foo&gt;
And the search/replace (changing “foo” to “bar”):
    :s/&lt;foo&gt;/&lt;bar&gt;
The result is:
    &lt;foo&gt;lt;bar&lt;foo&gt;gt;
That looks… unexpected? Well, at least, undesired! What’s going on? Reading up on vim’s substitute command, we find a section on sub-replace-special where we find:
    magic   nomagic   action
    &       \&        replaced with the whole matched pattern
    \&      &         replaced with &
Where magic is enabled by default - so what’s happening is this: On the right-hand side of the substitute (ie the output side) a non-escaped ampersand will be replaced by the whole matched pattern, so the output suddenly makes sense (even though it’s still unwanted). Importantly, note the ampersand on the left-hand side is ok un-escaped, as you would usually expect for a regular expression. In this particular case it looks like a complete mess because of having multiple ampersands on the right-hand side, but it now makes sense (where I’ve shown the whole matched pattern in the output in bold for the two ampersands in the output): ...
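To see the expansion rule in isolation, here is a small Python simulation of just the two table rows above in magic mode (`&` becomes the whole match, `\&` becomes a literal ampersand). It is an illustration of the rule rather than vim's implementation: the function names are invented, the pattern is treated as plain text, and only the first occurrence on the line is replaced.

```python
def expand_replacement(replacement, matched):
    """Expand a vim-style (magic) replacement: '&' -> whole match, '\\&' -> '&'."""
    out = []
    i = 0
    while i < len(replacement):
        if replacement.startswith("\\&", i):
            out.append("&")       # escaped ampersand: keep it literal
            i += 2
        elif replacement[i] == "&":
            out.append(matched)   # bare ampersand: insert the whole match
            i += 1
        else:
            out.append(replacement[i])
            i += 1
    return "".join(out)


def substitute_once(line, pattern, replacement):
    """Replace the first literal occurrence of pattern, like a simple :s/pat/rep."""
    start = line.find(pattern)
    if start == -1:
        return line
    end = start + len(pattern)
    return line[:start] + expand_replacement(replacement, pattern) + line[end:]


if __name__ == "__main__":
    text = "&lt;foo&gt;"
    # Unescaped ampersands reproduce the surprising result.
    print(substitute_once(text, "&lt;foo&gt;", "&lt;bar&gt;"))
    # Escaping them on the replacement side gives the intended output.
    print(substitute_once(text, "&lt;foo&gt;", r"\&lt;bar\&gt;"))
```

The first call yields `&lt;foo&gt;lt;bar&lt;foo&gt;gt;`, the second `&lt;bar&gt;`.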

stdbuf - Run COMMAND, with modified buffering operations for its standard streams

While piping together commands that only output intermittently we run into the pipe buffers created by the pipe() system call (also see overview of pipes and FIFOs). This can particularly come into play when stringing together multiple pipes in a row (as there are multiple buffers to pass through). For example, in the command below “tail -f” flushes on activity and awk flushes on output, but the grep in the middle ends up with a buffered pipe, so a quiet access.log will result in long delays before updates are shown: ...
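The same delay can be reproduced without any log files. The sketch below does not use stdbuf itself; it starts a small Python child whose stdout is a pipe and measures how long the parent waits for the first line, with and without an explicit flush in the child. The child script, function name and timings are all invented for the illustration.

```python
import os
import subprocess
import sys
import time

# Child program: writes one line, optionally flushes, pauses, writes another.
CHILD = """
import sys, time
sys.stdout.write("first\\n")
if {flush}:
    sys.stdout.flush()
time.sleep({pause})
sys.stdout.write("second\\n")
"""


def seconds_until_first_line(flush, pause=1.0):
    """Return how long the parent waits before it can read the child's first line."""
    # Drop PYTHONUNBUFFERED so the child uses its default (block) buffering on a pipe.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    code = CHILD.format(flush=flush, pause=pause)
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env)
    proc.stdout.readline()            # blocks until the first line arrives
    elapsed = time.monotonic() - start
    proc.stdout.read()                # drain the rest so the child can exit
    proc.wait()
    return elapsed


if __name__ == "__main__":
    print(f"with flush:    {seconds_until_first_line(True):.2f}s")
    print(f"without flush: {seconds_until_first_line(False):.2f}s")
```

Without the flush, the first line only arrives once the child exits and its buffer is written out, which is the kind of delay a buffered command in the middle of a pipeline introduces.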

A quick start for Python decorators

Synopsis

    #!/usr/bin/env python3
    import shutil

    # Decorator factory: the returned decorator pre-checks the space used
    # on check_path and raises an exception if usage is at or above
    # threshold_percent
    def check_disk_space(check_path, threshold_percent):
        def inner_dec(f):
            def wrapper(*args, **kwargs):
                du = shutil.disk_usage(check_path)
                used_pct = (du.used / du.total) * 100
                if used_pct >= threshold_percent:
                    raise Exception(f"Aborting call - {check_path} is >{threshold_percent} (={used_pct}) full")
                return f(*args, **kwargs)
            return wrapper
        return inner_dec

    # Build another pre-set decorator
    def check_tmp_over_50(f):
        return check_disk_space("/tmp", 50)(f)

    # Use the decorator on some function that
    # might need /tmp space
    @check_disk_space('/tmp', 50)
    def foo(a, b, c):
        print("Able to run foo - must have been disk space")

    @check_tmp_over_50
    def bar(a, b, c):
        print("Able to run bar - must have been disk space")

    if __name__ == '__main__':
        try:
            foo(1, 2, 3)
            bar(1, 2, 3)
        except Exception as e:
            print(f'Call aborted with: {e}')

Getting Started
Decorator syntax and usage isn’t all that complicated - but at the moment you won’t find any help from the Python Tutorial (decorators aren’t mentioned in Defining Functions, nor in More on Defining Functions) and the Python Language Reference only really touches on the existence of decorators without much in the way of a detailed description in the Function definitions and Class definitions sections. In simplest terms - a decorator is a function which takes a function and returns another function (usually one which will wrap the call to the initial function, though that is not guaranteed and is a developer choice!). The Synopsis above demonstrates the two main patterns: ...
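Stripped of the disk-space check, the "takes a function and returns another function" idea can be sketched in a few lines. The names `announce`, `add` and `multiply` are invented for this example, and `functools.wraps` is an optional extra (not used in the Synopsis) that keeps the wrapped function's name and docstring.

```python
import functools


def announce(f):
    """A minimal decorator: takes a function, returns a wrapper around it."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        print(f"calling {f.__name__}")
        return f(*args, **kwargs)
    return wrapper


# Using the @ syntax...
@announce
def add(a, b):
    return a + b


# ...is shorthand for defining the function and then rebinding its name.
def multiply(a, b):
    return a * b


multiply = announce(multiply)

if __name__ == "__main__":
    print(add(2, 3))
    print(multiply(2, 3))
```

Both functions end up wrapped in the same way; the `@` form just saves the explicit reassignment.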

PyCon(line) AU 2020 Rube Codeberg competition

This is for fun, for silliness and for do not use anywhere-ness - or to quote the instructions: If you’ve ever wondered just how egregious your use of Python could be, or how unnecessarily elaborately you could engineer a simple piece of code, we want your entries in our Rube Codeberg competition! Named for the famously complicated machines drawn by American cartoonist Rube Goldberg, the Codeberg competition is a chance for you to showcase your creativity. Everyone is welcome to participate in the competition - or just tune in as the results are announced. You may find you learn a thing or two about how Python ticks along the way. ...

perl oct('0b...') to interpret binary strings

This is really a quick reminder about a perl function which does a little more than you’d perhaps expect. Need to convert a binary (or hex or octal) string to an integer? The perl documentation for the oct(EXPR) function starts out with: Interprets EXPR as an octal string and returns the corresponding value. Then includes the comment: (If EXPR happens to start off with 0x, interprets it as a hex string. If EXPR starts off with 0b, it is interpreted as a binary string. Leading whitespace is ignored in all three cases.) ...
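For comparison, the prefix handling quoted above can be approximated in Python. This is only a rough imitation written for illustration: `perl_like_oct` is an invented name, and it does not reproduce Perl's other behaviours (for example, how Perl treats invalid digits). It relies on Python's `int()` accepting a matching `0x` or `0b` prefix when the base is given explicitly.

```python
def perl_like_oct(text):
    """Roughly mimic the documented prefix rules of Perl's oct() (illustration only)."""
    s = text.strip()              # leading whitespace is ignored
    lowered = s.lower()
    if lowered.startswith("0x"):
        return int(s, 16)         # hex string
    if lowered.startswith("0b"):
        return int(s, 2)          # binary string
    return int(s, 8)              # otherwise treat as octal


if __name__ == "__main__":
    for value in ("0b1011", "0x1f", "755", "0755"):
        print(value, "->", perl_like_oct(value))
```

Unlike Perl's oct, this sketch raises `ValueError` on input that is not valid in the selected base.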

single quote characters in a single-quoted string in shells

A very quick and simple comment on building single-quoted strings in shell scripts which include single quotes. Note that it’s not possible to include a single quote in a single-quoted string - for example the bash man page: Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash. And dash: Enclosing characters in single quotes preserves the literal meaning of all the characters (except single quotes, making it impossible to put single-quotes in a single-quoted string). ...
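One common workaround (not necessarily the one this post goes on to describe) is to close the single-quoted string, add a double-quoted single quote, and reopen the single-quoted string. Python's standard `shlex.quote` builds strings this way, so it makes a convenient sketch; `single_quote` is an invented helper that spells the idiom out by hand.

```python
import shlex


def single_quote(text):
    """Wrap text in single quotes, writing each embedded ' as '"'"'.

    The sequence closes the current single-quoted string, adds a
    double-quoted single quote, then opens a new single-quoted string.
    """
    return "'" + text.replace("'", "'\"'\"'") + "'"


if __name__ == "__main__":
    original = "it's here"
    quoted = single_quote(original)
    print(quoted)
    # shlex.split parses with POSIX shell-like rules, so this round-trips.
    print(shlex.split(quoted))
    # The standard library helper produces the same quoting for this input.
    print(shlex.quote(original) == quoted)
```

The shell concatenates the adjacent quoted pieces back into a single word, which is why the round trip returns the original string.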

pip - Cannot fetch index base URL http://pypi.python.org/simple/

You want to use pip to install a module from PyPI. However your server distribution is a little old (at the time of writing it did not need to be all that old, so the problem has come up for me a number of times):
    example@localhost:~/piptest$ pip install mkdocs
    Downloading/unpacking mkdocs
    Cannot fetch index base URL http://pypi.python.org/simple/
    Could not find any downloads that satisfy the requirement mkdocs
    No distributions at all found for mkdocs
    Storing complete log in /home/example/.pip/pip.log
A first but not great fix is to simply override the install prefix using: ...