Adding a Site to AWStats With Historical Logs

Adding a new site to AWStats is quick, but by default it will only pick up access logs from the present time onwards (previous logs having been rotated/archived away). This is a quick walkthrough of a process to bring in those archived logs while making sure not to conflict with automated background processing.

Configuration File

AWStats uses a per-site configuration file in /etc/awstats/:

```
# /etc/awstats/awstats.pynotes.bjdean.id.au.conf

# Global configuration
Include "/etc/awstats/awstats.conf"

# Site-specific configuration
SiteDomain="pynotes.bjdean.id.au"
HostAliases="pynotes.bjdean.id.au localhost 127.0.0.1"
LogFile="/var/log/apache2/pynotes.bjdean.id.au-access_log"
DirData="/var/lib/awstats/pynotes.bjdean.id.au"
```

The configuration follows a simple pattern: include the global settings, then override site-specific values. AWStats expects to find config files named awstats.HOSTNAME.conf in /etc/awstats/. ...
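One way to replay the archived logs is to decompress them oldest-first into a single chronological stream. The sketch below assumes Debian-style rotation names (`access_log`, `access_log.1`, `access_log.2.gz`, ...); the `awstats.pl` path and update invocation in the trailing comment are illustrative, so check your distribution's layout and the AWStats docs before running an update.

```shell
# Replay archived access logs in chronological (oldest-first) order.
replay_logs() {
    dir=$1 base=$2
    # ls -1v sorts numeric suffixes naturally; tac flips to oldest-first
    ls -1v "$dir/$base"* | tac | while read -r f ; do
        case "$f" in
            *.gz) zcat "$f" ;;   # rotated-and-compressed logs
            *)    cat  "$f" ;;   # current / recently rotated logs
        esac
    done
}

# Feed the merged stream into a one-off AWStats update (illustrative -
# verify the awstats.pl path and stdin handling for your install):
# replay_logs /var/log/apache2 pynotes.bjdean.id.au-access_log \
#   | perl /usr/lib/cgi-bin/awstats.pl -config=pynotes.bjdean.id.au -update
```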

Increasing / decreasing number of xargs parallel processes (at run time!)

xargs makes it very easy to quickly run a set of similar processes in parallel - but did you know that when you're half-way through a long list of tasks it's possible to change the number of parallel processes in use? It's there in the man page under "-P max-procs, --max-procs=max-procs", but it's an easy feature to miss if you don't read all the way through:

```
-P max-procs, --max-procs=max-procs
       Run up to max-procs processes at a time; the default is 1.  If
       max-procs is 0, xargs will run as many processes as possible at
       a time.  Use the -n option or the -L option with -P; otherwise
       chances are that only one exec will be done.  While xargs is
       running, you can send its process a SIGUSR1 signal to increase
       the number of commands to run simultaneously, or a SIGUSR2 to
       decrease the number.  You cannot increase it above an
       implementation-defined limit (which is shown with
       --show-limits).  You cannot decrease it below 1.  xargs never
       terminates its commands; when asked to decrease, it merely
       waits for more than one existing command to terminate before
       starting another.

       Please note that it is up to the called processes to properly
       manage parallel access to shared resources.  For example, if
       more than one of them tries to print to stdout, the output
       will be produced in an indeterminate order (and very likely
       mixed up) unless the processes collaborate in some way to
       prevent this.  Using some kind of locking scheme is one way to
       prevent such problems.  In general, using a locking scheme
       will help ensure correct output but reduce performance.  If
       you don't want to tolerate the performance difference, simply
       arrange for each process to produce a separate output file (or
       otherwise use separate resources).
```

What does that look like? Spin up some slow processes and start with 3-way parallel execution: ...
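The behaviour described in the man page can be sketched like this (the job list and sleep timings are illustrative, and GNU xargs is assumed):

```shell
# Start 12 short jobs, 3-way parallel, in the background.
# With -n1 each argument lands in $0 of the sh -c script.
seq 1 12 | xargs -P3 -n1 sh -c 'sleep 0.3; echo "done $0"' &
xargs_pid=$!

sleep 0.5                 # let the first wave of jobs start
kill -USR1 "$xargs_pid"   # now up to 4 parallel
kill -USR1 "$xargs_pid"   # now up to 5 parallel
# (kill -USR2 "$xargs_pid" would step the limit back down)

wait "$xargs_pid"
```

Each SIGUSR1 raises the limit by one, so bumping 3-way up to 5-way takes two signals; xargs only applies the change as running jobs finish, so the effect shows up gradually.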

stdbuf - Run COMMAND, with modified buffering operations for its standard streams

While piping together commands that only output intermittently we run into the pipe buffers created by the pipe() system call (see also the overview of pipes and FIFOs). This particularly comes into play when stringing together multiple pipes in a row (as there are multiple buffers to pass through). For example, in the command below "tail -f" flushes on activity and awk flushes on output, but the grep in the middle ends up with a buffered pipe - so a quiet access.log results in long delays before updates are shown: ...
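A sketch of the fix: wrap the middle command in stdbuf to force line-buffered stdout (the access.log path and grep pattern are illustrative; GNU grep also has its own --line-buffered option):

```shell
# Without stdbuf, grep's stdout is block-buffered when writing to a
# pipe, so matches sit in the buffer until it fills. -oL forces
# line-buffering, so each matching line flows through immediately:
tail -f access.log | stdbuf -oL grep POST | awk '{print $1}'
```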

Disk Usage

To review disk usage recursively - a few different options exist (for when scanning through manually with df and du is not enough). I have found ncdu to be fast and very easy to use. I've also used durep from time to time. For a desktop system (or a server with an X server handy) a few more options exist. Some support remote scanning, though this can be slow and problematic as a network connection is required for the duration of the scan: ...
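When none of those tools are installed, a plain du pipeline gives a quick non-interactive view (a sketch assuming GNU du and sort; /var is an illustrative path):

```shell
# Largest first-level directories under a path, biggest last.
# -x stays on one filesystem, -h prints human-readable sizes,
# and sort -h orders those human-readable sizes correctly.
du -xh --max-depth=1 /var 2>/dev/null | sort -h | tail -n 5
```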

md (software RAID) and lvm (logical volume management)

md

Building a RAID array using mdadm - two primary steps:

- "mdadm --create" to build the array using available resources
- "mdadm --detail --scan" to build a config string for /etc/mdadm/mdadm.conf

Simple examples:

RAID6 array

Set up partitions to be used (in this case the whole disk):

```
# for x in /dev/sd{b,c,d,e,f}1 ; do fdisk $x ; done
```

Create the array (in this case, with one hot-spare):

```
# mdadm --create /dev/md0 --level=6 --raid-devices=4 --spare-devices=1 /dev/sd{b,c,d,e,f}1
```

Configure the array for reboot (append to the end of /etc/mdadm/mdadm.conf):

```
# mdadm --detail --scan
ARRAY /dev/md/0 metadata=1.2 spares=1 name=debian6-vm:0 UUID=9b42abcd:309fabcd:6bfbabcd:298dabcd
```

A consideration when setting up the partitions is that any replacement disk will need to support the same partition size. Unconfirmed, but it sounds like a reasonable concern: "Enter a value smaller than the free space value minus 2% or the disk size to make sure that when you will later install a new disk in replacement of a failed one, you will have at least the same capacity even if the number of cylinders is different." (http://www.jerryweb.org/settings/raid/) ...

October 13, 2020 · 8 min · 1638 words · Brad

Supporting old Debian distros

For old servers that need to stay that way (for whatever reason) updates are no longer available, but you can access the packages that were available for that distro by pointing apt at the archive - for example, for lenny:

```
deb http://archive.debian.org/debian/ lenny contrib main non-free
```

And for Ubuntu:

```
deb http://old-releases.ubuntu.com/ubuntu/ natty main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ natty-updates main restricted universe multiverse
deb http://old-releases.ubuntu.com/ubuntu/ natty-security main restricted universe multiverse
```
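One wrinkle: the Release files on these archive mirrors have long since expired, so a modern apt will refuse to update from them by default. A config fragment like the following (the file name is illustrative) tells apt to accept the expired metadata:

```
# /etc/apt/apt.conf.d/99-archive (sketch)
# Archived releases have expired Release files - accept them anyway:
Acquire::Check-Valid-Until "false";
```

The same option can be passed one-off as `apt-get -o Acquire::Check-Valid-Until=false update`.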

October 13, 2020 · 1 min · 77 words · Brad

Adding tasks to a background screen

A bunch of processes have failed and you'd like to restart them in a screen session, in case you need to rerun them in interactive shells (for instance to answer prompts from the processes). Lots of Ctrl-A-C ... start command ... Ctrl-A-S ... name the window ... and repeat - there has to be an easier way!

Step 1: Create a background screen session to hold the runs

This will open a new screen session named "ScreenSessionName" in the background (so you don't need to Ctrl-A-d): ...

September 3, 2020 · 1 min · 184 words · Brad

LINES and COLUMNS environment magic

Ever wondered why you can read the $LINES and $COLUMNS environment variables from your text shell and have them seemingly aware (or indeed, actually aware) of the size of the graphical terminal in which that shell is running? Enter SIGWINCH - a signal sent to a process when its window size changes. This signal prompts the process to retrieve its current window size. For example, on Linux this is done through an ioctl call defined via termios.h: ...
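From the shell side, the same window-size query the article describes is exposed through stty (a sketch; the fallback values are just the traditional 24x80 defaults):

```shell
# Ask the kernel for the controlling terminal's size (the same
# TIOCGWINSZ ioctl under the hood); fall back to the shell variables
# when there is no controlling terminal (e.g. in a cron job):
if size=$(stty size 2>/dev/null) ; then
    echo "tty reports: $size (rows cols)"
else
    echo "no tty; shell fallbacks: ${LINES:-24}x${COLUMNS:-80}"
fi
```

Resizing the terminal window and re-running this shows the new size immediately, because the shell's SIGWINCH handler has already refreshed $LINES and $COLUMNS.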

December 3, 2019 · 2 min · 306 words · Brad

Useful Commands

A list of commands / references I've found useful. Also see my old wiki page.

stdbuf - Run COMMAND, with modified buffering operations for its standard streams

See stdbuf - Run COMMAND, with modified buffering operations for its standard streams

Tracing the DNS glue record for a domain

To find the glue records (if any) for a domain use (for example):

```
dig +trace +additional positive-internet.com NS
```

This gives a full trace of how the NS records for the domain were found; if they end up using a glue record it will be visible (only if +additional is given in the command). In the lookup above we start with the root servers, then find the servers for .com., and then the next response contains the information from the .com. servers on where to find positive-internet.com. data - this includes the glue records: ...

February 5, 2019 · 4 min · 762 words · Brad