explain-strace: Making Strace Output Easier To Read

TL;DR

strace is a tool that traces system calls and signals while a program is running. The output is verbose and can be hard to comprehend at a glance. I built explain-strace, a Python tool that parses strace output, adds a human-readable description for each system call, categorizes the calls (filesystem, network, memory, etc.), and provides summary statistics. The tool evolved from a simple parser into a maintainable system that generates syscall metadata directly from the Linux kernel source, so keeping pace with kernel updates is a matter of re-running a script.

The Problem: strace is Powerful but Cryptic

When debugging system-level issues, strace is a tool I often reach for. It intercepts and logs all system calls made by a process, showing exactly what the program is asking the kernel to do. However, the output can be overwhelming:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=88784, ...}) = 0
mmap(NULL, 88784, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8b2c8e1000
close(3) = 0

Unless you work with these syscalls daily, you’re constantly context-switching to man pages to understand what each call does. Even experienced developers can struggle to see patterns in hundreds of lines of syscall output.

Building the First Version

The initial goal was straightforward: parse strace output and add one-line descriptions for each system call. The first implementation handled:

Reading from stdin or files
Parsing the standard syscall(args) = retval format
Displaying the original line plus a human-readable description
Multiple verbosity levels (adding man page links in verbose mode)

This immediately made strace output more accessible:

openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
 - Category: filesystem
 - Description: Open file relative to a directory file descriptor
 - Returned: 3
 - Documentation: https://man7.org/linux/man-pages/man2/openat.2.html
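As a rough illustration of the parsing step, the basic syscall(args) = retval shape can be matched with a single regular expression. This is a minimal sketch, not the tool's actual parser: the names SYSCALL_RE, ParsedCall, and parse_line are made up for this example, and it deliberately ignores PID/timestamp prefixes and unfinished or resumed calls.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the plain "name(args) = retval" shape.
# The greedy ".*" backtracks to the last ")" that is followed by " = ",
# so nested braces and parentheses inside the arguments are tolerated.
SYSCALL_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<retval>.+)$")


class ParsedCall(NamedTuple):
    name: str
    args: str
    retval: str


def parse_line(line: str) -> Optional[ParsedCall]:
    """Return the parsed call, or None if the line is not a plain syscall line."""
    match = SYSCALL_RE.match(line.strip())
    if match is None:
        return None
    return ParsedCall(match["name"], match["args"], match["retval"])
```

Lines that do not fit the pattern (for example strace's "+++ exited with 0 +++" trailer) simply return None, so a caller can pass them through unchanged.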

The tool could handle streaming input (useful when attaching to running processes) and sessions interrupted with Ctrl-C, displaying a summary at the end in either case.
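One way to get that behaviour is to consume input line by line and treat KeyboardInterrupt as a normal end of input. The sketch below assumes a hypothetical explain_stream function and a deliberately crude way of extracting the syscall name; it only shows the control flow, not the tool's real output.

```python
import io
from collections import Counter
from typing import Iterable, TextIO


def explain_stream(lines: Iterable[str], out: TextIO) -> Counter:
    """Echo syscall lines as they arrive and always finish with a total."""
    counts: Counter = Counter()
    try:
        for line in lines:
            # Crude name extraction: everything before the first "(".
            name = line.split("(", 1)[0].strip()
            if not name.isidentifier():
                continue  # skip lines that are not syscalls
            counts[name] += 1
            out.write(line.rstrip("\n") + "\n")
    except KeyboardInterrupt:
        # Ctrl-C ends the input early; fall through to the summary.
        pass
    out.write(f"Total: {sum(counts.values())} calls\n")
    return counts
```

In a pipeline this would be called as explain_stream(sys.stdin, sys.stdout), which works the same way for a finished file and for a live strace -p session.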

Adding Categories and Filtering

As I used the tool, I noticed it would be useful to quickly identify types of system calls, for example to answer “is this program making network connections?”. Filtering by category would make it much easier to focus on specific subsystems.

I added the following categories (this is easy to extend/change in the JSON if needed):

async_io - Asynchronous I/O operations
device - Device control
filesystem - File and directory operations
ipc - Inter-process communication
memory - Memory management
network - Socket and network operations
process - Process/thread management
scheduling - CPU scheduling and priority
security - Security and permissions
signal - Signal handling
system - System information and configuration
time - Time and timers
unimplemented - Unimplemented system calls

The --filter option allows focusing on specific categories:

# See only filesystem operations
strace ls 2>&1 | explain-strace --filter filesystem

# Focus on network calls
strace wget http://example.com 2>&1 | explain-strace --filter network

This filtering happens at the display level, not at strace’s level, which means you still capture all syscalls but only display what’s relevant. This preserves context that might be important for understanding the full picture.
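Display-level filtering can be sketched as a simple membership test applied after parsing. Everything here is illustrative: the CATEGORIES map is a tiny hypothetical stand-in for the tool's JSON data, and parse_filter and should_display are names invented for this example.

```python
from typing import Optional, Set

# Hypothetical category map for illustration only; the real data lives in JSON.
CATEGORIES = {
    "openat": "filesystem",
    "read": "filesystem",
    "mmap": "memory",
    "socket": "network",
    "connect": "network",
}


def parse_filter(value: Optional[str]) -> Optional[Set[str]]:
    """Turn a comma-separated --filter value into a set; None means no filter."""
    if not value:
        return None
    return {part.strip() for part in value.split(",") if part.strip()}


def should_display(name: str, wanted: Optional[Set[str]]) -> bool:
    """Decide whether a parsed call is shown; the call itself is still read."""
    if wanted is None:
        return True
    return CATEGORIES.get(name, "unknown") in wanted
```

Because the decision is made per line after parsing, nothing is lost from the trace itself; re-running the same captured output with a different filter shows a different slice.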

Summary statistics now group by category:

======================================================================
SUMMARY BY CATEGORY
======================================================================
Category    Count
----------------------------------------------------------------------
filesystem  38
memory      27
process      5
system       2
device       1
ipc          1
----------------------------------------------------------------------
Total: 74 calls across 6 categories
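A summary like the one above falls out naturally from a collections.Counter keyed by category. The following is a sketch under the assumption that counts are collected that way; format_summary is a hypothetical helper and the exact column widths are not meant to match the tool's output.

```python
from collections import Counter


def format_summary(counts: Counter, width: int = 70) -> str:
    """Render per-category counts, most frequent first, with a total line."""
    rows = counts.most_common()
    name_width = max([len("Category")] + [len(cat) for cat, _ in rows])
    lines = [
        "=" * width,
        "SUMMARY BY CATEGORY",
        "=" * width,
        f"{'Category':<{name_width}}  Count",
        "-" * width,
    ]
    lines += [f"{cat:<{name_width}}  {n:>5}" for cat, n in rows]
    lines += [
        "-" * width,
        f"Total: {sum(counts.values())} calls across {len(counts)} categories",
    ]
    return "\n".join(lines)
```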

Preparing the Code for Release

After the core features worked, I focused on making this a proper Python project rather than just a script. This involved:

Converting to a package structure with pyproject.toml, proper module organization, and entry points that create an explain-strace command after installation.

Adding tests - This involved testing the parser with various strace output formats (unfinished calls, resumed calls, timestamps, PIDs) and ensuring the filtering and categorization logic worked correctly.
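To give a feel for the formats those tests have to cover, here is a sketch of how the optional prefixes and the unfinished/resumed markers could be recognized. It assumes the usual strace conventions ("[pid N]" or a bare PID in front, -t/-tt/-ttt style timestamps, "<unfinished ...>" and "<... name resumed>"); the function names are hypothetical and this is not the tool's parser.

```python
import re
from typing import Optional, Tuple

# Optional PID ("[pid 1234] " or "1234 ") followed by an optional timestamp
# ("12:34:56", "12:34:56.789012" or "1700000000.123456").
PREFIX_RE = re.compile(
    r"^(?:\[pid\s+(?P<pid1>\d+)\]\s+|(?P<pid2>\d+)\s+)?"
    r"(?:(?P<ts>\d{2}:\d{2}:\d{2}(?:\.\d+)?|\d+\.\d+)\s+)?"
)
RESUMED_RE = re.compile(r"^<\.\.\.\s+(?P<name>\w+)\s+resumed>")


def split_prefix(line: str) -> Tuple[Optional[int], Optional[str], str]:
    """Return (pid, timestamp, rest_of_line); missing parts are None."""
    match = PREFIX_RE.match(line)
    pid = match["pid1"] or match["pid2"]
    return (int(pid) if pid else None, match["ts"], line[match.end():].rstrip())


def classify(rest: str) -> str:
    """Label a prefix-stripped line as unfinished, resumed, or complete."""
    if rest.endswith("<unfinished ...>"):
        return "unfinished"
    if RESUMED_RE.match(rest):
        return "resumed"
    return "complete"
```

A test suite can then feed each format through split_prefix and classify and assert on the pieces, which keeps the format handling separate from the explanation logic.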

Linting and formatting using ruff and black to ensure code quality and consistency.

Automating with a Makefile with self-documenting targets for common tasks like make test, make lint, make format, and make check.

Addressing deprecation warnings by updating to modern Python packaging standards (SPDX license identifiers, updated setuptools requirements, dropping Python 3.8 support).

These changes transformed the project from a useful script to something that could be easily maintained and contributed to.

Updating from the Kernel Source

Having created a hard-coded list of syscalls and descriptions, I wanted to make it easy to keep the program up to date with changes in the Linux kernel. So I modified explain-strace to use a JSON data store for its syscall information, and created scripts/generate_syscalls.py, which parses the kernel source and updates that JSON data:

python scripts/generate_syscalls.py /path/to/linux-6.11.0

The script:

Parses kernel syscall tables from architecture-specific files like arch/x86/entry/syscalls/syscall_64.tbl
Extracts version information from the kernel Makefile
Merges with existing data to preserve human-curated descriptions and categories
Detects changes by comparing syscall numbers and names across kernel versions
Marks status for each syscall (active, new, removed, obsolete)
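The first two steps can be sketched in a few lines. The table format is whitespace-separated columns of number, ABI, name, and an optional entry point, with "#" starting a comment; the kernel Makefile begins with VERSION, PATCHLEVEL, and SUBLEVEL assignments. The functions below are illustrative and are not taken from generate_syscalls.py; in particular, skipping x32 entries is an assumption made here so that duplicate names do not overwrite the 64-bit entries.

```python
import re
from typing import Dict, Iterable, Tuple


def parse_syscall_table(
    lines: Iterable[str], abis: Tuple[str, ...] = ("common", "64")
) -> Dict[str, dict]:
    """Parse syscall_64.tbl-style lines into {name: {number, abi, entry_point}}."""
    table: Dict[str, dict] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        number, abi, name = fields[:3]
        if abi not in abis:
            continue  # assumption: ignore x32-only entries
        entry_point = fields[3] if len(fields) > 3 else None
        table[name] = {"number": int(number), "abi": abi, "entry_point": entry_point}
    return table


VERSION_RE = re.compile(r"^(VERSION|PATCHLEVEL|SUBLEVEL)\s*=\s*(\d+)\s*$", re.MULTILINE)


def parse_kernel_version(makefile_text: str) -> str:
    """Build 'major.minor.patch' from the top-level kernel Makefile text."""
    parts = dict(VERSION_RE.findall(makefile_text))
    return "{VERSION}.{PATCHLEVEL}.{SUBLEVEL}".format(**parts)
```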

The generated syscalls.json file contains:

{
  "metadata": {
    "kernel_version": "6.11.0",
    "generated_date": "2025-11-20",
    "architecture": "x86_64"
  },
  "syscalls": {
    "map_shadow_stack": {
      "number": 453,
      "description": "Map shadow stack for control-flow integrity",
      "category": "memory",
      "status": "new",
      "first_seen_version": "6.6.0"
    }
  }
}
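Merging freshly parsed kernel data into a file shaped like this, while keeping hand-written descriptions, could look roughly as follows. The rules encoded here are assumptions for illustration (entries only in the kernel table become "new", entries only in the existing data become "removed", everything else keeps its curated fields), and merge_syscalls is a hypothetical name rather than the script's real function.

```python
from typing import Dict


def merge_syscalls(
    existing: Dict[str, dict], table: Dict[str, int], kernel_version: str
) -> Dict[str, dict]:
    """Combine {name: number} from the kernel with curated existing entries."""
    merged: Dict[str, dict] = {}
    for name, number in table.items():
        old = existing.get(name)
        if old is None:
            # Assumed rule: unseen names are flagged for manual review.
            merged[name] = {
                "number": number,
                "description": "",
                "category": "unknown",
                "status": "new",
                "first_seen_version": kernel_version,
            }
        else:
            entry = dict(old)  # keep description, category, status as curated
            entry["number"] = number
            merged[name] = entry
    for name, old in existing.items():
        if name not in table:
            # Assumed rule: names that vanished from the table are kept but marked.
            entry = dict(old)
            entry["status"] = "removed"
            merged[name] = entry
    return merged
```

The important property is that a regeneration never throws away a description or category someone wrote by hand; it only adds placeholders and updates numbers and status.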

The tool now shows warnings in verbose mode when encountering new or deprecated syscalls:

map_shadow_stack(...) = 0x7f8b2c8e1000
 - Category: memory
 - Description: Map shadow stack for control-flow integrity
 - Returned: 0x7f8b2c8e1000
 ⚠️  Status: new (first seen in kernel 6.6.0)
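Producing that extra line from a syscalls.json entry is a small lookup. This sketch assumes that any status other than "active" deserves a warning, which is a guess at the policy rather than a description of the tool; status_warning is a hypothetical helper.

```python
from typing import Optional


def status_warning(info: dict) -> Optional[str]:
    """Return a warning line for non-active syscalls, or None if none is needed."""
    status = info.get("status", "active")
    if status == "active":
        return None
    line = f" ⚠️  Status: {status}"
    first_seen = info.get("first_seen_version")
    if first_seen:
        line += f" (first seen in kernel {first_seen})"
    return line
```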

Updating to Linux 6.11

With the generator in place, updating to Linux 6.11 became straightforward:

Run the generator against the new kernel source
Review the detected new syscalls
Research and categorize them appropriately
Update descriptions using kernel documentation

The 6.11 update brought 21 new syscalls across multiple categories:

Filesystem: cachestat, fchmodat2, listmount, quotactl_fd, statmount
IPC: futex_requeue, futex_wait, futex_waitv, futex_wake
Security: Landlock and LSM syscalls for fine-grained access control
Memory: map_shadow_stack, memfd_secret, mseal

I also recategorized several older syscalls that had been marked as “unknown”, properly identifying unimplemented stubs, obsolete calls, and architecture-specific operations.

Design Decisions and Tradeoffs

Several key design decisions shaped the tool:

Stream processing over buffering: The parser handles input line-by-line, making it work with both files and live streaming from running processes.

Display-level filtering over strace filtering: Users can use strace -e trace=... to filter at collection time, but this loses context. explain-strace --filter preserves all data while focusing display.

JSON-based syscall database: Moving from hard-coded dictionaries to JSON made updates trivial and enabled version tracking.

The --once flag: When analyzing programs with thousands of repeated syscalls, --once shows full explanations only for the first occurrence of each type, dramatically reducing output verbosity.
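The idea behind --once reduces to remembering which syscall names have already been explained. A minimal sketch, with mark_first_occurrences as an invented name:

```python
from typing import Iterable, Iterator, Set, Tuple


def mark_first_occurrences(names: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (name, is_first) so callers can explain each syscall only once."""
    seen: Set[str] = set()
    for name in names:
        is_first = name not in seen
        seen.add(name)
        yield name, is_first
```

A display loop would print the full explanation when is_first is true and only the original strace line otherwise. Because it is a generator, it fits the line-by-line streaming model without buffering the trace.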

Fallback to hard-coded data: If the JSON file is missing or corrupted, the tool falls back to embedded syscall data, ensuring it always works.
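The fallback can be as simple as catching load errors and returning the embedded table. In this sketch FALLBACK_SYSCALLS is a one-entry placeholder and load_syscalls is a hypothetical function; the set of exceptions caught is a choice made for this example.

```python
import json
from typing import Dict

# Placeholder for the embedded data; the real table would be much larger.
FALLBACK_SYSCALLS: Dict[str, dict] = {
    "read": {"description": "Read from a file descriptor", "category": "filesystem"},
}


def load_syscalls(path: str) -> Dict[str, dict]:
    """Load the 'syscalls' mapping from JSON, or fall back to embedded data."""
    try:
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)
        return data["syscalls"]
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, invalid JSON (JSONDecodeError is a ValueError),
        # or an unexpected top-level structure.
        return FALLBACK_SYSCALLS
```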

Example Usage

Here are some examples:

Debug a program’s filesystem access:

strace ./myprogram 2>&1 | explain-strace --filter filesystem

Monitor a running process:

strace -p 1234 2>&1 | explain-strace -v
# Press Ctrl-C to stop and see summary

Analyze with minimal output:

strace ls 2>&1 | explain-strace --once --filter filesystem,memory

List available categories:

explain-strace --catlist

Try It Out

The tool is available on GitHub:

# From source
git clone https://github.com/bjdean/explain-strace.git
cd explain-strace
python3 -m venv venv
. venv/bin/activate
pip install -e .

# Basic usage
strace ls /tmp 2>&1 | explain-strace

The repository includes comprehensive documentation, tests, and the syscall generator for keeping data current with new kernel releases.

Reference: explain-strace on GitHub (https://github.com/bjdean/explain-strace)
