My NAS, frood, has a bit of a weird setup. It’s just one big initramfs containing a whole Alpine Linux system. It’s delightful and I am not sure why it’s not more common.
- As long as the bootloader can find the kernel and initramfs, the machine comes up cleanly.
- A/B deployments and rollbacks are just a matter of choosing a different boot option.
- The system is defined declaratively in the git repo that builds the initramfs.
- Importantly to me, it’s not defined in some complex DSL: if I want a file to exist at /etc/example.conf I put it in root/etc/example.conf, and the rest is done by a few hundred lines of scripts I can (and have) read.
- Configuring it doesn’t look any different than configuring any regular Alpine system.
- I can test the next deploy with a qemu oneliner.
- There are very very few moving parts.
If this already sounds appealing, you can skip to the “How it works” section below.
But why
I’ve always liked running systems from memory: it’s fast and prevents wear on the system storage device, which is often some janky SD card, because the good drives are dedicated to the ZFS pool.
However, you immediately have the problem of how to persist configuration changes.
Alpine’s answer to this is “diskless mode” where any customization is kept in an overlay file. After boot, the stock system looks for a file matching *.apkovl in all available filesystems, applies it, and then installs any missing apk packages from a local cache.
The first problem with that is complexity: the tool to generate and manage the apkovl, lbu(1), is pretty good but that process has a lot of moving parts. Find the apkovl, apply it, mount the filesystems in the new fstab, install the missing apks, resume the boot process. Over the past year, I had this break multiple times, either because it couldn’t find the filesystem anymore or because the apks did not get installed. The boot process depends on the package manager!
The second problem is that I would really like the state of the system to be tracked in git. Graham Christensen has a very good pitch for declarative or immutable systems in “Erase your darlings”.
I erase my systems at every boot.
Over time, a system collects state on its root partition. This state lives in assorted directories like /etc and /var, and represents every under-documented or out-of-order step in bringing up the services.
“Right, run myapp-init.”
These small, inconsequential “oh, oops” steps are the pieces that get lost and don’t appear in your runbooks.
“Just download ca-certificates to … to fix …”
Each of these quick fixes leaves you doomed to repeat history in three years when you’re finally doing that dreaded RHEL 7 to RHEL 8 upgrade.
“Oh, touch /etc/ipsec.secrets or the l2tp tunnel won’t work.”
I used to solve that by making (most) changes via Ansible, but then I had a multi-layer situation where I needed to make a change in Ansible, then deploy it, then save it with lbu to the apkovl.
There are of course many alternatives for declarative systems: from NixOS (which just doesn’t sound fun) to gokrazy (which is not quite ready to ship ZFS) to embedded toolchains like buildroot or the newer u-root.
Thing is though, I really like Alpine: a simple, well-packaged, lightweight, GNU-less Linux distribution. What I don’t like are its init and persistence mechanisms.
How it works
When it boots, Linux expects an “initramfs” image. It’s a simple cpio archive of the files that make up the very first root filesystem at boot. Usually the job of this system is to load enough modules to mount the real rootfs and pivot into it. Nothing stops us from putting the entire system in it, though! Who needs a rootfs?
Building an initramfs
The starting point is alpine-make-rootfs, which is a short (~500 lines) script meant to build a container image. It’s really 90% of what we need.
#!/bin/sh
set -e
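# Run each step as its own command: with "set -e", a failure in the middle
# of an && chain does not abort the script.
wget https://raw.githubusercontent.com/alpinelinux/alpine-make-rootfs/v0.7.0/alpine-make-rootfs
# sha1sum -c expects two spaces between the hash and the file name.
echo 'e09b623054d06ea389f3a901fd85e64aa154ab3a  alpine-make-rootfs' | sha1sum -c
chmod +x alpine-make-rootfs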
ROOTFS_DEST=$(mktemp -d)
# Stop mkinitfs from running during apk install.
mkdir -p "$ROOTFS_DEST/etc/mkinitfs"
echo "disable_trigger=yes" > "$ROOTFS_DEST/etc/mkinitfs/mkinitfs.conf"
export ALPINE_BRANCH=edge
export SCRIPT_CHROOT=yes
export FS_SKEL_DIR=root
export FS_SKEL_CHOWN=root:root
PACKAGES="$(cat packages)"
export PACKAGES
./alpine-make-rootfs "$ROOTFS_DEST" setup.sh
alpine-make-rootfs will copy the files from the root directory, install the packages from the packages file, and run the setup.sh script in a chroot.
Then, we extract the boot directory and package the rest into an initramfs archive.
cd "$ROOTFS_DEST"
mv boot "$IMAGE_DEST"
find . | cpio -o -H newc | gzip > "$IMAGE_DEST/initramfs-lts"
That’s truly very nearly it! It’s impressive how Alpine lends itself to this with practically no hacks.
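A sanity check along these lines could run just before the boot directory is moved out and the archive is built. It is only a sketch: the function name check_rootfs is made up, and it assumes the kernel lands at boot/vmlinuz-lts (the name used in the qemu and syslinux sections below) and that /init is the entry point the kernel looks for in an initramfs.

```shell
#!/bin/sh
# check_rootfs DIR: succeed only if DIR looks like a tree we can boot from.
# (Hypothetical helper, not part of alpine-make-rootfs.)
check_rootfs() {
    dir=$1
    # /init is an absolute symlink that only resolves inside the image,
    # so test for the link itself (-L) instead of following it.
    if [ ! -L "$dir/init" ] && [ ! -f "$dir/init" ]; then
        echo "missing /init in $dir" >&2
        return 1
    fi
    # The kernel is installed under boot/ by the linux-lts package.
    if [ ! -f "$dir/boot/vmlinuz-lts" ]; then
        echo "missing boot/vmlinuz-lts in $dir" >&2
        return 1
    fi
    return 0
}

# Usage, before the "mv boot" step:
#   check_rootfs "$ROOTFS_DEST"
```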
Packages
The packages we install are the usual stuff you’d install on a server. Only a few are noteworthy.
- alpine-base is the metapackage that installs apk, busybox, openrc, and a few config files.
- linux-lts is the kernel, along with its modules. I considered thinning down the modules to only the ones I needed, but it’s ultimately a lot of hacks just to save a couple hundred MB. Note there is no modloop! The modules are always available.
- linux-firmware-i915 is the i915 folder of Linux firmware. Need to install at least one package providing linux-firmware-any (including linux-firmware-none) or linux-firmware gets installed, which installs them all.
- intel-ucode is the microcode update. It installs a file in /boot that can be used as a pre-initramfs. This is in fact easier to set up than on bigger systems.
- syslinux is the bootloader. Way simpler than GRUB, it installs in the filesystem partition, and then boots the kernel from that partition. This closes the loop: as long as we boot the right partition, there is no way for anything but our system to load. Nothing in the boot process needs to discover or even give a name to a filesystem.
- openrc-init is the init. Alpine doesn’t actually use OpenRC’s init, it uses the one from busybox, but I found OpenRC’s easier to set up. Note though that it doesn’t work with busybox’s shutdown/reboot/poweroff commands so you need to use openrc-shutdown.
- agetty if you plan to ever connect a keyboard and screen.
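For illustration, a packages file holding only the packages called out above could be written and read like this. The one-name-per-line layout is an assumption (the build script only requires that the contents split on whitespace), and the real list is longer.

```shell
#!/bin/sh
set -e
# Work in a scratch directory so the sketch is safe to run anywhere.
workdir=$(mktemp -d)

# Hypothetical minimal packages file: just the noteworthy packages listed above.
cat > "$workdir/packages" <<'EOF'
alpine-base
linux-lts
linux-firmware-i915
intel-ucode
syslinux
openrc-init
agetty
EOF

# Same pattern as the build script: the whole file becomes one variable.
PACKAGES="$(cat "$workdir/packages")"
export PACKAGES

# Unquoted expansion splits on spaces and newlines alike.
# shellcheck disable=SC2086
set -- $PACKAGES
count=$#
echo "$count packages requested"
```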
Setup script
The setup.sh script is also nothing special. We just need to link /init, set up the run-levels, and set the root password. (Yes, that’s my actual password hash. No you won’t break it.)
#!/bin/sh
set -e
ln -s /sbin/openrc-init /init
rc-update add devfs sysinit
rc-update add dmesg sysinit
rc-update add hwclock boot
rc-update add modules boot
rc-update add sysctl boot
rc-update add hostname boot
rc-update add bootmisc boot
rc-update add syslog boot
rc-update add klogd boot
rc-update add networking boot
rc-update add seedrng boot
rc-update add mount-ro shutdown
rc-update add killprocs shutdown
ln -s /etc/init.d/agetty /etc/init.d/agetty.ttyS0
ln -s /etc/init.d/agetty /etc/init.d/agetty.tty1
rc-update add agetty.ttyS0 default
rc-update add agetty.tty1 default
rc-update add acpid default
rc-update add crond default
rc-update add local default
rc-update add openntpd default
rc-update add sshd default
rc-update add tailscale default
chpasswd -e <<'EOF'
root:$6$twsDxnP.TG2M8J4l$7lte7E/ImK4UwoursD7qQCC7XMUothIDb9FTH1MncxYbGQDUQPkC/9pxleTwPxEs3nbatApszxuwc4yj6ucdX1
EOF
In practice I set up a few more services here, but they are not needed to run the system. This is just where you declaratively specify how the system is configured.
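To review what setup.sh actually enabled without booting the image, the built tree can be inspected directly. This sketch relies on OpenRC keeping each runlevel as a directory of symlinks under /etc/runlevels; the function name list_runlevels is hypothetical.

```shell
#!/bin/sh
# list_runlevels ROOTFS: print "runlevel<TAB>service" for every enabled service.
# (Hypothetical helper for inspecting a built rootfs from the outside.)
list_runlevels() {
    rootfs=$1
    for level in sysinit boot default shutdown; do
        [ -d "$rootfs/etc/runlevels/$level" ] || continue
        for svc in "$rootfs/etc/runlevels/$level"/*; do
            # The links point at /etc/init.d/... inside the image, so they may
            # dangle when seen from the host: accept the link itself (-L).
            # An empty directory leaves the glob unexpanded; skip that too.
            [ -e "$svc" ] || [ -L "$svc" ] || continue
            printf '%s\t%s\n' "$level" "${svc##*/}"
        done
    done
}

# Usage, after alpine-make-rootfs has run:
#   list_runlevels "$ROOTFS_DEST"
```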
Root skeleton
The root skeleton is similarly system-specific, and it’s so nice to be able to drop files into the image just by creating them. For example, if I want something to run at boot, I just add a file to root/etc/local.d/.
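As a minimal sketch of that workflow: the OpenRC local service runs executable files ending in .start from /etc/local.d when it starts. The file name hello.start and its contents are hypothetical, and the sketch assumes the executable bit survives the copy into the image.

```shell
#!/bin/sh
set -e
# SKEL is the skeleton directory; the build script uses "root".
# A temporary directory stands in for it here unless SKEL is already set.
SKEL="${SKEL:-$(mktemp -d)/root}"
mkdir -p "$SKEL/etc/local.d"

# Hypothetical boot-time hook: log a line once the "local" service runs.
cat > "$SKEL/etc/local.d/hello.start" <<'EOF'
#!/bin/sh
logger -t hello "system is up, running from memory"
EOF

# The local service only runs files that are executable.
chmod +x "$SKEL/etc/local.d/hello.start"
```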
A few noteworthy files in the skeleton.
#!/bin/sh
openrc-shutdown -p now
root/etc/acpi/PWRF/00000080 makes the power button work with openrc-init.
root/etc/network/interfaces and root/etc/hostname and root/etc/hosts get the network to work.
root/etc/ssh/ssh_host_ed25519_key and root/etc/ssh/ssh_host_ed25519_key.pub and root/root/.ssh/authorized_keys for obvious reasons.
sshd_disable_keygen=yes
root/etc/conf.d/sshd avoids generating non-Ed25519 host keys.
Finally, a bit of persistence for the two things that truly can’t do without it: the RNG seed (arguably not necessary with hardware randomness) and Tailscale (which really doesn’t know how to run without persistence, alas). Rigorously UUID mounted.
UUID=B61B-19E7 /media/usb vfat noatime,rw,fmask=177 0 0
root/etc/fstab
seed_dir=/media/usb/persist/seedrng
root/etc/conf.d/seedrng
TAILSCALED_OPTS="-state /media/usb/persist/tailscaled.state"
root/etc/conf.d/tailscale
qemu testing
Here’s something beautiful about this setup: you can meaningfully test it in qemu by just pointing it at the kernel and initramfs. Even works emulated on my arm64 M2.
qemu-system-x86_64 -m 4G -kernel "images/$image/vmlinuz-lts" \
-initrd "images/$image/initramfs-lts" -append "console=ttyS0" \
-nographic -device qemu-xhci -device usb-storage,drive=usbstick \
-drive if=none,id=usbstick,file=usb_disk.img,format=raw
This includes a persistence device that I formatted with the same UUID as the production one.[1] Since Tailscale configuration is in there, the qemu image comes up as a different Tailscale device, and I can SSH into it separately.
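One way to prepare such an image, sketched under the assumption that mkfs.vfat from dosfstools is available on the build machine. Its -i option takes the volume ID as bare hexadecimal, so the dash in the fstab UUID has to be dropped. The helper names volid_from_uuid and make_test_disk and the 64 MiB size are made up for illustration.

```shell
#!/bin/sh
# volid_from_uuid UUID: turn the fstab form (B61B-19E7) into the bare
# 32-bit hex volume ID that mkfs.vfat -i expects (B61B19E7).
volid_from_uuid() {
    printf '%s\n' "$1" | sed 's/-//g'
}

# make_test_disk IMAGE UUID: create a 64 MiB raw image formatted as FAT
# with the given volume ID. (Hypothetical helper; requires dosfstools.)
make_test_disk() {
    img=$1
    uuid=$2
    dd if=/dev/zero of="$img" bs=1048576 count=64 || return 1
    mkfs.vfat -i "$(volid_from_uuid "$uuid")" "$img"
}

# Usage:
#   make_test_disk usb_disk.img B61B-19E7
```

The persist directory still has to be created inside the image, for example by mounting it once.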
Bootloader
Installing or updating the bootloader is done from the system itself with extlinux.
rm -rf /media/usb/boot/syslinux
mkdir -p /media/usb/boot/syslinux
cp /usr/share/syslinux/*.c32 /media/usb/boot/syslinux/
extlinux --install /media/usb/boot/syslinux
cat > /media/usb/boot/syslinux/syslinux.cfg <<EOF
PROMPT 0
DEFAULT lts
LABEL lts
KERNEL /boot/vmlinuz-lts
INITRD /boot/intel-ucode.img,/boot/initramfs-lts
LABEL old
KERNEL /boot/vmlinuz-lts-old
INITRD /boot/intel-ucode.img-old,/boot/initramfs-lts-old
LABEL new
KERNEL /boot/vmlinuz-lts-new
INITRD /boot/intel-ucode.img-new,/boot/initramfs-lts-new
EOF
We have three boot entries: regular, old, and new. When deploying a new version of the system, we rsync it over, and then use extlinux --once to select it for the next boot.
rsync -Pv "$image/vmlinuz-lts" root@frood:/media/usb/boot/vmlinuz-lts-new
rsync -Pv "$image/initramfs-lts" root@frood:/media/usb/boot/initramfs-lts-new
rsync -Pv "$image/intel-ucode.img" root@frood:/media/usb/boot/intel-ucode.img-new
echo "extlinux --once=new /media/usb/boot/syslinux" | ssh root@frood sh
If the machine comes up cleanly, then we move the regular image to old, and new to regular. Otherwise, another reboot rolls it back.
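The promotion step could be sketched as follows, using the file names from the syslinux configuration above. The function name promote is hypothetical, and this is not necessarily how the real deploy script does it.

```shell
#!/bin/sh
# promote BOOT_DIR: rotate regular -> old and new -> regular for the
# kernel, the initramfs, and the microcode image. (Hypothetical helper.)
promote() {
    boot=$1
    # Refuse to touch anything unless the complete "new" set is present,
    # so a partial upload can never replace the working image.
    for f in vmlinuz-lts initramfs-lts intel-ucode.img; do
        if [ ! -f "$boot/$f-new" ]; then
            echo "missing $boot/$f-new, not promoting" >&2
            return 1
        fi
    done
    for f in vmlinuz-lts initramfs-lts intel-ucode.img; do
        # Keep the current image around as the rollback target.
        if [ -f "$boot/$f" ]; then
            mv -f "$boot/$f" "$boot/$f-old" || return 1
        fi
        mv -f "$boot/$f-new" "$boot/$f" || return 1
    done
}

# Usage, on the machine itself after a clean boot of the new image:
#   promote /media/usb/boot
```

Checking for all three files first means a failed rsync leaves the regular and old entries untouched.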
A simple status service
I wanted a simple service to get the status of the system at a glance. There are a million ways to do this, but I chose to write a small Go server. It’s not needed to make this system work, but I am including it to show how easy it is to add a service.
Before the alpine-make-rootfs invocation, I added a couple lines to build all Go binaries in a local module into /usr/local/bin/. Note that even the Go toolchain is selected declaratively from the go.mod thanks to GOTOOLCHAIN=auto.
go env -w GOTOOLCHAIN=auto
go build -C bins -o "$ROOTFS_DEST/usr/local/bin/" ./...
Then I created root/etc/init.d/srvmonitor.
#!/sbin/openrc-run
# shellcheck shell=sh
description="Serve scripts from /etc/monitor.d"
command=/usr/local/bin/srvmonitor
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"
depend() {
need net localmount
after firewall
}
And finally I added one line to setup.sh.
rc-update add srvmonitor default
That’s it. The Go server listens on port 80 on the Tailscale IP, and serves the output of scripts I put in /etc/monitor.d/.
frood
The entire setup is open source, in my mostly-harmless repository. You might be interested in how I made ZFS imports work, which is not covered above.
I have not made it into a reusable project partially because there is so little to it. Adding hooks to configure things would easily double its size. I encourage you to just fork it if you’d like.
One thing I haven’t solved yet is how to inject secrets. For now they are just .gitignore’d. Maybe I’ll plug in a YubiKey and use age-plugin-yubikey to decrypt them, and yubikey-agent for the host key. Or maybe this board has a TPM and I can use the simplicity of this system to get a full Secure Boot chain that unlocks TPM keys. That’d be fun.
If you got this far, you might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @filippo@abyssdomain.expert.
The picture
The natural pools of Porto Moniz, in Madeira. They’re publicly accessible, made of volcanic rock, and filled by the ocean waves that crash spectacularly against them. I was not doing great that day, but it was an excellent place to not do great at.
Madeira is pretty cool.[2] Also one of the trickiest crosswind landings.
My maintenance work is funded by the awesome Geomys clients: Interchain, Smallstep, Ava Labs, Teleport, SandboxAQ, Charm, Tailscale, and Sentry. Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement.)