Most people have probably heard of Podman: an OCI container management tool very similar to Docker, with some very interesting differences, such as being able to run completely rootless, easy integration with systemd, and the use of pods, to mention a few.

The most common way of running podman containers is podman run, easily translated from docker run.
For example, to start a web server on port 8080, run:
podman run --name my_webserver -dt -p 8080:80/tcp docker.io/nginx

But I’ve always preferred running my containers from a declared file like docker-compose.yml, for easier portability and backups. The run method never appealed to me. The podman generate systemd command is interesting, but it still requires starting everything with podman run and then converting it all to systemd service files.

Enter Quadlets

But then a few weeks ago I stumbled upon Quadlets: a way to write the systemd services directly, with reasonable templating through a [Container] section. That was the answer I needed to migrate from docker-compose.yml.

So for example, a simple docker-compose.yml for Homer could look like this:

---
services:
  homer:
    image: b4bz/homer
    container_name: homer
    volumes:
      - /your/local/assets/:/www/assets
    ports:
      - 8080:8080
    user: 1000:1000 
    environment:
      - TZ=Europe/Stockholm

Rewritten as a homer.container quadlet, it would look like this:

[Container]
ContainerName=homer
Image=docker.io/b4bz/homer

Volume=/your/local/assets:/www/assets
PublishPort=8080:8080
Environment=TZ=Europe/Stockholm
User=1000

[Service]
Restart=on-failure

[Install]
WantedBy=default.target

Then run systemctl --user daemon-reload to generate the service, and start it with systemctl --user start homer.
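Put together, the workflow is roughly this (a minimal sketch, assuming the quadlet file is placed in ~/.config/containers/systemd/, which is covered in the next section):

# Place the quadlet where the user-level systemd generator picks it up
mkdir -p ~/.config/containers/systemd
cp homer.container ~/.config/containers/systemd/

# Regenerate the unit files and start the container
systemctl --user daemon-reload
systemctl --user start homer

# Verify that it came up
systemctl --user status homer
podman ps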


Is it really that simple? Well, yes and no. It’s that simple if you just run podman as root, but my goal and reason for swapping to podman is to run it rootless, and while that’s pretty easy to get going, it has some hurdles to work through.


Setting up the environment/server/VM for rootless podman

I’ll call my unprivileged user servuser in all the examples. The prerequisites are listed below, and the commands are collected in a sketch after the list.

  • Quadlets require Podman version 4.4+ and cgroups v2.
    • Check with podman --version and podman info --format "{{.Host.CgroupsVersion}}"
  • Rootless containers require subordinate UIDs/GIDs mapped in /etc/subuid and /etc/subgid.
    • If not set, add servuser:100000:65536 to both files as root/sudo.
  • Create a path to store the quadlet service files for a rootless user:
    • ~/.config/containers/systemd/ <- This is my preferred choice.
    • Other options: $XDG_RUNTIME_DIR/containers/systemd/, /etc/containers/systemd/users/$(UID) or /etc/containers/systemd/users/
  • Enable lingering (this allows the services to keep running after logout):
    • sudo loginctl enable-linger servuser
  • If hosting something that requires privileged ports (like a reverse proxy on 80/443 or DNS on 53), you have to allow it:
    • Add net.ipv4.ip_unprivileged_port_start=80 to /etc/sysctl.conf
    • Apply the new setting with sudo sysctl -p /etc/sysctl.conf
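As a rough sketch, the whole preparation could look like this (assuming servuser and the 100000:65536 range from above; usermod --add-subuids requires a reasonably new shadow-utils, otherwise edit /etc/subuid and /etc/subgid by hand):

# Check prerequisites
podman --version
podman info --format "{{.Host.CgroupsVersion}}"

# Map subordinate UIDs/GIDs for the rootless user
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 servuser

# Create the quadlet directory
mkdir -p ~/.config/containers/systemd

# Let the user's services keep running after logout
sudo loginctl enable-linger servuser

# Allow rootless binding of privileged ports (80 and up)
echo 'net.ipv4.ip_unprivileged_port_start=80' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p /etc/sysctl.conf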

Create a multi-container setup

First create a network unit to be able to connect the different containers:
~/.config/containers/systemd/stacknet.network

[Unit]
Description=Stacknet network
# This is systemd syntax to wait for the network to be online before starting this service:
After=network-online.target
 
[Network]
NetworkName=stacknet
# These are optional, podman will just create it randomly otherwise.
Subnet=10.10.0.0/24
Gateway=10.10.0.1
DNS=9.9.9.9
 
[Install]
WantedBy=default.target
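Once the file is in place, a daemon-reload generates a stacknet-network.service unit, and starting it creates the Podman network. A quick sketch to verify (the <name>-network.service naming is the Quadlet convention for .network files):

systemctl --user daemon-reload
systemctl --user start stacknet-network.service
podman network inspect stacknet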

Then create a container to use the network:
~/.config/containers/systemd/nginx.container

[Unit]
Description=Nginx container

[Container]
ContainerName=nginx
Image=docker.io/nginx

Volume=%h/container_volumes/nginx/serve:/usr/share/nginx/html:Z,U
# And here we define the shared network:
Network=stacknet.network
PublishPort=80:80
PublishPort=443:443
Environment=TZ=Europe/Stockholm

[Service]
Restart=on-failure

[Install]
WantedBy=default.target
  • Volume:
    • %h is the systemd specifier for the user's home directory ($HOME). :U tells podman to chown the source volume to match the default UID and GID used inside the container.
    • SELinux: :z sets a shared content label, while :Z sets a private, unshared label that only this container can use.
  • Environment:
    • Just use the usual syntax, but with the Environment= prefix.
  • After=network-online.target (set in the network unit above) - the service waits for the host network before starting.
  • WantedBy=default.target - the service is started at boot.

After that, a second container can be added with the same Network=stacknet.network option, along with options like After=nginx.service or Wants=nginx.service to set up dependencies and startup order; a minimal sketch follows below.
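For example, a hypothetical ~/.config/containers/systemd/whoami.container (the name and image are just placeholders) that joins the same network and starts after nginx could look like this:

[Unit]
Description=Whoami container
Wants=nginx.service
After=nginx.service

[Container]
ContainerName=whoami
Image=docker.io/traefik/whoami
Network=stacknet.network

[Service]
Restart=on-failure

[Install]
WantedBy=default.target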

This could also be handled with pods - though I’ll leave that for another day.

Setting up automatic updates

Setting up podman auto-update to keep your containers fresh.

podman auto-update pulls down new container images and restarts the containers configured for auto-updates. If restarting the systemd unit after an image update fails, it rolls back to the previous image and restarts the unit once more.

  • Ensure the two necessary systemd units are activated: systemctl --user enable podman-auto-update.{service,timer} --now
  • Then add AutoUpdate=registry to the quadlets [Container] section.

It might also be worth delving deeper into options like Notify=healthy and HealthCmd= for a surer way of assessing whether a service started successfully; a sketch combining these with auto-updates follows below.
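As a rough sketch of what that could look like in a [Container] section (the health check command is just a placeholder, and Notify=healthy assumes a recent Podman version):

[Container]
# Pull new images from the registry and restart the unit when the image changes
AutoUpdate=registry
# Placeholder health check; replace with something meaningful for the service
HealthCmd=curl -f http://localhost:80/ || exit 1
HealthInterval=1m
# Only report the unit as started once the container reports healthy
Notify=healthy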



Some more in-depth settings and troubleshooting

It’s not always straightforward, and the documentation is still rather scarce.
Make frequent use of podman ps and podman stats while troubleshooting. Check the status and logs of a service with:

  • systemctl --user status containername
  • journalctl --user -xeu containername.service

The error “Failed to start name.service: Unit name.service not found.” is usually due to a typo or syntax error in the .container file, which sadly isn’t pointed out anywhere obvious; the sketch after this list shows one way to surface the generator's complaints.

You can manually inspect the generated services at:

  • /run/user/1000/systemd/generator/containername.service (where 1000 is your user's UID)
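You can also run the Quadlet generator by hand in dry-run mode to see its errors directly. A sketch based on the podman-systemd.unit documentation; the generator's path can differ between distributions, so check where your package installed it:

# Prints the generated units, or the syntax errors that prevent generation
/usr/lib/systemd/system-generators/podman-system-generator --user --dryrun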

More info on subuid and usermap

In the setup we looked at /etc/subuid, which sets a range of subordinate UIDs mapped to each user with the syntax username:start_uid:uid_count.
My example showed servuser:100000:65536 - starting at 100000 with range 65536 (default).
Read more general info

These subUIDs are used to map the users inside the container. I find it quite hard to grasp, but briefly, my understanding boils down to:

  • Root (UID 0) inside is mapped to the running user's UID outside.
  • UID 1 inside is mapped to the first subUID outside, so in this case 100000.
  • UID 2 inside is then 100001, and so on up to 165535, which is the last one within the range.

That means a root-owned file inside the container is owned by the host user servuser outside. And if a regular user (e.g. UID 1000) inside then creates a file in a mount, that file will be owned by UID 100999 on the host (100000 + 1000 - 1, since root is already mapped to the host user). The sketch below shows how to inspect the mapping.
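You can inspect this mapping of the rootless user namespace directly. A sketch, assuming servuser has host UID 1000 and the servuser:100000:65536 range from earlier:

$ podman unshare cat /proc/self/uid_map
         0       1000          1
         1     100000      65536

The first column is the UID inside the namespace, the second the UID on the host, the third the size of the range: container UID 0 maps to host UID 1000 (servuser), and container UIDs 1-65536 map to host 100000-165535.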

This mapping can be manipulated in a few ways to get the right ownership, depending on what you need, with options like UserNS=keep-id and UIDMap=. Read more.

I’ll explain a more complex case of this with a linuxserver.io image. Those images usually do their initial setup as root and then run the actual service as UID 1000.

Let's look at linuxserver/nginx as an example. The docker-compose.yml would look like this:

---
services:
  nginx:
    image: lscr.io/linuxserver/nginx:latest
    container_name: nginx
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Stockholm
    volumes:
      - /home/servuser/container_volumes/nginx/config:/config
    ports:
      - 80:80
      - 443:443
    restart: unless-stopped

And rewritten to a quadlet:

[Container]
ContainerName=nginx
Image=lscr.io/linuxserver/nginx:latest

Volume=%h/container_volumes/nginx/config:/config:Z,U
PublishPort=80:80
PublishPort=443:443
Environment=TZ=Europe/Stockholm

Environment=PUID=1000
Environment=PGID=1000

UIDMap=1000:0:1
UIDMap=0:1:1000
UIDMap=1001:1001:64536

[Service]
Restart=on-failure

[Install]
WantedBy=default.target
  • UIDMap
    • 1000:0:1 - UID 1000 inside, intermediate UID 0 (which automaps to my servuser), range 1, so only that single UID.
    • 0:1:1000 - UID 0 inside, intermediate UID 1 (my first subUID), range 1000. So internal 0-999 is mapped to intermediate 1-1000, i.e. my subUIDs 100000-100999.
    • 1001:1001:64536 - UID 1001 inside, intermediate UID 1001, range 64536 (the rest of my subUID range, i.e. everything not already mapped above).

That way, UID 1000 inside the container is mapped to my servuser, while the rest of the range 0-65536 (everything except 1000) is mapped to my subUIDs 100000-165535. The sketch below shows one way to verify the mapping on a running container.
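A quick way to verify which host user a container process actually runs as (a sketch, assuming the nginx container above is running):

# Show the user inside the container and the corresponding user on the host
podman top nginx user huser

With the UIDMap above, the linuxserver service user (UID 1000 inside) should show up as servuser on the host, while root inside lands in the subUID range.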

You can also choose to have separate users on the same host, each with their own subUID ranges, to further segment the namespaces between containers.

Devices and SELinux

As I prefer using Fedora Server for podman, I’ve also encountered some constraints with SELinux.
One example of this is when I wanted to map the /dev/net/tun device into my VPN container. SELinux blocked the rootless container's access to that device.

This can be solved by creating a small SELinux policy module (inspired by: source).

My container is called gluetun. To generate a new policy from the denials in audit.log, follow these steps as root/sudo:

  • # grep gluetun /var/log/audit/audit.log | audit2allow -M cont_tun
  • Then inspect the generated rules: # cat cont_tun.te
module cont_tun 1.0;

require {
        type tun_tap_device_t;
        type container_file_t;
        type container_t;
        class chr_file { getattr ioctl open read write };
        class sock_file watch;
}

#============= container_t ==============
allow container_t container_file_t:sock_file watch;

#!!!! This avc can be allowed using the boolean 'container_use_devices'
allow container_t tun_tap_device_t:chr_file { getattr ioctl open read write };
  • If it looks correct, install the module # semodule -i cont_tun.pp

You could also copy someone else's rule, like the .te file above, and convert it to a policy:

  • Convert it to a policy module # checkmodule -M -m -o gluetun_policy.mod gluetun_policy.te
  • Compile the policy # semodule_package -o gluetun_policy.pp -m gluetun_policy.mod and install it # semodule -i gluetun_policy.pp

Then restart the container and check the logs to see whether the permission errors are gone!
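For completeness, here is a sketch of how the device can be passed to the container in the quadlet itself. The gluetun.container below is an assumption based on my setup, not a complete VPN configuration:

[Container]
ContainerName=gluetun
Image=docker.io/qmcgaw/gluetun
# Map the TUN device into the container and add the capability VPN containers usually need
AddDevice=/dev/net/tun
AddCapability=NET_ADMIN

[Service]
Restart=on-failure

[Install]
WantedBy=default.target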



That’s it for now.

Read more!

Future exploration

  • Experiment with pods, pros and cons?
  • Create and manage with Ansible.
  • Host on a more container-focused OS like Fedora CoreOS or openSUSE MicroOS.