# CVE-2026-41579 — Variant Root Cause Analysis

## Summary

CVE-2026-41579 is a low-severity host filesystem integrity issue in opencontainers/runc. The original reproduction used an **absolute** `/dev` symlink in the container image to trick runc into deleting and recreating files on the host before `pivot_root(2)`. This variant stage tested whether the same underlying bug could be triggered through a **relative** `/dev` symlink or through the **`runc create` + `runc start`** CLI entry point. Both alternate triggers reproduced the same host-side impact on the vulnerable version (runc 1.3.5), and neither bypassed the fixed version (runc 1.3.6). A third candidate, a `/dev/pts` symlink, did not reproduce on either version because the vulnerable code only touches `/dev/ptmx`, not `/dev/pts` itself. No true bypass of the patch was found.

## Fix Coverage / Assumptions

The upstream fix (commit `a8e53f2c` in release-1.3, cherry-picked from `864db8042dbb`) rewrites the pre-pivot `/dev` setup to operate on a pre-opened file descriptor for the real rootfs directory (`rootFd`). The key assumptions are:

- `rootFd` is opened directly on `config.Rootfs` with `O_DIRECTORY|O_CLOEXEC|O_PATH`, so the descriptor always refers to a real directory, never to a symlink.
- The `internal/pathrs` helpers (`UnlinkInRoot`, `SymlinkInRoot`, `MkdirAllParentInRoot`) use `pathrs-lite` / `filepath-securejoin` to walk paths relative to `rootFd` without following symlinks that escape the rootfs.
- All three setup phases (`createDevices`, `setupPtmx`, `setupDevSymlinks`) are now funneled through a single `doSetupDev(rootFd, config)` call.
- The same merge also hardens the cgroupv1 merged-subsystem symlink creation (`9432ad3a`) with `pathrs.SymlinkInRoot`, covering a sibling path-based symlink operation.

The fix does not change post-`pivot_root` behavior or other runtime-configured mount paths; it specifically closes the image-controlled `/dev` symlink path before the container switches root.
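
The fd-anchored idea behind these assumptions can be illustrated with a minimal Python sketch. It is not runc's implementation: `unlink_beneath` is a hypothetical helper, and it is deliberately stricter than `pathrs-lite` / `filepath-securejoin` in that it refuses any symlinked component instead of resolving it safely inside the root. The directory names are taken from the reproduction layout.

```python
import os
import tempfile


def unlink_beneath(root_fd: int, parent: str, name: str) -> None:
    """Remove parent/name beneath root_fd, refusing symlinked directories.

    Hypothetical helper: each component is opened relative to the previous
    descriptor with O_NOFOLLOW, so a symlinked /dev makes the walk fail.
    """
    parts = [p for p in parent.split("/") if p]
    if any(p in (".", "..") for p in parts):
        raise ValueError("dot components are not allowed in this sketch")
    dir_fd = os.dup(root_fd)
    try:
        for part in parts:
            flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
            next_fd = os.open(part, flags, dir_fd=dir_fd)
            os.close(dir_fd)
            dir_fd = next_fd
        os.unlink(name, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as host:
    rootfs = os.path.join(host, "rootfs")
    target = os.path.join(host, "controlled_dev")
    os.makedirs(rootfs)
    os.makedirs(target)
    decoy = os.path.join(target, "ptmx")
    with open(decoy, "w") as f:
        f.write("decoy\n")
    # Image-controlled symlink that points outside the rootfs.
    os.symlink("../controlled_dev", os.path.join(rootfs, "dev"))

    root_fd = os.open(rootfs, os.O_RDONLY | os.O_DIRECTORY)
    try:
        try:
            unlink_beneath(root_fd, "dev", "ptmx")
            refused = False
        except OSError:
            refused = True  # the symlinked "dev" component was rejected
    finally:
        os.close(root_fd)
    decoy_survived = os.path.isfile(decoy) and not os.path.islink(decoy)
```

In this sketch the walk stops at the symlinked `dev` component, so the file outside the rootfs is never reached.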

## Variant / Alternate Trigger

Three distinct candidates were tested:

1. **Relative `/dev` symlink** (`/bundle/rootfs/dev -> ../controlled_dev`).  
   - A different data path from the original absolute symlink. The symlink still resolves to a directory outside the rootfs, so it reaches the same vulnerable sink.
   - Code path: `runc run cve-ptmx-test -b /bundle` → `prepareRootfs` → `doSetupDev(rootFd, config)` → `setupPtmx` / `setupDevSymlinks`.

2. **`runc create` + `runc start` entry point**.  
   - A different CLI invocation that ultimately runs the same `prepareRootfs` code in the container init.
   - Code path: `runc create cve-ptmx-test -b /bundle && runc start cve-ptmx-test` → `prepareRootfs` → `doSetupDev` → `setupPtmx` / `setupDevSymlinks`.

3. **`/dev/pts` symlink** (`/bundle/rootfs/dev/pts -> /bundle/controlled_dev`).  
   - A different symlink location inside `/dev`. The vulnerable `setupPtmx` only operates on `/dev/ptmx`, so the `/dev/pts` symlink does not redirect the host-side file operations. This candidate was ruled out.
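
The difference between the three layouts can be checked with a short sketch. The `build` and `escapes` helpers are hypothetical and only model the directory layouts described above in a temporary directory; they do not invoke runc. The question asked for each layout is whether the directory that holds `ptmx` (that is, `dev`) resolves outside the rootfs.

```python
import os
import tempfile


def escapes(rootfs: str, inner: str) -> bool:
    """Return True if rootfs/inner resolves to a path outside rootfs."""
    root = os.path.realpath(rootfs)
    resolved = os.path.realpath(os.path.join(root, inner))
    return os.path.commonpath([root, resolved]) != root


def build(host: str, kind: str) -> str:
    """Create one candidate layout under host and return its rootfs path."""
    rootfs = os.path.join(host, kind, "rootfs")
    target = os.path.join(host, kind, "controlled_dev")
    os.makedirs(rootfs)
    os.makedirs(target)
    dev = os.path.join(rootfs, "dev")
    if kind == "absolute":
        os.symlink(target, dev)               # original trigger
    elif kind == "relative":
        os.symlink("../controlled_dev", dev)  # candidate 1
    else:
        os.mkdir(dev)                         # candidate 3: real /dev,
        os.symlink(target, os.path.join(dev, "pts"))  # symlinked /dev/pts
    return rootfs


with tempfile.TemporaryDirectory() as host:
    results = {
        kind: escapes(build(host, kind), "dev")
        for kind in ("absolute", "relative", "pts_only")
    }
```

Under these assumptions the absolute and relative `/dev` symlinks both place the parent of `ptmx` outside the rootfs, while the `/dev/pts` layout leaves `dev` itself inside it, which matches why candidate 3 was ruled out.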

Results:

- Candidates 1 and 2 reproduced on runc 1.3.5 (decoy `ptmx` file removed and replaced by `ptmx -> pts/ptmx` plus the hardcoded `/dev` symlinks).
- Candidates 1 and 2 did **not** reproduce on runc 1.3.6; the decoy was preserved.
- Candidate 3 did not reproduce on either version.

## Impact

- **Package / component:** opencontainers/runc
- **Affected versions:** prior to 1.3.6, 1.4.3, and 1.5.0 (as per advisory)
- **Risk level:** low (per upstream advisory)
- **Consequences:** On the vulnerable version, an attacker-supplied container image with a `/dev` symlink can cause runc to delete a file named `ptmx` and create a fixed set of symlinks in an attacker-chosen host directory. The variant triggers demonstrate the same impact through a relative symlink and through the `create`/`start` entry point.

## Impact Parity

- **Disclosed / claimed maximum impact:** Deletion of a file named `ptmx` and creation of a hardcoded set of symlinks in any host directory reachable via a malicious `/dev` symlink.
- **Reproduced impact from this variant run:** On runc 1.3.5, both the relative `/dev` symlink and the `create`/`start` path removed the decoy file and created the expected symlinks in the attacker-controlled directory. On runc 1.3.6, the decoy was preserved in both cases.
- **Parity:** `full` for the documented filesystem-integrity impact. The variants did not demonstrate privilege escalation or code execution, which is consistent with the advisory's low-severity rating.
- **Not demonstrated:** A bypass of the fixed version.

## Root Cause

The root cause is the same as the original CVE: before the container switches into its rootfs, `setupPtmx` and `setupDevSymlinks` used path strings built with `filepath.Join(config.Rootfs, "/dev/...")` and then called `os.Remove` / `os.Symlink`. If the image's `/dev` is a symlink (absolute or relative) to a directory outside the rootfs, those path operations follow the symlink and affect the host filesystem. The `runc create`/`runc start` path reaches the same `prepareRootfs` code, so the same bug is exercised.
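
The path-string pattern described above can be reproduced in miniature. The following Python sketch is an analogue of the Go logic, not runc code: `vulnerable_setup_ptmx` is a hypothetical stand-in that joins strings and then removes and recreates `ptmx`, and the layout mirrors the relative `/dev` symlink variant inside a temporary directory.

```python
import os
import tempfile


def vulnerable_setup_ptmx(rootfs: str) -> None:
    """Path-string version: the kernel follows the dev symlink during lookup."""
    ptmx = os.path.join(rootfs, "dev", "ptmx")
    if os.path.lexists(ptmx):
        os.remove(ptmx)
    os.symlink("pts/ptmx", ptmx)


with tempfile.TemporaryDirectory() as host:
    rootfs = os.path.join(host, "rootfs")
    target = os.path.join(host, "controlled_dev")
    os.makedirs(rootfs)
    os.makedirs(target)

    # Decoy file that lives outside the rootfs.
    decoy = os.path.join(target, "ptmx")
    with open(decoy, "w") as f:
        f.write("decoy\n")

    # Image-controlled relative symlink: rootfs/dev -> ../controlled_dev
    os.symlink("../controlled_dev", os.path.join(rootfs, "dev"))

    vulnerable_setup_ptmx(rootfs)

    decoy_is_symlink = os.path.islink(decoy)
    decoy_target = os.readlink(decoy)
```

Because `dev` is an intermediate path component, both the removal and the symlink creation land in `controlled_dev`, and the decoy ends up as `ptmx -> pts/ptmx`.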

Upstream fix commits:

- `a8e53f2c` (release-1.3) / `864db8042dbb` (main) — `rootfs: make /dev initialisation code fd-based`
- `9432ad3a` (release-1.3) / `66acd48f9d42` (main) — `rootfs: make cgroupv1 subsystem symlinks fd-based` (sibling hardening)
- `d934454b` — `merge CVE-2026-41579 fixes into release-1.3`

## Reproduction Steps

The reproduction is implemented in `bundle/vuln_variant/reproduction_steps.sh`. At a high level it:

1. Verifies Docker is available.
2. Downloads the vulnerable runc release binary (`v1.3.5`) and the fixed release binary (`v1.3.6`) if not cached.
3. Builds a minimal OCI rootfs from the official `busybox` image if not cached.
4. Builds two privileged Docker images (`repro-runc-vuln` and `repro-runc-fixed`) that each contain one runc binary and the rootfs.
5. For each variant, starts a fresh privileged container and:
   - Generates an OCI bundle with `runc spec`, disables the terminal, and sets the command to `/bin/true`.
   - Sets up the variant symlink layout (relative `/dev` symlink, `/dev/pts` symlink, or absolute `/dev` symlink for the `create`/`start` entry point).
   - Creates a decoy file named `ptmx` in the target directory.
   - Runs `runc run` or `runc create`/`runc start`.
   - Inspects whether the decoy was preserved, replaced by a symlink, or deleted.
6. Compares the vulnerable and fixed results for each variant and exits with status 1, because no variant bypasses the fixed version.

Expected evidence:

- **Vulnerable (1.3.5):** for the relative `/dev` symlink and the `create`/`start` entry point, the decoy `ptmx` is replaced by a symlink and the target directory contains the hardcoded `/dev` symlinks (`core`, `fd`, `stdin`, `stdout`, `stderr`).
- **Fixed (1.3.6):** the decoy `ptmx` remains a regular file and no new symlinks appear in the target directory.
- **`/dev/pts` symlink:** neither version deletes the decoy; the symlink is preserved and `/dev/ptmx` is created inside the real `/dev` directory.
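
The decoy inspection step can be sketched as follows. This is an illustrative Python version, not the logic in `reproduction_steps.sh`; `classify_decoy` and its state labels are hypothetical names. The key point is to use `lstat` so that a symlink is reported as a symlink rather than followed.

```python
import os
import stat
import tempfile


def classify_decoy(path: str) -> str:
    """Classify the decoy without following symlinks."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return "deleted"
    if stat.S_ISLNK(st.st_mode):
        return "replaced_by_symlink"
    if stat.S_ISREG(st.st_mode):
        return "preserved"
    return "unexpected_type"


with tempfile.TemporaryDirectory() as target:
    decoy = os.path.join(target, "ptmx")
    states = []

    with open(decoy, "w") as f:
        f.write("decoy\n")
    states.append(classify_decoy(decoy))  # regular file still in place

    os.remove(decoy)
    states.append(classify_decoy(decoy))  # nothing at the path

    os.symlink("pts/ptmx", decoy)         # dangling symlink, as in the logs
    states.append(classify_decoy(decoy))
```

A check based on `os.path.exists` would misreport the last case as missing, because `pts/ptmx` does not exist in the target directory and the symlink dangles.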

## Evidence

- `bundle/logs/vuln_variant/relative_dev_symlink_vuln.log` — vulnerable runc 1.3.5 replaces the decoy with `ptmx -> pts/ptmx` and creates the other hardcoded symlinks in the target directory.
- `bundle/logs/vuln_variant/relative_dev_symlink_fixed.log` — fixed runc 1.3.6 preserves the decoy.
- `bundle/logs/vuln_variant/create_start_entrypoint_vuln.log` — vulnerable runc 1.3.5 reached through `runc create`/`runc start` replaces the decoy with symlinks.
- `bundle/logs/vuln_variant/create_start_entrypoint_fixed.log` — fixed runc 1.3.6 reached through `runc create`/`runc start` preserves the decoy.
- `bundle/logs/vuln_variant/pts_symlink_vuln.log` — decoy preserved; `/dev/pts` symlink is left in place and `/dev/ptmx` is created in the real `/dev` directory.
- `bundle/logs/vuln_variant/pts_symlink_fixed.log` — same safe behavior on the fixed version.
- `bundle/vuln_variant/runtime_manifest.json` — runtime evidence manifest produced by the script.
- `bundle/logs/vuln_variant/fixed_version.txt` — exact tested fixed revision.
- `bundle/logs/vuln_variant/vulnerable_version.txt` — exact tested vulnerable revision.

Key excerpts:

Relative `/dev` symlink on vulnerable 1.3.5:

```text
VARIANT: relative_dev_symlink
RUN_VERSION: runc version 1.3.5
BEFORE: /bundle/controlled_dev contents
-rw-r--r--    1 root     root            10 ... ptmx
...
AFTER: /bundle/controlled_dev contents
lrwxrwxrwx    1 root     root            11 ... core -> /proc/kcore
lrwxrwxrwx    1 root     root            13 ... fd -> /proc/self/fd
lrwxrwxrwx    1 root     root             8 ... ptmx -> pts/ptmx
...
RESULT: decoy replaced by symlink
```

Relative `/dev` symlink on fixed 1.3.6:

```text
VARIANT: relative_dev_symlink
RUN_VERSION: runc version 1.3.6
BEFORE: /bundle/controlled_dev contents
-rw-r--r--    1 root     root            10 ... ptmx
...
AFTER: /bundle/controlled_dev contents
-rw-r--r--    1 root     root            10 ... ptmx
RESULT: decoy preserved
```

`create`/`start` on vulnerable 1.3.5:

```text
VARIANT: create_start_entrypoint
RUN_VERSION: runc version 1.3.5
RUNC_EXIT: create_rc=0 start_rc=0
AFTER: /bundle/controlled_dev contents
lrwxrwxrwx    1 root     root             8 ... ptmx -> pts/ptmx
...
RESULT: decoy replaced by symlink
```

`create`/`start` on fixed 1.3.6:

```text
VARIANT: create_start_entrypoint
RUN_VERSION: runc version 1.3.6
RUNC_EXIT: create_rc=0 start_rc=0
AFTER: /bundle/controlled_dev contents
-rw-r--r--    1 root     root            10 ... ptmx
RESULT: decoy preserved
```
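
The excerpts above are enough to derive a per-variant verdict mechanically. The sketch below is an assumption about how such a comparison could be written in Python, not the script's actual implementation; `parse_log` and `verdict` are hypothetical helpers, and the sample strings are abbreviated copies of the excerpts.

```python
def parse_log(text: str) -> dict:
    """Extract the VARIANT, RUN_VERSION and RESULT fields from a log."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in ("VARIANT", "RUN_VERSION", "RESULT"):
            fields[key] = value.strip()
    return fields


def verdict(vuln: dict, fixed: dict) -> dict:
    """Compare the vulnerable and fixed runs of one variant."""
    return {
        "variant": vuln.get("VARIANT"),
        "reproduced_on_vulnerable":
            vuln.get("RESULT") == "decoy replaced by symlink",
        # Anything other than a preserved decoy on the fixed build
        # would count as a bypass in this sketch.
        "bypasses_fix": fixed.get("RESULT") != "decoy preserved",
    }


VULN_LOG = """\
VARIANT: relative_dev_symlink
RUN_VERSION: runc version 1.3.5
AFTER: /bundle/controlled_dev contents
lrwxrwxrwx    1 root     root             8 ... ptmx -> pts/ptmx
RESULT: decoy replaced by symlink
"""

FIXED_LOG = """\
VARIANT: relative_dev_symlink
RUN_VERSION: runc version 1.3.6
AFTER: /bundle/controlled_dev contents
-rw-r--r--    1 root     root            10 ... ptmx
RESULT: decoy preserved
"""

summary = verdict(parse_log(VULN_LOG), parse_log(FIXED_LOG))
```

For the relative `/dev` symlink excerpts this yields a variant that reproduces on 1.3.5 and does not bypass 1.3.6.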

## Recommendations / Next Steps

- Upgrade runc to a patched version: **1.3.6**, **1.4.3**, or **1.5.0** (or later). The 1.3.6 binary was tested here and blocked both variants that reproduced on 1.3.5.
- Higher-level runtimes should not rely on a `/dev` symlink in the container image and should verify they are using a patched runc version.
- Regression tests should include both absolute and relative `/dev` symlinks, and should exercise both `runc run` and `runc create`/`runc start`.
- The sibling cgroupv1 symlink fix (`9432ad3a`) should be kept in any backport; it is part of the same defense-in-depth change.
- If a future change reintroduces mounting on top of the container root, `rootFd` must be reopened after that mount, as noted in the `prepareRootfs` comment.

## Additional Notes

- The script is idempotent: it reuses cached binaries and Docker images, runs each variant in a fresh container, and writes unique log files. It was executed twice and produced the same verdict.
- File bind-mounts did not work in this Docker environment, so the script generates the inner shell scripts on the host and pipes each one into the container with `sh -s < script`. This is the transport mechanism the reproduction relies on.
- The reproduction uses the real `runc` CLI binary and the real OCI bundle execution path, not a mocked environment.
- The privileged Docker-in-Docker container is required because the sandbox host lacks `CAP_SYS_ADMIN` and a writable cgroup hierarchy; inside the privileged container, runc has the capabilities needed to create a real container.
- No sanitizer or crash is involved; the proof relies on filesystem-state differences between vulnerable and fixed versions.
- Source identity for the tested binaries is recorded in `bundle/vuln_variant/source_identity.json` and `bundle/logs/vuln_variant/fixed_version.txt` / `vulnerable_version.txt`.
