# CVE-2026-41579 — Patch / Fix Analysis

## Fix Overview

The upstream fix for CVE-2026-41579 rewrites the pre-`pivot_root(2)` `/dev` setup code in `libcontainer/rootfs_linux.go` to operate on a file descriptor that refers to the real container rootfs directory, rather than on path strings built from `config.Rootfs`. The vulnerable code in `setupPtmx` and `setupDevSymlinks` used `filepath.Join(config.Rootfs, "/dev/...")` and then `os.Remove` / `os.Symlink`, which followed a `/dev` symlink in the image and performed operations on the host directory it pointed to. The fixed code uses the `internal/pathrs` fd-based helpers that were already used elsewhere after the earlier `d40b3439a961` rootfs fd rework.

Relevant commits (release-1.3 branch, included in v1.3.6):

- `a8e53f2c` — `rootfs: make /dev initialisation code fd-based`
- `9432ad3a` — `rootfs: make cgroupv1 subsystem symlinks fd-based` (sibling hardening)
- `d934454b` — `merge CVE-2026-41579 fixes into release-1.3`
- `491b69ba` — `VERSION: release v1.3.6` (tag commit)

The equivalent main-branch commit is `864db8042dbb191028676f80addf8c35f348aee2`.

## What the Fix Changes

### `libcontainer/rootfs_linux.go`

A new helper `doSetupDev(rootFd *os.File, config *configs.Config)` is introduced and called from `prepareRootfs`:

```go
func doSetupDev(rootFd *os.File, config *configs.Config) error {
    if err := createDevices(rootFd, config); err != nil {
        return fmt.Errorf("error creating device nodes: %w", err)
    }
    if err := setupPtmx(rootFd); err != nil {
        return fmt.Errorf("error setting up ptmx: %w", err)
    }
    if err := setupDevSymlinks(rootFd); err != nil {
        return fmt.Errorf("error setting up /dev symlinks: %w", err)
    }
    return nil
}
```

`createDevices`, `setupPtmx`, and `setupDevSymlinks` now take the pre-opened rootfs directory handle instead of `config.Rootfs` strings.

`setupPtmx` was changed from:

```go
ptmx := filepath.Join(config.Rootfs, "dev/ptmx")
if err := os.Remove(ptmx); err != nil && !errors.Is(err, os.ErrNotExist) {
    return err
}
if err := os.Symlink("pts/ptmx", ptmx); err != nil {
    return err
}
```

to:

```go
if err := pathrs.UnlinkInRoot(rootFd, "/dev/ptmx", 0); err != nil && !errors.Is(err, os.ErrNotExist) {
    return err
}
return pathrs.SymlinkInRoot("pts/ptmx", rootFd, "/dev/ptmx")
```

`setupDevSymlinks` was changed from:

```go
for _, link := range links {
    src := link[0]
    dst := filepath.Join(rootfs, link[1])
    if err := os.Symlink(src, dst); err != nil && !errors.Is(err, os.ErrExist) {
        return err
    }
}
```

to:

```go
for _, link := range links {
    target, devName := link[0], link[1]
    if err := pathrs.SymlinkInRoot(target, rootFd, devName); err != nil && !errors.Is(err, os.ErrExist) {
        return err
    }
}
```

`createDevices` was similarly refactored to use `pathrs.MkdirAllParentInRoot(rootFd, node.Path, 0o755)` rather than `pathrs.MkdirAllInRootOpen(config.Rootfs, ...)`.

### `internal/pathrs/root_pathrslite.go` and `internal/pathrs/mkdirall.go`

The fix adds two new fd-based helpers:

- `UnlinkInRoot(root, subpath, flags)` — opens the parent directory of `subpath` under `root` using `OpenInRoot` (which uses `pathrs.OpenatInRoot` / `pathrs.MkdirAllHandle` from `filepath-securejoin/pathrs-lite`) and then calls `unix.Unlinkat` on that directory fd.
- `SymlinkInRoot(linktarget, root, subpath)` — creates the parent directory of `subpath` under `root` with `MkdirAllParentInRoot` and then calls `unix.Symlinkat` on that directory fd.

Both helpers are built on top of `OpenInRoot` / `MkdirAllInRoot`, which use the `pathrs-lite` library. That library is designed to walk paths relative to a root directory handle without following symlinks that escape the root.

`MkdirAllParentInRoot` was updated to share a `splitPath` helper so that it can safely split the target path into `(dir, filename)` and return an open directory handle plus the trailing filename. The path itself is first passed through `hallucinateUnsafePath`, which strips the root prefix and then calls `securejoin.SecureJoin` to produce a candidate path that can be fed to `pathrs-lite` safely.
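
The splitting step can be illustrated in isolation. The sketch below is an assumption about the general shape of such a helper, not the actual `splitPath` from runc, and the name `splitParent` is hypothetical. It performs purely lexical normalisation and knows nothing about symlinks, which is why the directory part still has to be resolved through the fd-based helpers.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// splitParent lexically normalises subpath as if it were rooted at "/" and
// returns its parent directory and final component. Hypothetical helper.
func splitParent(subpath string) (dir, file string, err error) {
	// Prefixing "/" makes Clean collapse any leading ".." at the root.
	clean := path.Clean("/" + subpath)
	if clean == "/" {
		return "", "", errors.New("path has no final component")
	}
	dir, file = path.Split(clean)
	return path.Clean(dir), file, nil
}

func main() {
	for _, p := range []string{"/dev/ptmx", "dev/fd", "core", "../.."} {
		dir, file, err := splitParent(p)
		fmt.Printf("%q -> dir=%q file=%q err=%v\n", p, dir, file, err)
	}
}
```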

## Fix Assumptions

1. **The rootfs directory handle is trustworthy.** `prepareRootfs` opens `rootFd` with `O_DIRECTORY | O_CLOEXEC | O_PATH` directly on `config.Rootfs`. These flags do not include `O_NOFOLLOW`, so a symlink at the rootfs path itself would be followed rather than rejected. The handle is trusted only because the security model assumes the *caller* (the higher-level runtime / user) chooses the bundle rootfs path, not the image.
2. **`pathrs-lite` / `filepath-securejoin` correctly refuses to traverse escaping symlinks.** The fix relies on `OpenatInRoot` / `MkdirAllHandle` returning an error or a handle inside the root when a path component is a symlink that points outside the rootfs.
3. **All pre-pivot `/dev` operations go through the fd-based helpers.** The new `doSetupDev` centralizes all three setup phases (`createDevices`, `setupPtmx`, `setupDevSymlinks`).
4. **Cgroupv1 symlink creation is not a bypassable path.** The merge also includes `9432ad3a`, which converts the `mountCgroupV1` merged-subsystem symlink creation to `pathrs.SymlinkInRoot`. This suggests the maintainers recognized a sibling path-based symlink operation that could share the same root cause.

## What the Fix Does NOT Cover

- **Rootfs-level mounts on top of `/`:** The comment in `prepareRootfs` explicitly notes that if runc ever re-enables support for mounting on top of the container root, `rootFd` would need to be reopened after such mounts. This is not currently supported, so it is not a gap today.
- **Post-pivot path-string operations:** After `pivot_root` / `chroot`, the process is inside the container and path-string operations on `/dev` are expected to follow the container's own symlinks. Those are not the same trust boundary.
- **Other pre-pivot path-string code:** Although the merge also hardens cgroupv1 symlink creation, other path-string operations remain in `rootfs_linux.go` (e.g., `prepareRoot` bind-mounts the rootfs onto itself, and mounts are set up from the runtime config). These are not directly image-controlled: the image cannot force a bind mount to a new destination unless the higher-level runtime passes image-specified mounts to runc.
- **`/dev` symlink remaining as a symlink in the container:** The fix does not *remove* a `/dev` symlink from the image. It simply does not follow it. The container will still have `/dev` as whatever the image provided, which may be a symlink to an attacker-controlled directory. This is a loss of expected container semantics but not a host filesystem integrity issue.

## Comparison of Behavior Before and After the Fix

| Scenario | Before (v1.3.5) | After (v1.3.6) |
|----------|-----------------|----------------|
| `/dev` is a symlink to a host directory | `setupPtmx` deletes the host file named `ptmx` and `setupDevSymlinks` creates fixed symlinks (`core`, `fd`, `stdin`, etc.) in the host directory. | `pathrs.UnlinkInRoot` / `pathrs.SymlinkInRoot` operate on the real rootfs directory handle; the host decoy is preserved and the symlink is not followed. |
| `/dev` is a relative symlink (`../controlled_dev`) | Same as above: the relative path is resolved from the symlink location and host files are modified. | The fd-based helpers do not follow the relative symlink; the target directory outside the rootfs is untouched. |
| `runc create` + `runc start` instead of `runc run` | Same vulnerable path: `prepareRootfs` still runs during container setup (as part of `runc create`) and follows the `/dev` symlink. | Same fixed path: `prepareRootfs` runs fd-based `/dev` setup and preserves the host decoy. |
| `/dev/pts` is a symlink to a host directory | `/dev` itself is a real directory, so `setupPtmx` operates on `/bundle/rootfs/dev/ptmx` and does not affect the host directory. The resulting `/dev/ptmx` symlink resolves through `/dev/pts`, but this is inside the container's namespace. | Same as before: no host impact. |

## Variant / Bypass Assessment

The variant stage tested three materially different candidates:

1. **Relative `/dev` symlink** (`/bundle/rootfs/dev -> ../controlled_dev`) — a different data path that still reaches `setupPtmx`/`setupDevSymlinks`. It reproduced on v1.3.5 but **not** on v1.3.6. The fixed version preserved the decoy.
2. **`runc create` + `runc start` entry point** — a different CLI invocation that still ends up in `prepareRootfs`. It reproduced on v1.3.5 but **not** on v1.3.6. The fixed version preserved the decoy.
3. **`/dev/pts` symlink** — a different symlink location inside `/dev`. It did **not** reproduce on either version because the vulnerable code only operates on `/dev/ptmx`, not through `/dev/pts`.

No bypass of the fixed version was observed. The fd-based `/dev` setup closes the known data paths and the known entry point variants.

## Recommendations for the Coding Agent

- Ensure the fix in `libcontainer/rootfs_linux.go` uses the pre-opened `rootFd` for **all** pre-pivot `/dev` operations, including any future additions to `doSetupDev`.
- Keep the cgroupv1 symlink fd-based conversion (`9432ad3a`) when backporting to branches that still contain the `mountCgroupV1` code path.
- Add regression tests that exercise both absolute and relative `/dev` symlinks, as well as both `runc run` and `runc create`/`runc start` paths, and assert that no files outside the rootfs are created or removed.
- Consider adding an explicit error path when `MkdirAllParentInRoot` encounters an existing `/dev` symlink rather than silently skipping symlink creation, to give operators clearer feedback that the image contains an unusual `/dev` layout.
