# Variant RCA Report — CVE-2026-54500

## Summary

No bypass or materially distinct alternate trigger was found. CVE-2026-54500 is an
uninitialized stack-memory read in the long-key path of `form_attr()` in
`ext/oj/intern.c` (`rb_intern3(buf, ...)` should be `rb_intern3(b, ...)`). The fix
(commit `bbde91a`, v3.17.3) is a single-identifier change that closes the **only** code
path that reaches this sink: `Oj.load(json, mode: :object)` →
`object.c:oj_set_obj_ivar()` → `intern.c:oj_attr_intern()` → `cache.c:cache_intern()` →
`intern.c:form_attr()`.

An empirical sweep of every Oj parse mode (`:object`, `:compat`, `:rails`, `:strict`,
`:null`, `:wab`, `:custom`) plus the newer `Oj::Parser` API (`:usual` with object
creation, `:usual` Hash, symbol-cached), each tested with a 300-byte key on both the
vulnerable commit (`495cc38`, v3.17.2) and the fixed commit (`bbde91a`, v3.17.3),
confirms that **only** `:object` mode leaks, and only on the vulnerable version.

The duplicate copy of the same pattern in `usual.c` was already fixed earlier
(`ec368db`, #1014, an ancestor of v3.17.2), so the newer `Oj::Parser` API was not
vulnerable to this specific bug on v3.17.2. `:compat`/`:rails` never reach either copy
of `form_attr`, so they were not vulnerable either.

The `bbde91a` commit also bundles an unrelated `fast.c` depth-overflow fix
(`doc_each_child`), which is a separate bug, not a variant of CVE-2026-54500.

## Fix Coverage / Assumptions

- **Invariant the fix relies on:** The only way an attacker-controlled JSON key of
  length ≥ 254 reaches the uninitialized-stack-buffer read is via the long-key branch of
  `intern.c:form_attr()`. The fix changes that one `rb_intern3(buf, len + 1, ...)` to
  `rb_intern3(b, len + 1, ...)` so the correctly populated heap buffer `b` is interned
  instead of the uninitialized stack buffer `buf`.
- **Code path explicitly covered:** `Oj.load(..., mode: :object)` → `object.c`
  `oj_set_obj_ivar` → `oj_attr_intern` (intern.c:145) → `cache_intern` (cache.c:324,
  threshold `CACHE_MAX_KEY = 35`) → `intern.c:form_attr` long-key path (len ≥ 254).
- **What the fix does NOT cover (and why that is OK here):**
  - `usual.c` has its own `static form_attr()` with the identical long-key pattern. It
    is **not** touched by `bbde91a`, but it was already fixed in `ec368db` (#1014
    "Fix stack limits / Fix extreme key length bug"), which is an ancestor of v3.17.2.
    Verified: `git show 495cc38:ext/oj/usual.c` already contains `rb_intern3(b, ...)`.
  - `:compat`/`:rails` modes dispatch to `oj_compat_parse` (compat.c), which uses
    `oj_calc_hash_key` for Hash keys and `json_create` for object creation — it never
    calls `form_attr` or `oj_attr_intern`.
  - `:strict`/`:null` modes (strict.c) intern keys with `rb_intern3(parent->key,
    parent->klen, ...)` directly from the parsed key buffer (no stack buffer).
  - `:wab` (wab.c) uses `oj_sym_intern` → `form_sym` (builds a Ruby `String` first; no
    stack buffer).
  - `:custom` (custom.c) uses `oj_calc_hash_key` for Hash keys.
  - The newer `Oj::Parser` API accumulates keys in a **dynamic** `p->key` buffer, copies
    them into a `Key` struct (`kp->buf` for short, heap `kp->key` for long — both
    properly filled by `push_key`), and for object creation reaches `usual.c`'s
    `form_attr` via `get_attr_id` → `cache_intern(d->attr_cache)` (already fixed). Its
    `:object` mode is **unimplemented** (`// TBD` placeholder, parser.c:1263).

## Variant / Alternate Trigger

**No bypass or alternate trigger was confirmed.** The following candidate entry points
were tested empirically (4 processes each, on both vulnerable and fixed versions):

| Mode / Entry point | Reaches `intern.c form_attr`? | Reaches `usual.c form_attr`? | Leak on vulnerable (v3.17.2)? | Leak on fixed (v3.17.3)? |
|---|---|---|---|---|
| `Oj.load :object` (`^o:Oj::Bag`) | **Yes (only path)** | No | **Yes** (encoding_error, per-run variation) | No (correct) |
| `Oj.load :compat` (json_class / `^o`) | No (uses `oj_calc_hash_key`) | No (compat.c) | No | No |
| `Oj.load :compat` (plain Hash) | No | No | No | No |
| `Oj.load :rails` | No (dispatches to `oj_compat_parse`) | No | No | No |
| `Oj.load :strict` | No (`rb_intern3` from parsed key) | No | No | No |
| `Oj.load :null` | No | No | No | No |
| `Oj.load :wab` | No (`oj_sym_intern`/`form_sym`) | No | No | No |
| `Oj.load :custom` | No (`oj_calc_hash_key`) | No | No | No |
| `Oj::Parser.new(:usual)` + create_id (object) | No | Yes (already fixed pre-v3.17.2) | No | No |
| `Oj::Parser.new(:usual)` (Hash) | No | No | No | No |
| `Oj::Parser.new(:usual)` + create_id + cache_keys | No | Yes (already fixed) | No | No |
| `Oj::Parser.new(:object)` | N/A — **unimplemented** (`// TBD`) | N/A | N/A | N/A |

The "alternate entry point that reaches a *different* unfixed copy of the same sink"
candidate (`usual.c form_attr`) was ruled out because that copy was already fixed before
the vulnerable version was tagged. There is no third copy of the `form_attr` long-key
pattern in the codebase: a `grep` for `rb_intern3(b,`, `rb_intern3(buf,`, and
`char buf[256]` combined with `OJ_R_ALLOC_N(char, len` matches only `intern.c` and
`usual.c`.

## Impact

- **Package/component:** `ohler55/oj` — C extension, `ext/oj/intern.c`, `form_attr()`
- **Affected versions (as tested):** Oj 3.17.2 (`495cc38`) — vulnerable via `:object`
  mode only; Oj 3.17.3 (`bbde91a`) — fixed.
- **Risk level:** Medium (information disclosure of process stack memory).
- **Consequences:** An attacker controlling JSON input with a key ≥ 254 bytes can cause
  `Oj.load(..., mode: :object)` to intern `len+1` bytes of uninitialized stack memory
  (and, for keys ≥ 256, read out of bounds past the 256-byte `buf`). The leaked bytes
  surface via the produced Symbol/instance-variable name or via the `EncodingError`
  message when the garbage is not valid UTF-8.
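
The two thresholds in the consequences above (254 for the long-key branch, 256 for reading past `buf`) follow from simple arithmetic on the key length. The sketch below works through it. The helper names and `STACK_BUF_SIZE` are illustrative and do not appear in Oj; only the 256-byte buffer size, the `sizeof(buf) - 2 <= len` test, and the `len + 1` interned length are taken from this report.

```c
#include <stddef.h>

/* Size of the stack buffer `buf` in form_attr(), per this report. */
#define STACK_BUF_SIZE ((size_t)256)

/* Bytes handed to rb_intern3() on the long-key path: '@' plus the key. */
static size_t interned_len(size_t key_len) {
    return key_len + 1;
}

/* Mirrors the branch condition quoted in the report: sizeof(buf) - 2 <= len. */
static int takes_long_key_path(size_t key_len) {
    return STACK_BUF_SIZE - 2 <= key_len;
}

/* Bytes read beyond the end of `buf` when the pre-fix code interns `buf`. */
static size_t overread_bytes(size_t key_len) {
    size_t n = interned_len(key_len);
    if (!takes_long_key_path(key_len) || n <= STACK_BUF_SIZE) {
        return 0;
    }
    return n - STACK_BUF_SIZE;
}
```

For the 300-byte key used in the sweep, `interned_len(300)` is 301, so the pre-fix call reads all 256 bytes of `buf` and, by this arithmetic, 45 bytes beyond it.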

## Impact Parity

- **Disclosed/claimed maximum impact:** Uninitialized stack memory read / OOB read,
  leaking process stack contents via Symbol or `EncodingError` message.
- **Reproduced impact from this variant run:** The original `:object`-mode leak was
  re-confirmed on the vulnerable version: the probe observed `EncodingError`, with
  `MSG_LEN` varying between 1271 and 1432 across runs and 1250 to 1426 non-`A` bytes
  per message. This run-to-run variation indicates that the interned bytes are
  uninitialized memory rather than the supplied key. No other mode reproduced any leak
  on either version.
- **Parity:** `none` for the variant search — no additional impact path was found beyond
  the already-known `:object`-mode path, which is fully closed by the fix.
- **Not demonstrated:** No code execution (information-disclosure bug only).

## Root Cause

The root cause is a `buf`/`b` variable confusion in `intern.c:form_attr()`. When a key
is ≥ 254 bytes (`sizeof(buf) - 2 <= len`), the function allocates a heap buffer `b`,
correctly fills it with `'@' + key + '\0'`, then **erroneously** passes the uninitialized
256-byte stack buffer `buf` to `rb_intern3()` instead of `b`. This is a duplicate of a
bug that was previously fixed in `usual.c` (commit `ec368db`, #1014) but missed in
`intern.c`. The fix commit `bbde91a` (#1015, "Fix intern.c and fast.c") corrects
`intern.c` (`buf` → `b`) and, separately, adds a `MAX_STACK` depth check to
`fast.c:doc_each_child()` (an unrelated depth-overflow bug, not a variant of this CVE).
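
The confusion can be illustrated with a minimal standalone sketch. This is not Oj's source: `intern_stub` stands in for `rb_intern3`, `form_attr_sketch` and its `use_fix` flag are hypothetical, plain `malloc` replaces Oj's allocator macro, and the `~`-prefix handling is not modeled. The stack buffer is pre-filled with a `0xAA` marker and the wrong-buffer read is capped at `sizeof(buf)`, so the sketch stays well-defined where the real pre-fix code reads truly uninitialized and out-of-bounds memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rb_intern3(): returns a copy of what it was
 * handed so the caller can inspect which buffer was actually interned. */
static char *intern_stub(const char *p, size_t n) {
    char *out = malloc(n + 1);
    if (out == NULL) abort();
    memcpy(out, p, n);
    out[n] = '\0';
    return out;
}

/* Sketch of form_attr(). use_fix = 0 mimics the pre-fix call site (interns
 * buf); use_fix = 1 mimics the fixed call site (interns b). */
static char *form_attr_sketch(const char *key, size_t len, int use_fix) {
    char buf[256];
    /* Assumption for the sketch: 0xAA marks "stale stack contents". */
    memset(buf, 0xAA, sizeof(buf));

    if (sizeof(buf) - 2 <= len) {            /* long key: len >= 254 */
        char  *b = malloc(len + 2);
        char  *id;
        size_t n = len + 1;
        if (b == NULL) abort();
        b[0] = '@';
        memcpy(b + 1, key, len);
        b[len + 1] = '\0';
        if (use_fix) {
            id = intern_stub(b, n);          /* fixed: the populated heap buffer */
        } else {
            /* Pre-fix: the untouched stack buffer. Capped here to stay in
             * bounds; the real bug passes len + 1 unconditionally. */
            id = intern_stub(buf, n < sizeof(buf) ? n : sizeof(buf));
        }
        free(b);
        return id;
    }
    buf[0] = '@';                            /* short key: buf is populated */
    memcpy(buf + 1, key, len);
    buf[len + 1] = '\0';
    return intern_stub(buf, len + 1);
}
```

Calling `form_attr_sketch(key, 300, 0)` with a key of 300 `A` bytes returns only marker bytes, while `form_attr_sketch(key, 300, 1)` returns `@` followed by the key. Short keys behave the same with either flag, which matches the report's observation that only keys of 254 bytes or more trigger the bug.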

Because `intern.c:form_attr` is reachable only through `oj_attr_intern`, which is
invoked only from `object.c:oj_set_obj_ivar`, and every other mode uses key-interning
mechanisms that do not employ the stack-buffer-then-heap-fallback pattern, the fix
completely eliminates the reachable vulnerable sink.

**Fix commit:** `bbde91a679728f94c4492ebc3683f4fa3309049f`
**Earlier duplicate fix (usual.c):** `ec368dbe936ef0104b782e4b0f67b17d6c7276f7`

## Reproduction Steps

1. **Reference:** `bundle/vuln_variant/reproduction_steps.sh` (self-contained,
   idempotent).
2. **What the script does:**
   - Resolves the durable project cache and reuses the `ohler55/oj` clone.
   - Builds the **vulnerable** commit `495cc38` (v3.17.2) via `ruby extconf.rb && make`.
   - Runs `probe_variant.rb` for 11 entry points (all Oj.load modes + newer
     `Oj::Parser` API variants), 4 separate processes each, with a 300-byte key.
   - Builds the **fixed** commit `bbde91a` (v3.17.3) and repeats the same 44 probe runs.
   - Emits a variant/bypass matrix and a verdict, writes `runtime_manifest.json`, and
     restores the repo to the fixed commit.
3. **Expected evidence:**
   - Vulnerable: only `object` mode produces `OUTCOME=encoding_error` with per-run
     `MSG_LEN` variation (1271–1432); all other modes `OUTCOME=correct`.
   - Fixed: all 11 modes `OUTCOME=correct` (ivar_len=301 / key_len=300, deterministic
     `0x40`+`0x41…` / `0x41…` bytes); zero `encoding_error`, zero `leak`.
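
A metric like `NON_A`, and the distinction between a clean and a leaking result, can be sketched as a byte count against the fill character of the probe key. The following is an illustration only: it does not reproduce the logic of `probe_variant.rb`, and `count_non_fill` and `looks_clean_ivar` are hypothetical names.

```c
#include <stddef.h>

/* Count bytes that differ from the fill byte used to build the probe key
 * ('A', i.e. 0x41, for the 300-byte key in this report). */
static size_t count_non_fill(const unsigned char *data, size_t len,
                             unsigned char fill) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (data[i] != fill) {
            n++;
        }
    }
    return n;
}

/* Hypothetical verdict for :object mode: a clean instance-variable name is
 * '@' (0x40) followed by exactly key_len fill bytes and nothing else. */
static int looks_clean_ivar(const unsigned char *name, size_t len,
                            size_t key_len, unsigned char fill) {
    return len == key_len + 1
        && name[0] == '@'
        && count_non_fill(name + 1, len - 1, fill) == 0;
}
```

Under this sketch, the fixed build's 301-byte name (`0x40` then 300 × `0x41`) is clean, while any name or message with a large non-fill count would be flagged for inspection.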

## Evidence

- **Log:** `bundle/logs/vuln_variant_repro.log` — full build + sweep transcript.
- **Vulnerable outcomes:** `bundle/logs/vuln_variant_outcomes.txt`
- **Fixed outcomes:** `bundle/logs/fixed_variant_outcomes.txt`
- **Probe script:** `bundle/vuln_variant/probe_variant.rb`
- **Runtime manifest:** `bundle/vuln_variant/runtime_manifest.json`
- **Fixed/vulnerable version identity:** `bundle/logs/vuln_variant/fixed_version.txt`

### Key excerpts (second verification run)

**Vulnerable `object` mode — leaks, per-run variation (uninitialized memory):**
```
[vuln object run 1] encoding_error MSG_LEN=1271 NON_A=1250
[vuln object run 2] encoding_error MSG_LEN=1343 NON_A=1333
[vuln object run 3] encoding_error MSG_LEN=1272 NON_A=1259
[vuln object run 4] encoding_error MSG_LEN=1351 NON_A=1317
[vuln compat_obj run 1] correct hash_key KEY_LEN=300 FIRST=41414141…
```

**Fixed — all modes clean:**
```
[fixed object run 1] correct ivar IVAR_LEN=301 FIRST=40414141…
[fixed object run 2] correct ivar IVAR_LEN=301 FIRST=40414141…
… (identical for all 11 modes, all 4 runs)
```

### Environment
- Ruby 3.3.8 (x86_64-linux-gnu), GCC 15.2.0, Ubuntu.
- Oj built from source at vulnerable `495cc38` and fixed `bbde91a`.

## Recommendations / Next Steps

- **No additional fix is required** for CVE-2026-54500; the single-identifier `buf`→`b`
  change in `intern.c` fully closes the only reachable sink. The Coding stage should
  ensure this change is present and treat the items below as defense-in-depth.
- **Consolidate the duplicate `form_attr`:** `intern.c` and `usual.c` each carry their
  own copy of `form_attr`. Unifying them into a single shared function would prevent
  future copy-paste divergence (this exact class of bug already occurred twice).
- **Add a regression test** parsing a JSON object with a ≥ 254-byte key in `:object`
  mode asserting the resulting attribute name equals the input (deterministic).
- **Align the two `form_attr` copies:** `intern.c`'s `form_attr` handles a
  `~`-prefix case that `usual.c`'s does not, a behavioral divergence worth resolving.
- **Consider `-ftrivial-auto-var-init=pattern`** and MSan/ASan in CI to make
  uninitialized reads deterministic and caught automatically.
- **The `Oj::Parser.new(:object)` placeholder (`// TBD`) should be implemented or
  removed** to avoid confusion; currently it silently produces a non-functional parser.
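
One possible shape for the consolidation recommended above is a single helper in which the same pointer is both written and interned, so there is no second buffer name to confuse. This is a sketch under stated assumptions, not a proposed patch: `form_attr_shared`, `sketch_id`, `intern_fn`, and `demo_intern` are hypothetical, `malloc` stands in for Oj's allocator macro, and the `~`-prefix case in `intern.c` is not handled.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned long sketch_id;                        /* stand-in for Ruby's ID */
typedef sketch_id (*intern_fn)(const char *p, size_t n);

/* Hypothetical shared helper. `name` points at whichever buffer is in use,
 * and it is the only pointer that is filled and then interned. */
static sketch_id form_attr_shared(const char *key, size_t len, intern_fn intern) {
    char      stack_buf[256];
    char     *heap = NULL;
    char     *name = stack_buf;
    sketch_id id;

    if (sizeof(stack_buf) - 2 <= len) {                 /* long key: use the heap */
        heap = malloc(len + 2);
        if (heap == NULL) abort();
        name = heap;
    }
    name[0] = '@';
    memcpy(name + 1, key, len);
    name[len + 1] = '\0';
    id = intern(name, len + 1);
    free(heap);                                         /* free(NULL) is a no-op */
    return id;
}

/* Demo interner for exercising the helper: packs the length and first byte. */
static sketch_id demo_intern(const char *p, size_t n) {
    return ((sketch_id)n << 8) | (unsigned char)p[0];
}
```

Both `intern.c` and `usual.c` could then call one function with their own interning callback, which would also make a single regression test cover both call sites.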

## Additional Notes

- **Idempotency:** The script was run twice consecutively; both runs completed fully
  (exit 1 = no bypass found) without crashing, and restored the repo to `bbde91a`.
- **Threat model:** `SECURITY.md` is a generic template (supported versions + reporting
  process) with no explicit exclusions for memory-safety classes; the bug is within
  scope as an information-disclosure vulnerability reachable from attacker-controlled
  JSON input.
- **Scope discipline:** The `fast.c` `doc_each_child` depth-overflow fix bundled in the
  same commit (`bbde91a`) is a **separate** bug (different root cause, different sink,
  different impact class) and was deliberately **not** claimed as a variant of
  CVE-2026-54500, per the rule that separate bugs are not bypasses of an unrelated fix.
- **Bounded search justification:** Fewer than 3 "real" bypass candidates exist because
  static analysis (mode dispatch in `oj.c:1255-1261`, `rb_intern3`/`form_attr` call-site
  enumeration, `usual.c` fix ancestry) proves there is exactly one reachable copy of the
  vulnerable sink (`intern.c form_attr`) and one already-fixed copy (`usual.c form_attr`).
  All remaining modes use key-interning paths without the stack-buffer-then-heap-fallback
  pattern. The 11-mode empirical sweep confirms this exhaustively.
