# RCA Report — CVE-2026-54500

## Summary

Oj (Optimized JSON), a Ruby gem with a C extension, contains an uninitialized stack
memory read in `form_attr()` in `ext/oj/intern.c`. When `Oj.load` parses a JSON object
in `:object` mode whose key is 254 bytes or longer, the long-key code path allocates a
heap buffer `b` and correctly fills it with the attribute name, but then passes the
**uninitialized** 256-byte stack buffer `buf` (not `b`) to `rb_intern3()` before
freeing `b`. Ruby therefore interns `len + 1` bytes of uninitialized stack memory
(and, for keys ≥ 256 bytes, reads out of bounds past `buf`). The leaked bytes surface
to the caller via the produced Symbol or via the `EncodingError` message raised when
the stack garbage is not valid UTF-8, disclosing process stack contents. The fix is a
one-identifier change: `rb_intern3(buf, ...)` → `rb_intern3(b, ...)`.

## Impact

- **Package/component:** `ohler55/oj` — C extension, `ext/oj/intern.c`, `form_attr()`
- **Affected versions:** Oj 0.0.1 – 3.17.2 (fixed in 3.17.3)
- **Risk level:** Medium
- **Consequences:** Information disclosure of process stack memory. An attacker who
  controls the JSON input (a key ≥ 254 bytes) can cause `Oj.load` to read and surface
  uninitialized stack bytes. The leak is observable through the `EncodingError`
  exception message (which embeds the invalid bytes) or through the produced Symbol
  object. The exact bytes and message length vary between process invocations,
  confirming the source is uninitialized (non-deterministic) memory.

## Impact Parity

- **Disclosed/claimed maximum impact:** Uninitialized stack memory read / out-of-bounds
  read, leaking process stack contents via Symbol or EncodingError message.
- **Reproduced impact from this run:** Uninitialized stack memory read confirmed.
  Every vulnerable run raised an `EncodingError` whose message contained 1262–1423
  non-`A` bytes (the escaped rendering of leaked stack data rather than the all-`A`
  input), with message lengths varying across runs (1276–1432 bytes). The fixed
  version produced the correct, deterministic attribute name with zero leaked bytes.
- **Parity:** `full` — the disclosed information-disclosure symptom (uninitialized
  stack memory surfacing via the EncodingError message, with per-run variation) was
  reproduced exactly, and the negative control on the fixed commit confirmed the fix.
- **Not demonstrated:** No code execution was claimed or demonstrated; this is an
  information-disclosure / memory-read bug, not a code-execution vulnerability.

## Root Cause

In `ext/oj/intern.c`, `form_attr(const char *str, size_t len)` converts a JSON object
key into a Ruby attribute ID (interned symbol). It declares a 256-byte stack buffer
`buf` (uninitialized) and branches on key length:

```c
static VALUE form_attr(const char *str, size_t len) {
    char buf[256];                              // UNINITIALIZED

    if (sizeof(buf) - 2 <= len) {               // long-key path: len >= 254
        char *b = OJ_R_ALLOC_N(char, len + 2);  // heap buffer
        ID    id;
        // ... b is filled correctly with '@' + key + '\0' ...
        id = rb_intern3(buf, len + 1, oj_utf8_encoding);  // BUG: reads `buf`, not `b`
        OJ_R_FREE(b);
        return id;
    }
    // short-key path: buf IS properly filled before use (correct)
    ...
    return (VALUE)rb_intern3(buf, len + 1, oj_utf8_encoding);
}
```

In the long-key path, `b` is the correctly-populated heap buffer, but `rb_intern3` is
called with `buf` — the uninitialized stack buffer. `rb_intern3` reads `len + 1` bytes
from `buf`. When `len >= 256`, this also reads out of bounds past the 256-byte `buf`.
The bytes are interned as a symbol; if they are not valid UTF-8, Ruby raises an
`EncodingError` whose message includes the offending bytes, leaking them to the caller.
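
The corrected long-key path can be sketched outside of Ruby as follows. This is a minimal illustration, not Oj's code: `intern_stub()` and `interned_t` are hypothetical stand-ins for `rb_intern3()` and its result, `malloc`/`free` stand in for `OJ_R_ALLOC_N`/`OJ_R_FREE`, and only the `'@' + key + '\0'` fill described above is modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the value rb_intern3() would produce:
 * it simply records the bytes it was handed. */
typedef struct {
    char   bytes[512];
    size_t len;
} interned_t;

/* Hypothetical stand-in for rb_intern3(ptr, len, enc). */
static interned_t intern_stub(const char *ptr, size_t len) {
    interned_t out;

    assert(len < sizeof(out.bytes)); /* sketch only handles keys under 511 bytes */
    memcpy(out.bytes, ptr, len);
    out.bytes[len] = '\0';
    out.len        = len;
    return out;
}

/* Long-key path with the fix applied: the heap buffer `b` is both the
 * buffer that gets filled and the buffer that gets interned. */
static interned_t form_attr_long(const char *str, size_t len) {
    char      *b = malloc(len + 2); /* '@' + key + '\0' */
    interned_t id;

    if (b == NULL) {
        abort();
    }
    b[0] = '@';
    memcpy(b + 1, str, len);
    b[len + 1] = '\0';
    id = intern_stub(b, len + 1); /* the fix: pass `b`, not a stack scratch buffer */
    free(b);
    return id;
}
```

Calling `form_attr_long()` with a 300-byte key of `A`s yields a 301-byte name starting with `@`, which matches the fixed-commit evidence. The vulnerable variant is deliberately not reproduced here, since reading an uninitialized array is undefined behavior in C.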

The same bug had been fixed earlier in `ext/oj/usual.c`, but the identical code in `intern.c` was missed.

**Call path:** `Oj.load(json, mode: :object)` → `object.c:oj_set_obj_ivar()` →
`intern.c:oj_attr_intern()` → `cache.c:cache_intern()` → `intern.c:form_attr()`.
Since `CACHE_MAX_KEY` is 35, keys ≥ 35 bytes bypass the cache and call `form_attr`
directly every time, so the uninitialized read occurs on every invocation with a
long key.

**Fix commit:** `bbde91a679728f94c4492ebc3683f4fa3309049f` ("Fix intern.c and fast.c
(#1015)") — changes `rb_intern3(buf, len + 1, oj_utf8_encoding)` to
`rb_intern3(b, len + 1, oj_utf8_encoding)` in the long-key path of `form_attr()`.

## Reproduction Steps

1. **Reference:** `bundle/repro/reproduction_steps.sh` (self-contained, idempotent).
2. **What the script does:**
   - Installs Ruby + build tools, clones (or reuses) `ohler55/oj`.
   - Checks out the **vulnerable** commit `495cc38` (v3.17.2, parent of the fix),
     builds the C extension via `ruby extconf.rb && make`.
   - Runs `Oj.load('{"^o":"Oj::Bag","AAA...300...AAA":1}', mode: :object)` in 6
     separate Ruby processes. The `^o:Oj::Bag` marker creates a non-Hash object so
     that `oj_set_obj_ivar` → `oj_attr_intern` → `form_attr` is invoked.
   - Checks out the **fixed** commit `bbde91a`, rebuilds, and runs the same probe
     6 times as a negative control.
   - Compares results, writes `runtime_manifest.json`, and exits 0 if confirmed.
3. **Expected evidence:**
   - Vulnerable: all runs raise `EncodingError`; message lengths vary per run
     (1276–1432 bytes), with 1262–1423 non-`A` bytes per message (escaped leaked
     stack data, not input).
   - Fixed: all runs return an `Oj::Bag` with a single 301-byte instance variable
     `@AAA...` (0x40 + 300×0x41), deterministic across all runs.

## Evidence

- **Log:** `bundle/logs/reproduction_steps.log` — full build + probe transcript.
- **Vulnerable outcomes:** `bundle/logs/vuln_outcomes.txt`
- **Fixed outcomes:** `bundle/logs/fixed_outcomes.txt`
- **Message-length variation:** `bundle/logs/vuln_msg_lengths.txt`
- **Probe script:** `bundle/repro/probe.rb`
- **Runtime manifest:** `bundle/repro/runtime_manifest.json`

### Key excerpts (from the second verification run)

**Vulnerable (commit 495cc38, v3.17.2) — all 6 runs leak:**
```
[vuln run 1] encoding_error   MSG_LEN=1348  NON_A_BYTES=1339
[vuln run 2] encoding_error   MSG_LEN=1349  NON_A_BYTES=1341
[vuln run 3] encoding_error   MSG_LEN=1350  NON_A_BYTES=1343
[vuln run 4] encoding_error   MSG_LEN=1276  NON_A_BYTES=1262
[vuln run 5] encoding_error   MSG_LEN=1432  NON_A_BYTES=1423
[vuln run 6] encoding_error   MSG_LEN=1368  NON_A_BYTES=1343
```
The `EncodingError` message begins `invalid symbol in encoding UTF-8 :"` followed by
Ruby `\xNN` escapes of the leaked stack bytes (e.g. `\xB8\xFF`, `\xD8\xFF`, `\xC0\xFF`).
These are pointers or other binary data, not the 0x41 (`A`) input bytes. The message
length varies across runs (1276–1432), which would not happen if the symbol were built
from the fixed 300-byte input, and confirms that the source is uninitialized stack memory.
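
The per-run variation can be checked mechanically from the excerpt. The following is a minimal sketch that assumes the log lines have exactly the format shown above; `vuln_run_t`, `parse_vuln_line()` and `msg_len_range()` are hypothetical helper names, not part of the reproduction bundle.

```c
#include <stddef.h>
#include <stdio.h>

/* One parsed "[vuln run N] encoding_error MSG_LEN=... NON_A_BYTES=..." line. */
typedef struct {
    int run;
    int msg_len;
    int non_a_bytes;
} vuln_run_t;

/* Returns 1 when the line matches the excerpt's format, 0 otherwise.
 * Whitespace in the sscanf format matches any run of spaces in the log. */
static int parse_vuln_line(const char *line, vuln_run_t *out) {
    return sscanf(line, "[vuln run %d] encoding_error MSG_LEN=%d NON_A_BYTES=%d",
                  &out->run, &out->msg_len, &out->non_a_bytes) == 3;
}

/* Computes the min/max MSG_LEN over all matching lines.
 * Returns the number of lines that parsed; outputs are untouched if it is 0. */
static int msg_len_range(const char *const *lines, size_t n, int *min_len, int *max_len) {
    size_t i;
    int    seen = 0;

    for (i = 0; i < n; i++) {
        vuln_run_t r;

        if (!parse_vuln_line(lines[i], &r)) {
            continue; /* skip anything that is not a vulnerable-run line */
        }
        if (!seen || r.msg_len < *min_len) {
            *min_len = r.msg_len;
        }
        if (!seen || r.msg_len > *max_len) {
            *max_len = r.msg_len;
        }
        seen++;
    }
    return seen;
}
```

A minimum that differs from the maximum across otherwise identical invocations is the signal of interest; identical lengths in every run would point to deterministic data instead.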

**Fixed (commit bbde91a) — all 6 runs clean:**
```
[fixed run 1] parsed  IVAR_LEN=301  CORRECT_ATTR=true  FIRST_BYTES=40414141...
[fixed run 2] parsed  IVAR_LEN=301  CORRECT_ATTR=true  FIRST_BYTES=40414141...
... (identical for all 6 runs)
```
`FIRST_BYTES` = `40` (`@`) + `41` (`A`) repeated — the correct, deterministic
attribute name. No `EncodingError`, no leaked bytes.
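
The `CORRECT_ATTR` check amounts to comparing the produced name against `@` followed by the all-`A` key. A minimal sketch of that comparison, assuming the probe key consists only of `A` bytes as in this report (`is_expected_attr()` is a hypothetical helper, not taken from `probe.rb`):

```c
#include <stddef.h>

/* Returns 1 if `name` is exactly '@' followed by `key_len` bytes of 'A',
 * which is the attribute name the fixed build is expected to produce. */
static int is_expected_attr(const char *name, size_t name_len, size_t key_len) {
    size_t i;

    if (name_len != key_len + 1 || name[0] != '@') {
        return 0;
    }
    for (i = 1; i < name_len; i++) {
        if (name[i] != 'A') {
            return 0; /* any non-input byte means the name was not built from the key */
        }
    }
    return 1;
}
```

For the 300-byte key used here, a correct name is 301 bytes long, matching `IVAR_LEN=301` in the fixed runs.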

### Environment
- Ruby 3.3.8 (x86_64-linux-gnu), GCC 15.2.0, Ubuntu.
- Oj built from source at vulnerable commit `495cc38` and fixed commit `bbde91a`.

## Recommendations / Next Steps

- **Upgrade to Oj 3.17.3+**, which contains the one-identifier fix.
- **Audit `ext/oj/usual.c` and any other copies** of the `form_attr` pattern for
  the same `buf`/`b` confusion (the `intern.c` bug duplicates one that had already
  been fixed in `usual.c`).
- **Add a regression test** that parses a JSON object with a ≥ 254-byte key in
  `:object` mode and asserts the resulting attribute name matches the input.
- Consider compiling with `-ftrivial-auto-var-init=pattern` to make uninitialized
  reads more visible in CI, and enabling MSan/ASan in the test suite.

## Additional Notes

- **Idempotency:** The script was run twice consecutively; both runs exited 0 with
  `CONFIRMED=true`. The script cleans all build artifacts between vulnerable/fixed
  builds (`git clean -fdx ext/oj lib/oj`) and uses a manual `extconf.rb + make` flow
  (avoiding `rake compile`, which loads bundler and can interfere with the git
  checkout state).
- **Key-length boundary:** The bug triggers at `len >= 254` (`sizeof(buf) - 2 = 254`).
  At `len >= 256` the read also goes out of bounds past the 256-byte `buf`. The
  reproduction uses a 300-byte key to exercise both the uninitialized read and the
  OOB read.
- **Cache bypass:** Because `CACHE_MAX_KEY = 35`, the 300-byte key bypasses the
  attribute cache entirely, so `form_attr` runs on every parse of that key and the
  uninitialized read is never masked by a cached result.
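
The length thresholds in the notes above can be summarized in a small classifier. This is a sketch of the arithmetic only, using the values stated in this report (`sizeof(buf) = 256`, `CACHE_MAX_KEY = 35`); the enum and function names are hypothetical.

```c
#include <stddef.h>

enum { BUF_SIZE = 256, CACHE_MAX_KEY = 35 }; /* values as stated in this report */

typedef enum {
    PATH_SHORT,          /* buf is filled before use: no bug */
    PATH_UNINIT_READ,    /* long-key path, read stays inside buf */
    PATH_UNINIT_AND_OOB  /* long-key path, read also runs past buf */
} key_path_t;

/* Classifies a key length against the vulnerable form_attr() logic. */
static key_path_t classify_key_len(size_t len) {
    if (len < (size_t)BUF_SIZE - 2) {
        return PATH_SHORT; /* sizeof(buf) - 2 <= len is false */
    }
    /* The long-key path reads len + 1 bytes starting at buf. */
    if (len + 1 <= (size_t)BUF_SIZE) {
        return PATH_UNINIT_READ;
    }
    return PATH_UNINIT_AND_OOB;
}

/* Per this report, keys of CACHE_MAX_KEY bytes or more skip the cache. */
static int bypasses_cache(size_t len) {
    return len >= (size_t)CACHE_MAX_KEY;
}
```

Under these assumptions only key lengths 254 and 255 give an in-bounds uninitialized read; 256 and above, including the 300-byte probe key, add the out-of-bounds component.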
