{"repro_id":"REPRO-2026-00228","version":6,"title":"nginx charset module segfaults when charset_map uses utf-8 as source charset, causing a NULL pointer dereference and DoS.","repro_type":"security","status":"published","severity":"high","description":"A misconfigured charset_map directive with utf-8 as the source charset (first column) causes nginx to dereference a NULL pointer and segfault during request processing. The charset module was never designed to handle UTF-8 as a source charset in charset_map; accessing uninitialized tables leads to an immediate worker process crash.","root_cause":"# RCA Report: nginx charset_map utf-8 Source Charset NULL-Dereference Segfault\n\n## Summary\n\nA misconfigured `charset_map` directive with `utf-8` in the first column (source charset) causes nginx to create wrong-format single-byte conversion tables. When a subsequent HTTP request triggers the charset filter's `recode_from_utf8()` path, the 256-byte single-byte table is cast to `u_char **` and dereferenced as a pointer array (`table[n >> 8]`), reading garbage bytes as a memory address and crashing the worker process with SIGSEGV (signal 11). The upstream fix (commit `29c23ad846787e8baa1390b2edca479eb63ea8d7`) adds a configuration-time validation that rejects `charset_map` with `utf-8` in the first column, preventing the invalid configuration from ever being loaded.\n\n## Impact\n\n- **Package/Component affected:** nginx `src/http/modules/ngx_http_charset_filter_module.c` (the `ngx_http_charset_filter_module`)\n- **Affected versions:** nginx versions prior to commit `29c23ad846787e8baa1390b2edca479eb63ea8d7` (tested on nginx/1.31.3 at parent commit `8f3465ac7f02b0ae86304e1be4ed319abb9d2edb`)\n- **Risk level:** High — any attacker who can send an HTTP request to a server configured with the vulnerable `charset_map` directive causes an immediate worker process crash (denial of service). 
The crash occurs on every request to the affected location.\n- **Consequences:** Repeated requests cause continuous worker respawns and crashes, degrading server availability. The crash is deterministic and triggered by a single HTTP GET request.\n\n## Impact Parity\n\n- **Disclosed/claimed maximum impact:** Denial of Service (DoS) via NULL pointer dereference / segfault in nginx worker process when processing requests with the misconfigured `charset_map`.\n- **Reproduced impact from this run:** DoS confirmed — nginx worker process crashes with SIGSEGV (signal 11, core dumped) on every HTTP request to the affected location. The worker is killed immediately when processing response body data containing non-ASCII bytes through the `recode_from_utf8()` code path.\n- **Parity:** `full` — the reproduced segfault/DoS matches the claimed impact exactly.\n- **Not demonstrated:** No code execution or privilege escalation was claimed or observed; the impact is purely a DoS crash.\n\n## Root Cause\n\nThe charset filter module supports two table formats:\n1. **Single-byte tables** (256 bytes): used when neither charset in a `charset_map` is UTF-8. Each byte maps directly: `table[src_byte] = dst_byte`.\n2. **UTF-8 multi-byte tables** (256 × `NGX_UTF_LEN` = 1024 bytes for `src2dst`, and an array of `u_char *` pointers for `dst2src`): used when the *destination* charset (second column) is UTF-8.\n\nThe bug occurs because `ngx_http_charset_map_block()` decides which table format to allocate based solely on whether `value[2]` (the **destination/second** column) is `\"utf-8\"`. 
When `utf-8` appears in `value[1]` (the **source/first** column) and the destination is a single-byte charset (e.g., `windows-1251`), the code takes the `else` branch and allocates 256-byte single-byte tables for both `src2dst` and `dst2src`.\n\nDuring request processing, the charset filter's body filter calls `ngx_http_charset_recode_from_utf8()` when `ctx->from_utf8` is true (i.e., the source charset is UTF-8). This function casts `ctx->table` (the 256-byte buffer) to `u_char **table` and dereferences `table[n >> 8]` as a pointer:\n\n```c\ntable = (u_char **) ctx->table;   // 256-byte buffer cast to pointer array\n...\nn = ngx_utf8_decode(&src, len);   // decode UTF-8 sequence to codepoint\nif (n < 0x10000) {\n    p = table[n >> 8];            // reads 8 bytes at offset (n>>8)*8 as a pointer\n    if (p) {\n        c = p[n & 0xff];          // dereferences the garbage pointer → SIGSEGV\n```\n\nFor example, with Cyrillic `а` (U+0430, encoded as `0xD0 0xB0`), `ngx_utf8_decode` returns `n = 0x0430`, so `n >> 8 = 4`. `table[4]` reads bytes 32–39 of the 256-byte buffer (values `32,33,34,35,36,37,38,39`, i.e. `0x20` through `0x27`), which on little-endian 64-bit forms the garbage pointer `0x2726252423222120`. Since this is non-NULL, `p[n & 0xff]` adds `0x30` and dereferences `0x2726252423222150`, an invalid address (non-canonical on x86_64), causing SIGSEGV.\n\n**Fix commit:** `29c23ad846787e8baa1390b2edca479eb63ea8d7` (\"Charset: disabled charset_map with utf-8 in the first column\"). The fix adds a check in `ngx_http_charset_map_block()` that rejects the configuration at parse time:\n\n```c\nif (ngx_strcasecmp(value[1].data, (u_char *) \"utf-8\") == 0) {\n    ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,\n                       \"\\\"charset_map\\\" with \\\"utf-8\\\" charset \"\n                       \"should be given in the second column\");\n    return NGX_CONF_ERROR;\n}\n```\n\n## Reproduction Steps\n\n1. **Script:** `bundle/repro/reproduction_steps.sh`\n2. 
**What the script does:**\n   - Locates pre-built nginx binaries from the project cache (vulnerable build at commit `8f3465ac7` and fixed build at commit `29c23ad84`), with a fallback to clone-and-build from source.\n   - Creates an HTML file containing real UTF-8 multi-byte characters (Cyrillic `а`, `б`, `в` — bytes `0xD0 0xB0`, etc.) to trigger the non-ASCII code path.\n   - **Vulnerable test (×2):** Writes an nginx config with `charset_map utf-8 windows-1251 { }` + `charset windows-1251` + `source_charset utf-8`, starts nginx as a real TCP listener, sends an HTTP GET request via curl, and checks the error log for `exited on signal 11` (SIGSEGV).\n   - **Fixed test (×2):** Writes the same config and runs `nginx -t` to verify the config is rejected with the patch's error message.\n   - **Config acceptance contrast:** Verifies the vulnerable binary accepts the config (exit 0) while the fixed binary rejects it.\n   - Writes `bundle/repro/runtime_manifest.json` with proof artifacts.\n3. **Expected evidence:** Two vulnerable attempts showing `worker process N exited on signal 11 (core dumped)` in the error log, and two fixed attempts showing `\"charset_map\" with \"utf-8\" charset should be given in the second column`.\n\n## Evidence\n\n### Log file locations\n- `bundle/logs/vuln_error_1.log` — Vulnerable attempt 1 error log (segfault)\n- `bundle/logs/vuln_error_2.log` — Vulnerable attempt 2 error log (segfault)\n- `bundle/logs/vuln_conf_1.conf` / `vuln_conf_2.conf` — Vulnerable nginx configs\n- `bundle/logs/fixed_test_1.log` / `fixed_test_2.log` — Fixed version config rejection\n- `bundle/logs/vuln_config_accept.log` — Vulnerable config acceptance\n- `bundle/repro/runtime_manifest.json` — Runtime evidence manifest\n\n### Key excerpts\n\n**Vulnerable worker segfault (attempt 1):**\n```\n2026/07/04 18:20:50 [alert] 30827#0: worker process 30829 exited on signal 11 (core dumped)\n```\n\n**Vulnerable worker segfault (attempt 2):**\n```\n2026/07/04 18:20:57 [alert] 
30847#0: worker process 30849 exited on signal 11 (core dumped)\n```\n\n**Fixed version config rejection:**\n```\nnginx: [emerg] \"charset_map\" with \"utf-8\" charset should be given in the second column\nnginx: configuration file ... test failed (exit 1)\n```\n\n**Vulnerable version config acceptance:**\n```\nnginx: the configuration file ... syntax is ok\nnginx: configuration file ... test is successful (exit 0)\n```\n\n### Environment\n- nginx/1.31.3 built with `--without-http_rewrite_module --without-http_gzip_module --with-cc-opt='-g -O0'`\n- Vulnerable commit: `8f3465ac7f02b0ae86304e1be4ed319abb9d2edb` (parent of fix)\n- Fixed commit: `29c23ad846787e8baa1390b2edca479eb63ea8d7`\n- gcc 15.2.0, Linux x86_64\n\n## Recommendations / Next Steps\n\n1. **Apply the upstream fix** (commit `29c23ad846787e8baa1390b2edca479eb63ea8d7`) to reject `charset_map` with `utf-8` in the first column at configuration parse time.\n2. **Audit existing configurations** for any `charset_map` directives using `utf-8` as the source charset and remove or correct them.\n3. **Add a regression test** that verifies `nginx -t` fails when `charset_map utf-8 <charset> { }` is present.\n4. **Consider defensive coding** in `recode_from_utf8()` to validate table format before casting, as defense-in-depth against similar misconfigurations.\n\n## Additional Notes\n\n- **Idempotency:** The script uses randomized port bases to avoid TCP TIME_WAIT conflicts between consecutive runs. Verified to pass twice consecutively with exit code 0.\n- **Ticket config note:** The ticket's exact map entry `D0B0 E0` (a 2-byte hex value) is rejected even in the vulnerable version because the single-byte parsing path (`else` branch in `ngx_http_charset_map()`) requires values ≤ 255. 
The vulnerability is triggered with any valid single-byte map entry (e.g., `C0 E0`) or even an empty `charset_map` block (`charset_map utf-8 windows-1251 { }`), since the table format mismatch occurs regardless of the entries.\n- **Two crash paths:** The `charset_map utf-8 <non-utf8>` misconfiguration affects two request-time code paths:\n  - **`recode_to_utf8`** (when `charset utf-8; source_charset <non-utf8>;`): performs an out-of-bounds read at `table[*src * NGX_UTF_LEN]` on the 256-byte buffer, causing response corruption (\"zero size buf\" alert) and connection failure.\n  - **`recode_from_utf8`** (when `charset <non-utf8>; source_charset utf-8;`): casts the 256-byte buffer to `u_char **` and dereferences `table[n >> 8]` as a pointer, causing a reliable SIGSEGV.\n  - The reproduction uses the `recode_from_utf8` path for its deterministic crash behavior. Both paths are eliminated by the same fix.\n","source_url":"https://github.com/spaceraccoon/vulnerability-spoiler-alert/issues/306","package":{"name":"nginx","ecosystem":"generic","affected_versions":"< 29c23ad846787e8baa1390b2edca479eb63ea8d7 (exact releases not 
specified)","fixed_version":"29c23ad846787e8baa1390b2edca479eb63ea8d7"},"reproduced_at":"2026-07-04T19:54:34.521013+00:00","duration_secs":558.0,"tool_calls":99,"handoffs":2,"total_cost_usd":1.5999569200000003,"agent_costs":{"judge":0.00812995,"repro":1.5619554900000003,"support":0.02987148},"cost_breakdown":{"judge":{"gpt-5.4-mini":0.00812995},"repro":{"accounts/fireworks/routers/glm-5p2-fast":1.5619554900000003},"support":{"accounts/fireworks/routers/glm-5p2-fast":0.02987148}},"quality":{"confidence":"high","idempotent_verified":false,"community_verifications":0},"environment":{"sandbox_image":"ghcr.io/n3mes1s/pruva-sandbox@sha256:8096b2518d6022e13d68f885c3b8ded6b4fe607098b1a1ccbfb99abc004d1dc1"},"published_at":"2026-07-04T19:54:50.585634+00:00","retracted":false,"artifacts":[{"path":"bundle/repro/reproduction_steps.sh","filename":"reproduction_steps.sh","size":13528,"category":"reproduction_script"},{"path":"bundle/repro/rca_report.md","filename":"rca_report.md","size":9397,"category":"analysis"},{"path":"bundle/artifact_promotion_manifest.json","filename":"artifact_promotion_manifest.json","size":4471,"category":"other"},{"path":"bundle/repro/validation_verdict.json","filename":"validation_verdict.json","size":807,"category":"other"},{"path":"bundle/repro/runtime_manifest.json","filename":"runtime_manifest.json","size":826,"category":"other"},{"path":"bundle/logs/vuln_error_1.log","filename":"vuln_error_1.log","size":92,"category":"log"},{"path":"bundle/logs/vuln_error_2.log","filename":"vuln_error_2.log","size":92,"category":"log"},{"path":"bundle/logs/fixed_test_1.log","filename":"fixed_test_1.log","size":298,"category":"log"},{"path":"bundle/logs/fixed_test_2.log","filename":"fixed_test_2.log","size":298,"category":"log"},{"path":"bundle/logs/vuln_config_accept.log","filename":"vuln_config_accept.log","size":264,"category":"log"},{"path":"bundle/logs/vuln_conf_1.conf","filename":"vuln_conf_1.conf","size":636,"category":"other"},{"path":"bundle/logs/vuln_conf_
2.conf","filename":"vuln_conf_2.conf","size":636,"category":"other"}]}