Symfony Mailer / Mime 8.0: UTF-8 Subject gets corrupted (C3 B6 → C3 3F) during header encoding

Summary

This incident documents a header‑encoding regression in Symfony Mailer/Mime 8.0.x in which valid UTF‑8 subjects are corrupted during RFC 2047 Q‑encoding: UTF‑8 continuation bytes (0x80–0xBF) are replaced with ? (0x3F). For example, ö (C3 B6) becomes C3 3F, which is no longer valid UTF‑8. The corruption occurs only during header encoding; the subject string passed to the mailer is intact.
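
The corruption signature can be detected mechanically. In valid UTF-8 a lead byte (0xC2-0xF4) is always followed by a continuation byte, so a lead byte followed by ? (0x3F) can only come from damage. The sketch below is a hypothetical helper (findCorruptionSignature is not a Symfony API) for scanning logged or received subjects; it also uses preg_match('//u', ...), which fails on invalid UTF-8.

```php
<?php
// Hypothetical helper, not part of Symfony: returns [byteOffset, hexPair]
// for every UTF-8 lead byte (0xC2-0xF4) that is followed by '?' (0x3F).
// In valid UTF-8 a lead byte is always followed by a continuation byte,
// so each hit marks a damaged sequence such as C3 3F.
function findCorruptionSignature(string $bytes): array
{
    $hits = [];
    $len = strlen($bytes);
    for ($i = 0; $i < $len - 1; $i++) {
        $lead = ord($bytes[$i]);
        if ($lead >= 0xC2 && $lead <= 0xF4 && $bytes[$i + 1] === '?') {
            $hits[] = [$i, strtoupper(bin2hex(substr($bytes, $i, 2)))];
        }
    }
    return $hits;
}

$intact  = "Waffelh\u{00F6}rnchen"; // ö encoded as C3 B6
$damaged = "Waffelh\xC3?rnchen";    // same text after the bug: C3 3F

var_dump(preg_match('//u', $intact) === 1);  // bool(true): valid UTF-8
var_dump(preg_match('//u', $damaged) === 1); // bool(false): invalid UTF-8
print_r(findCorruptionSignature($damaged));  // one hit at byte offset 7
```

A literal ? after a complete character (for example "café?") is not flagged, because the byte before it is a continuation byte, not a lead byte.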

Root Cause

The failure appears to stem from incorrect handling of multibyte UTF‑8 sequences during Q‑encoding in Symfony Mime's encoder for unstructured headers such as Subject. The symptoms strongly indicate:

Continuation bytes are being treated as invalid characters
mb_substitute_character() returns 63, i.e. ? (0x3F), the same byte that replaces each continuation byte, which points to an mbstring substitution step
The encoder likely validates the input byte by byte instead of by complete multibyte sequences
Emojis (4‑byte sequences) and umlauts (2‑byte sequences) trigger the same failure pattern

In short: the Q‑encoder misinterprets valid UTF‑8 continuation bytes as invalid and substitutes them.
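
The hypothesis can be checked against the observed bytes. The sketch below is a hypothetical simulation, not Symfony's code: it walks the string one byte at a time and substitutes ? for every continuation byte (bit pattern 10xxxxxx), which is what a validator that judges each byte in isolation would do. It reproduces the reported signature for both the 2-byte umlaut and the 4-byte emoji.

```php
<?php
// Hypothetical simulation of the suspected bug (NOT Symfony's actual code):
// every continuation byte (0x80-0xBF) is treated as invalid and replaced
// with '?' (0x3F), matching the 63 reported by mb_substitute_character().
function simulateByteWiseSubstitution(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        // The 10xxxxxx bit pattern marks a continuation byte.
        $out .= (($byte & 0xC0) === 0x80) ? '?' : $utf8[$i];
    }
    return $out;
}

$subject = "h\u{00F6}rnchen \u{1F374}";
echo strtoupper(bin2hex($subject)), "\n";                               // 68C3B6726E6368656E20F09F8DB4
echo strtoupper(bin2hex(simulateByteWiseSubstitution($subject))), "\n"; // 68C33F726E6368656E20F03F3F3F
```

Lead bytes (C3, F0) survive while their continuation bytes turn into 3F, which is exactly the C3 B6 → C3 3F pattern in the title.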

Why This Happens in Real Systems

This class of bug is common when:

Libraries change internal encoding logic between major versions (the regression described here appeared in Symfony Mime 8.0.x)
Systems rely on mbstring settings, which can subtly alter behavior
Encoders assume single‑byte safety while operating on multibyte strings
Header folding splits encoded‑words at byte offsets, cutting a multibyte sequence in two
Q‑encoding mixes byte‑level and character‑level operations: bytes must be encoded one at a time (=XX), but RFC 2047 allows an encoded‑word to end only between whole characters
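
A correct Q-encoder keeps the two levels separate: it encodes bytes one at a time (=XX), but it decides where one encoded-word ends and the next begins only between whole characters, as RFC 2047 requires (each encoded-word must hold an integral number of characters and be at most 75 characters long). The sketch below is a minimal illustration under those assumptions, not Symfony's implementation; qEncodeSubject is a hypothetical name, and it conservatively leaves only ASCII letters, digits, and !*+-/ unencoded.

```php
<?php
// Minimal RFC 2047 Q-encoder sketch (hypothetical helper, not Symfony's code).
// Bytes are encoded one at a time; words are split only between characters.
function qEncodeSubject(string $utf8, int $maxWordLen = 75): string
{
    $prefix = '=?UTF-8?Q?';
    $suffix = '?=';
    $budget = $maxWordLen - strlen($prefix) - strlen($suffix); // payload chars per word
    $literals = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/';

    // Character level: split into whole UTF-8 characters (false on invalid input).
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Subject is not valid UTF-8');
    }

    $words = [];
    $payload = '';
    foreach ($chars as $char) {
        // Byte level: every byte of the character is encoded independently.
        $encoded = '';
        for ($i = 0, $n = strlen($char); $i < $n; $i++) {
            $byte = $char[$i];
            if ($byte === ' ') {
                $encoded .= '_';                          // Q-encoding spells space as '_'
            } elseif (strpos($literals, $byte) !== false) {
                $encoded .= $byte;                        // safe ASCII stays as-is
            } else {
                $encoded .= sprintf('=%02X', ord($byte)); // everything else becomes =XX
            }
        }
        // Start a new encoded-word only between characters, never inside one.
        if ($payload !== '' && strlen($payload) + strlen($encoded) > $budget) {
            $words[] = $prefix . $payload . $suffix;
            $payload = '';
        }
        $payload .= $encoded;
    }
    if ($payload !== '') {
        $words[] = $prefix . $payload . $suffix;
    }
    // Whitespace between adjacent encoded-words is ignored when decoding,
    // so a header writer may fold the line at these spaces.
    return implode(' ', $words);
}

// =?UTF-8?Q?Waffelh=C3=B6rnchen_mit_Sahne_=F0=9F=8D=B4?=
echo qEncodeSubject("Waffelh\u{00F6}rnchen mit Sahne \u{1F374}"), "\n";
```

The important design choice is the order of the two loops: the split decision is made per character, after all of that character's bytes have been encoded, so C3 and B6 can never land in different encoded-words.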

These issues often surface only with:

Umlauts (ä, ö, ü)
Accented characters
Emojis
Any UTF‑8 sequence requiring continuation bytes
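
What these categories share is that each character encodes to more than one byte, so every one of them contains at least one continuation byte. A small hypothetical helper (describeUtf8 is not a library function) makes this visible and doubles as a fixture list for tests; the € sign is added here only as an example of a 3-byte sequence.

```php
<?php
// Hypothetical helper: lists each character with its UTF-8 bytes in hex
// and its byte length; any length above 1 means continuation bytes are present.
function describeUtf8(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $rows = [];
    foreach ($chars as $char) {
        $rows[] = [$char, strtoupper(bin2hex($char)), strlen($char)];
    }
    return $rows;
}

// Umlauts, an accented letter, a 3-byte symbol, an emoji, and plain ASCII.
foreach (describeUtf8("\u{00E4}\u{00F6}\u{00FC}\u{00E9}\u{20AC}\u{1F374}a") as [$char, $hex, $len]) {
    printf("%s  %-8s  %d byte(s)\n", $char, $hex, $len);
}
```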

Real-World Impact

When a mailer corrupts UTF‑8 headers:

Recipients see broken subjects, reducing trust and professionalism
Spam filters penalize malformed headers
Automated systems fail to parse subjects, breaking workflows
International users receive unreadable messages
Support teams waste time diagnosing “random” encoding failures

For production systems, this is a high‑severity defect.

Example

Below is a minimal reproduction, followed by a manual Base64 (RFC 2047 B‑encoding) workaround:

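use Symfony\Component\Mime\Email;

$subject = "Waffelhörnchen mit Sahne 🍴";

$email = (new Email())
    ->subject($subject); // Subject is corrupted when rendered in Symfony 8.0.x

// Manual workaround: pre-encode the subject as one RFC 2047 B-encoded word.
// The result is pure ASCII, so the header encoder should not re-encode it.
// Note: RFC 2047 limits an encoded-word to 75 characters (about 45 bytes of input).
$encoded = '=?UTF-8?B?' . base64_encode($subject) . '?=';
$email->getHeaders()->remove('Subject');
$email->getHeaders()->addTextHeader('Subject', $encoded);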
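
The one-line workaround is bounded by RFC 2047's 75-character cap on a single encoded-word, which leaves room for at most 45 bytes of UTF-8 in one =?UTF-8?B?...?= word. For longer subjects, a sketch like the following (base64EncodeSubject is a hypothetical helper) splits the subject between whole characters, never inside a multibyte sequence, and joins the resulting words with spaces, which decoders ignore between adjacent encoded-words.

```php
<?php
// Hypothetical helper for the manual workaround (not a Symfony API).
// 45 bytes of input -> 60 Base64 characters, plus 12 for "=?UTF-8?B?" and "?="
// = 72, which stays under RFC 2047's 75-character limit per encoded-word.
function base64EncodeSubject(string $utf8, int $maxChunkBytes = 45): string
{
    // Split into whole characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Subject is not valid UTF-8');
    }

    $words = [];
    $chunk = '';
    foreach ($chars as $char) {
        // Close the current chunk before a character that would overflow it,
        // so no multibyte sequence is split across two encoded-words.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxChunkBytes) {
            $words[] = '=?UTF-8?B?' . base64_encode($chunk) . '?=';
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = '=?UTF-8?B?' . base64_encode($chunk) . '?=';
    }

    // A single space between encoded-words is ignored by decoders.
    return implode(' ', $words);
}

echo base64EncodeSubject(str_repeat("Waffelh\u{00F6}rnchen mit Sahne \u{1F374} ", 3)), "\n";
```

Its return value could be passed to addTextHeader('Subject', ...) in place of $encoded in the workaround above; as with the single-word version, this assumes the mailer leaves an all-ASCII header value unchanged apart from folding at the spaces.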

How Senior Engineers Fix It

Experienced engineers approach this in a structured way:

Confirm the corruption occurs only during header encoding, not earlier
Reproduce with a minimal test case to isolate the encoder
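Inspect Symfony Mime's header‑encoding code (for example the QpMimeHeaderEncoder class and the header classes that use it) for multibyte‑handling regressions
Switch to Base64 encoding as a temporary workaround
Disable or override the faulty Q‑encoder where the library's extension points allow it, or bypass it by setting pre‑encoded header values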
Open an upstream issue with:
- Hex dumps
- Minimal reproduction
- Environment details
- Expected vs. actual encoded output
Pin Symfony Mime to a known‑good version until a fix is released
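
Confirming where the bytes change, and collecting hex evidence for an upstream issue, can be scripted. The sketch below assumes you already have the rendered Subject header body as a string (for example, copied from the raw message source); decodeEncodedWords and firstByteDifference are hypothetical helpers, the decoder assumes UTF-8 encoded-words, and the damaged header value in the usage lines is illustrative, not captured output.

```php
<?php
// Hypothetical diagnostic helpers (not Symfony APIs).
// decodeEncodedWords(): decodes RFC 2047 B/Q encoded-words, assuming UTF-8.
function decodeEncodedWords(string $headerValue): string
{
    // Whitespace between adjacent encoded-words is not part of the text.
    $headerValue = preg_replace('/\?=\s+=\?/', '?==?', $headerValue);
    return preg_replace_callback(
        '/=\?([^?]+)\?([QqBb])\?([^?]*)\?=/',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                return (string) base64_decode($m[3]);
            }
            // Q-encoding: '_' is a space and '=XX' is one hex-encoded byte.
            return preg_replace_callback(
                '/=([0-9A-Fa-f]{2})/',
                fn (array $h): string => chr(hexdec($h[1])),
                str_replace('_', ' ', $m[3])
            );
        },
        $headerValue
    );
}

// firstByteDifference(): reports the first differing byte offset in hex, or null.
function firstByteDifference(string $expected, string $actual): ?string
{
    $n = max(strlen($expected), strlen($actual));
    for ($i = 0; $i < $n; $i++) {
        $e = $expected[$i] ?? '';
        $a = $actual[$i] ?? '';
        if ($e !== $a) {
            return sprintf(
                'offset %d: expected %s, got %s',
                $i,
                $e === '' ? 'end of string' : strtoupper(bin2hex($e)),
                $a === '' ? 'end of string' : strtoupper(bin2hex($a))
            );
        }
    }
    return null;
}

$original = "Waffelh\u{00F6}rnchen";
$rendered = '=?UTF-8?Q?Waffelh=C3=3Frnchen?='; // illustrative damaged header body
echo firstByteDifference($original, decodeEncodedWords($rendered)) ?? 'identical', "\n"; // offset 8: expected B6, got 3F
```

If the input subject is valid UTF-8 but the decoded header differs from it, the damage happened during header encoding; the offset and hex pair are exactly the "expected vs. actual" evidence an upstream report needs.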

Senior engineers know that header encoders are fragile, and UTF‑8 bugs rarely fix themselves.

Why Juniors Miss It

Less experienced developers often overlook this because:

They assume UTF‑8 issues originate in the database, not the mailer
They trust that framework defaults are always correct
They don’t inspect raw MIME output or hex dumps
They don’t know that Q‑encoding is byte‑sensitive
They rarely test with emojis or umlauts
They misinterpret the issue as a transport or SMTP problem, not an encoding bug

Juniors tend to debug the wrong layer, while the real issue lives deep inside the header‑encoding pipeline.