Chrome translation breaks HTML structure in complex pages

Summary

A production investigation revealed that Chrome’s built-in translation engine frequently breaks the structural integrity of HTML documents. When translating content, the engine injects pseudo-tags (e.g., <a i=0>) to track text nodes. In complex documents containing nested inline elements like <code> or <strong>, the translation service often fails to respect the boundaries of these tags, leading to malformed DOM trees, broken layout styling, and “hallucinated” text where HTML entities are treated as translatable strings.

Root Cause

The failure stems from a granularity mismatch between the DOM tree and the translation engine’s tokenization process:

Tokenization Fragmentation: The engine breaks text content into discrete chunks to send to the translation model. It wraps these chunks in virtual tags (<a i=x>) to reconstruct the sentence later.
Boundary Overlap: When a sentence spans multiple HTML tags (e.g., <strong>Text <code>code</code> end</strong>), the engine has to decide whether a chunk includes the inner tag or splits at it. Either choice can misplace the tag once the translated word order differs from the original.
Tag Injection Errors: If the translation model treats a partial HTML tag (like </code>) as part of the translatable text, it may attempt to “translate” the syntax itself, resulting in the engine outputting broken or mismatched tags.
Context Loss: By splitting a single semantic sentence into multiple disconnected requests, the model loses the syntactic context required to place grammar and punctuation correctly, often resulting in the “messy structure” observed in DevTools.
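
The fragmentation described above can be sketched with a toy model. This is not Chrome’s actual implementation: ChunkingParser and to_pseudo_markup are hypothetical names, and the <a i=x> wrapper only mirrors the pseudo-tags seen during the investigation. The sketch assumes every non-empty text node becomes its own chunk.

```python
from html.parser import HTMLParser


class ChunkingParser(HTMLParser):
    """Toy model: collect every non-empty text node as a separate chunk."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # chunk text, indexed by pseudo-tag id

    def handle_data(self, data):
        # Each text node between two tags arrives here as one string.
        if data.strip():
            self.chunks.append(data)


def to_pseudo_markup(html: str) -> str:
    """Wrap each text chunk in a numbered pseudo-tag, dropping the real tags."""
    parser = ChunkingParser()
    parser.feed(html)
    parser.close()
    return "".join(
        f"<a i={i}>{text}</a>" for i, text in enumerate(parser.chunks)
    )


if __name__ == "__main__":
    print(to_pseudo_markup("<strong>Text <code>code</code> end</strong>"))
```

In this model, one sentence with a single nested <code> element already becomes three chunks, and the information about which chunk belonged inside <code> lives only in the numbering. If the model reorders, merges, or drops a numbered chunk, the original nesting cannot be rebuilt reliably.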

Why This Happens in Real Systems

In a large, general-purpose system like a web browser, several factors exacerbate this:

Latency vs. Accuracy Trade-offs: To keep translation fast, the engine prefers small, manageable chunks. However, smaller chunks lead to higher fragmentation, making it harder to maintain the original document structure.
Probabilistic Models: Modern translation relies on Neural Machine Translation (NMT). NMT models predict the most likely output sequence and give no structural guarantees; they might place a punctuation mark or a tag at the start of a segment rather than the end, causing structural drift.
DOM Complexity: Modern web apps use deeply nested component architectures. Every extra level of nesting adds another boundary where segmentation can go wrong, so the chance of at least one error grows with depth.
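
As a rough illustration of the nesting effect only, assume each inline boundary is mis-segmented independently with the same small probability. Both the independence and the per-boundary rate are assumptions made for this sketch, not measurements from the investigation; p_any_error is a hypothetical helper.

```python
def p_any_error(p_boundary: float, boundaries: int) -> float:
    """Probability of at least one segmentation error across all boundaries.

    Assumes each boundary fails independently with probability p_boundary.
    """
    return 1.0 - (1.0 - p_boundary) ** boundaries


if __name__ == "__main__":
    # Assumed 1% per-boundary error rate, purely for illustration.
    for n in (1, 10, 100):
        print(n, round(p_any_error(0.01, n), 3))
```

Under these assumptions, even a small per-boundary rate compounds quickly: a page with a hundred inline boundaries is more likely than not to contain at least one broken segment.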

Real-World Impact

UI/UX Degradation: Broken tags can lead to unclosed elements, causing the entire page layout to collapse or “bleed” text into unintended areas.
Accessibility (A11y) Failure: Screen readers rely on a clean DOM. When the translation engine injects nonsensical tags or breaks semantic markers, the page can become difficult or impossible to navigate for screen-reader users.
Broken Interactivity: If translation breaks the structure of buttons or input fields, the event listeners attached to those elements may fail to trigger, rendering the application non-functional.

Example

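The following pair shows how the engine misinterprets the relationship between text nodes and inline tags. In the translated output, “name” and “description” are missing and the connective “and” is left untranslated:

Original: Metadata (~100 tokens): The name and description fields...

Translated: 元数据（约 100 个标记）：所有技能的 and

(Literal back-translation of the translated line: “Metadata (about 100 tokens): of all skills and”.)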
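
A loss like this can be detected mechanically. The sketch below assumes that “name” and “description” were meant to survive translation verbatim (for example because they are field identifiers); that is an assumption about the original markup, and missing_protected_terms is a hypothetical helper.

```python
def missing_protected_terms(protected: list[str], translated: str) -> list[str]:
    """Return the protected terms that no longer appear in the translated text."""
    return [term for term in protected if term not in translated]


if __name__ == "__main__":
    translated = "元数据（约 100 个标记）：所有技能的 and"
    # Terms assumed to be non-translatable identifiers in the source sentence.
    print(missing_protected_terms(["name", "description"], translated))
```

A plain substring check like this is crude, but it is enough to flag output where identifiers were silently dropped.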

How Senior Engineers Fix It

A senior engineer looks beyond the “bug” and focuses on robustness and defensive design:

Sanitization Layers: Implement a post-translation DOM sanitizer that validates the balance of tags and strips any “hallucinated” HTML entities before rendering to the user.
Structural Anchoring: Instead of translating raw chunks, use a strategy where non-translatable nodes (like <code>) are explicitly flagged as “protected” or “immutable” to the translation pipeline.
Graceful Degradation: If the engine detects a high degree of structural entropy (too many mismatched tags), the system should fall back to showing the original language rather than a broken UI.
Semantic Batching: Improve the segmentation logic to ensure that a single semantic unit (a sentence or a phrase) is never split by an inline HTML tag.
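
The sanitization and graceful-degradation ideas above can be combined in a small post-translation check. This is a minimal sketch under stated assumptions, not a description of any existing browser component: StructureChecker, inspect_fragment, and render_or_fallback are hypothetical names, and the check is deliberately stricter than real HTML parsing (it treats an omitted </p> as an error and rejects any change in the set of tags).

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are skipped when balancing.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StructureChecker(HTMLParser):
    """Tracks open elements on a stack and counts every tag it sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []        # currently open, non-void elements
        self.errors = 0        # stray or crossed closing tags
        self.tags = Counter()  # how often each start tag occurs

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors += 1


def inspect_fragment(fragment: str):
    """Return (number of structural errors, tag inventory) for a fragment."""
    checker = StructureChecker()
    checker.feed(fragment)
    checker.close()
    # Elements still open at the end count as errors too.
    return checker.errors + len(checker.stack), checker.tags


def render_or_fallback(original: str, translated: str, max_errors: int = 0) -> str:
    """Keep the translation only if its structure matches the original."""
    errors, translated_tags = inspect_fragment(translated)
    _, original_tags = inspect_fragment(original)
    if errors > max_errors or translated_tags != original_tags:
        return original  # graceful degradation: show the source language
    return translated


if __name__ == "__main__":
    source = "<p>The <code>name</code> field</p>"
    print(render_or_fallback(source, "<p>字段</p>"))                   # <code> was lost
    print(render_or_fallback(source, "<p><code>name</code> 字段</p>"))  # structure intact
```

For structural anchoring on the page-author side, the standard HTML translate="no" attribute (and the notranslate class recognized by Google’s translation tooling) marks an element such as <code> as content that should not be translated. A translator that wraps its output in extra elements would need those wrappers added to an allow-list before the tag-inventory comparison above is usable.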

Why Juniors Miss It

Focus on Content, Not Structure: Juniors often assume the translation is “correct” if the words make sense, ignoring the fact that the underlying DOM structure is compromised.
Surface-Level Debugging: They might try to fix the CSS to hide the mess, rather than identifying that the source of truth (the HTML) has been corrupted.
Ignoring Edge Cases: They test with simple strings like “Hello World” and fail to account for the interplay between markup and linguistics found in complex, real-world applications.