wkhtmltopdf repeated table headers overlap content when a single table row spans multiple pages

Summary

This postmortem analyzes a wkhtmltopdf table‑rendering defect where repeated table headers overlap content when a single table row spans multiple pages. Although the HTML renders correctly in browsers, wkhtmltopdf’s legacy layout engine fails to reserve vertical space for the repeated header, causing visual corruption and truncated content.

Root Cause

The underlying issue is a limitation in wkhtmltopdf’s patched Qt WebKit engine, which does not fully support:

True multi‑page table layout
Row fragmentation with header-aware reflow
Dynamic vertical space reservation for repeated <thead> blocks
Large column counts combined with long unbreakable text

When a <tr> breaks across pages, wkhtmltopdf repeats the header but does not adjust the layout, so the header is drawn on top of the continuing row.

Why This Happens in Real Systems

wkhtmltopdf is built on a patched Qt 4 WebKit fork from the early 2010s, which predates modern CSS pagination features. As a result:

It treats tables as static blocks, not paginated flow content
It cannot compute partial row height before page breaks
It inserts repeated headers after layout, not during layout
It does not recognize the modern break-inside property, and it honors the legacy page-break-inside: avoid only inconsistently
It cannot reflow long text in a way that respects page boundaries

In short, wkhtmltopdf’s rendering model is browser-like, but its pagination model is primitive.
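
The overlap can be illustrated with a toy model. This is not wkhtmltopdf's actual code: the function name, the units, and the page and header heights are hypothetical. The model only contrasts painting the repeated header after layout with reserving space for it during layout.

```python
def continuation_overlap(page_height, header_height, row_top, row_height,
                         reserve_header):
    """Toy model: how much of a split row the repeated header covers
    on the continuation page (same units as the inputs)."""
    # Portion of the row that fits on the first page.
    first_part = min(row_height, page_height - row_top)
    remainder = row_height - first_part
    if remainder <= 0:
        return 0  # the row did not split, so nothing can be covered

    # Where the rest of the row starts on the next page.
    # Without reservation it starts at the very top, under the header.
    start_y = header_height if reserve_header else 0

    # The header occupies [0, header_height); measure the intersection
    # with the remainder, which occupies [start_y, start_y + remainder).
    return max(0, min(header_height, start_y + remainder) - start_y)


# Header painted after layout: the first 40 units of the row are covered.
print(continuation_overlap(1000, 40, 900, 300, reserve_header=False))
# Header space reserved during layout: nothing is covered.
print(continuation_overlap(1000, 40, 900, 300, reserve_header=True))
```

In this model the covered region is exactly as tall as the header whenever the remainder of the row is at least that tall, which matches the symptom of a header drawn on top of the continuing row.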

Real-World Impact

Teams relying on wkhtmltopdf for reporting often see:

Overlapping headers and data, making PDFs unreadable
Truncated or hidden content in long cells
Broken compliance reports where tables must be accurate
Customer-facing PDF corruption in enterprise systems
Inconsistent output depending on text length and column count

These failures are especially common in landscape reports, audit logs, and wide tabular exports.

Example or Code

Below is a minimal CSS workaround sometimes used to reduce (not eliminate) the issue by discouraging row splitting:

tr, td {
  page-break-inside: avoid;
}

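This asks wkhtmltopdf to keep each row on a single page, which avoids the header overlap for any row that fits on one page, at the cost of larger page gaps and less efficient pagination. A row taller than a page still has to split, so the overlap can return for very long cells.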
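
If the report template cannot be edited directly, the same rule can be injected server-side before conversion. The sketch below assumes a wkhtmltopdf binary is on the PATH and uses only its basic input/output form; inject_row_css and render_pdf are hypothetical helper names.

```python
import subprocess

# The same rule as above, wrapped in a <style> element.
ROW_CSS = "<style>tr, td { page-break-inside: avoid; }</style>"


def inject_row_css(html: str) -> str:
    """Insert the rule just before </head>, or prepend it if there is no head."""
    idx = html.lower().find("</head>")
    if idx == -1:
        return ROW_CSS + html
    return html[:idx] + ROW_CSS + html[idx:]


def render_pdf(html_path: str, pdf_path: str) -> None:
    """Convert an HTML file to PDF; raises CalledProcessError on failure."""
    subprocess.run(["wkhtmltopdf", html_path, pdf_path], check=True)
```

A caller would write the result of inject_row_css to a temporary file and pass that path to render_pdf.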

How Senior Engineers Fix It

Experienced engineers recognize that wkhtmltopdf’s layout engine cannot be “patched” with CSS alone. They typically choose one of these strategies:

1. Prevent Row Splitting Entirely

Apply page-break-inside: avoid to <tr>
Reduce column count
Shorten or truncate long text
Pre-wrap long text server-side
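
Pre-wrapping can be as simple as adding break opportunities to long unbroken tokens (IDs, hashes, URLs) before the HTML is generated. The sketch below is one possible approach, not an established recipe: soften_long_tokens and cell_html are hypothetical names, the 20-character threshold is arbitrary, and it assumes the engine treats a zero-width space (U+200B) as a legal line-break point, which is worth verifying against your wkhtmltopdf build.

```python
import html
import re

ZWSP = "\u200b"  # zero-width space: invisible, but a line-break opportunity


def soften_long_tokens(text: str, max_run: int = 20) -> str:
    """Insert a zero-width space after every max_run characters of any
    whitespace-free run longer than max_run."""
    def split_run(match):
        run = match.group(0)
        pieces = [run[i:i + max_run] for i in range(0, len(run), max_run)]
        return ZWSP.join(pieces)

    # Only runs strictly longer than max_run are touched.
    pattern = re.compile(r"\S{%d,}" % (max_run + 1))
    return pattern.sub(split_run, text)


def cell_html(text: str, max_run: int = 20) -> str:
    """Soften first, then escape, so entities are never split in the middle."""
    return "<td>" + html.escape(soften_long_tokens(text, max_run)) + "</td>"
```

Softening before escaping is deliberate: splitting after escaping could land inside an entity such as &amp;amp; and corrupt it.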

2. Switch to a Modern HTML-to-PDF Engine

Senior engineers often migrate to engines with true pagination support, such as:

Paged.js
WeasyPrint
PrinceXML
Headless Chrome (Puppeteer)

These engines support:

Proper multi-page table layout
Correct header/footer repetition
CSS fragmentation rules
Modern HTML/CSS features
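
As a rough illustration of how small the migration surface can be, the sketch below drives headless Chrome through its command-line flags rather than through Puppeteer. The location of the Chrome binary is an assumption that varies by platform, and chrome_pdf_command and render_with_chrome are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def chrome_pdf_command(chrome_binary: str, html_path: str, pdf_path: str) -> list:
    """Build a headless Chrome command that prints a local HTML file to PDF."""
    # Chrome expects a URL, so turn the local path into a file:// URI.
    url = Path(html_path).resolve().as_uri()
    return [chrome_binary, "--headless", f"--print-to-pdf={pdf_path}", url]


def render_with_chrome(chrome_binary: str, html_path: str, pdf_path: str) -> None:
    """Run the command; raises CalledProcessError if Chrome exits with an error."""
    subprocess.run(chrome_pdf_command(chrome_binary, html_path, pdf_path), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to log or unit-test the exact invocation without a browser installed.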

3. Split Rows Programmatically

When migration is not possible:

Detect long cell content
Split it into multiple rows server-side
Ensure each row fits on a page
Preserve visual continuity with styling
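
The steps above can be sketched as follows. This is a rough estimate, not a measurement: it assumes every cell wraps at a fixed number of characters per line and that a fixed number of lines fits on a page, whereas real heights depend on fonts and column widths. The name split_row and both default limits are hypothetical.

```python
import textwrap


def split_row(cells, chars_per_line=40, max_lines=30):
    """Split one logical row into several rows, each with at most
    max_lines estimated wrapped lines in its tallest cell."""
    # Estimate how each cell wraps; an empty cell still counts as one line.
    wrapped = [textwrap.wrap(cell, width=chars_per_line) or [""] for cell in cells]
    tallest = max((len(lines) for lines in wrapped), default=0)

    rows = []
    for start in range(0, tallest, max_lines):
        # Take the same slice of lines from every cell; cells that have
        # run out of text contribute an empty string.
        rows.append([" ".join(lines[start:start + max_lines]) for lines in wrapped])
    return rows


for row in split_row(["id-1", "word " * 10], chars_per_line=10, max_lines=2):
    print(row)
```

Short cells appear only in the first output row and are left empty in the continuation rows. Those continuation rows can then be given a distinct CSS class (for example, no top border) to preserve visual continuity.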

4. Avoid Complex Tables in PDF

Some teams redesign reports:

Convert wide tables into stacked card layouts
Use multi-page sections instead of one giant table
Export raw data separately (CSV/Excel)
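
A stacked card layout can be generated from the same row data that feeds the table. The sketch below is a minimal illustration; row_to_card and the "card" class name are hypothetical, and the styling of the cards is left to the report's stylesheet.

```python
import html


def row_to_card(headers, row):
    """Render one table row as a stacked block of label/value pairs."""
    lines = ['<div class="card">']
    for label, value in zip(headers, row):
        # Escape both parts so cell content cannot break the markup.
        lines.append(
            "  <p><strong>{}:</strong> {}</p>".format(
                html.escape(label), html.escape(str(value))
            )
        )
    lines.append("</div>")
    return "\n".join(lines)


print(row_to_card(["Name", "Note"], ["A&B", 3]))
```

Because each card is a stack of ordinary paragraphs rather than a table row, the continuation page has no repeated header that could be painted over its content.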

Why Juniors Miss It

Junior engineers often assume:

wkhtmltopdf behaves like a modern browser
<thead> repetition is fully supported
CSS pagination rules work consistently
Long text will wrap safely
Table rendering is deterministic across engines

They typically do not realize that wkhtmltopdf’s WebKit engine is frozen in time, lacks modern pagination logic, and cannot be fixed with CSS alone.

Senior engineers know that this is not a bug you “fix”; it is a limitation you architect around.