Summary
The issue occurs because inserting rows shifts existing row indices downward, but the code continues to write to the original, calculated indices. In openpyxl, inserting rows at a specific point physically moves existing rows down to make space. Data written to the original row numbers after insertion overwrites what used to be there or lands in the wrong place, effectively “deleting” the template data that was shifted.
Root Cause
There are two primary interacting causes in the provided code:
- Index Misalignment: The code inserts rows (
ws.insert_rows) but then calculatestarget_rowbased on the originalTABLE_START_ROW. It fails to account for the fact that the existing table header row (and any other rows above the insertion point) has been pushed down. - Template Destruction via Row Copying: The
_copy_rowfunction is likely copying the original header row (which was pushed down) back to the newtarget_row. If the “source” row is calculated incorrectly (e.g., pointing to the now-empty original header location or a shifted row), it copies empty or incorrect formatting, destroying the template’s structure.
Why This Happens in Real Systems
This is a classic mutable state problem common in spreadsheet automation.
- Imperative Operations: Unlike a declarative SQL insert where you define the dataset,
openpyxlmanipulates the physical grid.insert_rowsis a destructive structural change. - Lack of Atomicity: The script performs read and write operations in a loop. If you read a cell’s location, modify the grid structure, and then try to write to that same “logical” location, the physical storage has changed.
- Template Reliance: Relying on a live file for structure (copying rows) is fragile. If the source row shifts, the copy operation propagates the error instantly.
Real-World Impact
- Data Loss: Information below the table (pricing, client details, static notes) is overwritten with blank values or formula references to incorrect rows.
- Corrupted Formulas: Formulas written at the end of the script (e.g.,
ws[f"I{6+TABLE_DATA_LEN}"]) point to wrong rows becauseTABLE_DATA_LENdoesn’t account for the second batch of inserted rows (container_square_rows). - Broken Merges: While the code attempts to restore merges, the mismatch between row indices and the merge definitions usually leaves merged cells spanning empty space or overwriting data.
Example or Code
The Logic Failure (Conceptual)
# LOGIC FLOW DEMONSTRATION
# 1. Template has data at Row 6 (Header), Row 7 (Data), Row 8 (Footer)
# Row 8 contains "Total: $500"
# 2. Insert 2 rows at Row 7
# Row 6 (Header) -> Stays
# Row 7, 8 (New Empty) -> Created
# Row 9 (Old Row 7) -> Shifted
# Row 10 (Old Row 8 "Total: $500") -> Shifted
# 3. Code calculates target_row
# target_row = 6 + idx
# target_row = 6 (Writing new data to Row 6... overwriting Header)
# target_row = 7 (Writing new data to Row 7... overwriting New Empty)
# 4. Code writes formulas based on lengths
# Formula references assume data is in 6, 7, 8
# Data is actually in 6, 9, 10
# Result: Calculations are 0 or error.
The Fix: Shift Accumulation
def _fill_oferta_cliente(self, ws, table_data, ...):
# ... setup ...
# TRACK INSERTIONS
total_inserted_rows = 0
# 1. Insert Table Rows
if INSERTED_ROWS > 0:
ws.insert_rows(TABLE_START_ROW + 1, INSERTED_ROWS)
total_inserted_rows += INSERTED_ROWS
# 2. Insert Container Rows
if container_square_rows > 0:
# Calculate physical insertion point relative to previous insertions
insert_point = SQUARE_SECTION_START_ROW - 1 + total_inserted_rows
ws.insert_rows(insert_point, container_square_rows - 1)
total_inserted_rows += (container_square_rows - 1)
# 3. Write Data (with Shift Adjustment)
for idx, (product, link) in enumerate(table_data):
# Calculate new physical row
physical_row = TABLE_START_ROW + idx + total_inserted_rows
# Copy from the correct source (Header Row is now shifted down)
if idx > 0:
# Source must be the original header, which is now at TABLE_START_ROW + total_inserted_rows
copy_source = TABLE_START_ROW + total_inserted_rows
self._copy_row(ws, copy_source, physical_row, ...)
self._write_row(ws, physical_row, TABLE_START_COL, values)
# 4. Write Formulas with Total Shift
# Use the tracked total_shift to adjust all final row references
final_row_base = 6 + total_inserted_rows
ws[f"I{final_row_base - 1}"] = f"=SUM(I6:I{final_row_base - 2})"
How Senior Engineers Fix It
- Calculate Total Shift First: Before writing a single cell, calculate exactly how many rows will be inserted and where. Accumulate this offset.
- Separate Structure from Data:
- Structure Phase: Perform all
insert_rows,delete_rows, andmergeoperations first. Use a dummy DataFrame or list to calculate the final layout if dynamic. - Data Phase: Write data to the newly established coordinates.
- Structure Phase: Perform all
- Defensive Indexing: Never assume
row + 1is the next logical row. Always usebase_row + data_index + accumulated_inserts. - Use Pandas for Manipulation (Alternative): A robust alternative is to read the template data into a Pandas DataFrame, manipulate the data structure (concat), and write back, rather than manipulating the Excel grid cell-by-cell. This handles row shifting automatically.
- Template Preservation: Instead of copying rows from the shifted template (which is risky), define the “style source” rows explicitly and apply them to new data rows.
Why Juniors Miss It
- Visual vs. Physical: They see an Excel sheet as a visual grid where “Row 1” is always the top. They don’t immediately grasp that
openpyxltreats this as a linked list where Row 1 moves if you insert above it. - Linear Thinking: Code is written line-by-line. “Insert Row -> Write Data” feels right, but they miss the state change that happens in between (the destination row index changed).
- Debugging Blindness: When they print
ws.cell(row=6, value="X")and see “X” in Row 6 in Excel, they don’t realize that “X” is overwriting the template header, not filling the new row. - Over-reliance on
_copy_row: They assume copying a row preserves the structure. If the source row is the row that just got pushed down (and is now empty because they wrote to it earlier), they copy empty cells to the new location.