Openpyxl deletes all cells values and styles that are below my inserted values

Summary

The issue occurs because inserting rows shifts existing row indices downward, but the code continues to write to the original, calculated indices. In openpyxl, inserting rows at a specific point physically moves existing rows down to make space. Data written to the original row numbers after insertion overwrites what used to be there or lands in the wrong place, effectively “deleting” the template data that was shifted.

Root Cause

There are two primary interacting causes in the provided code:

  1. Index Misalignment: The code inserts rows (ws.insert_rows) but then calculates target_row based on the original TABLE_START_ROW. It fails to account for the fact that the existing table header row (and any other rows above the insertion point) has been pushed down.
  2. Template Destruction via Row Copying: The _copy_row function is likely copying the original header row (which was pushed down) back to the new target_row. If the “source” row is calculated incorrectly (e.g., pointing to the now-empty original header location or a shifted row), it copies empty or incorrect formatting, destroying the template’s structure.

Why This Happens in Real Systems

This is a classic mutable state problem common in spreadsheet automation.

  • Imperative Operations: Unlike a declarative SQL insert where you define the dataset, openpyxl manipulates the physical grid. insert_rows is a destructive structural change.
  • Lack of Atomicity: The script performs read and write operations in a loop. If you read a cell’s location, modify the grid structure, and then try to write to that same “logical” location, the physical storage has changed.
  • Template Reliance: Relying on a live file for structure (copying rows) is fragile. If the source row shifts, the copy operation propagates the error instantly.

Real-World Impact

  • Data Loss: Information below the table (pricing, client details, static notes) is overwritten with blank values or formula references to incorrect rows.
  • Corrupted Formulas: Formulas written at the end of the script (e.g., ws[f"I{6+TABLE_DATA_LEN}"]) point to wrong rows because TABLE_DATA_LEN doesn’t account for the second batch of inserted rows (container_square_rows).
  • Broken Merges: While the code attempts to restore merges, the mismatch between row indices and the merge definitions usually leaves merged cells spanning empty space or overwriting data.

Example or Code

The Logic Failure (Conceptual)

# LOGIC FLOW DEMONSTRATION

# 1. Template has data at Row 6 (Header), Row 7 (Data), Row 8 (Footer)
#    Row 8 contains "Total: $500"

# 2. Insert 2 rows at Row 7
#    Row 6 (Header) -> Stays
#    Row 7, 8 (New Empty) -> Created
#    Row 9 (Old Row 7) -> Shifted
#    Row 10 (Old Row 8 "Total: $500") -> Shifted

# 3. Code calculates target_row
#    target_row = 6 + idx
#    target_row = 6 (Writing new data to Row 6... overwriting Header)
#    target_row = 7 (Writing new data to Row 7... overwriting New Empty)

# 4. Code writes formulas based on lengths
#    Formula references assume data is in 6, 7, 8
#    Data is actually in 6, 9, 10
#    Result: Calculations are 0 or error.

The Fix: Shift Accumulation

def _fill_oferta_cliente(self, ws, table_data, ...):
    # ... setup ...

    # TRACK INSERTIONS
    total_inserted_rows = 0

    # 1. Insert Table Rows
    if INSERTED_ROWS > 0:
        ws.insert_rows(TABLE_START_ROW + 1, INSERTED_ROWS)
        total_inserted_rows += INSERTED_ROWS

    # 2. Insert Container Rows
    if container_square_rows > 0:
        # Calculate physical insertion point relative to previous insertions
        insert_point = SQUARE_SECTION_START_ROW - 1 + total_inserted_rows
        ws.insert_rows(insert_point, container_square_rows - 1)
        total_inserted_rows += (container_square_rows - 1)

    # 3. Write Data (with Shift Adjustment)
    for idx, (product, link) in enumerate(table_data):
        # Calculate new physical row
        physical_row = TABLE_START_ROW + idx + total_inserted_rows

        # Copy from the correct source (Header Row is now shifted down)
        if idx > 0:
            # Source must be the original header, which is now at TABLE_START_ROW + total_inserted_rows
            copy_source = TABLE_START_ROW + total_inserted_rows
            self._copy_row(ws, copy_source, physical_row, ...)

        self._write_row(ws, physical_row, TABLE_START_COL, values)

    # 4. Write Formulas with Total Shift
    # Use the tracked total_shift to adjust all final row references
    final_row_base = 6 + total_inserted_rows
    ws[f"I{final_row_base - 1}"] = f"=SUM(I6:I{final_row_base - 2})"

How Senior Engineers Fix It

  1. Calculate Total Shift First: Before writing a single cell, calculate exactly how many rows will be inserted and where. Accumulate this offset.
  2. Separate Structure from Data:
    • Structure Phase: Perform all insert_rows, delete_rows, and merge operations first. Use a dummy DataFrame or list to calculate the final layout if dynamic.
    • Data Phase: Write data to the newly established coordinates.
  3. Defensive Indexing: Never assume row + 1 is the next logical row. Always use base_row + data_index + accumulated_inserts.
  4. Use Pandas for Manipulation (Alternative): A robust alternative is to read the template data into a Pandas DataFrame, manipulate the data structure (concat), and write back, rather than manipulating the Excel grid cell-by-cell. This handles row shifting automatically.
  5. Template Preservation: Instead of copying rows from the shifted template (which is risky), define the “style source” rows explicitly and apply them to new data rows.

Why Juniors Miss It

  • Visual vs. Physical: They see an Excel sheet as a visual grid where “Row 1” is always the top. They don’t immediately grasp that openpyxl treats this as a linked list where Row 1 moves if you insert above it.
  • Linear Thinking: Code is written line-by-line. “Insert Row -> Write Data” feels right, but they miss the state change that happens in between (the destination row index changed).
  • Debugging Blindness: When they print ws.cell(row=6, value="X") and see “X” in Row 6 in Excel, they don’t realize that “X” is overwriting the template header, not filling the new row.
  • Over-reliance on _copy_row: They assume copying a row preserves the structure. If the source row is the row that just got pushed down (and is now empty because they wrote to it earlier), they copy empty cells to the new location.