OpenXLSX access of cell values is too slow

Summary

A program that uses the OpenXLSX library to read an xlsx file and save it as csv takes 33.3 seconds, compared with about 4 seconds for an equivalent VB script. Almost all of that time, 30.8 of the 33.3 seconds, is spent on a single line: XLCell c=wks.cell(i, 1).

Root Cause

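The slow execution time has three contributing causes:

  • Inefficient cell access: wks.cell(i, 1) looks the cell up by its coordinates on every call. If each lookup has to search through the sheet's rows to find row i, the total work grows roughly with the square of the row count, which would explain why this one line dominates the run time.
  • Lack of direct access to internal structure: The documentation for direct access to the library's internal structure is limited, which makes it hard to find a faster path than the coordinate lookup.
  • Excessive use of XLCell objects: A new XLCell object is constructed on every loop iteration, which adds overhead on top of the lookup itself.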
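
The difference between the two access patterns can be illustrated with a small model. The sketch below is not OpenXLSX code: Sheet, cellByIndex, stepsWithIndexedLookup and stepsWithIteration are hypothetical names, and the linked list only stands in for a structure where finding row i means walking from the first row. Under that assumption, it counts how many nodes each style visits.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Toy stand-in for a worksheet whose rows are stored as a linked sequence.
// This is NOT OpenXLSX code; all names here are hypothetical.
struct Sheet {
    std::list<std::string> rows;
    std::size_t steps = 0;  // nodes visited while searching

    // Lookup by 1-based row index: walks from the first row every time.
    const std::string& cellByIndex(std::size_t i) {
        auto it = rows.begin();
        for (std::size_t k = 1; k < i; ++k) {
            ++it;
            ++steps;
        }
        ++steps;  // count the node that is finally read
        return *it;
    }
};

// Style 1: one indexed lookup per row, like a loop around wks.cell(i, 1).
std::size_t stepsWithIndexedLookup(Sheet& s) {
    s.steps = 0;
    for (std::size_t i = 1; i <= s.rows.size(); ++i) {
        (void)s.cellByIndex(i);
    }
    return s.steps;
}

// Style 2: a single pass with an iterator; each node is visited once.
std::size_t stepsWithIteration(const Sheet& s) {
    std::size_t steps = 0;
    for (const std::string& value : s.rows) {
        (void)value;
        ++steps;
    }
    return steps;
}
```

For n rows the indexed style visits n(n+1)/2 nodes in this model, while the single pass visits n. Whether the real library behaves this way depends on its version and internals, so treat the model as an explanation to verify with a profiler rather than a statement about OpenXLSX itself.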

Why This Happens in Real Systems

This issue shows up in real systems for the following reasons:

  • Large datasets: A per-cell lookup that is cheap on a small sheet becomes the dominant cost once the file has many rows.
  • Frequent file operations: Reading and writing files repeatedly adds to the total run time.
  • Inadequate optimization: Code that was written for correctness first and never profiled keeps the slow access pattern.

Real-World Impact

The slow execution time has practical consequences:

  • Increased processing time: A conversion that takes about 4 seconds with VB scripts takes 33.3 seconds here, which adds up quickly when many files are processed.
  • Decreased user experience: Users who wait half a minute for a simple export are likely to be frustrated.
  • Inefficient resource utilization: Most of the CPU time is spent locating cells rather than doing useful work.

Example or Code

// Sketch of a faster export: walk rows and cells in document order instead of
// looking up every cell by its coordinates. Assumes an OpenXLSX release that
// provides XLWorksheet::rows() and XLRow::cells(); check your version's headers.
#include <cstdio>
#include "OpenXLSX.hpp"
using namespace OpenXLSX;

int main() {
    // Open the xlsx file
    XLDocument doc;
    doc.open("mytest1.xlsx");
    XLWorksheet wks = doc.workbook().worksheet("Sheet1");

    // Open the output csv file and stop early if that fails
    FILE *f = fopen("mytest.csv", "w");
    if (f == nullptr) {
        perror("mytest.csv");
        return 1;
    }

    // The row iterator advances from one row to the next, so wks.cell(i, j)
    // is never called and no cell is searched for by coordinates.
    for (auto& row : wks.rows()) {
        bool first = true;
        for (auto& c : row.cells()) {
            if (!first) {
                fprintf(f, ";");
            }
            fprintf(f, "%s", c.getString().c_str());
            first = false;
        }
        fprintf(f, "\n");
    }

    // Close the output file and the workbook
    fclose(f);
    doc.close();

    return 0;
}
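
The export above writes each cell's text verbatim, so a cell that itself contains a semicolon, a double quote, or a line break would produce a malformed row. A minimal sketch of one way to guard against that is shown below. csvField is a hypothetical helper, not part of OpenXLSX, and it assumes the common CSV convention of wrapping such fields in double quotes and doubling any embedded quote.

```cpp
#include <string>

// Hypothetical helper, not part of OpenXLSX: quote one field for a
// semicolon-separated file using the usual CSV quoting convention.
std::string csvField(const std::string& text, char delimiter = ';') {
    bool needsQuotes = false;
    for (char ch : text) {
        if (ch == delimiter || ch == '"' || ch == '\n' || ch == '\r') {
            needsQuotes = true;
            break;
        }
    }
    if (!needsQuotes) {
        return text;  // plain fields are written unchanged
    }

    std::string quoted = "\"";
    for (char ch : text) {
        if (ch == '"') {
            quoted += '"';  // double every embedded quote
        }
        quoted += ch;
    }
    quoted += '"';
    return quoted;
}
```

In the export loop, the write would then become fprintf(f, "%s", csvField(c.getString()).c_str()); the cost is one extra pass over each string, which is small next to the cell lookup time reported in the summary.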

How Senior Engineers Fix It

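Senior engineers typically address this issue by:

  • Optimizing cell access: Reading cells in the order they are stored, one row at a time, instead of calling wks.cell(i, j) and constructing a fresh XLCell for every coordinate.
  • Using direct access to internal structure: Where the public API is still too slow, working closer to the library's underlying data representation to read cell values directly, accepting that this path is sparsely documented.
  • Minimizing file operations: Opening the workbook and the output file once, and keeping reads and writes out of inner loops where possible.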
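
Any of these changes is easier to justify with a measurement, in the same way the summary attributes 30.8 of 33.3 seconds to one line. The sketch below shows one way to time a piece of work with the standard library; secondsTaken is a hypothetical helper name and is not part of OpenXLSX.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: run a callable once and return the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double secondsTaken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping the cell-reading loop and the file-writing code in separate calls, for example double t = secondsTaken([&] { /* read all cells */ });, shows which part dominates before and after a change. A sampling profiler gives a finer breakdown, but this is often enough to confirm that the lookup, not the output, is the bottleneck.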

Why Juniors Miss It

Junior engineers may miss this issue for the following reasons:

  • Lack of experience: Limited experience with profiling and optimizing code for performance.
  • Insufficient knowledge: An incomplete understanding of the OpenXLSX library and how it stores and locates cells internally.
  • Inattention to detail: The bottleneck goes unnoticed because wks.cell(i, 1) looks like a cheap accessor and the cost only becomes visible on large files.
