Office-JS Word Add-in extremely slow when inserting 90k+ soft hyphens (>300 page document)

Summary

The Office-JS Word add-in becomes extremely slow when inserting 90,000+ soft hyphens into a large document (roughly 300-350 pages). The current implementation, which uses insertText and getOoxml/insertOoxml, takes around 60-100 minutes to complete. The goal is to bring this down to a few minutes at most, ideally seconds, without breaking tables or formatting.

Root Cause

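The likely root cause is that each range mutation issued through Office-JS is applied to the document individually, and Word may recalculate the layout (reflow) after each one. The cost of a reflow grows with the size of the document, so total time scales roughly with the number of insertions multiplied by the document size. That is steep superlinear growth rather than truly exponential growth, but at 90,000+ insertions in a 300-page document it is enough to turn seconds into hours.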
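
A back-of-the-envelope model makes the scaling concrete. It is only a sketch: it assumes, purely for illustration, that one reflow costs work proportional to the page count, and the function name totalReflowWork and the batch size of 1,000 are hypothetical. It does not measure Word itself.

```typescript
// Hypothetical cost model: NOT a measurement of Word.
// Assumes one reflow costs work proportional to the number of pages laid out.
function totalReflowWork(reflowCount: number, pagesPerReflow: number): number {
  return reflowCount * pagesPerReflow;
}

const insertions = 90000; // soft hyphens to insert (from the summary)
const pages = 300;        // approximate document length (from the summary)

// Strategy A: one reflow per insertion.
const perChange = totalReflowWork(insertions, pages);

// Strategy B: one reflow per batch of 1,000 insertions (assumed batch size).
const batched = totalReflowWork(insertions / 1000, pages);

console.log(perChange, batched, perChange / batched);
```

Under these assumptions the per-change strategy performs 1,000 times as much layout work as the batched one, which is why reducing the number of reflows matters more than speeding up any single insertion.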

Why This Happens in Real Systems

This issue occurs in real systems because:

  • Large-scale text mutations are common in document processing and editing applications
  • Word’s layout engine is designed to provide a responsive user experience, but it can become a bottleneck when dealing with large documents and frequent changes
  • Office-JS calls are queued on proxy objects and sent to the Word host at each context.sync(), so every queued operation and every round trip adds overhead on top of the layout work

Real-World Impact

The impact of this issue is significant:

  • Slow performance affects the user experience and productivity
  • Long processing times can lead to user frustration and abandonment
  • Inability to handle large documents limits the applicability and usefulness of the add-in

Example or Code

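// Split the paragraph into word ranges (multiParagraphs = true, trimDelimiters = true).
const words = paragraphRange.getRange().split(DELIMITERS, true, true);
words.load("items/text, items/hyperlink");
await context.sync(); // round trip 1: fetch text and hyperlink for every word
// build word → ranges map
// update hyperlinks
// replace ranges using insertText(...): one queued mutation per hyphenated word
await context.sync(); // round trip 2: apply all queued mutations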

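This snippet outlines the current implementation: it splits each paragraph into word ranges, then queues one insertText call per word that needs hyphens. For a document with 90,000+ soft hyphens, that produces a very large number of separate range mutations.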
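
One way to reduce the number of range mutations is to compute the hyphenated text in memory first and write back larger spans. The sketch below shows only the in-memory part. The names hyphenateWord and hyphenateText, the break-offset table, and the ASCII-only word pattern are all assumptions made for illustration; they are not part of Office-JS or of the original add-in.

```typescript
const SOFT_HYPHEN = "\u00AD";

// Insert a soft hyphen at each break offset (offsets index into the original word).
// Offsets that are out of range, zero, or duplicated are ignored.
function hyphenateWord(word: string, breakOffsets: number[]): string {
  const sorted = [...breakOffsets].sort((a, b) => a - b);
  let result = "";
  let previous = 0;
  for (const offset of sorted) {
    if (offset <= previous || offset >= word.length) continue;
    result += word.slice(previous, offset) + SOFT_HYPHEN;
    previous = offset;
  }
  return result + word.slice(previous);
}

// Hyphenate every word of `text` that appears in the (hypothetical) break table.
// The pattern matches ASCII letters only; a real add-in would need a wider one.
function hyphenateText(text: string, breaks: Map<string, number[]>): string {
  return text.replace(/[A-Za-z]+/g, (word) => {
    const offsets = breaks.get(word.toLowerCase());
    return offsets ? hyphenateWord(word, offsets) : word;
  });
}

const breaks = new Map<string, number[]>([["hyphenation", [2, 6]]]);
const hyphenated = hyphenateText("Plain hyphenation test", breaks);
console.log(hyphenated.length); // original length plus one per inserted soft hyphen
```

The result could then be written back with a single mutation per paragraph instead of one per word. The trade-off is that replacing a whole paragraph's text may discard inline formatting and hyperlinks inside it, so this only fits paragraphs where that is acceptable or where formatting is restored separately.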

How Senior Engineers Fix It

To fix this issue, senior engineers would:

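  • Investigate alternative APIs that can apply large-scale text mutations without triggering Word’s layout engine once per change
  • Batch many mutations per context.sync(), cache loaded values instead of re-loading them, and keep load/sync calls out of inner loops; parallel processing is worth testing but may not help if the host applies mutations one at a time
  • Use Word’s built-in features to minimize the number of reflows, for example by replacing a larger span (a paragraph or a block of OOXML) in one operation rather than one word at a time, provided tables and formatting survive
  • Profile and benchmark the add-in on a document of realistic size to identify which step dominates (splitting, loading, or inserting) before optimizing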
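
The batching and benchmarking points above can be sketched as follows. This is a minimal illustration, not the add-in's actual code: chunk and applyInBatches are hypothetical helpers, the queueing and flushing steps are passed in as callbacks so the sketch runs without Word, and the right batch size is an assumption that would have to be found by measurement.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Queue `apply` for every item, flushing once per batch instead of once per item.
// Returns the elapsed milliseconds of each batch so slow batches can be spotted.
async function applyInBatches<T>(
  items: T[],
  batchSize: number,
  apply: (item: T) => void,
  flush: () => Promise<void>
): Promise<number[]> {
  const timings: number[] = [];
  for (const batch of chunk(items, batchSize)) {
    const start = Date.now();
    batch.forEach((item) => apply(item)); // queue the mutations, no round trip yet
    await flush();                        // one round trip for the whole batch
    timings.push(Date.now() - start);
  }
  return timings;
}
```

Inside Word.run, apply would queue the mutation, for instance a call to insertText with Word.InsertLocation.replace on a word range, and flush would be () => context.sync(). Comparing the returned timings across a few batch sizes shows whether the time goes into the round trips or into Word applying the changes.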

Why Juniors Miss It

Juniors may miss this issue because of:

  • Lack of experience with large-scale document processing and editing applications
  • Insufficient understanding of Word’s layout engine and its impact on performance
  • Overreliance on Office-JS APIs without considering how queued calls are batched, sent to the Word host, and applied to the document
  • Inadequate testing and benchmarking to identify performance issues early on