TBB interaction in a Python binding

Summary

This incident involved a double‑free crash triggered when importing a Python module backed by a C++ library that embeds oneTBB, but only when NumPy is imported first. The failure was caused by two independent TBB runtimes being loaded into the same Python process, each attempting to manage and free overlapping internal resources.

Root Cause

The underlying issue was the simultaneous presence of multiple TBB versions in a single Python interpreter:

NumPy (or a dependency it loads) brings its own TBB runtime.
The custom C++ extension module bundles another TBB shared library.
Both libraries register global allocators, task schedulers, and thread pools.
At interpreter finalization or process exit, both runtimes run their teardown code and attempt to free the same internal structures, causing a double free.

The crash only appears when NumPy is imported first because:

NumPy’s TBB initializes global state early.
The custom module loads a second TBB later, creating two active runtimes.
Shutdown order becomes undefined, leading to memory corruption.
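
The order dependence can be checked by running each import order in a fresh interpreter and comparing exit statuses. The following is a minimal sketch, assuming a POSIX system; `my_tbb_ext` in the usage note is a hypothetical name standing in for the custom extension module.

```python
import subprocess
import sys


def run_imports(modules):
    """Import the given modules, in order, in a fresh interpreter.

    Returns the child's return code. On POSIX, a negative value -N means
    the child was killed by signal N (for example SIGABRT when the C
    library aborts after detecting a double free).
    """
    code = "\n".join(f"import {name}" for name in modules)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode


def compare_import_orders(first, second):
    """Run both import orders and map each order to its return code."""
    return {
        (first, second): run_imports([first, second]),
        (second, first): run_imports([second, first]),
    }
```

Calling `compare_import_orders("numpy", "my_tbb_ext")` returns one status per order. A clean exit in one order and a signal in the other is consistent with the import-order dependence described above, although it does not by itself prove that TBB is the cause.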

Why This Happens in Real Systems

This is a classic example of ABI‑incompatible runtime duplication:

Python processes often load many native extensions.
Each extension may bundle its own version of a shared library.
Libraries like TBB, OpenMP, and MKL maintain global state, making them extremely sensitive to duplication.
On Linux, the dynamic loader can map two copies of the same library into one process when it does not recognize them as the same object, for example when their SONAMEs or file names differ.
When both copies try to manage threads or memory, undefined behavior is likely.
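
On Linux, the shared objects mapped into the running interpreter are listed in `/proc/self/maps`, which makes duplicates easy to spot. The following is a minimal sketch, assuming Linux and assuming that TBB libraries keep a `libtbb` prefix in their file names even when a packaging tool renames a bundled copy.

```python
import os
import re

# Matches file names such as libtbb.so.12 or a renamed bundled copy.
# libtbbmalloc and libtbbbind are skipped because they are separate libraries.
_TBB_NAME = re.compile(r"^libtbb(?!malloc|bind)[^/]*\.so")


def tbb_paths_from_maps(maps_text):
    """Return the sorted, distinct TBB library paths found in maps text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode path
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if _TBB_NAME.match(os.path.basename(path)):
            paths.add(path)
    return sorted(paths)


def loaded_tbb_paths():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return tbb_paths_from_maps(handle.read())
```

If `loaded_tbb_paths()` returns more than one path after both modules have been imported, two TBB runtimes are resident. An empty result means either that no TBB is loaded or that it was linked statically into another object.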

Real-World Impact

Systems that accidentally load multiple TBB runtimes can experience:

Double free / corruption crashes
Deadlocks in thread pool initialization
Performance degradation due to duplicated thread pools
Non‑deterministic behavior depending on import order
Silent data corruption in worst cases

Example

Below is a minimal extension entry point that uses TBB. The code itself is unremarkable; the duplication arises from how the resulting module is packaged:

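#include <tbb/parallel_for.h>

// Exported with C linkage so the Python binding can look it up by name.
extern "C" void run_task() {
    // Any use of a parallel algorithm pulls in the TBB runtime (libtbb.so).
    tbb::parallel_for(0, 1000, [](int) {});
}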

If this module is compiled and shipped with its own libtbb.so while NumPy (or one of its dependencies) loads another copy, both can end up in the same Python process.

How Senior Engineers Fix It

Experienced engineers avoid bundling global‑state libraries like TBB directly inside Python extensions. Common solutions include:

Do not ship your own TBB; instead depend on the system or Python‑level TBB.
Link dynamically against a shared TBB and let the loader reuse the copy that is already in the process, rather than a private one.
Enforce a single TBB version via:
- LD_PRELOAD (as a temporary workaround)
- rpath or runpath adjustments
- Conda or pip packaging constraints
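Limit TBB usage to its header-only components where possible. The task scheduler and parallel algorithms such as parallel_for still require the runtime library, so this rarely removes the dependency entirely.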
Use symbol versioning to avoid collisions (advanced and fragile).
Document import‑order constraints only as a last resort.

The most robust fix is:
Ensure that only one TBB shared library is ever loaded into the Python process.
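
One way to check this at import time is to ask the dynamic loader whether a TBB runtime is already resident before the extension is loaded. The sketch below assumes a Unix-like platform where `os.RTLD_NOLOAD` is available. The SONAMEs `libtbb.so.12` and `libtbb.so.2` are assumptions about typical oneTBB and classic TBB builds and should be verified against the libraries actually in use.

```python
import ctypes
import os
import warnings

# Assumed SONAMEs; verify against the libraries actually shipped.
EXPECTED_SONAME = "libtbb.so.12"
KNOWN_SONAMES = ("libtbb.so.12", "libtbb.so.2")


def is_already_loaded(soname):
    """Return True if the loader already has `soname` resident.

    RTLD_NOLOAD makes dlopen() report on an existing mapping without
    loading anything new; ctypes raises OSError when nothing is found.
    """
    try:
        ctypes.CDLL(soname, mode=os.RTLD_NOLOAD | os.RTLD_NOW)
    except OSError:
        return False
    return True


def conflicting_tbb(expected=EXPECTED_SONAME, candidates=KNOWN_SONAMES):
    """List resident TBB SONAMEs that differ from the expected one."""
    return [
        name
        for name in candidates
        if name != expected and is_already_loaded(name)
    ]


def warn_on_conflict():
    """Intended to run in the package's __init__ before the extension loads."""
    conflicts = conflicting_tbb()
    if conflicts:
        warnings.warn(
            "another TBB runtime is already loaded: " + ", ".join(conflicts),
            RuntimeWarning,
        )
    return conflicts
```

This check only sees libraries under the names it is given. A bundled copy that a packaging tool has renamed will not match, so it complements rather than replaces a correct packaging setup.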

Why Juniors Miss It

Less experienced engineers often overlook this because:

They assume dynamic libraries are “isolated” per module.
They don’t realize that Python is a single process, so all native extensions share the same address space.
They underestimate how libraries like TBB rely on global state and global allocators.
They trust that bundling dependencies “just works” without considering ABI compatibility.
They rarely inspect ldd, LD_DEBUG=libs, or symbol tables to detect duplicate runtimes.
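
The loader trace mentioned above can also be captured from Python, which makes it easy to add to a test suite. The following is a minimal sketch, assuming a glibc-based Linux system where `LD_DEBUG=libs` writes its trace to standard error; `my_tbb_ext` in the usage note is a hypothetical module name.

```python
import os
import subprocess
import sys


def loader_trace(import_statement):
    """Run a statement in a child interpreter with LD_DEBUG=libs set.

    Returns the child's stderr, which is where glibc writes the trace
    unless LD_DEBUG_OUTPUT redirects it to a file.
    """
    env = dict(os.environ, LD_DEBUG="libs")
    result = subprocess.run(
        [sys.executable, "-c", import_statement],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stderr


def lines_mentioning(trace, needle="libtbb"):
    """Keep only the trace lines that mention the given library name."""
    return [line.strip() for line in trace.splitlines() if needle in line]
```

A call such as `lines_mentioning(loader_trace("import numpy, my_tbb_ext"))` narrows the trace to the TBB-related entries. Seeing two different libtbb paths in the result suggests that two runtimes were loaded.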

The failure mode is subtle, order‑dependent, and only appears when another extension loads the same library first, making it easy to miss during development.