Severe RAM leak when running OpenVINO inference on Raspberry Pi 5 (ARM) – even with infer() only

Summary

The issue is a severe RAM leak when running OpenVINO inference on a Raspberry Pi 5 (ARM64). Resident memory grows steadily even when infer() is called repeatedly on a static input, with no image preprocessing or camera frames involved, which points to the OpenVINO runtime failing to release memory on ARM rather than to application code.
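A leak like this can be confirmed independently of any model code. The harness below is a sketch (`measure_rss_growth` and its parameters are illustrative, not part of OpenVINO or psutil): it samples resident memory while repeatedly invoking a step function, forcing gc.collect() before each sample so that growth cannot be blamed on deferred Python garbage collection.

```python
import gc
import os

import psutil


def measure_rss_growth(step_fn, iterations=500, sample_every=100):
    """Call step_fn repeatedly and return RSS samples in MB.

    Steadily climbing samples on a fixed workload indicate a leak;
    if gc.collect() does not flatten the curve, the leak is below
    the Python heap (i.e. in native code).
    """
    proc = psutil.Process(os.getpid())
    samples = []
    for i in range(iterations):
        step_fn()
        if i % sample_every == 0:
            gc.collect()
            samples.append(proc.memory_info().rss / 1024 / 1024)
    return samples


# A no-op stand-in; on a real setup pass e.g.
# lambda: request.infer({0: static_input})
samples = measure_rss_growth(lambda: None, iterations=200, sample_every=50)
print(samples)
```

On an affected device, substituting the real infer() call reproduces the report: the samples climb even though the input never changes.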

Root Cause

The root cause is most likely one of the following:

  • A memory-management bug in OpenVINO's ARM (aarch64) code path that fails to release per-inference allocations
  • Inadequate reuse of internal buffers in the OpenVINO runtime, so each infer() call allocates fresh memory
  • An incompatibility between the installed OpenVINO build and the Raspberry Pi 5's ARM64 architecture
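Before assigning blame, it helps to separate the Python heap from native allocations. The sketch below (the `snapshot` helper is illustrative) compares tracemalloc's count of live Python-object bytes against the process RSS; if RSS climbs across a batch of infer() calls while the tracemalloc total stays flat, the leak lives in native code such as the OpenVINO runtime.

```python
import os
import tracemalloc

import psutil


def snapshot(label):
    """Report Python-heap bytes (tracemalloc) next to process RSS."""
    py_bytes, _peak = tracemalloc.get_traced_memory()
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024
    print(f"{label}: python heap = {py_bytes / 1024:.0f} KiB, RSS = {rss_mb:.1f} MB")
    return py_bytes, rss_mb


tracemalloc.start()
before = snapshot("before")
# ... run a batch of infer() calls here ...
after = snapshot("after")
```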

Why This Happens in Real Systems

This issue surfaces in real-world systems because:

  • Edge devices like the Raspberry Pi 5 have little RAM to spare, so even a slow per-inference leak eventually exhausts memory
  • Native runtimes such as OpenVINO allocate outside the Python heap, so gc.collect() never reclaims the leaked memory
  • ARM builds of the OpenVINO runtime receive less testing and optimization than their x86 counterparts

Real-World Impact

The impact of this issue can be significant:

  • The Linux OOM killer eventually terminates the process (or the whole system hangs) once memory is exhausted
  • Performance degrades as the system falls back to swap, which on an SD card is both slow and wears out the flash
  • Long-running deployments (e.g. a 24/7 camera pipeline) fail after hours or days, making the bug hard to catch in short tests

Example Code

import sys
import os
import glob
import time
import argparse
import gc
import psutil
import cv2
import numpy as np
import openvino.runtime as ov

# --- CONFIGURATION ---
MODEL_DIR = "yolo11n_openvino_model"
CONF_THRESHOLD = 0.50
INPUT_W, INPUT_H = 640, 640  # Model Input Dimensions
CAM_W, CAM_H = 640, 480  # Camera Dimensions

def get_rss_mb():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

class YoloZeroAlloc:
    def __init__(self, model_dir):
        self.core = ov.Core()
        # Load Model
        xml_files = glob.glob(os.path.join(model_dir, "*.xml"))
        if not xml_files:
            raise FileNotFoundError(f"No .xml in {model_dir}")
        print(f"Loading: {xml_files[0]}")
        model = self.core.read_model(xml_files[0])
        # Force Static Shape [1, 3, 640, 640]
        print(f"Forcing Shape: [1, 3, {INPUT_H}, {INPUT_W}]")
        model.reshape([1, 3, INPUT_H, INPUT_W])
        self.compiled_model = self.core.compile_model(model, "CPU")
        self.infer_request = self.compiled_model.create_infer_request()
        # --- MEMORY POOLS (The Fix) ---
        # 1. Input Tensor (Float32, NCHW)
        self.input_tensor = self.infer_request.get_input_tensor()
        self.input_data_buffer = self.input_tensor.data
        # 2. Resize Buffer (Uint8, HWC)
        # We calculate the target size once based on aspect ratio
        scale = min(INPUT_W / CAM_W, INPUT_H / CAM_H)
        self.new_w = int(CAM_W * scale)
        self.new_h = int(CAM_H * scale)
        self.resize_buffer = np.zeros((self.new_h, self.new_w, 3), dtype=np.uint8)
        # 3. Canvas Buffer (Uint8, HWC) - Full 640x640
        self.canvas_buffer = np.full((INPUT_H, INPUT_W, 3), 114, dtype=np.uint8)
        # Calculate padding offsets once
        self.dw = (INPUT_W - self.new_w) // 2
        self.dh = (INPUT_H - self.new_h) // 2
        print("Buffers Allocated. Memory Pools Ready.")

    def preprocess_zero_alloc(self, img_rgb):
        """
        Resizes and pads WITHOUT allocating new numpy arrays.
        Uses cv2.resize(dst=...) and in-place assignments.
        """
        # 1. Resize directly into pre-allocated buffer
        # This prevents creating a new 1.2MB array
        cv2.resize(img_rgb, (self.new_w, self.new_h), dst=self.resize_buffer)
        # 2. Reset Canvas (Fill with gray 114)
        # Faster than np.full, we just assign the value
        self.canvas_buffer[:] = 114
        # 3. Copy resized image into canvas
        # Numpy handles this heavily optimized
        self.canvas_buffer[self.dh:self.dh+self.new_h, self.dw:self.dw+self.new_w] = self.resize_buffer
        # 4. Normalize and Transpose directly to Tensor
        # HWC -> CHW happens via transpose view (cheap)
        # np.divide writes result directly to OpenVINO memory (no intermediate float array)
        # Create a temporary view of the canvas for transposing
        # (Views do not allocate data memory)
        canvas_chw = self.canvas_buffer.transpose((2, 0, 1))
        np.divide(canvas_chw, 255.0, out=self.input_data_buffer[0])

    def infer(self):
        """Run inference on whatever is currently in the input tensor."""
        self.infer_request.infer()
        return self.infer_request.get_output_tensor().data
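The buffer-reuse pattern in the class above can be exercised without cv2 or OpenVINO at all. The sketch below is a pure-NumPy stand-in: precomputed nearest-neighbour index arrays replace cv2.resize(dst=...), and a plain float32 array replaces the OpenVINO input tensor; every name here is illustrative.

```python
import numpy as np

CAM_W, CAM_H = 640, 480      # camera frame size
INPUT_W, INPUT_H = 640, 640  # model input size

scale = min(INPUT_W / CAM_W, INPUT_H / CAM_H)
new_w, new_h = int(CAM_W * scale), int(CAM_H * scale)
dw, dh = (INPUT_W - new_w) // 2, (INPUT_H - new_h) // 2

# Allocated exactly once, reused on every frame
canvas = np.full((INPUT_H, INPUT_W, 3), 114, dtype=np.uint8)
out_chw = np.empty((1, 3, INPUT_H, INPUT_W), dtype=np.float32)  # stand-in for the input tensor

# Precomputed nearest-neighbour source indices (stand-in for cv2.resize)
rows = (np.arange(new_h) * (CAM_H / new_h)).astype(np.intp)
cols = (np.arange(new_w) * (CAM_W / new_w)).astype(np.intp)


def preprocess(frame):
    canvas[:] = 114  # reset the gray padding in place
    canvas[dh:dh + new_h, dw:dw + new_w] = frame[rows][:, cols]
    # HWC -> CHW via a view, then scale straight into the output buffer
    np.divide(canvas.transpose(2, 0, 1), 255.0, out=out_chw[0])


preprocess(np.zeros((CAM_H, CAM_W, 3), dtype=np.uint8))
```

Calling preprocess in a loop touches only the preallocated arrays; the remaining per-frame allocation is the small frame[rows] intermediate, which the real letterbox avoids entirely via cv2.resize(dst=...).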

How Senior Engineers Fix It

Senior engineers attack this issue by:

  • Implementing memory pools: allocating every buffer once at startup and reusing it, so the steady-state loop performs zero allocations
  • Writing results in place (cv2.resize(dst=...), np.divide(..., out=...)) instead of creating intermediate arrays on every frame
  • Profiling RSS over time with tools such as psutil, valgrind massif, or heaptrack to establish whether growth is in the Python heap or in native code
  • Pinning or bisecting OpenVINO versions, and reporting a minimal reproduction upstream when the leak is in the runtime itself
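The memory-pool idea can be reduced to a few lines. This TensorPool class is a hypothetical illustration (not an OpenVINO facility): each (name, shape, dtype) key maps to a single preallocated array, so repeated calls in the hot loop hand back the same buffer instead of allocating.

```python
import numpy as np


class TensorPool:
    """Preallocate buffers on first request, then reuse them forever."""

    def __init__(self):
        self._buffers = {}

    def get(self, name, shape, dtype):
        key = (name, tuple(shape), np.dtype(dtype).name)
        if key not in self._buffers:
            self._buffers[key] = np.empty(shape, dtype=dtype)
        return self._buffers[key]


pool = TensorPool()
a = pool.get("input", (1, 3, 640, 640), np.float32)
b = pool.get("input", (1, 3, 640, 640), np.float32)
# a is b: the second call returns the very same array, no new allocation
```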

Why Juniors Miss It

Junior engineers may miss this issue due to:

  • Lack of experience with memory management and optimization
  • Insufficient knowledge of OpenVINO and its memory management mechanisms
  • Inadequate testing and debugging of the system to identify memory-related issues
  • Overreliance on high-level abstractions that hide underlying memory management details