Avoid crashes with std::filesystem: safe path concatenation in C++

Summary

During a high-throughput data ingestion task, our processing engine encountered a segmentation fault and subsequent resource exhaustion when attempting to resolve file paths. While the developer’s intent was to concatenate a base directory with a list of filenames, the implementation failed to account for path separator consistency and filesystem boundary validation. This resulted in malformed paths that caused the application to crash when passed to low-level system calls.

Root Cause

The failure stemmed from several technical oversights:

  • Manual String Concatenation: The code used simple string addition (basePath + "/" + filename) instead of a dedicated filesystem library.
  • Missing Trailing Slashes: Wherever the join relied on basePath already ending with a / and it did not, the resulting path became usr/local/binfile1 instead of usr/local/bin/file1.
  • Lack of Normalization: The system failed to resolve . or .. components, leading to unexpected directory traversal.
  • Missing Error Boundaries: There was no validation to check if the basePath actually existed or was a directory before attempting to append filenames.
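
To make the separator problem concrete, here is a minimal sketch that contrasts a naive join with fs::path's operator/. The names join_naive and join_safe are hypothetical, and join_naive assumes a join with no explicit separator, which is what the usr/local/binfile1 result above implies; it is not the original production code.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical reconstruction of the fragile pattern: the result depends on
// whether the caller remembered a trailing separator on `base`.
std::string join_naive(const std::string& base, const std::string& name) {
    return base + name;
}

// The same join through fs::path: operator/ inserts a separator only when
// the left-hand side does not already end with one.
fs::path join_safe(const fs::path& base, const std::string& name) {
    return base / name;
}
```

With this sketch, join_safe yields usr/local/bin/file1 whether or not the base carries a trailing slash, while join_naive only works when it does.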

Why This Happens in Real Systems

In production environments, “simple” path manipulation is rarely simple due to:

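  • Platform Heterogeneity: A hardcoded / is fragile on Windows, where \ is the preferred separator. Most Windows APIs accept /, but mixed separators break string comparisons, and extended-length \\?\ paths do not accept / at all.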
  • Configuration Drift: A configuration file might provide a path like /etc/config/ (with slash) or /etc/config (without slash), breaking naive string logic.
  • Race Conditions (TOCTOU): Between the time a path is constructed and the time it is opened, the filesystem state might change.
  • Sanitization Failures: Input from external sources (like a user-provided filename) can contain ../ sequences, leading to Path Traversal Vulnerabilities.
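
The sanitization point is easy to underestimate, because switching to std::filesystem does not remove it. The sketch below (join_and_normalize is a hypothetical helper, and /srv/ingest is an assumed example directory) shows that operator/ followed by lexically_normal() will happily walk out of the base directory, and that an absolute right-hand side replaces the base entirely.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Joins an untrusted name onto a base and normalizes the result lexically
// (purely string-based, no filesystem access). Note that this performs no
// containment check at all: it only shows what the joined path becomes.
fs::path join_and_normalize(const fs::path& base, const std::string& untrusted_name) {
    return (base / untrusted_name).lexically_normal();
}
```

On a POSIX system, joining "../../etc/passwd" or "/etc/passwd" onto /srv/ingest both normalize to /etc/passwd, so a separate containment check is still required for untrusted names.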

Real-World Impact

  • Service Downtime: Crashes in the ingestion worker caused a backup in the message queue, leading to a 2-hour recovery window.
  • Data Corruption: Partial file writes occurred when the system attempted to write to incorrectly resolved paths.
  • Security Risk: The inability to sanitize paths opened the door for unauthorized file access via directory traversal.

Example or Code

#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

std::vector<fs::path> get_valid_paths(const std::string& base_path, const std::vector<std::string>& filenames) {
    std::vector<fs::path> full_paths;
    fs::path base(base_path);

    // is_directory() is false for a missing path, so it also covers exists()
    if (!fs::is_directory(base)) {
        throw std::runtime_error("Invalid base directory: " + base.string());
    }

    for (const auto& name : filenames) {
        // The / operator in std::filesystem handles separators automatically
        fs::path full_path = base / name;

        // Canonicalize to resolve . and .. and ensure absolute path
        try {
            full_paths.push_back(fs::weakly_canonical(full_path));
        } catch (const fs::filesystem_error& e) {
            // Report the failure and continue with the remaining filenames
            std::cerr << "Skipping " << full_path << ": " << e.what() << '\n';
        }
    }

    return full_paths;
}

int main() {
    // Relative path: resolved against the current working directory
    std::string basePath = "usr/local/bin";
    std::vector<std::string> files = {"file1", "file2", "file3"};

    try {
        auto result = get_valid_paths(basePath, files);
        for (const auto& p : result) {
            std::cout << p << std::endl;
        }
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    return 0;
}

How Senior Engineers Fix It

  • Abstraction via Standard Libraries: Always use std::filesystem (C++17) or boost::filesystem rather than raw string manipulation.
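  • Defensive Programming: Check with fs::exists(), fs::is_directory(), and fs::is_regular_file() before I/O, and still handle failure of the I/O call itself, because the filesystem can change after the check (the TOCTOU case above).
  • Path Normalization: Use fs::canonical() (the path must exist) or fs::weakly_canonical() (it may not exist yet) to resolve symlinks and relative segments. Normalization alone does not stop traversal; it makes the containment check in the next point reliable.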
  • Boundary Enforcement: Explicitly validate that the resolved path stays within the intended base_path to prevent jailbreaking the directory structure.
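
One way the boundary check could look is sketched below. The helper name resolve_inside is hypothetical, and the approach (canonicalize both paths, then compare them component by component) is one reasonable option rather than the only one. It uses the std::error_code overloads so that a missing base directory is reported as a failed lookup instead of an exception.

```cpp
#include <algorithm>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Returns the resolved path if `untrusted_name` stays inside `base_dir`,
// or std::nullopt if it escapes the base or cannot be resolved.
// `resolve_inside` is a hypothetical helper, not part of any library.
std::optional<fs::path> resolve_inside(const fs::path& base_dir,
                                       const std::string& untrusted_name) {
    std::error_code ec;

    // The base must exist; canonical() also resolves symlinks in it.
    const fs::path base = fs::canonical(base_dir, ec);
    if (ec) return std::nullopt;

    // The target may not exist yet, so weakly_canonical() is used here.
    const fs::path target = fs::weakly_canonical(base / untrusted_name, ec);
    if (ec) return std::nullopt;

    // Compare whole components, so /data/in is not treated as a prefix
    // of /data/input the way a plain string comparison would.
    const auto [base_it, target_it] =
        std::mismatch(base.begin(), base.end(), target.begin(), target.end());
    if (base_it != base.end()) return std::nullopt;

    return target;
}
```

A caller would treat std::nullopt as a rejected filename and skip or log it. The check is still subject to the TOCTOU caveat: a symlink created after the check can redirect the path, so the result should be opened promptly and errors from the open call still handled.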

Why Juniors Miss It

  • Focus on the “Happy Path”: Juniors often assume the input basePath will always be perfectly formatted.
  • String-Centric Thinking: They treat paths as simple sequences of characters rather than complex, platform-dependent hierarchical structures.
  • Ignoring Edge Cases: The distinction between a file and a directory, or the presence of trailing slashes, is often overlooked in early-stage development.
  • Lack of Security Awareness: The idea that a filename can be an attack vector against the filesystem (via ../) takes experience to anticipate.
