Advice for parsing OCL expressions

Summary

In this postmortem, we analyze the risks of exposing OCL (Object Constraint Language) expressions directly through a web API, as proposed in the original question. The core issue is that unvalidated OCL can cause security vulnerabilities, such as unauthorized data access, and performance problems, such as denial of service (DoS) through expensive queries. MDriven does provide hooks for validation through its OCL parser and AST, so you can inspect expressions without building a custom parser. Key takeaway: never trust user-supplied queries without server-side validation and throttling.

Root Cause

The root cause is exposing a query language without adequate safeguards:

Unrestricted Query Execution: OCL expressions passed to a backend can traverse object graphs to arbitrary depth and, if authorization checks are missing, reach restricted data.
AST Unavailability: Without parser hooks, validating syntax or semantics requires reimplementing parsing logic, increasing error rates and maintenance overhead.
Lack of Isolation: Executing expressions directly, without query timeouts or resource limits, lets even syntactically valid queries become resource hogs.
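A minimal sketch of the missing isolation layer follows, assuming the OCL evaluation can be wrapped in a cancellable delegate. The actual MDriven evaluation call is not shown in this document, so `evaluate` is a placeholder, and `GuardedQueryRunner` is a hypothetical name. It caps how many evaluations run at once with a `SemaphoreSlim` and applies one deadline that covers both waiting for a slot and running the query. Cancellation is cooperative, so it only stops work that observes the token; a database-level statement timeout remains a sensible backstop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: caps concurrent OCL evaluations and enforces one deadline
// that covers both waiting for a free slot and running the query.
public sealed class GuardedQueryRunner
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _timeout;

    public GuardedQueryRunner(int maxConcurrent, TimeSpan timeout)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
        _timeout = timeout;
    }

    // evaluate is a placeholder for the code that actually runs a validated OCL
    // expression; it must observe the token for the deadline to take effect.
    public async Task<T> RunAsync<T>(Func<CancellationToken, Task<T>> evaluate)
    {
        using var cts = new CancellationTokenSource(_timeout);
        // Throws OperationCanceledException if no slot frees up before the deadline.
        await _slots.WaitAsync(cts.Token);
        try
        {
            return await evaluate(cts.Token);
        }
        finally
        {
            // Always return the slot, even when the query fails or is cancelled.
            _slots.Release();
        }
    }
}
```

For example, `new GuardedQueryRunner(4, TimeSpan.FromSeconds(5))` would allow four concurrent queries with a 5-second limit; a query that overruns surfaces as an `OperationCanceledException`, which the API layer can translate into an error response instead of holding resources.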

Why This Happens in Real Systems

In production systems, similar issues arise from:

Overly Permissive APIs: Developers often prioritize flexibility (e.g., dynamic queries for advanced users) over security, assuming input sanitization is sufficient.
Underestimated Complexity: OCL’s expressiveness allows recursive or deeply nested queries that weren’t anticipated during design.
Edge Cases in Parsing: Real-world OCL inputs might include malformed syntax or subtle variations that bypass naive validators, leading to runtime failures or exploits.
Resource Contention: Shared databases amplify the impact; one bad query can starve connections for others, especially in microservices or SaaS environments.

Real-World Impact

Data Breaches: Attackers could query sensitive attributes (e.g., user PII) beyond their role, violating regulations such as GDPR.
Performance Degradation: Queries with O(n^2) complexity on large datasets could tie up CPU and memory for seconds or minutes, causing cascading failures.
Availability Loss: Repeated expensive queries amount to a DoS, leading to 500 errors or SLA violations, eroding user trust, and incurring financial penalties from downtime.
Debugging Overhead: Incidents require tracing queries back to their source, often amid logs flooded with partial OCL snippets.
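To put the O(n^2) case in numbers: an expression that rescans every object for each object, such as the illustrative `Person.allInstances->select(p | Person.allInstances->exists(q | q.manager = p))` (where `manager` is a hypothetical association), performs on the order of n^2 inner evaluations when evaluated in memory. Assuming, purely for illustration, 100,000 Person objects and 10 million inner evaluations per second:

```latex
n^2 = (10^5)^2 = 10^{10}\ \text{evaluations}, \qquad
t \approx \frac{10^{10}}{10^{7}\ \text{s}^{-1}} = 10^{3}\ \text{s} \approx 17\ \text{min}
```

A single such request can therefore hold a worker for minutes, and a handful of them in parallel is enough to exhaust a server.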

Example Code

If validation is needed, MDriven’s OCL parser can be used to parse and check an expression without executing it. The snippet below assumes an OclParser class in the Eco framework; verify the exact type and member names against your MDriven version:

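```csharp
using System;
using Eco.Handles;
using Eco.Ocl;

// inputOcl is the user-provided string, e.g. "Person.allInstances->select(age > 18)".
// OclParser.Parse and the HasErrors/Errors/AST members are the names used in this
// article; confirm them against the API reference for your MDriven version.
var result = OclParser.Parse(inputOcl);

if (result.HasErrors)
{
    // Reject the expression: report each syntax error and never execute it
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"Syntax Error: {error.Message}");
    }
}
else
{
    // Inspect the AST for semantic issues (e.g., forbidden classes or attributes)
    var ast = result.AST;
    WalkAst(ast); // Custom walker (your own code) that checks for restricted access
}
```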

In a custom WalkAst method, you’d traverse the AST nodes and flag patterns such as allInstances on restricted classes, or potentially expensive operations such as collect over large collections. Because it only inspects the parsed tree, this check runs in-process without executing the query.
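The shape of MDriven’s AST node types is not shown in this document, so the following sketch of a WalkAst method runs over a hypothetical `OclNode` class (not an MDriven type) that stands in for the parsed tree. It recursively visits every node and records a message for each `allInstances` call on a restricted class and each `collect` call. The node kinds and the restricted class names are illustrative assumptions; mapping them onto the real Eco AST classes is left to verify.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node, NOT an MDriven/Eco type. Kind is e.g. "Operation" or "ClassRef";
// for an operation call, Children[0] is the source expression it is applied to.
public sealed class OclNode
{
    public string Kind { get; }
    public string Name { get; }
    public IReadOnlyList<OclNode> Children { get; }

    public OclNode(string kind, string name, params OclNode[] children)
    {
        Kind = kind;
        Name = name;
        Children = children;
    }
}

public static class AstGuard
{
    // Example deny-list; the class names are illustrative only.
    private static readonly HashSet<string> RestrictedClasses =
        new HashSet<string> { "Salary", "AuditLog" };

    // Returns one message per flagged pattern; an empty list means nothing was flagged.
    public static List<string> WalkAst(OclNode root)
    {
        var violations = new List<string>();
        Visit(root, violations);
        return violations;
    }

    private static void Visit(OclNode node, List<string> violations)
    {
        if (node.Kind == "Operation")
        {
            var source = node.Children.Count > 0 ? node.Children[0] : null;
            if (node.Name == "allInstances" && source != null
                && source.Kind == "ClassRef" && RestrictedClasses.Contains(source.Name))
            {
                violations.Add($"allInstances on restricted class {source.Name}");
            }
            if (node.Name == "collect")
            {
                violations.Add("collect found; check the size of its source collection");
            }
        }
        // Recurse into every sub-expression so nested patterns are also caught.
        foreach (var child in node.Children)
        {
            Visit(child, violations);
        }
    }
}
```

For `Salary.allInstances->collect(amount)` this walker reports two findings (the `collect` and the restricted `allInstances`), while `Person.allInstances->select(age > 18)` produces none. A deny-list like this is a first filter; an allowlist is the stricter option.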

How Senior Engineers Fix It

Senior engineers implement a defense-in-depth strategy:

Validate at Entry: Use MDriven’s parser hooks to extract the classes and attributes an expression references from the AST, and check them against an allowlist before execution.
Enforce Limits: Apply timeouts (e.g., 5 s max) and resource quotas, using database-level settings such as statement timeouts or custom execution wrappers.
Authorize Granularly: Post-parse, map the AST to a capability-based model where only pre-approved OCL fragments are executable.
Monitor and Log: Instrument the pipeline with metrics on query complexity (e.g., node count) to detect anomalies early.
Fall Back to Safe Subsets: If full OCL is too risky, restrict users to a subset of OCL (a small DSL), still parsed with MDriven’s parser so that accepted expressions remain valid OCL.

This ensures zero-trust querying, where validation is cheap and execution is isolated.
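The "validate at entry" and "monitor query complexity" steps can share one pass over the tree. The sketch below uses a hypothetical `OclNode` class (redeclared so the block stands alone; not an MDriven type) and assumes the parser has resolved each attribute reference to its owning class, so an `AttributeRef` node’s name is a pair such as `Person.age`. It rejects any class or attribute not on an explicit allowlist and any tree larger than a fixed node budget. The allowlist contents and the 200-node limit are illustrative assumptions, not MDriven settings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node (not an MDriven type). For "AttributeRef" nodes, Name is
// assumed to be the resolved "Class.attribute" pair, e.g. "Person.age".
public sealed class OclNode
{
    public string Kind { get; }
    public string Name { get; }
    public IReadOnlyList<OclNode> Children { get; }

    public OclNode(string kind, string name, params OclNode[] children)
    {
        Kind = kind;
        Name = name;
        Children = children;
    }
}

public static class EntryValidator
{
    // Illustrative allowlists and complexity budget; tune these per API.
    private static readonly HashSet<string> AllowedClasses =
        new HashSet<string> { "Person", "Order" };
    private static readonly HashSet<string> AllowedAttributes =
        new HashSet<string> { "Person.age", "Person.name", "Order.total" };
    private const int MaxNodes = 200;

    // Returns null if the expression is acceptable, otherwise the rejection reason.
    public static string? Validate(OclNode root)
    {
        int nodeCount = 0;
        // Explicit stack instead of recursion, so deep nesting cannot overflow the call stack.
        var stack = new Stack<OclNode>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var node = stack.Pop();
            if (++nodeCount > MaxNodes)
                return $"expression exceeds {MaxNodes} nodes";
            if (node.Kind == "ClassRef" && !AllowedClasses.Contains(node.Name))
                return $"class not allowed: {node.Name}";
            if (node.Kind == "AttributeRef" && !AllowedAttributes.Contains(node.Name))
                return $"attribute not allowed: {node.Name}";
            foreach (var child in node.Children)
                stack.Push(child);
        }
        return null;
    }
}
```

An allowlist fails closed: a class or attribute added to the model later stays unreachable until someone explicitly approves it, which is the behavior a zero-trust endpoint wants. The node count doubles as the complexity metric worth logging.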

Why Juniors Miss It

Junior developers often:

Underestimate Risks: They view OCL as “safer than SQL” due to MDriven’s constraints, overlooking that any query language can be abused if not bounded.
Skip Validation Steps: Focus on core functionality (e.g., “make the query work”) without considering edge cases like long-running scans or unauthorized paths.
Lack AST Knowledge: Unfamiliar with parser APIs, they might write string-based checks (e.g., regex) which are brittle and miss semantic issues.
Overlook Performance: Testing with small datasets hides real-world slowdowns, leading to production surprises.
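To see why string-based checks are brittle, consider a hypothetical filter that rejects any expression matching the regular expression `Person\.allInstances`. It blocks a harmless string literal that merely mentions the name (a false positive), and it misses the same call written with a space before the dot (a false negative), assuming MDriven’s parser accepts whitespace between those tokens as OCL grammars generally do; confirm that against your version. An AST walk works on tokens rather than characters and avoids both failure modes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, deliberately naive deny rule of the kind the AST approach replaces.
public static class NaiveFilter
{
    private static readonly Regex Forbidden = new Regex(@"Person\.allInstances");

    // True when the raw text matches the pattern, regardless of what it means as OCL.
    public static bool IsRejected(string ocl) => Forbidden.IsMatch(ocl);
}
```

With this filter, `Person.allInstances->size()` and the harmless `'Person.allInstances'.size()` are both rejected, while `Person .allInstances->size()` passes.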

Training on threat modeling and code reviews for query paths can bridge this gap.