Filtering GADT Vectors: Runtime Predicates & Type Safety

Summary

The core problem revolves around the incompatibility between value-dependent logic and type-level invariants. Specifically, when using Generalized Algebraic Data Types (GADTs) like a length-indexed Vec, the type system requires the length n to be known at compile time. However, the filter operation is inherently non-deterministic at the type level because the number of elements passing a predicate depends entirely on runtime values. This creates a type mismatch where a function signature promises a specific length m, but the implementation cannot prove what m will be without violating the core principles of static type safety.

Root Cause

The failure occurs due to a fundamental conflict between structural induction and dynamic predicates:

Type-Level Erasure: In a Vec a n, the n is part of the type. For a function to be total and type-safe, it must return a type where n is strictly defined.
Value-Type Gap: The predicate (a -> Bool) operates on values. The result of this predicate is a runtime boolean. The type system cannot “lift” this runtime boolean into a type-level Nat to adjust the m in Vec a m.
Constraint Violation: If you attempt to return a Vec a m, the compiler demands a proof that m is the exact count of elements satisfying the predicate. Since the compiler cannot execute your code to see which elements pass, it cannot satisfy the proof requirement.

Why This Happens in Real Systems

In high-assurance software engineering, this is a manifestation of the Decidability Problem.

Static vs. Dynamic Boundaries: Systems are often designed with rigid type invariants to prevent buffer overflows or index-out-of-bounds errors. When a dynamic process (like user input or a network stream) enters these rigid structures, the “knowledge” required to maintain the invariant is lost.
Over-specification: Engineers often design types that are “too precise.” While a length-indexed list is great for matrix multiplication, applying it to highly dynamic data structures introduces type-level friction that requires complex existential packaging to resolve.

Real-World Impact

Development Velocity: Teams spend excessive time fighting the borrow checker or the type system rather than delivering features.
Complexity Overhead: Implementing “workarounds” (like using ExistentialQuantification) increases the cognitive load for every developer touching the codebase.
Runtime Failure Risk: If engineers bypass the type system (e.g., using unsafeCoerce) to solve the length mismatch, they re-introduce the very memory safety vulnerabilities the GADT was intended to prevent.

Example or Code

{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators #-}

data Nat = Zero | Succ Nat

data Vec a (n :: Nat) where
    Nil  :: Vec a 'Zero
    Cons :: a -> Vec a n -> Vec a ('Succ n)

data SomeVec a where
    SomeVec :: Vec a n -> SomeVec a

filterVec :: (a -> Bool) -> Vec a n -> SomeVec a
filterVec _ Nil = SomeVec Nil
filterVec p (Cons x xs)
    | p x       = case filterVec p xs of
                    SomeVec rest -> SomeVec (Cons x rest)
    | otherwise = case filterVec p xs of
                    SomeVec rest -> SomeVec rest

How Senior Engineers Fix It

A senior engineer recognizes that you cannot force a dynamic result into a static shape without Existential Types or Dependent Types.

Existential Wrapping: Instead of trying to return Vec a m (where m is unknown), wrap the result in a container that says: “There exists some m such that this is a Vec a m.” This is the SomeVec pattern shown above.
Sigma Types: In more advanced settings, use a dependent pair (Sigma type) which bundles the value with a proof of its length.
Type-Level Evidence: If the length can be predicted (e.g., filtering for elements with a specific property that is also encoded in the type), use Type Classes to provide evidence of the new length.
Down-leveling: If the strictness of Vec is not required for the subsequent logic, convert the Vec to a standard [] (List), perform the operation, and only re-encode into a Vec when the length is once again deterministic.

Why Juniors Miss It

The “Annotation Fix” Trap: Juniors often attempt to solve the problem by adding type annotations (as noted in the input), which only works if they manually hard-code the expected length. This breaks the generality of the function.
Ignoring the Type Error: Instead of understanding why the compiler is complaining about the mismatch between m and the actual length, they may attempt to use “unsafe” escapes to make the error disappear.
Lack of Abstraction Awareness: They view types as “labels” rather than mathematical proofs. They fail to realize that a type-level n is a contract that must be formally satisfied by every possible execution path.