Summary
A critical regression was identified following an upgrade of the pam_radius_auth module from version 1.4.0 to 3.0.0. In the previous version, the One-Time Password (OTP) entered by the user was consumed exclusively by the RADIUS module. In version 3.0.0, the module incorrectly re-injects the OTP value into the PAM conversation buffer as the user’s password. This causes subsequent modules in the stack, specifically pam_unix.so, to attempt validation using the OTP instead of the actual UNIX password, leading to systematic authentication failures across all upgraded systems.
Root Cause
The root cause is a change in the handling of the PAM conversation function within the pam_radius_auth 3.0.0 source code.
- Credential Persistence: In PAM, modules interact with the user via a “conversation function,” and a module can hand the secret it collected to later modules by storing it in the shared PAM_AUTHTOK item with pam_set_item(). This is meant to streamline the user experience by avoiding a second prompt.
- Logic Regression: In version 3.0.0, the module was modified to return the successfully verified OTP as the password to the PAM stack.
- Module Interdependence: Because pam_unix.so follows pam_radius_auth.so in the stack, it receives the OTP value as the PAM_AUTHTOK buffer. Since the OTP does not match the user’s actual password stored in /etc/shadow, pam_unix returns an authentication failure.
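The interaction described above can be illustrated with a small self-contained simulation. It does not link against libpam: `sim_handle`, `sim_radius_auth`, `sim_unix_auth`, and `sim_login` are hypothetical stand-ins invented for this sketch, and the OTP and password values are made up. It models the mechanism as this report describes it, not the actual source of either module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the PAM handle: a single shared token slot
 * playing the role of the PAM_AUTHTOK item. */
struct sim_handle {
    const char *authtok;   /* NULL means "no token cached yet" */
};

/* Stand-in for pam_radius_auth. The OTP is compared with an expected value
 * instead of being sent to a RADIUS server. When publish_otp is true, the
 * OTP is stored in the shared slot (the 3.0.0 behaviour described in this
 * report); when false, the slot is left untouched (the 1.4.0 behaviour). */
bool sim_radius_auth(struct sim_handle *h, const char *otp,
                     const char *expected_otp, bool publish_otp)
{
    assert(h != NULL && otp != NULL && expected_otp != NULL);
    if (strcmp(otp, expected_otp) != 0)
        return false;
    if (publish_otp)
        h->authtok = otp;
    return true;
}

/* Stand-in for pam_unix: reuse the cached token if one exists, otherwise
 * use what the user would type at a password prompt. A real system compares
 * password hashes; plain strings keep the sketch short. */
bool sim_unix_auth(const struct sim_handle *h, const char *typed_password,
                   const char *stored_password)
{
    assert(h != NULL && typed_password != NULL && stored_password != NULL);
    const char *candidate = h->authtok ? h->authtok : typed_password;
    return strcmp(candidate, stored_password) == 0;
}

/* Runs both stand-ins in stack order; both must succeed for a login. */
bool sim_login(bool publish_otp)
{
    struct sim_handle h = { NULL };
    return sim_radius_auth(&h, "123456", "123456", publish_otp)
        && sim_unix_auth(&h, "correct-horse", "correct-horse");
}
```

Calling `sim_login(false)` models the old behaviour, where the user's typed password reaches the second stand-in and the login succeeds. Calling `sim_login(true)` models the regression: the second stand-in compares the OTP against the stored password and the login fails, even though both credentials were entered correctly.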
Why This Happens in Real Systems
This behavior occurs due to the complex, stateful nature of PAM stacks and the way different modules interpret the conversation buffer.
- Implicit Trust: PAM modules are often designed with the assumption that if a “password” is provided in the buffer, it is the definitive credential for that session.
- Abstraction Leaks: While PAM is meant to abstract authentication, a module that “helps” by filling in credentials can inadvertently break the security assumptions of downstream modules.
- Upgrade Side-Effects: Upgrades often introduce “quality of life” features (like single-factor entry) that, when applied to multi-factor flows, create logic collisions.
Real-World Impact
- Service Outage: Users are completely locked out of SSH access across the infrastructure.
- Broken MFA Workflows: The Multi-Factor Authentication flow is broken because the second factor (OTP) is presented to pam_unix as if it were the first factor (password).
- Operational Overhead: Massive manual intervention is required to roll back packages or modify PAM configurations across distributed fleets.
- Security Risk: While this specific case results in a “fail-closed” scenario (denial of service), similar logic errors in other modules could lead to “fail-open” vulnerabilities.
Example or Code
The failure manifests in the way the PAM stack processes the PAM_AUTHTOK during the auth phase.
/* Conceptual representation of the failure in pam_radius_auth 3.0.0 */
// 1. User enters OTP: "123456"
// 2. pam_radius_auth verifies "123456" via RADIUS server -> SUCCESS
// 3. BUG: The module tells PAM: "The password is 123456"
pam_set_item(pamh, PAM_AUTHTOK, "123456");
// 4. Next module (pam_unix) reads PAM_AUTHTOK instead of prompting:
//    it checks "123456" against the user's /etc/shadow entry
// Result: FAILURE ("123456" is not the user's UNIX password)
How Senior Engineers Fix It
Senior engineers approach this by addressing the architecture of the PAM stack rather than just trying to patch the broken module immediately.
- Immediate Mitigation: Perform a version rollback to 1.4.0 to restore service stability.
- Stack Reordering/Isolation: If the module must be used, engineers investigate whether pam_unix.so can be decoupled from the OTP buffer. However, in standard SSH flows, this is difficult without custom modules.
- Upstream Contribution: The definitive fix is to submit a patch to the pam_radius_auth maintainers to ensure that the module does not modify the PAM_AUTHTOK item unless explicitly configured to do so.
- Configuration Hardening: Review the module options and control flags in the stack so that subsequent modules do not attempt to re-validate credentials that have already been “consumed.” Note that nullok only controls whether pam_unix accepts empty passwords, so it does not prevent token reuse on its own.
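The "unless explicitly configured" idea from the Upstream Contribution item could look roughly like the sketch below. This is an assumption about how such a patch might be shaped, not the maintainers' code: `export_authtok` is a hypothetical option name that, as far as this report knows, does not exist in pam_radius_auth, and `radius_opts`, `parse_opts`, and `authtok_after_success` are names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module options, as they might be parsed from the arguments
 * on the module's line in the PAM configuration. */
struct radius_opts {
    bool export_authtok;   /* hypothetical opt-in flag, off by default */
};

/* Parse argv-style module arguments; unrecognised arguments are ignored. */
struct radius_opts parse_opts(int argc, const char **argv)
{
    struct radius_opts o = { false };
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "export_authtok") == 0)
            o.export_authtok = true;
    }
    return o;
}

/* Decide what the shared token slot should hold after a successful OTP
 * check: the OTP only if the administrator opted in, otherwise whatever
 * was there before (possibly NULL). */
const char *authtok_after_success(const struct radius_opts *o,
                                  const char *existing, const char *otp)
{
    assert(o != NULL);
    return o->export_authtok ? otp : existing;
}
```

With no arguments, `parse_opts` leaves the flag off and `authtok_after_success` returns the existing token unchanged, so a downstream module would still prompt for, or reuse, the real password. The design choice is that overwriting a shared credential is opt-in, so an upgrade alone cannot change what later modules receive.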
Why Juniors Miss It
- Focusing on the Wrong Layer: Juniors often spend hours debugging the RADIUS server or the network connectivity, assuming the “authentication failure” means the RADIUS secret or IP is wrong.
- Ignoring the Stack Order: They treat PAM modules as independent units rather than a sequential chain where the output of one is the input of the next.
- Misinterpreting Logs: A junior sees pam_unix(sshd:auth): authentication failure and assumes the user entered the wrong password, failing to realize that the system itself provided the wrong password to the module.
- Lack of Version Diffing: They assume an upgrade is a “safe” additive process and fail to perform a diff of the behavior between version 1.4.0 and 3.0.0.