## Summary
In the AVX2 instruction set, the intrinsics `_mm256_bslli_epi128` and `_mm256_slli_si256` both compile to the same `vpslldq` instruction despite their different names. Intel keeps the older name as a **backward-compatibility alias** alongside the newer, more descriptive one. There is no functional difference between the two intrinsics; the duplication is purely a matter of naming.
## Root Cause
* **Legacy naming conventions**: SSE2 named its byte shift `_mm_slli_si128`, reusing the `slli` stem that elsewhere denotes bit shifts (e.g., `_mm_slli_epi32`).
* **Self-documenting aliases**: Intel later introduced `_mm_bslli_si128` to clarify the operation ("Byte Shift Left Logical Immediate").
* **Backward compatibility**: Rather than deprecating the old names en masse, Intel **retains both spellings** so existing codebases keep compiling.
* **AVX2 extension**: When the operation was widened to 256-bit registers, both forms (`_mm256_slli_si256` and `_mm256_bslli_epi128`) were mapped to the same underlying `vpslldq` instruction.
## Why This Happens in Real Systems
* **Long-lived ISA evolution**: Processor instruction sets evolve over decades, requiring transitional mechanisms.
* **Codebase inertia**: Breaking changes to intrinsic naming would break vast amounts of legacy code.
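* **Ambiguity vs. clarity**: The `slli` stem is shared with bit shifts such as `_mm256_slli_epi32`, and `si256` suggests a single shift across the whole register. The operation actually shifts bytes within each 128-bit lane, which the `bslli_epi128` alias spells out.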
* **Compiler simplification**: Mapping aliases to one instruction simplifies compiler intrinsics handling.
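
To make the byte granularity and per-lane behavior concrete, here is a minimal scalar sketch of what both intrinsics compute. The names `Bytes32`, `model_bslli_epi128`, and `lanes_do_not_carry_demo` are hypothetical and exist only for illustration; this is a reference model written for this note, not Intel's pseudocode.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: 32 bytes standing in for a __m256i, index 0 = least significant byte.
using Bytes32 = std::array<std::uint8_t, 32>;

// Illustrative model: shift each 16-byte lane left by n bytes, filling with zeros.
// Bytes shifted out of the low lane are dropped, not carried into the high lane.
Bytes32 model_bslli_epi128(const Bytes32& a, unsigned n) {
    Bytes32 out{};              // zero-initialized result
    if (n > 15) return out;     // counts above 15 clear both lanes
    for (std::size_t lane = 0; lane < 2; ++lane) {
        const std::size_t base = lane * 16;
        for (std::size_t i = n; i < 16; ++i) {
            out[base + i] = a[base + i - n];
        }
    }
    return out;
}

// Usage sketch: the top byte of the low lane does not cross into the high lane.
void lanes_do_not_carry_demo() {
    Bytes32 v{};
    v[15] = 0xAB;
    const Bytes32 r = model_bslli_epi128(v, 1);
    assert(r[16] == 0);         // dropped at the lane boundary
}
```

The model shows why `epi128` is the more honest suffix: the register is treated as two independent 128-bit lanes rather than one 256-bit value.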
## Real-World Impact
* **Codebase confusion**: Developers might wrongly assume performance differences or functional divergence.
* **Readability tradeoffs**: Legacy code uses the less descriptive name (`_mm256_slli_si256`); newer code may prefer the explicit one (`_mm256_bslli_epi128`).
* **Documentation overhead**: Engineers must consult intrinsics guides to verify equivalence, slowing down development.
* **Minimal performance impact**: Since both compile identically, **no runtime penalty** exists for using either intrinsic.
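
A quick runtime check can settle doubts about functional divergence. The sketch below uses the 128-bit SSE2 pair (`_mm_slli_si128` and `_mm_bslli_si128`) so that it builds on an x86-64 toolchain without extra flags; the same comparison for the 256-bit pair would additionally need AVX2 enabled (for example `-mavx2` on GCC/Clang) and an AVX2-capable CPU. The helper name `sse2_aliases_agree` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: returns true if both spellings produce identical bytes.
bool sse2_aliases_agree() {
    alignas(16) std::uint8_t in[16];
    for (int i = 0; i < 16; ++i) in[i] = static_cast<std::uint8_t>(i + 1);

    const __m128i a = _mm_load_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i legacy = _mm_slli_si128(a, 3);   // older name
    const __m128i alias  = _mm_bslli_si128(a, 3);  // newer, explicit name

    std::uint8_t x[16], y[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(x), legacy);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(y), alias);

    // Sanity check on the shift itself: original byte 0 lands at index 3.
    assert(x[2] == 0 && x[3] == 1);
    return std::memcmp(x, y, 16) == 0;
}
```

A check like this confirms equal results, not equal code generation; inspecting the emitted assembly is still the way to confirm that both names produce the same instruction.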
## Example or Code
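```cpp
#include <immintrin.h>  // AVX2 intrinsics; build with -mavx2 on GCC/Clang

__m256i avx_shift_left(__m256i a) {
    // Both calls shift each 128-bit lane left by 1 byte.
    __m256i res1 = _mm256_bslli_epi128(a, 1);  // "Byte Shift Left Logical Immediate"
    __m256i res2 = _mm256_slli_si256(a, 1);    // Legacy naming
    // res1 == res2, so the OR is redundant; an optimizing compiler
    // typically folds the whole body to a single vpslldq.
    return _mm256_or_si256(res1, res2);
}
```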
## How Senior Engineers Fix It
* **Standardize on explicit names**: Advocate for using `_mm256_bslli_epi128` across codebases for clarity.
* **Compiler inspection**: Use disassembly output (`gcc -S`, Godbolt) to verify intrinsic-to-instruction mappings.
* **Document aliases**: Add comments noting the equivalence, citing the vendor's intrinsics guide, wherever the legacy name remains.
* **Avoid redundant benchmarking**: Do not spend time comparing the performance of these intrinsics; they emit the same instruction, so the results are identical.
* **Learn the historical context**: Understand ISA evolution to anticipate similar patterns.
## Why Juniors Miss It
* **Over-reliance on naming**: Assuming that a distinct, descriptive intrinsic name implies distinct behavior.
* **Compilers hide the details**: LLVM, GCC, and ICC do not surface the intrinsic-to-instruction mapping unless you inspect the generated assembly.
* **Guides seem contradictory**: Finding two intrinsics for the "same" function raises doubts about the accuracy of the documentation.
* **Undocumented assumptions**: Suspecting hidden alignment or optimization differences between the two names.
* **Shallow ISA understanding**: Lacking awareness of how intrinsic naming evolved across the SSE to AVX2 transition.