# Why do both AVX2 intrinsics use the same instruction?

## Summary
In the AVX2 instruction set, the intrinsics `_mm256_bslli_epi128` and `_mm256_slli_si256` both compile to the same `vpslldq` instruction despite their different names. This happens because Intel keeps **backward-compatibility aliases** alongside the newer, clearer names. There is no functional difference between the two intrinsics; the duplication is purely a matter of naming.

## Root Cause
*   **Legacy naming conventions**: Earlier SSE/AVX intrinsics used inconsistent naming (e.g., `_mm_slli_si128`).  
*   **Self-documenting aliases**: Intel later introduced `_mm_bslli_si128` to clarify the operation ("Byte Shift Left Logical").  
*   **Backward compatibility**: Rather than deprecating the old names en masse, Intel **retains them as aliases** for existing codebases.  
*   **AVX2 extension**: When the operation was widened to 256-bit registers, both spellings (`_mm256_slli_si256` and `_mm256_bslli_epi128`) were defined on top of the same `vpslldq` instruction, which shifts each 128-bit lane independently.

## Why This Happens in Real Systems
*   **Long-lived ISA evolution**: Processor instruction sets evolve over decades, requiring transitional mechanisms.  
*   **Codebase inertia**: Breaking changes to intrinsic naming would break vast amounts of legacy code.  
*   **Ambiguity vs. clarity**: The `slli` stem is shared with bit-granular shifts such as `_mm_slli_epi32`, so the older names do not signal that the count is in bytes; the `bslli` names were added as aliases to make that explicit.  
*   **Compiler simplification**: Mapping aliases to one instruction simplifies compiler intrinsics handling.
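
To make the byte-granularity point concrete, here is a minimal scalar sketch of what a 256-bit `vpslldq` does to its input. It is a reference model for reading, not how a compiler implements the intrinsic; the names `Bytes32` and `model_vpslldq256` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// 32 bytes in memory order: index 0 is the least significant byte.
using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model of a 256-bit vpslldq (hypothetical helper, for illustration).
// Each 16-byte lane is shifted left by `imm` BYTES on its own, and zeros
// fill the vacated low bytes. Nothing crosses the lane boundary.
Bytes32 model_vpslldq256(const Bytes32& src, unsigned imm) {
    Bytes32 dst{};                 // value-initialised to all zeros
    if (imm > 15) {
        return dst;                // counts above 15 clear both lanes
    }
    for (std::size_t lane = 0; lane < 2; ++lane) {
        const std::size_t base = lane * 16;
        for (std::size_t i = imm; i < 16; ++i) {
            dst[base + i] = src[base + i - imm];
        }
    }
    return dst;
}
```

With a shift count of 1, byte 16 of the result is zero rather than a copy of byte 15: the upper lane does not receive anything from the lower lane. That per-lane behavior is what the `epi128` suffix in `_mm256_bslli_epi128` hints at and what the `si256` suffix tends to hide.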

## Real-World Impact
*   **Codebase confusion**: Developers might wrongly assume performance differences or functional divergence.  
*   **Readability tradeoffs**: Legacy code contains obscure names (`slli_si256`); newer code may prefer explicit names (`bslli_epi128`).  
*   **Documentation overhead**: Engineers must consult intrinsics guides to verify equivalence, slowing down development.  
*   **Minimal performance impact**: Since both compile identically, **no runtime penalty** exists for using either intrinsic.
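
The equivalence can also be checked at run time instead of taken on trust. The sketch below assumes GCC or Clang on an x86 target: `__attribute__((target("avx2")))` and `__builtin_cpu_supports` are extensions of those compilers, and the function names `both_names_agree` and `check_shift_aliases` are hypothetical.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// Applies both spellings to the same 32 input bytes and compares the results.
// target("avx2") enables AVX2 for this one function only, so the file builds
// without a global -mavx2 flag (GCC/Clang extension).
__attribute__((target("avx2")))
static bool both_names_agree(const std::uint8_t* in32) {
    const __m256i a = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in32));
    const __m256i newer  = _mm256_bslli_epi128(a, 3);  // explicit name
    const __m256i legacy = _mm256_slli_si256(a, 3);    // legacy name

    std::uint8_t out_newer[32];
    std::uint8_t out_legacy[32];
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out_newer), newer);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out_legacy), legacy);
    return std::memcmp(out_newer, out_legacy, sizeof out_newer) == 0;
}

// Hypothetical entry point: 1 = results match, 0 = results differ,
// -1 = this CPU has no AVX2, so nothing was executed.
int check_shift_aliases() {
    if (!__builtin_cpu_supports("avx2")) {
        return -1;
    }
    std::uint8_t in[32];
    for (int i = 0; i < 32; ++i) {
        in[i] = static_cast<std::uint8_t>(i + 1);
    }
    return both_names_agree(in) ? 1 : 0;
}
```

A check like this only confirms matching results for one input and one shift count. Inspecting the generated assembly is still the direct way to confirm that both names emit `vpslldq`.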

## Example or Code
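```cpp
#include <immintrin.h>

__m256i avx_shift_left(__m256i a) {
    __m256i res1 = _mm256_bslli_epi128(a, 1);  // "Byte Shift Left Logical Imm."
    __m256i res2 = _mm256_slli_si256(a, 1);     // Legacy naming, same operation
    // res1 and res2 are identical, so the OR changes nothing; an optimizing
    // compiler will typically emit a single vpslldq for this function.
    return _mm256_or_si256(res1, res2);
}
```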

## How Senior Engineers Fix It
*   **Standardize on explicit names**: Prefer `_mm256_bslli_epi128` across the codebase for clarity.  
*   **Compiler inspection**: Use disassembly output (`gcc -S`, Godbolt) to verify intrinsic-to-instruction mappings.  
*   **Document aliases**: Add comments noting the equivalence wherever the legacy name remains, citing the vendor's intrinsics guide where possible.  
*   **Avoid redundant benchmarking**: Do not spend time comparing the performance of these intrinsics; both names emit the same instruction.  
*   **Learn the historical context**: Understand how the ISA evolved so that similar aliasing patterns are easy to recognize.

## Why Juniors Miss It
*   **Over-reliance on naming**: Assuming that a distinct intrinsic name implies distinct behavior.  
*   **Compilers shield details**: LLVM, GCC, and ICC do not show the intrinsic-to-instruction mapping unless the generated assembly is inspected.  
*   **Guides seem contradictory**: Finding two intrinsics for the "same" function raises doubts about the accuracy of the documentation.  
*   **Undocumented assumptions**: Suspecting hidden alignment or optimization differences between the two names.  
*   **Shallow ISA understanding**: Lacking awareness of how intrinsic naming evolved from SSE to AVX2.