GitHub Pages serves index.html not README.md – fix the issue

Summary

GitHub Pages serves index.html as the homepage whenever that file exists in the publishing source. README.md is primarily the project documentation shown in the repository view on github.com; on the Pages site it becomes the homepage only when no index.html or index.md is present. The core issue stems from conflicting homepage expectations: once index.html exists it takes priority, and the README content no longer appears at the site root unless it is linked or copied in. This creates a disconnect between search engine optimization (SEO) goals (wanting the README's project description discoverable on the site) and user experience (wanting designed HTML content served directly).

Root Cause

  • Automatic homepage resolution: GitHub Pages defaults to index.html as the root document, bypassing README.md entirely for direct site visits.
  • No built-in dual-purpose routing: The platform lacks native support for serving different content types (markdown vs. HTML) to different audiences (humans vs. search engines) from the same URL.
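  • Repository view ≠ published site: README.md is what github.com renders on the repository page. On the Pages site, the default Jekyll build uses it as the homepage only when the publishing source contains no index.html or index.md. With a custom index.html in place, or with Jekyll disabled via a .nojekyll file, it is never rendered at the site root.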
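
The fallback behavior described above can be modeled in a few lines. This is a simplified sketch, not GitHub's implementation: the function name and data shapes are hypothetical, and the order of candidates mirrors the order in which GitHub's documentation lists the entry files (index.html, index.md, README.md). Treat the exact tie-breaking between index.html and index.md as an assumption.

```javascript
// Simplified model of how the entry file for "/" is chosen.
// Candidate order follows GitHub's documentation; everything else here
// (names, data shapes) is illustrative only.
const ENTRY_FILE_PRECEDENCE = ["index.html", "index.md", "README.md"];

function resolveEntryFile(publishedFiles) {
  const present = new Set(publishedFiles);
  // Return the first candidate that exists in the publishing source.
  return ENTRY_FILE_PRECEDENCE.find((name) => present.has(name)) ?? null;
}

console.log(resolveEntryFile(["README.md", "index.html"])); // index.html
console.log(resolveEntryFile(["README.md", "style.css"]));  // README.md
console.log(resolveEntryFile(["style.css"]));               // null
```

The first call is the situation this article is about: as soon as index.html is present, README.md is no longer considered for the homepage.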

Why This Happens in Real Systems

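  • Web server conventions: By long-standing convention, web servers serve index.html (or index.htm) as the default document for a directory. Changing that requires server configuration, which GitHub Pages does not expose.
  • SEO vs. UX separation: Search engines index the HTML a URL returns; they do not treat a repository's markdown files as special project metadata. A static host returns the same response to every client, and deliberately showing crawlers different content than human visitors at one URL is cloaking, which search engines penalize.
  • GitHub Pages architecture: A site is published from a single source, either the root or the /docs folder of the chosen branch (or the output of a custom GitHub Actions workflow). The two folders are not merged, so whichever source is selected must contain the homepage.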
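
To make the single-source rule concrete, here is a small sketch that maps a repository path to the URL path it would have on the published site. The helper name is hypothetical, and the model deliberately ignores Jekyll's own transformations (such as rendering .md files to .html); it only shows which files are inside the site at all.

```javascript
// Hypothetical helper: given the publishing source folder ("/" or "/docs"),
// return the URL path a repository file gets on the site, or null if the
// file lies outside the publishing source and is therefore not published.
function publishedUrlPath(source, repoPath) {
  if (source === "/") return "/" + repoPath;
  const prefix = source.replace(/^\//, "") + "/"; // "/docs" -> "docs/"
  if (!repoPath.startsWith(prefix)) return null;
  return "/" + repoPath.slice(prefix.length);
}

console.log(publishedUrlPath("/", "docs/index.html"));     // /docs/index.html
console.log(publishedUrlPath("/docs", "docs/index.html")); // /index.html
console.log(publishedUrlPath("/docs", "README.md"));       // null
```

With /docs as the source, a README.md in the repository root is simply not part of the site, which is one common reason it never shows up.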

Real-World Impact

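  • SEO dilution: If the project description lives only in README.md, the Pages site has little indexable text of its own and may rank poorly for the project's keywords.
  • User friction: Visitors who expect the README on the Pages site find a different homepage, while visitors to the repository see only the README and may never discover the polished HTML site.
  • Maintenance overhead: Manual workarounds (such as hand-maintained links between the README and the site) break easily and lead to inconsistent user journeys.
  • Documentation drift: Content ends up in two places. Package registries such as npm and PyPI typically display the README shipped with the package, so details that exist only on the HTML site never reach those pages.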

Example or Code



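  // Redirect visitors to the HTML content after 3 seconds.
  const REDIRECT_DELAY_MS = 3000;
  setTimeout(() => {
    // Path to the actual HTML content. On a project site
    // (user.github.io/<repo>/), prefix it with the repository name
    // or use a relative path.
    window.location.href = "/docs/index.html";
  }, REDIRECT_DELAY_MS);

  // Caveat: modern crawlers such as Googlebot can execute JavaScript,
  // so this redirect is not guaranteed to be a no-op for search engines.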
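
The same redirect can also be expressed without JavaScript, using a meta refresh and a canonical link in a static stub page. The sketch below generates such a stub as a string; the function name is hypothetical and the target path is the one used in the snippet above.

```javascript
// Hypothetical generator for a static redirect stub (no JavaScript needed
// in the resulting page).
function buildRedirectPage(target) {
  // Escape characters that could break out of an HTML attribute value.
  const safe = target
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '  <meta charset="utf-8">',
    `  <meta http-equiv="refresh" content="0; url=${safe}">`,
    `  <link rel="canonical" href="${safe}">`,
    "  <title>Redirecting</title>",
    "</head>",
    "<body>",
    `  <p>This page has moved to <a href="${safe}">${safe}</a>.</p>`,
    "</body>",
    "</html>",
  ].join("\n");
}

console.log(buildRedirectPage("/docs/index.html"));
```

The visible link in the body gives visitors a fallback if their browser ignores the refresh, and the canonical link tells crawlers which URL to treat as the primary one.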

How Senior Engineers Fix It

  • Dual-site approach:
    • Publish HTML content in /docs/
    • Set index.html to redirect to /docs/
    • Keep README.md in the repository root, where github.com renders it on the repository page
  • JavaScript-based redirection:
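    Use client-side JS to distinguish human visitors from bots (via navigator.userAgent patterns) and redirect accordingly. User-agent sniffing is unreliable, and showing crawlers different content risks being treated as cloaking, so treat this as a last resort.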
  • GitHub Actions integration:
    Automatically generate index.html from README.md using markdown-to-HTML converters (e.g., pandoc) in the publishing workflow.
  • Custom domain configuration:
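    Serve the HTML content at docs.example.com and the README-based page at example.com. A Pages site accepts a single custom domain, so this requires two separate sites (for example, two repositories).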
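
As a rough illustration of the GitHub Actions option, the following Node.js sketch builds a pandoc command line that converts README.md into a standalone index.html. It assumes pandoc is installed on the runner; the function names, default file names, and title are placeholders, and the workflow step that would call this script is not shown.

```javascript
const { execFileSync } = require("node:child_process");

// Build the argument list for pandoc. Kept separate from the process call
// so the arguments can be inspected without pandoc installed.
function buildPandocArgs(input, output, title) {
  return [
    input,
    "--from", "gfm",        // GitHub-Flavored Markdown input
    "--to", "html5",
    "--standalone",         // emit a complete HTML document, not a fragment
    "--metadata", `title=${title}`,
    "--output", output,
  ];
}

// Run the conversion. Requires pandoc on PATH; throws if pandoc fails.
function renderReadme(input = "README.md", output = "index.html", title = "Project") {
  execFileSync("pandoc", buildPandocArgs(input, output, title), { stdio: "inherit" });
}

console.log(buildPandocArgs("README.md", "index.html", "Project").join(" "));
// To run the conversion in a build step: renderReadme();
```

Generating index.html from the README keeps a single source of truth, which avoids the drift that comes from maintaining the same description in two files.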

Why Juniors Miss It

  • Assuming GitHub Pages mirrors the repository view: Not realizing that README.md becomes the site homepage only as a fallback, when no index.html or index.md exists in the publishing source.
  • Overlooking SEO mechanics: Focusing only on user-facing content without considering search engine crawling behavior.
  • Ignoring architectural constraints: Attempting to force GitHub Pages into roles it doesn’t support (e.g., dynamic content serving).
  • Neglecting audience separation: Treating search engines and humans as the same audience requiring identical content delivery.
