Istio – VirtualService to VirtualService routing

Summary

A common misconfiguration is trying to route traffic from one Istio VirtualService (VS) to another by using the downstream VS’s hostname as the destination host in the upstream VS. The result is a 503 with the message “no healthy upstream.” An upstream VirtualService never “calls” a downstream VirtualService; a destination host is resolved only against the service registry, that is, Kubernetes Services and ServiceEntries. In the provided example, deployment-1-vs returns a directResponse and has no workload behind its host, so the proxy that applies hello-world-vs finds no healthy upstream hosts and returns the 503. The fix is to route directly to a backing Service if one exists, to return the directResponse from the upstream VS itself, to merge the downstream rules into the upstream VS with VirtualService delegation, or to hand traffic off through a Gateway to which the second VS is bound.

Root Cause

The root cause is a misunderstanding of how Istio’s VirtualService operates. A VirtualService is a routing rule configuration, not a service that can be invoked.

  • Direct Routing to a VS Host: When hello-world-vs routes to deployment-1.hello-world-app.svc.cluster.local, Istio looks for a Kubernetes Service or ServiceEntry with that name. It does not look for a VirtualService named deployment-1-vs.
  • Missing Workload: In the provided YAML, deployment-1-vs has directResponse and no route to a real workload. Istio resolves the destination to a valid upstream cluster, but that cluster has no healthy endpoints, triggering the no healthy upstream response.
  • ServiceEntry Misuse: Adding a ServiceEntry silences the istioctl analyze warning by registering the host, but it does not link the two VirtualServices. It only tells Istio “this host exists.” The registered host still has no endpoints behind it, and deployment-1-vs is never consulted on this hop, so the 503 persists.
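
The failing pattern described above can be sketched as follows. The original manifests are not reproduced in this write-up, so this is a reconstruction under assumptions: the missing gateways field on deployment-1-vs, the directResponse status and body, the ServiceEntry name deployment-1-se, and its port and resolution settings are all hypothetical.

# BROKEN: upstream VS points at a host that only another VS claims
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world-vs
  namespace: hello-world-app
spec:
  hosts:
  - hello-world.app
  gateways:
  - hello-world-app
  http:
  - match:
    - uri:
        prefix: /direct
    route:
    - destination:
        # Looked up in the service registry only; deployment-1-vs is not consulted
        host: deployment-1.hello-world-app.svc.cluster.local
        port:
          number: 80
---
# Downstream VS: claims the host but has no workload behind it
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: deployment-1-vs
  namespace: hello-world-app
spec:
  hosts:
  - deployment-1.hello-world-app.svc.cluster.local
  # Assumed: no gateways field, so these rules apply to mesh sidecars only
  http:
  - directResponse:
      status: 200                          # assumed status
      body:
        string: "hello from deployment-1"  # assumed body
---
# ServiceEntry added to silence the analyzer: registers the host, supplies no endpoints
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: deployment-1-se                    # hypothetical name
  namespace: hello-world-app
spec:
  hosts:
  - deployment-1.hello-world-app.svc.cluster.local
  location: MESH_INTERNAL
  resolution: STATIC                       # assumed; no endpoints are listed
  ports:
  - number: 80
    name: http
    protocol: HTTP

With these three resources applied, the request to /direct matches hello-world-vs at the gateway and is forwarded to a cluster that has no endpoints, which is consistent with the “no healthy upstream” 503 described in this section.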

Why This Happens in Real Systems

  • Microservice Decoupling: Teams often split ownership of traffic. Team A owns the “Gateway” VS, and Team B owns the “Service” VS. Team A tries to route to Team B’s VS host to inherit its routing logic (e.g., header-based routing).
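  • Lack of a “Sub-routing” Mechanism: Unlike NGINX or API gateways, a VirtualService destination cannot name another VirtualService, and Istio does not proxy a request from one VS to another within the same data plane in a single hop. The proxy evaluates the matching rule once and forwards to an upstream cluster. The only built-in way to compose VirtualServices is the delegate field, which merges rules at configuration time rather than chaining requests.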

Real-World Impact

  • Traffic Interruption: Users receive 503 errors, blocking access to the service.
  • Slow Troubleshooting: Engineers often suspect DNS, mTLS, or ServiceEntry resolution and spend hours debugging configuration that is syntactically valid, because the “Host not found” warning and the “no healthy upstream” response are both generic and neither points at the routing design.
  • Configuration Bloat: Teams often add unnecessary ServiceEntries in a futile attempt to fix the routing, further complicating the mesh topology.

Example or Code

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world-vs
  namespace: hello-world-app
spec:
  hosts:
  - hello-world.app
  gateways:
  - hello-world-app
  http:
  - match:
    - uri:
        prefix: /direct
    route:
    # CORRECT: Route to a real Kubernetes Service (or ServiceEntry host) that has endpoints
    - destination:
        host: deployment-1-svc.hello-world-app.svc.cluster.local
        port:
          number: 80
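
The route above only works if deployment-1-svc exists and selects running pods. A minimal sketch of such a Service follows; the pod label app: deployment-1 and the container port 8080 are hypothetical and should be replaced with the values of the real workload.

apiVersion: v1
kind: Service
metadata:
  name: deployment-1-svc
  namespace: hello-world-app
spec:
  selector:
    app: deployment-1      # hypothetical pod label
  ports:
  - name: http             # the "http" port name lets Istio treat the port as HTTP
    port: 80               # matches destination.port.number in hello-world-vs
    targetPort: 8080       # hypothetical container port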

How Senior Engineers Fix It

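  • Route to Workloads, Not Routing Rules: The golden rule is that a destination host must name an entry in the service registry (a Kubernetes Service or a ServiceEntry host) with real endpoints behind it, never a hostname that exists only in another VirtualService. A Pod IP is not a valid destination host either. The downstream VS (if it even needs to exist) should apply to traffic arriving through a Gateway or sent by another service, not to a routing rule lookup.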
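  • Use Gateways for Handoffs: If you must split routing between two VirtualServices (e.g., to separate Gateway logic from Service logic), route the first VS to the Service in front of an internal gateway, and bind the second VS to that Gateway. The request then makes a second hop through a proxy that evaluates the second VS.
    • Step 1: hello-world-vs routes to the internal gateway’s Service, a real destination with endpoints.
    • Step 2: deployment-1-vs attaches to that internal Gateway through its gateways field.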
  • Simplify: In the specific example provided, the fix is to delete deployment-1-vs and configure hello-world-vs to return the directResponse directly, or route to a real backing service.
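
The “Simplify” option can be sketched as a single VirtualService that answers at the gateway. This is an illustration under assumptions: the status code and body are hypothetical, and directResponse is only available in Istio releases that support that field.

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world-vs
  namespace: hello-world-app
spec:
  hosts:
  - hello-world.app
  gateways:
  - hello-world-app
  http:
  - match:
    - uri:
        prefix: /direct
    # The gateway answers by itself; no upstream cluster is involved
    directResponse:
      status: 200               # assumed status
      body:
        string: "hello world"   # assumed body

If the two teams need to keep separate resources, the delegation option mentioned in the Summary might look like the sketch below. It assumes delegation is enabled in the installed Istio version and reuses deployment-1-svc from the example above. The delegate VirtualService declares no hosts or gateways, and its match conditions have to stay within those of the root rule.

# Root VS, owned by the gateway team
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: hello-world-vs
  namespace: hello-world-app
spec:
  hosts:
  - hello-world.app
  gateways:
  - hello-world-app
  http:
  - match:
    - uri:
        prefix: /direct
    delegate:                   # rules of the named VS are merged into this route
      name: deployment-1-vs
      namespace: hello-world-app
---
# Delegate VS, owned by the service team: no hosts, no gateways
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: deployment-1-vs
  namespace: hello-world-app
spec:
  http:
  - match:
    - uri:
        prefix: /direct
    route:
    - destination:
        host: deployment-1-svc.hello-world-app.svc.cluster.local
        port:
          number: 80

Delegation merges the rules when the configuration is built, so the request still makes a single hop, and the final destination still has to be a registry host with endpoints.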

Why Juniors Miss It

  • Mental Model of “Chaining”: Juniors often treat VirtualServices like API endpoints (functions) that can call other endpoints. They don’t realize that a VS is a set of routing rules evaluated by the proxy that handles the request (a gateway or a client sidecar), not an active client or a server.
  • Over-reliance on istioctl analyze: The tool warns about “Host not found” because no Service or ServiceEntry matches the destination host. When the engineer adds the ServiceEntry, the warning disappears and the runtime error becomes “no healthy upstream.” They stop there, assuming the ServiceEntry was the missing link, not realizing that the VS behind that host can never receive traffic this way.