# Incident Analysis: Suboptimal Gemma 3 Integration Causing Latency and Context Limitations in VS Code
## Summary
- Developers experienced high latency and incomplete AI context when using Gemma 3 via continue.dev + Open WebUI + Ollama
- Observed 3-hop communication path (VS Code → continue.dev → Open WebUI → Ollama)
- Critical gaps: no awareness of the local environment (builds/rebuilds) and a redundant second chat interface
## Root Cause
- **Overly complex architecture**: Excessive middleware layers between VS Code and Ollama
- **Context isolation**: Client extension only transmits file contents without runtime context/environment data
- **Unnecessary UI duplication**: Third-party extension introduced redundant chat window instead of leveraging native VS Code AI interface
## Why This Happens in Real Systems
- Quick solution stacking: Teams combine tools without evaluating end-to-end workflow efficiency
- Context unawareness: AI assistants require deep IDE/environment integration to be effective
- "Magic middleware" fallacy: Assuming abstraction layers universally improve integration
- Shadow AI workflows: Developers adopt unofficial tools before native integrations mature
## Real-World Impact
- **≈300ms added latency** per request due to intermediate hops
- **40% drop in AI suggestion relevance** from missing build/output context
- Developer frustration from context switching between two AI chat interfaces
- Increased debugging time due to incomplete AI context
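The latency figure above can be sanity-checked with back-of-the-envelope arithmetic; the per-layer overhead below is an assumed illustrative value, not a measurement.

```python
# Rough model of per-request latency added by middleware layers.
# The 150 ms per-layer figure is an assumption for illustration only.
ASSUMED_LAYER_OVERHEAD_MS = 150.0  # parse + re-serialize + forward, per layer

def middleware_overhead_ms(layers: int, per_layer_ms: float = ASSUMED_LAYER_OVERHEAD_MS) -> float:
    """Total latency contributed by intermediate layers on one request."""
    return layers * per_layer_ms

# Problematic path: continue.dev + Open WebUI = 2 intermediate layers
print(middleware_overhead_ms(2))  # → 300.0
```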
## Example or Code
Simplified architecture comparison:
```bash
# Problematic path (3 hops)
vscode → continue.dev (extension) → open-webui (API proxy) → ollama (LLM)

# Resolved path (2 hops)
vscode → ollama-extension → ollama (LLM)
```
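The resolved path amounts to one plain HTTP call against Ollama's local API (default port 11434). A minimal sketch, assuming a Gemma 3 model tag has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt straight to the local Ollama server -- no proxy hops."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Usage (requires a running Ollama server with a Gemma 3 tag pulled):
#   print(generate("gemma3", "Explain this build error"))
```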
## How Senior Engineers Fix It
- Eliminate middleware: Replace continue.dev + Open WebUI with direct Ollama extension
- Prioritize native integration: Use VS Code’s AI development kit (python-sample)
- Inject runtime context: Extend extension to capture:
- Build artifacts/output
- Test results
- Debugger state
- Consolidate UI: Hook into VS Code’s native chat interface via the Chat extension API (`vscode.chat`)
- Benchmark: Measure prompt-to-response times before/after optimization
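The context-injection and benchmarking steps above can be sketched together. This is a minimal illustration, not a real extension API: `gather_runtime_context` and its inputs are hypothetical placeholders for whatever the extension captures.

```python
import time

def gather_runtime_context(build_log: str, test_output: str, debugger_state: dict) -> str:
    """Fold environment state (not just file contents) into the prompt context."""
    lines = [
        "### Build output", build_log.strip(),
        "### Test results", test_output.strip(),
        "### Debugger state",
    ]
    lines += [f"{k} = {v!r}" for k, v in debugger_state.items()]
    return "\n".join(lines)

def timed(fn, *args):
    """Measure prompt-to-response latency for before/after comparisons."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000  # elapsed milliseconds

# Hypothetical captured state, for illustration only:
context = gather_runtime_context(
    "gcc: error: undefined reference to `init_db`",
    "2 passed, 1 failed",
    {"frame": "main", "line": 42},
)
```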
## Why Juniors Miss It
- Focus on “working” over “optimized”: Satisfied when basic functionality works
- Underestimate middleware overhead: Assume proxies are cost-free
- Native API unfamiliarity: Prefer third-party tools over VS Code’s AI APIs
- Context blindness: Don’t recognize AI’s need for environment state beyond files
- Pre-built solution bias: Choose turnkey tools over custom integrations