Jupyter Kernel Stuck on “Connecting” in Docker (CellOracle) on Apple Silicon (M2)

Summary

A legacy amd64-only Jupyter environment was run inside Docker on an Apple Silicon (M2) host. The container launched and Jupyter started, but the notebook kernel stayed stuck on “Connecting…” because the browser could not complete a WebSocket connection to a kernel running under amd64 emulation (QEMU or Rosetta). The issue was not Tornado by itself but a mismatch between architecture emulation, WebSocket handling, and Jupyter’s kernel channels.

Root Cause

The failure stemmed from a combination of factors:

  • Running an amd64 Jupyter stack under QEMU/Rosetta on Apple Silicon introduces subtle timing and networking inconsistencies.
  • WebSockets from browser → host → Docker → QEMU → kernel fail silently when the emulated environment cannot respond within expected time windows.
  • Legacy Jupyter + Tornado versions rely on older WebSocket handshake logic that is fragile under emulation.
  • The CellOracle image is not built for ARM, and its dependencies were never tested under Rosetta.

The result:
The kernel process starts, but the browser cannot complete the WebSocket handshake, so the UI stays stuck on “Connecting”.
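
Before debugging anything else, it helps to confirm that the image and the host really differ in architecture. The sketch below is a minimal illustration, not part of the original setup: the function names are hypothetical, and the alias table only covers the two architectures discussed here. The image architecture can be read with docker image inspect --format '{{.Architecture}}' <image>.

```python
import platform

# Common aliases for the same CPU architecture across Docker, Linux and macOS.
_ARCH_ALIASES = {
    "amd64": "amd64",
    "x86_64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Map an architecture name to a canonical form; unknown names pass through lowercased."""
    key = name.strip().lower()
    return _ARCH_ALIASES.get(key, key)


def needs_emulation(host_arch: str, image_arch: str) -> bool:
    """Return True when the image architecture differs from the host architecture."""
    return normalize_arch(host_arch) != normalize_arch(image_arch)


if __name__ == "__main__":
    # platform.machine() typically reports "arm64" for a native Python on Apple Silicon.
    host = platform.machine()
    print(host, "needs emulation for an amd64 image:", needs_emulation(host, "amd64"))
```

Note that a Python interpreter that is itself running under Rosetta reports x86_64, so this check is only meaningful when run with a native interpreter on the host.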

Why This Happens in Real Systems

This pattern shows up frequently when running scientific containers on Apple Silicon:

  • Architecture emulation adds latency that breaks real‑time protocols like WebSockets.
  • Old Python stacks assume x86 timing behavior, especially around I/O loops.
  • Docker Desktop’s networking layer behaves differently for amd64 containers on ARM hosts.
  • Jupyter kernels require persistent bidirectional WebSocket channels, which are extremely sensitive to:
    • handshake timing
    • TCP keepalive behavior
    • event‑loop scheduling
    • Tornado version mismatches

In short: WebSockets are the first thing to break when you mix old Jupyter stacks with CPU emulation.

Real-World Impact

When this failure occurs, users experience:

  • Kernel stuck on “Connecting” even though logs show it started.
  • No error messages in Jupyter logs, making debugging painful.
  • Browser console WebSocket failures (timeouts, handshake errors).
  • Hours wasted debugging Tornado, ports, tokens, and permissions even though the root cause is architectural.

For research workflows, this often means:

  • Inability to run legacy bioinformatics tools that only ship x86 Docker images.
  • Blocked analysis pipelines in lab environments.
  • Forced migration to ARM‑compatible stacks or remote compute.

Example or Code

A minimal reproduction of the failure pattern:

docker run -p 9999:8888 --platform linux/amd64 jupyter/base-notebook

Opening any notebook on an M1/M2 Mac frequently produces this error in the browser console:

WebSocket connection to 'ws://localhost:9999/api/kernels/.../channels' failed
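
To separate browser problems from server problems, the WebSocket upgrade can be attempted from a script. The following is a rough sketch using only the standard library; the function names are illustrative, and the path, kernel ID, and token shown in the usage note are placeholders, not values from the original report. A status of 101 means the server accepted the upgrade. Other codes (for example 403) usually point to authentication or origin checks rather than emulation, and a timeout is the more suspicious outcome here.

```python
import base64
import os
import socket


def build_upgrade_request(host: str, port: int, path: str, key: str) -> bytes:
    """Build a minimal HTTP/1.1 WebSocket upgrade request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response: bytes) -> int:
    """Extract the HTTP status code from the first line of a raw response."""
    if not response:
        raise ConnectionError("server closed the connection without responding")
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    return int(first_line.split(" ", 2)[1])


def probe(host: str, port: int, path: str, timeout: float = 10.0) -> int:
    """Send one upgrade request and return the HTTP status code (101 means accepted)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_upgrade_request(host, port, path, key))
        return status_code(sock.recv(4096))
```

A call might look like probe("localhost", 9999, "/api/kernels/<kernel-id>/channels?token=<token>"), with the kernel ID and token copied from the running server.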

How Senior Engineers Fix It

Experienced engineers avoid fighting the emulation layer and instead fix the root architectural mismatch:

1. Run the container natively on ARM

If possible:

  • Use an ARM‑compatible base image
  • Rebuild CellOracle from source on ARM
  • Replace the legacy image entirely
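
Whether an ARM build already exists can be checked before rebuilding anything. The sketch below parses the JSON printed by docker manifest inspect; the helper names are hypothetical, and it assumes the registry returns a multi-platform manifest list. A single-platform image has no "manifests" array, in which case the result is an empty set and the architecture has to be read from the pulled image instead.

```python
import json
import subprocess


def architectures_from_manifest(manifest: dict) -> set:
    """Collect platform architectures from parsed `docker manifest inspect` output."""
    found = set()
    for entry in manifest.get("manifests", []):
        arch = entry.get("platform", {}).get("architecture")
        # Skip entries without a usable platform (for example attestation manifests).
        if arch and arch != "unknown":
            found.add(arch)
    return found


def image_architectures(image: str) -> set:
    """Ask the Docker CLI for the image's manifest and return its architectures."""
    out = subprocess.run(
        ["docker", "manifest", "inspect", image],
        check=True, capture_output=True, text=True,
    ).stdout
    return architectures_from_manifest(json.loads(out))
```

If "arm64" is missing from the returned set, the image has no native ARM variant and one of the rebuild options above applies.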

2. Use VS Code Dev Containers (formerly Remote - Containers) or Jupyter Server Proxy

These tools bypass the fragile browser → container WebSocket path.

3. Run the kernel in a separate process

Bind the kernel to a stable port and let Jupyter connect to it explicitly.
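
One way to pin the ports is to write the kernel connection file by hand instead of letting Jupyter pick random ports. The sketch below is an assumption about how this could be set up, not the original author's configuration: the base port 50000, the file name, and the function names are all illustrative. The field names follow the standard Jupyter connection-file format.

```python
import json
import uuid


def fixed_port_connection_info(base_port: int = 50000, ip: str = "0.0.0.0") -> dict:
    """Build connection info with five consecutive, predictable ports."""
    names = ["shell_port", "iopub_port", "stdin_port", "control_port", "hb_port"]
    info = {name: base_port + offset for offset, name in enumerate(names)}
    info.update({
        "ip": ip,
        "transport": "tcp",
        "signature_scheme": "hmac-sha256",
        "key": uuid.uuid4().hex,  # shared secret used to sign kernel messages
    })
    return info


def write_connection_file(path: str, info: dict) -> None:
    """Write the connection info as JSON so a kernel and a client can share it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(info, fh, indent=2)
```

Inside the container the kernel can then be started with python -m ipykernel_launcher -f kernel.json, with the five ports published via -p. A client such as jupyter console --existing kernel.json can attach using a copy of the file whose ip points at an address it can reach. This path talks to the kernel over ZeroMQ rather than over the browser WebSocket.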

4. Use a remote Linux server

A real amd64 host avoids all Rosetta/QEMU issues.

5. Rebuild the image with modern Jupyter + Tornado

Senior engineers often:

  • Pin Tornado to a version known to work with Jupyter
  • Upgrade Jupyter to a version with more robust WebSocket handling
  • Remove legacy entrypoints that assume x86 timing
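
Before pinning or upgrading, it is worth recording what the image actually ships. The following sketch lists installed versions with importlib.metadata, which requires Python 3.8 or newer; on an older interpreter inside a legacy image, pip list gives the same information. The package names in the usage note are the usual suspects, not a list taken from the CellOracle image.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, installed_versions(["tornado", "notebook", "jupyter-client", "ipykernel", "pyzmq"]) run inside the container shows which combination is in play before any pin is changed.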

6. Avoid Docker Desktop’s amd64 emulation

Instead:

  • Use Colima, started with colima start --arch x86_64
  • Or run the container inside a lightweight amd64 VM (UTM, Lima, Multipass)

These approaches provide consistent amd64 networking, which Docker Desktop cannot guarantee.

Why Juniors Miss It

Less experienced engineers often focus on the wrong layers:

  • They assume Tornado, ports, or tokens are the issue.
  • They try reinstalling packages inside the container instead of questioning the architecture mismatch.
  • They trust that “Docker should abstract everything,” not realizing that WebSockets + emulation is a known failure mode.
  • They expect errors to appear in logs, unaware that WebSocket failures occur in the browser, not the server.

The key insight seniors have is:
If Jupyter WebSockets fail on Apple Silicon under amd64 emulation, the problem is almost never inside the container; it is the architecture boundary itself.
