Summary
A production Java service using the openai-java SDK experienced intermittent timeout errors and elevated latency when calling OpenAI APIs under load. The root cause was an unconfigured OkHttp client, specifically the dispatcher's maxRequestsPerHost limit. With OkHttp's default of 5 concurrent requests per host, concurrent API calls queued behind one another and eventually timed out. We resolved this by explicitly configuring a custom OkHttpClient with increased concurrency limits and attaching it to the OpenAI client builder.
Root Cause
The default OpenAIOkHttpClient provided by the SDK is a minimal wrapper around OkHttp with conservative default settings.
- Default Concurrency Limit: OkHttp’s default maxRequestsPerHost is 5. This is a Dispatcher setting, separate from the ConnectionPool, and it caps asynchronous calls (those issued with enqueue). As a result, the SDK could keep only 5 requests to api.openai.com in flight at any given moment.
- Blocking Behavior: When the application attempted to send more than 5 concurrent requests, the excess requests were held in the dispatcher's queue until a running call finished.
- Latency Accumulation: Under high concurrency, the queuing delay combined with OpenAI’s processing time caused the overall request latency to spike, eventually triggering client-side timeouts.
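The queuing effect described above can be approximated with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement from this incident: it assumes all requests arrive at once, each takes the same time upstream, and the client admits a fixed number at a time. The class name, method names, and the 2-second service time are hypothetical.

```java
public class QueueDelayEstimate {

    /**
     * Estimates when the last of `requests` simultaneous calls finishes if at
     * most `limit` run at once and each takes `serviceMillis`.
     */
    public static long lastCompletionMillis(int requests, int limit, long serviceMillis) {
        if (requests <= 0) {
            return 0;
        }
        long waves = (requests + limit - 1) / limit; // ceiling division
        return waves * serviceMillis;
    }

    /** Time the last request spends queued before it is allowed to start. */
    public static long lastQueueWaitMillis(int requests, int limit, long serviceMillis) {
        return Math.max(0, lastCompletionMillis(requests, limit, serviceMillis) - serviceMillis);
    }

    public static void main(String[] args) {
        long serviceMillis = 2_000; // assumed upstream latency per call
        for (int limit : new int[] {5, 50}) {
            System.out.printf("limit=%d: last call waits %d ms, finishes at %d ms%n",
                    limit,
                    lastQueueWaitMillis(50, limit, serviceMillis),
                    lastCompletionMillis(50, limit, serviceMillis));
        }
    }
}
```

Under these assumptions, 50 simultaneous calls with a limit of 5 run in 10 waves, so the last call waits 18 seconds before it even starts. With a limit of 50 the same burst completes in a single wave.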
Why This Happens in Real Systems
Connection pool misconfiguration is a common “silent killer” in microservices.
- Hidden Bottlenecks: The application layer (thread pool) is often scaled up to handle high throughput, but the HTTP client layer remains unconfigured. The HTTP client becomes the bottleneck before the CPU or memory is saturated.
- Concurrency vs. Parallelism: Developers often assume that spinning up multiple threads guarantees parallel API calls. However, if the underlying HTTP client caps the number of in-flight requests or connections, those threads end up waiting for a free slot rather than executing in parallel.
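To make the thread-count versus client-limit distinction concrete, here is a small JDK-only simulation. It does not use OkHttp: a Semaphore stands in for the HTTP client's concurrency cap, and Thread.sleep stands in for the remote call. All names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HiddenBottleneckDemo {

    /**
     * Runs `threads` workers through a stand-in client that admits `limit`
     * calls at once and returns the highest in-flight count observed.
     */
    public static int peakInFlight(int threads, int limit) {
        Semaphore permits = new Semaphore(limit); // stand-in for the client's cap
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            workers.submit(() -> {
                try {
                    permits.acquire(); // threads pile up here, not on the network
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(50); // simulated remote call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();
                }
            });
        }

        workers.shutdown();
        try {
            workers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("32 threads, limit 5:  peak in flight = " + peakInFlight(32, 5));
        System.out.println("32 threads, limit 32: peak in flight = " + peakInFlight(32, 32));
    }
}
```

However many worker threads are started, the observed peak never exceeds the limit, which is why scaling the thread pool alone does not raise throughput.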
Real-World Impact
- Increased Latency (P95/P99): Users experienced slow response times because requests spent seconds waiting in the HTTP client’s internal queue.
- Request Timeouts: Critical operations failed because the time spent queued in the HTTP client pushed them past their configured timeouts.
- Resource Exhaustion: Threads were blocked waiting on I/O, reducing the available worker threads for other tasks and risking thread starvation.
Example or Code
To fix this, you must construct your own OkHttpClient instance and pass it to the OpenAIClient builder.
1. The Fix (Java):
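import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;

public class OpenAiConfiguration {

    public static OpenAIClient createConfiguredClient() {
        // 1. Raise the dispatcher limits. maxRequestsPerHost (default 5) lives on
        //    the Dispatcher, not on the ConnectionPool.
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(64);        // overall cap (64 is OkHttp's default)
        dispatcher.setMaxRequestsPerHost(50); // CRITICAL: default is 5

        // 2. Configure the underlying OkHttpClient
        OkHttpClient okHttpClient = new OkHttpClient.Builder()
            .dispatcher(dispatcher)
            // Keep enough idle connections around to match the per-host limit
            .connectionPool(new ConnectionPool(
                50,                 // maxIdleConnections
                10,                 // keepAliveDuration
                TimeUnit.MINUTES))
            // Ensure read/write timeouts are sufficient for AI generation
            .connectTimeout(Duration.ofSeconds(10))
            .readTimeout(Duration.ofSeconds(60))
            .writeTimeout(Duration.ofSeconds(10))
            .build();

        // 3. Wrap it in the OpenAI SDK's OkHttp client wrapper.
        //    NOTE: builder method names may differ between SDK releases; verify
        //    from(...) and httpClient(...) against the version you use.
        OpenAIOkHttpClient customClient = OpenAIOkHttpClient.builder()
            .from(okHttpClient)
            .build();

        // 4. Build the main OpenAI client
        return OpenAIClient.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .httpClient(customClient)
            .build();
    }
}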
How Senior Engineers Fix It
- Instrumentation First: Before tuning, they add metrics to track the HTTP client (e.g., the dispatcher's queuedCallsCount and runningCallsCount, or the pool's connectionCount and idleConnectionCount) or thread pool queue sizes to prove a bottleneck exists.
- Proactive Configuration: They never rely on library defaults for production HTTP clients. They explicitly define Dispatcher and ConnectionPool settings, timeouts, and retry strategies aligned with the downstream API’s rate limits and SLAs.
- Dependency Awareness: They inspect the source code of the SDK wrapper to understand what underlying library is being used (in this case, OkHttp) and how to inject custom configurations.
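One way to turn "aligned with rate limits and SLAs" into a number is Little's Law: the average number of in-flight requests equals the arrival rate multiplied by the time each request spends in the system. The sketch below applies it to a hypothetical workload. The class and method names, the 10 requests/s rate, the 4 s p99 latency, the 1.25 headroom factor, and the ceiling of 64 are all illustrative assumptions, not figures from this incident.

```java
public class ConcurrencySizing {

    /** Little's Law: in-flight requests = arrival rate x time in system. */
    public static int requiredInFlight(double requestsPerSecond, double latencySeconds) {
        return (int) Math.ceil(requestsPerSecond * latencySeconds);
    }

    /** Adds headroom, then caps at a ceiling derived from the provider's rate limit. */
    public static int suggestedLimit(double requestsPerSecond, double p99LatencySeconds,
                                     double headroomFactor, int providerCeiling) {
        int needed = (int) Math.ceil(
                requiredInFlight(requestsPerSecond, p99LatencySeconds) * headroomFactor);
        return Math.min(needed, providerCeiling);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 10 requests/s with a 4 s p99 latency.
        System.out.println("in flight at p99: " + requiredInFlight(10, 4.0));
        System.out.println("suggested limit:  " + suggestedLimit(10, 4.0, 1.25, 64));
    }
}
```

For this hypothetical workload the estimate is 40 requests in flight at p99 latency, which is eight times the default per-host limit of 5. A value derived this way is only a starting point and should be confirmed with the metrics and load tests described in this section.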
Why Juniors Miss It
- Abstracted Complexity: The OpenAIOkHttpClient looks like a “magic” black box. Juniors often assume the library authors chose the “best” defaults, not realizing those defaults are for generic compatibility, not high-performance production use.
- Focus on Application Logic: Juniors tend to focus on the business logic (the prompt, the response parsing) and overlook the infrastructure plumbing (connection management, serialization, and network I/O).
- Lack of Load Testing: Configuration issues like maxRequestsPerHost only surface under load. Without high-concurrency load tests, the default settings (5 concurrent requests) may seem “fast enough” for simple development tasks.