Configuring connection pool and dispatcher settings – maxRequestsPerHost in openai-java

Summary

A production Java service using the openai-java SDK experienced intermittent timeout errors and elevated latency when calling OpenAI APIs under load. The root cause was an unconfigured OkHttp client, specifically the Dispatcher's maxRequestsPerHost limit. OkHttp's default of 5 concurrent requests per host bottlenecked concurrent API calls, causing requests to queue and eventually time out. We resolved this by explicitly configuring a custom OkHttpClient with a raised dispatcher limit and a larger connection pool, and attaching it to the OpenAI client builder.

Root Cause

The default OpenAIOkHttpClient provided by the SDK is a thin wrapper around OkHttp that keeps OkHttp's conservative default settings.

  • Default Dispatcher Constraints: OkHttp's Dispatcher defaults to maxRequestsPerHost = 5 (and maxRequests = 64). Note that this limit lives on the Dispatcher, not the ConnectionPool, and it throttles asynchronous (enqueued) calls. The SDK could therefore keep at most 5 in-flight requests to api.openai.com at any given moment.
  • Queuing Behavior: When the application enqueued more than 5 concurrent requests to the same host, the excess calls sat in the dispatcher's ready queue, waiting for a running call to finish.
  • Latency Accumulation: Under high concurrency, the queuing delay stacked on top of OpenAI's own processing time, so overall request latency spiked until client-side timeouts fired.
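The queuing effect is easy to reproduce without touching the network. Below is a minimal sketch in plain JDK (no OkHttp involved): a 5-permit Semaphore stands in for the dispatcher's per-host limit, and a 150 ms sleep stands in for API latency. Twenty "calls" take roughly four waves under the cap, but a single wave without it:

```java
import java.util.concurrent.*;

public class DispatcherCapDemo {

    // Run 20 simulated 150 ms "API calls" on 20 threads, gated by a permit cap.
    static long runWithCap(int permits) throws InterruptedException {
        Semaphore inFlight = new Semaphore(permits); // stand-in for maxRequestsPerHost
        ExecutorService pool = Executors.newFixedThreadPool(20);
        CountDownLatch done = new CountDownLatch(20);
        long start = System.nanoTime();
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                try {
                    inFlight.acquire();      // threads queue here when permits run out
                    Thread.sleep(150);       // simulated server-side latency
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.release();
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long capped = runWithCap(5);  // 20 calls / 5 permits -> ~4 waves -> ~600 ms
        long open   = runWithCap(50); // all 20 calls run at once -> ~150 ms
        System.out.println("cap=5: " + capped + " ms, cap=50: " + open + " ms");
    }
}
```

The absolute numbers vary by machine, but the roughly 4x gap between the capped and uncapped runs mirrors what the service saw in production.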

Why This Happens in Real Systems

Connection pool misconfiguration is a common “silent killer” in microservices.

  • Hidden Bottlenecks: The application layer (thread pool) is often scaled up to handle high throughput, but the HTTP client layer remains unconfigured. The HTTP client becomes the bottleneck before the CPU or memory is saturated.
  • Concurrency vs. Parallelism: Developers often assume that spinning up multiple threads guarantees parallel API calls. However, if the underlying Dispatcher caps the number of in-flight requests per host, those threads end up waiting for I/O resources rather than executing in parallel.
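Little's law makes the ceiling concrete: sustainable throughput equals concurrency divided by latency. With at most 5 requests in flight and an assumed average API latency of 2 seconds (an illustrative figure, not a measured one), throughput tops out at 2.5 requests per second, no matter how many application threads exist:

```java
public class LittlesLaw {

    // Little's law: sustainable throughput = concurrency / latency.
    static double maxThroughput(int maxInFlight, double avgLatencySeconds) {
        return maxInFlight / avgLatencySeconds;
    }

    public static void main(String[] args) {
        // 5 in-flight requests (OkHttp's default maxRequestsPerHost) and an
        // assumed 2 s average completion time cap the whole service:
        System.out.println(maxThroughput(5, 2.0) + " req/s");  // 2.5 req/s
        // Raising the per-host limit to 50 lifts the ceiling tenfold:
        System.out.println(maxThroughput(50, 2.0) + " req/s"); // 25.0 req/s
    }
}
```

Any worker thread beyond the in-flight limit contributes nothing to throughput; it only adds queuing delay.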

Real-World Impact

  • Increased Latency (P95/P99): Users experienced slow response times because requests spent seconds waiting in the HTTP client’s internal queue.
  • Request Timeouts: Critical operations failed due to exceeding the configured read timeouts while waiting for a connection to open up.
  • Resource Exhaustion: Threads were blocked waiting on I/O, reducing the available worker threads for other tasks and risking thread starvation.

Example or Code

To fix this, construct your own OkHttpClient instance with an explicit Dispatcher and ConnectionPool, then pass it to the SDK's client builder.

1. The Fix (Java):

import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import okhttp3.ConnectionPool;
import okhttp3.Dispatcher;
import okhttp3.OkHttpClient;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

public class OpenAiConfiguration {

    public static OpenAIClient createConfiguredClient() {
        // 1. Raise the dispatcher limits. maxRequestsPerHost lives here,
        //    NOT on the ConnectionPool. Defaults: maxRequests = 64,
        //    maxRequestsPerHost = 5.
        Dispatcher dispatcher = new Dispatcher();
        dispatcher.setMaxRequests(100);
        dispatcher.setMaxRequestsPerHost(50); // CRITICAL: default is 5

        // 2. Configure the underlying OkHttpClient
        OkHttpClient okHttpClient = new OkHttpClient.Builder()
                .dispatcher(dispatcher)
                // Keep more warm connections so bursts skip the TLS handshake
                // (defaults: 5 idle connections, 5-minute keep-alive)
                .connectionPool(new ConnectionPool(
                        50, // maxIdleConnections
                        10, // keepAliveDuration
                        TimeUnit.MINUTES
                ))
                // Ensure read/write timeouts are sufficient for AI generation
                .connectTimeout(Duration.ofSeconds(10))
                .readTimeout(Duration.ofSeconds(60))
                .writeTimeout(Duration.ofSeconds(10))
                .build();

        // 3. Hand the tuned client to the SDK. The exact injection hook
        //    varies by SDK version; check your version's builder API.
        OpenAIOkHttpClient customClient = OpenAIOkHttpClient.builder()
                .from(okHttpClient)
                .build();

        // 4. Build the main OpenAI client
        return OpenAIClient.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .httpClient(customClient)
                .build();
    }
}

How Senior Engineers Fix It

  • Instrumentation First: Before tuning, they add metrics that prove the bottleneck exists, e.g. OkHttp's Dispatcher.runningCallsCount() and Dispatcher.queuedCallsCount(), ConnectionPool.idleConnectionCount(), or application thread pool queue sizes.
  • Proactive Configuration: They never rely on library defaults for production HTTP clients. They explicitly define ConnectionPool settings, timeouts, and retry strategies aligned with the downstream API’s rate limits and SLAs.
  • Dependency Awareness: They inspect the source code of the SDK wrapper to understand what underlying library is being used (in this case, OkHttp) and how to inject custom configurations.
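Instrumentation need not wait for a metrics library. As a minimal sketch using only the JDK (OkHttp's Dispatcher exposes the analogous runningCallsCount() and queuedCallsCount() counters), a periodic probe of an executor makes this kind of saturation visible:

```java
import java.util.concurrent.*;

public class PoolProbe {

    // Snapshot the executor's saturation: active workers vs. queued tasks.
    static String snapshot(ThreadPoolExecutor pool) {
        return "active=" + pool.getActiveCount()
             + " queued=" + pool.getQueue().size()
             + " completed=" + pool.getCompletedTaskCount();
    }

    public static void main(String[] args) throws Exception {
        // 5 workers, 20 slow tasks: the probe shows 5 running, 15 queued,
        // which is exactly the signature of an undersized concurrency limit.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 5, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 20; i++) {
            pool.submit(() -> {
                try { Thread.sleep(500); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        }
        Thread.sleep(100); // let the workers pick up their first tasks
        System.out.println(snapshot(pool));
        pool.shutdownNow();
    }
}
```

A high, steady "queued" count while "active" sits pinned at the limit is the proof senior engineers look for before touching any configuration.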

Why Juniors Miss It

  • Abstracted Complexity: The OpenAIOkHttpClient looks like a “magic” black box. Juniors often assume the library authors chose the “best” defaults, not realizing those defaults are for generic compatibility, not high-performance production use.
  • Focus on Application Logic: Juniors tend to focus on the business logic (the prompt, the response parsing) and overlook the infrastructure plumbing (connection management, serialization, and network I/O).
  • Lack of Load Testing: Configuration issues like maxRequestsPerHost only surface under load. Without high-concurrency load tests, the default settings (5 concurrent requests) may seem “fast enough” for simple development tasks.
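A high-concurrency test does not require heavy tooling. The sketch below uses only the JDK: a local com.sun.net.httpserver stub plays the slow upstream API, and java.net.http.HttpClient is used instead of OkHttp purely to keep the example dependency-free. The point is the shape of the test (a concurrent burst, not a single call), since a lone request can never surface a per-host limit:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MiniLoadTest {

    // Fire n concurrent requests at a local ~100 ms stub and return wall time.
    // A single call always looks "fast enough"; only a burst reveals queuing.
    static long runBurst(int n) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            byte[] body = "ok".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        ExecutorService exec = Executors.newCachedThreadPool();
        server.setExecutor(exec);
        server.start();
        try {
            HttpClient client = HttpClient.newHttpClient();
            URI uri = URI.create("http://localhost:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).build();

            long start = System.nanoTime();
            List<CompletableFuture<HttpResponse<String>>> calls = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                calls.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
            }
            CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            server.stop(0);
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("30 concurrent calls: " + runBurst(30) + " ms");
    }
}
```

Pointing the same burst at a client with a low per-host cap (and comparing wall time against the single-request latency) is the cheapest way to catch this misconfiguration before production does.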