Summary
A Druid connection pool used with PolarDB PG in a Spring Boot application intermittently throws “connections closed” errors. The errors appear even when the system is under light load, and the PolarDB developers have confirmed that the server logs no corresponding errors. The goal is to identify the root cause and find a fix that prevents these random connection closures.
Root Cause
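No single cause has been confirmed, but the pool configuration points to several likely contributors:
- Insufficient connection validation: the current configuration has testOnBorrow set to false, so connections are not validated when they are borrowed from the pool. A connection that was closed while idle is handed to the application as-is, and the first statement on it fails.
- Inadequate keep-alive settings: keep-alive is enabled, but Druid runs keep-alive checks only when its eviction thread wakes up, once every time-between-eviction-runs-millis. The effective keep-alive interval is therefore bounded by that period, not by keep-alive-between-time-millis alone, and an idle connection can sit untouched long enough for a server-side or network idle timeout to close it.
- Validation query: the validationQuery is SELECT 1. For a PostgreSQL-compatible database this is normally enough to detect a dead connection, so the more likely gap is when validation runs (on borrow, while idle) rather than the query itself.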
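The keep-alive point is easier to see with numbers. The sketch below is a simplified model, not Druid's implementation: it assumes the eviction thread runs every evictionPeriodMs and probes any connection that has been idle for at least keepAliveMs, then computes the longest idle time a connection can reach before its first probe. The class and method names are hypothetical.

```java
// Simplified model of keep-alive timing (hypothetical; not Druid's implementation).
// Assumption: the eviction thread runs at t = k * evictionPeriodMs (k = 1, 2, ...)
// and probes any idle connection whose idle time has reached keepAliveMs.
public final class KeepAliveTiming {

    /** Idle time reached at the first probe for a connection that went idle at offsetMs. */
    public static long idleAtFirstProbe(long evictionPeriodMs, long keepAliveMs, long offsetMs) {
        long k = 1;
        // Skip eviction runs that happen before the connection has been idle for keepAliveMs.
        while (k * evictionPeriodMs - offsetMs < keepAliveMs) {
            k++;
        }
        return k * evictionPeriodMs - offsetMs;
    }

    /** Worst case over every idle start time within one eviction period, in 1 ms steps. */
    public static long worstCaseIdleBeforeProbe(long evictionPeriodMs, long keepAliveMs) {
        long worst = 0;
        for (long offset = 0; offset < evictionPeriodMs; offset++) {
            worst = Math.max(worst, idleAtFirstProbe(evictionPeriodMs, keepAliveMs, offset));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Eviction every 120 s, keep-alive after 30 s of idleness.
        System.out.println(worstCaseIdleBeforeProbe(120_000, 30_000)); // prints 149999
        // Eviction every 20 s, keep-alive after 30 s of idleness.
        System.out.println(worstCaseIdleBeforeProbe(20_000, 30_000));  // prints 49999
    }
}
```

Under this model, a 120 s eviction period with a 30 s keep-alive still lets a connection sit idle for almost 150 s (just under keepAliveMs + evictionPeriodMs), while a 20 s period brings the worst case under 50 s. If a server or network idle timeout is shorter than that worst case, connections can be closed between probes.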
Why This Happens in Real Systems
This issue can occur in real systems due to various reasons, including:
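- Network issues: transient network faults, or firewalls, NAT gateways, and load balancers that drop idle TCP connections, can close a connection without the pool noticing until the connection is next used.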
- Database server configuration: The database server’s configuration, such as the connection timeout or idle connection timeout, can cause connections to be closed.
- Application configuration: the Druid pool settings decide whether a connection that was closed while idle is detected and discarded before the application uses it.
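How an idle timeout interacts with testOnBorrow can be shown with a toy pool. Everything here is hypothetical: ToyPool, ToyConnection, and the 60 s idle timeout are illustrative assumptions, and the liveness check stands in for running the validation query on a real connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (hypothetical, not Druid): the server silently closes any connection
// idle longer than SERVER_IDLE_TIMEOUT_MS; the pool only notices if it validates on borrow.
public final class IdleTimeoutDemo {

    static final long SERVER_IDLE_TIMEOUT_MS = 60_000; // assumed server/network idle timeout

    static final class ToyConnection {
        long lastUsedMs;

        ToyConnection(long lastUsedMs) { this.lastUsedMs = lastUsedMs; }

        // Stands in for running the validation query (e.g. SELECT 1) on a real connection.
        boolean isAliveAt(long nowMs) {
            return nowMs - lastUsedMs <= SERVER_IDLE_TIMEOUT_MS;
        }
    }

    static final class ToyPool {
        private final Deque<ToyConnection> idle = new ArrayDeque<>();
        private final boolean testOnBorrow;
        int discarded = 0;

        ToyPool(boolean testOnBorrow) { this.testOnBorrow = testOnBorrow; }

        ToyConnection borrow(long nowMs) {
            while (!idle.isEmpty()) {
                ToyConnection c = idle.pop();
                if (testOnBorrow && !c.isAliveAt(nowMs)) {
                    discarded++; // dead connection detected: drop it and try the next one
                    continue;
                }
                return c;
            }
            return new ToyConnection(nowMs); // open a fresh physical connection
        }

        void release(ToyConnection c, long nowMs) {
            c.lastUsedMs = nowMs;
            idle.push(c);
        }
    }

    /** Uses a connection at t = 0, returns it, then borrows again after idleMs. */
    public static boolean usableAfterIdle(boolean testOnBorrow, long idleMs) {
        ToyPool pool = new ToyPool(testOnBorrow);
        pool.release(pool.borrow(0), 0);
        ToyConnection c = pool.borrow(idleMs);
        return c.isAliveAt(idleMs);
    }

    public static void main(String[] args) {
        System.out.println(usableAfterIdle(false, 120_000)); // prints false
        System.out.println(usableAfterIdle(true, 120_000));  // prints true
        System.out.println(usableAfterIdle(false, 30_000));  // prints true
    }
}
```

Without validation, the connection borrowed after two minutes of idleness is already dead on the server side, so the first statement on it would fail; with testOnBorrow, the pool discards it and opens a new one.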
Real-World Impact
The impact of this issue can be significant, including:
- Request failures: a request that borrows a closed connection fails with an exception; if that exception is not handled, users see errors and the application can appear unresponsive.
- Partial work: if a connection drops mid-transaction, the server rolls that transaction back, but operations that span several transactions can be left half-done, which is how data inconsistencies arise.
- Performance degradation: The frequent connection closures and re-establishments can degrade the application’s performance.
Example Code
```java
// Example Druid configuration with improved settings.
// Keys use the camelCase names of Druid's DruidDataSource properties; `properties` is assumed
// to be a Map<String, Object> that the application binds onto the Druid data source.
properties.put("initialSize", 20);
properties.put("minIdle", 20);
properties.put("connectTimeout", 30000);                // ms
properties.put("socketTimeout", 60000);                 // ms
properties.put("maxActive", 150);
properties.put("maxWait", 60000);                       // ms to wait for a free connection
properties.put("timeBetweenEvictionRunsMillis", 20000); // eviction/keep-alive thread period, lowered from 120000
properties.put("minEvictableIdleTimeMillis", 240000);
properties.put("maxEvictableIdleTimeMillis", 600000);
properties.put("keepAlive", true);
properties.put("keepAliveBetweenTimeMillis", 30000);    // 30 s; keep it above timeBetweenEvictionRunsMillis
properties.put("phyTimeoutMillis", 1800000);            // max physical connection lifetime ("max-lifetime" is a HikariCP name)
properties.put("testWhileIdle", true);
properties.put("testOnBorrow", true);                   // validate connections when they are borrowed
properties.put("testOnReturn", false);
properties.put("validationQuery", "SELECT 1");
properties.put("validationQueryTimeout", 5);            // seconds
```
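Several of these values constrain each other, and a mistake either fails at startup or silently weakens keep-alive. The checker below is a hypothetical pre-flight helper, not part of Druid. It assumes camelCase keys matching DruidDataSource property names and skips any key that is absent. The keep-alive rule mirrors a check that recent Druid versions perform at init, as far as we know; confirm it against the version you run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check for Druid pool settings (not part of Druid).
public final class DruidConfigCheck {

    private static Long num(Map<String, Object> p, String key) {
        Object v = p.get(key);
        return v == null ? null : Long.valueOf(String.valueOf(v)); // accepts Integer, Long, or numeric String
    }

    public static List<String> check(Map<String, Object> p) {
        List<String> problems = new ArrayList<>();
        boolean keepAlive = Boolean.parseBoolean(String.valueOf(p.get("keepAlive")));
        Long keepAliveGap = num(p, "keepAliveBetweenTimeMillis");
        Long evictionRun = num(p, "timeBetweenEvictionRunsMillis");
        // Keep-alive probes run on the eviction thread; recent Druid versions reject gap <= period.
        if (keepAlive && keepAliveGap != null && evictionRun != null && keepAliveGap <= evictionRun) {
            problems.add("keepAliveBetweenTimeMillis should be greater than timeBetweenEvictionRunsMillis");
        }
        Long minEvict = num(p, "minEvictableIdleTimeMillis");
        Long maxEvict = num(p, "maxEvictableIdleTimeMillis");
        if (minEvict != null && maxEvict != null && maxEvict < minEvict) {
            problems.add("maxEvictableIdleTimeMillis should not be less than minEvictableIdleTimeMillis");
        }
        Long maxActive = num(p, "maxActive");
        for (String key : new String[] {"initialSize", "minIdle"}) {
            Long v = num(p, key);
            if (v != null && maxActive != null && v > maxActive) {
                problems.add(key + " should not exceed maxActive");
            }
        }
        // HikariCP-style lifetime names are not Druid properties and would be ignored.
        if (p.containsKey("maxLifetime") || p.containsKey("max-lifetime")) {
            problems.add("max-lifetime is not a Druid property; consider phyTimeoutMillis");
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Object> p = new HashMap<>();
        p.put("initialSize", 20);
        p.put("minIdle", 20);
        p.put("maxActive", 150);
        p.put("timeBetweenEvictionRunsMillis", 120000);
        p.put("minEvictableIdleTimeMillis", 240000);
        p.put("maxEvictableIdleTimeMillis", 600000);
        p.put("keepAlive", true);
        p.put("keepAliveBetweenTimeMillis", 30000);
        System.out.println(check(p)); // flags the keep-alive/eviction conflict
        p.put("timeBetweenEvictionRunsMillis", 20000);
        System.out.println(check(p)); // prints []
    }
}
```

Running a check like this in a unit test or at application startup catches the case where keep-alive-between-time-millis is lowered without lowering the eviction period.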
How Senior Engineers Fix It
Senior engineers can fix this issue by:
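- Enabling connection validation: set testOnBorrow to true so each connection is validated when it is borrowed from the pool. This costs one extra round trip per borrow, which is usually acceptable at moderate load.
- Adjusting keep-alive settings: lower keep-alive-between-time-millis below the idle timeout of the server and of anything on the network path, and lower time-between-eviction-runs-millis with it, because keep-alive probes only run when the eviction thread wakes up.
- Checking the validation query: SELECT 1 is normally sufficient for PolarDB PG; keep validation-query-timeout short so that a hung connection fails validation quickly instead of blocking the borrower.
- Monitoring and logging: log every connection-closed exception with its SQLState and full stack trace, and use Druid's built-in statistics (for example the stat filter) to correlate failures with pool activity and idle time.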
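For the logging step, it helps to tag connection-loss errors consistently so they can be counted and correlated. The helper below is a hypothetical sketch: it treats SQLState class "08" (the SQL-standard "connection exception" class) and the JDBC connection-exception subclasses as connection loss, walking the cause chain because frameworks often wrap the original SQLException. The message match is an assumption based on the error text in this incident, since pool-level exceptions may carry no SQLState; adjust it to what your logs actually show.

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLTransientConnectionException;
import java.util.Locale;

// Hypothetical helper for tagging connection-loss errors in logs and metrics.
public final class ConnectionErrors {

    public static boolean isConnectionLoss(Throwable t) {
        Throwable cur = t;
        // Walk the cause chain, bounded in case of a cyclic chain.
        for (int depth = 0; cur != null && depth < 16; depth++, cur = cur.getCause()) {
            if (cur instanceof SQLNonTransientConnectionException
                    || cur instanceof SQLTransientConnectionException) {
                return true;
            }
            if (cur instanceof SQLException) {
                String state = ((SQLException) cur).getSQLState();
                // SQLState class "08" is the SQL-standard "connection exception" class.
                if (state != null && state.startsWith("08")) {
                    return true;
                }
            }
            // Heuristic (assumption): match the message text seen in this incident.
            String msg = cur.getMessage();
            if (msg != null) {
                String lower = msg.toLowerCase(Locale.ROOT);
                if (lower.contains("connection closed") || lower.contains("connections closed")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SQLException lost = new SQLException("connection lost", "08006");
        System.out.println(isConnectionLoss(new RuntimeException("query failed", lost))); // prints true
        System.out.println(isConnectionLoss(new SQLException("duplicate key", "23505")));  // prints false
    }
}
```

A log statement or metric can then record a "connection-loss" tag together with the SQLState, which makes it easier to check whether failures cluster after idle periods.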
Why Juniors Miss It
Junior engineers may miss this issue due to:
- Lack of experience: Limited experience with Druid connection pools and PolarDB PG can make it difficult to identify the root cause.
- Insufficient knowledge: Limited knowledge of connection validation and keep-alive settings can lead to incorrect configuration.
- Inadequate testing: the failures are intermittent and show up under light load, after connections have sat idle, so short load tests rarely reproduce them.