Elastic Cloud On Kubernetes – settings.json: Read-only file system

Summary

Key Takeaway: The warning indicates that Elasticsearch’s File Settings Service (reserved state) cannot update the timestamp on settings.json because the filesystem is mounted read-only, a common security hardening in Kubernetes deployments, including those managed by ECK. This is often a non-issue if you don’t modify settings.json after deployment, but it can prevent runtime adjustments to cluster settings via the file-based mechanism.

  • Symptom: Master node logs repeated warnings: encountered I/O error trying to update file settings timestamp and Read-only file system for /usr/share/elasticsearch/config/operator/settings.json.
  • Trigger: Restoring a snapshot from another Elastic Cloud cluster (which may have enabled the File-Based Settings feature) into an ECK deployment.
  • Impact: The warning is benign if you do not intend to update cluster settings via the file path; however, it indicates that the File Settings Service is running but failing to persist changes, potentially blocking dynamic configuration updates.

Root Cause

The root cause is a mismatch between how Elasticsearch’s File Settings Service operates and the volume and security-context configuration that ECK applies to its pods.

  • File Settings Service: Recent Elasticsearch 8.x releases include a feature that watches /usr/share/elasticsearch/config/operator/settings.json for dynamic cluster settings changes. When a node becomes master, the service touches the file’s last-modified time via java.nio.file.Files.setLastModifiedTime so the watcher reprocesses any settings that may have changed while the node was not master.
  • Read-Only Mount: In ECK, the /usr/share/elasticsearch/config directory is typically mounted read-only (readOnly: true on the volume mount), and the container may additionally run with readOnlyRootFilesystem: true in its security context. Either constraint makes the config path unwritable for the container process.
  • Snapshot Restore Context: When restoring a snapshot from another cluster (like Elastic Cloud), the cluster state might include feature states that enable or reference the File Settings Service. If the source cluster had file-based settings enabled, the restored cluster inherits this behavior, triggering the service on startup.

Specific stack trace analysis:

  • The error originates from MasterNodeFileWatchingService.refreshExistingFileStateIfNeeded, which attempts to set the last modified time of settings.json.
  • java.nio.file.FileSystemException confirms the OS-level read-only restriction.

Why not a bug in Elasticsearch?: This is expected behavior; Elasticsearch assumes a writable config directory, and the failure comes from the Kubernetes deployment’s deliberate read-only constraint.

Why This Happens in Real Systems

In production environments, security best practices dictate immutable infrastructure and least privilege. Kubernetes operators like ECK enforce these by design:

  • Security Posture: ECK sets readOnlyRootFilesystem: true to prevent runtime modifications to the container image and configuration, reducing attack surface.
  • Operator-Driven Configuration: ECK manages cluster configuration through its custom resources (the Elasticsearch CRD) and the secrets and init containers it generates, not via the in-cluster File Settings Service, which is not part of ECK’s default setup.
  • Snapshot Restore Nuances: Restoring snapshots from external sources (e.g., Elastic Cloud) introduces state that may expect writable config directories. If the snapshot includes reserved_state (which powers File Settings), Elasticsearch enables the service, leading to I/O errors when it tries to write.
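As a point of contrast with the File Settings Service, declarative configuration in ECK lives in the custom resource. A minimal sketch follows; the cluster name, node-set name, node count, version string, and the example setting are all illustrative:

```yaml
# Minimal ECK Elasticsearch resource (illustrative names and version).
# Cluster settings are declared here and reconciled by the operator,
# not written to config/operator/settings.json at runtime.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.14.0
  nodeSets:
  - name: default
    count: 3
    config:
      # Example of a setting managed declaratively rather than via file settings
      indices.recovery.max_bytes_per_sec: 100mb
```

The operator propagates changes to spec.nodeSets[].config by rolling the affected pods, which is why a writable settings.json is unnecessary in this model.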

Common scenarios:

  • Teams migrating between Elastic Cloud and self-managed ECK via snapshots.
  • ECK deployments where the PVC mount points inherit read-only settings from the Pod security context.
  • Clusters upgraded via snapshot restore from versions that predate file-based settings, activating the feature unexpectedly.

Real-World Impact

  • Operational Risk: Low to Medium. The warning does not stop cluster operations; nodes remain healthy, and data is accessible. It’s logged repeatedly but won’t crash the pod (since it’s a WARN, not FATAL).
  • Functional Limitations:
    • If you need to update cluster settings via settings.json (e.g., during troubleshooting or custom tuning), changes won’t persist. The File Settings Service will fail to apply them.
    • In multi-node clusters, repeated warnings can pollute logs, increasing storage costs and complicating monitoring (e.g., false positives in alerting rules for I/O errors).
  • Blast Radius:
    • Single-node dev clusters: Annoying noise, but harmless.
    • Production clusters: Could mask real I/O issues if warnings spike. If the cluster relies on file-based settings for compliance or automation, it may break workflows.
  • Long-Term Behavior: The file reappears after pod restarts and the timestamp update fails each time; no data loss occurs, but the service stays in its retry/warn loop.

Example or Code

Kubernetes Security Context Configuration (illustrative YAML): A simplified manifest showing the kind of read-only mount that produces the error. Names are illustrative; the actual StatefulSet is generated by the ECK operator.

apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      containers:
      - name: elasticsearch
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/config
          name: elasticsearch-config
          readOnly: true  # This is the culprit for the settings.json write failure
      volumes:
      - name: elasticsearch-config
        persistentVolumeClaim:
          claimName: elasticsearch-config-pvc

Elasticsearch Log Entry: The exact warning as it appears in the node logs.

{"@timestamp":"2026-01-16T09:26:55.841Z","log.level":"WARN","message":"encountered I/O error trying to update file settings timestamp","error.type":"java.nio.file.FileSystemException","error.message":"/usr/share/elasticsearch/config/operator/settings.json: Read-only file system"}

How Senior Engineers Fix It

Senior engineers approach this by balancing security, functionality, and operational simplicity. Do not simply disable security settings without thorough risk assessment.

  1. Assess Necessity of File Settings Service:

    • Determine whether you actually need settings.json for dynamic configuration. In ECK, prefer managing settings via the Elasticsearch custom resource (e.g., spec.nodeSets[].config, or environment variables in the pod template).
    • If file settings aren’t used, the warning can be suppressed by disabling the service, e.g. by setting xpack.reserved_state.disabled: true in the node configuration (spec.nodeSets[].config in the CR); confirm the exact setting name against the documentation for your Elasticsearch version before relying on it.
  2. Modify Pod Security Context (If File Settings Are Required):

    • In the StatefulSet (or, for ECK, the pod template in the Elasticsearch custom resource), allow writes at the container level. Note that readOnlyRootFilesystem is a container-level security-context field, not a pod-level one:
      spec:
        containers:
        - name: elasticsearch
          securityContext:
            readOnlyRootFilesystem: false  # Container-level override; use with caution
          volumeMounts:
          - mountPath: /usr/share/elasticsearch/config
            name: elasticsearch-config
            readOnly: false
    • Best Practice: Use a dedicated writable volume for /usr/share/elasticsearch/config/operator (e.g., an emptyDir or separate PVC with write permissions) instead of disabling read-only FS entirely.
    • Restart the ECK Operator and Elasticsearch pods to apply changes.
  3. Handle Snapshot Restores Proactively:

    • Before restoring, edit the target cluster’s configuration to include xpack.reserved_state.disabled: true.
    • Post-restore, verify via GET _cluster/state?filter_path=metadata.reserved_state that no file-based (reserved) settings are active.
  4. Monitor and Validate:

    • After the fix, tail the logs to confirm the warnings have stopped. Reserved (file-based) state is visible in the cluster state API (GET _cluster/state?filter_path=metadata.reserved_state) rather than in _cluster/settings.
    • If scaling or upgrades occur, re-check security contexts, as ECK Helm charts may reset defaults.
  5. Rollback Option: If changes break anything, revert to read-only and rely solely on ECK resource-based config management—this is the recommended ECK pattern.
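The “dedicated writable volume” approach from step 2 could be sketched as follows. This is an assumption-level example applied through the ECK pod template; the node-set name and volume name are illustrative:

```yaml
# Sketch: keep the hardened read-only defaults, but give the File Settings
# directory its own writable emptyDir volume (names are illustrative).
spec:
  nodeSets:
  - name: default
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          securityContext:
            readOnlyRootFilesystem: true   # keep the hardened default
          volumeMounts:
          - name: operator-settings
            mountPath: /usr/share/elasticsearch/config/operator
            readOnly: false                # only this subdirectory is writable
        volumes:
        - name: operator-settings
          emptyDir: {}
```

Note that an emptyDir does not survive pod rescheduling, so any settings.json placed there must be re-provisioned when the pod is recreated.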

Outcome: Fixes the I/O error without compromising cluster security, assuming file settings aren’t mission-critical.

Why Juniors Miss It

Junior engineers often overlook this due to focus on application-level issues rather than infrastructure constraints:

  • Lack of Kubernetes Security Awareness: Juniors may not review Pod security contexts (e.g., readOnlyRootFilesystem), assuming containers have full filesystem access like local processes. They prioritize Elasticsearch configs over infra settings.
  • Snapshot Restore Assumptions: Treating snapshots as “portable” without checking for feature-state incompatibilities. Juniors might miss that Elastic Cloud enables advanced features like reserved state, which aren’t ECK defaults.
  • Log Interpretation Gaps: Viewing the warning as a generic I/O error, not tracing it to the File Settings Service. They might delete the file or ignore it without understanding the underlying service loop.
  • Over-Reliance on Defaults: Assuming the official Helm chart is “production-ready” without customizing for ECK’s operator-driven model. Juniors often skip validating mount options in PVCs.
  • Debugging Shortcut: They might increase log verbosity or search for Elasticsearch bugs, ignoring Kubernetes-level evidence like kubectl describe pod showing readOnly volumes.

Prevention Tip: Always cross-reference Elasticsearch logs with Kubernetes manifests (kubectl get pod <pod> -o yaml) when deploying via operators. Training on security contexts (e.g., from CNCF docs) helps bridge this gap.
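When doing that cross-reference, this trimmed, illustrative excerpt shows the two fields in the pod manifest that can each cause the write failure:

```yaml
# Trimmed excerpt of `kubectl get pod <pod> -o yaml` output (illustrative).
# Either of these two fields makes the settings.json write fail:
spec:
  containers:
  - name: elasticsearch
    securityContext:
      readOnlyRootFilesystem: true   # container cannot write anywhere on its root FS
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config
      name: elasticsearch-config
      readOnly: true                 # the mount itself is read-only
```

If both fields are absent and the warning persists, the restriction may come from the underlying volume or storage class rather than the pod spec.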