NextCloud crashing #37

Closed
opened 2025-07-28 02:26:25 +00:00 by eric · 3 comments
Owner

NextCloud (along with several other services) is crashing frequently, seemingly as a result of Redis crashing. Here are some log messages:

Redis:

```
Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
```

NextCloud:

```
[Mon Jul 28 01:31:02.852886 2025] [mpm_prefork:notice] [pid 1:tid 1] AH00170: caught SIGWINCH, shutting down gracefully
10.244.9.1 - - [28/Jul/2025:01:30:57 +0000] "GET /status.php HTTP/1.1" 200 1622 "-" "kube-probe/1.31"
```
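The Redis warning is emitted when the background fsync of the append-only file stalls. If the root cause is disk latency rather than Redis itself, one stopgap is to relax the AOF fsync policy. A minimal sketch of the relevant redis.conf directives, expressed as Bitnami chart values (this assumes the chart's `commonConfiguration` parameter; verify against the deployed chart version):

```yaml
# values-redis.yaml -- sketch only; commonConfiguration is the
# bitnami/redis mechanism for injecting raw redis.conf directives.
commonConfiguration: |-
  appendonly yes
  # fsync at most once per second instead of on every write
  appendfsync everysec
  # don't block on fsync while an AOF rewrite is saturating the disk;
  # trades a small durability window for better write latency
  no-appendfsync-on-rewrite yes
```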

If too many file descriptors are open, adding more processes to the stack may not help. It would depend on a number of factors...

If I added another VM, running on the R720, to the cluster, and ran Redis on that new instance, Redis could use that machine's underlying filesystem, alleviating the disk pressure.
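If it comes to that, the pods would not move on their own; a node selector in the chart values could pin Redis to the new VM. A sketch, assuming the bitnami/redis chart layout and a hypothetical node name for the R720-backed instance:

```yaml
# values-redis.yaml -- "r720-vm-1" is a hypothetical node name; use the
# actual name reported by `kubectl get nodes`.
master:
  nodeSelector:
    kubernetes.io/hostname: r720-vm-1
replica:
  nodeSelector:
    kubernetes.io/hostname: r720-vm-1
```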

Author
Owner

Using an NFS Client in a Kubernetes Helm Chart

<video controls type="video/mp4" src="https://minio.eom.dev/public/Videos/2025-08-01_12-20-52.mp4"></video>

In this first pass at addressing the issue described above, I attempted to configure the Bitnami Helm chart for Redis to use the NFS client deployed in DevOps/software-infrastructure#23. Unfortunately, I encountered two issues in doing so.

The first was that the Redis deployment uses a Kubernetes StatefulSet, which does not allow updates to the underlying storage class. I believe I will need to delete this StatefulSet in order to use the new NFS provisioner; however, I am apprehensive about doing so for fear of creating an unrecoverable situation. The attempted change also caused the Redis pods to be shuffled onto the R720 nodes, so I am curious to see whether the problem is resolved by this unintended side effect.

The second problem encountered during this update was an incompatibility between PostgreSQL versions. I was able to pin the required version, but that can only be a temporary solution. I will need to consult the Nextcloud documentation to see how they recommend upgrading PostgreSQL versions when using Helm.
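For reference, pointing the chart's persistence at the NFS provisioner should only require a storage-class override in the values. A minimal sketch, assuming the provisioner from DevOps/software-infrastructure#23 registered a storage class named `nfs-client` (check the actual name with `kubectl get storageclass`):

```yaml
# values-redis.yaml -- sketch only; "nfs-client" is an assumed storage
# class name and must match whatever the NFS provisioner registered.
global:
  storageClass: nfs-client
```

Because a StatefulSet's volumeClaimTemplates are immutable, applying this requires recreating the StatefulSet. One non-destructive approach is `kubectl delete statefulset <name> --cascade=orphan`, which deletes the StatefulSet object while leaving its pods and PVCs running, after which the chart can be re-applied with the new storage class.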

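As for the PostgreSQL pin, charts that pull in PostgreSQL as a Bitnami subchart usually expose the tag under the `postgresql` key. A sketch of the temporary pin, where both the key path and the tag value are assumptions (the tag must match the major version of the existing data directory):

```yaml
# values-nextcloud.yaml -- hypothetical pin; the key path assumes a
# bitnami postgresql subchart, and the tag must match the on-disk
# PostgreSQL major version until a proper pg_upgrade is performed.
postgresql:
  image:
    tag: "16.4.0"   # assumed version; verify against the existing cluster
```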
eric added spent time 2025-08-01 17:57:11 +00:00
1 hour 5 minutes
Author
Owner

The Redis pods have not restarted since the deployment. It is still too soon to tell, but cycling these pods onto the new R720 nodes may have been sufficient to mitigate the issue. I am going to give it a full 24 hours before closing the issue.

Author
Owner

After 48 hours running on the new node, I am satisfied that this is resolved and that switching to the second NFS client was unnecessary.

eric closed this issue 2025-08-03 22:34:58 +00:00
Reference: DevOps/ansible-role-eom#37