NextCloud crashing #37
NextCloud (along with several other services) is crashing frequently, seemingly as a result of Redis crashing. Here are some log messages:
Redis:
NextCloud:
If there are too many file pointers open, adding additional processors to the stack may not be helpful. It would depend on a number of factors...
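One way to test the open-file hypothesis before changing the hardware is to compare a process's open descriptor count against its limit. The following is a minimal sketch that assumes a Linux host with `/proc` mounted; the helper name `fd_usage` is hypothetical, and inspecting another user's process (such as Redis) would typically need elevated privileges.

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from /proc (Linux only).

    soft_limit is None when the limit is reported as "unlimited".
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            # Line format: "Max open files  <soft>  <hard>  files"
            if line.startswith("Max open files"):
                soft = line.split()[3]
                soft_limit = None if soft == "unlimited" else int(soft)
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open descriptors: {used}, soft limit: {limit}")
```

If the count sits close to the soft limit on the Redis process, the limit itself is the more likely culprit than CPU capacity.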
If I added another VM, hosted on the R720, to the cluster and ran Redis on that new instance, Redis could use the underlying filesystem of that machine, thus alleviating disk pressure.
Using an NFS Client in a Kubernetes Helm Chart
In this first pass at addressing the issue described above, I attempted to configure the Bitnami Helm chart for Redis to use the NFS client deployed in DevOps/software-infrastructure#23. Unfortunately, I ran into two problems. First, the Redis deployment uses a Kubernetes StatefulSet, which does not allow the storage class of its volume claims to be changed in place. I believe I will need to delete this StatefulSet in order to use the new NFS provisioner; however, I am apprehensive about doing so for fear of creating an unrecoverable situation. The attempted change also caused the Redis pods to be rescheduled onto the R720 nodes, so I am curious to see whether this unintended side effect resolves the problem. Second, the update surfaced an incompatibility between PostgreSQL versions. I was able to pin the required version, but this can only be a temporary solution. I will need to check how Nextcloud recommends upgrading PostgreSQL versions when using Helm.
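For reference, the storage class override attempted above could be generated as a values file along these lines. This is only a sketch: the key paths (`master.persistence.storageClass`, `replica.persistence.storageClass`) assume a recent Bitnami Redis chart layout and should be checked against `helm show values` for the chart version in use, and the class name `nfs-client` is a placeholder for whatever the provisioner actually registered. If Redis is installed as a subchart, the whole mapping would need to be nested under the subchart's key.

```python
import json


def redis_storage_overrides(storage_class):
    """Build a Helm values override that points Redis persistence at one storage class.

    Key paths are an assumption about the Bitnami Redis chart layout.
    """
    persistence = {"persistence": {"storageClass": storage_class}}
    return {
        "master": dict(persistence),
        "replica": dict(persistence),
    }


if __name__ == "__main__":
    # "nfs-client" is a hypothetical storage class name.
    # Helm accepts JSON values files because JSON is valid YAML.
    print(json.dumps(redis_storage_overrides("nfs-client"), indent=2))
```

Because the StatefulSet rejects changes to its volume claim templates, applying such an override to an existing release is expected to fail until the StatefulSet is recreated.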
The Redis pods have not had a restart since the deployment. It is still too soon to tell, but it seems that having these pods cycled to the new R720 nodes may have been sufficient to mitigate the issue. I am going to give it a full 24 hours before closing the issue.
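To watch for restarts over the observation window without checking by hand, the restart counters can be read from the pod status. The sketch below sums `restartCount` per pod from the JSON that `kubectl get pods -o json` returns; the function names are hypothetical, and the namespace and label selector passed to `fetch_pods` are assumptions that depend on how the release was installed.

```python
import json
import subprocess


def restart_counts(pod_list):
    """Map pod name -> total container restarts from a `kubectl get pods -o json` document."""
    counts = {}
    for pod in pod_list.get("items", []):
        # containerStatuses can be missing while a pod is still being scheduled.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        counts[pod["metadata"]["name"]] = sum(
            status.get("restartCount", 0) for status in statuses
        )
    return counts


def fetch_pods(namespace, selector):
    """Run kubectl and return the parsed pod list (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A call such as `restart_counts(fetch_pods("nextcloud", "app.kubernetes.io/name=redis"))` would then report the counters; both arguments here are placeholders.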
After 48 hours running on the new node, I am satisfied that this is now working and that switching to the second NFS client was unnecessary.