I’m running Docker on Ubuntu Server, with around 50 containers, mostly administered via Portainer. Configuration files and small databases for the container applications are stored on the local SSD; media and larger files are stored on a NAS.
NAS data and the container folders are backed up.
I have a second, identical machine doing nothing. What would you recommend researching to add resilience to this setup? The top priority is quick and easy restoration should the SSD fail; everything else is relatively easy to replace.
I’ll create an SSD RAID, but I like the idea of a second host.
You can use Docker Swarm (or a more capable container orchestrator) to have the containers fail over to the second host automatically.
Swarm will also spread the load across both hosts, but all your data would need to be accessible from both hosts.
Thanks. That means I need to move all data off the hosts onto, say, a NAS; then the NAS becomes the single point of failure. Can I operate a swarm without doing that, but still duplicate everything from host 1 to host 2 so host 2 could take over relatively seamlessly (apart from local DNS and moving port forwarding to nginx on the remaining host)?
Yes, you could sync the two hosts’ data. You can also use both hosts as nginx upstreams.
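A minimal sketch of the sync half, assuming the container folders live under one directory (`/srv/docker/` here is a hypothetical path) and the standby machine is reachable over SSH as `standby-host` (also hypothetical). It only builds and runs an `rsync` command. Copying a live database file can capture an inconsistent state, so stopping the affected containers first, or syncing database dumps instead, is the safer choice.

```python
import shlex
import subprocess

# Hypothetical paths and host name: adjust to your own layout.
SOURCE_DIR = "/srv/docker/"   # container configs and small databases on the SSD
STANDBY = "standby-host"      # SSH host name of the second machine (hypothetical)
DEST_DIR = "/srv/docker/"


def build_rsync_cmd(src: str, host: str, dest: str, dry_run: bool = False) -> list[str]:
    """Return an rsync command that mirrors src onto host:dest over SSH."""
    cmd = [
        "rsync",
        "-aHv",           # archive mode, preserve hard links, list transferred files
        "--delete",       # remove files on the standby that were deleted here
        "--numeric-ids",  # keep UIDs/GIDs as numbers so container file ownership matches
        "-e", "ssh",
    ]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", f"{host}:{dest}"]
    return cmd


def sync(dry_run: bool = True) -> int:
    """Run the sync once and return rsync's exit code."""
    cmd = build_rsync_cmd(SOURCE_DIR, STANDBY, DEST_DIR, dry_run)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

You could call `sync(dry_run=False)` from a cron job or systemd timer on the active host; running it with `dry_run=True` first shows what `--delete` would remove on the standby.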
I think you can run a Ceph or GlusterFS cluster to share files across the nodes.
I think at least 3 nodes are required for that.
Thanks. Can I use my existing single Docker host to start a new swarm, or do I have to start from scratch?
Container orchestration is what you’re looking for. Kubernetes is the most popular, but it might be overkill; it’s hard to say from your setup. However, it’s definitely useful experience to know how to run it.
Thanks. Could I achieve a simple two-host solution with Kubernetes, though?
Nothing about k8s is simple. But yes, you can achieve that.
Take a look at Rancher for actually running a cluster.
I put my Docker containers on a mirrored ZFS pool and keep enough spare parts in case of breakdowns.
So you have Docker itself on a single host (with spare parts) and all the containers on fault-tolerant storage, and the most work you’d have to do in the event of a host drive failure is to reinstall the OS and Docker itself?
I have the OS (with Docker) mirrored too, so no reinstalling, just swapping disks or other parts in case of a failure. I hope. A motherboard swap causes the worst downtime: I have done one and needed to fiddle with the network settings before the server would come up again, because the network interface name had changed.
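A mirror only protects you until its second disk fails, so it helps to notice a degraded pool quickly. A minimal sketch in Python, assuming the `zpool` CLI is on the PATH: `zpool status -x` prints `all pools are healthy` when nothing is wrong and otherwise describes the affected pools. How you get alerted is left open; ZFS's own event daemon (ZED) can also send notifications.

```python
import subprocess

# `zpool status -x` prints exactly this line when every pool is fine.
HEALTHY_MSG = "all pools are healthy"


def pools_healthy(zpool_x_output: str) -> bool:
    """Return True if `zpool status -x` output reports no problems."""
    return zpool_x_output.strip() == HEALTHY_MSG


def check_pools() -> tuple[bool, str]:
    """Run `zpool status -x` and return (healthy, raw output)."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=True,
    )
    return pools_healthy(result.stdout), result.stdout
```

You could run `check_pools()` from cron or a systemd timer and forward the raw output to yourself through whatever alerting hook you already use whenever the first element is `False`.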
It might be enough to rsync everything to the secondary regularly, and have the inactive machine monitor the active one and start all services when the active machine stops responding.
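A minimal sketch of the monitoring half in Python, assuming the active host always answers on one TCP port (443 for nginx here) and that the synced stacks can be started on the standby with `docker compose up -d`. The address, port, thresholds, and compose file paths are all hypothetical. Requiring several consecutive failures avoids a takeover during a short blip, but a plain reachability check cannot tell a dead host from a network split, so both machines could end up running the same services.

```python
import socket
import subprocess
import time

# Hypothetical values: the active host's address and a port it always serves.
ACTIVE_HOST = "192.168.1.10"
ACTIVE_PORT = 443
FAILURES_BEFORE_TAKEOVER = 3  # tolerate brief blips such as an nginx restart
CHECK_INTERVAL_S = 30


def is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


class FailoverMonitor:
    """Counts consecutive failed checks and reports when to take over."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, up: bool) -> bool:
        """Record one check result; return True once the threshold is reached."""
        if up:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


def take_over(compose_files: list[str]) -> None:
    """Start each synced stack locally (paths are hypothetical)."""
    for f in compose_files:
        subprocess.run(["docker", "compose", "-f", f, "up", "-d"], check=True)


def watch(compose_files: list[str]) -> None:
    """Poll the active host until it has been down long enough, then take over."""
    monitor = FailoverMonitor(FAILURES_BEFORE_TAKEOVER)
    while True:
        if monitor.record(is_up(ACTIVE_HOST, ACTIVE_PORT)):
            take_over(compose_files)
            return  # stay active; failing back is left as a manual step
        time.sleep(CHECK_INTERVAL_S)
```

You could run `watch([...])` as a systemd service on the standby, passing the compose files synced from the active host. Failing back is deliberately manual, since after a takeover the standby holds the newer data.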
Learning K8s is a lot to take on, but it will pay off in the long term as your needs expand, and if you decide to go into infra/ops at work.
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:
Fewer Letters | More Letters
---|---
DNS | Domain Name Service/System
HTTP | Hypertext Transfer Protocol, the Web
NAS | Network-Attached Storage
k8s | Kubernetes container management package
nginx | Popular HTTP server
4 acronyms in this thread; the most compressed thread commented on today has 8 acronyms.
[Thread #162 for this sub, first seen 24th Sep 2023, 17:15]