(Having little experience of pets, and none of anonymous herds of cattle, perhaps “like blood cells, not like limbs” is a better analogy…)
This is a pretty common adage among people working at any kind of scale. I just wish it were more generally accepted.
Reboot your servers often. Destroy and rebuild them often. Poke them, prod them, ensure that your services continue.
The more you do this, the more confidence you can have in the resilience of your system.
High uptime was cool in the 90s. It’s also a measure of how long it’s been since you tested that the machine starts up OK. You want to test that before a critical downtime situation. The longer you leave it, the more likely you are to find that, oh, oh dear, an update broke your boot loader config, or you never actually booted into that kernel you installed a few months back, or nobody still at the company knows the password to the encrypted data partition.
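To make the uptime point a bit more concrete, here is a minimal Python sketch that treats uptime as "time since the last tested boot" and lists machines that have gone too long. The 30-day limit, the function names and the host names in the usage note are all made up for illustration; `/proc/uptime` is Linux-specific.

```python
from datetime import timedelta

# Assumption: 30 days is an arbitrary example limit, pick your own.
MAX_UNTESTED_BOOT = timedelta(days=30)


def read_uptime(path="/proc/uptime"):
    """Return this machine's uptime as a timedelta (Linux only).

    The first field of /proc/uptime is the uptime in seconds.
    """
    with open(path) as f:
        seconds = float(f.read().split()[0])
    return timedelta(seconds=seconds)


def overdue_for_reboot(uptimes, limit=MAX_UNTESTED_BOOT):
    """Given {hostname: uptime}, return the hosts whose uptime exceeds
    the limit, longest-running first."""
    overdue = [host for host, up in uptimes.items() if up > limit]
    return sorted(overdue, key=lambda host: uptimes[host], reverse=True)
```

Feeding it something like `{"web1": timedelta(days=3), "db1": timedelta(days=400)}` would single out `db1` as the machine whose boot process nobody has watched in over a year.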
You may claim to have a fancy high availability setup with keepalived or CARP etc. Do you know that it works? When did you last see it fail over?
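One way to answer that question is to run a failover drill on purpose, at a time of your choosing. The sketch below is a rough outline of such a drill in Python, not a real tool: `stop_primary`, `start_primary` and `service_is_up` are hypothetical callables you would supply yourself (for example, something that stops the primary node and something that makes a request against the shared address).

```python
import time


def failover_drill(stop_primary, start_primary, service_is_up,
                   timeout=30.0, interval=1.0):
    """Take the primary down, then poll until the service answers again.

    Returns True if the service responded within `timeout` seconds,
    False otherwise. The primary is always started again afterwards,
    even if the check raises.
    """
    stop_primary()
    try:
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if service_is_up():
                return True
            time.sleep(interval)
        return False
    finally:
        # Bring the primary back whatever the outcome.
        start_primary()
```

If this returns False on a quiet Tuesday afternoon, that is a much better time to find out than during a real outage.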
“But I’ve only got one server!”
Resource constraints are a thing, I know, but still.
Can you offload to SaaS/PaaS/IaaS? Can you run a second server?
What’s the real cost of it being down? Determining this is hard. Inevitably you’ll have forgotten to back something up; some small piece of critical data like a config file you manually tweaked once at install time and forgot about.
You do not want to find yourself in the situation where you’re not applying security updates to a machine because you’re scared they’ll break it. That way lies misery and ruin.
Yes, in pure billing terms, renting one beefy server and running everything on it is cheaper than a couple of PaaS applications, or even just several smaller servers. Can you at least use some form of containment so the things you run on it are isolated enough to easily run elsewhere?
Yes, there are environments where this approach isn’t practical. Ask yourself if yours is really one of them though.
If you’re concerned because you’ve no idea what would happen if one of your servers suddenly died, maybe it’s time to treat Fido with a bit less personal attachment.