Lucas Luitjes

Freelance dev/devops/security.

ClickOps doesn't have to be terrible

09 Dec 2023

Look, I get it. Like most backend developers who started out in small organizations in the 00s, I set up my share of servers and VMs by hand. I had enough technical skills to get something up and running, but no clue about the non-technical side of system administration. The results were unreliable and always a pain to maintain; Infrastructure as Code is a giant improvement over those days.

That said, for small organizations, building and maintaining an Infrastructure as Code (IaC) setup carries significant risks and downsides, especially if your team isn’t already very familiar with those tools. I don’t think this is a particularly controversial statement. Even Yevgeniy Brikman, author of the O’Reilly book “Terraform: Up and Running” and co-founder of an IaC startup, wrote:

It might be slightly heretical for the author of a book on Terraform to say this, but not every team needs IaC. Adopting IaC has a relatively high cost, and although it will pay off in the long term for some scenarios, it won’t for others; for example, if you’re at a tiny startup with just one Ops person, or you’re working on a prototype that might be thrown away in a few months, or you’re just working on a side project for fun, managing infrastructure by hand is often the right choice.

I want to talk about the middle ground: automation where appropriate, and the processes and procedures that great sysadmins used before we had tools like Docker, Kubernetes, and Terraform. I got lucky and learned from some amazing ops people, right around the time we started calling them DevOps. Here’s what I learned from them:

  • Document everything and have processes in place to keep documentation up to date. Focus on checklists and brief overviews, not lengthy formal documents.
  • Write, test, and use runbooks for infrastructure changes.
  • Don’t blindly automate, but use business requirements to determine the right level of automation (Do you need auto-scaling if your load never changes?).
  • Set up solid logging, preferably centralized and searchable. If you’re short on time and don’t have a huge amount of logs, forwarding them with rsyslog to a SaaS platform like Papertrail is very easy.
  • For servers and VMs: do configuration management with lightweight tools like Ansible.
  • Set up basic monitoring and alerting; extend it if high availability or reliability matters for the use case (for some applications, a bit of downtime is tolerable, but nobody wants to deal with a server whose disk is so full that you can’t SSH into it).
  • Have a disaster recovery plan, and have a process in place to verify that it works (You don’t know if your backups are really there unless you actually verify that you can restore from them).
  • If you depend on manual procedures that are not currently viable targets for automation, set up solid processes to ensure those procedures are executed on time.
  • Make sure there are clear lines of communication with all stakeholders, including developers, management, support, and users.
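The “disk so full you can’t SSH into it” scenario is cheap to guard against. Here is a minimal sketch of a disk-space check in Python, assuming you run it from cron or from a monitoring agent that understands Nagios-style exit codes (0 = OK, 2 = CRITICAL). The function names and the 90% threshold are illustrative assumptions, not part of any specific tool.

```python
import shutil

# Illustrative threshold (an assumption; tune it per server): alert at 90% full.
DEFAULT_THRESHOLD = 0.90


def disk_usage_ratio(path: str) -> float:
    """Fraction of the filesystem holding `path` that is in use.

    Computed as used / (used + free), roughly how `df` computes its Use%
    column: blocks reserved for root are left out of the total.
    """
    usage = shutil.disk_usage(path)
    capacity = usage.used + usage.free
    return usage.used / capacity if capacity else 1.0


def check_disk(path: str = "/", threshold: float = DEFAULT_THRESHOLD) -> tuple[int, str]:
    """Return (exit_code, message) following the Nagios plugin convention:
    0 = OK, 2 = CRITICAL."""
    ratio = disk_usage_ratio(path)
    if ratio >= threshold:
        return 2, f"CRITICAL: {path} is {ratio:.1%} full (threshold {threshold:.0%})"
    return 0, f"OK: {path} is {ratio:.1%} full"


def main(argv: list[str]) -> int:
    """Check the path given as the first argument (default "/"), print one
    status line, and return the exit code for the caller to use."""
    path = argv[0] if argv else "/"
    code, message = check_disk(path)
    print(message)
    return code
```

Saved as a script with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, it prints one status line per run and exits non-zero when the filesystem is nearly full, which is enough for most alerting setups to pick up.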
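For manual procedures that have to happen on a schedule (certificate renewals, key rotations, restore tests), even a tiny script that flags overdue items beats relying on memory. This is a hypothetical sketch: the `ManualTask` structure, the example task names, and the seven-day warning window are assumptions for illustration, and the task list could just as well live in a spreadsheet or next to your runbooks.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ManualTask:
    """A recurring manual procedure (hypothetical structure for illustration)."""
    name: str
    last_done: date
    interval: timedelta

    def due_date(self) -> date:
        return self.last_done + self.interval


def overdue_tasks(
    tasks: list[ManualTask],
    today: date,
    warn_ahead: timedelta = timedelta(days=7),
) -> list[tuple[str, str]]:
    """Return (name, status) for tasks that are overdue or due within
    `warn_ahead`, earliest due date first. Tasks further out are omitted."""
    report = []
    for task in sorted(tasks, key=lambda t: t.due_date()):
        due = task.due_date()
        if due < today:
            report.append((task.name, f"OVERDUE since {due.isoformat()}"))
        elif due - today <= warn_ahead:
            report.append((task.name, f"due {due.isoformat()}"))
    return report


if __name__ == "__main__":
    # Hypothetical task list; dates and intervals are made up for the example.
    tasks = [
        ManualTask("Renew wildcard TLS certificate", date(2023, 3, 1), timedelta(days=365)),
        ManualTask("Test restore of database backup", date(2023, 10, 1), timedelta(days=90)),
        ManualTask("Rotate third-party API key", date(2023, 6, 1), timedelta(days=180)),
    ]
    for name, status in overdue_tasks(tasks, today=date.today()):
        print(f"{name}: {status}")
```

Run daily (for example from cron) with the output sent somewhere people actually read, this turns “someone should remember to do X” into a visible reminder; an empty report means nothing needs attention.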

Do your ClickOps like this, and:

  • You will have solid, reliable infrastructure.
  • Once your organization scales to the point where IaC is actually worth it, you’re in an excellent position to migrate quickly.
  • New types of infrastructure are quicker to set up. You only need to understand the infrastructure, not the provisioning tools around it.
  • Setting up similar infrastructure is not as fast as with IaC, but since you documented everything it’s still pretty quick.
  • You won’t have to spend time keeping your IaC setup up to date with the rapid pace of modern infrastructure tools.
  • You won’t need to look for workarounds like local-exec or custom provisioners if you run into a cloud feature not yet supported by your favorite IaC tool.
  • People have been doing and writing about basic system administration for several decades. The tools are extremely mature and well-documented. Any problem you’re likely to run into at pre-IaC scale, somebody will have posted a solution online.
  • Bonus benefit: Being familiar with this workflow means you don’t develop blind spots for problems that IaC tools mostly solve (e.g., believing you have backups because Tool X said you did).
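“Tool X says you have backups” and “you can restore from your backups” are different claims. Below is a minimal sketch of an automated restore test, assuming backups are tar archives and that a SHA-256 manifest of the source files was recorded when each backup was taken (both are assumptions, and the function names are hypothetical). It actually extracts the archive into a scratch directory and compares what came back against the manifest.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 hash."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(archive: Path, expected: dict[str, str]) -> list[str]:
    """Extract `archive` into a temporary directory and compare the result with
    the expected manifest. Returns a list of problems; empty means OK."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            if hasattr(tarfile, "data_filter"):
                # Python versions with extraction filters: reject absolute
                # paths, "../" tricks and other unsafe archive members.
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        restored = manifest(Path(scratch))
    problems = [f"missing: {name}" for name in expected if name not in restored]
    problems += [
        f"checksum mismatch: {name}"
        for name, digest in expected.items()
        if name in restored and restored[name] != digest
    ]
    return problems
```

For databases, the same idea applies: restore the dump into a scratch instance and run a few sanity queries against it. Only a completed restore counts as evidence that the backup exists.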

I think a lot of developers started like me, but weren’t exposed to experienced sysadmins before moving to IaC tools. And I think a lot of that knowledge is getting lost, snowed under in the avalanche of content about IaC tools for larger organizations.

Whenever I set this up for clients, they seem pleasantly surprised that it’s possible to have good infrastructure without a ton of Kubernetes clusters and Terraform resources. Which makes me wonder: is this a topic people are interested in?

I’m considering creating some content (videos? books? a course? specialized consulting business? help me decide!) on how to professionalize your infrastructure when you’re not quite ready for fully automated Infrastructure as Code.

If you’re interested, leave your email address below. I’ll send you sneak previews and ask you the occasional question about what topics you care about most, and what form it should take.