Some notes on wasting less time on computers

If I seem to be on the right track I can write these up more formally.

The problem

We have two things we want to safeguard: data and time. How can we keep our data safe and accessible while wasting as little time as possible, especially the unpredictable time lost to emergencies? When things do go wrong, how can we recover quickly?


The key is to realise that the important and unique things are our data and time. Hardware is neither. The market provides cheap, consistent (but not always reliable) commodity hardware; we need to combine it with high-quality software and processes to achieve a reliable system.

Elements of a solution

  1. Simplicity. Setup and recovery in particular should be simple. The essential information you need to access things in an emergency should fit on a business card. The systems must be simple to set up and administer, so that we spend almost all our time working, not staying afloat.

  2. Repetition. Use the same hardware and software everywhere as far as possible. Use the same processes everywhere too, so that there are few or no emergency-only procedures that are easy to forget.

  3. Automation. Backups and reinstalls should be automated as fully as possible. Where they aren't, they should be machine-guided and thoroughly documented. This is also the key to ease of use: if everything is automated, there's nothing to remember day to day. If everything is documented, you can understand it quickly when it goes wrong.

  4. Redundancy. Redundant hardware, and redundant software checks, eliminate single points of failure. Redundant hardware can be reused when needed (e.g. turning a server into a desktop machine in an emergency).

  5. Distribution and networking. Data should be replicated in multiple places (on disconnected machines, on a central server, and off-site). Again, this should be fully automatic. Servers should serve not just data, but also applications: if your laptop breaks, you should be able to boot an arbitrary machine from a CD or USB key, and continue working remotely on the server.

  6. Compatibility. Without compromising reliability or security, we should interwork easily with the rest of the world. That means using protocols and formats that let us, with little effort, both exchange data with others and commandeer standard hardware and software for our needs (e.g. web browser access to files and email, NoX client for remote login).

  7. Security. We need to keep our data safe so that we avoid fraud, IP and identity theft, and also fulfil our various legal obligations.

  8. Monitoring. We should be able to tell at a glance how our hardware is performing, and we should be notified automatically of problems. Of especial interest are incipient disk failure and network bandwidth problems.
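
To make points 5 and 8 a little more concrete, here is a minimal sketch of an automatic replica check. It illustrates the idea only and is not a proposed tool: the function names (check_replicas, file_digest) and the assumption that each replica is an ordinary directory mirroring the primary's layout are mine, made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_replicas(primary: Path, replicas: list[Path]) -> list[str]:
    """Compare every file under `primary` with each replica directory.

    Returns a list of human-readable problems; an empty list means
    every replica holds an identical copy of every primary file.
    (Hypothetical helper: assumes replicas mirror the primary's layout.)
    """
    problems = []
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        expected = file_digest(src)
        for replica in replicas:
            copy = replica / rel
            if not copy.is_file():
                problems.append(f"missing: {copy}")
            elif file_digest(copy) != expected:
                problems.append(f"differs: {copy}")
    return problems
```

Run from a scheduled job, something like this could send a notification whenever the returned list is non-empty, so that a stale or damaged copy is noticed without anyone having to remember to look.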

This is about principles, not technologies, although it is informed by the state of the market. If we can agree on what we're trying to achieve, and in what manner, then I can write an infrastructure plan imbued with those values to satisfy those goals.

Last updated 2006/05/22