Use variables in runbooks

Runbooks are a common documentation format used to instruct people to perform specific tasks. It is quite common to see instructions such as:

kubectl get pod | grep <service-name>  # get the name of the pod
cmd1 -p <pod> -c <container name>      # get id from 3rd column
cmd2 -p <pod> -c <container name> <id> # perform some operation

This is a terrible practice!!

The commands in a runbook are typically intended to be copied and pasted by a human into a shell for execution. If they are written as above, the human needs to expend cognitive effort to parse the output of get pod and discover the name of the pod. They then need to edit the subsequent commands manually. That takes time and is error-prone! In addition, if the operator inadvertently executes a command before editing it, the shell treats the < and > symbols as redirection operators. The result may be a confusing error, unwanted changes to files on the local filesystem, or a command run against your production environment with unexpected input and arguments. Many runbooks contain commands with multiple pieces that the user must manually edit, and it is easy to miss a few when responding to a time-sensitive, high-stress incident. These issues can be mitigated by using shell variables directly in the runbook. For example, the above could be written:

container_name=<some-fixed-string>
pod=$(kubectl get pod | awk '/service-name/{print $1}')
id=$(cmd1 -p "$pod" -c "$container_name" | awk '{print $3}')
cmd2 -p "$pod" -c "$container_name" "$id"

With the above format, the user only needs to make a single edit (filling in the container name) before copy/pasting the remaining commands into a shell. Further improvements can be made which make the runbook a bit uglier to read, but much more robust against copy/paste errors. For example:

container_name=$(cmd-to-generate-container-name) &&
pod=$(kubectl get pod | awk '/service-name/{print $1}' | grep .) &&
id=$(cmd1 -p "${pod:?}" -c "${container_name:?}" | awk '{print $3}' | grep .) &&
cmd2 -p "${pod:?}" -c "${container_name:?}" "${id:?}"

This last example is a bit ugly, but the purpose of code snippets is easy copy/paste; if the code becomes unreadable, the correct fix is to create better tooling. For example, a domain-specific tool with a name like get-pod-name, which behaves as desired when $service is empty or null, would let the kubectl get pod | awk | grep pipeline be replaced in the runbook with pod=$(get-pod-name $service).

Here, we’ve used the && connector so that a human reader can copy the entire snippet to the clipboard in one step. This is convenient because many web environments include a button to copy the entire snippet. If no separator is given, it’s unclear whether the lines will be pasted joined together (in which case subsequent commands will not be executed but merely passed to the first command as arguments) or separated by newlines (in which case subsequent commands will be executed unconditionally), so it is preferable to give an explicit separator. If you want the reader to stop after a particular command, put it in a different snippet so they don’t have to select text with a pointing device but can take advantage of a ‘copy’ button provided by the web interface.
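As a rough sketch of the kind of helper described above, the function below wraps the pod lookup and refuses to run with an empty service name. Everything about it is an assumption for illustration: the name get_pod_name (written with underscores, because hyphens in function names are not portable across shells), the exit codes, and the idea that the pod name is the first column of kubectl get pod output.

```shell
# Hypothetical helper: print the name of the first pod whose name contains
# the given service name. Exits non-zero and prints nothing on stdout if
# the service name is empty or no pod matches.
get_pod_name() {
    if [ -z "${1:-}" ]; then
        echo "usage: get_pod_name SERVICE" >&2
        return 2
    fi
    # index() does a literal substring match, so the service name is not
    # interpreted as a regular expression. The trailing `grep .` turns
    # "no output" into a non-zero exit status.
    kubectl get pod |
        awk -v svc="$1" 'index($1, svc) { print $1; exit }' |
        grep .
}
```

With such a helper defined, the runbook line could shrink to something like `pod=$(get_pod_name "$service") &&`, and an empty $service would stop the chain instead of matching every pod.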

We should strive to make the code snippets in a runbook copy/pasteable with zero modification. The awk | grep pipeline is generally an anti-pattern, but using grep to produce a status code is preferable to generating a return status with awk. You may be tempted to use PIPESTATUS to get the exit status of an earlier command in the pipeline, but this restricts the user to a shell that supports it, and it is generally wise to avoid shell-specific constructs in your runbook. You may like bash, but the person executing the commands in the runbook may be using zsh, or perhaps they are forced to run the commands in an environment that has a minimal shell. A better approach is to fix the tool so that the output of cmd1 does not need to be piped to a filter. Similarly, you can often pass flags to kubectl so that no post-processing is necessary.
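The effect of the trailing grep . can be checked without a cluster. In the sketch below, a fixed string stands in for the output of kubectl get pod (an assumption for illustration, as are the variable names), and the listing deliberately contains no matching pod.

```shell
# Stand-in for `kubectl get pod` output; no line matches service-name.
listing='NAME READY STATUS
db-0 1/1 Running'

# awk alone exits 0 even when nothing matches, so a following `&&`
# would carry on with an empty $pod.
if pod=$(printf '%s\n' "$listing" | awk '/service-name/{print $1}'); then
    awk_only="continues with pod='$pod'"
else
    awk_only="stops"
fi

# grep . exits non-zero when it receives no non-empty line, so the
# assignment fails and a following `&&` would not run.
if pod=$(printf '%s\n' "$listing" | awk '/service-name/{print $1}' | grep .); then
    with_grep="continues with pod='$pod'"
else
    with_grep="stops"
fi

echo "awk only:     $awk_only"
echo "awk | grep .: $with_grep"
```

This uses only the exit status of the last command in the pipeline, which every POSIX shell reports, so it does not depend on PIPESTATUS or any other shell-specific feature.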

Notice that none of the given examples contain any prompts. This is intentional. You will often see runbooks with snippets that contain prompts like:

bastion-us-east-1$ kubectl ...

to convey the machine on which the command is expected to be run, but doing this forces the user to edit the commands and renders the snippet less useful. If it is important to tell the reader where to run the command, either write that information outside the code snippet or include it as a comment in the snippet. Also, it is common practice to prefix the command with a comment character like:

# kubectl ...

so that the reader can paste the snippet without fear of accidental execution. I argue against that practice for the same reason as above; it forces a manual edit. Also, someone reading that snippet may interpret your comment character as a prompt indicating that the command should be executed with root privileges. If you safeguard your snippet using ${var:?} syntax and proper quotes, adding such comments should not be necessary. Your commands should be idempotent and safe to execute at any time. If the command you are suggesting is potentially unsafe, fix your tooling so that it is safe! This is not always possible, but if the operator is nervous about a command they can always type the comment character before pasting. Doing that is much simpler than editing after a paste.
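To illustrate the safeguard that ${var:?} provides, the sketch below runs the same command once with pod unset and once with it set. The echo is a placeholder for a real operation, and the pod name and message text are made up for the example.

```shell
unset pod

# The parentheses run the command in a subshell, so a failed expansion
# ends only the subshell rather than the script running this demo.
if (echo "would operate on ${pod:?pod is not set}") 2>/dev/null; then
    unset_result="command ran"
else
    unset_result="command refused"
fi

pod=web-7d9f   # hypothetical pod name
if (echo "would operate on ${pod:?pod is not set}") >/dev/null; then
    set_result="command ran"
else
    set_result="command refused"
fi

echo "pod unset: $unset_result"
echo "pod set:   $set_result"
```

When the variable is unset or empty, the shell reports the error and the command never starts, which is the property that makes a pasted snippet safe even if an earlier step was skipped.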

In short, use shell variables and portable shell syntax in your runbook code snippets so that an operator can easily copy/paste the snippet directly into a shell without needing to make modifications. Make your code idempotent. Always remember that the person reading your runbook may be simultaneously handling multiple incidents at 0330 with very little sleep. It is very easy to make mistakes in that situation, and you should strive to decrease friction as much as possible.