Use variables in runbooks
Runbooks are a common documentation format used to instruct people to perform specific tasks. It is quite common to see instructions such as:
kubectl get pod | grep <service-name> # get the name of the pod
cmd1 -p <pod> -c <container name> # get id from 3rd column
cmd2 -p <pod> -c <container name> <id> # perform some operation
This is a terrible practice!!
The commands in the runbook are typically intended to be copy/pasted
by a human into a shell for execution. If they are written as
above, the human needs to expend cognitive effort to parse the
ouptut of get pod
and discover the name of the pod. They then
need to edit the subsequent commands manually. That takes time and
is error-prone! In addition, if the operator inadvertantly executes
a command before editing, the <
and >
symbols become redirect
operators which will probably result in undesirable effects on the
local filesystem and potentially run commands in your production
environment with completely unexpected inputs and unexpected
arguments. Many runbooks contain commands with multiple pieces
that the user must manually edit, and it is easy to miss a few when
responding during a time-sensitive, high-stress incident. These
issues can be mitigated by using shell variables directly in the
run book. For example, the above could be written:
container_name=<some-fixed-string>
pod=$(kubectl get pod | awk '/service-name/{print $1}')
id=$(cmd1 -p $pod -c $container_name | awk '{print $3}')
cmd2 -p $pod -c $container_name $id
With the above format, the user only needs to make a single edit (filling in the container name) before copy/pasting the remaining commands into a shell. Further improvements can be made which make the runbook a bit uglier to read, but much more robust against copy/paste errors. For example:
container_name=$(cmd-to-generate-container-name) &&
pod=$(kubectl get pod | awk '/service-name/{print $1}' | grep .) &&
id=$(cmd1 -p "${pod:?}" -c "${container_name:?}" | awk '{print $3}' | grep .) &&
cmd2 -p "${pod:?}" -c "${container_name:?}" "${id:?}"
This last example is a bit ugly, but the purpose of code
snippets is easy copy/paste; if the code becomes unreadable the
correct fix is to create better tooling. For example, a domain
specific tool with a name like get-pod-name
which behaves as
desired when $service
is empty or null so that the kubectl get
pod | awk | grep
pipeline can be replaced in the runbook with
pod=$(get-pod-name $service)
. Here, we’ve used the &&
connector
so that a human reader can copy the entire snippet into the clipboard
only once. This is convenient because many web environments include
a button to copy the entire snippet. If no separator is given,
it’s unclear if the commands will be executed with no separator
(in which case subsequent commands will not be executed but merely
passed to the first command as arguments) or a newline (in which
case subsequent commands will be executed unconditionally), so it
is preferrable to give an explicit separator. If you want the
reader to stop after a particular command, put it in a different
snippet so they don’t have to select text with a pointer device but
can take advantage of a ‘copy’ button provided by the web interface.
We should strive to make the code snippets in a run-book copy/pasteable
with zero modification. The awk | grep
pipeline is generally
an anti-pattern, but using grep
to produce a status code is
preferrable to generating a return status with awk
. You may be
tempted use PIPESTATUS to get the exit status of the prior command,
but this restricts the user to using a shell which supports that
and it is generally wise to avoid shell specific constructs in your
runbook. You may like bash
, but the person executing the commands
in the runbook may be using zsh
, or perhaps they are forced to
run the commands in an environment that has a minimal shell. A
better approach is to fix the tool so that the output of cmd1
does not need to be piped to a filter. Similarly, you can often
pass flags to kubectl
so that no post-processing is necessary.
Notice that none of the given examples contain any prompts. This is intentional. You will often see runbooks with snippets that contain prompts like:
bastion-us-east-1$ kubectl ...
to convey the machine on which the command is expected to be run, but doing this forces the user to edit the commands and renders the snippet less useful. If it is important to tell the reader where to run the command, either write that information outside the code snippet or include it as a comment in the snippet. Also, it is common practice to prefix the command with a comment character like:
# kubectl ...
so that the reader can paste the snippet without fear of accidental
execution. I argue against that practice for the same reason as
above; it forces a manual edit. Also, someone reading that snippet
may interpret your comment character as a prompt indicating that
the command should be executed with root privileges. If you safeguard
your snippet using ${var:?}
syntax and proper quotes, adding such
comments should not be necessary. Your commands should be idempotent
and safe to execute at any time. If the command you are suggesting is
potentially unsafe, fix your tooling so that it is safe! This is not always
possible, but if the operator is nervous about a command they can
always type the comment character before pasting. Doing that is
much simpler than editing after a paste.
In short, use shell variables and portable shell syntax in your runbook code snippets so that an operator can easily copy/paste the snippet directly into a shell without needing to make modifications. Make your code idempotent. Always remember that the person reading your runbook may be simultaneously handling multiple incidents at 0330 in the morning with very little sleep. It is very easy to make mistakes in that situation, and you should strive to decrease friction as much as possible.