PLEASE NOTE: This document applies to an unreleased version of Crossplane. It is strongly recommended that you only use official releases of Crossplane, as unreleased versions are subject to changes and incompatibilities that will not be supported in the official releases.
If you are using an official release version of Crossplane, you should refer to the documentation for your specific version.
Documentation for other releases can be found by using the version selector in the top right of any doc page.

Observability is crucial to Crossplane users, both those operating Crossplane and those using Crossplane to operate their infrastructure. Crossplane currently approaches observability via Kubernetes events and structured logs.
In short, both a non-admin user and an admin user should be able to debug any issue by inspecting only logs and events. There should be no need to rebuild the Crossplane binary or to reach out to a Crossplane developer.
A user should be able to:
A cluster admin should be able to:
Error reporting in the logs is mostly intended for consumption by Crossplane cluster admins. A cluster admin should be able to debug any issue by inspecting the logs, without needing to add more logs themselves or contact a Crossplane developer. This means that logs should contain:
Error reporting as Kubernetes events is primarily aimed at end users of Crossplane who are not cluster admins. Crossplane typically runs as a Kubernetes pod, so most users of Crossplane are unlikely to have access to its logs. Events, on the other hand, are available as top-level Kubernetes objects, and show up on the objects they relate to when running kubectl describe.
Events should be recorded in the following cases:
The events recorded in these cases can be thought of as forming an event log of things that happen for the resources that Crossplane manages. Each event should refer back to the relevant controller and resource, and use other fields of the Event kind as appropriate.
Examples of how to interact with events can be found in the guide to debugging an application cluster.
There are many ways to report errors, such as:
It can be confusing to figure out which one is appropriate in a given situation. This section will try to offer advice and a mindset that can be used to help make this decision.
Let’s set the context by listing the different user scenarios where error reporting may be consumed. Here are the typical scenarios as we imagine them:
The goal is to satisfy the users in all of the scenarios. We’ll refer to the scenarios by number.
The short version is: we should do whatever satisfies all of the scenarios. Logging and events are the recommendations for satisfying the scenarios, although they don’t cover scenario 2.
The longer version is:
To decide whether to log, we believe it helps to visualize which of the scenarios the error or information in question will be used for. We recommend starting by reporting as much information as possible, but with configurable runtime behavior so that, for example, debug logs don’t normally show up in production.
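As a minimal sketch of "configurable runtime behavior", the Go program below gates debug output behind a setting read once at startup. The Logger type, the render helper, and the SKETCH_DEBUG environment variable are all hypothetical names invented for this illustration; this is not the crossplane-runtime logging API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// Logger is a hypothetical two-level logger used only for this sketch.
// It is NOT the crossplane-runtime logging API.
type Logger struct {
	out   io.Writer
	debug bool // decided once at startup, not at each call site
}

// Info always writes the message.
func (l Logger) Info(msg string, keysAndValues ...interface{}) {
	l.write("INFO", msg, keysAndValues)
}

// Debug writes the message only when debug logging is enabled.
func (l Logger) Debug(msg string, keysAndValues ...interface{}) {
	if l.debug {
		l.write("DEBUG", msg, keysAndValues)
	}
}

// write renders the level, the quoted message, and key=value pairs on one line.
func (l Logger) write(level, msg string, kv []interface{}) {
	line := fmt.Sprintf("%s %q", level, msg)
	for i := 0; i+1 < len(kv); i += 2 {
		line += fmt.Sprintf(" %v=%v", kv[i], kv[i+1])
	}
	fmt.Fprintln(l.out, line)
}

// render runs the same code path with debug on or off and returns what was logged.
func render(debug bool) string {
	var buf bytes.Buffer
	l := Logger{out: &buf, debug: debug}
	l.Debug("observed external resource", "uid", "1234")
	l.Info("cannot connect to API server", "error", "timeout")
	return buf.String()
}

func main() {
	// SKETCH_DEBUG is a made-up switch for this example.
	debug := os.Getenv("SKETCH_DEBUG") == "true"
	fmt.Print(render(debug))
}
```

The call sites report everything they know; whether the debug line is actually emitted is a deployment decision rather than a code change.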
As for what constitutes an error: errors should be actionable by a human. See the Dave Cheney article on this topic for some more discussion.
Crossplane provides two observability libraries as part of crossplane-runtime:

- event emits Kubernetes events.
- logging produces structured logs. Refer to its package documentation for additional context on its API choices.

Keep the following in mind when using the above libraries:

- Instantiate loggers and event recorders in main() and plumb them down to where they’re needed.
- Each Reconciler implementation should use its own logging.Logger and event.Recorder. Implementations are strongly encouraged to default to using logging.NewNopLogger() and event.NewNopRecorder(), and accept a functional logger and recorder via variadic options. See for example the managed resource reconciler.
- Each controller should include its name under the controller structured logging key. The controller’s name should be of the form controllertype/resourcekind, for example managed/cloudsqlinstance or stacks/stackdefinition. Controller names should always be lowercase.
- Logs and events should be emitted by the Reconcile method of the Reconciler implementation, not by functions called by Reconcile. Author the methods orchestrated by Reconcile as if they were a library; prefer surfacing useful information for the Reconciler to log (for example by wrapping errors) over plumbing loggers and event recorders down to increasingly deeper layers of code.
- Log errors under the structured logging key error (e.g. log.Debug("boom!", "error", err)). Many logging implementations (including Crossplane’s) add context like stack traces for this key.
- [...] CamelCase.
- Prefer emitting logs and events when a terminal condition is reached (e.g. when Reconcile returns) over logging logic flow. That is, prefer one log line that reads “encountered an error fooing the bar” over two log lines that read “about to foo the bar” and “encountered an error”. Recall that if the audience is a developer debugging Crossplane they will be provided a stack trace with file and line context when an error is logged.
- Include the reconcile.Request, and the resource’s UID and resource version (not API version) under the keys request, uid, and version. Doing so allows log readers to determine what specific version of a resource the log pertains to.

Finally, when in doubt, aim for consistency with existing Crossplane controller implementations.
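The guidance above can be sketched in Go roughly as follows. To stay self-contained, the sketch defines its own Logger and Recorder interfaces. These are simplified stand-ins and assumptions, not the real crossplane-runtime logging.Logger and event.Recorder APIs. The Resource type, the in-memory memLogger and memRecorder, the observe helper, and the event reason are likewise hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Logger and Recorder are simplified stand-ins for this sketch only, NOT the crossplane-runtime APIs.
type Logger interface {
	Debug(msg string, keysAndValues ...interface{})
	WithValues(keysAndValues ...interface{}) Logger
}
type Recorder interface {
	Event(eventType, reason, message string)
}

// No-op implementations: the safe defaults for a Reconciler.
type nopLogger struct{}
type nopRecorder struct{}

func (nopLogger) Debug(string, ...interface{}) {}
func (l nopLogger) WithValues(...interface{}) Logger { return l }
func (nopRecorder) Event(string, string, string) {}

// memLogger and memRecorder keep output in memory so the sketch is observable.
type memLogger struct {
	kv    []interface{}
	lines *[]string
}
type memRecorder struct{ events *[]string }

func (l memLogger) Debug(msg string, kv ...interface{}) {
	all := append(append([]interface{}{}, l.kv...), kv...)
	for i := 0; i+1 < len(all); i += 2 {
		msg += fmt.Sprintf(" %v=%v", all[i], all[i+1])
	}
	*l.lines = append(*l.lines, msg)
}
func (l memLogger) WithValues(kv ...interface{}) Logger {
	return memLogger{kv: append(append([]interface{}{}, l.kv...), kv...), lines: l.lines}
}
func (r memRecorder) Event(eventType, reason, message string) {
	*r.events = append(*r.events, eventType+" "+reason+" "+message)
}

// Resource is a hypothetical, heavily reduced managed resource.
type Resource struct{ Name, UID, ResourceVersion, ExternalName string }

// Reconciler owns its own logger and recorder.
type Reconciler struct {
	log    Logger
	record Recorder
}
type Option func(*Reconciler)

func WithLogger(l Logger) Option     { return func(r *Reconciler) { r.log = l } }
func WithRecorder(e Recorder) Option { return func(r *Reconciler) { r.record = e } }

// NewReconciler defaults to no-op observability; callers opt in via options.
func NewReconciler(opts ...Option) *Reconciler {
	r := &Reconciler{log: nopLogger{}, record: nopRecorder{}}
	for _, o := range opts {
		o(r)
	}
	return r
}

var errNoExternalName = errors.New("external name is not set")

// observe is authored like library code: it wraps errors and never logs.
func observe(res Resource) error {
	if res.ExternalName == "" {
		return fmt.Errorf("cannot observe external resource: %w", errNoExternalName)
	}
	return nil
}

// Reconcile is the only place that logs and records events.
func (r *Reconciler) Reconcile(res Resource) error {
	log := r.log.WithValues("request", res.Name, "uid", res.UID, "version", res.ResourceVersion)
	if err := observe(res); err != nil {
		log.Debug("Cannot reconcile resource", "error", err)
		r.record.Event("Warning", "CannotObserveExternalResource", err.Error())
		return err
	}
	log.Debug("Successfully reconciled resource")
	return nil
}

func main() {
	// The logger and recorder are created in main() and plumbed down.
	var lines, events []string
	r := NewReconciler(
		WithLogger(memLogger{lines: &lines}.WithValues("controller", "managed/cloudsqlinstance")),
		WithRecorder(memRecorder{events: &events}),
	)
	_ = r.Reconcile(Resource{Name: "default/db", UID: "1234", ResourceVersion: "42"})
	fmt.Printf("%s\n%s\n", lines[0], events[0])
}
```

The failure produces exactly one log line and one event, both emitted from Reconcile. The helper contributes context only by wrapping the error it returns.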