PLEASE NOTE: This document applies to an unreleased version of Crossplane. It is strongly recommended that you only use official releases of Crossplane, as unreleased versions are subject to changes and incompatibilities that will not be supported in the official releases.

If you are using an official release version of Crossplane, you should refer to the documentation for your specific version.

Observability Developer Guide

Introduction

Observability is crucial to Crossplane users, both those operating Crossplane and those using Crossplane to operate their infrastructure. Crossplane currently approaches observability via Kubernetes events and structured logs.

Goals

In short, both a non-admin user and an admin user should be able to debug any issue by inspecting only logs and events. There should be no need to rebuild the Crossplane binary or to reach out to a Crossplane developer.

A user should be able to:

A cluster admin should be able to:

Error reporting in the logs

Error reporting in the logs is mostly intended for consumption by Crossplane cluster admins. A cluster admin should be able to debug any issue by inspecting the logs, without needing to add more logs themselves or contact a Crossplane developer. This means that logs should contain:

Error reporting as events

Error reporting as Kubernetes events is primarily aimed at end users of Crossplane who are not cluster admins. Crossplane typically runs as a Kubernetes pod, so it is unlikely that most users of Crossplane will have access to its logs. Events, on the other hand, are available as top-level Kubernetes objects, and show up on the objects they relate to when running kubectl describe.

Events should be recorded in the following cases:

The events recorded in these cases can be thought of as forming an event log of things that happen for the resources that Crossplane manages. Each event should refer back to the relevant controller and resource, and use other fields of the Event kind as appropriate.

More details and examples of how to interact with events can be found in the guide to debugging an application cluster.

Choosing between methods of error reporting

There are many ways to report errors, such as:

It can be confusing to figure out which one is appropriate in a given situation. This section offers advice and a mindset to help make that decision.

Let’s set the context by listing the different user scenarios where error reporting may be consumed. Here are the typical scenarios as we imagine them:

  1. A person using a system needs to figure out why things aren’t working as expected, and whether they made a mistake that they can correct.
  2. A person operating a service needs to monitor the service’s health, both now and historically.
  3. A person debugging a problem which happened in a live environment (often an operator of the system) needs information to figure out what happened.
  4. A person developing the software wants to observe what is happening.
  5. A person debugging the software in a development environment (typically a developer of the system) needs to diagnose a problem. This overlaps heavily with the live environment debugging scenario.

The goal is to satisfy the users in all of the scenarios. We’ll refer to the scenarios by number.

The short version is: we should do whatever satisfies all of the scenarios. Logging and events are the recommended mechanisms for satisfying the scenarios, although they don’t cover scenario 2.

The longer version is:

To decide whether to log something, we believe it helps to visualize which of the scenarios the error or information in question will be used for. We recommend starting by reporting as much information as possible, but with runtime-configurable behavior so that, for example, debug logs don’t normally show up in production.

For the question of what constitutes an error, errors should be actionable by a human. See the Dave Cheney article on this topic for some more discussion.

In Practice

Crossplane provides two observability libraries as part of crossplane-runtime:

Keep the following in mind when using the above libraries:

Finally, when in doubt, aim for consistency with existing Crossplane controller implementations.