cert-manager

Edit on GitHub /services/cert-manager
Type Helm
Namespace cert-manager

Overview

The cert-manager application is an installation of cert-manager from its Helm chart repository. It creates TLS certificates via Let’s Encrypt and automatically renews them.

This application is only deployed on clusters managed by SQuaRE. NCSA clusters use NCSA certificates issued via an internal process.

This application is configured to acquire a certificate for the domain name configured via the top-level fqdn key in the values.yaml file for a given environment. This certificate will be stored in the secret default-certificate in the cert-manager namespace. On clusters using cert-manager, nginx-ingress should be configured to use this certificate via this configuration snippet in its values.yaml file:

nginx-ingress:
  controller:
    extraArgs:
      default-ssl-certificate: cert-manager/default-certificate

Upgrading

Upgrading cert-manager is generally painless. The only custom configuration that we use is to tell the Helm chart to install the Custom Resource Definitions. Watch for changes that require updating ClusterIssuer or Certificate resources; those will require corresponding changes to the resources defined in /services/cert-manager.

Normally, it’s not necessary to explicitly test cert-manager after a routine upgrade. We will notice if the certificates expire, and have monitoring of the important ones. However, if you want to be sure that cert-manager is still working after an upgrade, delete the default-certificate secret in the cert-manager namespace. It should be recreated by cert-manager. (You may have to also delete the Certificate resource of the same name and let Argo CD re-create it to trigger this.) This may cause an outage for nginx-ingress since it is using this certificate, so you may want to be prepared to port-forward to get to the Argo CD UI in case something goes wrong.

Bootstrapping the application

There are currently two configuration options for cert-manager: the HTTP solver and the DNS solver. We are standardizing on the DNS solver for all environments for consistency. The advantage of the DNS solver is that it works behind firewalls and can provision certificates for environments not exposed to the Internet, such as the Tucson teststand.

The DNS solver uses an AWS service user with write access to Route 53 to answer Let’s Encrypt challenges. To configure it, add the following to the values.yaml file for an environment:

solver:
  route53:
    aws-access-key-id: AKIAQSJOS2SFLUEVXZDB
    hosted-zone: Z06873202D7WVTZUFOQ42
    vault-secret-path: "secret/k8s_operator/<cluster-name>/cert-manager"

replacing <cluster-name> with the FQDN of the cluster, corresponding to the root of the Vault secrets for that cluster. See vault-secrets-operator for more information.

This access key ID corresponds to the cert-manager-lsst-codes service user in AWS. The hosted zone is the tls.lsst.codes hosted zone, where all challenge responses will be created. To limit the scope of access in case of a compromise, this AWS service user does not have write access to the full lsst.codes domain. This AWS service user can be used for all Science Platform deployments in the lsst.codes domain. It is configured according to the cert-manager documentation for Route 53.

The secret key for this AWS access key must be stored in Vault as the cert-manager secret for that cluster. The Vault secret should look something like this:

data:
  aws-access-key-id: AKIAQSJOS2SFLUEVXZDB
  aws-secret-access-key: <secret>

The secret is stored in 1Password (search for cert-manager-lsst-codes).