Phalanx

Updating the “recommended” JupyterLab image

The “recommended” tag for JupyterLab images is usually a recent weekly image. The image marked “recommended” is guaranteed by SQuaRE to be compatible with other services and materials, such as tutorial or system testing notebooks, that we make available on RSP deployments. Because this process requires quite a bit of checking and sign-off from multiple stakeholders, approving a new version for “recommended” may take longer than the two weeks (for most deployments) it takes for a weekly image to roll off the default list of images to pull. When that happens, the RSP JupyterHub options form can display empty parentheses instead of the correct target version when a user requests a lab container.

This document explains how to circumvent that display bug by changing cachemachine’s values-<instance>.yaml for the appropriate instance when moving the “recommended” tag.

Tagging a new container version

When a new version is to be approved (after passing through its prior QA and sign-off gates), the “recommended” tag must be updated to point to the new version.

This really is as simple as pulling the new target version, tagging it as recommended, and pushing it again. This is, sadly, necessary, because there is no way to tag an image on Docker Hub without pulling and re-pushing it. However, the push is effectively a no-op: all the layers are, by definition, already in the registry, so only the manifest for the new tag is uploaded. The pull may be slow, but the push will be fast.

The procedure is as follows:

docker pull registry.hub.docker.com/lsstsqre/sciplat-lab:w_2021_33  # substitute the approved weekly tag here and on the next line
docker tag registry.hub.docker.com/lsstsqre/sciplat-lab:w_2021_33 registry.hub.docker.com/lsstsqre/sciplat-lab:recommended
docker login  # This may require interaction, depending on how you've set up your docker credentials
docker push registry.hub.docker.com/lsstsqre/sciplat-lab:recommended

The sqreadmin Docker Hub user could be used for this; however, while the process is not automated (it currently is not), using personal credentials is acceptable. The sqreadmin Docker Hub credentials are in the SQuaRE 1Password credential store.
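If the tagging step is eventually automated, it could be wrapped in a small shell function. The sketch below is hypothetical and is not existing SQuaRE tooling: it rejects anything that is not a weekly tag of the form w_YYYY_WW, logs in non-interactively with docker login --password-stdin, and then runs the same pull, tag, and push as the manual procedure. DOCKERHUB_USER and DOCKERHUB_TOKEN are assumed variables that an automated job would populate from the credential store.

```shell
# Hypothetical automation sketch; not existing SQuaRE tooling.
REPO="registry.hub.docker.com/lsstsqre/sciplat-lab"

# Succeed only for weekly tags of the form w_YYYY_WW, e.g. w_2021_33.
is_weekly_tag() {
  case $1 in
    w_[0-9][0-9][0-9][0-9]_[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Promote a weekly image to "recommended".
# DOCKERHUB_USER and DOCKERHUB_TOKEN are assumed (hypothetical) variables.
promote_recommended() {
  local tag=$1
  is_weekly_tag "$tag" || { echo "not a weekly tag: $tag" >&2; return 1; }
  # Read the token from stdin so it never appears on the command line.
  printf '%s\n' "$DOCKERHUB_TOKEN" |
    docker login --username "$DOCKERHUB_USER" --password-stdin || return 1
  docker pull "$REPO:$tag" || return 1
  docker tag "$REPO:$tag" "$REPO:recommended" || return 1
  docker push "$REPO:recommended"
}

# Example: promote_recommended w_2021_33
```

Validating the tag before touching the registry is a cheap guard against accidentally pointing “recommended” at a daily image or a mistyped tag.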

Updating Phalanx to ensure the “recommended” target is pre-pulled

In most environments, cachemachine only guarantees that the latest two weekly images are pulled, and it is not at all unusual for more than two weeks to pass before a new version is approved.

Usually this doesn’t matter: the image cache on a node uses a Least Recently Used replacement strategy, and the great majority of users spawn “recommended,” so it is not going to be purged. However, a display bug in the Notebook Aspect spawner form can occur. If a new node comes online after the recommended weekly has rolled out of the weekly list, the new node will pre-pull “recommended” but will not pre-pull the corresponding weekly under its weekly tag. Cachemachine, and therefore the options form, will then fail to resolve “recommended” to a particular weekly, so the description in parentheses after the image name will be empty.
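To see which weekly “recommended” corresponds to, compare image IDs: two tags that refer to the same image share an ID. The function below illustrates that idea on a host with a Docker CLI; it is not cachemachine’s actual implementation, and Kubernetes nodes may use a different container runtime. It reads lines of the form <repository>:<tag> <image-id>, such as the output of docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}', and prints each weekly tag whose image ID matches the image tagged “recommended”.

```shell
# Illustration only, not cachemachine's implementation.
# Input: lines of "<repository>:<tag> <image-id>".
# Output: each weekly tag (w_YYYY_WW) sharing the ID of ":recommended".
# Exits non-zero if no ":recommended" line is present.
weekly_for_recommended() {
  awk '
    { ref[NR] = $1; id[NR] = $2 }
    $1 ~ /:recommended$/ { rec = $2 }
    END {
      if (rec == "") exit 1
      for (i = 1; i <= NR; i++) {
        if (id[i] == rec && ref[i] ~ /:w_[0-9]+_[0-9]+$/) {
          tag = ref[i]
          sub(/.*:/, "", tag)   # keep only the text after the last colon
          print tag
        }
      }
    }'
}

# Example:
#   docker images --format '{{.Repository}}:{{.Tag}} {{.ID}}' | weekly_for_recommended
```

If “recommended” is present but its weekly tag is not, the function prints nothing, which mirrors the situation in which the options form shows empty parentheses.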

Fortunately, this is easy to fix.

In cachemachine’s values-<instance>.yaml file for the affected environment, look near the bottom for the repomen list. The first entry will always be of type RubinRepoMan and contains the definitions of how many daily, weekly, and release images to prepull.

There are currently only two environments in which we care about keeping the “recommended” target pre-pulled:

  1. IDF Production (data.lsst.cloud)

  2. IDF Integration (data-int.lsst.cloud)

Beneath the RubinRepoMan entry, you should find an entry that looks like:

{
  "type": "SimpleRepoMan",
  "images": [
    {
      "image_url": "registry.hub.docker.com/lsstsqre/sciplat-lab:w_2021_33",
      "name": "Weekly 2021_33"
    }
  ]
}

Replace the weekly tag in image_url, and the corresponding name, with the currently approved version.

If you are adding these definitions to an instance that does not already ensure that the target image for “recommended” is always prepulled, add an entry to the repomen list that looks like the above, with current approved versions.
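Because image_url and name both change with each newly approved weekly, a small generator can keep them consistent. This is a hypothetical helper, not part of Phalanx; it assumes weekly tags of the form w_YYYY_WW and the Weekly YYYY_WW naming pattern used in the example entry. Its output is the JSON entry to paste into the repomen list, re-indented to match the surrounding file.

```shell
# Hypothetical helper: print a SimpleRepoMan entry for a weekly tag.
simple_repoman_entry() {
  local tag=$1
  # Accept only weekly tags such as w_2021_33.
  case $tag in
    w_[0-9][0-9][0-9][0-9]_[0-9][0-9]) ;;
    *) echo "not a weekly tag: $tag" >&2; return 1 ;;
  esac
  # ${tag#w_} strips the "w_" prefix, so w_2021_33 becomes 2021_33.
  cat <<EOF
{
  "type": "SimpleRepoMan",
  "images": [
    {
      "image_url": "registry.hub.docker.com/lsstsqre/sciplat-lab:${tag}",
      "name": "Weekly ${tag#w_}"
    }
  ]
}
EOF
}

# Example: simple_repoman_entry w_2021_33
```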

Commit your changes to services/cachemachine on a Git branch, then open a GitHub pull request against Phalanx from that branch. Ask someone to review the PR, then merge it.
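The Git and GitHub steps might look like the following sketch, which uses the GitHub CLI (gh). The branch name and commit message are hypothetical, idfprod in the example is a placeholder instance name, and the run wrapper is an illustrative convenience: with DRY_RUN=1 it prints each command instead of executing it.

```shell
# Hypothetical helper; branch naming and messages are assumptions.
# With DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: open_recommended_pr <weekly-tag> <instance>
open_recommended_pr() {
  local tag=$1 instance=$2
  local values="services/cachemachine/values-$instance.yaml"
  local branch="recommended-$tag-$instance"
  run git checkout -b "$branch" || return 1
  run git add "$values" || return 1
  run git commit -m "Pre-pull $tag as recommended on $instance" || return 1
  run git push --set-upstream origin "$branch" || return 1
  # gh targets the repository's default branch unless --base is given.
  run gh pr create --title "Pre-pull $tag as recommended on $instance" \
    --body "Keep the weekly behind recommended pre-pulled by cachemachine."
}

# Example: DRY_RUN=1 open_recommended_pr w_2021_33 idfprod
```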

Then sync cachemachine with Argo CD in the correct environment. It is generally not necessary to wait for a maintenance window to do this, since the change is low-risk. The cachemachine deployment will restart automatically and kick off any required pulls. Since these pulls just fetch “recommended” under a different name, the image will almost certainly already be cached, so each pull will be near-instant. Each pod that starts from the pulled image simply sleeps for one minute and then terminates. Once every such pod has run and terminated, the Notebook Aspect options form will again show the correct data.
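From the command line, the sync and the follow-up check could be sketched as below, as an alternative to the Argo CD web UI. This assumes the argocd CLI is already logged in to the environment’s Argo CD, that the Argo CD application is named cachemachine, and that the pre-pull pods run in a namespace of the same name; adjust these if they differ. The run wrapper is a hypothetical convenience that prints commands instead of executing them when DRY_RUN=1.

```shell
# Hypothetical helper. Assumes a logged-in argocd CLI, an Argo CD app
# named "cachemachine", and a namespace of the same name.
# With DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

sync_cachemachine() {
  run argocd app sync cachemachine || return 1
  # Block until the application reports healthy (timeout in seconds).
  run argocd app wait cachemachine --health --timeout 300 || return 1
  # List pods in the assumed namespace to watch the pre-pull pods
  # start, sleep, and terminate.
  run kubectl get pods --namespace cachemachine
}

# Example: sync_cachemachine
```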

© Copyright 2020-2022 Association of Universities for Research in Astronomy, Inc. (AURA).

Created using Sphinx 5.3.0.