Phalanx

Updating the recommended Notebook Aspect image

The recommended tag for JupyterLab images is usually a recent weekly image. The image tagged recommended is guaranteed by SQuaRE to be compatible with other services and materials, such as tutorial or system testing notebooks, that we make available on RSP deployments.

Because this process requires checking and sign-off from multiple stakeholders, approving a new recommended version can take longer than the two weeks (for most deployments) it takes for a weekly image to roll off the default list of images to pull. When that happens, the RSP JupyterHub options form can display empty parentheses rather than the correct target version when a user requests a lab container.

This document explains the process for moving the recommended tag, and how to circumvent that display bug by changing cachemachine’s values-<instance>.yaml for the appropriate instance when moving the recommended tag.

Tagging a new container version

When a new version is to be approved (after passing through its prior QA and sign-off gates), the recommended tag must be updated to point to the new version.

To do this, run the GitHub retag workflow for the sciplat-lab repository, as follows:

  1. Go to the retag workflow page.

  2. Click on Run workflow.

  3. Enter the tag of the image to promote to recommended under Docker tag of input container. This will be a tag like w_2022_40.

  4. Enter recommended under Additional value to tag container with.

  5. Click on the Run workflow submit button.

Leave the URI fields in the workflow form unchanged.

Ensure the recommended image is pre-pulled

In most environments, cachemachine only prepulls the latest two weekly images. It is common for more than two weeks to go by before a new version of recommended is approved. While the recommended tag is always prepulled, cachemachine cannot resolve that tag to a regular image tag unless the corresponding image tag is also prepulled. The result is a display bug: recommended is not resolved to a particular tag, so the information in parentheses after the Recommended menu option in the spawner form is missing.

To avoid this, we explicitly prepull the weekly tag corresponding to the recommended tag. This ensures that cachemachine can map the recommended tag to a weekly tag. It doesn’t consume any additional cache space on the nodes: when cachemachine tells Kubernetes to cache that weekly tag, Kubernetes will realize that it already has the image cached under another name.
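The failure mode and its fix can be modelled with a small sketch. It assumes, purely for illustration, that an alias tag is resolved by finding another prepulled tag with the same image digest. The function name resolve_alias and the digests are hypothetical and do not come from cachemachine's code.

```python
from typing import Optional


def resolve_alias(alias: str, prepulled: dict[str, str]) -> Optional[str]:
    """Return a non-alias tag that shares the alias's digest, or None.

    `prepulled` maps tag -> image digest for the images in the prepull set.
    Matching by digest is an assumption made for this illustration.
    """
    digest = prepulled.get(alias)
    if digest is None:
        return None
    for tag, other in sorted(prepulled.items()):
        if tag != alias and other == digest:
            return tag
    return None


# Hypothetical digests: recommended still points at w_2022_40, but only the
# two newest weeklies are prepulled, so there is nothing to resolve it to.
rolled_off = {
    "recommended": "sha256:aaa",
    "w_2022_44": "sha256:ccc",
    "w_2022_43": "sha256:bbb",
}
print(resolve_alias("recommended", rolled_off))  # None

# Explicitly prepulling the matching weekly restores the mapping.
pinned = dict(rolled_off, w_2022_40="sha256:aaa")
print(resolve_alias("recommended", pinned))  # w_2022_40
```

In the first case the spawner form would have nothing to put in the parentheses; in the second, the pinned weekly supplies the missing name.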

We add this configuration to the IDF environments. Other Phalanx environments handle recommended images differently and don’t need this configuration.

In cachemachine’s values-<instance>.yaml file for the affected environment, look in the repomen section near the bottom of the file. The first entry will always be of type RubinRepoMan, and will define how many daily, weekly, and release images to prepull. Beneath the RubinRepoMan entry, you should find an entry that looks like:

{
  "type": "SimpleRepoMan",
  "images": [
    {
      "image_url": "registry.hub.docker.com/lsstsqre/sciplat-lab:w_2021_33",
      "name": "Weekly 2021_33"
    }
  ]
}

Replace the tag and name with the weekly tag and corresponding name for the weekly image that is also tagged recommended.
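To avoid typos when editing the entry by hand, it could be generated from the weekly tag. The following is a minimal sketch, not part of Phalanx: the helper name simple_repoman_entry is hypothetical, the registry path is copied from the example entry above, and the w_YYYY_WW tag pattern is inferred from the tags shown in this document.

```python
import json
import re

# Registry path taken from the example entry above.
REPOSITORY = "registry.hub.docker.com/lsstsqre/sciplat-lab"
# Assumed weekly tag format, inferred from tags such as w_2022_40.
WEEKLY_TAG = re.compile(r"w_(\d{4})_(\d{2})")


def simple_repoman_entry(tag: str) -> dict:
    """Build the SimpleRepoMan entry for a weekly tag such as w_2022_40."""
    match = WEEKLY_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a weekly tag: {tag!r}")
    year, week = match.groups()
    return {
        "type": "SimpleRepoMan",
        "images": [
            {
                "image_url": f"{REPOSITORY}:{tag}",
                "name": f"Weekly {year}_{week}",
            }
        ],
    }


# Print the entry in the same shape as the example above.
print(json.dumps(simple_repoman_entry("w_2022_40"), indent=2))
```

Passing anything other than a weekly tag, such as recommended itself, raises ValueError, which guards against pinning the alias instead of the image it points to.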

Once this change is merged, sync cachemachine (using Argo CD) in the affected environments. You do not have to wait for a maintenance window to do this, since the change is low risk, although it will result in a very brief outage for Notebook Aspect lab spawning while cachemachine is restarted.

cachemachine will then spawn a DaemonSet that pulls the weekly tag to every node. As mentioned above, this will be fairly quick, since Kubernetes will realize it already has the image cached under another name. Once cachemachine rechecks the cached images on each node, it will have enough information to build the menu correctly, and the spawner menu in the Notebook Aspect should show the resolved version.

© Copyright 2020-2022 Association of Universities for Research in Astronomy, Inc. (AURA).
