GKEPhalanxClusterService

class phalanx.services.cluster.GKEPhalanxClusterService(kubernetes_storage)

Bases: object

Kubernetes operations on a GKE Phalanx cluster.

These operations are not in the EnvironmentService because:

  • They do not necessarily correspond 1-1 to a Phalanx environment.

  • They are often part of tasks that involve multiple clusters, so passing an explicit context is less error-prone and allows the operations to be automated without manually changing the kubectl current context.

  • Some of the operations are specific to Google Cloud GKE clusters, like waiting for an associated cloud load balancer to be destroyed when converting a Service from LoadBalancer to ClusterIP.

For example, when recovering a GCP cluster from backups while the old cluster still exists, you will need to work with two clusters, both of which are associated with the same environment.
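The point about explicit contexts can be illustrated with a minimal sketch. It only builds kubectl argument lists using the standard `--context` flag. The helper name `kubectl_command` and the context names are hypothetical and are not part of Phalanx.

```python
def kubectl_command(context: str, *args: str) -> list[str]:
    """Build a kubectl argument list that targets an explicit context.

    Passing --context on every call means the kubectl current context
    never has to be switched between the old and the new cluster.
    """
    return ["kubectl", "--context", context, *args]


# Hypothetical context names for an old and a new cluster of one environment.
old_cluster = kubectl_command("old-cluster", "get", "services", "--all-namespaces")
new_cluster = kubectl_command("new-cluster", "get", "services", "--all-namespaces")
```

Each list could be passed to `subprocess.run`. Both commands can be issued back to back without either one affecting which cluster the other talks to.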

Parameters:

kubernetes_storage (KubernetesStorage) – Interface to direct Kubernetes object manipulation.

Methods Summary

pause_sasquatch_kafka_reconciliation()

Pause Strimzi reconciliation of the Sasquatch Kafka cluster.

release_service_ips()

Release all IPs on LoadBalancer Services for all Phalanx apps.

restore_service_ips()

Restore all IPs on LoadBalancer Services for all Phalanx apps.

resume_cronjobs()

Resume all CronJobs in all Phalanx apps.

resume_sasquatch_kafka_reconciliation()

Resume Strimzi reconciliation of the Sasquatch Kafka cluster.

scale_down_workloads()

Scale down (almost) all Phalanx workloads.

scale_up_workloads()

Scale up (almost) all Phalanx workloads to their previous counts.

suspend_cronjobs()

Suspend all CronJobs in all Phalanx apps.

Methods Documentation

pause_sasquatch_kafka_reconciliation()

Pause Strimzi reconciliation of the Sasquatch Kafka cluster.

During some recovery operations, we want to modify resources that are managed by the Strimzi operator. The Strimzi operator will automatically revert any changes we make, so we have to pause Strimzi reconciliation if we want our changes to persist.

https://strimzi.io/docs/operators/latest/full/deploying#proc-pausing-reconciliation-str

Return type:

None
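As a rough illustration of what pausing involves, the Strimzi documentation linked above describes an annotation, `strimzi.io/pause-reconciliation`, set on the custom resource. The sketch below builds a JSON merge patch for that annotation. Whether this method applies the annotation in exactly this way is an assumption, and the helper name `reconciliation_patch` is hypothetical.

```python
from typing import Any

# Annotation described in the Strimzi documentation for pausing reconciliation.
PAUSE_ANNOTATION = "strimzi.io/pause-reconciliation"


def reconciliation_patch(*, paused: bool) -> dict[str, Any]:
    """Build a JSON merge patch that pauses or resumes reconciliation.

    Under JSON merge patch semantics, a value of None (null) removes the
    key, so the annotation is dropped again when resuming.
    """
    value = "true" if paused else None
    return {"metadata": {"annotations": {PAUSE_ANNOTATION: value}}}
```

The resulting patch would be applied to the Kafka custom resource with a Kubernetes client or `kubectl patch --type merge`. The resource name and namespace are not stated here, so they are left out.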

release_service_ips()

Release all IPs on LoadBalancer Services for all Phalanx apps.

This will clear any spec.loadBalancerIP value, then change the Service type from LoadBalancer to ClusterIP, then back to LoadBalancer.

This can be used when recovering a Phalanx cluster to another GKE cluster while the old cluster is still running.

Returns:

A list of Services with refreshed IPs.

Return type:

list[Service]

Raises:

InvalidLoadBalancerServiceStateError – If the workload resource does not have exactly one of spec.loadBalancerIP or an annotation for the previous loadBalancerIP set.
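The release sequence and the "exactly one of" check can be sketched on plain dictionaries. This is a simplified illustration, not the Phalanx implementation: the annotation key is hypothetical, the exception class is a local stand-in for the one named above, and waiting for the cloud load balancer to be destroyed between steps is omitted.

```python
from typing import Any

# Hypothetical annotation key; the key Phalanx actually uses is not given here.
PREVIOUS_IP_ANNOTATION = "example.org/previous-load-balancer-ip"


class InvalidLoadBalancerServiceStateError(Exception):
    """Local stand-in for the exception named in the documentation."""


def release_ip_patches(service: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the merge patches, in order, that would release a Service's IP."""
    current = service.get("spec", {}).get("loadBalancerIP")
    annotations = service.get("metadata", {}).get("annotations") or {}
    previous = annotations.get(PREVIOUS_IP_ANNOTATION)

    # Exactly one of the live field and the saved annotation must be set.
    if bool(current) == bool(previous):
        raise InvalidLoadBalancerServiceStateError(
            "expected exactly one of spec.loadBalancerIP or the saved-IP annotation"
        )

    saved = current or previous
    return [
        # 1. Remember the IP in an annotation and clear spec.loadBalancerIP.
        {
            "metadata": {"annotations": {PREVIOUS_IP_ANNOTATION: saved}},
            "spec": {"loadBalancerIP": None},
        },
        # 2. Switch to ClusterIP so the cloud load balancer is torn down.
        {"spec": {"type": "ClusterIP"}},
        # 3. Switch back to LoadBalancer to obtain a fresh IP.
        {"spec": {"type": "LoadBalancer"}},
    ]
```

Saving the old address in an annotation is what would let a later restore step put it back.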

restore_service_ips()

Restore all IPs on LoadBalancer Services for all Phalanx apps.

This is only intended to be run after a release operation because it depends on an annotation set during that operation.

Raises:

InvalidLoadBalancerServiceStateError – If the workload resource does not have exactly one of spec.loadBalancerIP or an annotation for the previous loadBalancerIP set.

Return type:

list[Service]

resume_cronjobs()

Resume all CronJobs in all Phalanx apps.

Return type:

None

resume_sasquatch_kafka_reconciliation()

Resume Strimzi reconciliation of the Sasquatch Kafka cluster.

https://strimzi.io/docs/operators/latest/full/deploying#proc-pausing-reconciliation-str

Return type:

None

scale_down_workloads()

Scale down (almost) all Phalanx workloads.

During cluster rebuilds when there is an old and a new cluster, certain workloads can’t be running in both clusters at the same time, or else state external to the cluster could be corrupted.

We’ll leave ArgoCD running so the scaled-down cluster can still be inspected.

Raises:

InvalidScaleStateError – If the workload resource has a previous replica count annotation, but its current replica count is not zero.

Return type:

None
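The error condition above suggests that the previous replica count is stored in an annotation before scaling to zero. A minimal sketch of that bookkeeping on plain dictionaries follows. The annotation key is hypothetical, the exception class is a local stand-in, and the logic for deciding which workloads to skip (such as ArgoCD) is not shown.

```python
from typing import Any

# Hypothetical annotation key; the key Phalanx actually uses is not given here.
PREVIOUS_REPLICAS_ANNOTATION = "example.org/previous-replica-count"


class InvalidScaleStateError(Exception):
    """Local stand-in for the exception named in the documentation."""


def scale_down(workload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the workload scaled to zero, remembering its old count."""
    metadata = workload.get("metadata", {})
    spec = workload.get("spec", {})
    annotations = dict(metadata.get("annotations") or {})
    # Kubernetes treats a missing replica count on a Deployment as 1.
    replicas = spec.get("replicas", 1)

    if PREVIOUS_REPLICAS_ANNOTATION in annotations:
        if replicas != 0:
            raise InvalidScaleStateError(
                "previous replica count is recorded but workload is not at zero"
            )
        # Already scaled down by an earlier run; leave it untouched.
        return workload

    annotations[PREVIOUS_REPLICAS_ANNOTATION] = str(replicas)
    return {
        **workload,
        "metadata": {**metadata, "annotations": annotations},
        "spec": {**spec, "replicas": 0},
    }
```

In this sketch, running the function twice is harmless: the second call sees the annotation alongside a zero replica count and returns the workload unchanged.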

scale_up_workloads()

Scale up (almost) all Phalanx workloads to their previous counts.

This is only intended to be run after a scale down operation.

Raises:

InvalidScaleStateError – If the workload resource has a previous replica count annotation, but its current replica count is not zero.

Return type:

None
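Restoring the previous counts can be sketched as the inverse operation: read a saved replica count from an annotation, check that the workload is currently at zero, and put the count back. As before, this is an illustration on plain dictionaries with a hypothetical annotation key and a local stand-in exception, not the Phalanx implementation.

```python
from typing import Any

# Hypothetical annotation key; the key Phalanx actually uses is not given here.
PREVIOUS_REPLICAS_ANNOTATION = "example.org/previous-replica-count"


class InvalidScaleStateError(Exception):
    """Local stand-in for the exception named in the documentation."""


def scale_up(workload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the workload restored to its recorded replica count."""
    metadata = workload.get("metadata", {})
    spec = workload.get("spec", {})
    annotations = dict(metadata.get("annotations") or {})

    if PREVIOUS_REPLICAS_ANNOTATION not in annotations:
        # Nothing was recorded, so there is nothing to restore.
        return workload
    if spec.get("replicas", 1) != 0:
        raise InvalidScaleStateError(
            "previous replica count is recorded but workload is not at zero"
        )

    # Remove the annotation so a later scale down can record a fresh count.
    previous = int(annotations.pop(PREVIOUS_REPLICAS_ANNOTATION))
    return {
        **workload,
        "metadata": {**metadata, "annotations": annotations},
        "spec": {**spec, "replicas": previous},
    }
```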

suspend_cronjobs()

Suspend all CronJobs in all Phalanx apps.

Return type:

None
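Kubernetes CronJobs are suspended and resumed through the boolean `spec.suspend` field. The sketch below computes which CronJobs would need a patch to reach the desired state. The helper name `suspend_patches` is hypothetical, and how Phalanx selects and patches CronJobs is not described here.

```python
from typing import Any


def suspend_patches(
    cronjobs: list[dict[str, Any]], *, suspend: bool
) -> list[tuple[str, str, dict[str, Any]]]:
    """Return (namespace, name, patch) for each CronJob not yet in the desired state."""
    patches = []
    for cronjob in cronjobs:
        # spec.suspend defaults to false when it is omitted.
        current = cronjob.get("spec", {}).get("suspend", False)
        if current != suspend:
            meta = cronjob["metadata"]
            patches.append(
                (meta["namespace"], meta["name"], {"spec": {"suspend": suspend}})
            )
    return patches
```

Calling the same helper with `suspend=False` yields the patches for the resume direction.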