GKEPhalanxClusterService

class phalanx.services.cluster.GKEPhalanxClusterService(kubernetes_storage)

Bases: object

Kubernetes operations on a GKE Phalanx cluster.

These operations are not part of the EnvironmentService because:

They do not necessarily correspond one-to-one to a Phalanx environment.
They are often part of tasks that involve multiple clusters, so passing an explicit context is less error-prone and allows the operations to be automated without manually changing the kubectl current context.
Some of the operations are specific to Google Cloud GKE clusters, like waiting for an associated cloud load balancer to be destroyed when converting a Service from LoadBalancer to ClusterIP.

For example, when recovering a GCP cluster from backups while the old cluster still exists, you will need to work with two clusters, both of which are associated with the same environment.

Parameters:: kubernetes_storage (KubernetesStorage) – Interface to direct Kubernetes object manipulation.

Methods Summary

`get_kafka_cluster_id`()	Get the Strimzi cluster ID from a Kafka node PVC.
`get_phalanx_load_balancer_services`()	Get all Services of type LoadBalancer for all Phalanx apps.
`kube_version`()	Return the version of the kubectl client and Kubernetes server.
`pause_kafka_reconciliation`()	Pause Strimzi reconciliation of the Sasquatch Kafka cluster.
`release_service_ips`()	Release all IPs on LoadBalancer Services for all Phalanx apps.
`restore_service_ips`()	Restore all IPs on LoadBalancer Services for all Phalanx apps.
`resume_cronjobs`()	Resume all CronJobs in all Phalanx apps.
`resume_kafka_reconciliation`()	Resume Strimzi reconciliation of the Sasquatch Kafka cluster.
`retain_pvs`()	Set the persistentVolumeReclaimPolicy to Retain for all PVs.
`scale_down_all`()	Scale down all Phalanx workloads except ArgoCD and crons.
`scale_down_workloads`()	Scale down (almost) all Phalanx workloads.
`scale_up_all`()	Scale up all Phalanx workloads and resume all crons.
`scale_up_workloads`()	Scale up (almost) all Phalanx workloads to their previous counts.
`set_kafka_cluster_id`(cluster_id)	Configure a Strimzi Kafka cluster with an explicitly specified ID.
`suspend_cronjobs`()	Suspend all CronJobs in all Phalanx apps.

Methods Documentation

get_kafka_cluster_id()

Get the Strimzi cluster ID from a Kafka node PVC.

When recovering a Strimzi Kafka cluster from backed-up PersistentVolumes, the cluster ID generated when a Strimzi Kafka resource is applied will not match the cluster ID stored in the data on the recovered volumes. We need to read the cluster ID from the data on the volume before letting Strimzi reconcile the Kafka resource; otherwise, the Kafka pods will not be able to start.

The command comes from the Strimzi recovery documentation: https://strimzi.io/docs/operators/latest/full/deploying#proc-cluster-recovery-volume-str

This method relies on hard-coded assumptions about the names of the resources involved, such as the Kafka node PVCs.

Return type:: str
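The cluster ID that this method recovers is the cluster.id entry that Kafka writes to the meta.properties file in each of its log directories; the command in the Strimzi recovery procedure linked above appears to read that file from the PVC. The sketch below covers only the parsing step and assumes the file's text has already been fetched from the volume. parse_cluster_id is a hypothetical helper, not part of Phalanx, and the sample ID is made up.

```python
def parse_cluster_id(meta_properties: str) -> str:
    """Extract cluster.id from the text of a Kafka meta.properties file.

    Hypothetical helper: a sketch of the parsing step only.
    """
    for raw_line in meta_properties.splitlines():
        line = raw_line.strip()
        # Skip blank lines and Java-properties comments ("#" or "!").
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cluster.id":
            return value.strip()
    raise ValueError("cluster.id not found in meta.properties")


# Made-up file contents in the usual meta.properties layout.
example = """#
#Tue Jan 02 00:00:00 UTC 2024
node.id=0
version=1
cluster.id=AbCdEfGhIjKlMnOpQrStUv
"""
print(parse_cluster_id(example))  # AbCdEfGhIjKlMnOpQrStUv
```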

get_phalanx_load_balancer_services()

Get all Services of type LoadBalancer for all Phalanx apps.

We say a Service is in a Phalanx app if it has an ArgoCD label.

Returns:: A list of all of the LoadBalancer Services provisioned by Phalanx apps.
Return type:: list[Service]
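The selection rule above can be sketched over Services represented as plain dictionaries, in the shape the Kubernetes API returns as JSON. The label key app.kubernetes.io/instance is the label ArgoCD has historically used by default for resource tracking, but whether Phalanx checks exactly this key is an assumption, and select_phalanx_load_balancers is a hypothetical name.

```python
# Assumption: the ArgoCD tracking label that marks a Service as part of an app.
ARGOCD_LABEL = "app.kubernetes.io/instance"


def select_phalanx_load_balancers(services: list[dict]) -> list[dict]:
    """Return the Services of type LoadBalancer that carry the ArgoCD label."""
    selected = []
    for service in services:
        labels = service.get("metadata", {}).get("labels") or {}
        # Kubernetes defaults spec.type to ClusterIP when it is not set.
        service_type = service.get("spec", {}).get("type", "ClusterIP")
        if service_type == "LoadBalancer" and ARGOCD_LABEL in labels:
            selected.append(service)
    return selected


services = [
    {"metadata": {"name": "app-a-lb", "labels": {ARGOCD_LABEL: "app-a"}},
     "spec": {"type": "LoadBalancer"}},
    {"metadata": {"name": "manual-lb"}, "spec": {"type": "LoadBalancer"}},
    {"metadata": {"name": "app-b", "labels": {ARGOCD_LABEL: "app-b"}},
     "spec": {"type": "ClusterIP"}},
]
print([s["metadata"]["name"] for s in select_phalanx_load_balancers(services)])
# ['app-a-lb']
```

The unlabeled LoadBalancer is skipped because nothing marks it as belonging to a Phalanx app.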

kube_version()

Return the version of the kubectl client and Kubernetes server.

Useful for checking that kubectl can connect to a cluster using a given context.

Return type:: str
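Because these operations take an explicit context instead of relying on kubectl's current context, a connectivity check like this one can be sketched as a subprocess call that passes kubectl's global --context flag. This is a minimal sketch, not Phalanx's implementation; the run parameter is a hypothetical seam that exists only so the call can be replaced in tests. kubectl version usually exits with a non-zero status when it cannot reach the server, which check=True turns into subprocess.CalledProcessError.

```python
import subprocess
from collections.abc import Callable


def kubectl_command(context: str, *args: str) -> list[str]:
    """Build a kubectl command line pinned to an explicit context."""
    # --context is a global kubectl flag, so the kubeconfig's current
    # context is neither consulted nor modified.
    return ["kubectl", "--context", context, *args]


def kube_version(
    context: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return kubectl's client and server version output for a context."""
    result = run(
        kubectl_command(context, "version"),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling kube_version once with the old cluster's context and once with the new one checks both connections without touching the current context.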

pause_kafka_reconciliation()

Pause Strimzi reconciliation of the Sasquatch Kafka cluster.

During some recovery operations, we want to modify resources that are managed by the Strimzi operator. The Strimzi operator will automatically revert any changes we make, so we have to pause Strimzi reconciliation if we want our changes to persist.

https://strimzi.io/docs/operators/latest/full/deploying#proc-pausing-reconciliation-str

Return type:: None
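Strimzi documents pausing reconciliation by setting the strimzi.io/pause-reconciliation annotation to "true" on the custom resource, and resuming it by setting the annotation to "false" or removing it. The sketch below shows JSON merge patch bodies (RFC 7386) for both directions, plus a small merge-patch applier to show their effect, assuming the real methods annotate the Sasquatch Kafka resource this way. pause_patch, resume_patch, apply_merge_patch, and the resource name "sasquatch" are hypothetical.

```python
PAUSE_ANNOTATION = "strimzi.io/pause-reconciliation"


def pause_patch() -> dict:
    """Merge patch that pauses Strimzi reconciliation of a custom resource."""
    # Annotation values must be strings, so "true" rather than True.
    return {"metadata": {"annotations": {PAUSE_ANNOTATION: "true"}}}


def resume_patch() -> dict:
    """Merge patch that resumes reconciliation by removing the annotation."""
    # In a JSON merge patch, null (None) deletes the key.
    return {"metadata": {"annotations": {PAUSE_ANNOTATION: None}}}


def apply_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return the result."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result


kafka = {"metadata": {"name": "sasquatch", "annotations": {}}}  # hypothetical
paused = apply_merge_patch(kafka, pause_patch())
resumed = apply_merge_patch(paused, resume_patch())
print(paused["metadata"]["annotations"])   # {'strimzi.io/pause-reconciliation': 'true'}
print(resumed["metadata"]["annotations"])  # {}
```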

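release_service_ips()

Release all IPs on LoadBalancer Services for all Phalanx apps.

For each Service, this records any spec.loadBalancerIP in an annotation and clears it, then changes the Service type from LoadBalancer to ClusterIP and back to LoadBalancer. On GKE, converting the Service to ClusterIP destroys its cloud load balancer, so the Service is assigned a new IP when it becomes a LoadBalancer again.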

This can be used when recovering a Phalanx cluster to another GKE cluster while the old cluster is still running.

Returns:: A list of Services with refreshed IPs.
Return type:: list[Service]
Raises::
InvalidLoadBalancerServiceStateError – If the service resource does not have exactly one of spec.loadBalancerIP or an annotation for the previous loadBalancerIP set.
ServiceMissingTrafficPolicyError – If the service resource does not have an externalTrafficPolicy.
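The release sequence can be sketched as a pure function that computes the successive Service manifests to apply, using plain dictionaries for Services. This is a hedged illustration rather than Phalanx's implementation: the annotation key is hypothetical, the exception classes only mirror the documented names, ports and other fields are ignored, and any waiting the real method does between steps (for example, for GKE to destroy the load balancer) is omitted. Dropping externalTrafficPolicy while the Service is a ClusterIP and restoring it afterward is an assumption about why the policy is required.

```python
import copy

# Hypothetical annotation key; the key Phalanx actually uses is not shown here.
PREVIOUS_IP_ANNOTATION = "example.com/previous-load-balancer-ip"


class InvalidLoadBalancerServiceStateError(Exception):
    """The Service has both or neither of an IP and a saved previous IP."""


class ServiceMissingTrafficPolicyError(Exception):
    """The Service has no externalTrafficPolicy."""


def release_steps(service: dict) -> list[dict]:
    """Return the Service manifests to apply, in order, to release its IP."""
    name = service["metadata"]["name"]
    annotations = service["metadata"].get("annotations") or {}
    spec = service["spec"]
    current_ip = spec.get("loadBalancerIP")
    saved_ip = annotations.get(PREVIOUS_IP_ANNOTATION)
    # Exactly one of the live IP and the saved IP must be present.
    if (current_ip is None) == (saved_ip is None):
        raise InvalidLoadBalancerServiceStateError(name)
    policy = spec.get("externalTrafficPolicy")
    if policy is None:
        raise ServiceMissingTrafficPolicyError(name)

    # Step 1: save the IP in an annotation and clear it from the spec.
    cleared = copy.deepcopy(service)
    if current_ip is not None:
        cleared["metadata"]["annotations"] = {
            **annotations,
            PREVIOUS_IP_ANNOTATION: current_ip,
        }
        del cleared["spec"]["loadBalancerIP"]

    # Step 2: convert to ClusterIP so the cloud load balancer is torn down.
    # Assumption: the traffic policy is dropped because it only applies to
    # externally reachable Service types.
    cluster_ip = copy.deepcopy(cleared)
    cluster_ip["spec"]["type"] = "ClusterIP"
    del cluster_ip["spec"]["externalTrafficPolicy"]

    # Step 3: convert back to LoadBalancer with the saved policy, so that a
    # new external IP is allocated.
    reprovisioned = copy.deepcopy(cluster_ip)
    reprovisioned["spec"]["type"] = "LoadBalancer"
    reprovisioned["spec"]["externalTrafficPolicy"] = policy
    return [cleared, cluster_ip, reprovisioned]
```

Computing the manifests separately from applying them keeps the state checks easy to test and leaves the input Service unmodified.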

restore_service_ips()

Restore all IPs on LoadBalancer Services for all Phalanx apps.

This is only intended to be run after release_service_ips(), because it depends on an annotation set by that operation.

Returns:: A list of Services with restored IPs.
Return type:: list[Service]
Raises:: InvalidLoadBalancerServiceStateError – If the service resource does not have exactly one of spec.loadBalancerIP or an annotation for the previous loadBalancerIP set.
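The bookkeeping half of a restore can be sketched as moving the saved IP from the annotation back into spec.loadBalancerIP, using the same exactly-one check documented above. The annotation key and restore_manifest are hypothetical, and whether the real method also cycles the Service type is not stated here, so the sketch does not.

```python
import copy

# Hypothetical annotation key, matching whatever the release step wrote.
PREVIOUS_IP_ANNOTATION = "example.com/previous-load-balancer-ip"


class InvalidLoadBalancerServiceStateError(Exception):
    """The Service has both or neither of an IP and a saved previous IP."""


def restore_manifest(service: dict) -> dict:
    """Return a copy of the Service with its saved IP moved back into the spec."""
    name = service["metadata"]["name"]
    annotations = service["metadata"].get("annotations") or {}
    current_ip = service["spec"].get("loadBalancerIP")
    saved_ip = annotations.get(PREVIOUS_IP_ANNOTATION)
    # Exactly one of the live IP and the saved IP must be present.
    if (current_ip is None) == (saved_ip is None):
        raise InvalidLoadBalancerServiceStateError(name)
    restored = copy.deepcopy(service)
    if saved_ip is not None:
        restored["spec"]["loadBalancerIP"] = saved_ip
        # Drop the bookkeeping annotation, keeping all others.
        restored["metadata"]["annotations"] = {
            key: value
            for key, value in annotations.items()
            if key != PREVIOUS_IP_ANNOTATION
        }
    # If only a live IP is present, the Service was never released and the
    # copy is returned unchanged.
    return restored
```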

resume_cronjobs()

Resume all CronJobs in all Phalanx apps.

Return type:: None
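Suspending and resuming a CronJob comes down to its spec.suspend field. A minimal sketch of computing the needed merge patches, assuming (as for Services) that a CronJob belongs to a Phalanx app when it carries the ArgoCD tracking label; the label key and both function names are hypothetical.

```python
# Assumption: the same ArgoCD label used to identify Phalanx Services.
ARGOCD_LABEL = "app.kubernetes.io/instance"


def suspend_patch(suspend: bool) -> dict:
    """Merge patch body that suspends (True) or resumes (False) a CronJob."""
    return {"spec": {"suspend": suspend}}


def cronjob_patches(
    cronjobs: list[dict], suspend: bool
) -> list[tuple[str, str, dict]]:
    """Return (namespace, name, patch) for each labeled CronJob needing a change."""
    patches = []
    for cronjob in cronjobs:
        metadata = cronjob["metadata"]
        if ARGOCD_LABEL not in (metadata.get("labels") or {}):
            continue  # not part of a Phalanx app
        # spec.suspend is treated as false when it is absent.
        if cronjob["spec"].get("suspend", False) == suspend:
            continue  # already in the requested state
        patches.append(
            (metadata["namespace"], metadata["name"], suspend_patch(suspend))
        )
    return patches
```

The same helper serves both suspend_cronjobs (suspend=True) and resume_cronjobs (suspend=False). Suspending only stops new Jobs from being scheduled; Jobs that are already running continue.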

resume_kafka_reconciliation()

Resume Strimzi reconciliation of the Sasquatch Kafka cluster.

https://strimzi.io/docs/operators/latest/full/deploying#proc-pausing-reconciliation-str

Return type:: None

retain_pvs()

Set the persistentVolumeReclaimPolicy to Retain for all PVs.

Return type:: None
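With the Delete reclaim policy, deleting a PersistentVolumeClaim also deletes the bound PV and, for dynamically provisioned volumes, the underlying disk; Retain keeps the volume so it can be reused during recovery. A minimal sketch of the patches involved, where retain_patches is a hypothetical name and PVs that are already set to Retain are skipped (the end state is the same as patching every PV).

```python
def retain_patches(pvs: list[dict]) -> dict[str, dict]:
    """Map each PV name to a merge patch that sets its reclaim policy to Retain."""
    patches = {}
    for pv in pvs:
        # Dynamically provisioned PVs inherit the StorageClass reclaimPolicy,
        # which defaults to Delete.
        policy = pv["spec"].get("persistentVolumeReclaimPolicy")
        if policy != "Retain":
            patches[pv["metadata"]["name"]] = {
                "spec": {"persistentVolumeReclaimPolicy": "Retain"}
            }
    return patches


pvs = [
    {"metadata": {"name": "pv-a"}, "spec": {"persistentVolumeReclaimPolicy": "Delete"}},
    {"metadata": {"name": "pv-b"}, "spec": {"persistentVolumeReclaimPolicy": "Retain"}},
]
print(sorted(retain_patches(pvs)))  # ['pv-a']
```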

scale_down_all()

Scale down all Phalanx workloads except ArgoCD and crons.

Return type:: None

scale_down_workloads()

Scale down (almost) all Phalanx workloads.

During cluster rebuilds, when both an old and a new cluster exist, certain workloads must not run in both clusters at the same time; otherwise, state external to the cluster could be corrupted.

We’ll leave ArgoCD running so the scaled-down cluster can still be inspected.

Return type:: None
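Scaling up "to their previous counts" later implies that the scale-down step records each workload's replica count somewhere. A minimal sketch, assuming the count is stored in an annotation and that ArgoCD runs in a namespace named argocd; the annotation key, the namespace name, and scale_down_patches are all hypothetical.

```python
# Hypothetical annotation that remembers the replica count before scale-down.
PREVIOUS_REPLICAS_ANNOTATION = "example.com/previous-replicas"
# Assumption: ArgoCD runs in this namespace and must stay up.
ARGOCD_NAMESPACE = "argocd"


def scale_down_patches(workloads: list[dict]) -> list[tuple[str, str, str, dict]]:
    """Return (kind, namespace, name, patch) tuples that scale workloads to zero."""
    patches = []
    for workload in workloads:
        metadata = workload["metadata"]
        if metadata["namespace"] == ARGOCD_NAMESPACE:
            continue  # keep ArgoCD running so the cluster can be inspected
        # Deployments and StatefulSets default to 1 replica when unset.
        replicas = workload["spec"].get("replicas", 1)
        if replicas == 0:
            continue  # already scaled down; keep any previously saved count
        patch = {
            # Annotation values must be strings.
            "metadata": {"annotations": {PREVIOUS_REPLICAS_ANNOTATION: str(replicas)}},
            "spec": {"replicas": 0},
        }
        patches.append((workload["kind"], metadata["namespace"], metadata["name"], patch))
    return patches
```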

scale_up_all()

Scale up all Phalanx workloads and resume all crons.

Return type:: None

scale_up_workloads()

Scale up (almost) all Phalanx workloads to their previous counts.

This is only intended to be run after scale_down_workloads().

Return type:: None
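The matching scale-up can be sketched as reading the saved count back and clearing the bookkeeping annotation in the same merge patch. As above, the annotation key and scale_up_patches are hypothetical; workloads without the annotation are left alone, which is one reason this only makes sense after a scale-down.

```python
# Hypothetical annotation written by the scale-down step.
PREVIOUS_REPLICAS_ANNOTATION = "example.com/previous-replicas"


def scale_up_patches(workloads: list[dict]) -> list[tuple[str, str, str, dict]]:
    """Return (kind, namespace, name, patch) tuples restoring saved replica counts."""
    patches = []
    for workload in workloads:
        metadata = workload["metadata"]
        saved = (metadata.get("annotations") or {}).get(PREVIOUS_REPLICAS_ANNOTATION)
        if saved is None:
            continue  # not scaled down by the companion step; leave it alone
        patch = {
            # In a JSON merge patch, None (null) removes the annotation.
            "metadata": {"annotations": {PREVIOUS_REPLICAS_ANNOTATION: None}},
            "spec": {"replicas": int(saved)},
        }
        patches.append((workload["kind"], metadata["namespace"], metadata["name"], patch))
    return patches
```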

set_kafka_cluster_id(cluster_id)

Configure a Strimzi Kafka cluster with an explicitly specified ID.

When we’re recovering a Strimzi Kafka cluster from backed-up persistent storage, we need to manually change the cluster ID to match the original cluster ID.

For more info on Strimzi Kafka cluster recovery, see: https://strimzi.io/docs/operators/latest/deploying#assembly-cluster-recovery-volume-str

Parameters:: cluster_id (str) – The new cluster ID for the Kafka cluster.
Return type:: None
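As we understand the Strimzi recovery procedure linked above, the recovered ID is written to .status.clusterId on the Kafka custom resource, which is part of its status subresource. A sketch of that patch, assuming the real method writes the same field; the function names, context, namespace, and resource name are placeholders, and the kubectl form requires a kubectl recent enough to support --subresource.

```python
import json


def cluster_id_status_patch(cluster_id: str) -> dict:
    """Merge patch that sets status.clusterId on a Strimzi Kafka resource."""
    cluster_id = cluster_id.strip()
    if not cluster_id:
        raise ValueError("cluster_id must not be empty")
    return {"status": {"clusterId": cluster_id}}


def kubectl_patch_args(
    context: str, namespace: str, kafka_name: str, cluster_id: str
) -> list[str]:
    """Build a kubectl command that applies the patch to the status subresource."""
    return [
        "kubectl", "--context", context, "--namespace", namespace,
        "patch", "kafka", kafka_name,
        "--subresource=status", "--type=merge",
        "-p", json.dumps(cluster_id_status_patch(cluster_id)),
    ]
```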

suspend_cronjobs()

Suspend all CronJobs in all Phalanx apps.

Return type:: None