Experimental support for pattern uninstall
We are excited to announce that uninstalling patterns is now experimentally supported, starting with patterns operator version 0.0.65. To initiate the uninstallation, only the pattern CR needs to be deleted.
Initial approach
When we initially started to work on supporting pattern uninstall, the naive approach was to remove the app of apps and let Argo CD handle and clean up the rest. Unfortunately, this path was unfeasible for two reasons: some long-outstanding issues in Argo CD regarding application health checks (here and here), and the hub-spoke architecture with ACM policies.
Gory implementation details
We decided to go with a more fine-grained solution and perform the uninstall in phases:
Delete spoke child applications
Delete spoke app of apps
Delete hub child applications
Delete hub app of apps
The main control is in the operator: we pass the phase as a variable to the clustergroup chart and to the ACM chart. To achieve this, we changed the clustergroup chart so that it removes child applications when needed. The ACM chart also needed changes to distinguish between the phases on hub and spoke clusters. The resource removal itself is still managed by Argo CD. In prior operator versions the GitOps subscription was owned by the pattern CR; we now transfer the ownership to the patterns operator. The patterns operator and the cluster-wide OpenShift GitOps instance will remain installed.
To initiate the uninstallation, only the pattern CR needs to be deleted:
oc delete -n openshift-operators patterns/<pattern-name>
Deletion Phase Flow
We store the current phase in the pattern status field deletionPhase, which is initialized as an empty string.
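To follow an uninstall from the command line, the phase can be read back from the pattern CR. The following is a minimal sketch: the helper name get_deletion_phase is hypothetical, and the JSONPath .status.deletionPhase is an assumption derived from the field name mentioned above.

```shell
# Sketch only: get_deletion_phase is a hypothetical helper name.
# Assumes the phase is exposed at .status.deletionPhase on the pattern CR.
# $1: name of the pattern CR in the openshift-operators namespace.
get_deletion_phase() {
  oc get -n openshift-operators "patterns/$1" \
    -o jsonpath='{.status.deletionPhase}'
}
```

Calling get_deletion_phase with the pattern name prints the current phase, or nothing while the field is still an empty string.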
Initial Phase
Right after the pattern CR is deleted, we set the deletionPhase to DeleteSpokeChildApps if an ACM hub resource is detected, or to DeleteHubChildApps if no ACM hub is detected.
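The choice of the first phase can be sketched as a small function. This is an illustration rather than the operator's actual code: the function name and the "true"/"false" argument are hypothetical, only the two phase names come from the description above.

```shell
# Sketch only: initial_phase and its argument convention are hypothetical.
# $1: "true" if an ACM hub resource was detected, anything else otherwise.
initial_phase() {
  if [ "$1" = "true" ]; then
    # Spokes exist, so their child applications have to go first.
    echo "DeleteSpokeChildApps"
  else
    # No ACM hub detected: skip the spoke phases entirely.
    echo "DeleteHubChildApps"
  fi
}
```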
DeleteSpokeChildApps
When deleting spoke apps, we update the clustergroup chart's global.deletePattern variable to DeleteSpokeChildApps, and this value gets passed to the ACM chart. The ACM chart then changes the global.deletePattern variable to DeleteChildApps for the spokes, so that the spoke clustergroup chart can start removing child applications.
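The value translation done by the ACM chart in this phase can be pictured as follows. This is a sketch with a hypothetical function name; it only covers the single mapping described above and makes no claim about how other values are handled.

```shell
# Sketch only: spoke_delete_pattern is a hypothetical name.
# $1: value of global.deletePattern set on the hub clustergroup chart.
# Prints the value the spoke clustergroup chart receives in this phase.
spoke_delete_pattern() {
  case "$1" in
    DeleteSpokeChildApps) echo "DeleteChildApps" ;;
    *) return 1 ;;  # other values are outside the scope of this sketch
  esac
}
```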
We check the spoke child applications from the operator, and once all are gone, we change the deletionPhase to DeleteSpoke.
DeleteSpoke
This phase is passed through the clustergroup chart to the ACM chart, where the application policy is deleted. Because pruneObjectBehavior: DeleteIfCreated is set, the app of apps is then removed from the spoke(s).
We check the spoke app of apps from the operator, and once it is gone, we change the deletionPhase to DeleteHubChildApps.
DeleteHubChildApps
In this phase we detach all managed clusters from the hub cluster. We update the clustergroup chart's global.deletePattern variable to DeleteChildApps so that Argo CD can start to remove child apps from the hub.
We check the hub child applications from the operator, and once all are gone, we change the deletionPhase to DeleteHub.
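Taken together, the transitions described in the sections above form a simple linear state machine. The sketch below models it with a hypothetical next_phase function; the phase names are the ones used in this post, and each transition is assumed to fire only after the operator's check for that phase has passed.

```shell
# Sketch only: next_phase is a hypothetical name, not the operator's code.
# $1: current deletionPhase. Prints the phase that follows it.
next_phase() {
  case "$1" in
    DeleteSpokeChildApps) echo "DeleteSpoke" ;;         # all spoke child applications are gone
    DeleteSpoke)          echo "DeleteHubChildApps" ;;  # spoke app of apps is gone
    DeleteHubChildApps)   echo "DeleteHub" ;;           # all hub child applications are gone
    *) return 1 ;;  # DeleteHub and unknown values: no successor is modeled here
  esac
}
```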
Known limitations
We did our due diligence and tested the uninstallation of the MCG pattern with both hub and spoke clusters.
"By design, when OLM uninstalls an operator it does not remove any of the operator’s owned CRDs, APIServices, or CRs in order to prevent data loss."
With the above in mind, we only remove operators/subscriptions and do not remove any CRDs. If the operator is installed to its own namespace, its ClusterServiceVersion (CSV) will be removed when that namespace is deleted. However, if it is installed to a namespace that is not cleaned up (e.g. openshift-operators), the CSV will remain in the cluster.
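After an uninstall, it can be useful to see what was left behind. The helpers below are a minimal sketch with hypothetical function names; they only wrap standard oc get calls and do not delete anything.

```shell
# Sketch only: the function names are hypothetical.
# List the ClusterServiceVersions still present in a namespace.
# $1: namespace to inspect (defaults to openshift-operators).
remaining_csvs() {
  oc get csv -n "${1:-openshift-operators}"
}

# List the CRDs on the cluster; the uninstall never removes them.
remaining_crds() {
  oc get crds
}
```

Any cleanup of the listed CSVs or CRDs is left as a manual decision, since removing CRDs also removes the CRs that depend on them.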
