Validated Pattern: Ansible Edge GitOps
Ansible Edge GitOps: The Why and What
As we have been working on new validated patterns and the pattern framework, we have seen a need and interest from the community in expanding the use cases covered by the framework to include other parts of the portfolio besides OpenShift. We understand that Edge computing environments are very complex, and while OpenShift may be the right choice for some Edge environments, it will not be feasible or practical for all of them. Can environments other than Kubernetes-native ones benefit from GitOps? If so, what would that look like? This pattern works to answer those questions.
GitOps is currently a hot topic in technology. It is a natural outgrowth of the Kubernetes approach in particular, and is informed by decades of practice in managing large fleets of systems. But is GitOps a concept only for Kubernetes, or can we use its techniques and patterns in other systems as well? We believe that by applying specific practices and techniques to Ansible code, and using a Git repository as the authoritative source for configuration, we can do exactly that.
One of the first problems we knew we would have to solve in developing this pattern was to work out how to model an Edge environment that was running Virtual Machines. We started with the assumption that we were going to use the Ansible Automation Platform Operator for OpenShift to manage these VMs. But how should we run the VMs themselves?
It is certainly possible to use the different public cloud offerings to spin up instances within the clouds, but that would require ongoing maintenance of the pattern over the long haul to track different image types and any changes the clouds might make to their provisioning schemes. Additionally, since the purpose of including VMs in this pattern is to model an Edge environment, modeling them as ordinary public cloud instances would seem odd. As a practical matter, the pattern user would also have to keep track of the instances and spin them down when spinning down the pattern.
To begin solving these problems, this pattern introduces OpenShift Virtualization to the pattern framework. While OpenShift Virtualization today supports AWS and on-prem bare-metal clusters, we hope that support for GCP and Azure will follow in the not-too-distant future. Using OpenShift Virtualization enables the simulated Edge environment to be modeled entirely in a single cluster, and any instances are destroyed along with the cluster.
The pattern itself focuses on the installation of a containerized application (Inductive Automation Ignition) on simulated kiosks running RHEL 8 in kiosk mode. This installation pattern is based on work Red Hat did with a customer in the Petrochemical industry.
Highlight: Imperative and Declarative Automation, and GitOps
The validated patterns framework has been committed to GitOps as a philosophy and operational practice since the beginning. The framework’s use of ArgoCD as a mechanism for deploying applications and components is proof of our commitment to GitOps core principles of having a declared desired end state, and a designated agent to bring about that end state.
Many decades of automation practice focused on individual OS instances (whether virtual machines or bare metal) may lead us to believe that the only way to manage such instances is imperatively - that is, by focusing on the steps required to configure a machine rather than on the end state you actually want.
By way of example, consider a situation where you want an individual OS instance to synchronize its time to a source you specify. The imperative way to do this would be to write a script that does some or all of the following:
- Install the software that manages system time synchronization
- Write a configuration file for the service that specifies the time source in question
- If the configuration file or other configuration mechanism that influences the service has changed, restart the time synchronization service.
Along the way, there are subtle differences between operating systems, such as the name of the time synchronization package (ntp or chrony, for example); differences in which package manager to use; differences in configuration file formats; and differences in service names. It is all rather a lot to consider, and the kinds of scripts that managed these sorts of things at scale, when written in Shell or Perl, could get quite convoluted.
Meanwhile, would it not be great if we could put the focus on the end state instead of on the steps required to get there, so that we could specify what we want and trust the framework to “make it so” for us? Languages with this capability rose to the forefront of IT consciousness this century and became wildly popular - languages like Puppet, Chef, Salt and, of course, Ansible. (And yes, they all owe quite a lot to CFEngine, which has been around far longer and is still around.) The development and practices that grew up around these languages significantly influenced Kubernetes and its development in turn.
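To make the contrast concrete, here is a minimal sketch of the same time-synchronization task in the declarative style, written as an Ansible play. It states what should be true (chrony installed, pointed at a time source, and running) and leaves the how to the modules. The host group, time source, and template name are assumptions for illustration, not part of this pattern.

```yaml
---
- name: Ensure time synchronization is configured (illustrative sketch)
  hosts: all
  become: true
  vars:
    ntp_server: time.example.com      # assumed time source
  tasks:
    - name: Ensure chrony is installed
      ansible.builtin.package:
        name: chrony
        state: present

    - name: Ensure chrony points at the desired time source
      ansible.builtin.template:
        src: chrony.conf.j2           # hypothetical template
        dest: /etc/chrony.conf
      notify: Restart chronyd

    - name: Ensure the time service is enabled and running
      ansible.builtin.service:
        name: chronyd
        state: started
        enabled: true

  handlers:
    - name: Restart chronyd
      ansible.builtin.service:
        name: chronyd
        state: restarted
```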
Because these languages all provide a kind of hybrid model, they all have mechanisms that allow you to violate one or more of the core tenets of GitOps. For example, while many people run their configuration management code from a Git repository, none of these languages specifically require that, and all provide mechanisms to run in an ad-hoc mode. And yet, all of these languages have a fairly strong declarative flavor that can be used to specify configurations with them; again, this is not mandatory, but it is still quite common. So maybe there is a way to apply the stricter definitions of GitOps to these languages and include their use in a larger GitOps system.
Even within Kubernetes, where we clearly have first-class support for declarative systems, there are aspects of configuration that we may want to make deterministic but not explicitly encode in a Git repository. For example, best practice for cluster availability is to spread a worker pool over three different availability zones in a public cloud region. Which three zones should they be? Those decisions are bound to the region the cluster is installed in. Do the specific AZs really matter, or is the only important constraint that there be three? These kinds of things are state that matters to operators, and an imperative framework for answering questions like this can vastly simplify the work of cluster administrators, who can then use that automation to create clusters in multiple regions and clouds and trust that resources will be optimized for maximum availability.
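As a sketch of what that can look like in practice, an OpenShift install-config.yaml compute pool can pin the workers to three availability zones; an imperative helper can pick the zones for the target region at install time rather than having them committed to Git. The zone names below are placeholders, not a recommendation.

```yaml
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      zones:            # chosen per region by automation, not hard-coded in Git
      - us-east-1a
      - us-east-1b
      - us-east-1c
```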
Another crucial point about declarative and imperative systems is that it is impossible to conceive of a declarative system that does not have or require reconciliation loops. Those reconciliation loops are, by definition, imperative processes. They often have additional conventions that apply - for example, the convention that in Kubernetes Operators the reconciliation loop will change one thing and then retry - but those processes are still inherently imperative.
A final crucial point about declarative and imperative systems is that, especially in Edge installations, many of the systems that form important parts of those ecosystems do not have the same level of support for declarative-style configuration management that server operating systems and layers like Kubernetes have. Consider crucial elements of Edge environments like routers, switches, access points, and other network gear, as well as IoT sensors like IP cameras; it seems unlikely that we will have a Kubernetes-native way to manage devices like these in the foreseeable future.
With these points in mind, it seems that if we cannot bring devices to GitOps, perhaps we should bring GitOps to devices. Ansible has long been recognized for its ability to orchestrate and manage devices in an agentless model. Is there a way to run Ansible that we can recognize as a GitOps mechanism? We believe that there is, by using it with the Ansible Automation Platform components (formerly known as Tower), and recording the desired state in the git repository or repositories that the system uses. In doing so, we believe that we can and should bring GitOps to Edge environments.
Highlight: Including Ansible in Validated Patterns
A new element of this pattern is the use of the Ansible Automation Platform Operator, which we install in the hub cluster of the pattern.
The Ansible Automation Platform Operator is the Kubernetes-native way of running AAP. It provides the Controller function, which supports Execution Environments. The pattern provides its own Execution Environment (with the definition files, so that you can see what is in it or customize it if you like), and loads its own Ansible content into the AAP instance. It uses a dynamic inventory technique to deal with certain aspects of running the VMs it manages under Kubernetes.
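For readers unfamiliar with Execution Environments, an ansible-builder definition file roughly resembles the sketch below. The file names and base image here are assumptions for illustration; the pattern's actual definition lives in its repository.

```yaml
---
version: 1
build_arg_defaults:
  EE_BASE_IMAGE: 'registry.redhat.io/ansible-automation-platform-22/ee-minimal-rhel8:latest'
dependencies:
  galaxy: requirements.yml    # collections the content needs (e.g. kubernetes.core)
  python: requirements.txt    # Python libraries to install in the image
  system: bindep.txt          # OS packages to install in the image
```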
The key function of AAP in this pattern is to configure and manage the kiosks. The included content takes the freshly provisioned VMs, registers them with the Red Hat CDN, installs Firefox, configures kiosk mode, and then downloads and manages the Ignition application container so that both Firefox and the application container start at boot time.
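A heavily simplified, hypothetical sketch of those steps as an Ansible play follows; the module choices and variable names are illustrative rather than the pattern's actual content.

```yaml
---
- name: Configure kiosks (illustrative sketch)
  hosts: kiosks
  become: true
  tasks:
    - name: Register the system with the Red Hat CDN
      community.general.redhat_subscription:
        username: "{{ rhsm_username }}"     # assumed credential variables
        password: "{{ rhsm_password }}"
        state: present

    - name: Install Firefox for kiosk mode
      ansible.builtin.package:
        name: firefox
        state: present

    - name: Run the Ignition application container and restart it on boot
      containers.podman.podman_container:
        name: ignition
        image: "{{ ignition_image }}"       # assumed variable for the container image
        state: started
        restart_policy: always
```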
The playbook that configures the kiosks is scheduled to run every 10 minutes on all kiosks, so that if there is some temporary error on a kiosk, configuration is simply attempted again on the next scheduled run.
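In AAP terms, that recurrence is a schedule attached to the job template, expressed as an iCal RRULE. A hedged sketch using the awx.awx collection (the job template name and start time are assumptions) might look like this:

```yaml
- name: Run the kiosk configuration job every 10 minutes
  awx.awx.schedule:
    name: kiosk-config-every-10-minutes
    unified_job_template: "Kiosk Mode Config"    # hypothetical job template name
    rrule: "DTSTART:20220101T000000Z RRULE:FREQ=MINUTELY;INTERVAL=10"
    state: present
```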
Highlight: Including Virtualization in Validated Patterns
As discussed above, another key element in this pattern is the introduction of OpenShift Virtualization to model the Edge environment with kiosks. The pattern installs the OpenShift Virtualization operator, configures it, and provisions a metal node to run the virtual machines. It is possible to run the VMs with emulation instead of hardware acceleration, but the resulting performance is terrible.
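Installing the operator in a GitOps way largely comes down to committing an OLM Subscription (plus its namespace and OperatorGroup, omitted here) to the repository that Argo CD applies. A minimal sketch, assuming the usual channel and catalog names:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  channel: stable
  name: kubevirt-hyperconverged
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```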
The virtual machines we build as part of this pattern are x86_64 RHEL machines, but it should be straightforward to extend this pattern to model other architectures, or other operating systems or versions.
The chart used to define the virtual machines is designed to be open and flexible - replacing the values.yaml file in the chart’s directory will allow you to define different kinds of virtual machine sets; the chart may give you some ideas on how to manage virtual machines under OpenShift in a GitOps way.
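The shape of such a values.yaml is roughly along the lines of the sketch below; the keys and values here are invented for illustration, so consult the chart in the pattern's repository for the real schema.

```yaml
vms:
  kiosk:
    count: 2          # how many kiosk VMs to create
    os: rhel8         # which template/image to base the VMs on
    cores: 2
    memory: 4Gi
    sshKey: ""        # injected at deploy time rather than committed to Git
```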
Highlight: Including RHEL in Validated Patterns
One of the highlights of this pattern is its use of RHEL. There are a number of interesting developments in RHEL that we have been working on, and we expect to highlight more of these in future patterns. We expect that this pattern will be the basis for future patterns that include RHEL, Ansible, and/or Virtualization.
Where do we go from here?
We believe this pattern breaks some new and interesting ground in bringing Ansible, Virtualization, and RHEL to the validated pattern framework. Like all of our patterns, this pattern is Open Source, and we encourage you to use it, tinker with it, and submit your ideas, changes and fixes.
The pattern documentation includes detailed installation instructions, more technical details on the different components in the pattern (especially the use of AAP and OpenShift Virtualization), and some ideas for customization.