Validated Patterns

Intel SGX protected Vault for Multicloud GitOps

Validation status:
Sandbox Sandbox
Links:

About the Intel SGX protected Vault for Multicloud GitOps pattern

Use case
  • Use a GitOps approach to manage hybrid and multi-cloud deployments across both public and private clouds.

  • Enable cross-cluster governance and application lifecycle management.

  • Protect a key component by using Intel Security Guard Extensions.

  • Securely manage secrets across the deployment.

    Based on the requirements of a specific implementation, certain details might differ. However, all validated patterns that are based on a portfolio architecture, generalize one or more successful deployments of a use case.

Background

Organizations are aiming to develop, deploy, and operate applications on an open hybrid cloud in a stable, simple, and secure way. This hybrid strategy includes multi-cloud deployments where workloads might be running on multiple clusters and on multiple clouds, private or public. This strategy requires an infrastructure-as-code approach: GitOps. GitOps uses Git repositories as a single source of truth to deliver infrastructure-as-code. Submitted code checks the continuous integration (CI) process, while the continuous delivery (CD) process checks and applies requirements for things like security, infrastructure-as-code, or any other boundaries set for the application framework. All changes to code are tracked, making updates easy while also providing version control should a rollback be needed. Moreover, organizations are looking for solutions that increase protection, which is possible using 5th Generation Intel Xeon Scalable Processors with running application code in a protected way using Intel Security Guard Extensions.

About the solution

This architecture covers hybrid and multi-cloud management with GitOps as shown in following figure. At a high level this requires a management hub, for DevOps and GitOps, and infrastructure that extends to one or more managed clusters running on private or public clouds. The automated infrastructure-as-code approach can manage the versioning of components and deploy according to the infrastructure-as-code configuration.

Benefits of Hybrid Multicloud management with GitOps:

  • Unify management across cloud environments.

  • Dynamic infrastructure security.

  • Infrastructural continuous delivery best practices.

Multicloud Architecture
Figure 1. Overview of the solution including the business drivers, management hub, and the clusters under management

In the following figure, logically, this solution can be viewed as being composed of an automation component, unified management including secrets management, and the clusters under management, all running on top of a user-chosen mixture of on-premise data centers and public clouds.

Logical Architecture
Figure 2. Logical diagram of Intel SGX protected hybrid multi-cloud management with GitOps

About the technology

The following technologies are used in this solution:

Red Hat OpenShift Platform

An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments. It delivers a complete application platform for both traditional and cloud-native applications, allowing them to run anywhere. OpenShift has a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. It also enables the use of external secret management systems, for example, HashiCorp Vault in this case, to add secrets in a protected way into the OpenShift platform.

Red Hat OpenShift GitOps

A declarative application continuous delivery tool for Kubernetes based on the ArgoCD project. Application definitions, configurations, and environments are declarative and version controlled in Git. It can automatically push the desired application state into a cluster, quickly find out if the application state is in sync with the desired state, and manage applications in multi-cluster environments.

Red Hat Advanced Cluster Management for Kubernetes

Controls clusters and applications from a single console, with built-in security policies. Extends the value of Red Hat OpenShift by deploying apps, managing multiple clusters, and enforcing policies across multiple clusters at scale.

Red Hat Ansible Automation Platform

Provides an enterprise framework for building and operating IT automation at scale across hybrid clouds including edge deployments. It enables users across an organization to create, share, and manage automation, from development and operations to security and network teams.

Node Feature Discovery Operator

Manages the detection of hardware features and configuration in an OpenShift Container Platform cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.

Intel® Device Plugins Operator

Is a collection of device plugins advertising Intel-specific hardware resources to the kubelet. Currently, the operator has basic support for the QAT, GPU, FPGA, SGX, DSA, IAA device plugins: it validates container image references and extends reported statuses.

Intel® Software Guard Extensions

Helps protect data in use via unique application isolation technology. Protect selected code and data from modification using hardened enclaves with Intel SGX.

Gramine Shielded Containers

Transforms a Docker image into a new image which includes the Gramine Library OS, manifest files, Intel SGX-related information, and executes the application inside an Intel SGX enclave using the Gramine Library OS.

Hashicorp Vault

Provides a protected centralized store for dynamic infrastructure and applications across clusters, including over low-trust networks between clouds and data centers.

This solution also uses a variety of observability tools including the Prometheus monitoring and Grafana dashboard that are integrated with OpenShift as well as components of the Observatorium meta-project which includes Thanos and the Loki API.

Overview of the architectures

The following figure provides a schematic diagram overview of the complete solution including both components and data flows.

Physical Architecture
Figure 3. Overview schematic diagram of the complete solution

Subsequent schematic diagrams provide details on:

  • Bootstrapping the management hub (Figure 4)

  • Hybrid multi-cloud GitOps (Figure 5)

  • Dynamic security management (Figure 6)

Bootstrapping the management hub

The following figure provides a schematic diagram showing the setup of the management hub using Ansible playbooks.

Schematic diagram of bootstrapping the management hub
Figure 4. Schematic diagram of bootstrapping the management hub
  • Set up the OpenShift Container Platform that hosts the Management Hub. The OpenShift installation program provides flexible ways to install OpenShift. An Ansible playbook starts the installation with necessary configurations.

  • Ansible playbooks deploy and configure Red Hat Advanced Cluster Management for Kubernetes and, later, other supporting components such as external secrets management, on top of the provisioned OpenShift cluster.

  • Another Ansible playbook installs HashiCorp Vault, a Red Hat partner product chosen for this solution that can be used to manage secrets for OpenShift clusters.

  • An Ansible playbook configures and installs the Openshift GitOps operator on the hub cluster. This deploys the Openshift GitOps instance to enable continuous delivery.

Hybrid Multicloud GitOps

The following figure provides a schematic diagram showing remaining activities associated with setting up the management hub and clusters using Red Hat Advanced Cluster Management.

Schematic diagram of hybrid multi-cloud management with GitOps
Figure 5. Schematic diagram of hybrid multi-cloud management with GitOps
  • Manifest and configuration are set as code template in the form of a Kustomization YAML file. The file describes the desired end state of the managed cluster. When complete, the Kustomization YAML file is pushed into the source control management repository with a version assigned to each update.

  • OpenShift GitOps monitors the repository and detects changes in the repository.

  • OpenShift GitOps creates and updates the manifest by creating Kubernetes objects on top of Red Hat Advanced Cluster Management.

  • Red Hat Advanced Cluster Management provisions, updates, or deletes managed clusters and configuration according to the manifest. In the manifest, you can configure what cloud provider the cluster will be on, the name of the cluster, infrastructure node details and worker node. Governance policy can also be applied as well as provision an agent in the cluster as the bridge between the control center and the managed cluster.

  • OpenShift GitOps continuously monitors the code repository and the status of the clusters reported back to Red Hat Advanced Cluster Management. Any configuration drift or in case of any failure, OpenShift GitOps will automatically try to remediate by applying the manifest or by displaying alerts for manual intervention.

Dynamic security management

The following figure provides a schematic diagram showing how secrets are handled in this solution.

Schematic showing the setup and use of external secrets management
Figure 6. Schematic showing the setup and use of external secrets management
  • During setup, the token to access HashiCorp Vault in a protected way is stored in Ansible Vault. It is encrypted to protect sensitive content.

  • Red Hat Advanced Cluster Management for Kubernetes acquires the token from Ansible Vault during install and distributes it among the clusters. As a result, you have centralized control over the managed clusters through RHACM.

  • To allow the cluster access to the external vault, you must set up the external secret management with Helm in this study. OpenShift Gitops is used to deploy the external secret object to a managed cluster.

  • External secret management fetches secrets from HashiCorp Vault by using the token that was generated in step 2 and constantly monitors for updates.

  • Secrets are created in each namespace, where applications can use them.

Additional resources

View and download all of the diagrams above from the Red Hat Portfolio Architecture open source tooling site.

Intel SGX protected Vault for Multicloud GitOps pattern

The basic Multicloud GitOps pattern has been extended to highlight the 5th Generation Intel Xeon Scalable Processors capabilities, by running one of the components in Intel SGX memory enclave. This cutting-edge technology allows them to run applications in a way that increases security and prevents malicious actors from accessing the RAM of running applications. Also, it is very important not only on-premise but especially when running in the cloud environment.

The effort to start using Intel Security Guard Extensions is minimal as no changes to the code of the existing application are required. Tools like Gramine Shielded Containers allow a lift-and-shift approach and convert existing container images to their protected version which runs existing applications in an encrypted enclave.

The basic pattern has been improved by protecting the Vault component, which is a crucial component of MultiCloud GitOps pattern. The Vault component is run in a trusted execution environment.

Since protected Vault using Intel SGX must be running on the node with CPU supporting Intel SGX, Node Feature Discovery Operator (NFD) and Intel Device Plugins Operator (IDP) are deployed as a part of this pattern.

NFD manages the detection of hardware features and configuration in an OpenShift Container Platform cluster. It labels the nodes with hardware-specific information. It is the requirement for IDP installation. IDP advertises Intel-specific hardware resources to the cluster.

In the logs of vault-0 pod there is information that "Gramine is starting" and Vault is launched in encrypted RAM.

Vault was chosen because it is a common component of this pattern and it might be used in many places, so protecting it introduces even more benefits.

Sample logs from vault-0 pod:

Gramine is starting. Parsing TOML manifest file, this may take some time...
-----------------------------------------------------------------------------------------------------------------------
Gramine detected the following insecure configurations:

  - sgx.allowed_files = [ ... ]                (some files are passed through from untrusted host without verification)

Gramine will continue application execution, but this configuration must not be used in production!
-----------------------------------------------------------------------------------------------------------------------

Emulating a raw syscall instruction. This degrades performance, consider patching your application to use Gramine syscall API.
==> Vault server configuration:

Administrative Namespace:
             Api Address: http://10.128.1.233:8200
                     Cgo: disabled
         Cluster Address: https://vault-0.vault-internal:8201
   Environment Variables: GODEBUG, HOME, HOSTNAME, HOST_IP, LD_LIBRARY_PATH, NAME, PATH, POD_IP, SKIP_CHOWN, SKIP_SETCAP, VAULT_ADDR, VAULT_API_ADDR, VAULT_CACERT, VAULT_CLUSTER_ADDR, VAULT_K8S_NAMESPACE, VAULT_K8
S_POD_NAME, VERSION, container
              Go Version: go1.21.4
              Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level:
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: file
                 Version: Vault v1.15.3, built 2023-11-22T20:59:54Z
             Version Sha: 25a4d1a00dc81a5b4907c76c2358f38d30c05747

==> Vault server started! Log data will stream in below:

...
2024-02-26T15:59:57.791Z [INFO]  events: Starting event system

This is a sample usage of the SGX application, as it is, and should not be used in production; unless the above warnings related to sgx.allowed_files related to Gramine are resolved.

Next steps