Validated Patterns

Deploying

Prerequisites

Azure deployments

  1. Install an OpenShift 4.19.28+ cluster on Azure

  2. Update the required Azure configuration and secrets in values-global.yaml, including the Azure service principal, DNS resource group, and cluster networking details.

  3. Fork the repository and clone it locally. ArgoCD reconciles against your fork, so all configuration changes must be committed and pushed.

  4. Run bash scripts/gen-secrets.sh to generate KBS key pairs, attestation policy seeds, and copy the values-secret template to ~/values-secret-coco-pattern.yaml. This script will not overwrite existing secrets.

  5. Run bash scripts/get-pcr.sh to retrieve PCR measurements from the peer-pod VM image. This stores the measurements at ~/.coco-pattern/measurements.json, which are loaded into Vault and used by the attestation service. Requires podman, skopeo, and a pull secret at ~/pull-secret.json.

  6. Review and customise ~/values-secret-coco-pattern.yaml. This file controls what secrets are loaded into Vault, including attestation policies, KBS key material, and PCR measurements. See the comments in values-secret.yaml.template for details on each field.
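Before deploying, it can help to confirm that the files produced by the steps above actually exist. A minimal pre-flight sketch, assuming the default paths from steps 4-6 (the helper name is ours, not part of the pattern):

```shell
#!/usr/bin/env bash
# Pre-flight sketch: confirm the files the prerequisite steps produce exist.
# Paths follow steps 4-6 above; adjust if your layout differs.
check_prereqs() {
  local missing=0 f
  for f in "$HOME/values-secret-coco-pattern.yaml" \
           "$HOME/.coco-pattern/measurements.json" \
           "$HOME/pull-secret.json"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

check_prereqs && echo "prerequisites look complete" || echo "create the files above before installing"
```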

Bare metal deployments

  1. Install OpenShift 4.19.28+ on bare metal with Intel TDX or AMD SEV-SNP hardware

  2. Ensure BIOS/firmware is configured to enable TDX or SEV-SNP. For Intel TDX, consult the Intel TDX BIOS setup guide. For AMD SEV-SNP, consult the AMD SEV developer documentation and your hardware vendor’s TEE enablement procedures.

  3. For Intel TDX: Obtain an Intel PCS API key from Intel Trusted Services

  4. Fork the repository and clone it locally. ArgoCD reconciles against your fork, so all configuration changes must be committed and pushed.

  5. Run bash scripts/gen-secrets.sh to generate KBS key pairs and PCCS secrets (for Intel TDX)

  6. For bare metal, PCR measurements must be collected manually after the first boot. See the tested environments page for guidance on PCR collection for bare metal. Store the measurements at ~/.coco-pattern/measurements.json.

  7. Review and customise ~/values-secret-coco-pattern.yaml. For Intel TDX, uncomment the PCCS secrets section and provide your Intel PCS API key. See the comments in values-secret.yaml.template for details on each field.
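As a quick sanity check after the firmware steps, the kernel's CPU flags can hint at which TEE the host exposes. This is a convenience sketch only; the flag names shown are those reported by recent Linux kernels (the TDX host indicator varies by kernel version), so treat a "none" result as inconclusive rather than a failure, and rely on the vendor BIOS guides above as the source of truth:

```shell
# Classify the TEE a host exposes from its /proc/cpuinfo contents.
# Flag names (tdx, sev_snp) are those reported by recent kernels.
detect_tee() {
  local cpuinfo="$1"
  case "$cpuinfo" in
    *sev_snp*) echo "amd-sev-snp" ;;
    *tdx*)     echo "intel-tdx"   ;;
    *)         echo "none"        ;;
  esac
}

detect_tee "$(cat /proc/cpuinfo)"
```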

Single cluster deployment

The single-cluster topology uses the simple clusterGroup. All components — Trustee, Vault, ACM, sandboxed containers, and workloads — are deployed on one cluster.

  1. Ensure main.clusterGroupName: simple is set in values-global.yaml

  2. ./pattern.sh make install

  3. Wait for the cluster to reboot all nodes. The sandboxed containers operator applies a MachineConfig update that triggers a rolling reboot. Monitor progress via the ArgoCD UI or oc get nodes.

  4. If the services do not come up, use the ArgoCD UI to triage potential timeouts. Peer-pod VMs may need to be restarted if they time out during initial provisioning.
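The reboot wait in step 3 can be scripted rather than watched; a sketch using oc wait (the 30-minute timeout is an assumption, tune it for your cluster size):

```shell
# Block until every MachineConfigPool reports Updated=True, i.e. the rolling
# reboot triggered by the sandboxed containers operator has finished.
wait_for_mcp() {
  oc wait mcp --all --for=condition=Updated --timeout=30m
}
# wait_for_mcp   # run against a live cluster; returns non-zero on timeout
```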

Multi-cluster deployment

The multi-cluster topology separates the trusted zone (hub) from the untrusted workload zone (spoke). The hub cluster runs Trustee, Vault, and ACM. The spoke cluster runs the sandboxed containers operator and confidential workloads.

  1. Set main.clusterGroupName: trusted-hub in values-global.yaml

  2. Deploy the hub cluster: ./pattern.sh make install

  3. Wait for ACM (MultiClusterHub) to reach Running state on the hub cluster: oc get multiclusterhub -n open-cluster-management

  4. Provision a second OpenShift 4.17+ cluster on Azure for the spoke

  5. Import the spoke cluster into ACM with the label clusterGroup=spoke (see importing a cluster). ACM will automatically deploy the spoke clusterGroup applications to the imported cluster.

  6. The spoke cluster will install the sandboxed containers operator, deploy peer-pod infrastructure, and launch the sample workloads. Monitor progress in the ACM console or via ArgoCD on the spoke.
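From the hub, the import in steps 5-6 can be verified against the ManagedCluster resource; a sketch (the cluster name is whatever you chose at import time):

```shell
# Print the clusterGroup label and availability status of an imported spoke.
# Expect "spoke True" once ACM has picked the cluster up.
spoke_status() {
  local name="$1"
  oc get managedcluster "$name" -o \
    jsonpath='{.metadata.labels.clusterGroup} {.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status}'
}
# spoke_status my-spoke-cluster
```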

Bare metal deployment

The bare metal topology uses the baremetal clusterGroup for Intel TDX or AMD SEV-SNP hardware without GPUs.

  1. Set main.clusterGroupName: baremetal in values-global.yaml

  2. Run bash scripts/gen-secrets.sh to generate KBS key pairs and PCCS secrets (for Intel TDX)

  3. For Intel TDX: uncomment the PCCS secrets section in ~/values-secret-coco-pattern.yaml and provide your Intel PCS API key. AMD SEV-SNP deployments do not require PCCS.

  4. Review and customize ~/values-secret-coco-pattern.yaml with any additional secrets or attestation policy customizations

  5. ./pattern.sh make install

  6. Wait for the cluster to reboot nodes. The pattern applies MachineConfig updates for:

    • TDX/SEV-SNP kernel parameters (e.g., kvm_intel.tdx=1 for Intel TDX, kvm_amd.sev=1 for AMD SEV-SNP)

    • nohibernate kernel argument

    • vsock-loopback kernel module configuration

    Monitor node reboot progress: oc get nodes or oc get mcp

    Note: MCO-driven reboots during initial deployment may cause Vault secret loading to time out. If secrets fail to load after nodes finish rebooting (oc get mcp shows UPDATED=True), manually re-trigger secret loading by running ./pattern.sh make upgrade.

Bare metal support is currently tested on Single Node OpenShift (SNO) configurations. Multi-node bare metal clusters are expected to work but have not been validated.
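After the reboot finishes, the kernel parameters listed above can be spot-checked on a node, for example via oc debug node/<name> -- chroot /host cat /proc/cmdline. A small helper to classify the command line (the matched strings follow the parameter list above):

```shell
# Return whether a kernel command line carries the TEE parameters the
# pattern's MachineConfig applies.
check_cmdline() {
  case "$1" in
    *kvm_intel.tdx=1*|*kvm_amd.sev=1*) echo "tee-params-present" ;;
    *)                                 echo "tee-params-missing" ;;
  esac
}

check_cmdline "BOOT_IMAGE=... kvm_intel.tdx=1 nohibernate"   # prints tee-params-present
```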

Automatic hardware detection:

The pattern automatically detects and configures your TEE hardware:

  • Node Feature Discovery (NFD) labels nodes based on detected capabilities:

    • Intel TDX: intel.feature.node.kubernetes.io/tdx=true

    • AMD SEV-SNP: amd.feature.node.kubernetes.io/snp=true

  • HostPath Provisioner (HPP) provides persistent storage for bare metal deployments

  • RuntimeClass: The kata-cc RuntimeClass is created automatically, using kata-tdx or kata-snp handler based on detected hardware

  • Both kata-tdx and kata-snp RuntimeClasses are deployed; only the one matching your hardware will have schedulable nodes

  • Intel DCAP components (PCCS and QGS) deploy unconditionally but DaemonSets only schedule on Intel TDX nodes via NFD label selectors

Optional PCCS node pinning: For Intel TDX deployments, you can pin the PCCS service to a specific node by running bash scripts/get-pccs-node.sh and setting baremetal.pccs.nodeSelector in the baremetal chart values.

Bare metal GPU deployment (Technology Preview)

The baremetal-gpu topology extends the bare metal deployment with NVIDIA confidential GPU support (H100, H200, B100, B200). This topology works with both Intel TDX and AMD SEV-SNP as the host TEE platform.

  1. Set main.clusterGroupName: baremetal-gpu in values-global.yaml

  2. Run bash scripts/gen-secrets.sh to generate KBS keys and PCCS secrets

  3. For Intel TDX: uncomment the PCCS secrets in ~/values-secret-coco-pattern.yaml and provide your Intel PCS API key

  4. ./pattern.sh make install

  5. Wait for the cluster to reboot nodes. The GPU topology adds IOMMU configuration to the MachineConfig, which enables GPU passthrough to confidential VMs:

    • Intel: intel_iommu=on

    • AMD: amd_iommu=on

    All nodes will reboot to apply these kernel parameters.

    Note: MCO-driven reboots may cause Vault secret loading to time out. If needed, re-run ./pattern.sh make upgrade after nodes finish rebooting.

  6. Approve the GPU Operator install plan when it appears. The pattern uses installPlanApproval: Manual to ensure version control. Check for pending install plans:

    oc get installplan -n nvidia-gpu-operator

    Approve the install plan:

    oc patch installplan <install-plan-name> -n nvidia-gpu-operator \
      --type merge -p '{"spec":{"approved":true}}'

The baremetal-gpu topology applies IOMMU MachineConfig to all nodes and triggers reboots even on clusters without GPUs. If you do not have GPUs, use the baremetal topology instead. The GPU workload (gpu-vectoradd) will remain in Pending state on systems without GPUs but is otherwise harmless.

GPU-specific components:

  • NVIDIA GPU Operator: Manages GPU drivers, device plugins, and confidential computing manager

  • Kata device plugin: Exposes GPUs as schedulable resources (nvidia.com/pgpu)

  • CC Manager: Enables confidential mode at the GPU firmware level

  • VFIO Manager: Binds GPUs to VFIO for passthrough to Kata VMs

  • RuntimeClass: kata-cc-nvidia-gpu is created for GPU-enabled confidential pods
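A couple of read-only checks that the components above landed, collected into one helper (resource names follow the component list; namespaces may differ in your install):

```shell
# Verify the GPU RuntimeClass exists and see which nodes advertise
# passthrough GPUs (nvidia.com/pgpu, exposed by the Kata device plugin).
gpu_plumbing_check() {
  oc get runtimeclass kata-cc-nvidia-gpu
  oc get nodes -o custom-columns='NAME:.metadata.name,PGPU:.status.allocatable.nvidia\.com/pgpu'
}
# gpu_plumbing_check   # run against a live cluster
```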

GPU workload verification:

The pattern deploys a gpu-vectoradd sample workload that runs a CUDA vector addition inside a confidential container. Check the logs to verify GPU functionality:

oc logs -n gpu-workload deployment/gpu-vectoradd

Expected output should show successful CUDA execution and GPU device detection.
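The log check can be turned into a pass/fail probe; a sketch, assuming the workload image prints the "Test PASSED" marker that the standard CUDA vectorAdd sample emits (the exact string depends on the image):

```shell
# Grep the sample workload's logs for the CUDA vectorAdd success marker.
gpu_workload_ok() {
  oc logs -n gpu-workload deployment/gpu-vectoradd | grep -q "Test PASSED"
}
# gpu_workload_ok && echo "GPU vector addition succeeded"
```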

Updating PCR measurements

Platform Configuration Register (PCR) measurements are cryptographic hashes of the peer-pod VM’s boot components and runtime state. The attestation service verifies these measurements to ensure workloads are running in a genuine, unmodified TEE. When the peer-pod VM image is updated (for example, when upgrading OpenShift Sandboxed Containers), the PCR values change and must be refreshed.

When to update PCR measurements

Update PCR measurements after:

  • Upgrading the OpenShift Sandboxed Containers operator

  • Updating the peer-pod VM image (podvm-image configmap)

  • Changing kernel parameters or boot configuration in the peer-pod image

  • Applying security patches that modify the VM boot chain

Update workflow

  1. Retrieve new measurements: Run the PCR extraction script against the updated peer-pod image:

    bash scripts/get-pcr.sh

    This fetches the current peer-pod image from your cluster’s registry, extracts the measurements, and stores them at ~/.coco-pattern/measurements.json.

  2. Update Vault secrets: The measurements are loaded into Vault via values-secret-coco-pattern.yaml. If you previously deployed the pattern, update the pcrMeasurements field in your values-secret file with the new content from ~/.coco-pattern/measurements.json.

  3. Sync to the cluster: Push the updated values-secret file to refresh Vault:

    ./pattern.sh make upgrade

    This reloads secrets into Vault and triggers the External Secrets operator to sync the new PCR values to the trustee-operator-system namespace.

  4. Restart the attestation service: The attestation service caches policy configuration. Restart it to pick up the new PCR measurements:

    oc rollout restart deployment/kbs-deployment -n trustee-operator-system

  5. Verify: Deploy a test confidential pod or restart an existing one. Check the KBS logs to confirm successful attestation with the new measurements:

    oc logs -n trustee-operator-system -l app=kbs -f | grep "Attestation succeeded"
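The workflow above can be collected into one hedged helper; step 2, editing the values-secret file, stays manual (run from the pattern repository root against a live cluster):

```shell
# Refresh PCR measurements end to end, per the update workflow above.
refresh_pcr() {
  bash scripts/get-pcr.sh                       # 1. extract new measurements
  echo "Copy ~/.coco-pattern/measurements.json into the pcrMeasurements" \
       "field of ~/values-secret-coco-pattern.yaml, then press Enter."
  read -r _                                     # 2. manual edit of the values-secret file
  ./pattern.sh make upgrade                     # 3. reload secrets into Vault
  oc rollout restart deployment/kbs-deployment -n trustee-operator-system  # 4. restart KBS
}
# refresh_pcr
```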

Troubleshooting attestation failures

If confidential pods fail to start or cannot retrieve secrets after an update:

  1. Check KBS logs for attestation errors:

    oc logs -n trustee-operator-system -l app=kbs --tail=100

    Look for messages like PCR mismatch or Attestation verification failed.

  2. Verify PCR measurements are loaded into Vault and synced to the cluster:

    # Check Vault has the measurements
    oc exec -n vault vault-0 -- vault kv get secret/hub/pcrMeasurements
    
    # Check the External Secret synced to the namespace
    oc get secret -n trustee-operator-system attestation-policy -o yaml

  3. Compare expected vs actual PCR values: The KBS logs show the PCR values presented by the peer-pod. Compare these against the expected values in ~/.coco-pattern/measurements.json. If they differ, re-run scripts/get-pcr.sh to ensure you extracted measurements from the correct image version.

  4. Confirm peer-pod image version: Verify the peer-pod is using the expected image:

    # Check the peer-pod configmap
    oc get configmap peer-pods-cm -n openshift-sandboxed-containers-operator -o yaml | grep podvm-image

If the PCR values still do not match after following these steps, the peer-pod VM image may have been modified outside the standard upgrade process, or hyperthreading/firmware settings may differ from the image used during PCR extraction.
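For the comparison in step 3, the expected values can be dumped from the local measurements file; a sketch that assumes the file is a flat JSON object of PCR name to value (adjust the jq path to your file's actual layout):

```shell
# Print "name: value" pairs from a measurements JSON object, for comparison
# against the PCR values reported in the KBS logs.
list_expected_pcrs() {
  jq -r 'to_entries[] | "\(.key): \(.value)"' "$1"
}
# list_expected_pcrs ~/.coco-pattern/measurements.json
```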

Simple Confidential container tests

The pattern deploys some simple tests of CoCo. A "Hello Openshift" application (a web server that returns "Hello Openshift!" when curled) is deployed in three configurations:

  1. A vanilla kubernetes pod: oc get pods -n hello-openshift standard

  2. A confidential container with a strict policy: oc get pods -n hello-openshift secure

  3. A confidential container with a relaxed policy: oc get pods -n hello-openshift insecure-policy

In this case the insecure policy is designed to allow a user to exec into the confidential container. Typically this is disabled by an immutable policy established at pod creation time.

Running oc get pod -o yaml for either of the pods running a confidential container (e.g. oc get pod -n hello-openshift secure -o yaml) should show:

  • Azure deployments: runtimeClassName: kata-remote (peer-pod provisioned on Azure hypervisor)

  • Bare metal deployments: runtimeClassName: kata-cc (Kata container running on TDX/SEV-SNP hardware)

  • Bare metal GPU deployments: runtimeClassName: kata-cc-nvidia-gpu (GPU-enabled Kata container)

Azure-specific verification: Logging into the Azure portal once the pods have been provisioned will show that each confidential pod has its own Standard_DC2as_v5 virtual machine. These VMs are visible under the cluster’s resource group.
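The portal check can also be done from the Azure CLI; a sketch (requires az login, and the resource group argument is your cluster's):

```shell
# List peer-pod VMs (one Standard_DC2as_v5 instance per confidential pod)
# in the given resource group, using a JMESPath filter on the VM size.
list_peerpod_vms() {
  az vm list -g "$1" \
    --query "[?hardwareProfile.vmSize=='Standard_DC2as_v5'].name" -o tsv
}
# list_peerpod_vms my-cluster-rg
```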

oc exec testing

In an OpenShift cluster without confidential containers, Role-Based Access Control (RBAC) may be used to prevent users from running oc exec to access and mutate a container. However:

  1. Cluster admins can always circumvent this restriction

  2. Anyone logged into a node directly can also circumvent it

Confidential containers enforce this boundary at the hardware level, independent of RBAC. Running oc exec -n hello-openshift -it secure -- bash will result in a denial of access, irrespective of the user undertaking the action, including kubeadmin. The policy is baked into the pod at creation time and cannot be modified at runtime.

For comparison, oc exec -n hello-openshift -it standard -- bash (the standard pod) and oc exec -n hello-openshift -it insecure-policy -- bash (the CoCo pod with a relaxed policy) will both allow shell access.
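The three checks can be run in one loop; a sketch (pod names per this page; `-- true` is used instead of an interactive shell so the probe is scriptable):

```shell
# Probe whether exec is permitted for each test pod. Per the text above,
# expect "allowed" for standard and insecure-policy, "denied" for secure.
probe_exec() {
  if oc exec -n hello-openshift "$1" -- true 2>/dev/null; then
    echo "$1: exec allowed"
  else
    echo "$1: exec denied"
  fi
}

for pod in standard insecure-policy secure; do probe_exec "$pod"; done
```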

Confidential Data Hub testing

Part of the CoCo VM is a component called the Confidential Data Hub (CDH), which simplifies access to the Trustee Key Broker Service (KBS) for end applications. The CDH runs inside the confidential VM and handles attestation transparently — applications simply make HTTP requests to a localhost endpoint.

Find out more about how the CDH and Trustee work together in the upstream confidential containers Trustee documentation.

The CDH presents to containers within the pod (only), via a localhost URL. The CoCo container with an insecure policy can be used for testing the behaviour, since it allows oc exec.

  • oc exec -n hello-openshift -it insecure-policy -- bash to get a shell into a confidential container

  • Trustee’s configuration specifies the list of secrets which the KBS can access with the kbsSecretResources attribute. These are mapped to Vault paths (e.g. secret/data/hub/kbsres1).

  • Secrets within the CDH can be accessed (by default) at http://127.0.0.1:8006/cdh/resource/default/$K8S_SECRET/$K8S_SECRET_KEY.

  • In this case http://127.0.0.1:8006/cdh/resource/default/passphrase/passphrase by default will return a string which was randomly generated when the pattern was deployed.

  • To verify, compare the CDH output against the Vault-backed secret: oc get secrets -n trustee-operator-system passphrase -o yaml | yq '.data.passphrase' | base64 -d. The values should match.

  • Tailing the logs for the KBS container (e.g. oc logs -n trustee-operator-system -l app=kbs -f) shows the attestation evidence flowing from the CDH to the KBS, including TEE evidence validation.
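The comparison described in the bullets above, as one helper (resource names from this page; run where oc is logged in to the cluster):

```shell
# Fetch the passphrase via the CDH inside the pod and via the Vault-backed
# Kubernetes secret, and report whether they agree.
cdh_value() {
  oc exec -n hello-openshift insecure-policy -- \
    curl -s http://127.0.0.1:8006/cdh/resource/default/passphrase/passphrase
}
vault_value() {
  oc get secret -n trustee-operator-system passphrase \
    -o jsonpath='{.data.passphrase}' | base64 -d
}
compare_passphrase() {
  if [ -n "$(cdh_value)" ] && [ "$(cdh_value)" = "$(vault_value)" ]; then
    echo "match"
  else
    echo "MISMATCH"
  fi
}
# compare_passphrase   # expect: match
```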

kbs-access application

The kbs-access application is a web service deployed in the kbs-access namespace. It retrieves secrets from Trustee via the CDH and presents them through a web interface. This provides a convenient way to verify that the full attestation pipeline is working end-to-end without needing to exec into a pod.

Access the application via its OpenShift route: oc get route -n kbs-access.
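A quick end-to-end fetch through the route; a sketch that assumes a single route in the namespace (add -k to curl if the route terminates with a self-signed certificate):

```shell
# Resolve the kbs-access route host and fetch the page.
fetch_kbs_access() {
  local host
  host=$(oc get route -n kbs-access -o jsonpath='{.items[0].spec.host}')
  curl -s "https://$host"
}
# fetch_kbs_access
```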