Configuring managed clusters with policies and PolicyGenTemplate resources

Applied policy custom resources (CRs) configure the managed clusters that you provision. You can customize how Red Hat Advanced Cluster Management (RHACM) uses PolicyGenTemplate CRs to generate the applied policy CRs.

About the PolicyGenTemplate CRD

The PolicyGenTemplate custom resource definition (CRD) tells the PolicyGen policy generator what custom resources (CRs) to include in the cluster configuration, how to combine the CRs into the generated policies, and what items in those CRs need to be updated with overlay content.

The following example shows a PolicyGenTemplate CR (common-du-ranGen.yaml) extracted from the ztp-site-generate reference container. The common-du-ranGen.yaml file defines two Red Hat Advanced Cluster Management (RHACM) policies. Each policy manages a collection of configuration CRs, and one policy is generated for each unique value of policyName in the CR. The common-du-ranGen.yaml file also creates a single placement binding and a placement rule to bind the policies to clusters based on the labels listed in the bindingRules section.

Example PolicyGenTemplate CR - common-du-ranGen.yaml

  ---
  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "common"
    namespace: "ztp-common"
  spec:
    bindingRules:
      common: "true" # (1)
    sourceFiles: # (2)
      - fileName: SriovSubscription.yaml
        policyName: "subscriptions-policy"
      - fileName: SriovSubscriptionNS.yaml
        policyName: "subscriptions-policy"
      - fileName: SriovSubscriptionOperGroup.yaml
        policyName: "subscriptions-policy"
      - fileName: SriovOperatorStatus.yaml
        policyName: "subscriptions-policy"
      - fileName: PtpSubscription.yaml
        policyName: "subscriptions-policy"
      - fileName: PtpSubscriptionNS.yaml
        policyName: "subscriptions-policy"
      - fileName: PtpSubscriptionOperGroup.yaml
        policyName: "subscriptions-policy"
      - fileName: PtpOperatorStatus.yaml
        policyName: "subscriptions-policy"
      - fileName: ClusterLogNS.yaml
        policyName: "subscriptions-policy"
      - fileName: ClusterLogOperGroup.yaml
        policyName: "subscriptions-policy"
      - fileName: ClusterLogSubscription.yaml
        policyName: "subscriptions-policy"
      - fileName: ClusterLogOperatorStatus.yaml
        policyName: "subscriptions-policy"
      - fileName: StorageNS.yaml
        policyName: "subscriptions-policy"
      - fileName: StorageOperGroup.yaml
        policyName: "subscriptions-policy"
      - fileName: StorageSubscription.yaml
        policyName: "subscriptions-policy"
      - fileName: StorageOperatorStatus.yaml
        policyName: "subscriptions-policy"
      - fileName: ReduceMonitoringFootprint.yaml
        policyName: "config-policy"
      - fileName: OperatorHub.yaml # (3)
        policyName: "config-policy"
      - fileName: DefaultCatsrc.yaml # (4)
        policyName: "config-policy" # (5)
        metadata:
          name: redhat-operators
        spec:
          displayName: disconnected-redhat-operators
          image: registry.example.com:5000/disconnected-redhat-operators/disconnected-redhat-operator-index:v4.9
      - fileName: DisconnectedICSP.yaml
        policyName: "config-policy"
        spec:
          repositoryDigestMirrors:
            - mirrors:
                - registry.example.com:5000
              source: registry.redhat.io
(1) common: "true" applies the policies to all clusters with this label.
(2) Files listed under sourceFiles create the Operator policies for installed clusters.
(3) OperatorHub.yaml configures the OperatorHub for the disconnected registry.
(4) DefaultCatsrc.yaml configures the catalog source for the disconnected registry.
(5) policyName: "config-policy" configures Operator subscriptions. The OperatorHub CR disables the default catalog sources, and this CR replaces redhat-operators with a CatalogSource CR that points to the disconnected registry.
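The following Python sketch models the grouping rule. It shows how the sourceFiles entries above collapse into one policy for each unique policyName value. The sketch works on plain dictionaries instead of parsing YAML. group_into_policies is a hypothetical helper and is not part of the PolicyGen generator. The <metadata.name>-<policyName> naming is inferred from the generated policy names that appear in this section, such as common-config-policy and group-du-sno-config-policy.

```python
from typing import Dict, List


def group_into_policies(pgt: Dict) -> Dict[str, List[str]]:
    """Map each generated policy name to the source files it wraps.

    Simplified model: sourceFiles entries that share a policyName in one
    PolicyGenTemplate are bundled into a single policy named
    <metadata.name>-<policyName>.
    """
    name = pgt["metadata"]["name"]
    policies: Dict[str, List[str]] = {}
    for entry in pgt["spec"]["sourceFiles"]:
        policy = f"{name}-{entry['policyName']}"
        # dicts keep insertion order, so policies appear in first-seen order
        policies.setdefault(policy, []).append(entry["fileName"])
    return policies


# The 16 Operator subscription files listed in common-du-ranGen.yaml, in order.
SUBSCRIPTION_FILES = [
    f"{op}{kind}.yaml"
    for op in ("Sriov", "Ptp")
    for kind in ("Subscription", "SubscriptionNS", "SubscriptionOperGroup", "OperatorStatus")
] + [
    f"{op}{kind}.yaml"
    for op in ("ClusterLog", "Storage")
    for kind in ("NS", "OperGroup", "Subscription", "OperatorStatus")
]
CONFIG_FILES = [
    "ReduceMonitoringFootprint.yaml",
    "OperatorHub.yaml",
    "DefaultCatsrc.yaml",
    "DisconnectedICSP.yaml",
]

common = {
    "metadata": {"name": "common", "namespace": "ztp-common"},
    "spec": {
        "sourceFiles": [{"fileName": f, "policyName": "subscriptions-policy"} for f in SUBSCRIPTION_FILES]
        + [{"fileName": f, "policyName": "config-policy"} for f in CONFIG_FILES]
    },
}

policies = group_into_policies(common)
for policy_name, files in policies.items():
    print(f"{policy_name}: {len(files)} CRs")
```

The sketch prints common-subscriptions-policy: 16 CRs and common-config-policy: 4 CRs. These are the two policies that common-du-ranGen.yaml defines.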

A PolicyGenTemplate CR can be constructed with any number of included CRs. Apply the following example CR in the hub cluster to generate a policy containing a single CR:

  apiVersion: ran.openshift.io/v1
  kind: PolicyGenTemplate
  metadata:
    name: "group-du-sno"
    namespace: "ztp-group"
  spec:
    bindingRules:
      group-du-sno: ""
    mcp: "master"
    sourceFiles:
      - fileName: PtpConfigSlave.yaml
        policyName: "config-policy"
        metadata:
          name: "du-ptp-slave"
        spec:
          profile:
            - name: "slave"
              interface: "ens5f0"
              ptp4lOpts: "-2 -s --summary_interval -4"
              phc2sysOpts: "-a -r -n 24"

In this example, the source file PtpConfigSlave.yaml defines a PtpConfig CR. The generated policy is named group-du-sno-config-policy. The name combines the PolicyGenTemplate name (group-du-sno) with the policyName value (config-policy). Inside the generated group-du-sno-config-policy, the PtpConfig CR is named du-ptp-slave, as set by the metadata overlay. The spec items that the PolicyGenTemplate CR defines for PtpConfigSlave.yaml are placed under du-ptp-slave, together with the other spec items defined in the source file.

The following example shows the group-du-sno-config-policy CR:

  apiVersion: policy.open-cluster-management.io/v1
  kind: Policy
  metadata:
    name: group-du-sno-config-policy
    namespace: ztp-group
    annotations:
      policy.open-cluster-management.io/categories: CM Configuration Management
      policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
      policy.open-cluster-management.io/standards: NIST SP 800-53
  spec:
    remediationAction: inform
    disabled: false
    policy-templates:
      - objectDefinition:
          apiVersion: policy.open-cluster-management.io/v1
          kind: ConfigurationPolicy
          metadata:
            name: group-du-sno-config-policy-config
          spec:
            remediationAction: inform
            severity: low
            namespaceselector:
              exclude:
                - kube-*
              include:
                - '*'
            object-templates:
              - complianceType: musthave
                objectDefinition:
                  apiVersion: ptp.openshift.io/v1
                  kind: PtpConfig
                  metadata:
                    name: du-ptp-slave
                    namespace: openshift-ptp
                  spec:
                    recommend:
                      - match:
                          - nodeLabel: node-role.kubernetes.io/worker-du
                        priority: 4
                        profile: slave
                    profile:
                      - interface: ens5f0
                        name: slave
                        phc2sysOpts: -a -r -n 24
                        ptp4lConf: |
                          [global]
                          #
                          # Default Data Set
                          #
                          twoStepFlag 1
                          slaveOnly 0
                          priority1 128
                          priority2 128
                          domainNumber 24
                          .....
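The following Python sketch models the overlay behavior as a recursive merge of the PolicyGenTemplate overlay onto the source CR. It is a simplified model and not PolicyGen's implementation. merge_overlay is a hypothetical helper, and the source CR is a trimmed, hypothetical stand-in for PtpConfigSlave.yaml. Merging list items by position is an assumption. It was chosen because it reproduces the generated PtpConfig above, where the source file's ptp4lConf sits next to the overlaid interface value.

```python
import copy
from typing import Any


def merge_overlay(source: Any, overlay: Any) -> Any:
    """Return a copy of `source` with `overlay` applied (simplified model)."""
    if isinstance(source, dict) and isinstance(overlay, dict):
        merged = copy.deepcopy(source)
        for key, value in overlay.items():
            # Keys present in both are merged recursively; new keys are added.
            merged[key] = merge_overlay(source[key], value) if key in source else copy.deepcopy(value)
        return merged
    if isinstance(source, list) and isinstance(overlay, list):
        # Assumption: list items are merged by position, and extra items
        # from whichever list is longer are kept.
        merged = [merge_overlay(s, o) for s, o in zip(source, overlay)]
        longer = source if len(source) > len(overlay) else overlay
        merged.extend(copy.deepcopy(longer[len(merged):]))
        return merged
    # Scalars or mismatched types: the overlay value replaces the source value.
    return copy.deepcopy(overlay)


# Hypothetical, heavily trimmed stand-in for source-crs/PtpConfigSlave.yaml;
# the real file's contents may differ.
source_cr = {
    "apiVersion": "ptp.openshift.io/v1",
    "kind": "PtpConfig",
    "metadata": {"name": "PLACEHOLDER", "namespace": "openshift-ptp"},
    "spec": {
        "profile": [{"name": "PLACEHOLDER", "interface": "PLACEHOLDER", "ptp4lConf": "[global]\ndomainNumber 24\n"}],
        "recommend": [{"match": [{"nodeLabel": "node-role.kubernetes.io/worker-du"}], "priority": 4, "profile": "slave"}],
    },
}

# Overlay values taken from the group-du-sno PolicyGenTemplate example.
overlay = {
    "metadata": {"name": "du-ptp-slave"},
    "spec": {"profile": [{
        "name": "slave",
        "interface": "ens5f0",
        "ptp4lOpts": "-2 -s --summary_interval -4",
        "phc2sysOpts": "-a -r -n 24",
    }]},
}

generated = merge_overlay(source_cr, overlay)
print(generated["metadata"])
print(sorted(generated["spec"]["profile"][0]))
```

The source dictionary is deep-copied rather than mutated, so the same source CR can be reused with a different overlay for another PolicyGenTemplate.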

Recommendations when customizing PolicyGenTemplate CRs

Consider the following best practices when customizing site configuration PolicyGenTemplate custom resources (CRs):

  • Use as few policies as are necessary. Using fewer policies requires fewer resources. Each additional policy creates overhead for the hub cluster and the deployed managed cluster. CRs are combined into policies based on the policyName field in the PolicyGenTemplate CR. CRs in the same PolicyGenTemplate that have the same value for policyName are managed under a single policy.

  • In disconnected environments, use a single catalog source for all Operators by configuring the registry as a single index containing all Operators. Each additional CatalogSource CR on the managed clusters increases CPU usage.

  • Include MachineConfig CRs as extraManifests in the SiteConfig CR so that they are applied during installation. This can reduce the overall time until the cluster is ready to deploy applications.

  • In PolicyGenTemplate CRs, override the channel field of Subscription CRs to explicitly identify the desired version. This ensures that changes in the source CR during upgrades do not update the generated subscription.
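Two of these recommendations can be checked automatically. The following Python sketch, a hypothetical helper named review_pgt, counts how many policies a PolicyGenTemplate generates and lists Subscription entries that do not override spec.channel. It identifies Subscription CRs by a file-name suffix. That heuristic is an assumption that fits the reference file names, but the ZTP pipeline does not enforce it.

```python
from typing import Dict, List, Tuple


def review_pgt(pgt: Dict) -> Tuple[int, List[str]]:
    """Return the number of policies a PolicyGenTemplate generates and the
    Subscription source files that do not override spec.channel."""
    entries = pgt["spec"]["sourceFiles"]
    # One policy per distinct policyName value in this PolicyGenTemplate.
    policy_count = len({entry["policyName"] for entry in entries})
    # Hypothetical heuristic: treat files ending in "Subscription.yaml" as Subscription CRs.
    missing_channel = [
        entry["fileName"]
        for entry in entries
        if entry["fileName"].endswith("Subscription.yaml")
        and "channel" not in entry.get("spec", {})
    ]
    return policy_count, missing_channel


example = {
    "spec": {
        "sourceFiles": [
            # "stable" is a hypothetical channel name used only for illustration.
            {"fileName": "SriovSubscription.yaml", "policyName": "subscriptions-policy", "spec": {"channel": "stable"}},
            {"fileName": "PtpSubscription.yaml", "policyName": "subscriptions-policy"},
            {"fileName": "PtpConfigSlave.yaml", "policyName": "config-policy"},
        ]
    }
}

count, missing = review_pgt(example)
print(f"{count} policies; no channel override: {missing}")
```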


When managing large numbers of spoke clusters on the hub cluster, minimize the number of policies to reduce resource consumption.

Grouping multiple configuration CRs into a single or limited number of policies is one way to reduce the overall number of policies on the hub cluster. When using the common, group, and site hierarchy of policies for managing site configuration, it is especially important to combine site-specific configuration into a single policy.

PolicyGenTemplate CRs for RAN deployments

Use PolicyGenTemplate (PGT) custom resources (CRs) to customize the configuration applied to the cluster by using the GitOps zero touch provisioning (ZTP) pipeline. The PGT CR allows you to generate one or more policies to manage the set of configuration CRs on your fleet of clusters. The PGT identifies the set of managed CRs, bundles them into policies, builds the policy wrapping around those CRs, and associates the policies with clusters by using label binding rules.

The reference configuration, obtained from the GitOps ZTP container, is designed to provide a set of critical features and node tuning settings that ensure the cluster can support the stringent performance and resource utilization constraints typical of RAN (Radio Access Network) Distributed Unit (DU) applications. Changes to, or omissions from, the baseline configuration can affect feature availability, performance, and resource utilization. Use the reference PolicyGenTemplate CRs as the basis to create a hierarchy of configuration files tailored to your specific site requirements.

The baseline PolicyGenTemplate CRs that are defined for RAN DU cluster configuration can be extracted from the GitOps ZTP ztp-site-generate container. See “Preparing the GitOps ZTP site configuration repository” for further details.

The PolicyGenTemplate CRs can be found in the ./out/argocd/example/policygentemplates folder. The reference architecture has common, group, and site-specific configuration CRs. Each PolicyGenTemplate CR refers to other CRs that can be found in the ./out/source-crs folder.

The PolicyGenTemplate CRs relevant to RAN cluster configuration are described below. Variants are provided for the group PolicyGenTemplate CRs to account for differences in single-node, three-node compact, and standard cluster configurations. Similarly, site-specific configuration variants are provided for single-node clusters and multi-node (compact or standard) clusters. Use the group and site-specific configuration variants that are relevant for your deployment.

Table 1. PolicyGenTemplate CRs for RAN deployments

  PolicyGenTemplate CR | Description
  example-multinode-site.yaml | Contains a set of CRs that are applied to multi-node clusters. These CRs configure SR-IOV features typical for RAN installations.
  example-sno-site.yaml | Contains a set of CRs that are applied to single-node OpenShift clusters. These CRs configure SR-IOV features typical for RAN installations.
  common-ranGen.yaml | Contains a set of common RAN CRs that are applied to all clusters. These CRs subscribe to a set of Operators that provide cluster features typical for RAN, as well as baseline cluster tuning.
  group-du-3node-ranGen.yaml | Contains the RAN policies for three-node clusters only.
  group-du-sno-ranGen.yaml | Contains the RAN policies for single-node clusters only.
  group-du-standard-ranGen.yaml | Contains the RAN policies for standard clusters with three control plane nodes.
  group-du-3node-validator-ranGen.yaml | PolicyGenTemplate CR used to generate the various policies required for three-node clusters.
  group-du-standard-validator-ranGen.yaml | PolicyGenTemplate CR used to generate the various policies required for standard clusters.
  group-du-sno-validator-ranGen.yaml | PolicyGenTemplate CR used to generate the various policies required for single-node OpenShift clusters.
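The variant selection described above amounts to a lookup. In the following Python sketch, the topology keys sno, 3node, and standard are hypothetical labels, and the file names come from Table 1. The validator PolicyGenTemplate is included by default but can be left out, because creating a validator inform policy is an optional step in the customization procedure.

```python
from typing import List

# File names from Table 1; the topology keys are hypothetical labels.
GROUP_PGTS = {
    "sno": "group-du-sno-ranGen.yaml",
    "3node": "group-du-3node-ranGen.yaml",
    "standard": "group-du-standard-ranGen.yaml",
}
VALIDATOR_PGTS = {
    "sno": "group-du-sno-validator-ranGen.yaml",
    "3node": "group-du-3node-validator-ranGen.yaml",
    "standard": "group-du-standard-validator-ranGen.yaml",
}


def reference_pgts(topology: str, with_validator: bool = True) -> List[str]:
    """List the reference PolicyGenTemplate files that fit a cluster topology."""
    if topology not in GROUP_PGTS:
        raise ValueError(f"unknown topology: {topology!r}")
    # Single-node clusters use the SNO site example; compact and standard
    # clusters are multi-node and use the multi-node site example.
    site = "example-sno-site.yaml" if topology == "sno" else "example-multinode-site.yaml"
    files = ["common-ranGen.yaml", GROUP_PGTS[topology]]
    if with_validator:
        files.append(VALIDATOR_PGTS[topology])
    files.append(site)
    return files


print(reference_pgts("3node"))
```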


Customizing a managed cluster with PolicyGenTemplate CRs

Use the following procedure to customize the policies that get applied to the managed cluster that you provision using the zero touch provisioning (ZTP) pipeline.

Prerequisites

  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

  • You configured the hub cluster for generating the required installation and policy CRs.

  • You created a Git repository where you manage your custom site configuration data. The repository must be accessible from the hub cluster and be defined as a source repository for the Argo CD application.

Procedure

  1. Create a PolicyGenTemplate CR for site-specific configuration CRs.

    1. Choose the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, example-sno-site.yaml or example-multinode-site.yaml.

    2. Change the bindingRules field in the example file to match the site-specific label included in the SiteConfig CR. In the example SiteConfig file, the site-specific label is sites: example-sno.

      Ensure that the labels defined in your PolicyGenTemplate bindingRules field correspond to the labels that are defined in the related managed cluster's SiteConfig CR.

    3. Change the content in the example file to match the desired configuration.

  2. Optional: Create a PolicyGenTemplate CR for any common configuration CRs that apply to the entire fleet of clusters.

    1. Select the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, common-ranGen.yaml.

    2. Change the content in the example file to match the desired configuration.

  3. Optional: Create a PolicyGenTemplate CR for any group configuration CRs that apply to certain groups of clusters in the fleet.

    Ensure that the content of the overlaid spec files matches your desired end state. As a reference, the out/source-crs directory contains the full list of source CRs that are available to be included and overlaid by your PolicyGenTemplate CRs.

    Depending on the specific requirements of your clusters, you might need more than a single group policy per cluster type, especially considering that the example group policies each have a single PerformancePolicy.yaml file that can only be shared across a set of clusters if those clusters consist of identical hardware configurations.

    1. Select the appropriate example for your CR from the out/argocd/example/policygentemplates folder, for example, group-du-sno-ranGen.yaml.

    2. Change the content in the example file to match the desired configuration.

  4. Optional: Create a validator inform policy PolicyGenTemplate CR to signal when the ZTP installation and configuration of the deployed cluster is complete. For more information, see “Creating a validator inform policy”.

  5. Define all the policy namespaces in a YAML file similar to the example out/argocd/example/policygentemplates/ns.yaml file.

    Do not include the Namespace CR in the same file with the PolicyGenTemplate CR.

  6. Add the PolicyGenTemplate CRs and Namespace CR to the kustomization.yaml file in the generators section, similar to the example shown in out/argocd/example/policygentemplates/kustomization.yaml.

  7. Commit the PolicyGenTemplate CRs, Namespace CR, and associated kustomization.yaml file in your Git repository and push the changes.

    The ArgoCD pipeline detects the changes and begins the managed cluster deployment. You can push the changes to the SiteConfig CR and the PolicyGenTemplate CR simultaneously.
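The procedure depends on the bindingRules labels matching the cluster labels. The following Python sketch is a simplified model of that comparison. It assumes exact key/value matching where every rule must be satisfied, which is how Kubernetes label selectors treat multiple matchLabels entries. binding_rules_match and the label set are hypothetical. On the hub cluster, the generated placement rule performs the real selection.

```python
from typing import Dict


def binding_rules_match(binding_rules: Dict[str, str], cluster_labels: Dict[str, str]) -> bool:
    """Return True when every bindingRules key/value pair is present on the cluster.

    Simplified model: all pairs must match exactly (logical AND), so a rule
    such as group-du-sno: "" needs a label whose value is the empty string.
    """
    return all(cluster_labels.get(key) == value for key, value in binding_rules.items())


# Hypothetical labels, modeled on the example SiteConfig site label sites: example-sno.
labels = {"common": "true", "group-du-sno": "", "sites": "example-sno"}

print(binding_rules_match({"common": "true"}, labels))              # True
print(binding_rules_match({"group-du-sno": ""}, labels))            # True
print(binding_rules_match({"sites": "example-multinode"}, labels))  # False
```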


Monitoring managed cluster policy deployment progress

The ArgoCD pipeline uses PolicyGenTemplate CRs in Git to generate the RHACM policies and then sync them to the hub cluster. You can monitor the progress of the managed cluster policy synchronization after the assisted service installs OKD on the managed cluster.

Prerequisites

  • You have installed the OpenShift CLI (oc).

  • You have logged in to the hub cluster as a user with cluster-admin privileges.

Procedure

  1. The Topology Aware Lifecycle Manager (TALM) applies the configuration policies that are bound to the cluster.

    After the cluster installation is complete and the cluster becomes Ready, TALM automatically creates a ClusterGroupUpgrade CR for this cluster. The CR contains a list of ordered policies defined by the ran.openshift.io/ztp-deploy-wave annotations. The cluster's policies are applied in the order listed in the ClusterGroupUpgrade CR.

    You can monitor the high-level progress of configuration policy reconciliation by using the following commands:

    $ export CLUSTER=<clusterName>
    $ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[-1:]}' | jq

    Example output

    {
      "lastTransitionTime": "2022-11-09T07:28:09Z",
      "message": "Remediating non-compliant policies",
      "reason": "InProgress",
      "status": "True",
      "type": "Progressing"
    }
  2. You can monitor the detailed cluster policy compliance status by using the RHACM dashboard or the command line.

    1. To check policy compliance by using oc, run the following command:

      $ oc get policies -n $CLUSTER

      Example output

      NAME                                                     REMEDIATION ACTION   COMPLIANCE STATE   AGE
      ztp-common.common-config-policy                          inform               Compliant          3h42m
      ztp-common.common-subscriptions-policy                   inform               NonCompliant       3h42m
      ztp-group.group-du-sno-config-policy                     inform               NonCompliant       3h42m
      ztp-group.group-du-sno-validator-du-policy               inform               NonCompliant       3h42m
      ztp-install.example1-common-config-policy-pjz9s          enforce              Compliant          167m
      ztp-install.example1-common-subscriptions-policy-zzd9k   enforce              NonCompliant       164m
      ztp-site.example1-config-policy                          inform               NonCompliant       3h42m
      ztp-site.example1-perf-policy                            inform               NonCompliant       3h42m
    2. To check policy status from the RHACM web console, perform the following actions:

      1. Click Governance → Find policies.

      2. Click a cluster policy to check its status.
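If you capture the output of oc get policies -n $CLUSTER, a short Python sketch can list the policies that still need attention. The following function is hypothetical. It parses the whitespace-separated text layout shown in the example output. That layout is not a stable interface, so structured output such as -o json is a sturdier choice when scripting against a live cluster.

```python
from typing import List


def noncompliant_policies(oc_output: str) -> List[str]:
    """Return the names of policies whose COMPLIANCE STATE is not Compliant."""
    names = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Assumed columns: NAME, REMEDIATION ACTION, COMPLIANCE STATE, AGE.
        # A policy that has not been evaluated yet may have an empty state column.
        state = fields[2] if len(fields) >= 4 else ""
        if state != "Compliant":
            names.append(fields[0])
    return names


sample = """\
NAME                                                   REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                        inform               Compliant          3h42m
ztp-common.common-subscriptions-policy                 inform               NonCompliant       3h42m
ztp-install.example1-common-config-policy-pjz9s        enforce              Compliant          167m
ztp-install.example1-common-subscriptions-policy-zzd9k enforce              NonCompliant       164m
"""

for name in noncompliant_policies(sample):
    print(name)
```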

When all of the cluster policies become compliant, ZTP installation and configuration for the cluster is complete. The ztp-done label is added to the cluster.

In the reference configuration, the last policy to become compliant is the *-validator-du-policy policy. When this policy is compliant on a cluster, all cluster configuration, Operator installation, and Operator configuration is complete.

Validating the generation of configuration policy CRs

Policy custom resources (CRs) are generated in the same namespace as the PolicyGenTemplate from which they are created. The same troubleshooting flow applies to all policy CRs generated from a PolicyGenTemplate, regardless of whether they are ztp-common, ztp-group, or ztp-site based, as shown in the following commands:

  $ export NS=<namespace>
  $ oc get policy -n $NS

The expected set of policy-wrapped CRs should be displayed.

If the policies failed to synchronize, use the following troubleshooting steps.

Procedure

  1. To display detailed information about the policies, run the following command:

    $ oc describe -n openshift-gitops application policies
  2. Check the Status: Conditions: section for error logs. For example, setting an invalid fileName: value under sourceFiles: generates the following error:

    Status:
      Conditions:
        Last Transition Time:  2021-11-26T17:21:39Z
        Message:               rpc error: code = Unknown desc = `kustomize build /tmp/https___git.com/ran-sites/policies/ --enable-alpha-plugins` failed exit status 1: 2021/11/26 17:21:40 Error could not find test.yaml under source-crs/: no such file or directory Error: failure in plugin configured via /tmp/kust-plugin-config-52463179; exit status 1: exit status 1
        Type:                  ComparisonError
  3. Check the Status: Sync: section. If there are log errors at Status: Conditions:, Status: Sync: shows Unknown or Error:

    Status:
      Sync:
        Compared To:
          Destination:
            Namespace:  policies-sub
            Server:     https://kubernetes.default.svc
          Source:
            Path:             policies
            Repo URL:         https://git.com/ran-sites/policies/.git
            Target Revision:  master
        Status:           Error
  4. When Red Hat Advanced Cluster Management (RHACM) recognizes that policies apply to a ManagedCluster object, the policy CR objects are applied to the cluster namespace. Check to see if the policies were copied to the cluster namespace:

    $ oc get policy -n $CLUSTER

    Example output:

    NAME                                         REMEDIATION ACTION   COMPLIANCE STATE   AGE
    ztp-common.common-config-policy              inform               Compliant          13d
    ztp-common.common-subscriptions-policy       inform               Compliant          13d
    ztp-group.group-du-sno-config-policy         inform               Compliant          13d
    ztp-group.group-du-sno-validator-du-policy   inform               Compliant          13d
    ztp-site.example-sno-config-policy           inform               Compliant          13d

    RHACM copies all applicable policies into the cluster namespace. The copied policy names have the format: <policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>.

  5. Check the placement rule for any policies not copied to the cluster namespace. The matchSelector in the PlacementRule for those policies should match labels on the ManagedCluster object:

    $ oc get placementrule -n $NS
  6. Note the name of the PlacementRule for the missing common, group, or site policy, and then inspect it by using the following command:

    $ oc get placementrule -n $NS <placementRuleName> -o yaml
    • The status.decisions field should include your cluster name.

    • The key-value pair of the matchSelector in the spec must match the labels on your managed cluster.

  7. Check the labels on the ManagedCluster object using the following command:

    $ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq
  8. Check to see which policies are compliant using the following command:

    $ oc get policy -n $CLUSTER

    If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster. This causes the Operator configuration policies to fail to apply because the CRD is not yet applied to the spoke.
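Steps 4 through 6 compare the policies that your PolicyGenTemplate CRs should produce with the policies that RHACM actually copied into the cluster namespace. The following Python sketch performs that comparison by using the documented <policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName> naming format. The helper names and the perf-policy entry are hypothetical. Any name the sketch reports is a candidate for the placement rule checks described above.

```python
from typing import Dict, Iterable, List, Set


def expected_copies(pgts: Iterable[Dict]) -> Set[str]:
    """Names of the policy copies expected in the cluster namespace:
    <policyGenTemplate.Namespace>.<policyGenTemplate.Name>-<policyName>."""
    names = set()
    for pgt in pgts:
        prefix = f"{pgt['metadata']['namespace']}.{pgt['metadata']['name']}"
        for entry in pgt["spec"]["sourceFiles"]:
            names.add(f"{prefix}-{entry['policyName']}")
    return names


def missing_copies(pgts: Iterable[Dict], cluster_policies: Iterable[str]) -> List[str]:
    """Expected policies that were not copied into the cluster namespace."""
    return sorted(expected_copies(pgts) - set(cluster_policies))


def pgt(namespace: str, name: str, *policy_names: str) -> Dict:
    # Hypothetical minimal PolicyGenTemplate: only the fields used above.
    return {
        "metadata": {"namespace": namespace, "name": name},
        "spec": {"sourceFiles": [{"policyName": p} for p in policy_names]},
    }


templates = [
    pgt("ztp-common", "common", "config-policy", "subscriptions-policy"),
    pgt("ztp-group", "group-du-sno", "config-policy"),
    pgt("ztp-group", "group-du-sno-validator", "du-policy"),
    # Hypothetical: a site template that also defines a perf-policy.
    pgt("ztp-site", "example-sno", "config-policy", "perf-policy"),
]

# Policy names from the oc get policy -n $CLUSTER example output.
in_cluster_namespace = [
    "ztp-common.common-config-policy",
    "ztp-common.common-subscriptions-policy",
    "ztp-group.group-du-sno-config-policy",
    "ztp-group.group-du-sno-validator-du-policy",
    "ztp-site.example-sno-config-policy",
]

print(missing_copies(templates, in_cluster_namespace))
```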

Restarting policy reconciliation

You can restart policy reconciliation when unexpected compliance issues occur, for example, when the ClusterGroupUpgrade custom resource (CR) has timed out.

Procedure

  1. The Topology Aware Lifecycle Manager (TALM) generates a ClusterGroupUpgrade CR in the ztp-install namespace after the managed cluster becomes Ready:

    $ export CLUSTER=<clusterName>
    $ oc get clustergroupupgrades -n ztp-install $CLUSTER
  2. If there are unexpected issues and the policies fail to become compliant within the configured timeout (the default is 4 hours), the status of the ClusterGroupUpgrade CR shows UpgradeTimedOut:

    $ oc get clustergroupupgrades -n ztp-install $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
  3. A ClusterGroupUpgrade CR in the UpgradeTimedOut state automatically restarts its policy reconciliation every hour. If you have changed your policies, you can start a retry immediately by deleting the existing ClusterGroupUpgrade CR. This triggers the automatic creation of a new ClusterGroupUpgrade CR that begins reconciling the policies immediately:

    $ oc delete clustergroupupgrades -n ztp-install $CLUSTER

When the ClusterGroupUpgrade CR completes with status UpgradeCompleted and the ztp-done label is applied to the managed cluster, you can make additional configuration changes by using PolicyGenTemplate CRs. At this stage, deleting the existing ClusterGroupUpgrade CR does not cause TALM to generate a new CR.

At this point, ZTP has completed its interaction with the cluster. Treat any further changes as an update, and create a new ClusterGroupUpgrade CR to remediate the policies.
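The decision described in this section fits in a few lines of Python. This sketch assumes that the reason field of the ClusterGroupUpgrade Ready condition carries the UpgradeTimedOut and UpgradeCompleted values named above. The function name, the return codes, and the sample condition are hypothetical.

```python
import json
from typing import Dict


def next_action(ready_condition: Dict, ztp_done: bool) -> str:
    """Suggest a follow-up from the ClusterGroupUpgrade Ready condition (simplified)."""
    reason = ready_condition.get("reason", "")
    if reason == "UpgradeTimedOut":
        # Retried automatically every hour; deleting the CR retries immediately.
        return "delete-to-retry"
    if reason == "UpgradeCompleted" and ztp_done:
        # Deleting the CR no longer triggers a new one; create a new CR instead.
        return "create-new-cgu"
    return "wait"


# Hypothetical condition, shaped like the output of the jsonpath query above.
condition = json.loads('{"type": "Ready", "status": "False", "reason": "UpgradeTimedOut"}')
print(next_action(condition, ztp_done=False))
```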


Indication of done for ZTP installations

Zero touch provisioning (ZTP) simplifies the process of checking the ZTP installation status for a cluster. The ZTP status moves through three phases: cluster installation, cluster configuration, and ZTP done.

Cluster installation phase

The cluster installation phase is shown by the ManagedClusterJoined and ManagedClusterAvailable conditions in the ManagedCluster CR. If the ManagedCluster CR does not have these conditions, or either condition is set to False, the cluster is still in the installation phase. Additional details about installation are available from the AgentClusterInstall and ClusterDeployment CRs. For more information, see “Troubleshooting GitOps ZTP”.

Cluster configuration phase

The cluster configuration phase is shown by a ztp-running label applied to the ManagedCluster CR for the cluster.

ZTP done

Cluster installation and configuration is complete in the ZTP done phase. This is shown by the removal of the ztp-running label and addition of the ztp-done label to the ManagedCluster CR. The ztp-done label shows that the configuration has been applied and the baseline DU configuration has completed cluster tuning.
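The three phases map onto a small decision function. The following Python sketch takes the ManagedCluster conditions and labels as plain data, for example as extracted from the JSON output of oc get managedcluster. The function name and the unknown fallback are hypothetical. The fallback covers states that this section does not describe.

```python
from typing import Dict, List


def ztp_phase(conditions: List[Dict], labels: Dict[str, str]) -> str:
    """Classify a ManagedCluster into a ZTP phase from its conditions and labels."""
    status = {c["type"]: c["status"] for c in conditions}
    installed = all(
        status.get(cond) == "True"
        for cond in ("ManagedClusterJoined", "ManagedClusterAvailable")
    )
    if not installed:
        return "installation"
    if "ztp-running" in labels:
        return "configuration"
    if "ztp-done" in labels:
        return "done"
    return "unknown"  # a state not covered by the three phases described here


ready = [
    {"type": "ManagedClusterJoined", "status": "True"},
    {"type": "ManagedClusterAvailable", "status": "True"},
]
print(ztp_phase([], {}))                        # installation
print(ztp_phase(ready, {"ztp-running": ""}))    # configuration
print(ztp_phase(ready, {"ztp-done": ""}))       # done
```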

The transition to the ZTP done state is conditional on the compliant state of a Red Hat Advanced Cluster Management (RHACM) validator inform policy. This policy captures the existing criteria for a completed installation and validates that it moves to a compliant state only when ZTP provisioning of the managed cluster is complete.

The validator inform policy ensures the configuration of the cluster is fully applied and Operators have completed their initialization. The policy validates the following:

  • The target MachineConfigPool contains the expected entries and has finished updating. All nodes are available and not degraded.

  • The SR-IOV Operator has completed initialization as indicated by at least one SriovNetworkNodeState with syncStatus: Succeeded.

  • The PTP Operator daemon set exists.
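The validator criteria can be read as three checks that must all pass. The following Python sketch evaluates them against simplified status data. The field names follow the status fields of the MachineConfigPool and SriovNetworkNodeState resources. The sketch is a hypothetical model rather than the validator policy itself. It does not check that the MachineConfigPool contains the expected entries.

```python
from typing import Dict, List


def mcp_settled(mcp_status: Dict) -> bool:
    """MachineConfigPool has finished updating: all machines updated and ready, none degraded."""
    conditions = {c["type"]: c["status"] for c in mcp_status.get("conditions", [])}
    total = mcp_status.get("machineCount", 0)
    return (
        conditions.get("Updated") == "True"
        and conditions.get("Degraded") != "True"
        and mcp_status.get("updatedMachineCount") == total
        and mcp_status.get("readyMachineCount") == total
        and mcp_status.get("degradedMachineCount", 0) == 0
    )


def sriov_initialized(node_states: List[Dict]) -> bool:
    """At least one SriovNetworkNodeState reports syncStatus: Succeeded."""
    return any(s.get("status", {}).get("syncStatus") == "Succeeded" for s in node_states)


def validator_compliant(mcp_status: Dict, node_states: List[Dict], ptp_daemonset_exists: bool) -> bool:
    """All three validator criteria must hold at the same time."""
    return mcp_settled(mcp_status) and sriov_initialized(node_states) and ptp_daemonset_exists


# Hypothetical, trimmed status objects for a single-node cluster.
mcp = {
    "machineCount": 1,
    "updatedMachineCount": 1,
    "readyMachineCount": 1,
    "degradedMachineCount": 0,
    "conditions": [{"type": "Updated", "status": "True"}, {"type": "Degraded", "status": "False"}],
}
states = [{"status": {"syncStatus": "Succeeded"}}]

print(validator_compliant(mcp, states, ptp_daemonset_exists=True))                     # True
print(validator_compliant(mcp, [{"status": {"syncStatus": "InProgress"}}], True))      # False
```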