Configuring the logging collector

The logging subsystem for Red Hat OpenShift collects operations and application logs from your cluster and enriches the data with Kubernetes pod and project metadata.

You can configure the CPU and memory limits for the log collector and move the log collector pods to specific nodes. All supported modifications to the log collector can be performed through the spec.collection.logs.fluentd stanza in the ClusterLogging custom resource (CR).

About unsupported configurations

The supported way to configure the logging subsystem for Red Hat OpenShift is to use the options described in this documentation. Do not use other configurations, as they are unsupported. Configuration paradigms might change across OKD releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in this documentation, your changes will disappear because the OpenShift Elasticsearch Operator and Red Hat OpenShift Logging Operator reconcile any differences. The Operators revert everything to the defined state by default and by design.

If you must perform configurations not described in the OKD documentation, you must set your Red Hat OpenShift Logging Operator or OpenShift Elasticsearch Operator to Unmanaged. An unmanaged OpenShift Logging environment is not supported and does not receive updates until you return OpenShift Logging to Managed.

Viewing logging collector pods

You can view the Fluentd logging collector pods and the corresponding nodes that they are running on. The Fluentd logging collector pods run only in the openshift-logging project.

Procedure

  • Run the following command in the openshift-logging project to view the Fluentd logging collector pods and their details:
    $ oc get pods --selector component=collector -o wide -n openshift-logging

Example output

    NAME            READY   STATUS    RESTARTS   AGE    IP            NODE                  NOMINATED NODE   READINESS GATES
    fluentd-8d69v   1/1     Running   0          134m   10.130.2.30   master1.example.com   <none>           <none>
    fluentd-bd225   1/1     Running   0          134m   10.131.1.11   master2.example.com   <none>           <none>
    fluentd-cvrzs   1/1     Running   0          134m   10.130.0.21   master3.example.com   <none>           <none>
    fluentd-gpqg2   1/1     Running   0          134m   10.128.2.27   worker1.example.com   <none>           <none>
    fluentd-l9j7j   1/1     Running   0          134m   10.129.2.31   worker2.example.com   <none>           <none>
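
To see at a glance which node each collector pod runs on, the wide output can be reduced to a pod-to-node mapping. The following is a minimal sketch, not part of the product: collector_nodes is a hypothetical helper name, and it assumes the column layout shown in the example output, where NAME is the first column and NODE is the seventh.

```shell
# Hypothetical helper: reads `oc get pods -o wide` output on stdin and
# prints "<pod> <node>" for each pod, skipping the header line.
# Assumes the column layout shown in the example output
# (NAME is column 1, NODE is column 7).
collector_nodes() {
  awk 'NR > 1 { print $1, $7 }'
}

# Usage (requires access to the cluster):
#   oc get pods --selector component=collector -o wide -n openshift-logging | collector_nodes
```

If the column layout differs in your environment, adjust the column numbers accordingly.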

Configure log collector CPU and memory limits

The log collector allows for adjustments to both the CPU and memory limits.

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc -n openshift-logging edit ClusterLogging instance

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
      namespace: openshift-logging
    ...
    spec:
      collection:
        logs:
          fluentd:
            resources:
              limits: (1)
                memory: 736Mi
              requests:
                cpu: 100m
                memory: 736Mi

    (1) Specify the CPU and memory limits and requests as needed. The values shown are the default values.

Advanced configuration for the log forwarder

The logging subsystem for Red Hat OpenShift includes multiple Fluentd parameters that you can use for tuning the performance of the Fluentd log forwarder. With these parameters, you can change the following Fluentd behaviors:

  • Chunk and chunk buffer sizes

  • Chunk flushing behavior

  • Chunk forwarding retry behavior

Fluentd collects log data in a single blob called a chunk. When Fluentd creates a chunk, the chunk is considered to be in the stage, where the chunk gets filled with data. When the chunk is full, Fluentd moves the chunk to the queue, where chunks are held before being flushed, or written out to their destination. Fluentd can fail to flush a chunk for a number of reasons, such as network issues or capacity issues at the destination. If a chunk cannot be flushed, Fluentd retries flushing as configured.

By default in OKD, Fluentd uses the exponential backoff method to retry flushing, where Fluentd doubles the time it waits between attempts to retry flushing again, which helps reduce connection requests to the destination. You can disable exponential backoff and use the periodic retry method instead, which retries flushing the chunks at a specified interval.
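
The doubling behavior described above can be illustrated with a small calculation. This is a simplified sketch under stated assumptions, not Fluentd's actual implementation: backoff_schedule is a hypothetical helper, the wait starts at the retryWait value, doubles after every failed attempt, and is capped at the retryMaxInterval value. Any randomization that Fluentd applies to the wait time is ignored.

```shell
# Hypothetical helper: prints the wait time, in seconds, before each of the
# first N flush retries under a simplified exponential backoff model.
backoff_schedule() {
  delay=$1     # initial wait in seconds (corresponds to retryWait)
  max=$2       # upper bound in seconds (corresponds to retryMaxInterval)
  attempts=$3  # number of retries to print
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "$delay"
    delay=$((delay * 2))                              # double the wait
    if [ "$delay" -gt "$max" ]; then delay=$max; fi   # apply the cap
    i=$((i + 1))
  done
}

# Default values from this document: retryWait 1s, retryMaxInterval 300s.
backoff_schedule 1 300 11
```

With these inputs the sketch prints 1, 2, 4, 8, 16, 32, 64, 128, 256, 300, 300: the wait reaches the cap on the tenth retry and stays there. With the periodic method, the wait would instead remain at the retryWait value for every retry.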

These parameters can help you determine the trade-offs between latency and throughput.

  • To optimize Fluentd for throughput, you could use these parameters to reduce network packet count by configuring larger buffers and queues, delaying flushes, and setting longer times between retries. Be aware that larger buffers require more space on the node file system.

  • To optimize for low latency, you could use the parameters to send data as soon as possible, avoid the build-up of batches, have shorter queues and buffers, and use more frequent flushes and retries.

You can configure the chunking and flushing behavior using the following parameters in the ClusterLogging custom resource (CR). The parameters are then automatically added to the Fluentd config map for use by Fluentd.

These parameters are:

  • Not relevant to most users. The default settings should give good general performance.

  • Only for advanced users with detailed knowledge of Fluentd configuration and performance.

  • Only for performance tuning. They have no effect on functional aspects of logging.

Table 1. Advanced Fluentd Configuration Parameters
Parameter | Description | Default

chunkLimitSize

The maximum size of each chunk. Fluentd stops writing data to a chunk when it reaches this size. Then, Fluentd sends the chunk to the queue and opens a new chunk.

8m

totalLimitSize

The maximum size of the buffer, which is the total size of the stage and the queue. If the buffer size exceeds this value, Fluentd stops adding data to chunks and fails with an error. All data not in chunks is lost.

8G

flushInterval

The interval between chunk flushes. You can use s (seconds), m (minutes), h (hours), or d (days).

1s

flushMode

The method to perform flushes:

  • lazy: Flush chunks based on the timekey parameter. You cannot modify the timekey parameter.

  • interval: Flush chunks based on the flushInterval parameter.

  • immediate: Flush chunks immediately after data is added to a chunk.

interval

flushThreadCount

The number of threads that perform chunk flushing. Increasing the number of threads improves the flush throughput, which hides network latency.

2

overflowAction

The chunking behavior when the queue is full:

  • throw_exception: Raise an exception to show in the log.

  • block: Stop data chunking until the full buffer issue is resolved.

  • drop_oldest_chunk: Drop the oldest chunk to accept new incoming chunks. Older chunks have less value than newer chunks.

block

retryMaxInterval

The maximum time in seconds for the exponential_backoff retry method.

300s

retryType

The retry method when flushing fails:

  • exponential_backoff: Increase the time between flush retries. Fluentd doubles the time it waits before the next retry until the retryMaxInterval value is reached.

  • periodic: Retry flushes periodically, based on the retryWait parameter.

exponential_backoff

retryTimeout

The maximum time interval to attempt retries before the record is discarded.

60m

retryWait

The time in seconds before the next chunk flush.

1s

For more information on the Fluentd chunk lifecycle, see Buffer Plugins in the Fluentd documentation.
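
Because totalLimitSize bounds the stage and the queue together, dividing it by chunkLimitSize gives a rough upper bound on the number of full chunks the buffer can hold. The following sketch illustrates that arithmetic. The helpers to_bytes and max_chunks are hypothetical, and the sketch assumes that the k, m, and g size suffixes denote binary multiples (1k = 1024 bytes).

```shell
# Hypothetical helper: converts a size such as 8m or 8G to bytes,
# assuming binary multiples (1k = 1024 bytes).
to_bytes() {
  num=${1%[kKmMgG]}   # strip a trailing unit suffix, if any
  case $1 in
    *[kK]) echo $((num * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *)     echo "$num" ;;
  esac
}

# Hypothetical helper: number of full chunks that fit in the buffer.
# Arguments: chunkLimitSize totalLimitSize
max_chunks() {
  echo $(( $(to_bytes "$2") / $(to_bytes "$1") ))
}

max_chunks 8m 8G    # default values from the table
max_chunks 8m 32m   # a much smaller buffer
```

With the default values the buffer holds at most 1024 full chunks. With a totalLimitSize of 32m it holds only 4, so the overflowAction setting comes into play far sooner when the destination is slow or unreachable.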

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
  2. Add or modify any of the following parameters:

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      forwarder:
        fluentd:
          buffer:
            chunkLimitSize: 8m (1)
            flushInterval: 5s (2)
            flushMode: interval (3)
            flushThreadCount: 3 (4)
            overflowAction: throw_exception (5)
            retryMaxInterval: "300s" (6)
            retryType: periodic (7)
            retryWait: 1s (8)
            totalLimitSize: 32m (9)
    ...

    (1) Specify the maximum size of each chunk before it is queued for flushing.
    (2) Specify the interval between chunk flushes.
    (3) Specify the method to perform chunk flushes: lazy, interval, or immediate.
    (4) Specify the number of threads to use for chunk flushes.
    (5) Specify the chunking behavior when the queue is full: throw_exception, block, or drop_oldest_chunk.
    (6) Specify the maximum interval in seconds for the exponential_backoff chunk flushing method.
    (7) Specify the retry type when chunk flushing fails: exponential_backoff or periodic.
    (8) Specify the time in seconds before the next chunk flush.
    (9) Specify the maximum size of the chunk buffer.
  3. Verify that the Fluentd pods are redeployed:

    $ oc get pods -l component=collector -n openshift-logging
  4. Check that the new values are in the fluentd config map:

    $ oc extract configmap/fluentd --confirm

    Example fluentd.conf

    <buffer>
      @type file
      path '/var/lib/fluentd/default'
      flush_mode interval
      flush_interval 5s
      flush_thread_count 3
      retry_type periodic
      retry_wait 1s
      retry_max_interval 300s
      retry_timeout 60m
      queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '32'}"
      total_limit_size 32m
      chunk_limit_size 8m
      overflow_action throw_exception
    </buffer>
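
Rather than reading the extracted file by eye, you can look up individual settings. The following is a minimal sketch: buffer_setting is a hypothetical helper, it assumes that the extracted file is named fluentd.conf as in the example above, and it inspects only the first <buffer> section it finds. A real configuration can contain more than one such section.

```shell
# Hypothetical helper: prints the value of one setting from the first
# <buffer> section of a Fluentd configuration file.
# Arguments: <path to fluentd.conf> <setting name>
buffer_setting() {
  awk -v key="$2" '
    /<buffer>/          { in_buf = 1; next }    # entering a buffer section
    /<\/buffer>/        { if (in_buf) exit }    # stop after the first section
    in_buf && $1 == key { print $2; exit }      # setting found
  ' "$1"
}

# Usage, after `oc extract configmap/fluentd --confirm` has written the file:
#   buffer_setting fluentd.conf flush_interval
#   buffer_setting fluentd.conf retry_type
```

For the example configuration shown above, the two lookups would return 5s and periodic, which match the flushInterval and retryType values set in the ClusterLogging CR.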

Removing unused components if you do not use the default Elasticsearch log store

As an administrator, in the rare case that you forward logs to a third-party log store and do not use the default Elasticsearch log store, you can remove several unused components from your logging cluster.

In other words, if you do not use the default Elasticsearch log store, you can remove the internal Elasticsearch logStore and Kibana visualization components from the ClusterLogging custom resource (CR). Removing these components is optional but saves resources.

Prerequisites

  • Verify that your log forwarder does not send log data to the default internal Elasticsearch cluster. Inspect the ClusterLogForwarder CR YAML file that you used to configure log forwarding. Verify that it does not have an outputRefs element that specifies default. For example:

    outputRefs:
    - default

If the ClusterLogForwarder CR forwards log data to the internal Elasticsearch cluster and you remove the logStore component from the ClusterLogging CR, the internal Elasticsearch cluster is not present to store the log data. This absence can cause data loss.
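
The prerequisite check can be partly automated. The following sketch is a line-based text check, not a YAML parser: forwards_to_default is a hypothetical helper, and it assumes that the ClusterLogForwarder CR is saved in a local file that uses block-style lists with one unquoted "- item" per line, as in the outputRefs example above. Treat a negative result as a hint and still inspect the file yourself.

```shell
# Hypothetical helper: exits with status 0 if the given YAML file lists
# "default" under any outputRefs key, and with status 1 otherwise.
forwards_to_default() {
  awk '
    /outputRefs:/                           { in_refs = 1; next }  # list starts
    in_refs && /^[ \t]*-[ \t]*default[ \t]*$/ { found = 1 }        # default listed
    in_refs && /:/                          { in_refs = 0 }        # next key ends the list
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Usage (clf.yaml is a placeholder for your saved ClusterLogForwarder CR):
#   if forwards_to_default clf.yaml; then
#     echo "Logs are still forwarded to the default log store"
#   fi
```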

Procedure

  1. Edit the ClusterLogging custom resource (CR) in the openshift-logging project:

    $ oc edit ClusterLogging instance
  2. If they are present, remove the logStore and visualization stanzas from the ClusterLogging CR.

  3. Preserve the collection stanza of the ClusterLogging CR. The result should look similar to the following example:

    apiVersion: "logging.openshift.io/v1"
    kind: "ClusterLogging"
    metadata:
      name: "instance"
      namespace: "openshift-logging"
    spec:
      managementState: "Managed"
      collection:
        logs:
          type: "fluentd"
          fluentd: {}
  4. Verify that the collector pods are redeployed:

    $ oc get pods -l component=collector -n openshift-logging
