CLI and Configuration Parameters of TiCDC Changefeeds

Changefeed CLI parameters

This section introduces the command-line parameters of TiCDC changefeeds by illustrating how to create a replication (changefeed) task:

    cdc cli changefeed create --server=http://10.0.10.25:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task"

    Create changefeed successfully!
    ID: simple-replication-task
    Info: {"upstream_id":7178706266519722477,"namespace":"default","id":"simple-replication-task","sink_uri":"mysql://root:xxxxx@127.0.0.1:4000/?time-zone=","create_time":"2024-04-26T15:05:46.679218+08:00","start_ts":438156275634929669,"engine":"unified","config":{"case_sensitive":false,"enable_old_value":true,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":true,"bdr_mode":false,"sync_point_interval":30000000000,"sync_point_retention":3600000000000,"filter":{"rules":["test.*"],"event_filters":null},"mounter":{"worker_num":16},"sink":{"protocol":"","schema_registry":"","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false},"column_selectors":null,"transaction_atomicity":"none","encoder_concurrency":16,"terminator":"\r\n","date_separator":"none","enable_partition_separator":false},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"storage":""}},"state":"normal","creator_version":"v7.1.5"}

  • --changefeed-id: The ID of the replication task. The format must match the ^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$ regular expression. If this ID is not specified, TiCDC automatically generates a UUID (the version 4 format) as the ID.

  • --sink-uri: The downstream address of the replication task. Configure --sink-uri according to the following format. Currently, the scheme supports mysql, tidb, and kafka.

    ```
    [scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
    ```

    When the sink URI contains special characters such as `! * ' ( ) ; : @ & = + $ , / ? % # [ ]`, you need to escape them, for example, by using [URI Encoder](https://www.urlencoder.org/).

  • --start-ts: Specifies the starting TSO of the changefeed. From this TSO, the TiCDC cluster starts pulling data. The default value is the current time.

  • --target-ts: Specifies the ending TSO of the changefeed. When this TSO is reached, the TiCDC cluster stops pulling data. The default value is empty, which means that TiCDC does not automatically stop pulling data.

  • --config: Specifies the configuration file of the changefeed.
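
The ID format, the URI escaping rule, and the TSO parameters described above can be checked before `cdc cli changefeed create` is called. The following is a minimal Python sketch, not part of TiCDC. The helper names `is_valid_changefeed_id`, `build_mysql_sink_uri`, and `compose_tso` are hypothetical. The regular expression and the URI layout come from the parameter descriptions above. The TSO layout (physical time in milliseconds in the high bits, followed by an 18-bit logical counter) is an assumption based on the TiDB timestamp format and is not stated in this section.

```python
import re
from urllib.parse import quote

# Regular expression quoted in the --changefeed-id description.
CHANGEFEED_ID_RE = re.compile(r"^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$")

# Assumption: a TSO reserves its lowest 18 bits for a logical counter.
TSO_LOGICAL_BITS = 18


def is_valid_changefeed_id(changefeed_id: str) -> bool:
    """Return True if the ID matches the documented pattern."""
    # fullmatch() avoids Python's special case where "$" also matches
    # just before a trailing newline.
    return CHANGEFEED_ID_RE.fullmatch(changefeed_id) is not None


def build_mysql_sink_uri(user: str, password: str, host: str, port: int) -> str:
    """Assemble [scheme]://[userinfo@][host]:[port]/ with escaped userinfo."""
    # safe="" makes quote() percent-encode every reserved character,
    # including "/", ":" and "@".
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"mysql://{userinfo}@{host}:{port}/"


def compose_tso(physical_ms: int, logical: int = 0) -> int:
    """Build a TSO from a Unix time in milliseconds (assumed layout)."""
    return (physical_ms << TSO_LOGICAL_BITS) | logical


if __name__ == "__main__":
    print(is_valid_changefeed_id("simple-replication-task"))
    print(build_mysql_sink_uri("root", "p@ss:w/rd", "127.0.0.1", 3306))
```

A value produced by `compose_tso` could then be passed as `--start-ts` or `--target-ts`, provided the assumed layout matches your TiDB version.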

Changefeed configuration parameters

This section introduces the configuration of a replication task.

    # Specifies the memory quota (in bytes) that the sink manager can use in the capture server.
    # If the value is exceeded, the overused part will be recycled by the Go runtime.
    # The default value is `1073741824` (1 GiB).
    # memory-quota = 1073741824

    # Specifies whether the database names and table names in the configuration file are case-sensitive.
    # Starting from v6.5.6 and v7.1.3, the default value changes from true to false.
    # This configuration item affects configurations related to filter and sink.
    case-sensitive = false

    # Specifies whether to output the old value. New in v4.0.5. Since v5.0, the default value is `true`.
    enable-old-value = true

    # Specifies whether to enable the Syncpoint feature, which is supported since v6.3.0 and is disabled by default.
    # Since v6.4.0, only the changefeed with the SYSTEM_VARIABLES_ADMIN or SUPER privilege can use the TiCDC Syncpoint feature.
    # Note: This configuration item only takes effect if the downstream is TiDB.
    # enable-sync-point = false

    # Specifies the interval at which Syncpoint aligns the upstream and downstream snapshots.
    # The format is in h m s. For example, "1h30m30s".
    # The default value is "10m" and the minimum value is "30s".
    # Note: This configuration item only takes effect if the downstream is TiDB.
    # sync-point-interval = "5m"

    # Specifies how long the data is retained by Syncpoint in the downstream table. When this duration is exceeded, the data is cleaned up.
    # The format is in h m s. For example, "24h30m30s".
    # The default value is "24h".
    # Note: This configuration item only takes effect if the downstream is TiDB.
    # sync-point-retention = "1h"

    # Starting from v6.5.6 and v7.1.3, this configuration item specifies the SQL mode used when parsing DDL statements. Multiple modes are separated by commas.
    # The default value is the same as the default SQL mode of TiDB.
    # sql-mode = "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

    [mounter]
    # The number of threads with which the mounter decodes KV data. The default value is 16.
    # worker-num = 16

    [filter]
    # Ignores the transactions with the specified start_ts values.
    # ignore-txn-start-ts = [1, 2]

    # Filter rules.
    # Filter syntax: <https://docs.pingcap.com/tidb/stable/table-filter#syntax>.
    rules = ['*.*', '!test.*']

    # Event filter rules.
    # The detailed syntax is described in <https://docs.pingcap.com/tidb/stable/ticdc-filter>.
    # The first event filter rule.
    # [[filter.event-filters]]
    # matcher = ["test.worker"] # matcher is an allow list, which means this rule only applies to the worker table in the test database.
    # ignore-event = ["insert"] # Ignore insert events.
    # ignore-sql = ["^drop", "add column"] # Ignore DDLs that start with "drop" or contain "add column".
    # ignore-delete-value-expr = "name = 'john'" # Ignore delete DMLs that contain the condition "name = 'john'".
    # ignore-insert-value-expr = "id >= 100" # Ignore insert DMLs that contain the condition "id >= 100".
    # ignore-update-old-value-expr = "age < 18" # Ignore update DMLs whose old value contains "age < 18".
    # ignore-update-new-value-expr = "gender = 'male'" # Ignore update DMLs whose new value contains "gender = 'male'".

    # The second event filter rule.
    # [[filter.event-filters]]
    # matcher = ["test.fruit"] # matcher is an allow list, which means this rule only applies to the fruit table in the test database.
    # ignore-event = ["drop table", "delete"] # Ignore the `drop table` DDL events and the `delete` DML events.
    # ignore-sql = ["^drop table", "alter table"] # Ignore DDL statements that start with `drop table` or contain `alter table`.
    # ignore-insert-value-expr = "price > 1000 and origin = 'no where'" # Ignore insert DMLs that contain the conditions "price > 1000" and "origin = 'no where'".

    [scheduler]
    # Allocate tables to multiple TiCDC nodes for replication on a per-Region basis.
    # Note: This configuration item only takes effect on Kafka changefeeds and is not supported on MySQL changefeeds.
    # The value is "false" by default. Set it to "true" to enable this feature.
    enable-table-across-nodes = false

    # When `enable-table-across-nodes` is enabled, there are two allocation modes:
    # 1. Allocate tables based on the number of Regions, so that each TiCDC node handles roughly the same number of Regions. If the number of Regions for a table exceeds the value of `region-threshold`, the table will be allocated to multiple nodes for replication. The default value of `region-threshold` is 10000.
    # region-threshold = 10000
    # 2. Allocate tables based on the write traffic, so that each TiCDC node handles roughly the same number of modified rows. This allocation takes effect only when the number of modified rows per minute in a table exceeds the value of `write-key-threshold`.
    # write-key-threshold = 30000
    # Note:
    # * The default value of `write-key-threshold` is 0, which means that the traffic allocation mode is not used by default.
    # * You only need to configure one of the two modes. If both `region-threshold` and `write-key-threshold` are configured, TiCDC prioritizes the traffic allocation mode, namely `write-key-threshold`.

    [sink]
    # For the sink of MQ type, you can use dispatchers to configure the event dispatcher.
    # Since v6.1.0, TiCDC supports two types of event dispatchers: partition and topic. For more information, see <partition and topic link>.
    # The matching syntax of matcher is the same as the filter rule syntax. For details about the matcher rules, see <>.
    # Note: This configuration item only takes effect if the downstream is MQ.
    # dispatchers = [
    #    {matcher = ['test1.*', 'test2.*'], topic = "Topic expression 1", partition = "ts" },
    #    {matcher = ['test3.*', 'test4.*'], topic = "Topic expression 2", partition = "index-value" },
    #    {matcher = ['test1.*', 'test5.*'], topic = "Topic expression 3", partition = "table"},
    #    {matcher = ['test6.*'], partition = "ts"}
    # ]

    # The protocol configuration item specifies the protocol format of the messages sent to the downstream.
    # When the downstream is Kafka, the protocol can only be canal-json, avro, or open-protocol.
    # When the downstream is a storage service, the protocol can only be canal-json or csv.
    # Note: This configuration item only takes effect if the downstream is Kafka or a storage service.
    # protocol = "canal-json"

    # The following three configuration items are only used when you replicate data to storage sinks and can be ignored when replicating data to MQ or MySQL sinks.
    # Row terminator, used for separating two data change events. The default value is an empty string, which means "\r\n" is used.
    # terminator = ''
    # Date separator type used in the file directory. Value options are `none`, `year`, `month`, and `day`. `day` is the default value and means separating files by day. For more information, see <https://docs.pingcap.com/tidb/v7.1/ticdc-sink-to-cloud-storage#data-change-records>.
    # Note: This configuration item only takes effect if the downstream is a storage service.
    date-separator = 'day'
    # Whether to use partitions as the separation string. The default value is true, which means that partitions in a table are stored in separate directories. It is recommended that you keep the value as `true` to avoid potential data loss in downstream partitioned tables <https://github.com/pingcap/tiflow/issues/8724>. For usage examples, see <https://docs.pingcap.com/tidb/v7.1/ticdc-sink-to-cloud-storage#data-change-records>.
    # Note: This configuration item only takes effect if the downstream is a storage service.
    enable-partition-separator = true

    # Schema registry URL.
    # Note: This configuration item only takes effect if the downstream is MQ.
    # schema-registry = "http://localhost:8081/subjects/{subject-name}/versions/{version-number}/schema"

    # Specifies the number of encoder threads used when encoding data.
    # Note: This configuration item only takes effect if the downstream is MQ.
    # The default value is 16.
    # encoder-concurrency = 16

    # Specifies whether to enable kafka-sink-v2 that uses the kafka-go sink library.
    # Note: This configuration item only takes effect if the downstream is MQ.
    # The default value is false.
    # enable-kafka-sink-v2 = false

    # Starting from v7.1.0, this configuration item specifies whether to only output the updated columns.
    # Note: This configuration item only applies to the MQ downstream using the open-protocol and canal-json.
    # The default value is false.
    # only-output-updated-columns = false

    # Since v6.5.0, TiCDC supports saving data changes to storage services in CSV format. Ignore the following configurations if you replicate data to MQ or MySQL sinks.
    # [sink.csv]
    # The character used to separate fields in the CSV file. The value must be an ASCII character and defaults to `,`.
    # delimiter = ','
    # The quotation character used to surround fields in the CSV file. The default value is `"`. If the value is empty, no quotation is used.
    # quote = '"'
    # The character displayed when a CSV column is null. The default value is `\N`.
    # null = '\N'
    # Whether to include commit-ts in CSV rows. The default value is false.
    # include-commit-ts = false
    # The encoding method of binary data, which can be 'base64' or 'hex'. New in v7.1.2. The default value is 'base64'.
    # binary-encoding-method = 'base64'

    # Specifies the replication consistency configurations for a changefeed when using the redo log. For more information, see <https://docs.pingcap.com/tidb/stable/ticdc-sink-to-mysql#eventually-consistent-replication-in-disaster-scenarios>.
    # Note: The consistency-related configuration items only take effect when the downstream is a database and the redo log feature is enabled.
    [consistent]
    # The data consistency level. Available options are "none" and "eventual". "none" means that the redo log is disabled.
    # The default value is "none".
    level = "none"
    # The maximum redo log size in MB.
    # The default value is 64.
    max-log-size = 64
    # The flush interval for the redo log. The default value is 2000 milliseconds.
    flush-interval = 2000
    # The storage URI of the redo log.
    # The default value is empty.
    storage = ""
    # Specifies whether to store the redo log in a local file.
    # The default value is false.
    use-file-backend = false
    # The number of encoding and decoding workers in the redo module.
    # The default value is 16.
    encoding-worker-num = 16
    # The number of flushing workers in the redo module.
    # The default value is 8.
    flush-worker-num = 8
    # The compression method for redo log files.
    # Available options are "" and "lz4". The default value is "", which means no compression.
    compression = ""
    # The concurrency for uploading a single redo file.
    # The default value is 1, which means concurrency is disabled.
    flush-concurrency = 1

    [integrity]
    # Whether to enable the checksum validation for single-row data. The default value is "none", which means to disable the feature. Value options are "none" and "correctness".
    integrity-check-level = "none"
    # Specifies the log level of the changefeed when the checksum validation for single-row data fails. The default value is "warn". Value options are "warn" and "error".
    corruption-handle-level = "warn"

    # The following configuration items only take effect when the downstream is Kafka. Supported starting from v7.1.1.
    [sink.kafka-config]
    # The mechanism of Kafka SASL authentication. The default value is empty, indicating that SASL authentication is not used.
    sasl-mechanism = "OAUTHBEARER"
    # The client-id in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
    sasl-oauth-client-id = "producer-kafka"
    # The client-secret in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
    sasl-oauth-client-secret = "cHJvZHVjZXIta2Fma2E="
    # The token-url in the Kafka SASL OAUTHBEARER authentication to obtain the token. The default value is empty. This parameter is required when the OAUTHBEARER authentication is used.
    sasl-oauth-token-url = "http://127.0.0.1:4444/oauth2/token"
    # The scopes in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is optional when the OAUTHBEARER authentication is used.
    sasl-oauth-scopes = ["producer.kafka", "consumer.kafka"]
    # The grant-type in the Kafka SASL OAUTHBEARER authentication. The default value is "client_credentials". This parameter is optional when the OAUTHBEARER authentication is used.
    sasl-oauth-grant-type = "client_credentials"
    # The audience in the Kafka SASL OAUTHBEARER authentication. The default value is empty. This parameter is optional when the OAUTHBEARER authentication is used.
    sasl-oauth-audience = "kafka"

    [sink.cloud-storage-config]
    # The concurrency for saving data changes to the downstream cloud storage.
    # The default value is 16.
    worker-count = 16
    # The interval for saving data changes to the downstream cloud storage.
    # The default value is "2s".
    flush-interval = "2s"
    # A data change file is saved to the cloud storage when the number of bytes in this file exceeds `file-size`.
    # The default value is 67108864 (that is, 64 MiB).
    file-size = 67108864
    # The duration to retain files, which takes effect only when `date-separator` is configured as `day`. Assume that `file-expiration-days = 1` and `file-cleanup-cron-spec = "0 0 0 * * *"`, then TiCDC performs daily cleanup at 00:00:00 for files saved beyond 24 hours. For example, at 00:00:00 on 2023/12/02, TiCDC cleans up files generated before 2023/12/01, while files generated on 2023/12/01 remain unaffected.
    # The default value is 0, which means file cleanup is disabled.
    file-expiration-days = 0
    # The running cycle of the scheduled cleanup task, compatible with the crontab configuration, with a format of `<Second> <Minute> <Hour> <Day of the month> <Month> <Day of the week (Optional)>`.
    # The default value is "0 0 2 * * *", which means that the cleanup task is executed every day at 2 AM.
    file-cleanup-cron-spec = "0 0 2 * * *"
    # The concurrency for uploading a single file.
    # The default value is 1, which means concurrency is disabled.
    flush-concurrency = 1
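
Several items in the configuration file above accept only a fixed set of values or have a documented minimum. The following is a minimal Python sketch of a pre-flight check for those constraints, not a TiCDC tool. The function names `parse_hms` and `check_changefeed_config` are hypothetical, the configuration is assumed to be already parsed into nested dictionaries, and only the constraints spelled out in the comments above are covered.

```python
import re
from typing import Any, Dict, List

# Matches the "h m s" duration format described above, for example "1h30m30s".
_HMS_RE = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")

# Allowed values, copied from the comments in the configuration file above.
ALLOWED = {
    ("sink", "date-separator"): {"none", "year", "month", "day"},
    ("consistent", "level"): {"none", "eventual"},
    ("consistent", "compression"): {"", "lz4"},
    ("integrity", "integrity-check-level"): {"none", "correctness"},
    ("integrity", "corruption-handle-level"): {"warn", "error"},
}


def parse_hms(text: str) -> int:
    """Convert a duration such as "1h30m30s" to seconds."""
    match = _HMS_RE.fullmatch(text)
    # An empty string also satisfies the all-optional pattern, so reject it here.
    if not text or match is None:
        raise ValueError(f"not an h/m/s duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def check_changefeed_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    interval = config.get("sync-point-interval")
    if interval is not None and parse_hms(interval) < 30:
        problems.append('sync-point-interval is below the minimum of "30s"')
    for (section, key), allowed in ALLOWED.items():
        value = config.get(section, {}).get(key)
        if value is not None and value not in allowed:
            problems.append(
                f"{section}.{key} = {value!r} is not one of {sorted(allowed)}"
            )
    return problems


if __name__ == "__main__":
    sample = {"sync-point-interval": "10s", "consistent": {"level": "strong"}}
    for problem in check_changefeed_config(sample):
        print(problem)
```

On Python 3.11 or later, the dictionary could be produced by reading the file with the standard `tomllib` module. The check is intentionally partial, and TiCDC itself remains the authority on whether a configuration is valid.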