Deployment

Get the module archive

The module archive is in the directory inlong-sort-standalone/sort-standalone-dist/target/. The archive file is apache-inlong-sort-standalone-${project.version}-bin.tar.gz.

Start the inlong-sort-standalone application

First decompress the archive file, then run the shell script ./bin/sort-start.sh.

Configuration file: conf/common.properties

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| clusterId | Y | NA | inlong-sort-standalone cluster id |
| sortSource.type | N | org.apache.inlong.sort.standalone.source.readapi.ReadApiSource | Source class name |
| sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
| sortSink.type | N | org.apache.inlong.sort.standalone.sink.hive.HiveSink | Sink class name |
| sortClusterConfig.type | N | org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader | Configuration data loader class name |
| sortClusterConfig.managerPath | N | NA | Used with the loader org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader: the URL of InlongManager, for example http://${manager ip:port}/api/inlong/manager/openapi/sort/standalone/getClusterConfig |
| eventFormatHandler | N | org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler | Formatter class name |
| maxThreads | N | 10 | Number of sink threads |
| reloadInterval | N | 60000 | Interval for reloading configuration data (milliseconds) |
| processInterval | N | 100 | Interval for processing data (milliseconds) |
| metricDomains | N | Sort | Domain name of the metrics |
| metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | List of metric listener class names, separated by spaces |
| prometheusHttpPort | N | 8080 | HTTP server port of the Prometheus simple client |
| metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking metric data snapshots (milliseconds) |
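
To show how the defaults in this table combine with a hand-written file, here is a minimal Python sketch. It is not part of InLong: the parser handles only simple key=value lines, the function names are hypothetical, and the sample file content is made up. The default values are copied from the table.

```python
# Defaults copied from the parameter table; clusterId is required and has none.
DEFAULTS = {
    "sortSource.type": "org.apache.inlong.sort.standalone.source.readapi.ReadApiSource",
    "sortChannel.type": "org.apache.inlong.sort.standalone.channel.BufferQueueChannel",
    "sortSink.type": "org.apache.inlong.sort.standalone.sink.hive.HiveSink",
    "sortClusterConfig.type": "org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader",
    "eventFormatHandler": "org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler",
    "maxThreads": "10",
    "reloadInterval": "60000",
    "processInterval": "100",
    "metricDomains": "Sort",
    "metricDomains.Sort.domainListeners": "org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener",
    "prometheusHttpPort": "8080",
    "metricDomains.Sort.snapshotInterval": "60000",
}
REQUIRED = ("clusterId",)


def parse_properties(text):
    """Parse simple key=value lines; skip blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_config(text):
    """Overlay the values from the file on the documented defaults."""
    props = parse_properties(text)
    missing = [key for key in REQUIRED if not props.get(key)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return {**DEFAULTS, **props}


# Hypothetical content of conf/common.properties.
SAMPLE = """
# only the required parameter plus one override
clusterId=sort-cluster-demo
maxThreads=20
"""

config = effective_config(SAMPLE)
print(config["clusterId"], config["maxThreads"], config["reloadInterval"])
```

With this sample only clusterId and maxThreads come from the file; every other parameter falls back to the default listed in the table.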

SortClusterConfig

  • Get SortClusterConfig from the file SortClusterConfig.conf in the classpath. This mode does not support online updating.
  • Get SortClusterConfig from the InlongManager URL. This mode supports online updating.

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| clusterName | Y | NA | inlong-sort-standalone cluster id |
| sortTasks | Y | NA | Sort task list |

SortTaskConfig

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| name | Y | NA | Sort task name |
| type | Y | NA | Sort task type, for example: HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
| idParams | Y | NA | Inlong DataStream configuration |
| sinkParams | Y | NA | Sort task parameters |
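
The required fields of SortClusterConfig and SortTaskConfig can be checked before a configuration is deployed. The following Python sketch is an illustration, not InLong code: the function name is hypothetical, and it assumes the `type` value in the JSON is the upper-level name from the list above (as in "type":"HIVE" in the Hive sample).

```python
# Task type names taken from the SortTaskConfig table (assumed JSON spelling).
TASK_TYPES = {"HIVE", "TUBE", "KAFKA", "PULSAR", "ElasticSearch", "UNKNOWN"}


def validate_cluster_config(cfg):
    """Return a list of problems; an empty list means all required fields exist."""
    problems = []
    for key in ("clusterName", "sortTasks"):
        if key not in cfg:
            problems.append(f"missing {key}")
    for i, task in enumerate(cfg.get("sortTasks", [])):
        for key in ("name", "type", "idParams", "sinkParams"):
            if key not in task:
                problems.append(f"sortTasks[{i}]: missing {key}")
        if "type" in task and task["type"] not in TASK_TYPES:
            problems.append(f"sortTasks[{i}]: unknown type {task['type']!r}")
    return problems


good = {
    "clusterName": "demo-cluster",
    "sortTasks": [
        {"name": "demo-task", "type": "HIVE", "idParams": [], "sinkParams": {}}
    ],
}
bad = {"clusterName": "demo-cluster", "sortTasks": [{"name": "t1", "type": "hive2"}]}

print(validate_cluster_config(good))
print(validate_cluster_config(bad))
```

The check covers only the presence of fields; the contents of idParams and sinkParams depend on the task type and are described in the sections for each task type.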

idParams content of Hive sort task

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| separator | Y | NA | Separator |
| partitionIntervalMs | N | 3600000 | Partition interval (milliseconds) |
| idRootPath | Y | NA | HDFS root path of the Inlong DataStream |
| partitionSubPath | Y | NA | Partition sub path of the Inlong DataStream |
| hiveTableName | Y | NA | Hive table name of the Inlong DataStream |
| partitionFieldName | N | dt | Partition field name of the Inlong DataStream |
| partitionFieldPattern | Y | NA | Date format of the partition field value; the supported types are {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
| msgTimeFieldPattern | Y | NA | Date format of the message generation time; Java date formats are supported |
| maxPartitionOpenDelayHour | N | 8 | Maximum partition delay (hours) |
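
The date-pattern parameters are easier to follow with a worked example. The Python sketch below shows one plausible reading of how idRootPath, partitionSubPath and partitionFieldPattern combine for a given message time. It is an assumption for illustration, not the HiveSink implementation: the function names are hypothetical, only the pattern letters yyyy, MM, dd, HH, mm and ss are handled, and the parameter values are taken from the Hive sample in this document.

```python
import re
from datetime import datetime

# Java date-pattern tokens used in this document, mapped to strftime codes.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d",
                     "HH": "%H", "mm": "%M", "ss": "%S"}
_TOKEN = re.compile("|".join(_JAVA_TO_STRFTIME))


def to_strftime(java_pattern):
    """Translate a Java date pattern in one pass, so MM and mm stay distinct."""
    return _TOKEN.sub(lambda m: _JAVA_TO_STRFTIME[m.group()], java_pattern)


def partition_location(id_params, msg_time_text):
    """Return (HDFS directory, partition field value) for one message time."""
    msg_time = datetime.strptime(
        msg_time_text, to_strftime(id_params["msgTimeFieldPattern"]))
    # Each {pattern} placeholder in partitionSubPath is filled from the message time.
    sub_path = re.sub(
        r"\{([^}]*)\}",
        lambda m: msg_time.strftime(to_strftime(m.group(1))),
        id_params["partitionSubPath"])
    value = msg_time.strftime(to_strftime(id_params["partitionFieldPattern"]))
    return id_params["idRootPath"] + sub_path, value


id_params = {
    "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
    "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
    "partitionFieldPattern": "yyyyMMddHH",
    "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
}

path, value = partition_location(id_params, "2022-03-01 13:45:10")
print(path)   # /user/hive/warehouse/t_inlong_v1_0fc00000046/20220301/2022030113
print(value)  # 2022030113
```

Under this reading, a message generated at 13:45:10 on 2022-03-01 lands in the hourly directory 2022030113 below the daily directory 20220301, and the partition field dt takes the value 2022030113.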

sinkParams content of Hive sort task

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| hdfsPath | Y | NA | NameNode URL of HDFS |
| maxFileOpenDelayMinute | N | 5 | Maximum writing delay of a single HDFS file (minutes) |
| tokenOvertimeMinute | N | 60 | Token overtime of the Inlong DataStream (minutes) |
| maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file (GB) |
| hiveJdbcUrl | Y | NA | JDBC URL of Hive |
| hiveDatabase | Y | NA | Hive database |
| hiveUsername | Y | NA | Hive username |
| hivePassword | Y | NA | Hive password |

idParams content of Pulsar sort task

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| topic | Y | NA | Pulsar topic |

sinkParams content of Pulsar sort task

| Parameter | Required | Default value | Remark |
| --- | --- | --- | --- |
| serviceUrl | Y | NA | Pulsar service URL |
| authentication | Y | NA | Pulsar authentication |
| enableBatching | N | true | enableBatching |
| batchingMaxBytes | N | 5242880 | batchingMaxBytes |
| batchingMaxMessages | N | 3000 | batchingMaxMessages |
| batchingMaxPublishDelay | N | 1 | batchingMaxPublishDelay |
| maxPendingMessages | N | 1000 | maxPendingMessages |
| maxPendingMessagesAcrossPartitions | N | 50000 | maxPendingMessagesAcrossPartitions |
| sendTimeout | N | 0 | sendTimeout |
| compressionType | N | NONE | compressionType |
| blockIfQueueFull | N | true | blockIfQueueFull |
| roundRobinRouterBatchingPartitionSwitchFrequency | N | 10 | roundRobinRouterBatchingPartitionSwitchFrequency |
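
This document gives a JSON sample only for a Hive sort task, so the following Python sketch assembles a Pulsar sort task entry from the two tables above. Several things here are assumptions: the helper name `pulsar_task`, the group id, the topic, the service URL and the authentication value are placeholders, and sinkParams values are written as strings because the Hive sample does so.

```python
import json

# Optional sinkParams and their defaults, copied from the Pulsar table.
PULSAR_SINK_DEFAULTS = {
    "enableBatching": "true",
    "batchingMaxBytes": "5242880",
    "batchingMaxMessages": "3000",
    "batchingMaxPublishDelay": "1",
    "maxPendingMessages": "1000",
    "maxPendingMessagesAcrossPartitions": "50000",
    "sendTimeout": "0",
    "compressionType": "NONE",
    "blockIfQueueFull": "true",
    "roundRobinRouterBatchingPartitionSwitchFrequency": "10",
}


def pulsar_task(name, id_params, service_url, authentication, **overrides):
    """Build one sortTasks entry of type PULSAR (hypothetical helper)."""
    unknown = set(overrides) - set(PULSAR_SINK_DEFAULTS)
    if unknown:
        raise KeyError("unknown sinkParams: " + ", ".join(sorted(unknown)))
    sink_params = {"serviceUrl": service_url, "authentication": authentication}
    sink_params.update(PULSAR_SINK_DEFAULTS)
    sink_params.update({key: str(value) for key, value in overrides.items()})
    return {"name": name, "type": "PULSAR",
            "idParams": id_params, "sinkParams": sink_params}


task = pulsar_task(
    name="demo_pulsar_task",                      # placeholder task name
    id_params=[{
        "inlongGroupId": "demo_group",            # placeholder
        "inlongStreamId": "",
        "topic": "persistent://public/default/demo_topic",  # placeholder
    }],
    service_url="pulsar://127.0.0.1:6650",        # placeholder
    authentication="replace-with-your-token",     # placeholder
    batchingMaxMessages=500,
)
print(json.dumps(task, indent=2))
```

Listing the defaults explicitly is optional; it only makes the generated entry self-describing. Unknown override names are rejected so that a typo does not silently produce an ignored parameter.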

Sample of Hive sort task

{
  "data":{
    "clusterName":"hivev3-sz-sz1",
    "sortTasks":[
      {
        "idParams":[
          {
            "inlongGroupId":"0fc00000046",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_0fc00000046",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_0fc00000046",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"03600000045",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_03600000045",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_03600000045",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"05100054990",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_05100054990",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_05100054990",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"09c00014434",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_09c00014434",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_09c00014434",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          },
          {
            "inlongGroupId":"0c900035509",
            "inlongStreamId":"",
            "separator":"|",
            "partitionIntervalMs":3600000,
            "idRootPath":"/user/hive/warehouse/t_inlong_v1_0c900035509",
            "partitionSubPath":"/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName":"t_inlong_v1_0c900035509",
            "partitionFieldName":"dt",
            "partitionFieldPattern":"yyyyMMddHH",
            "msgTimeFieldPattern":"yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour":8
          }
        ],
        "name":"sid_hive_inlong6th_v3",
        "sinkParams":{
          "hdfsPath":"hdfs://127.0.0.1:9000",
          "maxFileOpenDelayMinute":"5",
          "tokenOvertimeMinute":"60",
          "maxOutputFileSizeGb":"2",
          "hiveJdbcUrl":"jdbc:hive2://127.0.0.2:10000",
          "hiveDatabase":"default",
          "hiveUsername":"hive",
          "hivePassword":"hive"
        },
        "type":"HIVE"
      }
    ]
  },
  "errCode":0,
  "md5":"md5",
  "result":true
}
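
The sample wraps the SortClusterConfig in an envelope with the fields data, errCode, md5 and result. The Python sketch below unpacks such a response and lists the inlongGroupIds handled by each task. It assumes that errCode 0 together with result true means success, which is inferred from the sample rather than stated in this document. The function names are hypothetical and the response is shortened to two idParams entries.

```python
import json


def extract_cluster_config(response_text):
    """Return the SortClusterConfig carried in the 'data' field of a response."""
    body = json.loads(response_text)
    # Assumption: errCode 0 with result true marks a successful response.
    if not body.get("result") or body.get("errCode") != 0:
        raise RuntimeError(f"config request failed: errCode={body.get('errCode')}")
    return body["data"]


def group_ids_by_task(cluster_config):
    """Map each sort task name to the inlongGroupIds it handles."""
    return {
        task["name"]: [params["inlongGroupId"] for params in task["idParams"]]
        for task in cluster_config["sortTasks"]
    }


# Shortened version of the sample above (two idParams entries, no sinkParams).
RESPONSE = """
{
  "data": {
    "clusterName": "hivev3-sz-sz1",
    "sortTasks": [
      {
        "idParams": [
          {"inlongGroupId": "0fc00000046", "inlongStreamId": ""},
          {"inlongGroupId": "03600000045", "inlongStreamId": ""}
        ],
        "name": "sid_hive_inlong6th_v3",
        "type": "HIVE"
      }
    ]
  },
  "errCode": 0,
  "md5": "md5",
  "result": true
}
"""

cluster = extract_cluster_config(RESPONSE)
print(cluster["clusterName"])
print(group_ids_by_task(cluster))
```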