Parameter Configuration Description

TransformConfig Configuration Description

  public class TransformConfig {
      @JsonProperty("sourceInfo")
      private SourceInfo sourceInfo; // Definition of data source decoding
      @JsonProperty("sinkInfo")
      private SinkInfo sinkInfo; // Definition of data result encoding
      @JsonProperty("transformSql")
      private String transformSql; // Data transformation SQL
  }

SourceInfo Configuration Description

CSV

  public CsvSourceInfo(
      @JsonProperty("charset") String charset, // Character set
      @JsonProperty("delimiter") String delimiter, // Delimiter
      @JsonProperty("escapeChar") String escapeChar, // Escape character; if empty, no unescaping is performed during decoding
      // Field list. If empty, the data is split on the delimiter and the fields are named $1, $2, $3... (numbering starts at 1).
      // If fewer fields are defined than are decoded, the extra decoded fields are discarded.
      @JsonProperty("fields") List<FieldInfo> fields
  );

KV

  public KvSourceInfo(
      @JsonProperty("charset") String charset, // Character set
      // Field list. If empty, the Key of each KV pair is used as the field name.
      // A defined field with no matching decoded Key gets an empty value; decoded Keys that are not in the list are discarded.
      @JsonProperty("fields") List<FieldInfo> fields
  );
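
A minimal sketch of the matching rule above, applied to key-value pairs that have already been decoded. The class name KvFieldMappingSketch and the method project are hypothetical, and representing an "empty" value as an empty string is an assumption; the SDK may use null instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KvFieldMappingSketch {

    // Projects decoded key-value pairs onto the configured field list.
    static Map<String, String> project(Map<String, String> decoded, List<String> fields) {
        if (fields.isEmpty()) {
            // No field list: every decoded Key becomes a field name.
            return new LinkedHashMap<>(decoded);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (String name : fields) {
            // Assumption: a field without a matching Key is emitted as an empty string.
            row.put(name, decoded.getOrDefault(name, ""));
        }
        // Keys that are not in the field list are never copied, so they are discarded.
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> decoded = new LinkedHashMap<>();
        decoded.put("ftime", "2024-01-01");
        decoded.put("noise", "x");
        System.out.println(project(decoded, List.of("ftime", "extinfo")));
    }
}
```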

ProtoBuffer

  public PbSourceInfo(
      @JsonProperty("charset") String charset, // Character set
      @JsonProperty("protoDescription") String protoDescription, // Base64-encoded ProtoBuf protocol description
      @JsonProperty("rootMessageType") String rootMessageType, // MessageType of the source data to decode; it must be defined in the ProtoBuf protocol
      @JsonProperty("rowsNodePath") String rowsNodePath // Path of the array node in the ProtoBuf protocol that holds the multiple records to be converted
  );

Generate ProtoBuf Protocol Description

  • Install Protocol Buffers compiler
    PB_REL="https://github.com/protocolbuffers/protobuf/releases"
    curl -LO $PB_REL/download/v3.15.8/protoc-3.15.8-linux-x86_64.zip
    unzip protoc-3.15.8-linux-x86_64.zip -d $HOME/.local
    export PATH="$HOME/.local/bin:$PATH"
    protoc --version
    # Displays: libprotoc 3.15.8
  • Parse the protocol to generate a Base64 encoded description
    # transform.proto is the protocol file; transform.description is the binary description file produced by parsing it
    protoc --descriptor_set_out=transform.description ./transform.proto --proto_path=.
    # Base64-encode transform.description and write the result to transform.base64; its content is the value of the protoDescription parameter
    base64 transform.description | tr -d '\n' > transform.base64
  • Example of transform.proto
    syntax = "proto3";
    package test;
    message SdkMessage {
      bytes msg = 1;
      int64 msgTime = 2;
      map<string, string> extinfo = 3;
    }
    message SdkDataRequest {
      string sid = 1;
      repeated SdkMessage msgs = 2;
      uint64 packageID = 3;
    }
  • Example of transform.base64
    CrcCCg90cmFuc2Zvcm0ucHJvdG8SBHRlc3QirQEKClNka01lc3NhZ2USEAoDbXNnGAEgASgMUgNtc2cSGAoHbXNnVGltZRgCIAEoA1IHbXNnVGltZRI3CgdleHRpbmZvGAMgAygLMh0udGVzdC5TZGtNZXNzYWdlLkV4dGluZm9FbnRyeVIHZXh0aW5mbxo6CgxFeHRpbmZvRW50cnkSEAoDa2V5GAEgASgJUgNrZXkSFAoFdmFsdWUYAiABKAlSBXZhbHVlOgI4ASJmCg5TZGtEYXRhUmVxdWVzdBIQCgNzaWQYASABKAlSA3NpZBIkCgRtc2dzGAIgAygLMhAudGVzdC5TZGtNZXNzYWdlUgRtc2dzEhwKCXBhY2thZ2VJRBgDIAEoBFIJcGFja2FnZUlEYgZwcm90bzM=
  • Example of transform.description (Figure 1)
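
If the encoding step has to happen inside a Java program rather than on the command line, the base64 and tr pipeline can be approximated as follows. This is a sketch under assumptions: the class name ProtoDescriptionSketch and its methods are hypothetical, and the descriptor file is expected to have been produced by protoc beforehand.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

class ProtoDescriptionSketch {

    // The basic Base64 encoder emits a single line with no line separators,
    // which matches piping the base64 output through tr -d '\n'.
    static String encode(byte[] descriptorBytes) {
        return Base64.getEncoder().encodeToString(descriptorBytes);
    }

    // Reads the binary description file written by protoc and encodes it.
    static String encodeFile(Path descriptorFile) {
        try {
            return encode(Files.readAllBytes(descriptorFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The printed string is what would be supplied as protoDescription.
        System.out.println(encodeFile(Paths.get("transform.description")));
    }
}
```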

Json

  public JsonSourceInfo(
      @JsonProperty("charset") String charset, // Character set
      @JsonProperty("rowsNodePath") String rowsNodePath // Path of the array node in the Json data that holds the multiple records to be converted
  );

SinkInfo Configuration Description

CSV

  public CsvSinkInfo(
      @JsonProperty("charset") String charset, // Character set
      @JsonProperty("delimiter") String delimiter, // Delimiter
      @JsonProperty("escapeChar") String escapeChar, // Escape character; if empty, no escaping is performed during encoding
      @JsonProperty("fields") List<FieldInfo> fields // Field list; if empty, fields are encoded in the order of the Select fields in TransformSQL
  );

KV

  public KvSinkInfo(
      @JsonProperty("charset") String charset, // Character set
      @JsonProperty("fields") List<FieldInfo> fields // Field list; if empty, the Alias of each Select field in TransformSQL is used as the Key
  );

TransformSQL Configuration Description

CSV / KV Field Reference

  • When SourceInfo has no configured field list:
    • For CSV format, fields are referenced by position as $1, $2, $3…
    • For KV format, fields are referenced directly by the Key in the source data.
  • If the field names in SourceInfo and SinkInfo differ, use an Alias on the Select fields to map one to the other.

ProtoBuf / Json Tree Structure Field Reference

  • Every field reference must start with the prefix “$root.” or “$child.”.
    • “$root” is the root node.
    • “$child” is the array node that holds the multiple rows.
  • For multi-level nodes, separate the levels with a dot, such as $root.extParams.name.
  • For array nodes, put the array index in parentheses, such as $root.msgs(1).msgTime.
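
To make the path syntax concrete, the following sketch resolves a “$root.” path against nested Java maps and lists. It is an illustration only, not the SDK's resolver: the class name TreePathSketch is hypothetical, indexes are assumed to be zero-based (the SQL examples use both msgs(0) and msgs(1)), and returning null for a missing node is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TreePathSketch {

    // One path segment: a node name with an optional "(index)" suffix.
    private static final Pattern SEGMENT = Pattern.compile("(\\w+)(?:\\((\\d+)\\))?");

    static Object resolve(Map<String, Object> root, String path) {
        if (!path.startsWith("$root.")) {
            throw new IllegalArgumentException("path must start with $root.: " + path);
        }
        Object node = root;
        for (String part : path.substring("$root.".length()).split("\\.")) {
            Matcher m = SEGMENT.matcher(part);
            if (!m.matches() || !(node instanceof Map)) {
                return null; // malformed segment, or nothing left to descend into
            }
            node = ((Map<?, ?>) node).get(m.group(1));
            if (m.group(2) != null) {
                // Assumption: array indexes are zero-based.
                int index = Integer.parseInt(m.group(2));
                if (!(node instanceof List) || index >= ((List<?>) node).size()) {
                    return null;
                }
                node = ((List<?>) node).get(index);
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, Object> root = Map.of(
                "sid", "s-1",
                "msgs", List.of(Map.of("msgTime", 100L), Map.of("msgTime", 250L)));
        System.out.println(resolve(root, "$root.sid"));
        System.out.println(resolve(root, "$root.msgs(1).msgTime"));
    }
}
```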

Operator Support

  • Currently supported operators
    • Arithmetic operators: +, -, *, /, (, )
    • Comparison operators: =, !=, >, >=, <, <=
    • Logical operators: and, or, !, not, (, )

Function Description

  • CONCAT(string1, string2, …) returns the concatenation of string1, string2, …; if any argument is NULL, it returns NULL. For example, CONCAT('AA', 'BB', 'CC') returns "AABBCC".
  • NOW() returns the current SQL timestamp in the local time zone.
  • For details on all functions, see the function description section.
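
The NULL rule for CONCAT can be stated in a few lines of Java. This is a sketch of the semantics described above, not the SDK's implementation; the class name ConcatSketch is hypothetical.

```java
class ConcatSketch {

    // Returns the concatenation of all arguments, or null as soon as one argument is null.
    static String concat(Object... parts) {
        if (parts == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        for (Object part : parts) {
            if (part == null) {
                return null; // a single NULL argument makes the whole result NULL
            }
            out.append(part);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("AA", "BB", "CC"));
        System.out.println(concat("AA", null, "CC"));
    }
}
```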

SQL Example

  select ftime,extinfo from source where extinfo='ok'

  select $1 ftime,$2 extinfo from source where $2!='ok'

  select $root.sid,$root.packageID,$child.msgTime,$child.msg from source

  select $root.sid,$root.packageID,$root.msgs(1).msgTime,$root.msgs(0).msg from source

  select $root.sid,
      ($root.msgs(1).msgTime-$root.msgs(0).msgTime)/$root.packageID field2,
      $root.packageID*($root.msgs(0).msgTime*$root.packageID+$root.msgs(1).msgTime/$root.packageID)*$root.packageID field3,
      $root.msgs(0).msg field4
  from source
  where $root.packageID<($root.msgs(0).msgTime+$root.msgs(1).msgTime+$root.msgs(0).msgTime+$root.msgs(1).msgTime)

  select $root.sid,
      $root.packageID,
      $child.msgTime,
      concat($root.sid,$root.packageID,$child.msgTime,$child.msg) msg,$root.msgs.msgTime.msg
  from source

  select now() from source

Common Issues

  • SDK calls are thread-safe.
  • Modifying the parameters of a configuration object directly does not take effect; to change the configuration, rebuild the SDK object.
  • If CSV or KV source data contains line breaks, delimiters (vertical bars, commas, etc.), or backslashes (escape characters), configure the correct escape character and line separator.
    • Otherwise the converted fields will be out of order: a line break splits one record into two, and an unescaped delimiter such as a vertical bar shifts the fields out of alignment.
  • Avoid creating an SDK object for each record processed. Initializing an SDK object compiles the conversion SQL and builds an AST semantic parse tree, so doing it frequently causes performance problems. Instead, initialize one SDK object and reuse it to process data throughout the program.
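
The reuse recommendation can be sketched with a stand-in class, since this page does not name the SDK class or its methods. StubTransformer, transform and processAll below are hypothetical; only the pattern (initialize once, call many times) is the point.

```java
import java.util.ArrayList;
import java.util.List;

class SdkReuseSketch {

    // Hypothetical stand-in for the SDK object; the real class name and API are not given here.
    static final class StubTransformer {
        static int initializations = 0; // counts how often the costly setup runs

        private final String transformSql;

        StubTransformer(String transformSql) {
            // In the real SDK this is where the SQL would be compiled and the AST built.
            this.transformSql = transformSql;
            initializations++;
        }

        String transform(String record) {
            // Placeholder behaviour only; the real SDK would apply transformSql.
            return record.trim();
        }
    }

    // Builds the transformer once and reuses it for every record.
    static List<String> processAll(List<String> records, String transformSql) {
        StubTransformer transformer = new StubTransformer(transformSql);
        List<String> results = new ArrayList<>();
        for (String record : records) {
            results.add(transformer.transform(record));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> out = processAll(List.of(" a ", " b ", " c "), "select $1 from source");
        System.out.println(out + " after " + StubTransformer.initializations + " initialization(s)");
    }
}
```

Constructing the transformer inside the loop instead would run the setup once per record, which is the cost the recommendation above is meant to avoid.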