Orc Format

Format: Serialization Schema
Format: Deserialization Schema

The Apache Orc format allows reading and writing Orc data.

Dependencies

To use the ORC format, the following dependencies are required, both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.

Maven dependency:

  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-orc</artifactId>
    <version>1.16.0</version>
  </dependency>

SQL Client: download the SQL JAR bundle.

How to create a table with Orc format

Here is an example of creating a table using the Filesystem connector and the Orc format.

  CREATE TABLE user_behavior (
    user_id BIGINT,
    item_id BIGINT,
    category_id BIGINT,
    behavior STRING,
    ts TIMESTAMP(3),
    dt STRING
  ) PARTITIONED BY (dt) WITH (
    'connector' = 'filesystem',
    'path' = '/tmp/user_behavior',
    'format' = 'orc'
  )
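
As a rough usage sketch, assuming the user_behavior table above has been created and the session runs in batch mode, rows could be written and read back as follows. The inserted values are made up for illustration. In streaming mode the filesystem sink generally commits files only on checkpoints, so checkpointing would need to be enabled there.

```sql
-- Illustrative values only; they do not come from the original documentation.
INSERT INTO user_behavior VALUES
  (1, 1001, 10, 'pv',  TIMESTAMP '2022-11-01 10:15:00.000', '2022-11-01'),
  (2, 1002, 10, 'buy', TIMESTAMP '2022-11-02 08:30:00.000', '2022-11-02');

-- Read a single partition back; filtering on dt allows partition pruning.
SELECT user_id, item_id, behavior, ts
FROM user_behavior
WHERE dt = '2022-11-01';
```

Each distinct dt value ends up in its own partition directory under the configured path, with the data stored as Orc files.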

Format Options

Option    Required    Default    Type      Description
format    required    (none)     String    Specifies which format to use; here it should be 'orc'.

The Orc format also supports the table properties defined by Apache ORC (see "Table properties" in the ORC documentation). For example, you can configure orc.compress=SNAPPY to enable Snappy compression.
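
A minimal sketch of passing such a property through the table definition is shown below. The table name and path are hypothetical; the only point is that the Orc property is written as an additional key in the WITH clause.

```sql
-- Hypothetical table; only the 'orc.compress' line is the point of the example.
CREATE TABLE user_behavior_snappy (
  user_id BIGINT,
  behavior STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior_snappy',  -- assumed location
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'              -- Orc table property: Snappy compression
);
```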

Data Type Mapping

Orc format type mapping is compatible with Apache Hive. The following table lists the type mapping from Flink type to Orc type.

Flink Data Type    Orc physical type    Orc logical type
CHAR               bytes                CHAR
VARCHAR            bytes                VARCHAR
STRING             bytes                STRING
BOOLEAN            long                 BOOLEAN
BYTES              bytes                BINARY
DECIMAL            decimal              DECIMAL
TINYINT            long                 BYTE
SMALLINT           long                 SHORT
INT                long                 INT
BIGINT             long                 LONG
FLOAT              double               FLOAT
DOUBLE             double               DOUBLE
DATE               long                 DATE
TIMESTAMP          timestamp            TIMESTAMP
ARRAY              -                    LIST
MAP                -                    MAP
ROW                -                    STRUCT
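
To make the mapping concrete, here is a hedged sketch of a table that mixes primitive and nested types. The table name, column names and path are hypothetical, and the comments simply restate the Orc logical type from the table above. Support for nested types may depend on the Flink version in use.

```sql
-- Hypothetical table; comments give the Orc logical type per the mapping table.
CREATE TABLE orc_types_demo (
  id       BIGINT,                        -- LONG
  name     STRING,                        -- STRING
  active   BOOLEAN,                       -- BOOLEAN
  price    DECIMAL(10, 2),                -- DECIMAL
  created  DATE,                          -- DATE
  updated  TIMESTAMP(3),                  -- TIMESTAMP
  tags     ARRAY<STRING>,                 -- LIST
  attrs    MAP<STRING, INT>,              -- MAP
  address  ROW<city STRING, zip STRING>   -- STRUCT
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/orc_types_demo',         -- assumed location
  'format' = 'orc'
);
```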