ORC Format
Format: Serialization Schema
Format: Deserialization Schema
The Apache ORC format allows reading and writing ORC data.
Dependencies
The ORC format requires an additional dependency, both for projects that use a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
How to create a table with ORC format
The following example creates a table using the Filesystem connector and the ORC format.
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior',
  'format' = 'orc'
);
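As a minimal sketch of how such a table might be used, the statements below write rows into `user_behavior` and read them back for one partition value. The source table `user_behavior_source` is hypothetical and is assumed to have matching column types; the date literal is only an illustration.

```sql
-- Hypothetical source table `user_behavior_source` (an assumption, not part of the original example).
-- The partition column `dt` is derived from the event timestamp.
INSERT INTO user_behavior
SELECT
  user_id,
  item_id,
  category_id,
  behavior,
  ts,
  DATE_FORMAT(ts, 'yyyy-MM-dd') AS dt
FROM user_behavior_source;

-- Read the ORC data back, filtering on the partition column.
SELECT user_id, behavior, ts
FROM user_behavior
WHERE dt = '2020-01-01';
```

Files end up under the configured `path`, organized by the value of the partition column `dt`.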
Format Options
Option | Required | Default | Type | Description |
---|---|---|---|---|
format | required | (none) | String | Specifies which format to use; here it must be 'orc'. |
The ORC format also supports the table properties defined by ORC itself (see "Table properties" in the ORC documentation). For example, you can configure orc.compress=SNAPPY to enable Snappy compression.
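The sketch below shows one way the compression property could be set, assuming ORC table properties are passed as additional entries in the `WITH` clause next to `'format' = 'orc'`. The table name `user_behavior_snappy` and its path are hypothetical.

```sql
-- Hypothetical table; only the 'orc.compress' entry differs from a plain ORC table.
CREATE TABLE user_behavior_snappy (
  user_id BIGINT,
  behavior STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior_snappy',  -- hypothetical path
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'              -- ORC table property: compress written files with Snappy
);
```

Other ORC table properties would presumably follow the same `orc.`-prefixed pattern, but check the ORC documentation for the exact keys and accepted values.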
Data Type Mapping
The ORC format type mapping is compatible with Apache Hive. The following table lists the mapping from Flink types to ORC types.
Flink Data Type | ORC physical type | ORC logical type |
---|---|---|
CHAR | bytes | CHAR |
VARCHAR | bytes | VARCHAR |
STRING | bytes | STRING |
BOOLEAN | long | BOOLEAN |
BYTES | bytes | BINARY |
DECIMAL | decimal | DECIMAL |
TINYINT | long | BYTE |
SMALLINT | long | SHORT |
INT | long | INT |
BIGINT | long | LONG |
FLOAT | double | FLOAT |
DOUBLE | double | DOUBLE |
DATE | long | DATE |
TIMESTAMP | timestamp | TIMESTAMP |
ARRAY | - | LIST |
MAP | - | MAP |
ROW | - | STRUCT |
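To make the mapping concrete, here is an illustrative table definition that uses several of the Flink types listed above, with the corresponding ORC logical type noted in a comment on each column. The table name, column names, and path are hypothetical, and whether every nested type can be both written and read may depend on the Flink version in use.

```sql
-- Hypothetical table illustrating the Flink-to-ORC type mapping.
CREATE TABLE orc_type_demo (
  id BIGINT,                             -- ORC logical type LONG
  is_active BOOLEAN,                     -- BOOLEAN
  price DECIMAL(10, 2),                  -- DECIMAL
  created_on DATE,                       -- DATE
  updated_at TIMESTAMP(3),               -- TIMESTAMP
  tags ARRAY<STRING>,                    -- LIST
  attributes MAP<STRING, STRING>,        -- MAP
  address ROW<city STRING, zip STRING>   -- STRUCT
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/orc_type_demo',         -- hypothetical path
  'format' = 'orc'
);
```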