Temporary partition

Since version 0.12, Doris supports temporary partitioning.

A temporary partition belongs to a partitioned table. Only partitioned tables can create temporary partitions.

Rules

  • The partition columns of the temporary partition is the same as the formal partition and cannot be modified.
  • The partition ranges of all temporary partitions of a table cannot overlap, but the ranges of temporary partitions and formal partitions can overlap.
  • The partition name of the temporary partition cannot be the same as the formal partitions and other temporary partitions.

Supported operations

The temporary partition supports add, delete, and replace operations.

Add temporary partition

You can add temporary partitions to a table with the ALTER TABLE ADD TEMPORARY PARTITION statement:

  1. ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01");
  2. ALTER TABLE tbl2 ADD TEMPORARY PARTITION tp1 VALUES [("2020-01-01"), ("2020-02-01"));
  3. ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01")
  4. ("in_memory" = "true", "replication_num" = "1")
  5. DISTRIBUTED BY HASH (k1) BUCKETS 5;

See HELP ALTER TABLE; for more help and examples.

Some instructions for adding operations:

  • Adding a temporary partition is similar to adding a formal partition. The partition range of the temporary partition is independent of the formal partition.
  • Temporary partition can independently specify some attributes. Includes information such as the number of buckets, the number of replicas, whether it is a memory table, or the storage medium.

Delete temporary partition

A table’s temporary partition can be dropped with the ALTER TABLE DROP TEMPORARY PARTITION statement:

  1. ALTER TABLE tbl1 DROP TEMPORARY PARTITION tp1;

See HELP ALTER TABLE; for more help and examples.

Some instructions for the delete operation:

  • Deleting the temporary partition will not affect the data of the formal partition.

Replace partition

You can replace formal partitions of a table with temporary partitions with the ALTER TABLE REPLACE PARTITION statement.

  1. ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
  2. ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2, tp3);
  3. ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
  4. PROPERTIES (
  5. "strict_range" = "false",
  6. "use_temp_partition_name" = "true"
  7. );

See HELP ALTER TABLE; for more help and examples.

The replace operation has two special optional parameters:

  1. strict_range

    The default is true. When this parameter is true, the range union of all formal partitions to be replaced needs to be the same as the range union of the temporary partitions to be replaced. When set to false, you only need to ensure that the range between the new formal partitions does not overlap after replacement. Here are some examples:

    • Example 1

      Range of partitions p1, p2, p3 to be replaced (=> union):

      1. (10, 20), [20, 30), [40, 50) => [10, 30), [40, 50)

      Replace the range of partitions tp1, tp2 (=> union):

      1. (10, 30), [40, 45), [45, 50) => [10, 30), [40, 50)

      The union of ranges is the same, so you can use tp1 and tp2 to replace p1, p2, p3.

    • Example 2

      Range of partition p1 to be replaced (=> union):

      1. (10, 50) => [10, 50)

      Replace the range of partitions tp1, tp2 (=> union):

      1. (10, 30), [40, 50) => [10, 30), [40, 50)

      The union of ranges is not the same. If strict_range is true, you cannot use tp1 and tp2 to replace p1. If false, and the two partition ranges [10, 30), [40, 50) and the other formal partitions do not overlap, they can be replaced.

  2. use_temp_partition_name

    The default is false. When this parameter is false, and the number of partitions to be replaced is the same as the number of replacement partitions, the name of the formal partition after the replacement remains unchanged. If true, after replacement, the name of the formal partition is the name of the replacement partition. Here are some examples:

    • Example 1

      1. ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);

      use_temp_partition_name is false by default. After replacement, the partition name is still p1, but the related data and attributes are replaced with tp1.

      If use_temp_partition_name is true by default, the name of the partition is tp1 after replacement. The p1 partition no longer exists.

    • Example 2

      1. ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1);

      use_temp_partition_name is false by default, but this parameter is invalid because the number of partitions to be replaced and the number of replacement partitions are different. After the replacement, the partition name is tp1, and p1 and p2 no longer exist.

Some instructions for the replacement operation:

  • After the partition is replaced successfully, the replaced partition will be deleted and cannot be recovered.

Load and query of temporary partitions

Users can load data into temporary partitions or specify temporary partitions for querying.

  1. Load temporary partition

    The syntax for specifying a temporary partition is slightly different depending on the load method. Here is a simple illustration through an example:

    1. INSERT INTO tbl TEMPORARY PARTITION (tp1, tp2, ...) SELECT ....
    1. curl --location-trusted -u root: -H "label: 123" -H "temporary_partition: tp1, tp2, ..." -T testData http: // host: port / api / testDb / testTbl / _stream_load
    1. LOAD LABEL example_db.label1
    2. (
    3. DATA INFILE ("hdfs: // hdfs_host: hdfs_port / user / palo / data / input / file")
    4. INTO TABLE `my_table`
    5. TEMPORARY PARTITION (tp1, tp2, ...)
    6. ...
    7. )
    8. WITH BROKER hdfs ("username" = "hdfs_user", "password" = "hdfs_password");
    1. CREATE ROUTINE LOAD example_db.test1 ON example_tbl
    2. COLUMNS (k1, k2, k3, v1, v2, v3 = k1 * 100),
    3. TEMPORARY PARTITIONS (tp1, tp2, ...),
    4. WHERE k1> 100
    5. PROPERTIES
    6. (...)
    7. FROM KAFKA
    8. (...);
  2. Query the temporary partition

    1. SELECT ... FROM
    2. tbl1 TEMPORARY PARTITION (tp1, tp2, ...)
    3. JOIN
    4. tbl2 TEMPORARY PARTITION (tp1, tp2, ...)
    5. ON ...
    6. WHERE ...;

Relationship to other operations

DROP

  • After using the DROP operation to directly drop the database or table, you can recover the database or table (within a limited time) through the RECOVER command, but the temporary partition will not be recovered.
  • After the formal partition is dropped using the ALTER command, the partition can be recovered by the RECOVER command (within a limited time). Operating a formal partition is not related to a temporary partition.
  • After the temporary partition is dropped using the ALTER command, the temporary partition cannot be recovered through the RECOVER command.

TRUNCATE

  • Use the TRUNCATE command to empty the table. The temporary partition of the table will be deleted and cannot be recovered.
  • When using TRUNCATE command to empty the formal partition, it will not affect the temporary partition.
  • You cannot use the TRUNCATE command to empty the temporary partition.

ALTER

  • When the table has a temporary partition, you cannot use the ALTER command to perform Schema Change, Rollup, etc. on the table.
  • You cannot add temporary partitions to a table while the table is undergoing a alter operation.

Best Practices

  1. Atomic overwrite

    In some cases, the user wants to be able to rewrite the data of a certain partition, but if it is dropped first and then loaded, there will be a period of time when the data cannot be seen. This is, the user can first create a corresponding temporary partition, load new data into the temporary partition, and then replace the original partition atomically through the REPLACE operation to achieve the purpose.

  2. Modify the number of buckets

    In some cases, the user used an inappropriate number of buckets when creating a partition. The user can first create a temporary partition corresponding to the partition range and specify a new number of buckets. Then use the INSERT INTO command to load the data of the formal partition into the temporary partition. Through the replacement operation, the original partition is replaced atomically to achieve the purpose.

  3. Merge or split partitions

    In some cases, users want to modify the range of partitions, such as merging two partitions, or splitting a large partition into multiple smaller partitions. Then the user can first create temporary partitions corresponding to the merged or divided range, and then load the data of the formal partition into the temporary partition through the INSERT INTO command. Through the replacement operation, the original partition is replaced atomically to achieve the purpose.