排序、聚合、集合、分析函数、Limit-k - 集合 - 《OceanBase v1.4 官方教程》

简介

数据库中的集合操作可以把多个查询的结果组合成一个结果集。集合操作主要包含：

UNION
INTERSECT
EXCEPT/MINUS

说明

注意：在Oceanbase中，我们同时支持EXCEPT与MINUS，这两者的语义和语法都是相同的。

对与UNION来说用户可以指定UNION的属性为ALL和DISTINCT/UNIQUE。分别代表集合可重复，和集合不可重复。所有的集合操作默认的属性是DISTINCT。在Oceanbase中，DISTINCT和UNIQUE的语义语法是相同的。而其他的几种集合操作是不能指定属性的。

执行计划

所有的集合操作在逻辑计划层面的展示都是[UNION|INTERSECT|EXCEPT] [ALL|DISTINCT]的形式。带有distinct关键字时，需要从输出的结果中去掉重复的行，目前OceanBase只支持基于排序的集合distinct去重，在计划层面，排序操作会下压到各分支内部进行。

OceanBase_114 (root@test)> explain select * from t1 union select * from t1\G
*************************** 1. row ***************************
Query Plan: =======================================
|ID|OPERATOR      |NAME|EST. ROWS|COST|
---------------------------------------
|0 |UNION DISTINCT|    |2000     |4861|
|1 | SORT         |    |1000     |2048|
|2 |  TABLE SCAN  |t1  |1000     |498 |
|3 | SORT         |    |1000     |2048|
|4 |  TABLE SCAN  |t1  |1000     |498 |
=======================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  4 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
1 row in set (0.02 sec)
OceanBase_114 (root@test)> explain select * from t1 intersect select * from t1\G
*************************** 1. row ***************************
Query Plan: ===========================================
|ID|OPERATOR          |NAME|EST. ROWS|COST|
-------------------------------------------
|0 |INTERSECT DISTINCT|    |1000     |4723|
|1 | SORT             |    |1000     |2048|
|2 |  TABLE SCAN      |t1  |1000     |498 |
|3 | SORT             |    |1000     |2048|
|4 |  TABLE SCAN      |t1  |1000     |498 |
===========================================
Outputs & filters:
-------------------------------------
  0 - output([INTERSECT(t1.c1, t1.c1)], [INTERSECT(t1.c2, t1.c2)]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  4 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
1 row in set (0.02 sec)
OceanBase_114 (root@test)> explain select * from t1 minus select * from t1\G
*************************** 1. row ***************************
Query Plan: ========================================
|ID|OPERATOR       |NAME|EST. ROWS|COST|
----------------------------------------
|0 |EXCEPT DISTINCT|    |1000     |4723|
|1 | SORT          |    |1000     |2048|
|2 |  TABLE SCAN   |t1  |1000     |498 |
|3 | SORT          |    |1000     |2048|
|4 |  TABLE SCAN   |t1  |1000     |498 |
========================================
Outputs & filters:
-------------------------------------
  0 - output([EXCEPT(t1.c1, t1.c1)], [EXCEPT(t1.c2, t1.c2)]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  3 - output([t1.c1], [t1.c2]), filter(nil), sort_keys([t1.c1, ASC], [t1.c2, ASC])
  4 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
1 row in set (0.01 sec)

而带有ALL属性的集合操作不要求下层有序。

OceanBase_114 (root@test)> explain select * from t1 union all select * from t1\G
*************************** 1. row ***************************
Query Plan: ====================================
|ID|OPERATOR   |NAME|EST. ROWS|COST|
------------------------------------
|0 |UNION ALL  |    |2000     |1761|
|1 | TABLE SCAN|t1  |1000     |498 |
|2 | TABLE SCAN|t1  |1000     |498 |
====================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(t1.c1, t1.c1)], [UNION(t1.c2, t1.c2)]), filter(nil)
  1 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
  2 - output([t1.c1], [t1.c2]), filter(nil),
      access([t1.c1], [t1.c2]), partitions(p0)
1 row in set (0.02 sec)

当集合操作两边出现类型不一致的时候，SQL层会为UNION左右两边的表达式分配CAST转换，这个操作在EXPLAIN中是可见的。

OceanBase_114 (root@test)> create table t1 (c1 int, c2 varchar(5));
Query OK, 0 rows affected (0.12 sec)
OceanBase_114 (root@test)> explain select c1 from t1 union select c2 from t1\G
*************************** 1. row ***************************
Query Plan: ========================================
|ID|OPERATOR      |NAME|EST. ROWS|COST |
----------------------------------------
|0 |UNION DISTINCT|    |2000     |11986|
|1 | SORT         |    |1000     |5664 |
|2 |  TABLE SCAN  |t1  |1000     |476  |
|3 | SORT         |    |1000     |5664 |
|4 |  TABLE SCAN  |t1  |1000     |476  |
========================================
Outputs & filters:
-------------------------------------
  0 - output([UNION(cast(t1.c1, VARCHAR(11)), cast(t1.c2, VARCHAR(11)))]), filter(nil)
  1 - output([cast(t1.c1, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c1, VARCHAR(11)), ASC])
  2 - output([cast(t1.c1, VARCHAR(11))]), filter(nil),
      access([t1.c1]), partitions(p0)
  3 - output([cast(t1.c2, VARCHAR(11))]), filter(nil), sort_keys([cast(t1.c2, VARCHAR(11)), ASC])
  4 - output([cast(t1.c2, VARCHAR(11))]), filter(nil),
      access([t1.c2]), partitions(p0)
1 row in set (0.01 sec)