Unstable Result Set
This document describes how to solve unstable result set errors.
GROUP BY
For convenience, MySQL “extends” the GROUP BY
syntax to allow the SELECT
clause to refer to non-aggregated fields not declared in the GROUP BY
clause, that is, the NON-FULL GROUP BY
syntax. In other databases, this is considered a syntax ERROR because it causes unstable result sets.
For example, you have two tables:
stu_info
stores the student informationstu_score
stores the student test scores.
Then you can write a SQL query statement like this:
SELECT
`a`.`class`,
`a`.`stuname`,
max( `b`.`courscore` )
FROM
`stu_info` `a`
JOIN `stu_score` `b` ON `a`.`stuno` = `b`.`stuno`
GROUP BY
`a`.`class`,
`a`.`stuname`
ORDER BY
`a`.`class`,
`a`.`stuname`;
Result:
+------------+--------------+------------------+
| class | stuname | max(b.courscore) |
+------------+--------------+------------------+
| 2018_CS_01 | MonkeyDLuffy | 95.5 |
| 2018_CS_03 | PatrickStar | 99.0 |
| 2018_CS_03 | SpongeBob | 95.0 |
+------------+--------------+------------------+
3 rows in set (0.00 sec)
The a
.class
and a
.stuname
fields are specified in the GROUP BY
statement, and the selected columns are a
.class
, a
.stuname
and b
.courscore
. The only column that is not in the GROUP BY
condition, b
.courscore
, is also specified with a unique value using the max()
function. There is ONLY ONE result that satisfies this SQL statement without any ambiguity, which is called the FULL GROUP BY
syntax.
A counterexample is the NON-FULL GROUP BY
syntax. For example, in these two tables, write the following SQL query (delete a
.stuname
in GROUP BY
).
SELECT
`a`.`class`,
`a`.`stuname`,
max( `b`.`courscore` )
FROM
`stu_info` `a`
JOIN `stu_score` `b` ON `a`.`stuno` = `b`.`stuno`
GROUP BY
`a`.`class`
ORDER BY
`a`.`class`,
`a`.`stuname`;
Then two values that match this SQL are returned.
The first returned value:
+------------+--------------+------------------------+
| class | stuname | max( `b`.`courscore` ) |
+------------+--------------+------------------------+
| 2018_CS_01 | MonkeyDLuffy | 95.5 |
| 2018_CS_03 | PatrickStar | 99.0 |
+------------+--------------+------------------------+
The second returned value:
+------------+--------------+------------------+
| class | stuname | max(b.courscore) |
+------------+--------------+------------------+
| 2018_CS_01 | MonkeyDLuffy | 95.5 |
| 2018_CS_03 | SpongeBob | 99.0 |
+------------+--------------+------------------+
There are two results because you did NOT specify how to get the value of the a
.stuname
field in SQL, and two results are both satisfied by SQL semantics. It results in an unstable result set. Therefore, if you want to guarantee the stability of the result set of the GROUP BY
statement, use the FULL GROUP BY
syntax.
MySQL provides a sql_mode
switch ONLY_FULL_GROUP_BY
to control whether to check the FULL GROUP BY
syntax or not. TiDB is also compatible with this sql_mode
switch.
mysql> select a.class, a.stuname, max(b.courscore) from stu_info a join stu_score b on a.stuno=b.stuno group by a.class order by a.class, a.stuname;
+------------+--------------+------------------+
| class | stuname | max(b.courscore) |
+------------+--------------+------------------+
| 2018_CS_01 | MonkeyDLuffy | 95.5 |
| 2018_CS_03 | PatrickStar | 99.0 |
+------------+--------------+------------------+
2 rows in set (0.01 sec)
mysql> set @@sql_mode='STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.01 sec)
mysql> select a.class, a.stuname, max(b.courscore) from stu_info a join stu_score b on a.stuno=b.stuno group by a.class order by a.class, a.stuname;
ERROR 1055 (42000): Expression #2 of ORDER BY is not in GROUP BY clause and contains nonaggregated column '' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
Run results: The above example shows the effect when you set ONLY_FULL_GROUP_BY
for sql_mode
.
ORDER BY
In SQL semantics, the result set is output in order only if the ORDER BY
syntax is used. For a single-instance database, since the data is stored on one server, the results of multiple executions are often stable without data reorganization. Some databases (especially the MySQL InnoDB storage engine) can even output the result sets in order of the primary key or index.
As a distributed database, TiDB stores data on multiple servers. In addition, the TiDB layer does not cache data pages, so the result set order of SQL statements without ORDER BY
is easily perceived as unstable. To output a sequential result set, you need to explicitly add the order field into the ORDER BY
clause, which conforms to SQL semantics.
In the following example, only one field is added to the ORDER BY
clause, and TiDB only sorts the results by that one field.
mysql> select a.class, a.stuname, b.course, b.courscore from stu_info a join stu_score b on a.stuno=b.stuno order by a.class;
+------------+--------------+-------------------------+-----------+
| class | stuname | course | courscore |
+------------+--------------+-------------------------+-----------+
| 2018_CS_01 | MonkeyDLuffy | PrinciplesofDatabase | 60.5 |
| 2018_CS_01 | MonkeyDLuffy | English | 43.0 |
| 2018_CS_01 | MonkeyDLuffy | OpSwimming | 67.0 |
| 2018_CS_01 | MonkeyDLuffy | OpFencing | 76.0 |
| 2018_CS_01 | MonkeyDLuffy | FundamentalsofCompiling | 88.0 |
| 2018_CS_01 | MonkeyDLuffy | OperatingSystem | 90.5 |
| 2018_CS_01 | MonkeyDLuffy | PrincipleofStatistics | 69.0 |
| 2018_CS_01 | MonkeyDLuffy | ProbabilityTheory | 76.0 |
| 2018_CS_01 | MonkeyDLuffy | Physics | 63.5 |
| 2018_CS_01 | MonkeyDLuffy | AdvancedMathematics | 95.5 |
| 2018_CS_01 | MonkeyDLuffy | LinearAlgebra | 92.5 |
| 2018_CS_01 | MonkeyDLuffy | DiscreteMathematics | 89.0 |
| 2018_CS_03 | SpongeBob | PrinciplesofDatabase | 88.0 |
| 2018_CS_03 | SpongeBob | English | 79.0 |
| 2018_CS_03 | SpongeBob | OpBasketball | 92.0 |
| 2018_CS_03 | SpongeBob | OpTennis | 94.0 |
| 2018_CS_03 | PatrickStar | LinearAlgebra | 6.5 |
| 2018_CS_03 | PatrickStar | AdvancedMathematics | 5.0 |
| 2018_CS_03 | SpongeBob | DiscreteMathematics | 72.0 |
| 2018_CS_03 | PatrickStar | ProbabilityTheory | 12.0 |
| 2018_CS_03 | PatrickStar | PrincipleofStatistics | 20.0 |
| 2018_CS_03 | PatrickStar | OperatingSystem | 36.0 |
| 2018_CS_03 | PatrickStar | FundamentalsofCompiling | 2.0 |
| 2018_CS_03 | PatrickStar | DiscreteMathematics | 14.0 |
| 2018_CS_03 | PatrickStar | PrinciplesofDatabase | 9.0 |
| 2018_CS_03 | PatrickStar | English | 60.0 |
| 2018_CS_03 | PatrickStar | OpTableTennis | 12.0 |
| 2018_CS_03 | PatrickStar | OpPiano | 99.0 |
| 2018_CS_03 | SpongeBob | FundamentalsofCompiling | 43.0 |
| 2018_CS_03 | SpongeBob | OperatingSystem | 95.0 |
| 2018_CS_03 | SpongeBob | PrincipleofStatistics | 90.0 |
| 2018_CS_03 | SpongeBob | ProbabilityTheory | 87.0 |
| 2018_CS_03 | SpongeBob | Physics | 65.0 |
| 2018_CS_03 | SpongeBob | AdvancedMathematics | 55.0 |
| 2018_CS_03 | SpongeBob | LinearAlgebra | 60.5 |
| 2018_CS_03 | PatrickStar | Physics | 6.0 |
+------------+--------------+-------------------------+-----------+
36 rows in set (0.01 sec)
Results are unstable when the ORDER BY
values are the same. To reduce randomness, ORDER BY
values should be unique. If you can’t guarantee the uniqueness, you need to add more ORDER BY
fields until the combination of the ORDER BY
fields in ORDER BY
is unique, then the result will be stable.
The result set is unstable because order by is not used in GROUP_CONCAT()
The result set is unstable because TiDB reads data from the storage layer in parallel, so the result set order returned by GROUP_CONCAT()
without ORDER BY
is easily perceived as unstable.
To let GROUP_CONCAT()
get the result set output in order, you need to add the sorting fields to the ORDER BY
clause, which conforms to the SQL semantics. In the following example, GROUP_CONCAT()
that splices customer_id
without ORDER BY
causes an unstable result set.
Excluded
ORDER BY
First query:
mysql> select GROUP_CONCAT( customer_id SEPARATOR ',' ) FROM customer where customer_id like '200002%';
+-------------------------------------------------------------------------+
| GROUP_CONCAT(customer_id SEPARATOR ',') |
+-------------------------------------------------------------------------+
| 20000200992,20000200993,20000200994,20000200995,20000200996,20000200... |
+-------------------------------------------------------------------------+
Second query:
mysql> select GROUP_CONCAT( customer_id SEPARATOR ',' ) FROM customer where customer_id like '200002%';
+-------------------------------------------------------------------------+
| GROUP_CONCAT(customer_id SEPARATOR ',') |
+-------------------------------------------------------------------------+
| 20000203040,20000203041,20000203042,20000203043,20000203044,20000203... |
+-------------------------------------------------------------------------+
Include
ORDER BY
First query:
mysql> select GROUP_CONCAT( customer_id order by customer_id SEPARATOR ',' ) FROM customer where customer_id like '200002%';
+-------------------------------------------------------------------------+
| GROUP_CONCAT(customer_id SEPARATOR ',') |
+-------------------------------------------------------------------------+
| 20000200000,20000200001,20000200002,20000200003,20000200004,20000200... |
+-------------------------------------------------------------------------+
Second query:
mysql> select GROUP_CONCAT( customer_id order by customer_id SEPARATOR ',' ) FROM customer where customer_id like '200002%';
+-------------------------------------------------------------------------+
| GROUP_CONCAT(customer_id SEPARATOR ',') |
+-------------------------------------------------------------------------+
| 20000200000,20000200001,20000200002,20000200003,20000200004,20000200... |
+-------------------------------------------------------------------------+
Unstable results in SELECT * FROM T LIMIT N
The returned result is related to the distribution of data on the storage node (TiKV). If multiple queries are performed, different storage units (Regions) of the storage nodes (TiKV) return results at different speeds, which can cause unstable results.