Collation
New in version 3.4.
Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
You can specify collation for a collection or a view, an index, or specific operations that support collation.
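As a rough illustration of those three levels, the following mongo shell sketch sets a collation on a collection, on an index, and on a single query. The collection name `myColl`, the field `category`, and the wrapper function `collationLevelsExample` are illustrative assumptions; the wrapper exists only so the snippet can be pasted as-is and then called with the shell's `db` object.

```javascript
// Sketch for the mongo shell; call as collationLevelsExample(db).
// "myColl" and "category" are placeholder names.
function collationLevelsExample(db) {
  // 1. Collection level: default collation for the collection.
  db.createCollection("myColl", { collation: { locale: "fr" } });

  // 2. Index level: collation used for string comparisons in this index.
  db.myColl.createIndex({ category: 1 }, { collation: { locale: "fr" } });

  // 3. Operation level: collation for one query.
  return db.myColl.find({ category: "cafe" }).collation({ locale: "fr" });
}
```

In every case the collation is passed as a document whose only required field is `locale`.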
Collation Document
A collation document has the following fields:
- {
- locale: <string>,
- caseLevel: <boolean>,
- caseFirst: <string>,
- strength: <int>,
- numericOrdering: <boolean>,
- alternate: <string>,
- maxVariable: <string>,
- backwards: <boolean>,
- normalization: <boolean>
- }
When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.
Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.
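Field | Type | Description
---|---|---
locale | string | The ICU locale. See Supported Languages and Locales for a list of supported locales. To specify simple binary comparison, specify a locale value of "simple".
strength | integer | Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values are: 1 (primary level: compares base characters only, ignoring differences such as diacritics and case); 2 (secondary level: compares base characters and diacritics); 3 (tertiary level: compares base characters, diacritics, and case and letter variants; this is the default); 4 (quaternary level: for limited use cases, such as considering punctuation when levels 1-3 ignore it, or processing Japanese text); 5 (identical level: for the limited use case of a tie breaker). See ICU Collation: Comparison Levels for details.
caseLevel | boolean | Optional. Flag that determines whether to include case comparison at strength level 1 or 2. If true, include case comparison: with strength: 1, collation compares base characters and case; with strength: 2, collation compares base characters, diacritics (and possibly other secondary differences), and case. If false, do not include case comparison at level 1 or 2. The default is false. For more information, see ICU Collation: Case Level.
caseFirst | string | Optional. A field that determines the sort order of case differences during tertiary level comparisons. Possible values are: "upper" (uppercase sorts before lowercase); "lower" (lowercase sorts before uppercase); "off" (the default; similar to "lower" with slight differences).
numericOrdering | boolean | Optional. Flag that determines whether to compare numeric strings as numbers or as strings. If true, compare as numbers; for example, "10" is greater than "2". If false, compare as strings; for example, "10" is less than "2". Default is false.
alternate | string | Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison. Possible values are: "non-ignorable" (whitespace and punctuation are considered base characters); "shifted" (whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3). See ICU Collation: Comparison Levels for more information. Default is "non-ignorable".
maxVariable | string | Optional. Field that determines up to which characters are considered ignorable when alternate: "shifted". Has no effect if alternate: "non-ignorable". Possible values are: "punct" (both whitespace and punctuation are ignorable and not considered base characters); "space" (whitespace is ignorable and not considered a base character).
backwards | boolean | Optional. Flag that determines whether strings with diacritics sort from the back of the string, such as with some French dictionary ordering. If true, compare from back to front. If false, compare from front to back. The default value is false.
normalization | boolean | Optional. Flag that determines whether to check if text requires normalization and to perform normalization. Generally, the majority of text does not require this normalization processing. If true, check if the text is fully normalized and perform normalization before comparing. If false, do not check. The default value is false. See http://userguide.icu-project.org/collation/concepts#TOC-Normalization for details.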
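To get a feel for what strength and numericOrdering change, the following sketch uses JavaScript's built-in `Intl.Collator` as an analogy. This is not MongoDB's collation API: the option names differ (`sensitivity` and `numeric` are `Intl.Collator` options), and the correspondence to the fields above is only approximate. It is meant as a quick local experiment, not a statement of how the server compares strings.

```javascript
// Analogy only: Intl.Collator is JavaScript's built-in collator, not MongoDB's.

// Roughly like strength: 1. Base characters only; case and accents are ignored.
const primary = new Intl.Collator("en", { sensitivity: "base" });
const sameAtPrimary = primary.compare("cafe", "Café") === 0; // true

// Roughly like strength: 2. Accents matter, case does not.
const secondary = new Intl.Collator("en", { sensitivity: "accent" });
const caseIgnored = secondary.compare("cafe", "Cafe") === 0; // true
const accentSeen = secondary.compare("cafe", "café") !== 0;  // true

// Roughly like numericOrdering: true versus false.
const asNumbers = ["item10", "item2", "item1"].sort(
  new Intl.Collator("en", { numeric: true }).compare
); // ["item1", "item2", "item10"]
const asStrings = ["item10", "item2", "item1"].sort(
  new Intl.Collator("en", { numeric: false }).compare
); // ["item1", "item10", "item2"]
```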
Operations that Support Collation
You can specify collation for the following operations:
Note
You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.
[1] Some index types do not support collation. See Collation and Unsupported Index Types for details.
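The single-collation rule in the note above can be sketched in the mongo shell as follows: one `collation()` call covers both the query predicate and the sort. The collection `myColl`, the fields `category` and `name`, and the wrapper function are illustrative assumptions.

```javascript
// Sketch for the mongo shell; call as findAndSortWithOneCollation(db).
// "myColl", "category", and "name" are placeholder names.
function findAndSortWithOneCollation(db) {
  return db.myColl
    .find({ category: "cafe" })   // string match uses the "fr" collation
    .sort({ name: 1 })            // string sort uses the same "fr" collation
    .collation({ locale: "fr" }); // one collation for the whole operation
}
```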
Behavior
Local Variants
Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:
- { "locale" : "<locale code>@collation=<variant>" }
For example, to use the pinyin variant of the Chinese collation:
- { "locale" : "zh@collation=pinyin" }
For a complete list of all collation locales and their variants, see Collation Locales.
Collation and Views
- You can specify a default collation for a view at creation time. If no collation is specified, the view’s default collation is the “simple” binary comparison collator. That is, the view does not inherit the collection’s default collation.
- String comparisons on the view use the view’s default collation. An operation that attempts to change or override a view’s default collation will fail with an error.
- If creating a view from another view, you cannot specify a collation that differs from the source view’s collation.
- If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.
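A minimal mongo shell sketch of the first two points, assuming a source collection `myColl` and a hypothetical view name `frenchCategories`: the view is created with an explicit default collation, and a later query on the view relies on that default rather than passing its own.

```javascript
// Sketch for the mongo shell; call as viewCollationExample(db).
// "frenchCategories" and "myColl" are placeholder names.
function viewCollationExample(db) {
  // Empty pipeline: the view exposes myColl unchanged, with "fr" as its default collation.
  db.createView("frenchCategories", "myColl", [], { collation: { locale: "fr" } });

  // No .collation() here: string comparisons use the view's default collation.
  return db.frenchCategories.find({ category: "cafe" });
}
```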
Collation and Index Use
To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.
For example, the collection myColl has an index on a string field category with the collation locale "fr".
- db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )
The following query operation, which specifies the same collation as the index, can use the index:
- db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )
However, the following query operation, which by default uses the “simple” binary collator, cannot use the index:
- db.myColl.find( { category: "cafe" } )
For a compound index where the index prefix keys are not strings, arrays, or embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.
For example, the collection myColl has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:
- db.myColl.createIndex(
- { score: 1, price: 1, category: 1 },
- { collation: { locale: "fr" } } )
The following operations, which use "simple" binary collation for string comparisons, can use the index:
- db.myColl.find( { score: 5 } ).sort( { price: 1 } )
- db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )
The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:
- db.myColl.find( { score: 5, category: "cafe" } )
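One way to check whether a given query and collation combination actually uses an index is to inspect the output of `explain()`. The helper below is a hypothetical sketch, not part of the shell: it assumes the plan is a tree of stage documents linked through `inputStage` or `inputStages`, with index scans reported as stage `"IXSCAN"`. The exact shape of explain output can vary by server version, so treat the field names as assumptions to verify against your deployment.

```javascript
// Hypothetical helper: returns true if any stage in a plan tree is an index scan.
// Assumes stages look like { stage: "...", inputStage: {...} } or { inputStages: [...] }.
function planUsesIndexScan(stage) {
  if (!stage || typeof stage !== "object") return false;
  if (stage.stage === "IXSCAN") return true;
  const children = [];
  if (stage.inputStage) children.push(stage.inputStage);
  if (Array.isArray(stage.inputStages)) children.push(...stage.inputStages);
  return children.some(planUsesIndexScan);
}

// Assumed usage in the mongo shell:
// const plan = db.myColl.find({ category: "cafe" }).collation({ locale: "fr" }).explain();
// planUsesIndexScan(plan.queryPlanner.winningPlan);
```

A plan that reports an index scan only shows that some index was used; the index bounds in the same output indicate which predicates the index actually covered.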
Collation and Unsupported Index Types
The following indexes only support simple binary comparison and do not support collation:
- text indexes,
- 2d indexes, and
- geoHaystack indexes.
Tip
To create a text, a 2d, or a geoHaystack index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.
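For instance, assuming a collection `myColl` that was created with a non-simple default collation and a hypothetical string field `description`, a text index could be created along these lines (the wrapper function is only there so the snippet can be called with the shell's `db` object):

```javascript
// Sketch for the mongo shell; call as createTextIndexWithSimpleCollation(db).
// "myColl" and "description" are placeholder names.
function createTextIndexWithSimpleCollation(db) {
  return db.myColl.createIndex(
    { description: "text" },
    // Override the collection's default collation for this index.
    { collation: { locale: "simple" } }
  );
}
```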