Sparse vector query
The sparse vector query executes a query consisting of sparse vectors, such as those built by a learned sparse retrieval model. This can be achieved with one of two strategies:
- Using a natural language processing model to convert query text into a list of token-weight pairs
- Sending in precalculated token-weight pairs as query vectors
These token-weight pairs are then used in a query against a sparse vector field. With the first strategy, the query vector is calculated at query time using the same inference model that was used to create the stored tokens. When querying, the query tokens are ORed together with their respective weights, which means scoring is effectively a dot product between the stored dimensions and the query dimensions.
For example, a stored vector {"feature_0": 0.12, "feature_1": 1.2, "feature_2": 3.0} with query vector {"feature_0": 2.5, "feature_2": 0.2} would score the document _score = 0.12*2.5 + 3.0*0.2 = 0.9.
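The arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch of the dot product over shared dimensions, not how Elasticsearch computes scores internally; the function name `sparse_dot_product` is made up for this example.

```python
def sparse_dot_product(stored: dict, query: dict) -> float:
    """Sum weight products over the dimensions present in both vectors."""
    return sum(
        weight * stored[token]
        for token, weight in query.items()
        if token in stored  # dimensions missing from the document contribute nothing
    )


stored_vector = {"feature_0": 0.12, "feature_1": 1.2, "feature_2": 3.0}
query_vector = {"feature_0": 2.5, "feature_2": 0.2}

score = sparse_dot_product(stored_vector, query_vector)
print(round(score, 6))  # feature_1 is ignored because the query does not contain it
```

Only `feature_0` and `feature_2` appear in both vectors, so only those two products are summed.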
Example request using a natural language processing model
resp = client.search(
    query={
        "sparse_vector": {
            "field": "ml.tokens",
            "inference_id": "the inference ID to produce the token weights",
            "query": "the query string"
        }
    },
)
print(resp)

const response = await client.search({
  query: {
    sparse_vector: {
      field: "ml.tokens",
      inference_id: "the inference ID to produce the token weights",
      query: "the query string",
    },
  },
});
console.log(response);

GET _search
{
  "query": {
    "sparse_vector": {
      "field": "ml.tokens",
      "inference_id": "the inference ID to produce the token weights",
      "query": "the query string"
    }
  }
}
Example request using precomputed vectors
resp = client.search(
    query={
        "sparse_vector": {
            "field": "ml.tokens",
            "query_vector": {
                "token1": 0.5,
                "token2": 0.3,
                "token3": 0.2
            }
        }
    },
)
print(resp)

const response = await client.search({
  query: {
    sparse_vector: {
      field: "ml.tokens",
      query_vector: {
        token1: 0.5,
        token2: 0.3,
        token3: 0.2,
      },
    },
  },
});
console.log(response);

GET _search
{
  "query": {
    "sparse_vector": {
      "field": "ml.tokens",
      "query_vector": { "token1": 0.5, "token2": 0.3, "token3": 0.2 }
    }
  }
}
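The two request forms differ only in whether they carry `inference_id` plus `query`, or `query_vector`. The following is a minimal sketch of a helper that builds either form and rejects the combinations that the parameter reference on this page rules out. The helper name `build_sparse_vector_query` is hypothetical and not part of any Elasticsearch client; the checks are done client-side purely for illustration.

```python
def build_sparse_vector_query(field, inference_id=None, query=None, query_vector=None):
    """Build a sparse_vector query clause, enforcing the documented parameter rules."""
    if inference_id is not None and query_vector is not None:
        raise ValueError("only one of inference_id and query_vector is allowed")
    if inference_id is not None and query is None:
        raise ValueError("query must be specified when inference_id is specified")
    if query_vector is not None and query is not None:
        raise ValueError("query must not be specified when query_vector is specified")

    body = {"field": field}
    # Copy over only the parameters that were actually provided.
    for key, value in (
        ("inference_id", inference_id),
        ("query", query),
        ("query_vector", query_vector),
    ):
        if value is not None:
            body[key] = value
    return {"sparse_vector": body}


text_query = build_sparse_vector_query(
    "ml.tokens", inference_id="my-elser-model", query="the query string"
)
vector_query = build_sparse_vector_query(
    "ml.tokens", query_vector={"token1": 0.5, "token2": 0.3, "token3": 0.2}
)
print(text_query)
print(vector_query)
```

Either result could then be passed as the `query` argument of `client.search(...)`, as in the examples above.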
Top level parameters for sparse_vector
field
(Required, string) The name of the field that contains the token-weight pairs to be searched against.
inference_id
(Optional, string) The inference ID to use to convert the query text into token-weight pairs. It must be the same inference ID that was used to create the tokens from the input text. Only one of inference_id and query_vector is allowed. If inference_id is specified, query must also be specified.
query
(Optional, string) The query text you want to use for search. If inference_id is specified, query must also be specified. If query_vector is specified, query must not be specified.
query_vector
(Optional, dictionary) A dictionary of token-weight pairs representing the precomputed query vector to search. Searching using this query vector will bypass additional inference. Only one of inference_id and query_vector is allowed.
prune
(Optional, boolean) [preview] This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If prune is true but pruning_config is not specified, pruning will occur but default values will be used. Default: false.
pruning_config
(Optional, object) [preview] Optional pruning configuration. If enabled, this will omit non-significant tokens from the query in order to improve query performance. This is only used if prune is set to true. If prune is set to true but pruning_config is not specified, default values will be used.
Parameters for pruning_config are:
tokens_freq_ratio_threshold
(Optional, integer) [preview] Tokens whose frequency is more than tokens_freq_ratio_threshold times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: 5.
tokens_weight_threshold
(Optional, float) [preview] Tokens whose weight is less than tokens_weight_threshold are considered insignificant and pruned. This value must be between 0 and 1. Default: 0.4.
only_score_pruned_tokens
(Optional, boolean) [preview] If true, only the pruned tokens are used for scoring and the non-pruned tokens are discarded. It is strongly recommended to set this to false for the main query, but it can be set to true for a rescore query to get more relevant results. Default: false.
The default values for tokens_freq_ratio_threshold and tokens_weight_threshold were chosen because they gave the best results in tests using ELSERv2.
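To make the two thresholds concrete, the sketch below evaluates both conditions for each query token exactly as they are worded in the parameter descriptions. It is an illustration only: the token frequencies and the average frequency are hypothetical inputs (Elasticsearch gathers the real statistics per shard), and the sketch deliberately reports the two conditions separately rather than assuming how Elasticsearch combines them into a final pruning decision.

```python
def classify_tokens(
    query_vector,
    token_freqs,
    avg_freq,
    tokens_freq_ratio_threshold=5,
    tokens_weight_threshold=0.4,
):
    """Flag each query token against the two pruning thresholds."""
    flags = {}
    for token, weight in query_vector.items():
        flags[token] = {
            # Frequency more than N times the average frequency in the field.
            "frequency_outlier": token_freqs.get(token, 0)
            > tokens_freq_ratio_threshold * avg_freq,
            # Weight below the weight threshold.
            "low_weight": weight < tokens_weight_threshold,
        }
    return flags


# Hypothetical query weights and field statistics, for illustration only.
query_vector = {"the": 0.1, "weather": 1.8, "jamaica": 2.3}
token_freqs = {"the": 900, "weather": 40, "jamaica": 12}

flags = classify_tokens(query_vector, token_freqs, avg_freq=100)
for token, result in flags.items():
    print(token, result)
```

With these made-up numbers, only `the` exceeds the frequency ratio (900 > 5 × 100) and falls below the weight threshold (0.1 < 0.4).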
Example ELSER query
The following is an example of the sparse_vector query that references the ELSER model to perform semantic search. For a more detailed description of how to perform semantic search by using ELSER and the sparse_vector query, refer to this tutorial.
resp = client.search(
    index="my-index",
    query={
        "sparse_vector": {
            "field": "ml.tokens",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?"
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-index",
  query: {
    sparse_vector: {
      field: "ml.tokens",
      inference_id: "my-elser-model",
      query: "How is the weather in Jamaica?",
    },
  },
});
console.log(response);

GET my-index/_search
{
  "query": {
    "sparse_vector": {
      "field": "ml.tokens",
      "inference_id": "my-elser-model",
      "query": "How is the weather in Jamaica?"
    }
  }
}
Multiple sparse_vector queries can be combined with each other or with other query types. This can be achieved by wrapping them in boolean query clauses and using linear boosting:
resp = client.search(
    index="my-index",
    query={
        "bool": {
            "should": [
                {
                    "sparse_vector": {
                        "field": "ml.inference.title_expanded.predicted_value",
                        "inference_id": "my-elser-model",
                        "query": "How is the weather in Jamaica?",
                        "boost": 1
                    }
                },
                {
                    "sparse_vector": {
                        "field": "ml.inference.description_expanded.predicted_value",
                        "inference_id": "my-elser-model",
                        "query": "How is the weather in Jamaica?",
                        "boost": 1
                    }
                },
                {
                    "multi_match": {
                        "query": "How is the weather in Jamaica?",
                        "fields": [
                            "title",
                            "description"
                        ],
                        "boost": 4
                    }
                }
            ]
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-index",
  query: {
    bool: {
      should: [
        {
          sparse_vector: {
            field: "ml.inference.title_expanded.predicted_value",
            inference_id: "my-elser-model",
            query: "How is the weather in Jamaica?",
            boost: 1,
          },
        },
        {
          sparse_vector: {
            field: "ml.inference.description_expanded.predicted_value",
            inference_id: "my-elser-model",
            query: "How is the weather in Jamaica?",
            boost: 1,
          },
        },
        {
          multi_match: {
            query: "How is the weather in Jamaica?",
            fields: ["title", "description"],
            boost: 4,
          },
        },
      ],
    },
  },
});
console.log(response);

GET my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "sparse_vector": {
            "field": "ml.inference.title_expanded.predicted_value",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?",
            "boost": 1
          }
        },
        {
          "sparse_vector": {
            "field": "ml.inference.description_expanded.predicted_value",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?",
            "boost": 1
          }
        },
        {
          "multi_match": {
            "query": "How is the weather in Jamaica?",
            "fields": [
              "title",
              "description"
            ],
            "boost": 4
          }
        }
      ]
    }
  }
}
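As a rough illustration of what linear boosting means for the request above: each matching should clause contributes its own score multiplied by its boost, and the contributions are added together. The per-clause scores below are invented numbers and the clause names are labels chosen for this sketch; real scores depend on the index contents.

```python
def combined_score(clause_scores, boosts):
    """Linear combination: sum of boost * score over the clauses that matched."""
    return sum(boosts[name] * score for name, score in clause_scores.items())


# Boosts taken from the request above; the keys are labels for this sketch only.
boosts = {"title_sparse": 1, "description_sparse": 1, "multi_match": 4}

# Hypothetical unboosted scores of one document for each clause it matched.
clause_scores = {"title_sparse": 2.0, "description_sparse": 1.5, "multi_match": 0.5}

total = combined_score(clause_scores, boosts)
print(total)  # 1*2.0 + 1*1.5 + 4*0.5
```

Raising the boost of one clause increases its share of the final score without changing the other clauses.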
This can also be achieved using reciprocal rank fusion (RRF), through an rrf retriever with multiple standard retrievers.
resp = client.search(
    index="my-index",
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "multi_match": {
                                "query": "How is the weather in Jamaica?",
                                "fields": [
                                    "title",
                                    "description"
                                ]
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "sparse_vector": {
                                "field": "ml.inference.title_expanded.predicted_value",
                                "inference_id": "my-elser-model",
                                "query": "How is the weather in Jamaica?",
                                "boost": 1
                            }
                        }
                    }
                },
                {
                    "standard": {
                        "query": {
                            "sparse_vector": {
                                "field": "ml.inference.description_expanded.predicted_value",
                                "inference_id": "my-elser-model",
                                "query": "How is the weather in Jamaica?",
                                "boost": 1
                            }
                        }
                    }
                }
            ],
            "window_size": 10,
            "rank_constant": 20
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-index",
  retriever: {
    rrf: {
      retrievers: [
        {
          standard: {
            query: {
              multi_match: {
                query: "How is the weather in Jamaica?",
                fields: ["title", "description"],
              },
            },
          },
        },
        {
          standard: {
            query: {
              sparse_vector: {
                field: "ml.inference.title_expanded.predicted_value",
                inference_id: "my-elser-model",
                query: "How is the weather in Jamaica?",
                boost: 1,
              },
            },
          },
        },
        {
          standard: {
            query: {
              sparse_vector: {
                field: "ml.inference.description_expanded.predicted_value",
                inference_id: "my-elser-model",
                query: "How is the weather in Jamaica?",
                boost: 1,
              },
            },
          },
        },
      ],
      window_size: 10,
      rank_constant: 20,
    },
  },
});
console.log(response);

GET my-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "multi_match": {
                "query": "How is the weather in Jamaica?",
                "fields": [
                  "title",
                  "description"
                ]
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "sparse_vector": {
                "field": "ml.inference.title_expanded.predicted_value",
                "inference_id": "my-elser-model",
                "query": "How is the weather in Jamaica?",
                "boost": 1
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "sparse_vector": {
                "field": "ml.inference.description_expanded.predicted_value",
                "inference_id": "my-elser-model",
                "query": "How is the weather in Jamaica?",
                "boost": 1
              }
            }
          }
        }
      ],
      "window_size": 10,
      "rank_constant": 20
    }
  }
}
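For intuition, reciprocal rank fusion ignores the raw scores and combines result lists by rank: each document receives 1 / (rank_constant + rank) from every list it appears in, and the contributions are summed. The following standalone sketch applies that formula to three made-up ranked lists of document IDs. It is not the Elasticsearch implementation, and the function name `rrf_fuse` is hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=20, window_size=10):
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Only the top window_size hits of each list take part; ranks start at 1.
        for rank, doc_id in enumerate(ranking[:window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results of the three retrievers, best hit first.
lexical_hits = ["a", "b", "c"]
title_hits = ["b", "a", "d"]
description_hits = ["b", "c", "a"]

fused = rrf_fuse([lexical_hits, title_hits, description_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Document `b` ends up first because it ranks at or near the top of all three lists, even though no raw scores are compared across retrievers.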
Example ELSER query with pruning configuration and rescore
The following is an extension to the above example that adds a [preview] pruning configuration to the sparse_vector query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed, because it depends on the composition of each shard. Therefore, if you are running sparse_vector with a pruning_config on a multi-shard index, we strongly recommend adding a rescore function (see Rescore filtered search results) with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
resp = client.search(
    index="my-index",
    query={
        "sparse_vector": {
            "field": "ml.tokens",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?",
            "prune": True,
            "pruning_config": {
                "tokens_freq_ratio_threshold": 5,
                "tokens_weight_threshold": 0.4,
                "only_score_pruned_tokens": False
            }
        }
    },
    rescore={
        "window_size": 100,
        "query": {
            "rescore_query": {
                "sparse_vector": {
                    "field": "ml.tokens",
                    "inference_id": "my-elser-model",
                    "query": "How is the weather in Jamaica?",
                    "prune": True,
                    "pruning_config": {
                        "tokens_freq_ratio_threshold": 5,
                        "tokens_weight_threshold": 0.4,
                        "only_score_pruned_tokens": True
                    }
                }
            }
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-index",
  query: {
    sparse_vector: {
      field: "ml.tokens",
      inference_id: "my-elser-model",
      query: "How is the weather in Jamaica?",
      prune: true,
      pruning_config: {
        tokens_freq_ratio_threshold: 5,
        tokens_weight_threshold: 0.4,
        only_score_pruned_tokens: false,
      },
    },
  },
  rescore: {
    window_size: 100,
    query: {
      rescore_query: {
        sparse_vector: {
          field: "ml.tokens",
          inference_id: "my-elser-model",
          query: "How is the weather in Jamaica?",
          prune: true,
          pruning_config: {
            tokens_freq_ratio_threshold: 5,
            tokens_weight_threshold: 0.4,
            only_score_pruned_tokens: true,
          },
        },
      },
    },
  },
});
console.log(response);

GET my-index/_search
{
  "query": {
    "sparse_vector": {
      "field": "ml.tokens",
      "inference_id": "my-elser-model",
      "query": "How is the weather in Jamaica?",
      "prune": true,
      "pruning_config": {
        "tokens_freq_ratio_threshold": 5,
        "tokens_weight_threshold": 0.4,
        "only_score_pruned_tokens": false
      }
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "sparse_vector": {
          "field": "ml.tokens",
          "inference_id": "my-elser-model",
          "query": "How is the weather in Jamaica?",
          "prune": true,
          "pruning_config": {
            "tokens_freq_ratio_threshold": 5,
            "tokens_weight_threshold": 0.4,
            "only_score_pruned_tokens": true
          }
        }
      }
    }
  }
}
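The main query and the rescore query in this request are identical except for only_score_pruned_tokens, so the two clauses are easy to let drift apart when editing by hand. One possible way to keep them in sync is to generate both from a single set of thresholds, as in the sketch below. The helper name `pruned_search_body` is hypothetical and not part of the Elasticsearch client.

```python
def pruned_search_body(field, inference_id, text, thresholds, window_size=100):
    """Build matching query and rescore sections from one set of pruning thresholds."""

    def clause(only_score_pruned_tokens):
        # Same thresholds in both clauses; only this flag differs.
        config = dict(thresholds, only_score_pruned_tokens=only_score_pruned_tokens)
        return {
            "sparse_vector": {
                "field": field,
                "inference_id": inference_id,
                "query": text,
                "prune": True,
                "pruning_config": config,
            }
        }

    return {
        # Main query scores the tokens that survive pruning.
        "query": clause(False),
        # Rescore query scores only the tokens that were pruned.
        "rescore": {
            "window_size": window_size,
            "query": {"rescore_query": clause(True)},
        },
    }


body = pruned_search_body(
    field="ml.tokens",
    inference_id="my-elser-model",
    text="How is the weather in Jamaica?",
    thresholds={"tokens_freq_ratio_threshold": 5, "tokens_weight_threshold": 0.4},
)
print(body)
```

Assuming `client` is an Elasticsearch Python client instance, the result could be sent with `client.search(index="my-index", **body)`, which would produce the same request as the example above.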
When performing cross-cluster search, inference is performed on the local cluster.