Script score query

Uses a script to provide a custom score for returned documents.

The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.

Example request

The following script_score query assigns each returned document a score equal to the my-int field value divided by 10.

Python:

    resp = client.search(
        query={
            "script_score": {
                "query": {
                    "match": {
                        "message": "elasticsearch"
                    }
                },
                "script": {
                    "source": "doc['my-int'].value / 10 "
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      body: {
        query: {
          script_score: {
            query: {
              match: {
                message: 'elasticsearch'
              }
            },
            script: {
              source: "doc['my-int'].value / 10 "
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      query: {
        script_score: {
          query: {
            match: {
              message: "elasticsearch",
            },
          },
          script: {
            source: "doc['my-int'].value / 10 ",
          },
        },
      },
    });
    console.log(response);

Console:

    GET /_search
    {
      "query": {
        "script_score": {
          "query": {
            "match": { "message": "elasticsearch" }
          },
          "script": {
            "source": "doc['my-int'].value / 10 "
          }
        }
      }
    }

Top-level parameters for script_score

query

(Required, query object) Query used to return documents.

script

(Required, script object) Script used to compute the score of documents returned by the query.

Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores to be positive or 0.

min_score

(Optional, float) Documents with a score lower than this floating point number are excluded from the search results.

boost

(Optional, float) Scores produced by the script are multiplied by boost to produce the final document scores. Defaults to 1.0.

Notes

Use relevance scores in a script

Within a script, you can access the _score variable which represents the current relevance score of a document.

Use term statistics in a script

Within a script, you can access the _termStats variable which provides statistical information about the terms used in the child query of the script_score query.

Predefined functions

You can use any of the available Painless functions in your script. You can also use the following predefined functions to customize scoring.

We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies from Elasticsearch’s internal mechanisms.

Saturation

saturation(value,k) = value/(k + value)

    "script" : {
        "source" : "saturation(doc['my-int'].value, 1)"
    }

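To see how k shapes the curve, the formula can be evaluated outside Elasticsearch. The following is a minimal Python sketch of the saturation formula above, not Elasticsearch's own implementation; the function name only mirrors the Painless one for readability.

```python
def saturation(value: float, k: float) -> float:
    """Compute value / (k + value).

    The result is 0 when value is 0, exactly 0.5 when value equals k,
    and approaches (but never reaches) 1 as value grows.
    """
    return value / (k + value)


# Print how the score saturates as the field value grows, with k = 1.
for v in (0, 1, 9, 99):
    print(v, saturation(v, 1))
```

In other words, k is the field value at which the score reaches 0.5, so a k close to a typical value of the field spreads scores most evenly.
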
Sigmoid

sigmoid(value, k, a) = value^a/ (k^a + value^a)

    "script" : {
        "source" : "sigmoid(doc['my-int'].value, 2, 1)"
    }

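The role of the exponent a is easiest to see numerically. This Python sketch evaluates the sigmoid formula above for a few exponents; it is an illustration of the formula only, and the function name simply mirrors the Painless one.

```python
def sigmoid(value: float, k: float, a: float) -> float:
    """Compute value^a / (k^a + value^a).

    Like saturation, the result is 0.5 when value equals k; the exponent a
    controls how steeply the curve rises around that midpoint.
    """
    return value ** a / (k ** a + value ** a)


# Compare the curve around the midpoint k = 2 for increasing exponents.
for a in (1, 2, 4):
    print(a, [round(sigmoid(v, 2, a), 3) for v in (1, 2, 4)])
```

With a = 1 the formula reduces to saturation(value, k); larger values of a make the transition around k sharper.
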
Random score function

The randomScore function generates scores that are uniformly distributed from 0 up to, but not including, 1.

The randomScore function has the following syntax: randomScore(<seed>, <fieldName>). It takes a required seed parameter, an integer, and an optional fieldName parameter, a string.

    "script" : {
        "source" : "randomScore(100, '_seq_no')"
    }

If the fieldName parameter is omitted, the internal Lucene document IDs are used as the source of randomness. This is very efficient, but unfortunately not reproducible, since documents might be renumbered by merges.

    "script" : {
        "source" : "randomScore(100)"
    }

Note that documents that are within the same shard and have the same value for the field get the same score, so it is usually desirable to use a field that has unique values for all documents across a shard. A good default choice is the _seq_no field. Its only drawback is that scores change if the document is updated, since update operations also update the value of the _seq_no field.
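To illustrate why equal field values collide, here is a hypothetical sketch of a seeded, hash-based score in Python. It uses SHA-256 from the standard library purely for illustration; Elasticsearch's internal hashing is different, so these numbers will not match real scores. Mixing a shard_id into the hash is also an assumption of the sketch, reflecting that identical scores are only guaranteed within a shard.

```python
import hashlib


def random_score(seed: int, field_value, shard_id: int = 0) -> float:
    """Hypothetical seeded score in [0, 1): identical inputs always give the same score."""
    key = f"{seed}:{shard_id}:{field_value}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the result is an exactly representable float in [0, 1).
    bits = int.from_bytes(digest[:8], "big") >> 11
    return bits / float(1 << 53)


# Scores for a few hypothetical _seq_no values with seed 100.
scores = {seq_no: random_score(100, seq_no) for seq_no in range(5)}
print(scores)
```

Because the score depends only on its inputs, two documents sharing a field value in one shard score identically. A unique field such as _seq_no avoids that, at the cost that an update, which changes _seq_no, also changes the score.
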

Decay functions for numeric fields

You can read more about decay functions in the function_score query documentation.

  • double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
  • double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
  • double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)

    "script" : {
        "source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
        "params": {
            "origin": 20,
            "scale": 10,
            "decay" : 0.5,
            "offset" : 0
        }
    }

Using params allows the script to be compiled only once, even if the parameter values change.
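To reason about parameter choices, the numeric decay curves can be computed in plain Python. This sketch assumes the Painless functions follow the same formulas as the function_score decay functions: values within offset of origin score 1, and a value at distance offset + scale from origin scores exactly decay. The snake_case names are Python stand-ins, not Elasticsearch APIs.

```python
import math


def _distance(origin: float, offset: float, doc_value: float) -> float:
    # Distance beyond the offset; values within `offset` of `origin` get the full score.
    return max(0.0, abs(doc_value - origin) - offset)


def decay_numeric_linear(origin, scale, offset, decay, doc_value):
    # Straight line that reaches `decay` at distance `scale` and 0 at scale / (1 - decay).
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(origin, offset, doc_value)) / s)


def decay_numeric_exp(origin, scale, offset, decay, doc_value):
    # Exponential decay whose rate is chosen so the score equals `decay` at `scale`.
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(origin, offset, doc_value))


def decay_numeric_gauss(origin, scale, offset, decay, doc_value):
    # Gaussian curve whose variance is chosen so the score equals `decay` at `scale`.
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(origin, offset, doc_value) ** 2 / (2.0 * sigma_sq))


# Same parameters as the script above: origin 20, scale 10, offset 0, decay 0.5.
for value in (20, 25, 30, 40):
    print(value, [round(f(20, 10, 0, 0.5, value), 3)
                  for f in (decay_numeric_linear, decay_numeric_exp, decay_numeric_gauss)])
```

All three curves agree at origin and at origin ± (offset + scale); they differ in how quickly scores fall off beyond that point.
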

Decay functions for geo fields
  • double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
  • double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
  • double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)

    "script" : {
        "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)",
        "params": {
            "origin": "40, -70.12",
            "scale": "200km",
            "offset": "0km",
            "decay" : 0.2
        }
    }

Decay functions for date fields
  • double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
  • double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
  • double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)

    "script" : {
        "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)",
        "params": {
            "origin": "2008-01-01T01:00:00Z",
            "scale": "1h",
            "offset" : "0",
            "decay" : 0.5
        }
    }

Decay functions on dates are limited to dates in the default format and the default time zone. Also, calculations with now are not supported.

Functions for vector fields

Functions for vector fields are accessible through the script_score query.

Allow expensive queries

Script score queries will not be executed if search.allow_expensive_queries is set to false.

Faster alternatives

The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:

  • If you want to boost documents on some static fields, use the rank_feature query.
  • If you want to boost documents closer to a date or geographic point, use the distance_feature query.

Transition from the function score query

We recommend using the script_score query instead of the function_score query, because the script_score query is simpler.

You can implement the following functions of the function_score query using the script_score query:

script_score

You can copy the script you used in the script_score function of the function_score query into the script_score query without changes.

weight

The weight function can be implemented in the script_score query with the following script:

    "script" : {
        "source" : "params.weight * _score",
        "params": {
            "weight": 2
        }
    }

random_score

Use the randomScore function as described in Random score function.

field_value_factor

The field_value_factor function can be implemented with a script:

    "script" : {
        "source" : "Math.log10(doc['field'].value * params.factor)",
        "params" : {
            "factor" : 5
        }
    }

To check whether a document is missing a value, use doc['field'].size() == 0. For example, the following script uses a value of 1 if a document doesn’t have a value for the field named field:

    "script" : {
        "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)",
        "params" : {
            "factor" : 5
        }
    }

This table lists how field_value_factor modifiers can be implemented through a script:

    Modifier      Implementation in script_score
    none          - (use doc['f'].value as is)
    log           Math.log10(doc['f'].value)
    log1p         Math.log10(doc['f'].value + 1)
    log2p         Math.log10(doc['f'].value + 2)
    ln            Math.log(doc['f'].value)
    ln1p          Math.log(doc['f'].value + 1)
    ln2p          Math.log(doc['f'].value + 2)
    square        Math.pow(doc['f'].value, 2)
    sqrt          Math.sqrt(doc['f'].value)
    reciprocal    1.0 / doc['f'].value
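For checking expected scores offline, the modifier table, together with the factor and missing-value handling shown above, can be written as a Python lookup table. This is a sketch: field_value_factor and MODIFIERS are hypothetical names, and the factor is applied before the modifier, matching the Math.log10(doc['field'].value * params.factor) script above.

```python
import math

# Each entry mirrors one row of the table: modifier name -> transformation of f.
MODIFIERS = {
    "none": lambda f: f,
    "log": math.log10,
    "log1p": lambda f: math.log10(f + 1),
    "log2p": lambda f: math.log10(f + 2),
    "ln": math.log,
    "ln1p": lambda f: math.log(f + 1),
    "ln2p": lambda f: math.log(f + 2),
    "square": lambda f: math.pow(f, 2),
    "sqrt": math.sqrt,
    "reciprocal": lambda f: 1.0 / f,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Apply `modifier` to value * factor, substituting `missing` when value is None."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no missing default was given")
        value = missing
    return MODIFIERS[modifier](value * factor)


# Equivalent of the missing-value script above: value 1 when absent, factor 5, log10.
print(field_value_factor(None, factor=5, modifier="log", missing=1))
```
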

decay functions

The script_score query has equivalent decay functions that can be used in scripts.

Functions for vector fields

When vector functions are calculated, all matched documents are scanned linearly, so expect query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with the query parameter.

This is the list of available vector functions and vector access methods:

  1. cosineSimilarity – calculates cosine similarity
  2. dotProduct – calculates dot product
  3. l1norm – calculates L1 distance
  4. hamming – calculates Hamming distance
  5. l2norm – calculates L2 distance
  6. doc[].vectorValue – returns a vector’s value as an array of floats
  7. doc[].magnitude – returns a vector’s magnitude

The cosineSimilarity function is not supported for bit vectors.

The recommended way to access dense vectors is through the cosineSimilarity, dotProduct, l1norm or l2norm functions. Note, however, that you should call these functions only once per script. For example, don’t use these functions in a loop to calculate the similarity between a document vector and multiple other vectors. If you need that functionality, reimplement these functions yourself by accessing vector values directly.
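When you need to verify expected scores offline, or reimplement a function as suggested above, plain-Python versions of the float-vector formulas can help. These are illustrative sketches, not Elasticsearch's implementation; the snake_case names are Python equivalents rather than Painless functions, and the dimension check mirrors the documented error for mismatched dimensions.

```python
import math
from typing import Sequence


def _check_dims(query: Sequence[float], doc: Sequence[float]) -> None:
    # Vectors with different numbers of dimensions cannot be compared.
    if len(query) != len(doc):
        raise ValueError(f"dims mismatch: {len(query)} != {len(doc)}")


def dot_product(query, doc):
    _check_dims(query, doc)
    return sum(q * d for q, d in zip(query, doc))


def magnitude(vector):
    return math.sqrt(sum(x * x for x in vector))


def cosine_similarity(query, doc):
    # Ranges from -1 to 1, which is why the example script adds 1.0.
    return dot_product(query, doc) / (magnitude(query) * magnitude(doc))


def l1norm(query, doc):
    _check_dims(query, doc)
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2norm(query, doc):
    _check_dims(query, doc)
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


query_vector = [4, 3.4, -0.2]
doc_vector = [0.5, 10, 6]  # document 1 from the example index below
print(dot_product(query_vector, doc_vector), cosine_similarity(query_vector, doc_vector))
```
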

Let’s create an index with a dense_vector mapping and index a couple of documents into it.

Python:

    resp = client.indices.create(
        index="my-index-000001",
        mappings={
            "properties": {
                "my_dense_vector": {
                    "type": "dense_vector",
                    "index": False,
                    "dims": 3
                },
                "my_byte_dense_vector": {
                    "type": "dense_vector",
                    "index": False,
                    "dims": 3,
                    "element_type": "byte"
                },
                "status": {
                    "type": "keyword"
                }
            }
        },
    )
    print(resp)

    resp1 = client.index(
        index="my-index-000001",
        id="1",
        document={
            "my_dense_vector": [0.5, 10, 6],
            "my_byte_dense_vector": [0, 10, 6],
            "status": "published"
        },
    )
    print(resp1)

    resp2 = client.index(
        index="my-index-000001",
        id="2",
        document={
            "my_dense_vector": [-0.5, 10, 10],
            "my_byte_dense_vector": [0, 10, 10],
            "status": "published"
        },
    )
    print(resp2)

    resp3 = client.indices.refresh(
        index="my-index-000001",
    )
    print(resp3)

JavaScript:

    const response = await client.indices.create({
      index: "my-index-000001",
      mappings: {
        properties: {
          my_dense_vector: {
            type: "dense_vector",
            index: false,
            dims: 3,
          },
          my_byte_dense_vector: {
            type: "dense_vector",
            index: false,
            dims: 3,
            element_type: "byte",
          },
          status: {
            type: "keyword",
          },
        },
      },
    });
    console.log(response);

    const response1 = await client.index({
      index: "my-index-000001",
      id: 1,
      document: {
        my_dense_vector: [0.5, 10, 6],
        my_byte_dense_vector: [0, 10, 6],
        status: "published",
      },
    });
    console.log(response1);

    const response2 = await client.index({
      index: "my-index-000001",
      id: 2,
      document: {
        my_dense_vector: [-0.5, 10, 10],
        my_byte_dense_vector: [0, 10, 10],
        status: "published",
      },
    });
    console.log(response2);

    const response3 = await client.indices.refresh({
      index: "my-index-000001",
    });
    console.log(response3);

Console:

    PUT my-index-000001
    {
      "mappings": {
        "properties": {
          "my_dense_vector": {
            "type": "dense_vector",
            "index": false,
            "dims": 3
          },
          "my_byte_dense_vector": {
            "type": "dense_vector",
            "index": false,
            "dims": 3,
            "element_type": "byte"
          },
          "status" : {
            "type" : "keyword"
          }
        }
      }
    }

    PUT my-index-000001/_doc/1
    {
      "my_dense_vector": [0.5, 10, 6],
      "my_byte_dense_vector": [0, 10, 6],
      "status" : "published"
    }

    PUT my-index-000001/_doc/2
    {
      "my_dense_vector": [-0.5, 10, 10],
      "my_byte_dense_vector": [0, 10, 10],
      "status" : "published"
    }

    POST my-index-000001/_refresh

Cosine similarity

The cosineSimilarity function calculates the measure of cosine similarity between a given query vector and document vectors.

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
                    "params": {
                        "query_vector": [4, 3.4, -0.2]
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      body: {
        query: {
          script_score: {
            query: {
              bool: {
                filter: {
                  term: {
                    status: 'published'
                  }
                }
              }
            },
            script: {
              source: "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
              params: {
                query_vector: [4, 3.4, -0.2]
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source:
              "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
            params: {
              query_vector: [4, 3.4, -0.2],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
            "params": {
              "query_vector": [4, 3.4, -0.2]
            }
          }
        }
      }
    }

To restrict the number of documents on which script score calculation is applied, provide a filter.

The script adds 1.0 to the cosine similarity to prevent the score from being negative.

To take advantage of the script optimizations, provide a query vector as a script parameter.

If a document’s dense vector field has a number of dimensions different from the query’s vector, an error will be thrown.

Dot product

The dotProduct function calculates the measure of dot product between a given query vector and document vectors.

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "\n double value = dotProduct(params.query_vector, 'my_dense_vector');\n return sigmoid(1, Math.E, -value); \n ",
                    "params": {
                        "query_vector": [4, 3.4, -0.2]
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      body: {
        query: {
          script_score: {
            query: {
              bool: {
                filter: {
                  term: {
                    status: 'published'
                  }
                }
              }
            },
            script: {
              source: "\n double value = dotProduct(params.query_vector, 'my_dense_vector');\n return sigmoid(1, Math.E, -value); \n ",
              params: {
                query_vector: [4, 3.4, -0.2]
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source:
              "\n double value = dotProduct(params.query_vector, 'my_dense_vector');\n return sigmoid(1, Math.E, -value); \n ",
            params: {
              query_vector: [4, 3.4, -0.2],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": """
              double value = dotProduct(params.query_vector, 'my_dense_vector');
              return sigmoid(1, Math.E, -value);
            """,
            "params": {
              "query_vector": [4, 3.4, -0.2]
            }
          }
        }
      }
    }

Using the standard sigmoid function prevents scores from being negative.
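This can be checked numerically: with value = 1 and k = e, sigmoid(1, Math.E, -v) simplifies to 1 / (e^-v + 1), the standard logistic function, which is always between 0 and 1. The following Python sketch reuses the sigmoid formula from Predefined functions; the names are illustrative.

```python
import math


def sigmoid(value: float, k: float, a: float) -> float:
    # value^a / (k^a + value^a), the predefined sigmoid formula
    return value ** a / (k ** a + value ** a)


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# 34.8 is the dot product of the query vector [4, 3.4, -0.2] and document vector [0.5, 10, 6].
for dot in (-5.0, 0.0, 34.8):
    print(dot, sigmoid(1, math.e, -dot), logistic(dot))
```

Negative dot products map to scores below 0.5 and positive ones to scores above 0.5, so the ranking order is preserved while the score stays non-negative.
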

L1 distance (Manhattan distance)

The l1norm function calculates L1 distance (Manhattan distance) between a given query vector and document vectors.

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
                    "params": {
                        "queryVector": [4, 3.4, -0.2]
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      body: {
        query: {
          script_score: {
            query: {
              bool: {
                filter: {
                  term: {
                    status: 'published'
                  }
                }
              }
            },
            script: {
              source: "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
              params: {
                "queryVector": [4, 3.4, -0.2]
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source: "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
            params: {
              queryVector: [4, 3.4, -0.2],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
            "params": {
              "queryVector": [4, 3.4, -0.2]
            }
          }
        }
      }
    }

Unlike cosineSimilarity, which represents similarity, l1norm and l2norm (shown below) represent distances or differences: the more similar two vectors are, the lower the scores produced by l1norm and l2norm. Because more similar vectors should score higher, the scripts invert the output of l1norm and l2norm. To avoid division by 0 when a document vector matches the query exactly, they also add 1 to the denominator.
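The effect of the 1 / (1 + distance) transformation on the two example documents can be sketched in Python. The distance helpers are plain reimplementations of the L1 and L2 formulas for illustration, not Elasticsearch's code.

```python
import math


def l1norm(query, doc):
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2norm(query, doc):
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def distance_to_score(distance: float) -> float:
    # Identical vectors (distance 0) score 1.0; larger distances give
    # smaller, but always positive, scores.
    return 1.0 / (1.0 + distance)


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}
for doc_id, vector in docs.items():
    print(doc_id,
          distance_to_score(l1norm(query_vector, vector)),
          distance_to_score(l2norm(query_vector, vector)))
```

Document 1 is closer to the query under both metrics, so it receives the higher score in both cases.
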

Hamming distance

The hamming function calculates Hamming distance between a given query vector and document vectors. It is only available for byte and bit vectors.

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "(24 - hamming(params.queryVector, 'my_byte_dense_vector')) / 24",
                    "params": {
                        "queryVector": [4, 3, 0]
                    }
                }
            }
        },
    )
    print(resp)

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source:
              "(24 - hamming(params.queryVector, 'my_byte_dense_vector')) / 24",
            params: {
              queryVector: [4, 3, 0],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": "(24 - hamming(params.queryVector, 'my_byte_dense_vector')) / 24",
            "params": {
              "queryVector": [4, 3, 0]
            }
          }
        }
      }
    }

Calculate the Hamming distance and normalize it by the total number of bits, here 3 byte dimensions * 8 bits = 24, to get a score between 0 and 1.
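The bit counting and the normalization by 24 can be reproduced in Python to predict scores for the example byte vectors. This is an illustrative sketch of the Hamming formula, not Elasticsearch's implementation; the helper names are hypothetical.

```python
def hamming(query, doc) -> int:
    """Count the differing bits between two equal-length byte vectors."""
    if len(query) != len(doc):
        raise ValueError("vectors must have the same number of dimensions")
    # & 0xFF keeps the low 8 bits, so signed bytes such as -15 are handled too.
    return sum(bin((q ^ d) & 0xFF).count("1") for q, d in zip(query, doc))


def normalized_hamming_score(query, doc) -> float:
    total_bits = 8 * len(query)  # 3 byte dimensions -> 24 bits
    return (total_bits - hamming(query, doc)) / total_bits


query_vector = [4, 3, 0]
doc_vector = [0, 10, 6]  # my_byte_dense_vector of document 1
print(hamming(query_vector, doc_vector), normalized_hamming_score(query_vector, doc_vector))
```

Identical vectors score 1.0 and vectors differing in every bit score 0.0.
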

L2 distance (Euclidean distance)

The l2norm function calculates L2 distance (Euclidean distance) between a given query vector and document vectors.

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
                    "params": {
                        "queryVector": [4, 3.4, -0.2]
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      body: {
        query: {
          script_score: {
            query: {
              bool: {
                filter: {
                  term: {
                    status: 'published'
                  }
                }
              }
            },
            script: {
              source: "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
              params: {
                "queryVector": [4, 3.4, -0.2]
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source: "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
            params: {
              queryVector: [4, 3.4, -0.2],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
            "params": {
              "queryVector": [4, 3.4, -0.2]
            }
          }
        }
      }
    }

Checking for missing values

If a document doesn’t have a value for a vector field on which a vector function is executed, an error will be thrown.

You can check if a document has a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:

    "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"

Accessing vectors directly

You can access vector values directly through the following functions:

  • doc[<field>].vectorValue – returns a vector’s value as an array of floats

For bit vectors, it returns a float[] where each element represents 8 bits.

  • doc[<field>].magnitude – returns a vector’s magnitude as a float. For vectors created prior to version 7.5, the magnitude is not stored, so this function recalculates it every time it is called.

For bit vectors, this is the square root of the number of 1 bits.

For example, the script below implements a cosine similarity using these two functions:

Python:

    resp = client.search(
        index="my-index-000001",
        query={
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "term": {
                                "status": "published"
                            }
                        }
                    }
                },
                "script": {
                    "source": "\n float[] v = doc['my_dense_vector'].vectorValue;\n float vm = doc['my_dense_vector'].magnitude;\n float dotProduct = 0;\n for (int i = 0; i < v.length; i++) {\n dotProduct += v[i] * params.queryVector[i];\n }\n return dotProduct / (vm * (float) params.queryVectorMag);\n ",
                    "params": {
                        "queryVector": [4, 3.4, -0.2],
                        "queryVectorMag": 5.25357
                    }
                }
            }
        },
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      body: {
        query: {
          script_score: {
            query: {
              bool: {
                filter: {
                  term: {
                    status: 'published'
                  }
                }
              }
            },
            script: {
              source: "\n float[] v = doc['my_dense_vector'].vectorValue;\n float vm = doc['my_dense_vector'].magnitude;\n float dotProduct = 0;\n for (int i = 0; i < v.length; i++) {\n dotProduct += v[i] * params.queryVector[i];\n }\n return dotProduct / (vm * (float) params.queryVectorMag);\n ",
              params: {
                "queryVector": [4, 3.4, -0.2],
                "queryVectorMag": 5.25357
              }
            }
          }
        }
      }
    )
    puts response

JavaScript:

    const response = await client.search({
      index: "my-index-000001",
      query: {
        script_score: {
          query: {
            bool: {
              filter: {
                term: {
                  status: "published",
                },
              },
            },
          },
          script: {
            source:
              "\n float[] v = doc['my_dense_vector'].vectorValue;\n float vm = doc['my_dense_vector'].magnitude;\n float dotProduct = 0;\n for (int i = 0; i < v.length; i++) {\n dotProduct += v[i] * params.queryVector[i];\n }\n return dotProduct / (vm * (float) params.queryVectorMag);\n ",
            params: {
              queryVector: [4, 3.4, -0.2],
              queryVectorMag: 5.25357,
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-000001/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "term" : {
                  "status" : "published"
                }
              }
            }
          },
          "script": {
            "source": """
              float[] v = doc['my_dense_vector'].vectorValue;
              float vm = doc['my_dense_vector'].magnitude;
              float dotProduct = 0;
              for (int i = 0; i < v.length; i++) {
                dotProduct += v[i] * params.queryVector[i];
              }
              return dotProduct / (vm * (float) params.queryVectorMag);
            """,
            "params": {
              "queryVector": [4, 3.4, -0.2],
              "queryVectorMag": 5.25357
            }
          }
        }
      }
    }

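The queryVectorMag parameter is simply the magnitude of queryVector, precomputed on the client so the script does not recompute it for every document. The Python sketch below shows where 5.25357 comes from and mirrors the script's loop; the function names are illustrative, not part of any Elasticsearch client.

```python
import math


def magnitude(vector) -> float:
    return math.sqrt(sum(x * x for x in vector))


def cosine_from_parts(doc_vector, query_vector, query_vector_mag) -> float:
    # Same steps as the script: accumulate the dot product in a loop, then
    # divide by the document magnitude times the precomputed query magnitude.
    dot = 0.0
    for i in range(len(doc_vector)):
        dot += doc_vector[i] * query_vector[i]
    return dot / (magnitude(doc_vector) * query_vector_mag)


query_vector = [4, 3.4, -0.2]
query_vector_mag = round(magnitude(query_vector), 5)  # sqrt(16 + 11.56 + 0.04)
print(query_vector_mag)  # 5.25357
print(cosine_from_parts([0.5, 10, 6], query_vector, query_vector_mag))
```
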
Bit vectors and vector functions

When using bit vectors, not all the vector functions are available. The supported functions are:

  • hamming – calculates Hamming distance, the number of 1 bits in the bitwise XOR of the two vectors
  • l1norm – calculates L1 distance, which for bit vectors is simply the Hamming distance
  • l2norm – calculates L2 distance, which for bit vectors is the square root of the Hamming distance
  • dotProduct – calculates dot product. When comparing two bit vectors, this is the number of 1 bits in the bitwise AND of the two vectors. If a float[] or byte[] with dims elements is provided as the query vector, the dotProduct is the sum of the query values at the positions where the stored bit vector has a 1 bit; that is, the stored bit vector is used as a mask.

When comparing floats and bytes with bit vectors, the bit vector is treated as a mask in big-endian order. For example, if the bit vector is 10100001 (the single byte value 161) and it is compared with the array of values [1, 2, 3, 4, 5, 6, 7, 8], the dotProduct is 1 + 3 + 8 = 12.
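The masking rule can be reproduced in a few lines of Python, which is handy for verifying expected scores. This is a sketch of the behavior described above, not Elasticsearch code; the helper names are hypothetical.

```python
def bits_big_endian(byte_values):
    """Expand bytes into bits, most significant bit first."""
    bits = []
    for b in byte_values:
        b &= 0xFF  # treat signed bytes (e.g. -95) as unsigned (161)
        bits.extend((b >> (7 - i)) & 1 for i in range(8))
    return bits


def masked_dot_product(bit_vector_bytes, query_values):
    """Sum the query values at positions where the stored bit is 1."""
    bits = bits_big_endian(bit_vector_bytes)
    if len(bits) != len(query_values):
        raise ValueError("query must have one value per bit (dims elements)")
    return sum(v for bit, v in zip(bits, query_values) if bit)


print(masked_dot_product([161], [1, 2, 3, 4, 5, 6, 7, 8]))  # 1 + 3 + 8 = 12
```

Because Java bytes are signed, the byte 161 is stored as -95; masking with 0xFF recovers the same unsigned bit pattern, so both inputs give the same result.
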

Here is an example of using dot-product with bit vectors.

Python:

    resp = client.indices.create(
        index="my-index-bit-vectors",
        mappings={
            "properties": {
                "my_dense_vector": {
                    "type": "dense_vector",
                    "index": False,
                    "element_type": "bit",
                    "dims": 40
                }
            }
        },
    )
    print(resp)

    resp1 = client.index(
        index="my-index-bit-vectors",
        id="1",
        document={
            "my_dense_vector": [8, 5, -15, 1, -7]
        },
    )
    print(resp1)

    resp2 = client.index(
        index="my-index-bit-vectors",
        id="2",
        document={
            "my_dense_vector": [-1, 115, -3, 4, -128]
        },
    )
    print(resp2)

    resp3 = client.index(
        index="my-index-bit-vectors",
        id="3",
        document={
            "my_dense_vector": [2, 18, -5, 0, -124]
        },
    )
    print(resp3)

    resp4 = client.indices.refresh(
        index="my-index-bit-vectors",
    )
    print(resp4)

JavaScript:

    const response = await client.indices.create({
      index: "my-index-bit-vectors",
      mappings: {
        properties: {
          my_dense_vector: {
            type: "dense_vector",
            index: false,
            element_type: "bit",
            dims: 40,
          },
        },
      },
    });
    console.log(response);

    const response1 = await client.index({
      index: "my-index-bit-vectors",
      id: 1,
      document: {
        my_dense_vector: [8, 5, -15, 1, -7],
      },
    });
    console.log(response1);

    const response2 = await client.index({
      index: "my-index-bit-vectors",
      id: 2,
      document: {
        my_dense_vector: [-1, 115, -3, 4, -128],
      },
    });
    console.log(response2);

    const response3 = await client.index({
      index: "my-index-bit-vectors",
      id: 3,
      document: {
        my_dense_vector: [2, 18, -5, 0, -124],
      },
    });
    console.log(response3);

    const response4 = await client.indices.refresh({
      index: "my-index-bit-vectors",
    });
    console.log(response4);

Console:

    PUT my-index-bit-vectors
    {
      "mappings": {
        "properties": {
          "my_dense_vector": {
            "type": "dense_vector",
            "index": false,
            "element_type": "bit",
            "dims": 40
          }
        }
      }
    }

    PUT my-index-bit-vectors/_doc/1
    {
      "my_dense_vector": [8, 5, -15, 1, -7]
    }

    PUT my-index-bit-vectors/_doc/2
    {
      "my_dense_vector": [-1, 115, -3, 4, -128]
    }

    PUT my-index-bit-vectors/_doc/3
    {
      "my_dense_vector": [2, 18, -5, 0, -124]
    }

    POST my-index-bit-vectors/_refresh

In the mapping above, dims is the number of dimensions, that is, the number of bits, of the bit vector.

Each document vector is given as 5 bytes, or 5 * 8 = 40 bits, which equals the configured dims.

Python:

    resp = client.search(
        index="my-index-bit-vectors",
        query={
            "script_score": {
                "query": {
                    "match_all": {}
                },
                "script": {
                    "source": "dotProduct(params.query_vector, 'my_dense_vector')",
                    "params": {
                        "query_vector": [8, 5, -15, 1, -7]
                    }
                }
            }
        },
    )
    print(resp)

JavaScript:

    const response = await client.search({
      index: "my-index-bit-vectors",
      query: {
        script_score: {
          query: {
            match_all: {},
          },
          script: {
            source: "dotProduct(params.query_vector, 'my_dense_vector')",
            params: {
              query_vector: [8, 5, -15, 1, -7],
            },
          },
        },
      },
    });
    console.log(response);

Console:

    GET my-index-bit-vectors/_search
    {
      "query": {
        "script_score": {
          "query" : {
            "match_all": {}
          },
          "script": {
            "source": "dotProduct(params.query_vector, 'my_dense_vector')",
            "params": {
              "query_vector": [8, 5, -15, 1, -7]
            }
          }
        }
      }
    }

The query vector in this request is also 5 bytes, or 40 bits, so dotProduct computes a bitwise AND with the stored vectors.
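Reading "sum of the bitwise AND" as the number of 1 bits in the AND of the two vectors, the expected scores for the three example documents can be worked out in Python. This is an illustrative sketch under that assumption, not output captured from Elasticsearch; bit_dot_product is a hypothetical name.

```python
def bit_dot_product(query_bytes, doc_bytes) -> int:
    """Count the 1 bits in the bitwise AND of two equal-length byte vectors."""
    if len(query_bytes) != len(doc_bytes):
        raise ValueError("vectors must have the same number of bytes")
    # & 0xFF keeps the low 8 bits so negative (signed) bytes are handled.
    return sum(bin((q & d) & 0xFF).count("1") for q, d in zip(query_bytes, doc_bytes))


query = [8, 5, -15, 1, -7]
docs = {
    "1": [8, 5, -15, 1, -7],
    "2": [-1, 115, -3, 4, -128],
    "3": [2, 18, -5, 0, -124],
}
for doc_id, vector in docs.items():
    print(doc_id, bit_dot_product(query, vector))
```

Document 1, which is identical to the query, gets the highest value: every 1 bit of the query is also set in the document.
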

  resp = client.search(
      index="my-index-bit-vectors",
      query={
          "script_score": {
              "query": {
                  "match_all": {}
              },
              "script": {
                  "source": "dotProduct(params.query_vector, 'my_dense_vector')",
                  "params": {
                      "query_vector": [
                          0.23, 1.45, 3.67, 4.89, -0.56, 2.34, 3.21, 1.78, -2.45, 0.98,
                          -0.12, 3.45, 4.56, 2.78, 1.23, 0.67, 3.89, 4.12, -2.34, 1.56,
                          0.78, 3.21, 4.12, 2.45, -1.67, 0.34, -3.45, 4.56, -2.78, 1.23,
                          -0.67, 3.89, -4.34, 2.12, -1.56, 0.78, -3.21, 4.45, 2.12, 1.67
                      ]
                  }
              }
          }
      },
  )
  print(resp)

  const response = await client.search({
    index: "my-index-bit-vectors",
    query: {
      script_score: {
        query: {
          match_all: {},
        },
        script: {
          source: "dotProduct(params.query_vector, 'my_dense_vector')",
          params: {
            query_vector: [
              0.23, 1.45, 3.67, 4.89, -0.56, 2.34, 3.21, 1.78, -2.45, 0.98,
              -0.12, 3.45, 4.56, 2.78, 1.23, 0.67, 3.89, 4.12, -2.34, 1.56,
              0.78, 3.21, 4.12, 2.45, -1.67, 0.34, -3.45, 4.56, -2.78, 1.23,
              -0.67, 3.89, -4.34, 2.12, -1.56, 0.78, -3.21, 4.45, 2.12, 1.67,
            ],
          },
        },
      },
    },
  });
  console.log(response);

  GET my-index-bit-vectors/_search
  {
    "query": {
      "script_score": {
        "query": {
          "match_all": {}
        },
        "script": {
          "source": "dotProduct(params.query_vector, 'my_dense_vector')",
          "params": {
            "query_vector": [
              0.23, 1.45, 3.67, 4.89, -0.56, 2.34, 3.21, 1.78, -2.45, 0.98,
              -0.12, 3.45, 4.56, 2.78, 1.23, 0.67, 3.89, 4.12, -2.34, 1.56,
              0.78, 3.21, 4.12, 2.45, -1.67, 0.34, -3.45, 4.56, -2.78, 1.23,
              -0.67, 3.89, -4.34, 2.12, -1.56, 0.78, -3.21, 4.45, 2.12, 1.67
            ]
          }
        }
      }
    }
  }

This query vector has 40 individual float dimensions, one per bit, so dotProduct sums the query values at the positions where the stored bit vector has a 1 bit; the stored vector acts as a mask.
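The masking behavior can be modeled in a few lines of Python. This is a sketch under stated assumptions: the helper masked_sum is hypothetical, and it assumes bit i of the stored vector is read most significant bit first within each byte, which is an illustrative choice rather than a documented guarantee.

```python
from typing import List


def masked_sum(query: List[float], stored: List[int]) -> float:
    """Sum query[i] wherever bit i of the stored bytes is 1 (MSB-first order assumed)."""
    if len(query) != len(stored) * 8:
        raise ValueError("need exactly one float per stored bit")
    total = 0.0
    for i, value in enumerate(query):
        byte = stored[i // 8] & 0xFF
        if (byte >> (7 - i % 8)) & 1:  # bit i, counting from the most significant bit
            total += value
    return total


# One byte 0b10000001: only the first and last positions are selected.
print(masked_sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0b10000001]))  # 9.0
# With all-ones floats the result is simply the number of set bits (15 for doc 1).
print(masked_sum([1.0] * 40, [8, 5, -15, 1, -7]))  # 15.0
```

The second call shows how the float case relates to the byte case: with a query of all 1.0 values, the masked sum reduces to counting the stored vector's set bits.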

Currently, the cosineSimilarity function is not supported for bit vectors.

Explain request

Using an explain request provides an explanation of how the parts of a score were computed. The script_score query can add its own explanation by calling set() on the explanation object available to the script:

  resp = client.explain(
      index="my-index-000001",
      id="0",
      query={
          "script_score": {
              "query": {
                  "match": {
                      "message": "elasticsearch"
                  }
              },
              "script": {
                  "source": "\n long count = doc['count'].value;\n double normalizedCount = count / 10.0;\n if (explanation != null) {\n explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);\n }\n return normalizedCount;\n "
              }
          }
      },
  )
  print(resp)

  response = client.explain(
    index: 'my-index-000001',
    id: 0,
    body: {
      query: {
        script_score: {
          query: {
            match: {
              message: 'elasticsearch'
            }
          },
          script: {
            source: "\n long count = doc['count'].value;\n double normalizedCount = count / 10.0;\n if (explanation != null) {\n explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);\n }\n return normalizedCount;\n "
          }
        }
      }
    }
  )
  puts response

  const response = await client.explain({
    index: "my-index-000001",
    id: 0,
    query: {
      script_score: {
        query: {
          match: {
            message: "elasticsearch",
          },
        },
        script: {
          source:
            "\n long count = doc['count'].value;\n double normalizedCount = count / 10.0;\n if (explanation != null) {\n explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);\n }\n return normalizedCount;\n ",
        },
      },
    },
  });
  console.log(response);

  GET /my-index-000001/_explain/0
  {
    "query": {
      "script_score": {
        "query": {
          "match": { "message": "elasticsearch" }
        },
        "script": {
          "source": """
            long count = doc['count'].value;
            double normalizedCount = count / 10.0;
            if (explanation != null) {
              explanation.set('normalized count = count / 10 = ' + count + ' / 10 = ' + normalizedCount);
            }
            return normalizedCount;
          """
        }
      }
    }
  }

Note that explanation is null when the script runs as part of a normal _search request, so guarding it with a null check, as in the example above, is best practice.
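The guard matters because the same script runs in both contexts. The Python sketch below mimics the script's control flow, with None standing in for Painless null and a hypothetical Explanation class standing in for the object the explain API supplies. It is a model of the pattern, not Painless code and not an Elasticsearch client API.

```python
from typing import Optional


class Explanation:
    """Hypothetical stand-in for the explanation object provided during explain."""

    def __init__(self) -> None:
        self.text: Optional[str] = None

    def set(self, text: str) -> None:
        self.text = text


def score(count: int, explanation: Optional[Explanation]) -> float:
    normalized_count = count / 10  # Python's / is true division
    if explanation is not None:    # None models a normal _search request
        explanation.set(f"normalized count = count / 10 = {count} / 10 = {normalized_count}")
    return normalized_count


print(score(15, None))   # 1.5, nothing to explain, and no error thanks to the guard
holder = Explanation()
print(score(15, holder))  # 1.5
print(holder.text)        # normalized count = count / 10 = 15 / 10 = 1.5
```

Without the `is not None` check, the first call would raise an AttributeError, which is the Python analogue of the null dereference the guard prevents in the script.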