[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

experimental[]

These functions are used for
<<dense-vector,`dense_vector`>> and
<<sparse-vector,`sparse_vector`>> fields.

NOTE: When a vector function is evaluated, all matched documents are
scanned linearly, so expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

Let's create an index with the following mapping and index a couple
of documents into it.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_sparse_vector" : {
        "type" : "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

--------------------------------------------------
// CONSOLE
// TESTSETUP

For `dense_vector` fields, `cosineSimilarity` calculates the
cosine similarity between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> To restrict the number of documents on which script score calculation is applied, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide a query vector as a script parameter.

NOTE: If a document's dense vector field has a number of dimensions
different from the query's vector, an error will be thrown.
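
To make the scoring concrete, the following plain-Python sketch recomputes the value the script above would produce for the two sample documents. It only illustrates the formula and is not the engine's implementation; the helper name `cosine_similarity` is hypothetical.

```python
import math

def cosine_similarity(query, doc):
    """Cosine of the angle between two equal-length vectors."""
    if len(query) != len(doc):
        # Mirrors the note above: mismatched dimensions are treated as an error.
        raise ValueError("query and document vectors differ in length")
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    return dot / (norm_q * norm_d)

# Same vectors as the sample documents and the query above.
query_vector = [4, 3.4, -0.2]
docs = {
    "1": [0.5, 10, 6],
    "2": [-0.5, 10, 10],
}

# Cosine similarity lies in [-1, 1]; adding 1.0 keeps the score non-negative.
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}

for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, round(score, 4))
```

Under these assumptions, document 1 ranks above document 2 because its vector points in a direction closer to that of the query vector.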

Similarly, for sparse_vector fields, `cosineSimilaritySparse` calculates cosine similarity
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
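
The sparse variant can be sketched the same way in plain Python. This sketch assumes that a dimension absent from a sparse vector counts as 0, so only dimensions present in both vectors contribute to the dot product. The helper names are hypothetical and the code is an illustration, not the engine's implementation.

```python
import math

def sparse_dot(a, b):
    # Only dimensions present in both vectors contribute (absent ones count as 0).
    return sum(value * b[dim] for dim, value in a.items() if dim in b)

def sparse_norm(a):
    return math.sqrt(sum(value * value for value in a.values()))

def cosine_similarity_sparse(query, doc):
    return sparse_dot(query, doc) / (sparse_norm(query) * sparse_norm(doc))

# Same sparse vectors as the sample documents and the query above.
query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
docs = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

# As in the script, 1.0 is added to keep the score non-negative.
scores = {doc_id: cosine_similarity_sparse(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}

for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
```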

For `dense_vector` fields, `dotProduct` calculates the
dot product between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, doc['my_dense_vector']);
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Using the standard sigmoid function prevents scores from being negative.
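
The following Python sketch shows why the sigmoid call keeps scores positive. It assumes that `sigmoid(value, k, a)` is defined as `value^a / (k^a + value^a)`; with `value = 1`, `k = e`, and `a = -x`, this reduces to the logistic function `1 / (1 + e^-x)`. The Python helpers here are hypothetical stand-ins, not the engine's code.

```python
import math

def dot_product(query, doc):
    return sum(q * d for q, d in zip(query, doc))

def sigmoid(value, k, a):
    # Assumed definition of the scoring helper: value^a / (k^a + value^a).
    return value ** a / (k ** a + value ** a)

def logistic(x):
    # The standard sigmoid, always strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

query_vector = [4, 3.4, -0.2]
docs = {
    "1": [0.5, 10, 6],
    "2": [-0.5, 10, 10],
}

for doc_id, vec in docs.items():
    value = dot_product(query_vector, vec)
    # Same call shape as in the script: sigmoid(1, Math.E, -value).
    print(doc_id, value, sigmoid(1, math.e, -value))

# A negative dot product still maps to a positive score.
print(sigmoid(1, math.e, 3.0))  # corresponds to value = -3.0
```

For the sample data both dot products are large and positive, so both scores land very close to 1. The transformation keeps the ordering of the dot products but compresses large values.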

Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

For dense_vector fields, `l1norm` calculates L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values
returned by the `l1norm` and `l2norm` functions.
Because more similar vectors should score higher,
the script inverts the output of `l1norm` and `l2norm`. Also, to avoid
division by 0 when a document vector matches the query exactly,
the script adds `1` to the denominator.
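
As a worked illustration of this inversion, the Python sketch below computes the Manhattan distance for the two sample documents and turns it into a score with `1 / (1 + distance)`. The helper names are hypothetical, and the code only mirrors the arithmetic of the script rather than the engine's implementation.

```python
def l1_distance(query, doc):
    # Manhattan distance: sum of absolute per-dimension differences.
    return sum(abs(q - d) for q, d in zip(query, doc))

def distance_to_score(distance):
    # Smaller distance gives a higher score; the +1 avoids division by zero.
    return 1 / (1 + distance)

query_vector = [4, 3.4, -0.2]
docs = {
    "1": [0.5, 10, 6],
    "2": [-0.5, 10, 10],
}

for doc_id, vec in docs.items():
    distance = l1_distance(query_vector, vec)
    print(doc_id, round(distance, 4), round(distance_to_score(distance), 4))

# A document identical to the query has distance 0 and the maximum score of 1.
print(distance_to_score(l1_distance(query_vector, query_vector)))
```

Document 1 is closer to the query than document 2 in this metric, so it receives the higher score after the inversion.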

For sparse_vector fields, `l1normSparse` calculates L^1^ distance
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

For dense_vector fields, `l2norm` calculates L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
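
The same check can be sketched for the Euclidean distance. The Python below is an illustration with a hypothetical helper name, not the engine's implementation. It applies the `1 / (1 + distance)` inversion used in the script above.

```python
import math

def l2_distance(query, doc):
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))

query_vector = [4, 3.4, -0.2]
docs = {
    "1": [0.5, 10, 6],
    "2": [-0.5, 10, 10],
}

# Invert the distance so that closer vectors score higher.
scores = {doc_id: 1 / (1 + l2_distance(query_vector, vec))
          for doc_id, vec in docs.items()}

for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
```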

Similarly, for sparse_vector fields, `l2normSparse` calculates L^2^ distance
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
--------------------------------------------------
// NOTCONSOLE