Commit Graph

19 Commits

Author SHA1 Message Date
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Zachary Tong c9db2de41d
[7.x] Comprehensively test supported/unsupported field type:agg combinations (#54451)
* Comprehensively test supported/unsupported field type:agg combinations (#52493)

This adds a test to AggregatorTestCase that allows us to programmatically
verify that an aggregator supports or does not support a particular
field type.  It fetches the list of registered field type parsers,
creates a MappedFieldType from the parser and then attempts to run
a basic agg against the field.

A supplied list of supported VSTypes are then compared against the
output (success or exception) and suceeds or fails the test accordingly.

Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com>
* Skip fields that are not aggregatable

* Use newIndexSearcher() to avoid incompatible readers (#52723)

Lucene's `newSearcher()` can generate readers like ParallelCompositeReader
which we can't use.  We need to instead use our helper `newIndexSearcher`
2020-03-31 14:35:03 -04:00
Alan Woodward 71b703edd1 Rename AtomicFieldData to LeafFieldData (#53554)
This conforms with lucene's LeafReader naming convention, and
matches other per-segment structures in elasticsearch.
2020-03-17 12:30:12 +00:00
Nik Everett 1d1956ee93
Add size support to `top_metrics` (backport of #52662) (#52914)
This adds support for returning the top "n" metrics instead of just the
very top.

Relates to #51813
2020-02-27 16:12:52 -05:00
Nik Everett 146def8caa
Implement top_metrics agg (#51155) (#52366)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.

At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.

Co-Authored by: Zachary Tong <zach@elastic.co>
2020-02-14 11:19:11 -05:00
Mayya Sharipova e3da60c23d Increase the number of vector dims to 2048 (#46895) 2019-11-20 07:47:33 -05:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Julie Tibshirani ae1ef5fd92 Refactor unit tests for vector functions. (#48662)
This PR performs the following changes:
* Split `ScoreScriptUtilsTests` into `DenseVectorFunctionTests` and
`SparseVectorFunctionTests`. This will make it easier to delete all sparse
vector function tests once we remove support on 8.x.
* As much as possible, break up the large test methods into individual tests
for each vector function (`cosineSimilarity`, `l2norm`, etc.).
2019-10-30 15:36:06 -07:00
Julie Tibshirani 89c65752dc
Update the signature of vector script functions. (#48653)
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.

```
Before: cosineSimilarity(params.query_vector, doc['field'])
After:  cosineSimilarity(params.query_vector, 'field')
```

This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of `doc['field']` is unclear.

The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecates the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.

Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. It seemed simplest to understand (for both
us and users) to keep everything symmetric between dense and sparse vectors.
2019-10-29 15:46:05 -07:00
Julie Tibshirani 2664cbd20b
Deprecate the sparse_vector field type. (#48368)
We have not seen much adoption of this experimental field type, and don't see a
clear use case as it's currently designed. This PR deprecates the field type in
7.x. It will be removed from 8.0 in a follow-up PR.
2019-10-23 16:35:03 -07:00
Henning Andersen 42453aec96 Fix XPackPlugin usages in tests (#47252)
XPackPlugin holds data in statics and can only be initialized once. This
caused tests to fail primarily when running with a low max-workers.

Replaced usages with the LocalStateCompositeXPackPlugin, which handles
this properly for testing.
2019-10-02 12:36:02 +02:00
Julie Tibshirani 40c3225d26
First round of optimizations for vector functions. (#46294)
This PR merges the `vectors-optimize-brute-force` feature branch, which makes
the following changes to how vector functions are computed:
* Precompute the L2 norm of each vector at indexing time. (#45390)
* Switch to ByteBuffer for vector encoding. (#45936)
* Decode vectors and while computing the vector function. (#46103) 
* Use an array instead of a List for the query vector. (#46155)
* Precompute the normalized query vector when using cosine similarity. (#46190)

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2019-09-04 14:45:57 -07:00
Julie Tibshirani d94c4dcffb Use float instead of double for query vectors. (#46004)
Currently, when using script_score functions like cosineSimilarity, the query
vector is treated as an array of doubles. Since the stored document vectors use
floats, it seems like the least surprising behavior for the query vectors to
also be float arrays.

In addition to improving consistency, this change may help with some
optimizations we have been considering around vector dot product.
2019-08-28 11:03:14 -07:00
Mayya Sharipova 0c68765088
Adds usage stats for vectors (#45023)
Example of usage:

_xpack/usage

"vectors": {
    "available": true,
    "enabled": true,
    "dense_vector_fields_count" : 1,
    "sparse_vector_fields_count" : 1,
    "dense_vector_dims_avg_count" : 100
}
Backport for #44512
2019-07-31 12:32:41 -04:00
Mayya Sharipova 32cb47b91c Add l1norm and l2norm distances for vectors (#44116)
Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947
2019-07-11 14:30:02 -04:00
Mayya Sharipova 37e1ad7062 Forbid empty doc values on vector functions (#43944)
Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
2019-07-05 18:09:06 -04:00
Mayya Sharipova 756c42f99f
Add dims parameter to dense_vector mapping (#43444) (#43895)
Typically, dense vectors of both documents and queries must have the same
number of dimensions. Different number of dimensions among documents
or query vector indicate an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions.
2019-07-02 21:14:16 -04:00
Mayya Sharipova 813551e070 Fix eclipse build gradle for vectors project
Closes #43496
2019-06-24 09:22:48 -04:00
Mayya Sharipova aa6248d4d7
Move dense_vector and sparse_vector to module (#43280) (#43333) 2019-06-18 11:56:04 -04:00