OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nik Everett	24a24d050a	Implement fields fetch for runtime fields (backport of #61995 ) (#62416 ) This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332	2020-09-15 20:24:10 -04:00
Nik Everett	b8e9a7125f	Speed up empty highlighting many fields (backport of #61860 ) (#62122 ) Kibana often highlights everything like this: ``` POST /_search { "query": ..., "size": 500, "highlight": { "fields": { "": { ... } } } } ``` This can get slow when there are hundreds of mapped fields. I tested this locally and unscientifically and it took a request from 20ms to 150ms when there are 100 fields. I've seen clusters with 2000 fields where simple search go from 500ms to 1500ms just by turning on this sort of highlighting. Even when the query is just a `range` that and the fields are all numbers and stuff so it won't highlight anything. This speeds up the `unified` highlighter in this case in a few ways: 1. Build the highlighting infrastructure once field rather than once pre document per field. This cuts out a ton* of work analyzing the query over and over and over again. 2. Bail out of the highlighter before loading values if we can't produce any results. Combined these take that local 150ms case down to 65ms. This is unlikely to be really useful when there are only a few fetched docs and only a few fields, but we often end up having many fields with many fetched docs.	2020-09-08 15:49:50 -04:00
Alan Woodward	e2f006eeb4	Merge FetchSubPhase hitsExecute and hitExecute methods (#60907 ) (#61893 ) FetchSubPhase has two 'execute' methods, one which takes all hits to be examined, and one which takes a single HitContext. It's not obvious which one should be implemented by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first run the hitExecute methods of all subphases, and then subsequently call all the hitsExecute methods. This commit reworks FetchSubPhase to replace these two variants with a processor class, `FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method. This processor class has two methods, `setNextReader()` and `process`. FetchPhase collects processors from all its subphases (if a subphase does not need to execute on the current search context, it can return `null` from `getProcessor`). It then sorts its hits by docid, and groups them by lucene leaf reader. For each reader group, it calls `setNextReader()` on all non-null processors, and then passes each doc id to `process()`. Implementations of fetch sub phases can divide their concerns into per-request, per-reader and per-document sections, and no longer need to worry about sorting docs or dealing with reader slices. FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods, setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all these executors together (if a phase should not be executed, then it returns null here); then it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then execute for each hit in turn. Individual sub phases no longer need to concern themselves with sorting docs or keeping track of readers; global structures can be built in getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.	2020-09-03 12:20:55 +01:00
Julie Tibshirani	997c73ec17	Correct how field retrieval handles multifields and copy_to. (#61391 ) Before when a value was copied to a field through a parent field or `copy_to`, we parsed it using the `FieldMapper` from the source field. Instead we should parse it using the target `FieldMapper`. This ensures that we apply the appropriate mapping type and options to the copied value. To implement the fix cleanly, this PR refactors the value parsing strategy. Now instead of looking up values directly, field mappers produce a helper object `ValueFetcher`. The value fetchers are responsible for almost all aspects of fetching, including looking up the right paths in the _source. The PR is fairly big but each commit can be reviewed individually. Fixes #61033.	2020-08-20 15:53:35 -07:00
Julie Tibshirani	dfd7f226f0	Clarify SourceLookup sharing across fetch subphases. (#60484 ) The `SourceLookup` class provides access to the _source for a particular document, specified through `SourceLookup#setSegmentAndDocument`. Previously the search context contained a single `SourceLookup` that was shared between different fetch subphases. It was hard to reason about its state: is `SourceLookup` set to the expected document? Is the _source already loaded and available? Instead of using a global source lookup, the fetch hit context now provides access to a lookup that is set to load from the hit document. This refactor closes #31000, since the same `SourceLookup` is no longer shared between the 'fetch _source phase' and script execution.	2020-07-30 13:22:31 -07:00
Julie Tibshirani	5359417ec3	Minor clean-up around search highlight context. (#60422 ) * Rename SearchContextHighlight -> SearchHighlightContext. * Rename HighlighterContext to FieldHighlightContext. * Make the search highlight context immutable. * Avoid storing SearchHighlightContext on HighlighterContext.	2020-07-29 11:39:17 -07:00
Jake Landis	6ce30bea08	[7.x] Convert most OSS plugins from integTest to [yaml \| java]RestTest or internalClusterTest (#59444 ) (#60343 ) For all OSS plugins (except repository-* and discovery-) integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This commit does NOT convert the discovery- and repository-* since they are bit more complex then the rest of tests and this PR is large enough. Those plugins will be addressed in a future PR(s). This commit also fixes a minor issue that did not copy the rest api for projects that only had YAML TEST tests. related: #56841	2020-07-29 13:06:13 -05:00
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
Alan Woodward	f4caadd239	MappedFieldType no longer requires equals/hashCode/clone (#59212 ) With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer have any cause to compare MappedFieldType instances. This means that we can remove all equals and hashCode implementations, and in addition we no longer need the clone implementations which were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes, which will be particularly useful for the runtime fields project.	2020-07-09 21:05:10 +01:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Alan Woodward	d33d13f2be	Simplify generics on Mapper.Builder (#56747 ) Mapper.Builder currently has some complex generics on it to allow fluent builder construction. However, the second parameter, a return type from the build() method, is unnecessary, as we can use covariant return types. This commit removes this second generic parameter.	2020-05-15 12:14:49 +01:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Alan Woodward	fe2c65185e	Annotated text type should extend TextFieldType (#49555 ) The annotated text mapper has a field type that currently extends StringFieldType, which means that all the positional-related query factory methods need to be copied over from TextFieldType. In addition, MappedFieldType.intervals() hasn't been overridden, so you can't use intervals queries with annotated text - a major drawback, since one of the purposes of annotated text is to be able to run positional queries against annotations. This commit changes the annotated text field type to extend TextFieldType instead, adding tests to ensure that position queries work correctly. Closes #49289	2019-11-26 16:52:21 +00:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733 ) Today built-in highlighter and plugins have access to the SearchContext through the highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the complex SearchContext and remove the needs to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Alan Woodward	44c3418531	Simplify handling of keyword field normalizers (#42002 ) We have a number of places in analysis-handling code where we check if a field type is a keyword field, and if so then extract the normalizer rather than pulling the index-time analyzer. However, a keyword normalizer is really just a special case of an analyzer, so we should be able to simplify this by setting the normalizer as the index-time analyzer at construction time.	2019-05-10 14:38:46 +01:00
Andy Bristol	23395a9b9f	search as you type fieldmapper (#35600 ) Adds the search_as_you_type field type that acts like a text field optimized for as-you-type search completion. It creates a couple subfields that analyze the indexed terms as shingles, against which full terms are queried, and a prefix subfield that analyze terms as the largest shingle size used and edge-ngrams, against which partial terms are queried Adds a match_bool_prefix query type that creates a boolean clause of a term query for each term except the last, for which a boolean clause with a prefix query is created. The match_bool_prefix query is the recommended way of querying a search as you type field, which will boil down to term queries for each shingle of the input text on the appropriate shingle field, and the final (possibly partial) term as a term query on the prefix field. This field type also supports phrase and phrase prefix queries however	2019-03-27 13:29:13 -07:00
markharwood	1873de5240	Bug fix for AnnotatedTextHighlighter - port of 39525 (#39749 ) Bug fix for AnnotatedTextHighlighter - port of 39525 Relates to #39395	2019-03-06 19:02:04 +00:00
Julie Tibshirani	c2e9d13ebd	Default include_type_name to false in the yml test harness. (#38058 ) This PR removes the temporary change we made to the yml test harness in #37285 to automatically set `include_type_name` to `true` in index creation requests if it's not already specified. This is possible now that the vast majority of index creation requests were updated to be typeless in #37611. A few additional tests also needed updating here. Additionally, this PR updates the test harness to set `include_type_name` to `false` in index creation requests when communicating with 6.x nodes. This mirrors the logic added in #37611 to allow for typeless document write requests in test set-up code. With this update in place, we can remove many references to `include_type_name: false` from the yml tests.	2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe	21e392e95e	Removes typed calls from YAML REST tests (#37611 ) This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour. The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.	2019-01-30 16:32:58 +00:00
Jim Ferenczi	4351a5e537	Allow field types to optimize phrase prefix queries (#37436 ) This change adds a way to customize how phrase prefix queries should be created on field types. The match phrase prefix query is exposed in field types in order to allow optimizations based on the options set on the field. For instance the text field uses the configured prefix field (if available) to build a span near that mixes the original field and the prefix field on the last position. This change also contains a small refactoring of the match/multi_match query that simplifies the interactions between the builders. Closes #31921	2019-01-17 15:10:28 +01:00
Nhat Nguyen	7580d9d925	Make SourceToParse immutable (#36971 ) Today the routing of a SourceToParse is assigned in a separate step after the object is created. We can easily forget to set the routing. With this commit, the routing must be provided in the constructor of SourceToParse. Relates #36921	2018-12-24 14:06:50 -05:00
Boaz Leskes	733a6d34c1	Add seq no powered optimistic locking support to the index and delete transport actions (#36619 ) This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control) in the delete and index transport actions and requests. A follow up will come with adding sequence numbers to the update and get results. Relates #36148 Relates #10708	2018-12-15 17:59:57 +01:00
Jim Ferenczi	18866c4c0b	Make hits.total an object in the search response (#35849 ) This commit changes the format of the `hits.total` in the search response to be an object with a `value` and a `relation`. The `value` indicates the number of hits that match the query and the `relation` indicates whether the number is accurate (in which case the relation is equals to `eq`) or a lower bound of the total (in which case it is equals to `gte`). This change also adds a parameter called `rest_total_hits_as_int` that can be used in the search APIs to opt out from this change (retrieve the total hits as a number in the rest response). Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain `hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a follow up (to allow numbers to be passed to `track_total_hits`). Relates #33028	2018-12-05 19:49:06 +01:00
Nick Knize	a5e1f4d3a2	Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224 )	2018-11-06 11:55:23 +01:00
Alan Woodward	f243d75f59	Remove special-casing of Synonym filters in AnalysisRegistry (#34034 ) The synonym filters no longer need access to the AnalysisRegistry in their constructors, so we can remove the special-case code and move them to the common analysis module. This commit means that synonyms are no longer available for `server` integration tests, so several of these are either rewritten or migrated to the common analysis module as rest-spec-api tests	2018-09-28 09:02:47 +01:00
markharwood	27ee5e87e4	Fix for test version following backport of annotated_text plugin (#33979 )	2018-09-24 14:19:17 +01:00
Christoph Büscher	b654d986d7	Add OneStatementPerLineCheck to Checkstyle rules (#33682 ) This change adds the OneStatementPerLineCheck to our checkstyle precommit checks. This rule restricts the number of statements per line to one. The resoning behind this is that it is very difficult to read multiple statements on one line. People seem to mostly use it in short lambdas and switch statements in our code base, but just going through the changes already uncovered some actual problems in randomization in test code, so I think its worth it.	2018-09-21 11:52:31 +02:00
markharwood	2fa09f062e	New plugin - Annotated_text field type (#30364 ) New plugin for annotated_text field type. Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text. The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens. The annotation token values are injected unchanged alongside the regular text tokens to provide a form of additional indexed overlay useful in positional searches and highlighting. Annotated_text fields do not support fielddata as we want to phase this out. Also includes a new "annotated" highlighter type that retains annotations and merges in search hits as additional annotation markup. Closes #29467	2018-09-18 10:25:27 +01:00

32 Commits