SynonymFilters produces token streams with stacked tokens such that
conjunction queries need to be parsed in a special way such that the
stacked tokens are added as an innner disjuncition.
Closes#3881
The array holding the payloads (TermVectorFields.payloads) is reused for each token. If the
previous token had payloads but the current token had not, then the payloads of the previous
token were returned, because the payloads of the previous token were never invalidated.
For example, for a field only contained two tokens each occurring once, the first having a
payload and the second not, then for the second token, the payload of the first was returned.
closes#3873
Extended the specialized simplified mapping source method to support metadata mapping fields. These fields can just specified as normal fields, but will automatically be placed as top level mapping field.
Closes#3818
Return a merge_id element in each segment of the segments API, allowing to group segments that are being merged as part of a single merge and indicate which ones are being merged now.
closes#3904
Some tests use AbstractIntegrationTest#indexRandom which sometimes uses async
indexing. This can easily run into queue size based rejections on a slow
box. In that case we should retry blocked indexing.
Now that we properly fixed the ability to set the queue size on the index / bulk thread pool, we should actually set them to a somehow reasonable value to protect from users potentially overflowing our system.
I suggest defaults to be 50 for bulk, and 200 for indexing.
Also, set the thread pool for get, which we should set (in a similar value to a "read" queue size we have today).
closes#3888
We want to make sure that Plugin Manager still downloading plugins from internet.
New tests requires internet access (`@Network` annotation has been added).
By default, tests annotated with `@Network` are not launched.
If you need to run these tests, use `-Dtests.network=true` option.
Closes#3894.
This commit adds support for random merge policies set for every
index created in an AbstractIntegrationTest. It will either set
'logbyte', 'logdoc' or 'tiered' merge policy as well as a random
value for compound files.
Added checks to IndexRequest#source(Object...) to ensure and even number
of parameters. This method otherwise throws an AIOOBException which is
confusing to users and doesn't report the root cause of the problem.
This commit allows for using Lucene doc values as a backend for field data,
moving the cost of building field data from the refresh operation to indexing.
In addition, Lucene doc values can be stored on disk (partially, or even
entirely), so that memory management is done at the operating system level
(file-system cache) instead of the JVM, avoiding long pauses during major
collections due to large heaps.
So far doc values are supported on numeric types and non-analyzed strings
(index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values
which is the only type to support multi-valued fields. Since the field data API
set is a bit wider than the doc values API set, some operations are not
supported:
- field data filtering: this will fail if doc values are enabled,
- field data cache clearing, even for memory-based doc values formats,
- getting the memory usage for a specific field,
- knowing whether a field is actually multi-valued.
This commit also allows for configuring doc-values formats on a per-field basis
similarly to postings formats. In particular the doc values format of the
_version field can be configured through its own field mapper (it used to be
handled in UidFieldMapper previously).
Closes#3806
Migrate SearchContext.Rewrite to Releasable. All the parent child queries are now implemented as Lucene queries and the state is kept in the Weight.
Completely disable caching for `has_child` and `has_parent` filters, this has never worked and it also can also never work. The matching docIds are cached per segment while the collection of parent ids is top level.
Closes#3822