The plain highligher fails when it tries to select the fragments based on a query containing either a `has_child` or `has_parent` query.
The plain highligher should just ignore parent/child queries as it makes no sense to highligh a parent match with a has_child as the child documents are not available at highlight time. Instead if child document should be highlighed inner hits should be used.
Parent/child queries already have no effect when the `fvh` or `postings` highligher is used. The test added in this commit verifies that.
Closes#14999
[DOCS] add java REST client docs
Add some docs on how to get started with the Java REST client, some common configuration that may be needed and the sniffer component.
Currently when the `fields` parameter used in a `multi_match` query contains a
wildcard expression that doesn't resolve to any field name in the target index,
MultiMatchQueryBuilder produces a `null` query. This change changes it to be a
MatchNoDocs query, since returning no documents for this case is already the
current behaviour. Also adding missing field names (with and without wildcards)
to the unit and integration test.
This change adds a second ParseField for the `aggs` field in the search
request so both `aggregations` and `aggs` are undeprecated allowed
fields in the search request
Closes#19504
The current heuristic to compute a default shard size is pretty aggressive,
it returns `max(10, number_of_shards * size)` as a value for the shard size.
I think making it less aggressive has the benefit that it would reduce the
likelyness of running into OOME when there are many shards (yearly
aggregations with time-based indices can make numbers of shards in the
thousands) and make the use of breadth-first more likely/efficient.
This commit replaces the heuristic with `size * 1.5 + 10`, which is enough
to have good accuracy on zipfian distributions.
upgrades if it determines the read data is in the legacy
format. It writes the upgraded version if it is not a
read-only repository and caches the repository data if
it is a read-only repository.
Add an assertion to the most popular way of turning the response object
into the actual http response. As it stands all places we return
`201 CREATED` we return the `Location` header. This will help to keep it
that way, though it won't catch all uses.
Followup to #19509
This removes two packages, consolidating them into their parent package
and adds `package-info.java` files to describe all of the packages under
`org.elasticsearch.test.rest`.
Consuming the response body to make it part of the exception message means that it may not be readable anymore later, depending on whether the entity is repeatable or not. Turns out that the response body tells a lot about the error itself, and considering that we don't expect bodies to be incredibly big for errors, we can wrap the entity into a BufferedHttpEntity to make it repeatable.
Closes#19622
When you simulate a pipeline without specifying an id against a node where the request is redirected to a master node,
the request and the response is throwing a NPE:
```
java.lang.NullPointerException
at __randomizedtesting.SeedInfo.seed([3B9536AC6AA23C06:DD62280CF765DA1F]:0)
at org.elasticsearch.common.io.stream.StreamOutput.writeString(StreamOutput.java:300)
at org.elasticsearch.action.ingest.SimulatePipelineRequest.writeTo(SimulatePipelineRequest.java:92)
at org.elasticsearch.transport.local.LocalTransport.sendRequest(LocalTransport.java:222)
at org.elasticsearch.test.transport.AssertingLocalTransport.sendRequest(AssertingLocalTransport.java:95)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:470)
at org.elasticsearch.action.TransportActionNodeProxy.execute(TransportActionNodeProxy.java:51)
at org.elasticsearch.client.transport.support.TransportProxyClient.lambda$execute$441(TransportProxyClient.java:63)
at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:233)
at org.elasticsearch.client.transport.support.TransportProxyClient.execute(TransportProxyClient.java:63)
at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:309)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:67)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:403)
at org.elasticsearch.client.support.AbstractClient$ClusterAdmin.execute(AbstractClient.java:710)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:80)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:54)
at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:62)
at org.elasticsearch.ingest.bano.BanoProcessorIntegrationTest.testSimulateProcessorConfigTarget(BanoProcessorIntegrationTest.java:139)
```
This patch fixes this and adds some random tests.
* fix explain in function_score if no function filter matches
When each function in function_score has a filter but none of them matches
we always assume 1 for the combined functions and then combine that with the
sub query score.
But the explanation did not reflect that because in case no function matched
we did not even use the actual score that was computed in the explanation.
* Update gateway.asciidoc
Added a note to clarify that, in cases where nodes in a cluster have different setting, the node that is the elected master takes precedence over anything else.
* Update gateway.asciidoc
Updated as per @bleskes's comments
The reason for this change is that currently if a user specifies e.g.`2M`
meaning 2 months as a time value instead of throwing an exception
explaining that time units in months are not supported (due to months
having variable time spans) we instead will parse this to 2 minutes.
This could be surprising to a user and could mean put a lot of load on
the cluster performing a task that was never intended and whose results
will be useless anyway.
It is generally accepted that `m` indicates minutes and `M` indicates
months with time values so this is consistent with the expectations a
user might have around specifying time units.
A concrete example of where this causes issues is in the decay score
function which uses TimeValue to parse the scale and offset parameters
of the decay into millisecond values to use in the calculation.
Relates to #19619
Currently to use the ResourceWatcherService to watch files, you
implement a FileChangesListener. However, this is a class, not an
interface, even though it has no base state or anything like that, just
defining a few methods. This change converts FileChangesListener to an
interface.
The tests for authentication extend ESIntegTestCase and use a mock
authentication plugin. This way the clients don't have to worry about
running it. Sadly, that means we don't really have good coverage on the
REST portion of the authentication.
This also adds ElasticsearchStatusException, and exception on which you
can set an explicit status. The nice thing about it is that you can
set the RestStatus that it returns to whatever arbitrary status you like
based on the status that comes back from the remote system.
reindex-from-remote then uses it to wrap all remote failures, preserving
the status from the remote Elasticsearch or whatever proxy is between us
and the remove Elasticsearch.
This adds a second query evaluation metric alongside precision_at. Reciprocal
Rank is defined as 1/rank, where rank is the position of the first relevant
document in the search result. The results are averaged across all queries
across the sample of queries, according to
https://en.wikipedia.org/wiki/Mean_reciprocal_rank