It used to be a hybrid store between `niofs` and `mmapfs`, which we removed when
we switched to `fs` by default (which is `mmapfs` on 64-bits systems).
Currently, the `terms` query is just syctactic sugar for a `bool` query when
used in a query context. This change proposes to always generate the same query
in query and filter contexts, which is less confusing.
When users send large `terms` query to Elasticsearch, every value is stored in
an object. This change does not reduce the amount of created objects, but makes
sure these objects die young by optimizing the list storage in case all values
are either non-null instances of Long objects or BytesRef objects, which seems
to help the JVM significantly.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
These tests using ping timeouts on the order of seconds, but this is
unnecessary since all the sockets are within the same JVM it really
should not take that long.
Relates #21874
In some cases, such as the creation of DiscoveryNode instances for unicast ping requests, the
host information was not being populated properly and instead the address string was being used.
Additionally, when serializing a DiscoveryNode and in turn a transport address, the host was not
being set on the InetAddress when deserializing the object, so even if the address was created
from a hostname, the address in the deserialized instance had no knowledge of the hostname that
was originally used.
SearchTemplateRequest to implement CompositeIndicesRequest
Given that SearchTemplateRequest effectively delegates to search when a search is being executed, it should implement the CompositeIndicesRequest interface. The subrequests method should return a single search request. When a search is not going to be executed, because we are in simulate mode, there are no inner requests, and there are no corresponding indices to that request either.
Closes#21747
These query names were all deprecated in 5.0.0:
- in is removed in favour of terms
- geo_bbox is removed in favour of geo_bounding_box
- mlt is removed in favour of more_like_this
- fuzzy_match and match_fuzzy are removed in favour of match
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
The Transport#connectToNodeLight concepts is confusing and not very flexible.
neither really testable on a unittest level. This commit cleans up the code used
to connect to nodes and simplifies transport implementations to share more code.
This also allows to connect to nodes with custom profiles if needed, for instance
future improvements can be added to connect to/from nodes that are non-data nodes without
dedicated bulks and recovery connections.
This commit improves the decision explanation messages,
particularly for NO decisions, in the various AllocationDecider
implementations by including the setting(s) in the explanation
message that led to the decision.
This commit also returns a THROTTLE decision instead of a NO
decision when the concurrent rebalances limit has been reached
in ConcurrentRebalanceAllocationDecider, because it more accurately
reflects a temporary throttling that will turn into a YES decision
once the number of concurrent rebalances lessens, as opposed to a
more permanent NO decision (e.g. due to filtering).
Integrate the patch from LUCENE-6664 into elasticsearch and
add support for handling a graph token stream in match/multi-match
queries.
This fixes longstanding bugs with multi-token synonyms returning
incorrect results with proximity queries.
This is a cleanup of the fix pushed in https://github.com/elastic/elasticsearch/pull/20400.
FiltersFunctionScoreQuery sub query should be extracted in CustomQueryScorer.extract (and not in CustomQueryScorer.extractUnknownQuery).
This does not fix any bug in this branch (it's just a cleanup) but the intent is first to clean up and then to backport in 2.x where there is a real bug.
The bug is in 2.x only because the backport of https://github.com/elastic/elasticsearch/pull/20400 in 2.x mistakenly renamed the FiltersFunctionScoreQuery to FunctionScoreQuery.
This leads to incorrect highlighting on FiltersFunctionScoreQuery in 2.x.
When a file script fails to compile, rather than logging the
exception that caused the failure this logs the xcontent of
that exception. This is both shorter and has the script stack
which is useful for figuring out why the compilation failed.
Still logs the entire stacktrace at debug level just in case
you need it.
Relates to #21733
In making changes for the 5.0 version of snapshots, a bug was
introduced where if an index-N file could not be found for an
individual shard, the backup was to iterate over all snap-*.dat
files in the shard folder to know which snapshots contain that
shard's data, but in 5.0, reading the snap-*.dat files as backup
was incorrectly passing in the blob name for the snap-*.dat file,
thereby failing to load all index files for a given snapshot
when the index-N file is missing. This condition should be rare
as there is no reason an index-N file should be absent (unless
it was deleted or there was corruption reading the file), but
nevertheless, this situation can be encountered and this commit
fixes the bug by reading the correct snap-*.dat blob name in the
shard data folder.
TransportAddress used to be customizable per transport but this has been removed
a while ago. Therefore we can remove all usage of this method as well.
Relates to #20695
* Fix cross_fields type on multi_match query with synonyms
This change fixes the cross_fields type of the multi_match query when synonyms are involved.
Since 2.x the Lucene query parser creates SynonymQuery for words that appear at the same position.
For simple term query the CrossFieldsQueryBuilder expands the term to all requested fields and creates a BlendedTermQuery.
This change adds the same mechanism for SynonymQuery which otherwise are not expanded to all requested fields.
As a side note I wonder if we should not replace the BlendedTermQuery with the SynonymQuery. They have the same purpose and behave similarly.
Fixes#21633
* Fallback to SynonymQuery for blended terms on a single field
The disruption type LongGCDisruption simulates GCs on a node by suspending all the threads of that node. If the suspended threads are in a code section with shared JVM locks, however, it can prevent the other nodes from doing their thing. The class LongGCDisruption has a list of class names for which we know that this can occur. Whenever a test using the GC disruption type fails in mysterious ways, it becomes a long guessing game to find the offending class. This commit adds code to LongGCDisruption to automatically detect these situations, fail the test early and report the offending class and all relevant context.
This commit removes the unused AllocationExplanation class. The
RoutingAllocation class only created an empty instance of it and
never used it anywhere else. The allocation explanations will be
encompassed in the various decision classes exposed via the cluster
allocation explain API. Therefore, there is no reason to keep the
AllocationExplanation class.
With #21738 we added an indices section to the search shards api, that will return the concrete indices hit by the request, and eventually the corresponding alias filter.
The java API returns the AliasFilter object, which holds the filter itself and an array of aliases that pointed to the index in the original request. The REST layer doesn't print out the aliases array though. This commit adds the aliases array as well and tests for this.
Group, List and Affix settings generate a bogus diff that turns the actual
diff into a string containing a json structure for instance:
```
"action" : {
"search" : {
"remote" : {
"" : "{\"my_remote_cluster\":\"[::1]:60378\"}"
}
}
}
```
which make reading the setting impossible. This happens for instance
if a group or affix setting is rendered via `_cluster/settings?include_defaults=true`
This change fixes the issue as well as several minor issues with affix settings that
where not accepted as valid setting today.
Today if a comma-separated list is passed to action.auto_create_index
leading and trailing whitespaces are not trimmed but since the values are
index expressions whitespaces should be removed for convenience.
Closes#21449
* ClusterSearchShardsGroup to return ShardId rather than the int shard id
This allows more info to be retrieved, like the index uuid which is exposed through the ShardId object but was not available before
* Make ClusterSearchShardsResponse empty constructor public
This allows to receive such responses when sending ClusterSearchShardsRequests directly through TransportService (not using ClusterSearchShardsAction via Client), otherwise an empty response cannot be created unless the class that does it is in org.elasticsearch.action, admin.cluster.shards package
* adjust visibility of ClusterSearchShards members
The index uuid is unique across multiple clusters, while the index name is not. Using the index uuid to look up filters in the alias filters map is better and will be needed for multi cluster search.
This commit makes sure that there is only one instance of the two services rather than one per transport action that uses it.
Also, we take their initialization out of guice's hands by binding it to a specific instance. Otherwise those two objects would get created within a constructor that is called by guice. That may cause problem for instance when throwing an exception from such constructors as guice tries all over again to re-initialize objects and fills up logs with stacktraces.
* replace ShardRouting argument in AbstractSearchAsyncAction#onFirstPhaseResult with more contained String nodeId
There is no need to pass in ShardRouting if the only info read from it is the current node id, the shard id can be read directly from the ShardIterator that's already provided as an argument.
* avoid creating a new ShardId when creating a SearchShardTarget in SnapshotsService
Today when handling unreleased versions for backwards compatilibity
support, we scatted version constants across the code base and add some
asserts to support removing these constants when the version in question
is actually released. This commit improves this situation, enabling us
to just add a single unreleased version constant that can be renamed
when the version is actually released. This should make maintenance of
these versions simpler.
Relates #21760
ShardSearchRequest was previously taking in the whole ShardRouting as a constructor argument while it only needs the ShardsId, changed that to carry over only the needed bits.
* Transport client: Fix remove address to actually work
The removeTransportAddress method of TransportClient removes the address
from the list of nodes that the client pings to sniff for nodes.
However, it does not remove it from the list of existing connected
nodes. This means removing a node is not possible, as long as that node
is still up.
This change removes the node from the connected nodes list before
triggering sampling (ie sniffing). While the fix is simple, testing was
not because there were no existing tests for sniffing. This change also
modifies the mocks used by transport client unit tests in order to allow
mocking sniffing.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
The `error_trace` parameter turns on the `stack_trace` field
in errors which returns stack traces.
Removes documentation for `camelCase` because it hasn't worked
in a while....
Documents the internal parameters used to render stack traces as
internal only.
Closes#21708
This commit refactors the handling of bind permissions, which is in need
of a little cleanup. For example, in its current state, the code for
handling permissions for transport profiles is split across two
methods. This commit refactors this code hopefully making it easier to
work with in future changes. This change is mostly mechanical, no
functionality is changed.
Relates #21742
The test UnicastZenPing#testResolveTimeout chooses a random resolve
timeout between 1ms and 100ms. Close to the lower bound, this is far too
short and the test races against the concurrent resolves executing
before the timeout elapses. This commit increases the timeout to
something that is far less likely to race, yet will not slow the test
down since we are not doing resolves against a real DNS service anyway.
Note that we still want a short resolve timeout since we are testing
whether or not timeouts really work here (by latching one of the
resolves to respond slowly).
Add indices and filter information to search shards api output
The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias.
This is relevant since we have moved parsing of alias filters to the coordinating node.
Relates to #20916