When users send large `terms` query to Elasticsearch, every value is stored in
an object. This change does not reduce the amount of created objects, but makes
sure these objects die young by optimizing the list storage in case all values
are either non-null instances of Long objects or BytesRef objects, which seems
to help the JVM significantly.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
These tests using ping timeouts on the order of seconds, but this is
unnecessary since all the sockets are within the same JVM it really
should not take that long.
Relates #21874
In some cases, such as the creation of DiscoveryNode instances for unicast ping requests, the
host information was not being populated properly and instead the address string was being used.
Additionally, when serializing a DiscoveryNode and in turn a transport address, the host was not
being set on the InetAddress when deserializing the object, so even if the address was created
from a hostname, the address in the deserialized instance had no knowledge of the hostname that
was originally used.
SearchTemplateRequest to implement CompositeIndicesRequest
Given that SearchTemplateRequest effectively delegates to search when a search is being executed, it should implement the CompositeIndicesRequest interface. The subrequests method should return a single search request. When a search is not going to be executed, because we are in simulate mode, there are no inner requests, and there are no corresponding indices to that request either.
Closes#21747
These query names were all deprecated in 5.0.0:
- in is removed in favour of terms
- geo_bbox is removed in favour of geo_bounding_box
- mlt is removed in favour of more_like_this
- fuzzy_match and match_fuzzy are removed in favour of match
Set lucene version to 6.4.0-snapshot-ec38570 and update all the sha1s/license
Fix invalid combo after upgrade in query_string query. split_on_whitespace=false is disallowed if auto_generate_phrase_queries=true
Adapt the expectations of some tests to the new format of the Lucene explain output
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
The Transport#connectToNodeLight concepts is confusing and not very flexible.
neither really testable on a unittest level. This commit cleans up the code used
to connect to nodes and simplifies transport implementations to share more code.
This also allows to connect to nodes with custom profiles if needed, for instance
future improvements can be added to connect to/from nodes that are non-data nodes without
dedicated bulks and recovery connections.
This commit improves the decision explanation messages,
particularly for NO decisions, in the various AllocationDecider
implementations by including the setting(s) in the explanation
message that led to the decision.
This commit also returns a THROTTLE decision instead of a NO
decision when the concurrent rebalances limit has been reached
in ConcurrentRebalanceAllocationDecider, because it more accurately
reflects a temporary throttling that will turn into a YES decision
once the number of concurrent rebalances lessens, as opposed to a
more permanent NO decision (e.g. due to filtering).
Integrate the patch from LUCENE-6664 into elasticsearch and
add support for handling a graph token stream in match/multi-match
queries.
This fixes longstanding bugs with multi-token synonyms returning
incorrect results with proximity queries.
This is a cleanup of the fix pushed in https://github.com/elastic/elasticsearch/pull/20400.
FiltersFunctionScoreQuery sub query should be extracted in CustomQueryScorer.extract (and not in CustomQueryScorer.extractUnknownQuery).
This does not fix any bug in this branch (it's just a cleanup) but the intent is first to clean up and then to backport in 2.x where there is a real bug.
The bug is in 2.x only because the backport of https://github.com/elastic/elasticsearch/pull/20400 in 2.x mistakenly renamed the FiltersFunctionScoreQuery to FunctionScoreQuery.
This leads to incorrect highlighting on FiltersFunctionScoreQuery in 2.x.
When a file script fails to compile, rather than logging the
exception that caused the failure this logs the xcontent of
that exception. This is both shorter and has the script stack
which is useful for figuring out why the compilation failed.
Still logs the entire stacktrace at debug level just in case
you need it.
Relates to #21733
In making changes for the 5.0 version of snapshots, a bug was
introduced where if an index-N file could not be found for an
individual shard, the backup was to iterate over all snap-*.dat
files in the shard folder to know which snapshots contain that
shard's data, but in 5.0, reading the snap-*.dat files as backup
was incorrectly passing in the blob name for the snap-*.dat file,
thereby failing to load all index files for a given snapshot
when the index-N file is missing. This condition should be rare
as there is no reason an index-N file should be absent (unless
it was deleted or there was corruption reading the file), but
nevertheless, this situation can be encountered and this commit
fixes the bug by reading the correct snap-*.dat blob name in the
shard data folder.
TransportAddress used to be customizable per transport but this has been removed
a while ago. Therefore we can remove all usage of this method as well.
Relates to #20695
* Fix cross_fields type on multi_match query with synonyms
This change fixes the cross_fields type of the multi_match query when synonyms are involved.
Since 2.x the Lucene query parser creates SynonymQuery for words that appear at the same position.
For simple term query the CrossFieldsQueryBuilder expands the term to all requested fields and creates a BlendedTermQuery.
This change adds the same mechanism for SynonymQuery which otherwise are not expanded to all requested fields.
As a side note I wonder if we should not replace the BlendedTermQuery with the SynonymQuery. They have the same purpose and behave similarly.
Fixes#21633
* Fallback to SynonymQuery for blended terms on a single field
The disruption type LongGCDisruption simulates GCs on a node by suspending all the threads of that node. If the suspended threads are in a code section with shared JVM locks, however, it can prevent the other nodes from doing their thing. The class LongGCDisruption has a list of class names for which we know that this can occur. Whenever a test using the GC disruption type fails in mysterious ways, it becomes a long guessing game to find the offending class. This commit adds code to LongGCDisruption to automatically detect these situations, fail the test early and report the offending class and all relevant context.
This commit removes the unused AllocationExplanation class. The
RoutingAllocation class only created an empty instance of it and
never used it anywhere else. The allocation explanations will be
encompassed in the various decision classes exposed via the cluster
allocation explain API. Therefore, there is no reason to keep the
AllocationExplanation class.
With #21738 we added an indices section to the search shards api, that will return the concrete indices hit by the request, and eventually the corresponding alias filter.
The java API returns the AliasFilter object, which holds the filter itself and an array of aliases that pointed to the index in the original request. The REST layer doesn't print out the aliases array though. This commit adds the aliases array as well and tests for this.
Group, List and Affix settings generate a bogus diff that turns the actual
diff into a string containing a json structure for instance:
```
"action" : {
"search" : {
"remote" : {
"" : "{\"my_remote_cluster\":\"[::1]:60378\"}"
}
}
}
```
which make reading the setting impossible. This happens for instance
if a group or affix setting is rendered via `_cluster/settings?include_defaults=true`
This change fixes the issue as well as several minor issues with affix settings that
where not accepted as valid setting today.
Today if a comma-separated list is passed to action.auto_create_index
leading and trailing whitespaces are not trimmed but since the values are
index expressions whitespaces should be removed for convenience.
Closes#21449
* ClusterSearchShardsGroup to return ShardId rather than the int shard id
This allows more info to be retrieved, like the index uuid which is exposed through the ShardId object but was not available before
* Make ClusterSearchShardsResponse empty constructor public
This allows to receive such responses when sending ClusterSearchShardsRequests directly through TransportService (not using ClusterSearchShardsAction via Client), otherwise an empty response cannot be created unless the class that does it is in org.elasticsearch.action, admin.cluster.shards package
* adjust visibility of ClusterSearchShards members
The index uuid is unique across multiple clusters, while the index name is not. Using the index uuid to look up filters in the alias filters map is better and will be needed for multi cluster search.
This commit makes sure that there is only one instance of the two services rather than one per transport action that uses it.
Also, we take their initialization out of guice's hands by binding it to a specific instance. Otherwise those two objects would get created within a constructor that is called by guice. That may cause problem for instance when throwing an exception from such constructors as guice tries all over again to re-initialize objects and fills up logs with stacktraces.
* replace ShardRouting argument in AbstractSearchAsyncAction#onFirstPhaseResult with more contained String nodeId
There is no need to pass in ShardRouting if the only info read from it is the current node id, the shard id can be read directly from the ShardIterator that's already provided as an argument.
* avoid creating a new ShardId when creating a SearchShardTarget in SnapshotsService
Today when handling unreleased versions for backwards compatilibity
support, we scatted version constants across the code base and add some
asserts to support removing these constants when the version in question
is actually released. This commit improves this situation, enabling us
to just add a single unreleased version constant that can be renamed
when the version is actually released. This should make maintenance of
these versions simpler.
Relates #21760
ShardSearchRequest was previously taking in the whole ShardRouting as a constructor argument while it only needs the ShardsId, changed that to carry over only the needed bits.
* Transport client: Fix remove address to actually work
The removeTransportAddress method of TransportClient removes the address
from the list of nodes that the client pings to sniff for nodes.
However, it does not remove it from the list of existing connected
nodes. This means removing a node is not possible, as long as that node
is still up.
This change removes the node from the connected nodes list before
triggering sampling (ie sniffing). While the fix is simple, testing was
not because there were no existing tests for sniffing. This change also
modifies the mocks used by transport client unit tests in order to allow
mocking sniffing.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
The `error_trace` parameter turns on the `stack_trace` field
in errors which returns stack traces.
Removes documentation for `camelCase` because it hasn't worked
in a while....
Documents the internal parameters used to render stack traces as
internal only.
Closes#21708
This commit refactors the handling of bind permissions, which is in need
of a little cleanup. For example, in its current state, the code for
handling permissions for transport profiles is split across two
methods. This commit refactors this code hopefully making it easier to
work with in future changes. This change is mostly mechanical, no
functionality is changed.
Relates #21742
The test UnicastZenPing#testResolveTimeout chooses a random resolve
timeout between 1ms and 100ms. Close to the lower bound, this is far too
short and the test races against the concurrent resolves executing
before the timeout elapses. This commit increases the timeout to
something that is far less likely to race, yet will not slow the test
down since we are not doing resolves against a real DNS service anyway.
Note that we still want a short resolve timeout since we are testing
whether or not timeouts really work here (by latching one of the
resolves to respond slowly).
Add indices and filter information to search shards api output
The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias.
This is relevant since we have moved parsing of alias filters to the coordinating node.
Relates to #20916
Today there is no way to get notified if a node is disconnected. Client code
must poll the TransportClient constantly to detect that a node is not connected
anymore in order to react and add new nodes or notify altering etc. For instance
if a hostname gets resolved to an IP but that host is disconnected clients want
to reconnect by resolving the hostname again which is a common situation in cloud
environments.
Closes#21424
Today we eagerly resolve unicast hosts. This means that if DNS changes,
we will never find the host at the new address. Moreover, a single host
failng to resolve causes startup to abort. This commit introduces lazy
resolution of unicast hosts. If a DNS entry changes, there is an
opportunity for the host to be discovered. Note that under the Java
security manager, there is a default positive cache of infinity for
resolved hosts; this means that if a user does want to operate in an
environment where DNS can change, they must adjust
networkaddress.cache.ttl in their security policy. And if a host fails
to resolve, we warn log the hostname but continue pinging other
configured hosts.
When doing DNS resolutions for unicast hostnames, we wait until the DNS
lookups timeout. This appears to be forty-five seconds on modern JVMs,
and it is not configurable. If we do these serially, the cluster can be
blocked during ping for a lengthy period of time. This commit introduces
doing the DNS lookups in parallel, and adds a user-configurable timeout
for these lookups.
Relates #21630
PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.
This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
This commit clarifies the contract of Cache#computeIfAbsent so that an exception that occurs during the execution
of the loader is thrown to all callers. Prior to this commit, the first caller would get the ExecutionException
and other callers that called during the load execution would get null, which is confusing.
The `type` parameter has always been accepted by the search_shards api, probably to make the api and its urls the same as search. Truth is that the type never had any effect, it's been ignored from day one while accepting it may make users think that we actually do something with it.
This commit removes support for the type parameter from the REST layer and the Java API. Backwards compatibility is maintained on the transport layer though.
The new added serialization test also uncovered a bug in the java API where the `ClusterSearchShardsRequest` could be created with no arguments, but the indices were required to be not null otherwise the request couldn't be serialized as `writeTo` would throw NPE. Fixed by setting a default value (empty array) for indices.
When a fatal error tragically closes an index writer, such an error
never makes its way to the uncaught exception handler. This prevents the
node from being torn down if an out of memory error or other fatal error
is thrown in the Lucene layer. This commit ensures that such events
bubble their way up to the uncaught exception handler.
Relates #21721
The PR #21694 was initially planned to go into v6.0.0 and v5.1.0. Due to another PR relying on this one though for backport to v5.0.2, #21694 must go to v5.0.2
as well. As such, the initial backward compatibility rules established by the PR must be changed to include v5.0.2 and above.
As part of #20925 and #21341 we added an "all-fields" mode to the
`query_string` and `simple_query_string`. This would expand the query to
all fields and automatically set `lenient` to true.
However, we should still allow a user to override the `lenient` flag to
whichever value they desire, should they add it in the request. This
commit does that.
When Elasticsearch starts, we go through some initialization before we
install a security manager. Yet, the JVM makes internal policy decisions
on the basis of whether or not a security manager is present. This
commit installs a security manager immediately on startup so that the
JVM always thinks a security manager is present when making such policy
decisions.
Relates #21716
Today we read a vint from the stream to allocate the size of an array up-front
before we start reading the values. This can be dangerous if for instance we read
from a corrupted stream or if some manipulated bytes are send for instance from
an attacker or a fuzzer. In most of the cases we can apply some best effort and
validate the array size to be _sane_ by ensuring we can at read at least N bytes
where N is the expected size of the array.
* Add support for merging custom meta data in tribe node
Currently, when any underlying cluster has custom metadata
(via plugin), tribe node does not store custom meta data in its
cluster state. This is because the tribe node has no idea how to
select the appropriate custom metadata from one or many custom
metadata (corresponding to the number of underlying clusters).
This change adds an interface that custom metadata implementations
can extend to add support for merging mulitple custom metadata of
the same type for storing in the tribe state.
Relates to #20544
Supersedes #20791
* Simplify updating tribe state
* Add tests for merging multiple custom metadata types in tribe node
* cleanup merging custom md logic in tribe service
Today it's not possible to add exceptions to the serialization layer
without breaking BWC. This commit adds the ability to specify the Version
an exception was added that allows to fall back not NotSerializableExceptionWrapper
if the exception is not present in the streams version.
Relates to #21656
TransportSearchAction optimizes the search_type in certain cases, when for instance we are searching against a single shard, or when there is only a suggest section in the request. That optimization is wrapped in a try catch, and when an exception happens we log it and ignore it. This may be a leftover from the past though, as no exception is expected to be thrown in that code block, hence if there is any exception we are probably better off bubbling it up rather than ignoring it.
The `lookupPrototype` method is not used anywhere. Seems like we rather use its `lookupProrotypeSafe` variant (which also throws exception if the prototype is not found) is always. This commit makes the safer variant the default one, by renaming it to "lookupPrototype" and removes the previous "unsafe" variant.
Today we call `writeByte` up to 3x per character in each string written via
`StreamOutput#writeString` this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via `writeBytes` instead reducing the overhead of this opertion.
Closes#21660
The overflows were happening in two places, the parsing of the template that
implicitly truncates the `order` when its value does not fall into the `integer`
range, and the comparator that sorts templates in ascending order, since it
returns `order2-order1`, which might overflow.
Closes#21622
* Fix highlighting on a stored keyword field
The highlighter converts stored keyword fields using toString().
Since the keyword fields are stored as utf8 bytes the conversion is broken.
This change uses BytesRef.utf8toString() to convert the field value in a valid string.
Fixes#21636
* Replace BytesRef#utf8ToString with MappedFieldType#valueForDisplay
If the node name is explicitly set it's not derived from the node ID
meaning that it doesn't immediately appear in the logs. While it can be
tracked down in other places, it would be easier for info purposes if it
just showed up explicitly. This commit adds the node ID to the logs,
whether or not the node name is set.
Relates #21673
Plugins are closed if they implement java.io.Closeable but this is not
clear from the plugin interface. This commit clarifies this by declaring
that Plugins implement java.io.Closeable and adding an empty
implementation to the base Plugin class.
Relates #21669
AbstractScopedSettings has the ability to only apply updates/deletes
to dynamic settings. The flag is currently not respected when a setting
is reset/deleted which causes static node settings to be reset if a non-dynamic
key is reset via `null` value.
Closes#21593
This commit moves several allocation decider related inner classes
into their own top-level class, in order to use more easily in
the allocation explain API. This commit also renames some of those
decision related classes to more suitable names.
This is simply a cosmetic change - no functionality changes with this
commit whatsoever.
To summarize the changes:
1. ShardAllocationDecision renamed to AllocateUnassignedDecision
2. RelocationDecision moved to a top-level class
3. MoveDecision moved to a top-level class
4. RebalanceDecision moved to a top-level class
5. ShardAllocationDecisionTests renamed to AllocateUnassignedDecisionTests
6. NodeRebalanceResult moved to a top-level class
7. ShardAllocationDecision#WeightedDecision moved to a top-level class and renamed to NodeAllocationResult.
The "test" task can complete its execution with a timeout exception before the "block-task" actually starts executing. The test thus has to wait for both to be
completed before checking that the updateTasksPerExecutor map has been properly cleaned up.
According to the docs and our own tests we accept an ids query without specified
types and default to all types in the index mapping in this case. This changes
the builder to reflect this by making the types no longer a required constructor
argument and changes the parser to reflect that.
A first step moving away from the current parsing to use the generalized
Objectparser and ConstructingObjectParser. This PR start by making use of it in
MatchAllQueryBuilder and IdsQueryBuilder.
* Fix compilation in Eclipse
I'm not sure what the bug is, but ecj doesn't like this expression
unless the type is set explicitly.
* Add comment explaining why no diamond operator
This commit exposes the executor service interface from thread
pool. This will enable some high-level concurrency primitives that will
make some code cleaner and simpler.
Relates #21608
The default search timeout is not respected because the timeout is
unconditionally set from the query. This commit fixes this issue, and
adds a test that the default search timeout is correctly attached to the
search context.
Relates #21599
* master: (22 commits)
Add proper toString() method to UpdateTask (#21582)
Fix `InternalEngine#isThrottled` to not always return `false`. (#21592)
add `ignore_missing` option to SplitProcessor (#20982)
fix trace_match behavior for when there is only one grok pattern (#21413)
Remove dead code from GetResponse.java
Fixes date range query using epoch with timezone (#21542)
Do not cache term queries. (#21566)
Updated dynamic mapper section
Docs: Clarify date_histogram bucket sizes for DST time zones
Handle release of 5.0.1
Fix skip reason for stats API parameters test
Reduce skip version for stats API parameter tests
Strict level parsing for indices stats
Remove cluster update task when task times out (#21578)
[DOCS] Mention "all-fields" mode doesn't search across nested documents
InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
Fixed bad asciidoc in boolean mapping docs
Fixed bad asciidoc ID in node stats
Be strict when parsing values searching for booleans (#21555)
Fix time zone rounding edge case for DST overlaps
...
This change fixes the rnage query so that an exception is always thrown if the range query uses epoch time together with a time zone. Since epoch time is always UTC it should not be used with a time zone.
Closes#21501
There have been reports that the query cache did not manage to speed up search
requests when the query includes a large number of different sub queries since
a single request may manage to exhaust the whole history (256 queries) while
the query cache only starts caching queries once they appear multiple times in
the history (#16031). On the other hand, increasing the size of the query cache
is a bit controversial (#20116) so this pull request proposes a different
approach that consists of never caching term queries, and not adding them to the
history of queries either. The reasoning is that these queries should be fast
anyway, regardless of caching, so taking them out of the equation should not
cause any slow down. On the other hand, the fact that they are not added to the
cache history anymore means that other queries have greater chances of being
cached.
Adds a version constant for it, bwc indices, and a vagrant upgrade-from
version. Also bumps the "upgrade from" version for the backwards-5.0
test and adds `skip`s for tests that don't fail against 5.0 so we skip
them during the backwards testing.
Finally, this skips the "Shrink index via API" test because it fails
consistently for me. Inconsistently for CI, but consistently for me.
I'll work on making it consistent tomorrow.
A previous commit added strict level parsing for the node stats API, but
that commit missed adding the same for the indices stats API. This
commit rectifies this miss.
Relates #21577
Fixes an issue where the cluster service does not remove an update task from its internal data structures that are used for batching cluster state updates.
* review comments
* checkstyle
This changes only the query parsing behavior to be strict when searching on
boolean values. We continue to accept the variety of values during index time,
but searches will only be parsed using `"true"` or `"false"`.
Resolves#21545
When using TimeUnitRounding with a DAY_OF_MONTH unit, failing tests in #20833
uncovered an issue when the DST shift happenes just one hour after midnight
local time and sets back the clock to midnight, leading to an overlap.
Previously this would lead to two different rounding values, depending on
whether a date before or after the transition was rounded. This change detects
this special case and correct for it by using the previous rounding date for
both cases.
Closes#20833