Search section supports an ext section that is used to provide additional config needed from plugins. It is now tied to sub fetch phases because it is the only section that may need additional config, but there is no reason for the two to be tightly coupled.
It is now possible to register a searchExtParser independently from a sub fetch phase. All a search ext parser does is parsing some ext section of a search request, whose parsed resulting object is stored in the search context for later retrieval.
The context was an object where the parsed info are stored. That is more of what we call the builder since after the search refactoring. No need for generics in FetchSubPhaseParser then. Also the previous setHitsExecutionNeeded wasn't useful, it can be removed as well, given that once there is a parsed ext section, it will become a builder that can be retrieved by the sub fetch phase. The sub fetch phase is responsible for doing nothing in case the builder is not set, meaning that the fetch sub phase is plugged in but the request didn't have the corresponding section.
SearchParseElement is renamed to FetchSubPhaseParser and moved to the search.fetch package. Its parse method doesn't get the SearchContext as argument anymore, only the XContentParser, and the return type is what gets parsed (the fetch sub phase context which we may as well rename later).
It is the parser that initializes the FetchSubPhaseContext then. SearchService retrieves the parser by name, calls parse against it and stores the result of parsing by name. No need for FetchSubPhase.ContextFactory anymore, which can be removed.
Given that doc value fields is our own fetch sub phase, it doesn't need to be implemented like if it was plugged in from the outside. It doesn't need its own fetch sub phase context, but it can just be an instance member in SearchContext
Previously we would disable console logging in certain circumstances
(for example, if Elasticsearch is not in the foreground, or if
Elasticsearch is in the foreground but an exception was thrown during
bootstrap). This commit makes this handling work with Log4j 2. This will
prevent users from seeing double bootstrap check failure messages.
Relates #20387
By default, when an exception causes the JVM to terminate, the stack
trace is printed. In the case of failing bootstrap checks, this stack
trace is useless to the user, and might even distract them from seeing
that the bootstrap checks failed for reasons under their control. With
this commit, we cause the stack trace for a failing bootstrap check to
be truncated.
We also modify some methods to not declare that they throw the top level
checked exception type Exception, but instead explicitly declare the
exceptions that they throw. These exceptions are caught and wrapped in a
BootstrapException so that we can percolate only two exception types out
of Bootstrap#init as checked exception, BootstrapException and
NodeValidationException.
Relates #19989
This commit cleans most of the methods of XContentBuilder so that:
- Jackson's convenience methods are used instead of our custom ones (ie field(String,long) now uses Jackson's writeNumberField(String, long) instead of calling writeField(String) then writeNumber(long))
- null checks are added for all field names and values
- methods are grouped by type in the class source
- methods have the same parameters names
- duplicated methods like field(String, String...) and array(String, String...) are removed
- varargs methods now have the "array" name to reflect that it builds arrays
- unused methods like field(String,BigDecimal) are removed
- all methods now follow the execution path: field(String,?) -> field(String) then value(?), and value(?) -> writeSomething() method. Methods to build arrays also follow the same execution path.
Exposing lucene 6.x minhash tokenfilter
Generate min hash tokens from an incoming stream of tokens that can
be used to estimate document similarity.
Closes#20149
Jython shades `jansi` into it's classpath without changing it's package or
anything like that. This causes attempts to load native code on windows which
blows up tests. This change adds `log4j.skipJansi=true` system property to our
tests as well as to the JVM properties we set.
The BackgroundIndexer now uses auto-generated IDs randomly. This causes some problems
for tests that still rely on the fact that the IDs are increasing integers. This change
exposes all IDs via a Set<String> to iterate over for tests.
This commit configures test logging for Log4j 2. The default logger
configuration uses the console appender but at the error level, so most
tests are missing logging. Instead, this commit provides a configuration
for tests which is picked up from the classpath by Log4j 2 when it
initializes. However, this now means that we can no longer initialize
Log4j with a bare-bones configuration when tests run as doing so will
prevent Log4j 2 from attempting to configure logging via the
classpath. Consequently, we move this needed initialization (as
commented, to avoid a message about a status logger not being configured
when we are preparing to configure Log4j from properties files in the
config directory) to only run when we are explicitly configuring Log4j
from properties files.
Relates #20284
If elasticsearch controls the ID values as well as the documents
version we can optimize the code that adds / appends the documents
to the index. Essentially we an skip the version lookup for all
documents unless the same document is delivered more than once.
On the lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument but on the Engine level we need to ensure that we deoptimize
the case once we see the same document more than once.
This is done as follows:
1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can be even the
machine local time (see why later). The important part is that retry
requests will have the same value as the original one.
2. In the engine we make sure we keep the highest seen time stamp of "retry" requests.
This is updated while the retry request has its doc id lock. Call this `maxUnsafeAutoIdTimestamp`
3. When the engine runs an "optimized" request comes, it compares it's timestamp with the
current `maxUnsafeAutoIdTimestamp` (but doesn't update it). If the the request
timestamp is higher it is safe to execute it as optimized (no retry request with the same
timestamp has been run before). If not we fall back to "non-optimzed" mode and run the request as a retry one
and update the `maxUnsafeAutoIdTimestamp` unless it's been updated already to a higher value
Relates to #19813
* master:
Avoid NPE in LoggingListener
Randomly use Netty 3 plugin in some tests
Skip smoke test client on JDK 9
Revert "Don't allow XContentBuilder#writeValue(TimeValue)"
[docs] Remove coming in 2.0.0
Don't allow XContentBuilder#writeValue(TimeValue)
[doc] Remove leftover from CONSOLE conversion
Parameter improvements to Cluster Health API wait for shards (#20223)
Add 2.4.0 to packaging tests list
Docs: clarify scale is applied at origin+offest (#20242)
* Params improvements to Cluster Health API wait for shards
Previously, the cluster health API used a strictly numeric value
for `wait_for_active_shards`. However, with the introduction of
ActiveShardCount and the removal of write consistency level for
replication operations, `wait_for_active_shards` is used for
write operations to represent values for ActiveShardCount. This
commit moves the cluster health API's usage of `wait_for_active_shards`
to be consistent with its usage in the write operation APIs.
This commit also changes `wait_for_relocating_shards` from a
numeric value to a simple boolean value `wait_for_no_relocating_shards`
to set whether the cluster health operation should wait for
all relocating shards to complete relocation.
* Addresses code review comments
* Don't be lenient if `wait_for_relocating_shards` is set
* master:
Increase visibility of deprecation logger
Skip transport client plugin installed on JDK 9
Explicitly disable Netty key set replacement
percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
Make it possible for Ingest Processors to access AnalysisRegistry
Allow RestClient to send array-based headers
Silence rest util tests until the bogusness can be simplified
Remove unknown HttpContext-based test as it fails unpredictably on different JVMs
Tests: Improve rest suite names and generated test names for docs tests
Add support for a RestClient base path
This commit adds an empty test to ESLoggerUsageTests to avoid the test
suite from failing for having no tests after the existing tests were
marked as awaits fix in 1d197eddcc.
This commit modifies the call sites that allocate a parameterized
message to use a supplier so that allocations are avoided unless the log
level is fine enough to emit the corresponding log message.
Rest test suites are currently only the directory above the yaml test
file. That is confusing when there are more than one directory level
which contain yaml tests, as there are in generated docs tests. This
change makes rest tests use the full relative path to the rest test root
as the suite name, and also makes the test names for docs tests a little
clearer (that they are testing an example from a specific line number,
instead of just the line number as an opaque test name).
Removed null check for token, if we are outside the null it already means it is null.
Fixed typo in comment and remove leftover assignment to unused local variable.
This test is periodically failing. As I suspect that the GCDisruption scheme is somehow making the wrong node block on
its cluster state update thread, I've added some more logging and a thread dump once the given assertion triggers
again.
Adds an explicit recoverySource field to ShardRouting that characterizes the type of recovery to perform:
- fresh empty shard copy
- existing local shard copy
- recover from peer (primary)
- recover from snapshot
- recover from other local shards on same node (shrink index action)
Objects hierarchy must be tracked when entering/leaving an object so that it better knows if the "newField" has been inserted into an arbitrary holding object.
Can be reproduced with gradle :core:test -Dtests.seed=760F8BD0F7E46D45 -Dtests.class=org.elasticsearch.index.query.MoreLikeThisQueryBuilderTests -Dtests.method="testUnknownObjectException" -Dtests.security.manager=true -Dtests.locale=ko -Dtests.timezone=Etc/Zulu
When need to check the whole hierarchy of objects to know if the newly inserted "newField" object is part of an arbitrary holding object or not.
Reproduced with `gradle :modules:percolator:test -Dtests.seed=736B0B67DA7A3632 -Dtests.class=org.elasticsearch.percolator.PercolateQueryBuilderTests -Dtests.method="testUnknownObjectException" -Dtests.security.manager=true -Dtests.locale=es-ES -Dtests.timezone=ART`
This method fails when a randomized string value contains a double-quote. This commit changes the method so that it is not based on string concatenation anymore. It now use XContentGenerator & XContentParser to mutate the valid queries.
Related #19864
This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation.
To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this:
````
POST _search
{
"stored_fields": "_none_"
}
````
Deprecates the optimize_bbox parameter on geodistance queries. This has no longer been needed since version 2.2 because lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
This change converts AllocationDecider registration from push based on
ClusterModule to implementing with a new ClusterPlugin interface.
AllocationDecider instances are allowed to use only Settings and
ClusterSettings.
Adds a class that records changes made to RoutingAllocation, so that at the end of the allocation round other values can be more easily derived based on these changes. Most notably, it:
- replaces the explicit boolean flag that is passed around everywhere to denote changes to the routing table. The boolean flag is automatically updated now when changes actually occur, preventing issues where it got out of sync with actual changes to the routing table.
- records actual changes made to RoutingNodes so that primary term and in-sync allocation ids, which are part of index metadata, can be efficiently updated just by looking at the shards that were actually changed.
In addition to be an allocation decider, DiskThresholdDecider also
monitors the used disk in order to trigger a reroute when the thresholds
are crossed. This change splits out the settings for disk thresholds
into DiskThresholdSettings, and moves the monitoring to a new
DiskThresholdMonitor. DiskThresholdDecider is then in line with other
allocation deciders, needing only Settings and ClusterSettings for
construction, which will allow deguicing allocation deciders.
`LobObtainFailedException` should be reserved for on-disk locks that
Lucene attempts (like `write.lock`). This switches our in-memory
semaphore locks for shards to use a different exception. Additionally,
ShardLockObtainFailedException no longer subclasses IOException, since
no IO is being done is this case.
Resolves#19978