While adding the new settings infrastructure, the option to specify the
size of the filter cache as a percentage of the heap size was accidentally
removed. This change adds that ability back.
In addition, the `Setting` class had multiple `.byteSizeSetting` methods,
all but one of which used `ByteSizeValue.parseBytesSizeValue` to parse
the value; the remaining method used `MemorySizeValue.parseBytesSizeValueOrHeapRatio`.
This was confusing, as the way the value was parsed depended on how many
arguments were provided.
This change makes all `Setting.byteSizeSetting` methods parse the value
the same way using `ByteSizeValue.parseBytesSizeValue` and adds
`Setting.memorySizeSetting` methods to parse settings that express memory
sizes (i.e. can be absolute bytes values or percentages). Relevant settings
have been moved to use these new methods.
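To make the distinction concrete, here is a minimal, self-contained sketch of the two parsing behaviours (illustrative code only, not the actual Elasticsearch `Setting` implementation; the helper names are made up):
```java
// Illustrative sketch only -- not the actual Elasticsearch implementation.
// It mimics the difference between a plain byte-size setting and a
// "memory size" setting that may also be given as a percentage of the heap.
public final class MemorySizeParsingSketch {

    /** Parses absolute values only, e.g. "512mb" or "1gb". */
    static long parseBytes(String value) {
        String v = value.trim().toLowerCase(java.util.Locale.ROOT);
        if (v.endsWith("gb")) return Long.parseLong(v.substring(0, v.length() - 2).trim()) * 1024 * 1024 * 1024;
        if (v.endsWith("mb")) return Long.parseLong(v.substring(0, v.length() - 2).trim()) * 1024 * 1024;
        if (v.endsWith("kb")) return Long.parseLong(v.substring(0, v.length() - 2).trim()) * 1024;
        if (v.endsWith("b"))  return Long.parseLong(v.substring(0, v.length() - 1).trim());
        throw new IllegalArgumentException("failed to parse byte size [" + value + "]");
    }

    /** Additionally accepts a percentage of the heap, e.g. "10%". */
    static long parseMemorySize(String value) {
        String v = value.trim();
        if (v.endsWith("%")) {
            double percent = Double.parseDouble(v.substring(0, v.length() - 1));
            return (long) (percent / 100.0 * Runtime.getRuntime().maxMemory());
        }
        return parseBytes(v);
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("512mb"));      // absolute value
        System.out.println(parseMemorySize("10%"));   // fraction of the current max heap
        System.out.println(parseMemorySize("512mb")); // absolute values still work
    }
}
```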
Closes #20330
Exposing the Lucene 6.x MinHash token filter
Generate min hash tokens from an incoming stream of tokens that can
be used to estimate document similarity.
Closes #20149
This changes DiskThresholdDecider to only factor in leaving shards when
checking if a shard can remain. Previously, leaving shards were factored
into both the `canAllocate` and `canRemain` checks; this change subtracts
the sizes of leaving shards only in the `canRemain` check.
It was possible for multiple shards relocating away from a node to
have their entire size subtracted, giving the node a chance to go over
the disk threshold (or fill the disk) because space was subtracted
that was still being used by other in-progress relocations.
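Roughly, the intended accounting is the following (a toy sketch with made-up names, not the actual DiskThresholdDecider code):
```java
// Toy illustration of the accounting change; names are hypothetical.
// Space freed by shards relocating away from a node only counts for the
// canRemain decision, not for canAllocate.
final class LeavingShardAccountingSketch {

    static boolean canAllocate(long freeBytes, long incomingShardSize, long thresholdBytes) {
        // Do NOT credit space from shards that are still relocating away:
        // their bytes remain on disk until the relocations complete.
        return freeBytes - incomingShardSize >= thresholdBytes;
    }

    static boolean canRemain(long freeBytes, long bytesRelocatingAway, long thresholdBytes) {
        // Only here do leaving shards count as space that will be reclaimed.
        return freeBytes + bytesRelocatingAway >= thresholdBytes;
    }
}
```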
** The default script language is now maintained in the `Script` class.
* Added a `script.legacy.default_lang` setting that controls the default language for scripts stored inside documents (for example percolator queries). This defaults to groovy.
** Added `QueryParseContext#getDefaultScriptLanguage()`, which manages the default scripting language. It always returns `painless`, unless a query/search request is loaded in legacy mode, in which case it returns what is configured in the `script.legacy.default_lang` setting.
** In the aggregation parsing code, added a `ParserContext` that also holds the default scripting language, like `QueryParseContext`. Most parsers don't have access to `QueryParseContext`; this is for scripts in aggregations.
* The `lang` script field is always serialized (toXContent).
Closes #20122
When Elasticsearch depended on Log4j 1, there was jar hell from the
log4j and the apache-log4j-extras jar. As these dependencies are gone,
the jar hell exemption for Log4j 1 can be removed.
Relates #20336
This commit expands on the message printed when config files are
preserved when removing a plugin to give the user an indication of the
reason the config files are preserved.
Replicated operations consist of a routing action (the original), which is in charge of sending the operation to the primary shard, a primary action, which executes the operation on the resolved primary, and replica actions, which perform the operation on a specific replica. This commit adds the targeted shard's allocation id to the primary and replica actions and makes sure that it matches the shard the actions end up executing on.
This helps prevent an extremely rare failure mode where a shard moves off a node and back to it, all between the time an action is sent and the time it is processed.
For example:
1) Primary action is sent to a relocating primary on node A.
2) The primary finishes relocating to node B and starts relocating back.
3) The relocation back reaches the phase where it opens the target engine on the original node, node A.
4) The primary action is executed on the target engine before the relocation finishes, at which point the shard copy on node B is still the official primary - i.e., it is executed on the wrong primary.
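The guard itself is conceptually simple; a hedged sketch (hypothetical names, not the actual TransportReplicationAction code) of the allocation id check:
```java
// Illustrative sketch only; names are hypothetical. The primary/replica
// request carries the allocation id of the shard copy it was routed to, and
// the receiving node verifies it still hosts that exact copy before executing.
final class AllocationIdCheckSketch {

    interface ShardCopy {
        String allocationId();
        void execute();
    }

    static void executeOnShard(String expectedAllocationId, ShardCopy localCopy) {
        if (expectedAllocationId.equals(localCopy.allocationId()) == false) {
            // The shard moved (possibly away and back) since the action was sent;
            // fail fast so the action is re-routed against the current cluster state.
            throw new IllegalStateException("expected shard with allocation id ["
                    + expectedAllocationId + "] but found [" + localCopy.allocationId() + "]");
        }
        localCopy.execute();
    }
}
```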
When removing a plugin with a config directory, we preserve the config
directory. This is because the workflow for upgrading a plugin involves
removing and then installing the plugin again and losing the plugin
config in this case would be terrible. This commit causes a message
regarding this to be printed in case the user wants to manually delete
these files.
This commit removes a line-length violation in RemovePluginCommand.java
and removes this file from the list of files for which the line-length
check is suppressed.
We have intentionally introduced leniency for ThrowableProxy from Log4j
to work around a bug there. Yet, a test for this introduced leniency was
not added. This commit introduces such a test.
Relates #20329
Previously we had an exemption for Joda-Time BaseDateTime because we
forked this class to remove the usage of a volatile field. This hack is
no longer in place, so the exemption is no longer necessary. This commit
removes that exemption.
Relates #20328
The BackgroundIndexer now randomly uses auto-generated IDs. This causes some problems
for tests that still rely on the fact that the IDs are increasing integers. This change
exposes all IDs via a Set<String> that tests can iterate over.
A warning was introduced if old log config files are present (e.g.,
logging.yml). However, this check is executed unconditionally. This can
lead to no such file exceptions when logging configs are not being
resolved, for example when installing a plugin. This commit moves this
check to only execute when logging configs are being resolved.
Some assertions in MaxMapCountCheckTests assert that certain messages
are logged. These assertions pass everywhere except Windows where the
JVM seems confused. The issue is not the javac compiler as the bytecode
produced on OS X and Windows is identical for the relevant classes so
this leaves a possible JVM bug. It is not worth investigating the
ultimate cause of this bug so instead this commit introduces a
workaround.
Log4j has a bug where it does not handle a security exception that can
be thrown when it is rendering a stack trace. This commit intentionally
introduces jar hell with the ThrowableProxy class to work around this
bug until a fix is released.
Relates #20306
This ensures we don't add documents more than once. It is mostly paranoia,
except for one case where a shard is relocated away from and back to the same node
while an initial request is in flight but has not yet finished AND is retried.
Yet, this is a possible case, and for that reason we ensure we pass on the
maxUnsafeAutoIdTimestamp when we prepare for translog recovery.
Relates to #20211
Currently it does not, because our parsers do not support big integers/decimals
(on purpose), but we do not have to ask our parser for the number type; we can
just ask the Jackson parser for a number representation of the value with the
right type.
Note that I did not add similar tests for big decimals because Jackson seems to
never return big decimals, even for decimal values that are out of the range of
values that can be represented by doubles.
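As a small illustration of the idea using Jackson directly (the real change goes through the XContent layer; this is just a standalone sketch), an integer that does not fit in a long comes back as a `BigInteger`:
```java
// Requires jackson-core on the classpath. Jackson picks a Number subtype
// that is wide enough for the value, so we can use its representation
// directly instead of asking our own parser for the number type.
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

public final class NumberRepresentationSketch {
    public static void main(String[] args) throws Exception {
        JsonParser parser = new JsonFactory().createParser("92233720368547758080"); // > Long.MAX_VALUE
        parser.nextToken();
        Number number = parser.getNumberValue();
        System.out.println(number.getClass().getSimpleName() + " = " + number); // BigInteger = ...
    }
}
```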
Closes #11508
This commit configures test logging for Log4j 2. The default logger
configuration uses the console appender but at the error level, so most
tests are missing logging. Instead, this commit provides a configuration
for tests which is picked up from the classpath by Log4j 2 when it
initializes. However, this now means that we can no longer initialize
Log4j with a bare-bones configuration when tests run as doing so will
prevent Log4j 2 from attempting to configure logging via the
classpath. Consequently, we move this needed initialization (as
commented, to avoid a message about a status logger not being configured
when we are preparing to configure Log4j from properties files in the
config directory) to only run when we are explicitly configuring Log4j
from properties files.
Relates #20284
Rather than checking that those values are greater than 0, we can sum up the values obtained from all nodes and check that what is returned is that same value.
The mem section in cluster stats was buggy and was removed. It is now added back with the same structure as in node stats, containing total memory, available memory, used memory and percentages. All of the values are sums across all the nodes in the cluster (or at least the ones we were able to get the values from).
If Elasticsearch controls the ID values as well as the document
versions, we can optimize the code that adds / appends the documents
to the index. Essentially, we can skip the version lookup for all
documents unless the same document is delivered more than once.
At the Lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument, but at the Engine level we need to ensure that we deoptimize
the case once we see the same document more than once.
This is done as follows:
1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can even be the
machine's local time (see why later). The important part is that retry
requests will have the same value as the original one.
2. In the engine we make sure we keep the highest seen timestamp of "retry" requests.
This is updated while the retry request has its doc id lock. Call this `maxUnsafeAutoIdTimestamp`.
3. When the engine receives an "optimized" request, it compares its timestamp with the
current `maxUnsafeAutoIdTimestamp` (but doesn't update it). If the request
timestamp is higher, it is safe to execute it as optimized (no retry request with the same
timestamp has been run before). If not, we fall back to "non-optimized" mode, run the request as a retry,
and update the `maxUnsafeAutoIdTimestamp` unless it has already been updated to a higher value.
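Put together, the check might look roughly like this (a much-simplified sketch with hypothetical names, not the actual Engine code):
```java
// Simplified sketch of the described check; field and method names are
// hypothetical. Retries force the safe (updateDocument) path and raise the
// watermark; optimized requests are allowed only above the watermark.
import java.util.concurrent.atomic.AtomicLong;

final class AutoIdOptimizationSketch {

    private final AtomicLong maxUnsafeAutoIdTimestamp = new AtomicLong(Long.MIN_VALUE);

    boolean canUseAddDocument(long requestAutoIdTimestamp, boolean isRetry) {
        if (isRetry) {
            // Remember the highest timestamp ever seen on a retry.
            maxUnsafeAutoIdTimestamp.accumulateAndGet(requestAutoIdTimestamp, Math::max);
            return false; // run as a non-optimized request
        }
        // Safe to use IndexWriter#addDocument only if no retry with the same
        // (or a later) timestamp has been processed before.
        return requestAutoIdTimestamp > maxUnsafeAutoIdTimestamp.get();
    }
}
```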
Relates to #19813
* master:
Avoid NPE in LoggingListener
Randomly use Netty 3 plugin in some tests
Skip smoke test client on JDK 9
Revert "Don't allow XContentBuilder#writeValue(TimeValue)"
[docs] Remove coming in 2.0.0
Don't allow XContentBuilder#writeValue(TimeValue)
[doc] Remove leftover from CONSOLE conversion
Parameter improvements to Cluster Health API wait for shards (#20223)
Add 2.4.0 to packaging tests list
Docs: clarify scale is applied at origin+offset (#20242)
We have specific support for writing `TimeValue`s in the form of
`XContentBuilder#timeValueField`. Writing a `TimeValue` using
`XContentBuilder#writeValue` is a bug waiting to happen.
* Params improvements to Cluster Health API wait for shards
Previously, the cluster health API used a strictly numeric value
for `wait_for_active_shards`. However, with the introduction of
ActiveShardCount and the removal of write consistency level for
replication operations, `wait_for_active_shards` is used for
write operations to represent values for ActiveShardCount. This
commit moves the cluster health API's usage of `wait_for_active_shards`
to be consistent with its usage in the write operation APIs.
This commit also changes `wait_for_relocating_shards` from a
numeric value to a simple boolean value `wait_for_no_relocating_shards`
to set whether the cluster health operation should wait for
all relocating shards to complete relocation.
* Addresses code review comments
* Don't be lenient if `wait_for_relocating_shards` is set
* master:
Increase visibility of deprecation logger
Skip transport client plugin installed on JDK 9
Explicitly disable Netty key set replacement
percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
Make it possible for Ingest Processors to access AnalysisRegistry
Allow RestClient to send array-based headers
Silence rest util tests until the bogusness can be simplified
Remove unknown HttpContext-based test as it fails unpredictably on different JVMs
Tests: Improve rest suite names and generated test names for docs tests
Add support for a RestClient base path
The deprecation logger is an important way to make visible features of
Elasticsearch that are deprecated. Yet, the default logging makes the
log messages for the deprecation logger invisible. We want these log
messages to be visible, so the default logging for the deprecation
logger should enable these log messages. This commit changes the log
level of deprecation log messages to warn, and configures the deprecation
logger so that these log messages are visible out of the box.
Relates #20254
This commit enables CLI tools to have console logging. For the CLI
tools, we skip configuring the logging infrastructure via the config
file, and instead set the level only via a system property.
This commit modifies the call sites that allocate a parameterized
message to use a supplier so that allocations are avoided unless the log
level is fine enough to emit the corresponding log message.
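The pattern is the usual lazy-message idiom; a self-contained sketch (using a made-up mini logger rather than the real Log4j API) of why the supplier avoids the allocation:
```java
// Hypothetical mini logger, only to illustrate the supplier-based call sites:
// the message (and any formatting/allocation it implies) is built only when
// the level is actually enabled.
import java.util.function.Supplier;

final class LazyLoggingSketch {

    enum Level { DEBUG, INFO, WARN }

    static final Level CURRENT_LEVEL = Level.WARN;

    static void log(Level level, Supplier<String> message) {
        if (level.ordinal() >= CURRENT_LEVEL.ordinal()) {
            System.out.println(message.get()); // only evaluated here
        }
    }

    public static void main(String[] args) {
        Object expensiveState = new Object();
        log(Level.DEBUG, () -> "state is [" + expensiveState + "]"); // never built at WARN
        log(Level.WARN, () -> "something noteworthy happened");
    }
}
```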
While removing an index isn't actually an alias action, if we add
an alias action that deletes an index then we can delete an index
and add an alias with the same name as the index atomically, in
the same cluster state update.
Closes #20064
This commit adds support for exclusion filters to the response filtering (filter_path) feature. It changes the XContentBuilder APIs so that they now accept two types of filters: inclusive and exclusive. Filters are no longer String arrays but sets of Strings.
Instead of get, set and remove, we do get, remove and then set, to avoid type conflicts in IngestDocument.
If the set still fails, we try to restore the original field in the ingest document.
Closes #19892
If, when restoring files from a snapshot, the target shard already has a file of the same name in the Store
but with different content (different checksum/length), then those files are first
deleted before restoring the files in question.
This changes Elasticsearch to automatically downgrade `text` and
`keyword` fields into appropriate `string` fields when changing the
mapping of indexes imported from 2.x. This allows users to use the
modern, documented syntax against 2.x indexes. It also makes it clear
that reindexing in order to recreate the index in 5.0 is required for
any long lived indexes. This change is useful for the times when you
can't (cluster is just starting, not stable enough for reindex) or
shouldn't (index will only live 90 days or something).
Now that #20081 is merged we can check that cacheKey is consistent across equal search requests, something that wasn't true before due to ordering of map keys when using index boost.
Relates to #19986
The Netty3/4 TcpTransport implementations create thread factories with an "http_server" thread prefix, whereas they should start with "transport_server" and leave the "http_server" prefix to the HttpServerTransport implementations.
This test is periodically failing. As I suspect that the GCDisruption scheme is somehow making the wrong node block on
its cluster state update thread, I've added some more logging and a thread dump once the given assertion triggers
again.
Today we fsync in a blocking fashion, where all threads block while another
syncs. Yet, we can improve this: make use of the async infrastructure added
for `wait_for_refresh` and make fsyncing single-threaded while all other threads
can continue indexing. The syncing thread then notifies a listener once the request's
location is synced. This also allows sending docs to replicas before the location is actually fsynced,
allowing for concurrent replica processing.
This patch has a significant impact on performance on slower disks. An initial single-node benchmark
shows that on very fast SSDs there is no noticeable impact, but on a slow spinning disk this
patch shows a ~32% performance improvement.
```
NVME SSD:
336ec0ac9a (master):
Total docs/sec: 47200.9
Total docs/sec: 46440.4
23543a97e3e7f72a31e26b50e00931919784426c (async wait for translog):
Total docs/sec: 47461.6
Total docs/sec: 46188.3
-------------------------------------------------------------------
Spinning disk:
336ec0ac9a (master):
Total docs/sec: 22733.0
Total docs/sec: 24129.8
23543a97e3e7f72a31e26b50e00931919784426c (async wait for translog):
Total docs/sec: 32724.1
Total docs/sec: 32845.4
--------------------------------------------------------------------
```
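Conceptually, the single syncing thread works roughly as follows (a much-simplified sketch with hypothetical names, not the actual Translog code):
```java
// Hypothetical sketch: writers register a callback for their location and
// return immediately; one syncing thread fsyncs up to the highest pending
// location and then notifies every listener at or below it.
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

final class AsyncFsyncSketch {

    private final ConcurrentSkipListMap<Long, Runnable> pending = new ConcurrentSkipListMap<>();
    private volatile long lastSyncedLocation = -1;

    /** Called by indexing threads; never blocks on the fsync itself. */
    void syncUpTo(long location, Runnable onSynced) {
        if (location <= lastSyncedLocation) {
            onSynced.run(); // already durable
        } else {
            pending.put(location, onSynced);
        }
    }

    /** Called by the single syncing thread. */
    void syncLoop() {
        while (pending.isEmpty() == false) {
            long target = pending.lastKey();
            fsyncUpTo(target); // one blocking fsync covers many writers
            lastSyncedLocation = target;
            Map<Long, Runnable> done = pending.headMap(target, true);
            done.values().forEach(Runnable::run); // notify waiting listeners
            done.clear();
        }
    }

    private void fsyncUpTo(long location) { /* channel.force(false) in a real translog */ }
}
```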
Currently, once at least one pipeline is registered, all pipelines are rebuilt on every single cluster state update, even when the update is not related to ingest metadata. This change adds a check that the ingest metadata changed before trying to rebuild all pipelines.
Adds an explicit recoverySource field to ShardRouting that characterizes the type of recovery to perform:
- fresh empty shard copy
- existing local shard copy
- recover from peer (primary)
- recover from snapshot
- recover from other local shards on same node (shrink index action)
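In code, that could be modeled along these lines (illustrative names only, not the exact ShardRouting/RecoverySource API):
```java
// Hypothetical enum: the point is that the recovery source becomes an
// explicit, self-describing field instead of being derived from other
// routing information.
enum RecoverySourceType {
    EMPTY_STORE,    // fresh empty shard copy
    EXISTING_STORE, // existing local shard copy
    PEER,           // recover from the primary on another node
    SNAPSHOT,       // recover from a snapshot
    LOCAL_SHARDS    // recover from other local shards (shrink index action)
}
```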
* Fix NPE during search with source filtering if the source is disabled.
Instead of throwing an NPE, a search response with source filtering will not contain the source if it is disabled in the mapping.
Closes #7758
* Created unit tests for FetchSourceSubPhase. Tests similar to SourceFetchingIT.
Removed SourceFetchingIT#testSourceDisabled (now covered via unit test FetchSourceSubPhaseTests#testSourceDisabled).
* Updated FetchSourceSubPhase unit tests per comments.
Renamed main unit test method.
Use assertEquals and assertNull instead of assertThat (less code).
Fixes an issue where the value for the `script.engine.<lang>.inline`
settings would be _set_ properly, but would not accurately be reflected
in the `include_defaults` output. Adds a test to ensure the default raw
setting is now correct.
Resolves #20159
* Make sure indexBoost is serialized in a consistent order
* Remove hasIndexBoost by using indexBoost size
* Make sure the phrase suggester's collateParams is serialized in a consistent
order
* Make StreamOutput serialize maps in a consistent order
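The last point boils down to writing map entries in a deterministic key order; a sketch against plain `java.io` types (hypothetical helper, not the actual StreamOutput API):
```java
// Sorting the entries by key makes equal maps produce byte-for-byte
// identical output regardless of their insertion order.
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentMapWriteSketch {
    static void writeStringLongMap(DataOutput out, Map<String, Long> map) throws IOException {
        Map<String, Long> sorted = new TreeMap<>(map); // deterministic key order
        out.writeInt(sorted.size());
        for (Map.Entry<String, Long> entry : sorted.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeLong(entry.getValue());
        }
    }
}
```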
The class Setting holds a static reference to a deprecation logger
instance. When the class initializer for Setting runs, it starts
triggering log4j initialization. There is a chain of initializations
from InternalSettingsPreparer to Environment to Setting that triggers
this initialization before log4j configuration has occurred. This commit
modifies this initialization so that initialization is not done eagerly.
Relates #20170
This makes GET operations more consistent with `_search` operations, which expect
`(stored_)fields` to work on stored fields and source filtering to work on the
`_source` field. This is now possible thanks to the fact that GET operations
no longer read from the translog (#20102), and it also allows us to get rid of
`FieldMapper#isGenerated`.
The `_termvectors` API (and thus more_like_this too) relied on the fact
that GET operations would extract fields from either the stored fields or the source,
so the logic that used to do this in `ShardGetService` has been moved
to `TermVectorsService`. It would be nice if term vectors did not rely on this,
but it does not seem to be low-hanging fruit.
Non-stale shard copies are currently tracked using their allocation ids in the cluster state. When a node leaves the cluster, shard copies of that node are marked as stale by removing their allocation ids from the active set in the cluster. For full cluster restarts, this can have the unwanted effect that only the last node holding a copy of the shard will be seen as non-stale. The other shard copies are not really stale though as long as no writes have happened on this shard copy. Shard copies should thus only be marked as stale (by the master in the cluster state) if other active shards have received writes.
This commit implements the above logic and also renames the persistent structure used to track non-stale shard copies from "active_allocations" to "in_sync_allocations" as we now also support tracking non-stale shard copies that have no active routing entries in the cluster state.
We need to get the string representation of numbers in order to include them in
`_all`. However, this has a cost, and disabling `_all` is rather common, so we
should look into skipping it.
The network types in use on a cluster can be useful information to have,
so this commit adds aggregate metrics for the network types in use in a
cluster to the cluster stats.
Relates #20144
This moves the Writer interface from StreamOutput into Writeable, as a peer of its inner Reader interface. This should hopefully help to avoid random functional interfaces being created for the same purpose.
It also makes use of the moved class by updating writeMapOfLists and readMapOfLists.
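The rough shape of the pairing, written here against `java.io` types so it compiles standalone (the real interfaces take StreamInput/StreamOutput):
```java
// Simplified stand-ins for Writeable.Reader / Writeable.Writer. A method such
// as writeMap(Map<K, V>, Writer<K>, Writer<V>) can then accept method
// references (e.g. DataOutput::writeUTF for a Writer<String>) instead of
// one-off functional interfaces.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

@FunctionalInterface
interface Writer<V> {
    void write(DataOutput out, V value) throws IOException;
}

@FunctionalInterface
interface Reader<V> {
    V read(DataInput in) throws IOException;
}
```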
Previous to this change the nesting of aggregation profiling results
would be incorrect when the request contains a terms aggregation and the
collect mode is (implicitly or explicitly) set to `breadth_first`. This
was because the aggregation profiling has to make the assumption that
the `preCollection()` method of children aggregations is always called in
the `preCollection()` method of their parent aggregation. When the collect
mode is `breadth_first` the `preCollection` of the children aggregations
was delayed until the documents were replayed.
This change moves the `preCollection()` of deferred aggregations to run
during the `preCollection()` of the parent aggregation. This should have
no adverse impact on the breadth_first mode as there is no allocation of
memory in any of the aggregations.
We also apply the same logic to the diversified sampler aggregation as
we did to the terms aggregation to move the `preCollection()` of the
child aggregations method to be called during the `preCollection()` of
the parent aggregation.
This commit also includes a fix so that the `ProfilingLeafBucketCollector`
propagates the scorer to its delegate so the diversified sampler agg works
when profiling is enabled.
If they are specified by a mapping update, these properties are currently
ignored. This commit also fixes the handling of `dynamic_templates` so that it
is possible to remove templates (and so that it works more similarly to all
other mapping properties).
Closes #20111
This is a house cleaning commit that refactors GeoPointFieldMapperLegacy to LegacyGeoPointFieldMapper for consistency with Legacy Numerics and IP field mappers.
IndexedGeoBoundingBoxQuery and InMemoryGeoBoundingBoxQuery are also deprecated and refactored as Legacy classes.
This change adds a special field named `_none_` that makes it possible to disable the retrieval of stored fields in a search request or in a TopHitsAggregation.
To completely disable stored-fields retrieval (including metadata fields such as _id or _type), use `_none_` like this:
````
POST _search
{
  "stored_fields": "_none_"
}
````
Today we do a lot of accounting inside the engine to maintain the locations
of documents inside the transaction log. This is only needed so that we
can return a document's source from the engine if it hasn't been refreshed yet.
Aside from the added complexity of being able to read from the currently writing translog
and of maintaining pointers into the translog, this also caused inconsistencies, like different values
of the `_ttl` field depending on whether it was read from the translog or not. Term vectors are totally different if
the document is fetched from the translog, since copy fields are ignored, etc.
This change simply calls `refresh` if the document's latest version is not in the index. This
streamlines the semantics of the `_get` API and allows for more optimizations inside the engine
and on the transaction log. Note: `_refresh` is only called if the requested document has not been refreshed
yet but has recently been updated or added.
Relates to #19787
I was writing tests for RAM usage estimation of LiveVersionMap and found a
couple of issues:
- The BytesRef objects used as uids were oversized, since they were created
via `new BytesRef(CharSequence)`, which creates a `byte[]` whose size is 3x
the length of the provided char sequence. Given that our uids are most of the
time ASCII sequences, this is a waste of memory.
- `VersionValue` was using `translogLocation.size` instead of
`translogLocation.ramBytesUsed()` for RAM estimation, which is completely
unrelated to the memory footprint of the `Translog.Location` object.
In particular, the latter issue could cause RAM usage to be
significantly overestimated, especially on large documents.
I also added tests for RAM accounting.
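For the first issue, the size difference is easy to see with Lucene directly (this only illustrates the sizing behaviour; it is not necessarily the exact fix applied):
```java
// Requires lucene-core on the classpath. new BytesRef(CharSequence) sizes its
// backing byte[] for the worst case of 3 bytes per char, while building the
// ref and copying it keeps only the bytes actually needed -- about a third of
// the size for ASCII uids.
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefBuilder;

public final class UidBytesSketch {
    public static void main(String[] args) {
        String uid = "AVdwUtSFvEUCRvRSvcvF"; // a typical ASCII-only auto-generated id

        BytesRef oversized = new BytesRef(uid);    // backing array ~3x the text length
        BytesRefBuilder builder = new BytesRefBuilder();
        builder.copyChars(uid);
        BytesRef exact = builder.toBytesRef();     // backing array trimmed to the encoded length

        System.out.println(oversized.bytes.length + " vs " + exact.bytes.length);
    }
}
```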
Deprecates the optimize_bbox parameter on geodistance queries. This has not been needed since version 2.2, because Lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
Catching and suppressing AlreadyClosedException from Lucene is dangerous because it can mean there is a bug in ES since ES should normally guard against invoking Lucene classes after they were closed.
I reviewed the cases where we catch AlreadyClosedException from Lucene and removed the ones that I believe are not needed, or improved comments explaining why ACE is OK in that case.
I think (@s1monw can you confirm?) that holding the engine's readLock means IW will not be closed, except if disaster strikes (failEngine) at which point I think it's fine to see the original ACE in the logs?
Closes #19861
When a SearchContext is closed, its reader / searcher reference is closed too.
If this happens while a search is accessing its reader reference, it can lead
to an unexpected `AlreadyClosedException` or, in the worst case, an already closed MMapDirectory
is accessed, causing a `SIGSEGV` like in #20008 (even though the window for this is very small).
A SearchContext can be closed concurrently if:
* an index is deleted / removed from the node
* a search context is idle for too long and is cleaned up by the reaper
* an explicit freeContext message is received
This change adds reference counting to the SearchContext base class, and it is used
inside SearchService each time the context is accessed.
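A minimal sketch of the reference-counting pattern (hypothetical names, not the actual SearchContext code): the context is only really closed once every in-flight access has released its reference.
```java
// The owner holds the initial reference; every access does tryIncRef/decRef.
// Resources are released only when the count drops to zero.
import java.util.concurrent.atomic.AtomicInteger;

abstract class RefCountedContextSketch {

    private final AtomicInteger refCount = new AtomicInteger(1); // the "open" reference

    /** Returns false if the context is already closed; callers must then bail out. */
    boolean tryIncRef() {
        int current;
        do {
            current = refCount.get();
            if (current <= 0) {
                return false;
            }
        } while (refCount.compareAndSet(current, current + 1) == false);
        return true;
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            doClose(); // last reference gone: safe to release reader/searcher
        }
    }

    /** Called once by the owner; the actual close happens when the count reaches zero. */
    void close() {
        decRef();
    }

    abstract void doClose();
}
```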
Closes #20008
This includes:
- All regular numeric types such as int, long, scaled-float, double, etc
- IP addresses
- Dates
- Geopoints and Geoshapes
Relates to #19784
Adds ignoreUnavailable to the snapshot status API to be consistent
with the get snapshots API which has a similar parameter. If
ignoreUnavailable is set to true, then the snapshot status request
will ignore any snapshots that were not found in the repository,
instead of throwing a SnapshotMissingException.
Closes #18522
StartupException overrides Throwable#printStackTrace(PrintStream) but
not Throwable#printStackTrace(PrintWriter). The former override is used
when the JVM terminates with an exception, but the latter override can
be used in some logging frameworks when rendering an exception (e.g.,
log4j). This commit adds an override for the latter, with the behavior
for the two overrides being the same.
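The shape of the fix is simply to funnel both overrides into the same rendering logic; a simplified, hypothetical sketch:
```java
// Hypothetical sketch, not the actual StartupException source: both the
// PrintStream and PrintWriter overrides delegate to one shared method, so a
// logging framework that hands us a PrintWriter sees the same output as the
// JVM's PrintStream path.
import java.io.PrintStream;
import java.io.PrintWriter;
import java.util.function.Consumer;

class StartupExceptionSketch extends Exception {

    StartupExceptionSketch(Throwable cause) {
        super(cause);
    }

    @Override
    public void printStackTrace(PrintStream s) {
        printStackTrace(s::println);
    }

    @Override
    public void printStackTrace(PrintWriter s) {
        printStackTrace(s::println);
    }

    private void printStackTrace(Consumer<String> consumer) {
        // Shared (simplified) rendering of the cause chain.
        consumer.accept(getCause() == null ? toString() : getCause().toString());
    }
}
```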
This commit renames StartupError to StartupException. This rename is due
to the fact that this class inherits from Exception not Error in the
Throwable class hierarchy.
This commit removes the minimum master nodes bootstrap check. The
motivation for this check was to raise awareness of the minimum master
nodes setting but this check gives a false sense of security because
it's too easy to set the setting to one when first standing up a cluster
and never update it when adding master-eligible nodes, or have it out of
sync on various nodes and still pass this check. Since this check does
not have the security that other bootstrap checks provide, it should be
removed in favor of a stronger guarantee in the future. We do log a
warning if an election occurs with minimum master nodes less than a
quorum of master-eligible nodes that participated in an election and
this is the best that we can do right now.
Relates #20082