- Randomizes the metric type between min/max/avg. Should have identical behavior, but good to test
- Fixes improper handling of gaps due to a bug in the production of the "expected" dataset. Due to this fix,
randomization of gap policy was re-enabled
- Bunch of renaming to be more descriptive and less verbose
This commit adds the ability for moving average models to output a "prediction" based on the current
moving average model. For simple, linear and single, this prediction simply converges on the
moving average's mean at the last point, leading to a straight line. For double, this will
predict in the direction of the linear trend (either globally or locally, depending on beta).
Also adds some more tests.
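A minimal sketch of how the double model can extrapolate, assuming a simplified Holt-style recurrence (initialization and names are illustrative, not the actual MovAvgModel code):

```java
// Hypothetical sketch of double exponential smoothing (Holt) forecasting;
// assumes at least two input points, parameter handling is simplified.
final class HoltForecastSketch {
    static double[] forecast(double[] values, double alpha, double beta, int numForecasts) {
        double level = values[0];
        double trend = values[1] - values[0]; // naive trend seed
        for (int i = 1; i < values.length; i++) {
            double previousLevel = level;
            level = alpha * values[i] + (1 - alpha) * (level + trend);
            trend = beta * (level - previousLevel) + (1 - beta) * trend;
        }
        double[] out = new double[numForecasts];
        for (int k = 1; k <= numForecasts; k++) {
            out[k - 1] = level + k * trend; // a straight line along the local trend
        }
        return out;
    }
}
```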
Closes#10545
Added an initial script construct to unify the parameters typically
passed to methods in the ScriptService. This changes the way several public
methods are called in the ScriptService, along with all the callers,
since they must now wrap the passed-in parameters in a script object. In the
future, parsing parameters can also be moved into this construct along with
ToXContent.
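A hedged sketch of the shape such a construct might take; the class layout and field names below are assumptions, not the actual ScriptService API:

```java
import java.util.Map;

// Illustrative sketch of a unified script construct.
final class Script {
    enum ScriptType { INLINE, FILE, INDEXED }

    private final String lang;
    private final String scriptOrId;   // inline source, file name, or indexed id
    private final ScriptType type;
    private final Map<String, Object> params;

    Script(String lang, String scriptOrId, ScriptType type, Map<String, Object> params) {
        this.lang = lang;
        this.scriptOrId = scriptOrId;
        this.type = type;
        this.params = params;
    }
}
```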
closes#10649
When using a filesystem that may have lag between an index being created
on the primary and on the replica, creation of the ShadowEngine can
fail because there are no segments in the directory.
In these situations, we retry during engine creation to wait until an
index is present in the directory. The wait delay is
configurable, defaulting to waiting 5 seconds for an index to
become available.
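A minimal sketch of such a wait loop, using Lucene's DirectoryReader.indexExists; the polling interval and exception handling are illustrative:

```java
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;

// Illustrative wait-for-index loop; the real retry lives in ShadowEngine creation.
final class WaitForIndexSketch {
    static void waitForIndex(Directory dir, long timeoutMillis) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis; // default would be 5000ms
        while (DirectoryReader.indexExists(dir) == false) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("no index found in directory after waiting");
            }
            Thread.sleep(100); // poll until the primary's segments become visible
        }
    }
}
```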
Resolves#10637
Nested classes have the advantage of organizing the hack in a way
where it's easy to see what is happening overall, but they have
the downside of class names with $ in them.
These names work just fine, but can require shell escaping
or other annoyances, which is the last thing you want when
you are just trying to reproduce a failure.
Also changed the stash logger to not log all stashed values under debug (it logs under trace now) but to dump the stash content upon failure (under info, as XContent)
Closes#10397
When putting new templates into an index, they are added to the cache
of compiled templates as a side effect of the validate method. When
updating templates they are also validated but the scripts that are
already in the cache never get updated.
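A hedged sketch of the fix's idea: evict the stale compiled template on update so the next use recompiles. The cache shape and names are illustrative, not the real ScriptService internals:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: drop the stale compilation so updates take effect.
final class TemplateCacheSketch {
    private final ConcurrentMap<String, Object> compiledCache = new ConcurrentHashMap<>();

    void putTemplate(String id, String source) {
        compiledCache.remove(id);               // evict the stale compilation, if any
        compiledCache.put(id, compile(source)); // cache the updated template
    }

    private Object compile(String source) {
        return source; // stand-in for real template compilation
    }
}
```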
As per the comments on PR #10526, this also adds more tests around updating
scripts and templates.
Extends ShardStats with commit specific information. We currently expose commit id, generation and the user data map.
The information is also retrievable via the Rest API by using `GET _stats?level=shards`
Closes#10687
We no longer support overriding field index names, but the lookup
data structures still optimize for this use case. This complicates
the work for #8871. Instead, we can use a simpler lookup structure,
at the cost of making the legacy case slower.
This change simplifies the field mappers lookup to only
store a single map, keyed by the field's full name. It also
changes a lot of tests to decrease the uses of the older api
(looking up by index name where the index name is different
than the field name).
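A minimal sketch of the simplified lookup under these assumptions (illustrative names, not the actual FieldMappersLookup code):

```java
import java.util.HashMap;
import java.util.Map;

// One map keyed by full name; the legacy lookup-by-index-name case
// falls back to a linear scan.
final class FieldMappersLookupSketch {
    static final class FieldMapper {
        final String fullName;
        final String indexName;
        FieldMapper(String fullName, String indexName) {
            this.fullName = fullName;
            this.indexName = indexName;
        }
    }

    private final Map<String, FieldMapper> byFullName = new HashMap<>();

    FieldMapper get(String fullName) {
        return byFullName.get(fullName);             // fast, common case
    }

    FieldMapper getByIndexName(String indexName) {
        for (FieldMapper m : byFullName.values()) {  // slower legacy path
            if (m.indexName.equals(indexName)) {
                return m;
            }
        }
        return null;
    }
}
```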
closes#10705
If a user explicitly defined the tree_level or precision parameter in a geo_shape mapping, their specification was always overridden by the distance_error_pct parameter (even though our docs say this parameter is a 'hint'). This led to unexpected accuracy problems in the results of a geo_shape filter. (An example is provided in issue #9691.)
This simple patch fixes the unexpected behavior by setting the default distance_error_pct parameter to zero when the tree_level or precision parameters are provided by the user. Under the covers the quadtree will now use the tree level defined by the user. The docs will be updated to alert the user to exercise caution with these parameters. Specifying a precision of "1m" for an index using large complex shapes can quickly lead to OOM issues.
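A hedged sketch of the resolution logic (illustrative names; 0.025 as the assumed default):

```java
// Only fall back to the distance_error_pct default when the user did not
// pin tree_level/precision explicitly.
final class DistanceErrorPctSketch {
    static final double DEFAULT_DISTANCE_ERROR_PCT = 0.025; // assumed default

    static double resolve(Double explicitPct, boolean treeLevelOrPrecisionSet) {
        if (explicitPct != null) {
            return explicitPct;                    // user-provided value wins
        }
        // honor an explicit tree_level/precision instead of overriding it
        return treeLevelOrPrecisionSet ? 0.0 : DEFAULT_DISTANCE_ERROR_PCT;
    }
}
```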
closes#9691
Today we have duplicated logic in the MockInternalEngine and MockShadowEngine
since they need to subclass the actual engine. This commit shares most of
the code, making it easier to add mock engines in the future.
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
- replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
wrapped in a QueryWrapperFilter (see the sketch after this list)
- replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
QueryWrapperFilter
- removes DocIdSets.isBroken: the new two-phase iteration API will now help
execute slow filters efficiently
- replaces FilterCachingPolicy with QueryCachingPolicy
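The migration pattern against the Lucene 5.1 API, as a minimal sketch (helper names are illustrative): build the equivalent query, then wrap it in QueryWrapperFilter where a Filter is still required.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

final class FilterMigrationSketch {
    static Filter termFilter(String field, String value) {
        return new QueryWrapperFilter(new TermQuery(new Term(field, value)));
    }

    static Filter andFilter(Filter first, Filter second) {
        BooleanQuery bq = new BooleanQuery();  // replaces XBooleanFilter/AndFilter
        bq.add(first, Occur.FILTER);           // FILTER clauses match without scoring
        bq.add(second, Occur.FILTER);
        return new QueryWrapperFilter(bq);
    }
}
```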
Close#8960
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
- a failing index request on a non-existing type does not implicitly create
the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different
shards
- indexing requests that introduce new fields might induce latency spikes
because of the overhead to update the mappings on the master node
Close#8688
Now that the fetch phase has access to nested documents, the logic that detects whether a named nested query/filter matches a hit can be removed.
Closes#10661
Currently the error message is the same when an index is closed and when it is missing shards. This commit will generate a specific failure message when a user tries to create a snapshot of a closed index.
Related to #10579
This commit moves away from simulating stripe RAID-0 across multiple
data paths towards using a single path per shard. Multiple data paths are still
supported, but a shard and its data are no longer striped across multiple paths / disks.
This will, for instance, prevent losing all shards if a single disk is corrupted.
Indices that are already using this feature will automatically be upgraded to a single
data path based on a simple disk-space heuristic. In general there must be enough
disk space to move a single shard at any time, otherwise the upgrade will fail.
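A hedged sketch of a disk-space-based path choice similar in spirit to the heuristic described above (the real selection logic differs in detail):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class PathSelectionSketch {
    // Pick the data path with the most usable space for the shard being moved.
    static Path pathWithMostUsableSpace(List<Path> dataPaths) throws IOException {
        Path best = null;
        long bestUsable = -1L;
        for (Path path : dataPaths) {
            long usable = Files.getFileStore(path).getUsableSpace();
            if (usable > bestUsable) {
                bestUsable = usable;
                best = path;
            }
        }
        return best;
    }
}
```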
Closes#9498
The existing DEB/RPM packages have a lot of differences: they don't execute the same actions when installing or removing the package. They also don't declare exactly the same environment variables at the same place. At the end of the day the global behavior and configuration is *almost* the same but it's very difficult to maintain the scripts.
This commit unifies the package behavior:
- DEB/RPM use the same package scripts (pre installation, post installation etc) in order to execute exactly the same actions
- Use of a unique environment vars file that declares everything needed by scripts (the goal is to delete vars declaration in init.d and systemd scripts, this will be done in another PR)
- Variables like directory paths are centralized and replaced according to the target platform (using #10330)
- Move /etc/rc.d/init.d to standard /etc/init.d (RPM only)
- Add PID_DIR env var
- Always set ES_USER, ES_GROUP, MAX_MAP_COUNT and MAX_OPEN_FILES in the env vars file
- Create log, data, work and plugins directories with DEB/RPM packaging system
- Change to elastic.co domain in copyright and control files
- Add Bats files to automate testing of DEB and RPM packages
- Update TESTING.asciidoc
More info on Bats here: https://github.com/sstephenson/bats
Add back UpgradeReallyOldIndexTest from 1.x, but test a 0.90.6 index
(Lucene 4.x) instead of 0.20 (Lucene 3.x), and make sure
only_ancient_segments works.
Closes#10639
This option defaults to false, because it is also important to upgrade
the "merely old" segments since many Lucene improvements happen within
minor releases.
But you can pass true to do the minimal work necessary to upgrade to
the next major Elasticsearch release.
The HTTP GET upgrade request now also breaks out how many bytes of
ancient segments need upgrading.
Closes#10213
Closes#10540
Conflicts:
dev-tools/create_bwc_index.py
rest-api-spec/api/indices.upgrade.json
src/main/java/org/elasticsearch/action/admin/indices/optimize/OptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/ShardOptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/TransportOptimizeAction.java
src/main/java/org/elasticsearch/index/engine/InternalEngine.java
src/test/java/org/elasticsearch/bwcompat/StaticIndexBackwardCompatibilityTest.java
src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
src/test/java/org/elasticsearch/rest/action/admin/indices/upgrade/UpgradeReallyOldIndexTest.java
We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.
The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).
The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.
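A simplified sketch of that flow, with stand-in types for the real Mapper/ParseContext classes:

```java
// Mapper.parse now returns the mapping update it performed, or null.
interface Mapper {
    Mapper parse(Object parseContext);
}

final class DynamicUpdateFlowSketch {
    void parseDocument(Mapper root, Object parseContext) {
        Mapper update = root.parse(parseContext); // update propagated back to the root
        if (update == null) {
            return;                               // nothing new was introduced
        }
        if (canMerge(update)) {
            merge(update);                        // apply the dynamic update
        } else {
            // e.g. a concurrent mapping update arrived through the API
            throw new IllegalStateException("mapping update cannot be merged; failing the document");
        }
    }

    private boolean canMerge(Mapper update) { return true; } // stand-in
    private void merge(Mapper update) {}                     // stand-in
}
```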
However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
this, I added ParseContext.rootMapperUpdates, which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged with the ones generated by regular parsing.
One particular mapping update was the `auto_boost` setting on the `all` root
mapper. As it was tricky to work with, I removed it in favour of search-time checks
that payloads have been indexed.
One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
Also added a couple nocommits for some issues with tests after mockfs is
working again. But I also re-enabled the mockfs suppression in the base
test case for now.
Allowing tests to write to the working directory can mask problems.
For example, multiple tests running in the same jvm, and using the
same relative path, may cause issues if the first test to run
leaves data in the directory, and the second test does not remember
to cleanup the path before using it.
This change adds security manager rules to disallow tests writing
to the working directory. Instead, tests create a temp dir with
the existing test framework.
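A minimal example of the resulting pattern; createTempDir() mirrors the test framework helper but is inlined here to keep the sketch self-contained:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempDirTestSketch {
    Path createTempDir() throws IOException {
        return Files.createTempDirectory("test"); // stand-in for the framework helper
    }

    void testWritesGoToTempDir() throws IOException {
        Path dir = createTempDir();   // allowed by the security manager rules
        Files.write(dir.resolve("data.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        // writing to Paths.get("data.txt") (the working directory) would be rejected
    }
}
```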
closes#10605
This adds a new feature to the Term Vectors API which allows for filtering of
terms based on their tf-idf scores. With the `dfs` option on, this could be useful
for finding out a good characteristic vector of a document or a set of documents.
The parameters are similar to the ones used in the MLT Query.
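An illustrative tf-idf computation for candidate terms (one common idf form; not the actual term vectors filtering code):

```java
// Terms whose score falls outside the configured bounds would be dropped.
final class TermScoreSketch {
    static double tfIdf(int termFreq, long docFreq, long docCount) {
        double idf = Math.log((double) docCount / (docFreq + 1)) + 1.0; // common idf variant
        return termFreq * idf;
    }
}
```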
Closes#9561
We need to preserve settings (transient for now) even though the engine is not yet
started. This commit moves back to a single EngineConfig to simplify IndexShard
and settings state.
Closes#10584
Failures in the local execution of transport messages can now create more detailed remote transport exceptions. Also, when failing to handle an exception, the error should be logged rather than calling the handler again with another exception
closes#10554
This commit adds a `rewrite` parameter to the validate API in order to shown
how the given query is re-written into primitive queries. For example, an MLT
query is re-written into a disjunction of the selected terms. Other use cases
include `fuzzy`, `common_terms`, or `match` query especially with a
`cutoff_frequency` parameter. Note that the explanation is given for a
single randomly chosen shard only, so the output may vary from one shard to
another.
Relates #1412
Closes#10147
Today the engine writes the transaction log itself as well as manages
all the commit / translog mapping internally. Yet, if an engine is closed
and reopened it doesn't replay its translog or do anything to be consistent
with its latest state again.
This change moves the transaction log replay code into the Engine / InternalEngine
and adds unittests for replaying and consistency.
Closes#10452
At the moment, we are very strict when handling data folders containing corrupted shards and will fail any recovery attempt into it. Typically this wouldn't be a problem as the shard will be assigned to another node (which we try first anyway when a shard fails). However, it has been proven to be too strict for smaller clusters which may not have an extra node available (either because of allocation filtering, disk space issues etc.). This commit changes the behavior to force a full recovery. Once all the new files are verified we remove the old corrupted data and start the shard.
This also fixes a small issue where the shard state file wasn't deleted on an engine failure (we had a protection against deleting the state file on an active shard, but in this case the shard is still active but will be removed). The state deletion is also moved to before the failure handlers are called, to avoid race conditions when calling the master (it will potentially try to read it when allocating the shard)
Closes#10558
This change adds the ability to specify the units of the x-axis for derivative values and to calculate the derivative based on those units rather than the original histogram's x-axis units.
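A hedged sketch of the idea with illustrative names: normalize the bucket-to-bucket difference by the bucket spacing expressed in the requested unit.

```java
final class DerivativeUnitsSketch {
    static double derivative(double value, double previousValue, long bucketSpanMillis, long unitMillis) {
        double gradient = value - previousValue;
        return gradient / (bucketSpanMillis / (double) unitMillis); // change per requested unit
    }
}
```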
ShapeBuilder's coordinate parser expected 2 double values for every coordinate array. If more than 2 doubles were provided, the parser terminated parsing of the coordinate array. This resulted in an invalid shape state, leaving LineStrings, LinearRings, and Polygons with a single coordinate, and an incorrect parse exception was thrown. This corrects the parser to ignore values in the 3rd+ dimension, correctly parsing the rest of the coordinate array.
Unit tests have been updated to verify the fix.
closes#10510
Prevents the user from changing strategies, tree, tree_level or precision. distance_error_pct changes are allowed as they do not compromise the integrity of the index. A separate issue is open for allowing users to change tree_level or precision.
OGC SFA 2.1.10 assertion 3 allows interior boundaries to touch exterior boundaries provided they intersect at a single point. Issue #9511 provides an example where a valid shape is incorrectly interpreted as invalid (a false violation of assertion 3). When the intersecting point appears as the first and last coordinate of the interior boundary in a polygon, the ShapeBuilder incorrectly counted this as multiple intersecting vertices. The fix required a little more than just a logic check. Passing the duplicate vertices resulted in a connected component in the edge graph causing an invalid self crossing polygon. This required additional logic to the edge assignment in order to correctly segment the connected components. Finally, an additional hole validation has been added along with proper unit tests for testing valid and invalid conditions (including dateline crossing polys).
closes#9511
This is really a Collector instead of a filter. This commit deprecates the
`limit` filter, makes it a no-op, and recommends the `terminate_after`
parameter that we introduced in the meantime instead.
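A hedged example of the recommended replacement using the Java client API of this era (index name and count are illustrative):

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

final class TerminateAfterSketch {
    static SearchResponse search(Client client) {
        return client.prepareSearch("my_index")
                .setQuery(QueryBuilders.matchAllQuery())
                .setTerminateAfter(100)   // stop collecting after 100 docs per shard
                .get();
    }
}
```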
Most tests don't "really" need to fsync, and this is costly (makes
tests slower, wears out our SSDs).
This change makes it uncommon to actually fsync when Lucene asks for
it. It's just a workaround (in MockDirectoryHelper) until we can
cut over Elasticsearch to use MockFileSystem like Lucene.
Closes#10516
Today we force a flush before check index to ensure we have an index
to check on. Yet if the index is large and the FS is slow, this can have a
significant impact on index deletion performance. This commit introduces
a check for uncommitted changes in order to skip the additional commit.
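A minimal sketch of the guard; hasUncommittedChanges is the Lucene IndexWriter API, the wrapper is illustrative:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

final class CheckIndexGuardSketch {
    // Only commit before checking the index when there is something to commit.
    static void commitIfNeeded(IndexWriter writer) throws IOException {
        if (writer.hasUncommittedChanges()) {
            writer.commit();   // otherwise skip the costly extra commit/fsync
        }
    }
}
```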
Closes#10505
Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and can even be the
bottleneck if your query is selective.
This commit switches to Lucene regular expressions instead of Java (not exactly
the same syntax yet most existing regular expressions should keep working) and
uses the same logic as RegExpQuery to intersect the regular expression with the
terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made
things faster and the same request that took 750ms on master now takes 74ms with
this change.
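A minimal sketch of the terms-dictionary intersection this change relies on, using Lucene's automaton API:

```java
import java.io.IOException;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.automaton.CompiledAutomaton;
import org.apache.lucene.util.automaton.RegExp;

final class RegexIntersectSketch {
    // Compile the regex to an automaton and only visit matching terms,
    // instead of testing every unique term one by one.
    static long countMatching(Terms terms, String regex) throws IOException {
        CompiledAutomaton automaton = new CompiledAutomaton(new RegExp(regex).toAutomaton());
        TermsEnum termsEnum = automaton.getTermsEnum(terms);
        long count = 0;
        for (BytesRef term = termsEnum.next(); term != null; term = termsEnum.next()) {
            count++; // every term visited here already matches the expression
        }
        return count;
    }
}
```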
Close#7526
The refactoring in #9544 introduced a regression that broke multi-level
aggregations using breadth-first. This was due to sub-aggregators creating
deferred collectors before their parent aggregator and then the parent
aggregator trying to collect sub aggregators directly instead of going through
the deferred wrapper.
This commit fixes the issue but we should try to simplify all the pre/post
collection logic that we have.
Also `breadth_first` is now automatically ignored if the sub aggregators need
scores (just like we ignore `execution_mode` when the value does not make
sense, e.g. using ordinals on a script).
Close#9823
Allows the user to calculate a Moving Average over a histogram of buckets. Provides four different
moving averages (see the sketch after this list):
- Simple
- Linear weighted
- Single Exponentially weighted (aka EWMA)
- Double Exponentially weighted (aka Holt-Winters)
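A minimal sketch of the single exponential (EWMA) model from the list above; alpha weighting as commonly defined, not the exact MovAvg code:

```java
final class EwmaSketch {
    // Each new value is blended with the running average using alpha.
    static double ewma(double[] window, double alpha) {
        double avg = window[0];
        for (int i = 1; i < window.length; i++) {
            avg = alpha * window[i] + (1 - alpha) * avg;
        }
        return avg;
    }
}
```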
Closes#10024
Plugins can now define multiple operations/contexts that they use scripts for. Fine-grained settings can then be used to enable/disable scripts for each registered context.
Also added a new generic category called `plugin`, which will be used as a default when the context is not specified. This allows us to restore backwards compatibility for plugins on `ScriptService` by restoring the old methods that don't require the script context and making them internally use the `plugin` context, as they can only be called from plugins.
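A hypothetical illustration of the concept; the context key format and class shape below are assumptions, not the exact API:

```java
// A plugin declares a context, and operators can then toggle it via a
// fine-grained setting (key format assumed), e.g.:
//   script.myplugin_myop: off
final class PluginScriptContextSketch {
    static final class PluginContext {
        final String pluginName;
        final String operation;
        PluginContext(String pluginName, String operation) {
            this.pluginName = pluginName;
            this.operation = operation;
        }
        String key() {
            return pluginName + "_" + operation; // e.g. "myplugin_myop"
        }
    }
}
```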
Closes#10347
Closes#10419
This test adds a mapping with {"fielddata": {"format": "doc_values"}} but the
default mapping has {"doc_values": false}, so when the document mapper parsing
logic merges both we get {"doc_values": false, "fielddata": {"format": "doc_values"}}
and {"doc_values": false} wins, so the test is not using doc values while it
thought it would.