Due to timing issues, mappings that are required to index a document might not
be available on the replica at indexing time. In that case the replica starts
listening to cluster state changes and re-parses the document until no dynamic
mappings updates are generated.
MergeContext currently exists to store conflicts, and providing
a mechanism to add dynamic fields. MergeResults store the same
conflicts. This change merges the two classes together, as well
as removes the MergeFlags construct.
This is in preparation for simplifying the callback structures
to dynamically add fields, which will require storing the mapping
updates in the results, instead of having a sneaky callback to
the DocumentMapper instance. It also just makes more sense that
the "results" of a merge are conflicts that occurred, along with
updates that may have occurred. For MergeFlags, any future needs
for parameterizing the merge (which seems unlikely) can just be
added directly to the MergeResults as simlulate is with this change.
This refactoring and cleanup is that each request handler ends up
implementing too many methods that can be provided when the request handler itself
is registered, including a prototype like class that can be used to instantiate
new request instances for streaming.
closes#10730
Today we have a lot of bloat in the IndexStore and related classes. THe IndexStore interface
is unneeded as we always subclass AbstractIndexStore and it hides circular dependencies
that are problematic when added. Guice proxies them if you have an interface which is bad in
general. This commit removes most of the bloat classes and unifies all the classes we have
into a single one since they all just structural and don't encode any functionality.
Refactor TransportShardReplicationOperationAction state management into clear separate Primary phase and Replication phase. The primary phase is responsible for routing the request to the node holding the primary, validating it and performing the operation on the primary. The Replication phase is responsible for sending the request to the replicas and managing their responses.
This also adds unit test infrastructure for this class, and some basic tests. We can extend later as we continue developing.
Closes#10749
This commit adds support for structural errors / failures / exceptions
on the elasticsearch REST layer. Exceptions are rendering with at least
a `type` and a `reason` corresponding to the exception name and the message.
Some expcetions like the ones associated with an index or a shard will have
additional information about the index the exception was triggered on or the
shard respectivly.
Each rendered response will also contain a list of root causes which is a list
of distinct shard level errors returned for the request. Root causes are the lowest
level elasticsearch exception found per shard response and are intended to be displayed
to the user to indicate the soruce of the exception.
Shard level response are by-default grouped by their type and reason to reduce the amount
of duplicates retunred. Yet, the same exception retunred from different indices will not be
grouped.
Closes#3303
Disabling doc values or trying to index hash values are not
correct uses of this the murmur3 field type, and just cause
problems. This disallows changing doc values or index options
for 2.0+.
closes#10465
This commit splits the current ClusterBlockLevel.METADATA into two disctins ClusterBlockLevel.METADATA_READ and ClusterBlockLevel.METADATA_WRITE blocks. It allows to make a distinction between
an operation that modifies the index or cluster metadata and an operation that does not change any metadata.
Before this commit, many operations where blocked when the cluster was read-only: Cluster Stats, Get Mappings, Get Snapshot, Get Index Settings, etc. Now those operations are allowed even when
the cluster or the index is read-only.
Related to #8102, #2833Closes#3703Closes#5855Closes#10521Closes#10522
While dynamic mappings updates are using the same code path as updates from the
API when applied on a data node since #10593, they were still using a different
code path on the master node. This commit makes dynamic updates processed the
same way as updates from the API, which also seems to do a better way at
acknowledgements (I could not reproduce the ConcurrentDynamicTemplateTests
failure anymore). It also adds more checks, like for instance that indexing on
replicas should not trigger dynamic mapping updates since they should have been
handled on the primary before.
Close#10720
The field stats api returns field level statistics such as lowest, highest values and number of documents that have at least one value for a field.
An api like this can be useful to explore a data set you don't know much about. For example you can figure at with the lowest and highest response times are, so that you can create a histogram or range aggregation with sane settings.
This api doesn't run a search to figure this statistics out, but rather use the Lucene index look these statics up (using Terms class in Lucene). So finding out these stats for fields is cheap and quick.
The min/max values are based on the type of the field. So for a numeric field min/max are numbers and date field the min/max date and other fields the min/max are term based.
Closes#10523
- Randomizes the metric type between min/max/avg. Should have identical behavior, but good to test
- Fixes improper handling of gaps due to a bug in the production of the "expected" dataset. Due to this fix,
randomization of gap policy was re-enabled
- Bunch of renaming to be more descriptive and less verbose
This commit adds the ability for moving average models to output a "prediction" based on the current
moving average model. For simple, linear and single, this prediction is simply converges on the
moving average's mean at the last point, leading to a straight line. For double, this will
predict in the direction of the linear trend (either globally or locally, depending on beta).
Also adds some more tests.
Closes#10545
Added an initial script construct to unify the parameters typically
passed to methods in the ScriptService. This changes the way several public
methods are called in the ScriptService along with all the callers
since they must wrap the parameters passed in into a script object. In the
future, parsing parameters can also be moved into this construct along with
ToXContent.
closes#10649
When using a filesystem that may have lag between an index being created
on the primary and a on the replica, creation of the ShadowEngine can
fail because there are no segments in the directory.
In these situations, we retry during engine creation to wait until an
index is present in the directory. The number wait delay is
configurable, defaulting to waiting for 5 seconds from an index to
become available.
Resolves#10637
Nested classes have the advantage of organizing the hack in a way
where its easy to see what is happening overall, but they have
the downside of class names with $ in them.
These names work just fine, but can require shell escaping
or other annoyances, which is the last thing you want if
you are trying to just reproduce.
Also changed the stash logger to not log all stashed values under debug (it does trace now) but do dump the stash content upon failure (under info as a XContent)
This improves the NodeEnvironment code that walks through all mount
points looking for the one matching the file store for a specified
path, to make it a bit more defensive. We currently rely on this to
log the correct file system type of the path.data paths.
Closes#10696
Closes#10397
When putting new templates to an index they are added to the cache
of compiled templates as a side effect of the validate method. When
updating templates they are also validated but the scripts that are
already in the cache never get updated.
As per comments on PR #10526 adding more tests around updating scripts
and templates.
Extends ShardStats with commit specific information. We currently expose commit id, generation and the user data map.
The information is also retrievable via the Rest API by using `GET _stats?level=shards`
Closes#10687
We no longer support overriding field index names, but the lookup
data structures still optimize for this use case. This complicates
the work for #8871. Instead, we can use a lookup structure
by making the legacy case slower.
This change simplifies the field mappers lookup to only
store a single map, keyed by the field's full name. It also
changes a lot of tests to decrease the uses of the older api
(looking up by index name where the index name is different
than the field name).
closes#10705
If a user explicitly defined the tree_level or precision parameter in a geo_shape mapping their specification was always overridden by the default_error_pct parameter (even though our docs say this parameter is a 'hint'). This lead to unexpected accuracy problems in the results of a geo_shape filter. (example provided in issue #9691)
This simple patch fixes the unexpected behavior by setting the default distance_error_pct parameter to zero when the tree_level or precision parameters are provided by the user. Under the covers the quadtree will now use the tree level defined by the user. The docs will be updated to alert the user to exercise caution with these parameters. Specifying a precision of "1m" for an index using large complex shapes can quickly lead to OOM issues.
closes#9691
This is currently submitted as a patch in LUCENE-6422. It removes unnecessary transient memory usage for QuadPrefixTree and, for 1.6.0+ shape indexes adds a new compact bit encoded representation for each quadcell. This is the heart of numerous false positive matches, OOM exceptions, and all around poor shape indexing performance. The compact bit representation will also allows for encoding 3D shapes in future enhancements.
Today we have duplicated logic in the MockInternal and MockShadowEngine
since they need to subclass the actual engine. This commit shares the most of
the code making it easier to add mock engines in the future.
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
- replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
wrapped in a QueryWrapperFilter
- replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
QueryWrapperFilter
- removes DocIdSets.isBroken: the new two-phase iteration API will now help
execute slow filters efficiently
- replaces FilterCachingPolicy with QueryCachingPolicy
Close#8960
This commit changes dynamic mappings updates so that they are synchronous on the
entire cluster and their validity is checked by the master node. There are some
important consequences of this commit:
- a failing index request on a non-existing type does not implicitely create
the type anymore
- dynamic mappings updates cannot create inconsistent mappings on different
shards
- indexing requests that introduce new fields might induce latency spikes
because of the overhead to update the mappings on the master node
Close#8688
Because the fetch phase now has nested doc, the logic that deals with detecting if a named nested query/filter matches with a hit can be removed.
Closes#10661
Currently the error message is the same when index is closed and when it is missing shards. This commit will generate a specific failure message when a user tries to create a snapshot of a closed index.
Related to #10579
Fix typo in JVM checker user help.
When checking the JVM we provide the user with help on which environment variable to use to disable the check in case the check fails. Fixing the variable we point the user to - should be JAVA_OPTS
This commit moves away from using stripe RAID-0 simumlation across multiple
data paths towards using a single path per shard. Multiple data paths are still
supported but shards and it's data is not striped across multiple paths / disks.
This will for instance prevent to loose all shards if a single disk is corrupted.
Indices that are using this features already will automatically upgraded to a single
datapath based on a simple diskspace based heuristic. In general there must be enough
diskspace to move a single shard at any time otherwise the upgrade will fail.
Closes#9498
The existing DEB/RPM packages have a lot of differences: they don't execute the same actions when installing or removing the package. They also don't declare exactly the same environment variables at the same place. At the end of the day the global behavior and configuration is *almost* the same but it's very difficult to maintain the scripts.
This commits unifies the package behavior:
- DEB/RPM use the same package scripts (pre installation, post installation etc) in order to execute exactly the same actions
- Use of a unique environment vars file that declares everything needed by scripts (the goal is to delete vars declaration in init.d and systemd scripts, this will be done in another PR)
- Variables like directory paths are centralized and replaced according to the target platform (using #10330)
- Move /etc/rc.d/init.d to standard /etc/init.d (RPM only)
- Add PID_DIR env var
- Always set ES_USER, ES_GROUP,MAX_MAP_COUNT and MAX_OPEN_FILES in env vars file
- Create log, data, work and plugins directories with DEB/RPM packaging system
- Change to elastic.co domain in copyright and control files
- Add Bats files to automate testing of DEB and RPM packages
- Update TESTING.asciidoc
More info on Bats here: https://github.com/sstephenson/bats
Now that we handle automatically the local execution within the transport service, we can remove parts of the code that handle it in actions.
closes#10582
Add back UpgradeReallyOldIndexTest from 1.x, but test 0.90.6 index
(Lucene 4.x) instead of 0.20 (Lucene 3.x), and make sure
only_ancient_segments works.
Closes#10639
This option defaults to false, because it is also important to upgrade
the "merely old" segments since many Lucene improvements happen within
minor releases.
But you can pass true to do the minimal work necessary to upgrade to
the next major Elasticsearch release.
The HTTP GET upgrade request now also breaks out how many bytes of
ancient segments need upgrading.
Closes#10213Closes#10540
Conflicts:
dev-tools/create_bwc_index.py
rest-api-spec/api/indices.upgrade.json
src/main/java/org/elasticsearch/action/admin/indices/optimize/OptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/ShardOptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/TransportOptimizeAction.java
src/main/java/org/elasticsearch/index/engine/InternalEngine.java
src/test/java/org/elasticsearch/bwcompat/StaticIndexBackwardCompatibilityTest.java
src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
src/test/java/org/elasticsearch/rest/action/admin/indices/upgrade/UpgradeReallyOldIndexTest.java
We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.
The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).
The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.
However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.
One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.
One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.