The local DocumentMapper is updated while parsing, and dynamic fields are added before
parsing has finished. If parsing failed after a dynamic field had already been added,
the field was not added to the cluster state but was present in the local mapper of that
node. New documents with the same field would not necessarily cause an update either, and
after restarting the node the mappings for these fields were lost. Instead, the new fields
should always be propagated to the cluster state, even if parsing fails.
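A minimal sketch of the idea, not the actual mapper code: `MappingUpdateSketch`, its two field sets and the "null value means parse failure" rule are all hypothetical stand-ins. It only illustrates publishing the dynamically added fields in a `finally` block so that a failure later in the same document cannot leave the cluster state behind the local mapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a node's local mapper plus the cluster-wide mapping.
final class MappingUpdateSketch {
    final Set<String> localMapperFields = new TreeSet<>();
    final Set<String> clusterStateFields = new TreeSet<>();

    // Parses a document field by field; a null value simulates a parse failure.
    void index(Map<String, Object> doc) {
        boolean mappingsModified = false;
        try {
            for (Map.Entry<String, Object> field : doc.entrySet()) {
                // Dynamic fields reach the local mapper before parsing has finished.
                if (localMapperFields.add(field.getKey())) {
                    mappingsModified = true;
                }
                if (field.getValue() == null) {
                    throw new IllegalArgumentException("failed to parse [" + field.getKey() + "]");
                }
            }
        } finally {
            // Publish whatever was added, whether or not parsing succeeded.
            if (mappingsModified) {
                clusterStateFields.addAll(localMapperFields);
            }
        }
    }

    public static void main(String[] args) {
        MappingUpdateSketch node = new MappingUpdateSketch();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "hello");
        doc.put("broken", null);
        try {
            node.index(doc);
        } catch (IllegalArgumentException e) {
            System.out.println("indexing failed: " + e.getMessage());
        }
        System.out.println("cluster state fields: " + node.clusterStateFields);
    }
}
```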
closes #9851
closes #9874
Since the method can be called in an #execute event of the cluster service, we need the ability to use the cluster state that will be provided in the ClusterChangedEvent. The ClusterState is therefore now provided as a parameter.
Previously it was ignored and the publish cluster state timeout would kick in. In that case a stale master node would just wait for the inevitable and waste valuable time.
This issue was discovered by the DiscoveryWithServiceDisruptionsTests#testStaleMasterNotHijackingMajority test.
Also, only perform the cluster state version and wrong master node checks inside the cluster state update task.
Will allow many reducers to share the same helper functionality without repeating code. Chose
to put these in static helpers instead of adding to Reducer base class. I can imagine other reducers
that aren't time-based (or don't care about contiguous buckets), which would make things like
gap policy useless.
Since these seemed more like helpers than inherent traits of a Reducer, they went into their own
static class.
Closes #9954
If a folder for an index was created, that folder is never deleted from that node unless the index is deleted.
Data only nodes therefore can have empty folders for indices that they do not even have shards for.
This commit makes sure empty folders are cleaned up after all shards have moved away from a data only
node. The behavior is unchanged for master eligible nodes.
closes #9985
We used to handle truncated translogs in a better manner (assuming that
the node was killed halfway through writing an operation and discarding
the last operation). This brings back that behavior by catching an
`EOFException` during the stream reading and throwing a
`TruncatedTranslogException` which can be safely ignored in
`IndexShardGateway`.
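A minimal sketch of that behavior, under stated assumptions: the length-prefixed record layout, the `recover` helper and this `TruncatedTranslogException` class are hypothetical and only borrow the name from the message above. It shows an `EOFException` raised mid-operation being translated into a dedicated exception that the replaying caller can ignore, keeping every complete operation read so far.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TruncatedTranslogSketch {

    // Hypothetical stand-in for the exception named in the commit message.
    static final class TruncatedTranslogException extends RuntimeException {
        TruncatedTranslogException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Reads one length-prefixed operation; a premature end of stream means truncation.
    static byte[] readOperation(DataInputStream in) throws IOException {
        try {
            int size = in.readInt();
            byte[] payload = new byte[size];
            in.readFully(payload);
            return payload;
        } catch (EOFException e) {
            throw new TruncatedTranslogException("reached premature end of translog", e);
        }
    }

    // Replays every complete operation and drops a half-written trailing one.
    static List<byte[]> recover(byte[] translog) {
        List<byte[]> operations = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(translog))) {
            while (in.available() > 0) {
                operations.add(readOperation(in));
            }
        } catch (TruncatedTranslogException e) {
            // Safe to ignore: the writer was killed halfway through the last operation.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return operations;
    }

    public static void main(String[] args) {
        // One complete 3-byte operation, then one that claims 5 bytes but has only 2.
        byte[] translog = {0, 0, 0, 3, 1, 2, 3, 0, 0, 0, 5, 9, 9};
        System.out.println("recovered operations: " + recover(translog).size());
    }
}
```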
Fixes #9699
We've been relying on URI for url encoding, but it turns out it has some problems. For instance '+' stays as is while it should be encoded to `%2B`. If we go and manually encode query params we have to be careful though not to run into double encoding ('+'=>'%2B'=>'%252B'). The applied solution relies on URI encoding for the url path, but manual url encoding for the query parameters. We prevent URI from double encoding query params by using its single argument constructor that leaves everything as is.
We can also revert back the expression script REST test that revealed this to its original content (which contains an addition).
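The encoding pitfalls described above can be reproduced with plain `java.net.URI`. This is a sketch, not the REST test client itself: `QueryEncodingSketch`, `encodePath` and `build` are hypothetical names, and `URI.create` is used as the non-throwing equivalent of the single argument constructor. `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryEncodingSketch {

    // Let the multi-argument URI constructor quote illegal characters in the path.
    static String encodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Encode the query value by hand, then use the single-argument form so that
    // URI parses the string as given instead of quoting the '%' a second time.
    static URI build(String hostAndPort, String path, String key, String value) {
        String query = key + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
        return URI.create("http://" + hostAndPort + encodePath(path) + "?" + query);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Multi-argument constructor: '+' is legal in a query, so it is left alone.
        System.out.println(new URI("http", null, "localhost", 9200, "/_search", "q=1+1", null));
        // Passing an already encoded value to it quotes the '%' again.
        System.out.println(new URI("http", null, "localhost", 9200, "/_search", "q=1%2B1", null));
        // Path via URI, query encoded manually, assembled with the single-argument form.
        System.out.println(build("localhost:9200", "/my index/_search", "q", "1+1"));
    }
}
```

Note that `URLEncoder` performs form encoding, so a space in a query value becomes `+` rather than `%20`.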
Closes #9769
Closes #9946
It may take some time for the old master node to step down and rejoin, and for all nodes to have it in their nodes list.
By waiting for the old master node to have stepped down, we can again rely on assertDiscoveryCompleted() to make sure that it has joined.
If the isolated unicast host is also a master node then its local cluster state becomes unusable as a source for pinging when the disruption stops.
All the nodes in the cluster state node list can be removed and at that time it will only ping itself and never find out about the other nodes.
(these nodes will not ping, because they are already following a new master)
CurrentTestFailedMarker is a RunListener that gets notified whenever a test fails, and we were using it to be able to restart the suite cluster after each failure. We were checking whether a test had failed in the @After method though, which runs before the listener gets notified, so the failed flag would always be false.
This commit makes sure that the suite cluster gets restarted not only when there are problems in the afterInternal method, but also after each test failure. To achieve this, we need to reset the cluster afterwards, once we know about both events (a problem in afterInternal and a test failure), and before resetting the currentCluster. Introduced a TestRule that keeps track of test failures and allows executing arbitrary tasks when a test fails and when a test completes (regardless of its result). It also allows forcing the execution of the failure task (used in case of afterInternal issues rather than an actual test failure).
Also updated ElasticsearchRestTests to make sure that the RestClient gets re-initialized in case we restart the suite cluster, otherwise all the subsequent tests fail. Also improved this mechanism to relate it directly to the restart of the cluster instead of checking whether the addresses have changed, which doesn't work because the new cluster will use the same addresses but the client still needs to be recreated.
Closes #9015
This commit modifies the Kernel32Library to use direct mapping instead of a proxy class when doing native calls on Windows platforms. It also adds the "createSecurityManager" permission to the tests.policy file, and adds unit tests that should have failed when the Java security manager is enabled.
Closes #9802
We keep track of the current stage of recovery using an instance of RecoveryState which is stored on the relevant IndexShard. At the moment changes to this object are made in many places of the code, which are charged with doing it in the right order, keeping track of timers and much more. Also the changes to shard state are decoupled from the recovery stages, which caused #9503.
This PR refactors this and brings all of the changes into IndexShard. It also makes all recovery follow the exact same stages and shortcut some. This is in order to keep things simple and always the same (those shortcuts didn't add anything, we ended up doing it all anyway).
Also, all timer management is now folded into RecoveryState and unit tests are added.
This closes #9503 by moving the shard to post recovery only once the recovery is done (before they were decoupled), meaning that master promotion of the target shard to started cannot cancel the recovery.
Closes #9902
#9760 was a fix for translog leaking due to missing a delete flag. This is not needed here as we have a better solution to not lose the flag. This commit takes the changes from 1x in order to keep the code base similar and enjoy the extra tests.
Closes #9760
While the parser allowed changing field type settings, these would never
have been serialized. So this change simply removes parsing using
parseField. Backcompat will still work if a user uploads old settings
(they just would never have worked anyways, so we continue ignoring
them with 1.x, and 2.x will now error).
see #8143
closes #9914
This setting is used by the release script to run rest tests against
the version being released. It used to work only for tests using
the global cluster. Now it supersedes both SUITE and TEST scope
test clusters.
closes #9916
Closes #9915.
Squashed commit of the following:
commit cfa59f5a3f03d9d1b432980dcee6495447c1e7ea
Author: Robert Muir <rmuir@apache.org>
Date: Fri Feb 27 12:10:16 2015 -0500
add missing null check
commit 62fe5403068c730c0e0b6fd1ab1a0246eeef6220
Author: Robert Muir <rmuir@apache.org>
Date: Fri Feb 27 11:31:53 2015 -0500
Disable ExtrasFS for now, until we hook all this in properly in a separate issue.
commit 822795c57c5cf846423fad443c2327c4ed0094ac
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Feb 27 10:12:02 2015 +0100
Fix PercolatorTests.
commit 98b2a0a7d8298648125c9a367cb7e31b3ec7d51b
Author: Adrien Grand <jpountz@gmail.com>
Date: Fri Feb 27 09:27:11 2015 +0100
Fix ChildrenQueryTests.
commit 9b99656fc56bbd01c9afe22baffae3c37bb48a71
Author: Robert Muir <rmuir@apache.org>
Date: Thu Feb 26 20:50:02 2015 -0500
cutover apis, no work on test failures yet.
Some test failures are seen when a node attempts to use a port that is already bound
by some other process on the test machine. This commit adds a bind to test port availability
and iterates over the port range until an available port is found. This reduces the likelihood
of a test node failing to start up due to the port already being bound.
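A minimal sketch of such a probe, assuming a plain TCP bind on the loopback interface is an adequate availability check. `PortProbeSketch`, `isAvailable`, `findAvailablePort` and the port range are hypothetical and not taken from the test framework.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortProbeSketch {

    // Tries to bind the port on the loopback interface and releases it right away.
    static boolean isAvailable(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Returns the first port in [from, to] that could be bound.
    static int findAvailablePort(int from, int to) {
        for (int port = from; port <= to; port++) {
            if (isAvailable(port)) {
                return port;
            }
        }
        throw new IllegalStateException("no free port in range " + from + "-" + to);
    }

    public static void main(String[] args) {
        System.out.println("using port " + findAvailablePort(9300, 9400));
    }
}
```

The probe socket is closed before the node binds the port itself, so another process can still grab it in between. That is why the check only reduces the likelihood of a failed startup rather than ruling it out.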
Today we have two ways of getting a setting, either with the full settings key or with only
the last part of the key where the prefix is implicit depending on the package the class is in via
component settings. This is trappy as well as confusing for users and can break easily if a class is moved
to a new package since the prefix then implicitly changes.
This commit removes the component settings from the codebase.
Almost all of our meta fields that allow enabling/disabling have an `enabled`
setting. However, _field_names is enabled by default, and disabling
requires setting `index=no`. This change adds a flag similar to that
with other meta fields.
closes #9893
Random geo shape testing periodically fails on a known issue within Spatial4j core. A simple patch in ES will fix the issue. For now this random test will be disabled until the patch can be applied.
The request tracer logs in TRACE level under the `transport.tracer` log and is dynamically configurable with include and exclude arrays to filter out unneeded info. By default all requests are logged with the exception of fault detection pings (fired every second).
Also adds the notion of tracers in the MockTransportService for testing purposes.
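A sketch of how include and exclude arrays could be evaluated per request. The `*` wildcard semantics, the rule that an empty include list traces everything, and the action names are assumptions made for illustration; `TracerFilterSketch` is not the transport service implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class TracerFilterSketch {

    // Matches a value against a pattern in which '*' stands for any run of characters.
    static boolean simpleMatch(String pattern, String value) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return value.matches(regex);
    }

    // An empty include list means "trace everything"; excludes always win.
    static boolean shouldTrace(String action, List<String> include, List<String> exclude) {
        boolean included = include.isEmpty()
                || include.stream().anyMatch(p -> simpleMatch(p, action));
        boolean excluded = exclude.stream().anyMatch(p -> simpleMatch(p, action));
        return included && !excluded;
    }

    public static void main(String[] args) {
        // Hypothetical action names, chosen only for illustration.
        List<String> include = List.of();
        List<String> exclude = List.of("internal:discovery/fd*");
        System.out.println(shouldTrace("indices:data/read/search", include, exclude));
        System.out.println(shouldTrace("internal:discovery/fd/ping", include, exclude));
    }
}
```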
Closes #9286
Currently rounding in DateMathParser is always done in UTC, even
when another time zone is specified. This is fixed by passing the time zone
down to the rounding logic when it is specified.
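To see why the zone matters for rounding, here is a small sketch using `java.time` in place of the date library used by DateMathParser (an assumption made to keep the example dependency-free). `ZoneAwareRoundingSketch` and `roundDownToDay` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

final class ZoneAwareRoundingSketch {

    // Rounds down to the start of the day as seen in the given time zone.
    static Instant roundDownToDay(Instant instant, ZoneId zone) {
        return instant.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2015-02-27T23:30:00Z");
        // In UTC the day started at midnight UTC.
        System.out.println(roundDownToDay(now, ZoneOffset.UTC));
        // In +01:00 it is already the 28th locally, so the day started at 23:00 UTC.
        System.out.println(roundDownToDay(now, ZoneOffset.ofHours(1)));
    }
}
```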
Closes #9814
Closes #9885
To support the `_recovery` API, the recovery process keeps track of current progress in a class called RecoveryState. This class currently has some issues, mostly around concurrency (see #6644). This PR cleans it up as well as other issues around it:
- Make the Index subsection API cleaner:
  - remove redundant information - all calculation is done based on the underlying file map
  - clearer definition of what is what: total files, vs reused files (local files that match the source) vs recovered files (copied over). % based progress is reported based on recovered files only.
  - cleaned up json response to match other API (sadly this breaks the structure). We now properly report human values for dates and other units.
- Add more robust unit testing
- Detail flag was passed along as state (it's now a ToXContent param)
- State lookup during reporting is now always done via the IndexShard, no more fallbacks to many other classes.
- Cleanup APIs around time and move the little computations to the state class as opposed to doing them out of the API
I also improved error messages out of the REST testing infra for things I ran into.
Closes #6644
Closes #9811
Together with #8782 it should help in situations similar to #8887 by adding the ability to get information about the currently running snapshot without accessing the repository itself.
Closes #8887
The number of current pending tasks is useful to detect an overloaded master. This commit adds it to the cluster health API. The complete list can be retrieved from the dedicated pending tasks API.
It also adds rest tests for the cluster health variants.
Closes #9877
Today we fail the shard if we need to promote a replica to a primary when using shadow replicas
on a shared filesystem. This commit allows promotion by re-initializing on the master, preventing
reallocation of all replicas.
We try to lock all shards when an index is deleted but are unlikely to
succeed since shards are still active. To ensure that shards
that used to be allocated on that node get cleaned up as well, we would have
to retry or block on the delete until we get the locks. This is not desirable
since the delete happens on the cluster state processing thread. Instead of blocking,
this commit schedules a pending delete for the index, just as we do when we can't delete shards.
There are two implications to this change.
First, percolator now uses _uid internally, extracting the id portion
when needed. Second, sorting on _id is no longer possible, since you
can no longer index _id. However, _uid can still be used to sort, and
is better anyways as indexing _id just to make it available to
fielddata for sorting is wasteful.
see #8143
closes #9842
Today if we delete files from the index directory we never acquire the
write lock. Yet, for safety reasons we should get the write lock before
we modify / delete any files. Where we can, we should leave the deletion
to the index writer and only delete the files that we need to delete ourselves.
Refactor how settings filters are handled. Instead of specifying settings filters as a filtering class, settings filters are now specified as a list of settings that need to be filtered out. Regex syntax is supported. This is a breaking change and will require a small change in plugins that are using settingsFilters. This change is needed in order to simplify the cluster state diff implementation.
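A sketch of what list-based filtering could look like. It assumes each entry is a `java.util.regex` pattern matched against the full setting key; the commit message only says that regex syntax is supported, so the exact matching rules, `SettingsFilterSketch` and the setting names below are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class SettingsFilterSketch {

    // Returns a copy of the settings without any key that matches one of the patterns.
    static Map<String, String> filter(Map<String, String> settings, List<String> patterns) {
        Map<String, String> visible = new TreeMap<>(settings);
        for (String pattern : patterns) {
            Pattern compiled = Pattern.compile(pattern);
            visible.keySet().removeIf(key -> compiled.matcher(key).matches());
        }
        return visible;
    }

    public static void main(String[] args) {
        // Hypothetical settings; the secret must not show up in API responses.
        Map<String, String> settings = Map.of(
                "cloud.aws.secret_key", "s3cr3t",
                "cluster.name", "test");
        System.out.println(filter(settings, List.of("cloud\\.aws\\..*")));
    }
}
```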
Contributes to #6295
When indexing of a document with a type that is not in the mappings fails,
for example because "dynamic": "strict" but doc contains a new field,
then the type is still created on the node that executed the indexing request.
However, the change was never added to the cluster state.
This commit makes sure mapping updates are always added to the cluster state
even if indexing of a document fails.
closes #8692
relates to #8650
Now that the global cluster is gone, we shouldn't need to ignore
thread leaks across tests. We unfortunately still need suite level
scope, since most tests are using suite scope clusters (although
test scope clusters should really switch back to test scope thread
leaks).
closes #9843
These help a lot when refactoring, upgrading lucene, etc, and
can prevent code duplication (as you get a compile error for outdated stuff).
Closes #9832.
This change removes the deprecated script parameter names ('file', 'id', and 'scriptField').
It also removes the ability to load file scripts using the 'script' parameter. File scripts should be loaded using the 'script_file' parameter only.
If an elected master node goes into a long gc then other nodes' fault detection will notice this, a new master election is started and eventually a new master node is elected. If the previous master node comes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previously elected master node can get these cluster state updates. This commit makes sure that these dated cluster states are not accepted by these nodes.
This issue can temporarily lead to non-elected master nodes switching to the previously elected master node. The newly elected master node also gets the same dated cluster state, but rejects it and tells the previously elected master node to step down and rejoin. Because the newly elected master is the only master node, the previously elected master node will follow it. Any nodes that follow the previously elected master node (by accident) will also rejoin and follow the newly elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually.
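The commit message does not spell out the exact check, so the following is only a sketch of one way a node could reject a dated cluster state: accept an incoming state only if it was published by the master the node currently follows and carries a newer version. `StaleClusterStateSketch`, its `ClusterState` class and the node ids are hypothetical.

```java
final class StaleClusterStateSketch {

    // Hypothetical, minimal view of a published cluster state.
    static final class ClusterState {
        final String masterNodeId;
        final long version;

        ClusterState(String masterNodeId, long version) {
            this.masterNodeId = masterNodeId;
            this.version = version;
        }
    }

    // Accepts a state only from the currently followed master and only if it is newer.
    static boolean shouldAccept(ClusterState local, ClusterState incoming) {
        if (!local.masterNodeId.equals(incoming.masterNodeId)) {
            return false; // published by a node we no longer consider master
        }
        return incoming.version > local.version;
    }

    public static void main(String[] args) {
        ClusterState local = new ClusterState("new-master", 42);
        // A state from the previously elected master is rejected, whatever its version.
        System.out.println(shouldAccept(local, new ClusterState("old-master", 43)));
        // A newer state from the current master is accepted.
        System.out.println(shouldAccept(local, new ClusterState("new-master", 43)));
    }
}
```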
Closes #9632
Currently many meta field mappers do not take index settings in their
simple constructor that DocumentMapper uses, and instead pass null or
empty settings to the parent abstract mapper. This change fixes them to
pass through index settings, and adds an assertion in AbstractFieldMapper
that settings are not null.
closes #9780
This was previously attempted in #8854. I revived that branch and did
some performance testing as was suggested in the comments there.
I fixed all the errors, mostly just the rest tests, which
needed to have http enabled on the node settings (the global cluster
previously had this always enabled). I also addressed the comments from
that issue.
My performance tests involved running the entire test suite on my
desktop which has 6 cores, 16GB of ram, and nothing else was being
run on the box at the time. I ran each set of settings 3 times and
took the average time.
| mode | master | patch | diff |
| ------- | ------ | ----- | ---- |
| local | 409s | 417s | +2% |
| network | 368s | 380s | +3% |
This increase in average time is clearly worthwhile to pay to achieve
isolation of tests. One caveat is the way I fixed the rest tests
is still to have one cluster for the entire suite, so all the rest
tests can still potentially affect each other, but this is an
issue for another day.
There were some oddities that I noticed while running these tests
that I would like to point out, as they probably deserve some
investigation (but orthogonal to this PR):
* The total test run times are highly variable (more than a minute between the min and max)
* Running in network mode is on average actually *faster* than local mode. How is this possible!?
Today locking all shards only locks the shards that are present on
the node or that still have a shard directory. This can lead to odd
behavior if another shard that doesn't exist yet is allocated while
all shards are supposed to be locked.
Adds RandomShapeGenerator for creating random shape types. This adds a level of randomized testing to the Geospatial logic. An initial randomized GeometryCollection test is added to the GeoShapeIntegrationTest suite for validating and verifying geo_shape filter behavior. The RandomShapeGenerator can/should be used in Unit and Integration testing to avoid biased testing.
closes #9588
Squashed commit of the following:
commit 07391388715ed1f737e8acc391cea0bce5d79db9
Merge: a71cc45 b61b021
Author: Robert Muir <rmuir@apache.org>
Date: Fri Feb 20 06:58:11 2015 -0500
Git really sucks
Merge branch 'lucene_r1660560' of github.com:elasticsearch/elasticsearch into lucene_r1660560
commit b61b02163f62ad8ddd9906cedb3d57fed75eb52d
Author: Adrien Grand <jpountz@gmail.com>
Date: Wed Feb 18 19:03:49 2015 +0100
Try to improve TopDocs.merge usage.
commit bf8e4ac46d7fdaf9ae128606d96328a59784f126
Author: Ryan Ernst <ryan@iernst.net>
Date: Wed Feb 18 07:43:37 2015 -0800
reenable scripting test for accessing postings pieces. commented out
parts that fail because of bad assumptions
commit 6d4d635b1a23b33c437a6bae70beea70ad52d91c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Feb 18 09:41:46 2015 -0500
add some protection against broken asserts, but, also disable crappy test
commit c735bbb11f38782dfea9c4200fcf732564126bf5
Author: Robert Muir <rmuir@apache.org>
Date: Wed Feb 18 02:21:30 2015 -0500
cutover remaining stuff from old postings api
commit 11c9c2bea3db3ff1cd2807bd43e77b500b167aed
Author: Robert Muir <rmuir@apache.org>
Date: Wed Feb 18 01:46:04 2015 -0500
cut over most DocsEnum usage
commit bc18017662f6abddf3f074078f74e582494c88e2
Author: Robert Muir <rmuir@apache.org>
Date: Wed Feb 18 01:19:35 2015 -0500
upgrade to lucene_r1660560, modulo one test fail
Today if a shard deletion fails we simply ignore it and move on. On systems like
Windows, where a virus scanner or any other process (e.g. the admin's
explorer window) can hold on to files, we fail to delete shards, leaving a large amount of data behind. We should make a
best effort to clean those shards up before we ack the delete.
Today we restore files by running through the directory and removing all files
not in the snapshot. Some files in that directory might belong there even though
we remove them. This commit moves the responsibility of cleaning up pending files
to lucene by utilizing IndexWriter#IndexFileDeleter
This commit makes the `postings_format` and `doc_values_format` options of
mappings illegal on 2.0 and ignored on 1.x (meaning that the default postings
and doc values formats from the codec will be used in such a case).
This removes a fair amount of code.
Close #8746 #9741
Squashed commit of the following:
commit 20835037c98e7d2fac4206c372717a05a27c4790
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 15:27:17 2015 -0700
Use Enum for "_primary" preference
commit 325acbe4585179190a959ba3101ee63b99f1931a
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 14:32:41 2015 -0700
Use ?preference=_primary automatically for realtime GET operations
commit edd49434af5de7e55928f27a1c9ed0fddb1fb133
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 14:32:06 2015 -0700
Move engine creation into protected createNewEngine method
commit 67a797a9235d4aa376ff4af16f3944d907df4577
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 13:14:01 2015 -0700
Factor out AssertingSearcher so it can be used by mock Engines
commit 62b0c28df8c23cc0b8205b33f7595c68ff940e2b
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 11:43:17 2015 -0700
Use IndexMetaData.isIndexUsingShadowReplicas helper
commit 1a0d45629457578a60ae5bccbeba05acf5d79ddd
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 09:59:31 2015 -0700
Rename usesSharedFilesystem -> isOnSharedFilesystem
commit 73c62df4fc7da8a5ed557620a83910d89b313aa1
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 09:58:02 2015 -0700
Add MockShadowEngine and hook it up to be used
commit c8e8db473830fce1bdca3c4df80a685e782383bc
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 09:45:50 2015 -0700
Clarify comment about pre-defined mappings
commit 60a4d5374af5262bd415f4ef40f635278ed12a03
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 09:18:22 2015 -0700
Add a test for shadow replicas that uses field data
commit 7346f9f382f83a21cd2445b3386fe67472bc3184
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 08:37:14 2015 -0700
Revert changes to RecoveryTarget.java
commit d90d6980c9b737bd8c0f4339613a5373b1645e95
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 08:35:44 2015 -0700
Rename `ownsShard` to `canDeleteShardContent`
commit 23001af834d66278ac84d9a72c37b5d1f3a10a7b
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 08:35:25 2015 -0700
Remove ShadowEngineFactory, add .newReadOnlyEngine method in EngineFactory
commit b64fef1d2c5e167713e869b22d388ff479252173
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 18 08:25:19 2015 -0700
Add warning that predefined mappings should be used
commit a1b8b8cf0db49d1bd1aeb84e51491f7f0de43b59
Author: Lee Hinman <lee@writequit.org>
Date: Tue Feb 17 14:31:50 2015 -0700
Remove unused import and fix index creation example in docs
commit 0b1b852365ceafc0df86866ac3a4ffb6988b08e4
Merge: b9d1fed a22bd49
Author: Lee Hinman <lee@writequit.org>
Date: Tue Feb 17 10:56:02 2015 -0700
Merge remote-tracking branch 'refs/remotes/origin/master' into shadow-replicas
commit b9d1fed25ae472a9dce1904eb806702fba4d9786
Merge: 4473e63 41fd4d8
Author: Lee Hinman <lee@writequit.org>
Date: Tue Feb 17 09:02:27 2015 -0700
Merge remote-tracking branch 'refs/remotes/origin/master' into shadow-replicas
commit 4473e630460e2f0ca2a2e2478f3712f39a64c919
Author: Lee Hinman <lee@writequit.org>
Date: Tue Feb 17 09:00:39 2015 -0700
Add asciidoc documentation for shadow replicas
commit eb699c19f04965952ae45e2caf107124837c4654
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 16:15:39 2015 +0100
remove last nocommit
commit c5ece6d16d423fbdd36f5d789bd8daa5724d77b0
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 16:13:12 2015 +0100
simplify shadow engine
commit 45cd34a12a442080477da3ef14ab2fe7947ea97e
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 11:32:57 2015 +0100
fix tests
commit 744f228c192602a6737051571e040731d413ba8b
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 11:28:12 2015 +0100
revert changes to IndexShardGateway - these are leftovers from previous iterations
commit 11886b7653dabc23655ec76d112f291301f98f4a
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 11:26:48 2015 +0100
Back out non-shared FS code. this will go in in a second iteration
commit 77fba571f150a0ca7fb340603669522c3ed65363
Merge: e8ad614 2e3c6a9
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 11:16:46 2015 +0100
Merge branch 'master' into shadow-replicas
Conflicts:
src/main/java/org/elasticsearch/index/engine/Engine.java
commit e8ad61467304e6d175257e389b8406d2a6cf8dba
Merge: 48a700d 1b8d8da
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 10:54:20 2015 +0100
Merge branch 'master' into shadow-replicas
commit 48a700d23cff117b8e4851d4008364f92b8272a0
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 10:50:59 2015 +0100
add test for failing shadow engine / remove nocommit
commit d77414c5e7b2cde830a8e3f70fe463ccc904d4d0
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 17 10:27:56 2015 +0100
remove nocommits in IndexMetaData
commit abb696563a9e418d3f842a790fcb832f91150be2
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Feb 16 17:05:02 2015 +0100
remove nocommit and simplify delete logic
commit 82b9f0449108cd4741568d9b4495bf6c10a5b019
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Feb 16 16:45:27 2015 +0100
reduce the changes compared to master
commit 28f069b6d99a65e285ac8c821e6a332a1d8eb315
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Feb 16 16:43:46 2015 +0100
fix primary relocation
commit c4c999dd61a44a7a0db9798275a622f2b85b1039
Merge: 2ae80f9 455a85d
Author: Simon Willnauer <simonw@apache.org>
Date: Mon Feb 16 15:04:26 2015 +0100
Merge branch 'master' into shadow-replicas
commit 2ae80f9689346f8fd346a0d3775a6341874d8bef
Author: Lee Hinman <lee@writequit.org>
Date: Fri Feb 13 16:25:34 2015 -0700
throw UnsupportedOperationException on write operations in ShadowEngine
commit 740c28dd9ef987bf56b670fa1a8bcc6de2845819
Merge: e5bc047 305ba33
Author: Lee Hinman <lee@writequit.org>
Date: Fri Feb 13 15:38:39 2015 -0700
Merge branch 'master' into shadow-replicas
commit e5bc047d7c872ae960d397b1ae7b4b78d6a1ea10
Author: Lee Hinman <lee@writequit.org>
Date: Fri Feb 13 11:38:09 2015 -0700
Don't replicate document request when using shadow replicas
commit 213292e0679d8ae1492ea11861178236f4abd8ea
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Feb 13 13:58:05 2015 +0100
add one more nocommit
commit 83d171cf632f9b77cca9de58505f7db8fcda5599
Merge: aea9692 09eb8d1
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Feb 13 13:52:29 2015 +0100
Merge branch 'master' into shadow-replicas
commit aea96920d995dacef294e48e719ba18f1ecf5860
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Feb 13 09:56:41 2015 +0100
revert unneeded changes on Store
commit ea4e3e58dc6959a92c06d5990276268d586735f3
Author: Lee Hinman <lee@writequit.org>
Date: Thu Feb 12 14:26:30 2015 -0700
Add documentation to ShadowIndexShard, remove nocommit
commit 4f71c8d9f706a0c1c39aa3a370efb1604559d928
Author: Lee Hinman <lee@writequit.org>
Date: Thu Feb 12 14:17:22 2015 -0700
Add documentation to ShadowEngine
commit 28a9d1842722acba7ea69e0fa65200444532a30c
Author: Lee Hinman <lee@writequit.org>
Date: Thu Feb 12 14:08:25 2015 -0700
Remove nocommit, document canDeleteIndexContents
commit d8d59dbf6d0525cd823d97268d035820e5727ac9
Author: Lee Hinman <lee@writequit.org>
Date: Thu Feb 12 10:34:32 2015 -0700
Refactor more shared methods into the abstract Engine
commit a7eb53c1e8b8fbfd9281b43ae39eacbe3cd1a0a6
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Feb 12 17:38:59 2015 +0100
Simplify shared filesystem recovery by using a dedicated recovery handler that skips
most phases and enforces shard closing on the source before the target opens its engine
commit a62b9a70adad87d7492c526f4daf868cb05018d9
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Feb 12 15:59:54 2015 +0100
fix compile error after upstream changes
commit abda7807bc3328a89fd783ca7ad8c6deac35f16f
Merge: f229719 35f6496
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Feb 12 15:57:28 2015 +0100
Merge branch 'master' into shadow-replicas
Conflicts:
src/main/java/org/elasticsearch/index/engine/Engine.java
commit f2297199b7dd5d3f9f1f109d0ddf3dd83390b0d1
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Feb 12 12:41:32 2015 +0100
first cut at catchup from primary
make flush to a refresh
factor out ShadowIndexShard to have IndexShard be identical to the master and least intrusive
cleanup abstractions
commit 4a367c07505b84b452807a58890f1cbe21711f27
Author: Simon Willnauer <simonw@apache.org>
Date: Thu Feb 12 09:50:36 2015 +0100
fix primary promotion
commit cf2fb807e7e243f1ad603a79bc9d5f31a499b769
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 16:45:41 2015 -0700
Make assertPathHasBeenCleared recursive
commit 5689b7d2f84ca1c41e4459030af56cb9c0151eff
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 15:58:19 2015 -0700
Add testShadowReplicaNaturalRelocation
commit fdbe4133537eaeb768747c2200cfc91878afeb97
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 15:28:57 2015 -0700
Use check for shared filesystem in primary -> primary relocation
Also adds a nocommit
commit 06e2eb4496762130af87ce68a47d360962091697
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 15:21:32 2015 -0700
Add a test checking that indices with shadow replicas clean up after themselves
commit e4dbfb09a689b449f0edf6ee24222d7eaba2a215
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 15:08:18 2015 -0700
Fix segment info for ShadowEngine, remove test nocommit
commit 80cf0e884c66eda7d59ac5d59235e1ce215af8f5
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 14:30:13 2015 -0700
Remove nocommit in ShadowEngineTests#testFailStart()
commit 5e33eeaca971807b342f9be51a6a566eee005251
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 14:22:59 2015 -0700
Remove overly-complex test
commit 2378fbb917b467e79c0262d7a41c23321bbeb147
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 13:45:44 2015 -0700
Fix missing import
commit 52e9cd1b8334a5dd228d5d68bd03fd0040e9c8e9
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 13:45:05 2015 -0700
Add a test for replica -> primary promotion
commit a95adbeded426d7f69f6ddc4cbd6712b6f6380b4
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 12:54:14 2015 -0700
Remove tests that don't apply to ShadowEngine
commit 1896feda9de69e4f9cf774ef6748a5c50e953946
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 10:29:12 2015 -0700
Add testShadowEngineIgnoresWriteOperations and testSearchResultRelease
commit 67d7df41eac5e10a1dd63ddb31de74e326e9d38b
Author: Lee Hinman <lee@writequit.org>
Date: Wed Feb 11 10:06:05 2015 -0700
Add start of ShadowEngine unit tests
commit ca9beb2d93d9b5af9aa6c75dbc0ead4ef57e220d
Merge: 2d42736 57a4646
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Feb 11 18:03:53 2015 +0100
Merge branch 'master' into shadow-replicas
commit 2d42736fed3ed8afda7e4aff10b65d292e1c6f92
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Feb 11 17:51:22 2015 +0100
shortcut recovery if we are on a shared FS - no need to compare files etc.
commit 24d36c92dd82adce650e7ac8e9f0b43c83b2dc53
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Feb 11 17:08:08 2015 +0100
utilize the new delete code
commit 2a2eed10f58825aae29ffe4cf01aefa5743a97c7
Merge: 343dc0b 173cfc1
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Feb 11 16:07:41 2015 +0100
Merge branch 'master' into shadow-replicas
Conflicts:
src/main/java/org/elasticsearch/gateway/GatewayMetaState.java
commit 343dc0b527a7052acdc783ac5abcaad1ef78dbda
Author: Simon Willnauer <simonw@apache.org>
Date: Wed Feb 11 16:05:28 2015 +0100
long adder is not available in java7
commit be02cabfeebaea74b51b212957a2a466cfbfb716
Author: Lee Hinman <lee@writequit.org>
Date: Tue Feb 10 22:04:24 2015 -0700
Add test that restarts nodes to ensure shadow replicas recover
commit 7fcb373f0617050ca1a5a577b8cf32e32dc612b0
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 10 23:19:21 2015 +0100
make test more evil
commit 38135af0c1991b88f168ece0efb72ffe9498ff59
Author: Simon Willnauer <simonw@apache.org>
Date: Tue Feb 10 22:25:11 2015 +0100
make tests pass
commit 05975af69e6db63cb95f3e40d25bfa7174e006ea
Author: Lee Hinman <lee@writequit.org>
Date: Mon Jan 12 18:44:29 2015 +0100
Add ShadowEngine
Change bucket key_as_string to reflect `time_zone` parameter. Currently `time_zone`
shifts bucket boundaries to other time zone, but keys are displayed in UTC, so e.g.
daily buckets in "+01:00" time zone have key_as_string like "2014-01-01T23:00:00Z". With this
change the default is to format these dates according to the local time zone, so the
above bucket key would be "2014-01-02T00:00:00+01:00".
Closes#9710Closes#9744
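As a rough illustration of the key formatting described above, the sketch below renders a UTC bucket key in a fixed-offset zone. The function name `key_as_string` and the use of a fixed hour offset are assumptions made for this example; it is not the Elasticsearch implementation.

```python
from datetime import datetime, timedelta, timezone


def key_as_string(bucket_key_utc: datetime, offset_hours: int) -> str:
    """Render a UTC bucket key as ISO 8601 in a fixed-offset time zone."""
    zone = timezone(timedelta(hours=offset_hours))
    return bucket_key_utc.astimezone(zone).isoformat()


# The daily bucket from the message above: 23:00 UTC is midnight in "+01:00".
key = datetime(2014, 1, 1, 23, 0, 0, tzinfo=timezone.utc)
print(key_as_string(key, 1))
```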
Today we trash everything that has been indexed but not flushed to disk
if the engine is closed. This might not be desired if we are shutting down a
node for restart / upgrade or if we close / archive an index. In such a
case we would like to flush the transaction log and commit everything to
disk. This commit adds a flag to the close method that is set on close
and shutdown but not when we remove the shard due to relocations.
The nested scope is set by any nested feature, so that sub nested queries and filters know about their context and these sub nested queries and filters can construct the right parent filter.
Removed the LateBindingParentFilter workaround in the nested query parser in favour of the nested scope maintained in the query parse context.
Due to this change nested queries and filters can now also be included in nested sorting and inner hits, because those features also now use the nested scope.
This change doesn't fix the usage of nested filters in nested and reverse_nested aggregations. The `nested` filter shouldn't be used inside these aggregations and instead the `nested` and `reverse_nested` aggs should be used to query on the right level. In a different change `nested` inside a `nested` and `reverse_nested` aggregation should result in a parse error.
Closes#9305
Removed the existing `pre_zone` and `post_zone` option in `date_histogram` in favor of
the simpler `time_zone` option. Previously, specifying different values for these could
lead to confusing scenarios where ES would return bucket keys that are not UTC.
Now `time_zone` is the only option. The calculation of date buckets takes place in the
preferred time zone, but after rounding the bucket key values are converted back to UTC.
Closes#9062Closes#9637
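The "round in the preferred zone, then convert the key back to UTC" behaviour can be sketched as follows. This is a minimal illustration for daily buckets and fixed offsets only; `day_bucket_key_utc` is a hypothetical helper, not code from the commit.

```python
from datetime import datetime, timedelta, timezone


def day_bucket_key_utc(timestamp_utc: datetime, offset_hours: int) -> datetime:
    """Round down to the start of the local day, then express the key in UTC."""
    zone = timezone(timedelta(hours=offset_hours))
    local = timestamp_utc.astimezone(zone)
    local_midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    return local_midnight.astimezone(timezone.utc)


# 23:30 UTC on Jan 1 is already Jan 2 in "+01:00", so the bucket starts
# at local midnight of Jan 2, which is 23:00 UTC on Jan 1.
ts = datetime(2014, 1, 1, 23, 30, tzinfo=timezone.utc)
print(day_bucket_key_utc(ts, 1).isoformat())
```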
When asking for term statistics, generating term vectors on the fly or with
`dfs` set to `true`, some requests may take a while, so it is useful to know
exactly how long.
Closes#9583
Negative settings for interval in date_histogram could lead to OOM errors in conjunction
with min_doc_count=0. This fix raises exceptions in the histogram builder and the
TimeZoneRounding classes so that the query fails before this can happen.
Closes#9634Closes#9690
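A minimal sketch of the fail-fast idea, assuming a simplified integer histogram: the interval is validated before any empty buckets are generated for `min_doc_count=0`. The helper name and the decision to also reject a zero interval are assumptions of this example.

```python
def empty_bucket_keys(min_key: int, max_key: int, interval: int) -> list:
    """List every bucket key between min_key and max_key, rejecting bad intervals."""
    if interval <= 0:
        # Fail before generating buckets; a non-positive step could never terminate.
        raise ValueError(f"interval must be positive, got {interval}")
    first = (min_key // interval) * interval
    return list(range(first, max_key + 1, interval))


print(empty_bucket_keys(3, 21, 10))
```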
Today we sometimes have to transfer files without verifying the checksum
i.e. if the file had an old adler32 checksum but was using random access
while writing such that we can only verify the file's length. We will likely
not detect corruptions there and with the new checks during recovery finalization
we might run into corrupt index exceptions in that stage. This causes
the primary to be failed as well since we don't handle the exception today. This commit
adds better handling and a test for this scenario.
Groovy was disabled by default, but we turn it on in our test infra. We can then declare support for it so we go and execute script related tests as part of the REST tests suite.
With #9629 we introduced REST spec validation, which barfs whenever the REST spec doesn't follow the defined conventions. That said, we sometimes execute tests against previous branches and tags which have specs that need fixing but we can't go back and fix them. We now support the `-Dtests.rest.validate_spec` system property that allows turning off REST spec validation (enabled by default) so that we can still run tests against old branches/tags.
When multiple fields under object fields share the same name, accessing
by short name is ambiguous. This removes support for short names,
always requiring the full name when used in queries.
closes#8872
This moves the rule, so it is made available in the test.jar. In
addition, you can now specify the exception, which triggers a rerun
of the test in order to make it reusable for others.
Also ensured that the NettyTransportTest frees all resources inside
of its testing method instead of pre/post running methods, as those
are still called only once, even though a failed test might be repeated.
Using '_cat/segments' or the indices segments api without matching any index
now returns an empty result instead of throwing IndexMissingException.
Closes#9219
Aggregators now return a new collector instance per segment, like Lucene 5 does
with its oal.search.Collector API. This is important for us because things like
knowing whether the field is single or multi-valued is only known at a segment
level.
In order to do that I had to change aggregators to notify their sub aggregators
of new incoming segments (pretty much in the spirit of #6477) while everything
used to be centralized in the AggregationContext class. While this might slow
down a bit deeply nested aggregation trees, this also makes the children
aggregation and the `breadth_first` collection mode much better options since
they can now only replay what they need while they used to have to replay the
whole aggregation tree.
I also took advantage of this big refactoring to remove some abstractions that
were not really required like ValuesSource.MetaData or BucketAnalysisCollector.
I also split Aggregator into Aggregator and AggregatorBase in order to
separate the Aggregator API from implementation helpers.
Close#9544
Whenever we have an api that supports GET with a body, we always support the POST method too, as well as providing the body as a query_string parameter called `source`. Our REST spec should reflect this convention. Fixed them and introduced a hard check at parse time in our Java REST tests runner, which will cause the tests to fail if the specs are not compliant.
Closes#9629
When using the CLI tool infrastructure, a command can potentially write
a new file. In case it overwrites an existing one, you may want to ensure
that the permissions, the owner and the group are kept the same and do not
accidentally change when overwriting those files.
This PR introduces a command that allows you to execute this check per path.
It also adds a new testing dependency, namely jimfs, which allows you to create
in-memory filesystems with certain properties (like supporting or not posix permissions
on this filesystem), so that you can test those features, without executing
tests on a certain operating system.
The FileSystemUtils class has a helper method to create files with
a .new suffix, in case the file, which should be created already
exists. If you install plugins and those have configuration files,
even without changes, you will end up with tons of .new files.
This commit checks the file size and sha-256 sum, and only if those
differ, a .new file is actually being created.
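The size-then-hash comparison described above might look roughly like this. It is a sketch under stated assumptions: `install_config` and `sha256_of` are hypothetical names, and the real helper lives in the Java `FileSystemUtils` class.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> bytes:
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).digest()


def install_config(source: Path, target: Path) -> Path:
    """Copy source to target; write '<target>.new' only if the contents differ.

    Returns the path that holds the source content afterwards.
    """
    if not target.exists():
        target.write_bytes(source.read_bytes())
        return target
    same_size = source.stat().st_size == target.stat().st_size
    if same_size and sha256_of(source) == sha256_of(target):
        return target  # identical content: no .new file is created
    new_path = target.with_name(target.name + ".new")
    new_path.write_bytes(source.read_bytes())
    return new_path
```

Comparing sizes first is a cheap shortcut: the hash is only computed when the sizes already match.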
On CI machines node recovery sometimes takes up to 2 seconds. When it happens an update cluster state task gets stuck behind the recovery and tests fail with 1 second timeout. This commit makes sure that we wait for recovery to complete before starting the clock.
This has been very trappy. Rather than continue to allow buggy behavior
of having upgrade/optimize requests sidestep the single shard per node
limits optimize is supposed to be subject to, this removes
the ability to run the upgrade/optimize async.
closes#9638
Today the logic related to deleting an index is spread across several
classes which makes changes to this rather delicate part of the code-base
very difficult. This commit consolidates this logic into the IndicesService
and moves the handling of ack-ing the delete to the master entirely into
`IndicesClusterStateService`.
_id and _routing now no longer support the 'path' setting on indexes
created with 2.0. Indexes created before 2.0 still support this
setting for backcompat.
closes#6730
Improve cleanup of updateTask timeout handlers. The timeout handlers should be removed as soon as a corresponding update task is processed. Otherwise, timeout handlers might keep old updateTasks and all objects that they are pointing to in memory for the duration of timeout (15 minutes by default).
Fixes#9621
The engine is already pretty complex, and it's still conflated with
code that doesn't necessarily belong there. Updating the settings from
the settings service can be done on the level above. This commit cleans up
the settings code in the engine and moves it to the IndexShard.
Until lately we couldn't close the engine in a tragic event due to
the lock order and all its complications. Now that the engine
is much more simplified in terms of having a single IndexWriter etc.
we don't necessarily need the write-lock on close anymore and can
easily just close and continue.
InternalEngine contains a number of inner classes that it uses, however,
this makes the class overly large and hard to extend. In order to be
able to easily add other Engines (such as the ShadowEngine), these
helping methods have been extracted into an AbstractEngine class. The
classes that were previously in `InternalEngine` have been moved to
separate classes, which will allow for better unit testing as well.
None of the functionality of InternalEngine has been changed, this is
only refactoring.
Note that this is a change I originally made on my shadow-replica
branch, however it is easier to review piecemeal so I extracted it into
a separate PR.
Sometimes by the time update settings is called the second node is not in the cluster yet. As a result the change of the minimum master node setting to 2 is ignored, causing this test to fail.
Add offset option to 'date_histogram' replacing and simplifying the previous 'pre_offset' and 'post_offset' options.
This change is part of a larger clean up task for `date_histogram` from issue #9062.
Due to the possibility of ports being already used when choosing a
random port, it makes sense to simply repeat a unit test upon a bind
exception.
This commit adds a junit rule, which does exactly this and does not
require you to change the test code and add loops.
Closes#9010
Closes#9587
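The commit above implements this as a JUnit rule in Java. As a language-neutral illustration of the same idea (rerun a test when a specific exception type is raised, without adding loops to the test body), here is a small Python decorator. The name `repeat_on` and the `max_attempts` parameter are assumptions of this sketch.

```python
import functools


def repeat_on(exception_type, max_attempts: int = 3):
    """Rerun the wrapped test when it raises exception_type, up to max_attempts."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return test_fn(*args, **kwargs)
                except exception_type:
                    if attempt == max_attempts:
                        raise  # give up and surface the last failure
        return wrapper
    return decorator


calls = {"count": 0}


@repeat_on(OSError, max_attempts=3)
def bind_random_port():
    """Stand-in for a test that fails twice with a bind error, then passes."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("address already in use")
    return "bound"
```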
Squashed commit of the following:
commit 23ac91dca4b949638ca1d3842fd6db2e00ee1d36
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Feb 5 18:42:28 2015 +0100
Do not compute scores if aggregations do not need it (like top_hits) or use a script (which might compute scores).
commit 51262fe2681c067337ca41ab88096ef80a2e8ebb
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Feb 5 15:58:38 2015 +0100
Fix more compile errors.
commit a074895d55b8b3c898d23f7f5334e564d5271a56
Author: Robert Muir <rmuir@apache.org>
Date: Thu Feb 5 09:31:22 2015 -0500
fix a few more obvious ones
commit 399c41186cb3c9be70107f6c25b51fc4844f8fde
Author: Robert Muir <rmuir@apache.org>
Date: Thu Feb 5 09:28:32 2015 -0500
fix some collectors and queries
commit 5f46c2f846c5020d5749233b71cbe66ae534ba51
Author: Robert Muir <rmuir@apache.org>
Date: Thu Feb 5 09:24:24 2015 -0500
upgrade to lucene r1657571
After phase1 of recovery is completed, we check that all pending mapping changes have been sent to the master and processed by the other nodes. This is needed in order to make sure that the target node has the latest mapping (we just copied over the corresponding lucene files). To make sure we do not miss updates, we do so under a local cluster state update task. At the moment we don't have a timeout when waiting on the task to be completed. If the local node update thread is very busy, this may stall the recovery for too long. This commit adds a timeout (equal to `indices.recovery.internal_action_timeout`) and upgrades the task urgency to `IMMEDIATE`. If we fail to perform the check, we fail the recovery.
Closes#9575
That method checks that files were released properly, but also clears a static map holding references to mock directories. Since we iterate on many indexes this created memory pressure.
This commit removes the FlushType entirely and replaces it in the most places with
a simple `Engine#flush()` call. Flushing without committing the translog is now
entirely private to the engine and is only called in one place.
The `full` option and `FlushType.NEW_WRITER` only exists to allow
realtime changes to two settings (`index.codec` and `index.concurrency`).
Those settings are very expert and don't really need to be updateable
in realtime.
When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not, it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represents a serious issue with one or more of the nodes. We should log it in WARN and give the nodes that failed to respond in a timely fashion.
Closes#9551
We currently have the IndicesRequest interface to mark indices related requests and be able to retrieve the indices they relate to in a generic way. This commit introduces a similar abstraction for requests that manage aliases, to be able to retrieve/replace the aliases they relate to.
Also, IndicesAliasesRequest becomes a CompositeIndicesRequest, as it allows performing multiple operations (e.g. add/remove multiple aliases). Each single operation (AliasActions) now implements the newly introduced AliasesRequest.
AliasesRequest is also implemented by GetAliasesRequest, which allows to retrieve aliases information.
Closes#9460
In big deployments the ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData objects (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.
Closes#9489Closes#9541
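The reuse check can be sketched as below. `IndexMeta` and `reuse_or_take` are hypothetical stand-ins for IndexMetaData and the ZenDiscovery logic; the point is only that the uuid is compared in addition to the name and version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical stand-in for IndexMetaData."""
    name: str
    uuid: str
    version: int


def reuse_or_take(current: dict, incoming: IndexMeta) -> IndexMeta:
    """Reuse the existing object only if name, uuid and version all match."""
    existing = current.get(incoming.name)
    if (existing is not None
            and existing.uuid == incoming.uuid
            and existing.version == incoming.version):
        return existing  # same index, unchanged: keep the long-lived object
    return incoming  # new, changed, or deleted-and-recreated index


old = IndexMeta("logs", "uuid-a", 1)
current_state = {"logs": old}
same = IndexMeta("logs", "uuid-a", 1)
recreated = IndexMeta("logs", "uuid-b", 1)  # same name and version, new uuid
```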
When closing an instance of RestClient, the connection manager gets shut down, which makes it not usable anymore. If that is static, like it is now, no RestClient will work anymore from that moment on. Each instance of RestClient should have its own instance of connection manager.
This method is heavy as it builds a bitset out of a DocIdSet in order to be
able to provide random-access. Now that Lucene has removed out-of-order scoring,
true random-access is very rarely needed and we could instead return a Bits
instance that wraps the iterator. Ideally, we would use the DISI API directly
but I have to admit that the Bits API is more friendly.
Close#9546
We had a REST test that relied on matching a json response against a regex. It worked but the match wasn't done against the actual json object, but its java map representation converted into a string by calling `toString`. Since all other clients test runners don't work in this case, as they try to match a json object against a regex, we should do the same and prevent it from working.
The current "checkindex" on startup is very very expensive. This is
like running one of the old school hard drive diagnostic checkers and
usually not a good idea.
But we can do a CRC32 verification of files. We don't even need to
open an indexreader to do this, it's much more lightweight.
This option (as well as the existing true/false) are randomized in
tests to find problems.
Also fix bug where use of the current option would always leak
an indexwriter lock.
Closes#9183
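To illustrate why a checksum pass is lightweight compared to a full index check, the sketch below streams a file through CRC32 and compares the result with an expected value. Where the expected checksum comes from is an assumption of this example (here it is simply passed in); it does not reproduce how Lucene stores or reads checksums.

```python
import zlib
from pathlib import Path


def crc32_of(path: Path, chunk_size: int = 1 << 16) -> int:
    """Stream the file in chunks and return its CRC32."""
    checksum = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def verify(path: Path, expected_crc32: int) -> bool:
    """Return True if the file's CRC32 matches the expected value."""
    return crc32_of(path) == expected_crc32
```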
Histogram aggregation supports an 'offset' option to move bucket boundaries.
In a histogram with buckets of size X these can be moved from 0, X, 2X, 3X,...
by an offset value of Y to Y, X+Y, 2X+Y, 3X+Y... by using the 'offset' option.
The previous 'pre_offset' and 'post_offset' options are removed in favour of
the simplified 'offset' option.
Closes#9417Closes#9505
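The shifted boundaries Y, X+Y, 2X+Y, ... described above correspond to the following bucket key computation. This is a sketch of the arithmetic only; `bucket_key` is a hypothetical helper name.

```python
import math


def bucket_key(value: float, interval: float, offset: float = 0) -> float:
    """Return the lower boundary of the bucket containing value.

    Boundaries sit at offset, interval + offset, 2 * interval + offset, ...
    """
    return math.floor((value - offset) / interval) * interval + offset


# With interval X=5 and offset Y=2 the boundaries are ..., -3, 2, 7, 12, ...
print(bucket_key(6, 5, 2), bucket_key(7, 5, 2), bucket_key(1, 5, 2))
```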
Until now, there was no way to expose information about configured
transport profiles. This commit adds the ability to expose that
information in the TransportInfo class.
The channel as well as the netty pipeline handler now also contain
the profile they were configured for, as this information cannot be
extracted elsewhere.
In addition, each profile now can set its own publish host and port,
which might be needed in case of portforwarding or using docker.
Closes#9134
This callback is executed only once, on the master node during an
index's creation. An exception thrown during this listener will cancel
the index creation.
This also adds checks in `IndicesClusterStateService` for the
indexService being null as well as if the `indicesService.createIndex`
throws an exception on data nodes after an index has already been
created.
#8720 introduced a timeout mechanism for ongoing recoveries, based on a last access time variable. In the many iterations on that PR the update of the access time was lost. This adds it back, including a test that should have been there in the first place.
Closes#9506
The query-cache has an optimization to not deserialize the bytes at the shard
level. However this is a bit fragile since it assumes that serialized streams
can be concatenated (which is not the case with shared strings) and also does
not update the QueryResult object that is held by the SearchContext. So you
need to make sure to use the right one.
With this change, the query cache just deserializes bytes into the QueryResult
object from the context.
Close#9500
Due to some unreleased refactorings we lost the persistence of
previously set values in MergePolicyProvider. This commit adds this
back and adds a simple unit test.
Closes#8890
provided in the search
Currently, doing a field lookup within a terms agg will restrict the
fields available to those within the types passed into the search
request. However, when doing sub aggs within a children agg, the
fields available should not be restricted to those of the search.
This change makes the field lookup use the index level mapper service.
The optimization we do in the HandlesStreamInput / Output
adds a lot of complexity with a rather unknown benefit. It tries
to compress commonly used strings and write ids instead. This
should rather be done on a lower level if at all necessary for
the small message we send over the network.
Today we have a dirty flag indicating that a refresh must
be executed. We also allow users to bypass this by setting
a force=true boolean on the refresh request / command. All
these flags are unneeded since the SearcherManager has all
the information to do the right thing if it's dirty or not.
This pr removes the optimization for auto generated ids.
Previously, when ids were auto generated by elasticsearch then there was no
check to see if a document with same id already existed and instead the new
document was only appended. However, due to lucene improvements this
optimization does not add much value. In addition, under rare circumstances it might
cause duplicate documents:
When an indexing request is retried (due to connection loss, node closed etc.),
then a flag 'canHaveDuplicates' is set to true for the indexing request
that is sent a second time. This was to make sure that even
when an indexing request for a document with autogenerated id comes in
we do not have to update unless this flag is set and instead only append.
However, it might happen that for a retry or for the replication the
indexing request that has the canHaveDuplicates set to true (the retried request) arrives
at the destination before the original request that has it set to false.
In this case both requests add a document and we end up with a duplicated document.
This commit adds a workaround: remove the optimization for auto
generated ids and always update the document.
The assumption is that this will not slow down indexing more than 10 percent,
see: http://benchmarks.elasticsearch.org/
closes#8788closes#9468
Additionally, this setting can be specified in elasticsearch.yml if
desired, to pre-populate the list of methods to be added to the default
blacklist.
When making a change to this setting dynamically, the entire blacklist
is logged as well.
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer. When setting
`search_analyzer`, `analyzer` must be set.
closes#9371
Using the `script.groovy.sandbox.method_blacklist_patch` setting, the
blacklist can be dynamically *added* to by specifying a comma-separated
list of methods (for example, "toString,size" would add .toString and
.size to the blacklist).
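A minimal sketch of how a comma-separated patch value could be merged into a default blacklist. The function name and the contents of the default set are hypothetical; only the "comma-separated list that is added to the blacklist" behaviour comes from the message above.

```python
def patch_blacklist(default_blacklist: frozenset, patch: str) -> frozenset:
    """Add the comma-separated method names in patch to the default blacklist."""
    additions = {name.strip() for name in patch.split(",") if name.strip()}
    return default_blacklist | additions


# "wait" is a placeholder default entry, not the real default blacklist.
defaults = frozenset({"wait"})
print(sorted(patch_blacklist(defaults, "toString,size")))
```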
When the `script.groovy.sandbox.method_blacklist_patch` setting is
changed, the script cache is cleared to force new scripts to be
recompiled. Additionally the on-disk cache is cleared so that scripts in
the `config/scripts` directory are re-compiled as well.
This also fixes an issue where script engines were injected more than
once, which can cause multiple instances of the script engine per node.
Extended_stats now displays the upper and lower bounds on standard deviations (e.g. avg +/- std).
Default is to show 2 std above/below, but can be changed using the `sigma` parameter.
Accepts non-negative doubles
Closes#9356
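The bounds are avg ± sigma × std. The sketch below shows that arithmetic, assuming a population standard deviation; that choice and the names used are assumptions of this example rather than details stated in the commit message.

```python
import statistics


def std_deviation_bounds(values, sigma: float = 2.0) -> dict:
    """Return avg +/- sigma * std for the given values."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    avg = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return {"lower": avg - sigma * std, "upper": avg + sigma * std}


# Mean is 5.0 and population std is 2.0 for this sample.
print(std_deviation_bounds([2, 4, 4, 4, 5, 5, 7, 9]))
```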
To properly replicate, we currently stop flushing during recovery so we can replay the translog once copying files is done. Once recovery is done, the translog will be flushed by a background thread that, by default, kicks in every 5s. In case of a recovery failure and a quick re-assignment of a new shard copy, we may fail to flush before starting a new recovery, causing it to deal with a potentially even longer translog. This commit makes sure we flush immediately when the ongoing recovery count goes to 0.
I also added a simple recovery benchmark.
Closes#9439
If an index is deleted during initial state of the snapshot operation, the entire snapshot can fail with NPE. This commit improves handling of this situation and allows snapshot to continue if partial snapshots are allowed.
Closes#9024
PR #8672 addresses ambiguous polygons - those that either cross the dateline or span the map - by complying with the OGC standard right-hand rule. Since ```GeoPolygonFilter``` is self contained logic, the fix in #8672 did not address the issue for the ```GeoPolygonFilter```. This was identified in issue #5968
This fixes the ambiguous polygon issue in ```GeoPolygonFilter``` by moving the dateline crossing code from ```ShapeBuilder``` to ```GeoUtils``` and reusing the logic inside the ```pointInPolygon``` method. Unit tests are added to ensure support for coordinates specified in either standard lat/lon or great-circle coordinate systems.
closes#5968closes#9304
The InternalClusterInfoService reaches out to the nodes to get information about their disk usage and shard store size. Upon a node level error we currently remove the node info from the local cache. We should also clear the cache when we run into an error on the action level (excluding any info from all nodes).
This also adds settings for the timeout used when waiting for nodes.
Closes#9449
This change makes the response API object for Histogram Aggregations the same for all types of Histogram, and does the same for all types of Ranges.
The change removes getBucketByKey() from all aggregations except filters and terms. It also reduces the methods on the Bucket class to just getKey() and getKeyAsString().
The getKey() method returns Object and the actual type returned will be appropriate for the type of aggregation being run, e.g. date_histogram will return a DateTime for this method and Histogram will return a Number.
Apparently some filesystems such as ZFS and occasionally NTFS can report
filesystem usages that are negative, or above the maximum total size of
the filesystem. This relaxes the constraints on `DiskUsage` so that an
exception is not thrown.
If 0 is passed as the totalBytes, `.getFreeDiskAsPercentage()` will
always return 100.0% free (to ensure the disk threshold decider fails
open)
Fixes#9249
Relates to #9260
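The relaxed behaviour can be illustrated as follows: no exception for odd values, and a zero total is reported as 100% free so that the disk threshold decider fails open. The Python function name is a hypothetical mirror of `getFreeDiskAsPercentage()`.

```python
def free_disk_as_percentage(total_bytes: int, free_bytes: int) -> float:
    """Return the free percentage without rejecting negative or oversized values."""
    if total_bytes == 0:
        return 100.0  # fail open: treat an unknown total as fully free
    return 100.0 * free_bytes / total_bytes


print(free_disk_as_percentage(0, 0), free_disk_as_percentage(200, 50))
```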
This bug was introduced by #8454 which allowed the childFilter to only be consumed once. By adding the child docid buffering multiple buckets can now be emitted by the same doc id. This child docid buffering only happens in the scope of the current root document, so the amount of child doc ids buffered is small.
Closes#9317Closes#9346
The fix is to move the parent filter resolving from the nextReader(...) method to the collect(...) method, because only then is any parent nested filter's parent filter properly instantiated.
Closes#9280Closes#9335
We want to check if at least the primaries succeeded if we do not
wait for green and not if all succeeded if we wait for green.
That was a misconception in c617af37e8
Requests are sent to two shard copies in case a shard is relocating.
This will show up in the _shards header. Therefore we must check
with greaterThanOrEqualTo(..).
Adding missing support for the multi-index query parameters 'ignore_unavailable',
'allow_no_indices' and 'expand_wildcards' to '_cluster/state' API. These
parameters are supposed to be supported for APIs that work across multiple indices.
So far overwriting the default settings per REST call was not possible which is
fixed here.
Closes#5229Closes#9295
These two tests are confusing because they have the same class name in
different packages. This results in accidentally looking at the wrong
file when trying to open the test by class name. They are also
not "simple"..
Related to #9049.
By default, the default value for `timestamp` is `now` which means the date the document was processed by the indexing chain.
You can now reject documents which do not provide a `timestamp` value by setting `ignore_missing` to false (defaults to `true`):
```js
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"ignore_missing" : false
}
}
}
```
When you upgrade the cluster to 1.5 or master, an index created with 1.4 is automatically migrated to the 1.5 syntax.
Let's say you have defined this in elasticsearch 1.4.x:
```js
DELETE test
PUT test
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
PUT test/type/_mapping
{
"type" : {
"_timestamp" : {
"enabled" : true,
"default" : null
}
}
}
```
After migration, the mapping becomes:
```js
{
"test": {
"mappings": {
"type": {
"_timestamp": {
"enabled": true,
"store": false,
"ignore_missing": false
},
"properties": {}
}
}
}
}
```
Closes#8882.
At startup the tribe node ignores closed indices, but if you closed an index that was part of the tribe node cluster state, its state change was not currently handled. A NullPointerException could be seen in the logs instead as the routing table for the closed index was null. As a result, the index stayed in the tribe node cluster state in open state, although that didn't reflect reality. Also, subsequent cluster state updates happening in the tribe node kept failing, affecting updates related to any other index. The only way to recover from this was to restart the tribe node every time an index is closed on any tribe.
This commit properly handles index state changes, making sure that when an index gets closed it gets removed from the tribe node cluster state. Note that it makes little sense to keep the closed index around in the tribe node, as from the tribe node you can't do anything with it. The tribe node simply doesn't see any closed index, it's the same as if they didn't exist.
Closes#6411Closes#9334
This allows a plugin or user that registers a listener to be able to
stop actions like creating an index or starting a shard by throwing an
exception. Previously all exceptions were logged without being rethrown.
Before, if both filter and query were defined for function_score, then the
filter was silently ignored. Now, if both are defined then the function score
query wraps this in a filtered_query.
closes#8638closes#8675
ShapeBuilder threw a NPE when a polygon coordinate array consisted of a single LinearRing. This PR fixes the error handling to throw a more useful ElasticsearchParseException to provide the user with better insight into the problem.
This adds a new boolean (index.merge.scheduler.auto_throttle) dynamic
setting, default true (matching Lucene), to adaptively set the IO rate
limit for merges over time.
This is more flexible than the previous fixed rate throttling because
it responds depending on the incoming merge rate, so search-heavy
applications that are not doing much indexing will see merges heavily
throttled while indexing-heavy cases will lighten the throttle so
merges can keep up with incoming indexing.
The fixed rate throttling is still available as a fallback if things
go horribly wrong.
Closes#9243Closes#9133
previous push was partial by mistake, we still need the wrapped dirs around after being closed for the test infra, for now, explicitly clear it in the leak test (which is still bad apple)
I ran the bad apple test for index memory leaks and still saw leaks, it seems like we don't properly clean the dirs from the static mock test dir wrapper
validation tests for constants
Currently the snapshot flag for Version constants is only set to true
for CURRENT. However, this means that the snapshot state changes from
branch to branch. Instead, snapshot should be "is this version
released?". This change also adds a validation test checking that
ID -> constant and vice versa are correct, and fixes one bug found there
(for an unreleased version).
- don't allow for soft references anymore in the recycler
- remove some abusive thread locals
- don't recycle independently float/double and int/long pages, they are the
same and just interpret bits differently.
Close#9272
The query cache has a mechanism that disables it automatically when
SearchContext.nowInMillis() is used. One issue with that is that the date math
parser always evaluates the current timestamp when parsing a date, even if it
is not needed. As a consequence, whenever you use a date expression in your
queries, the query cache would not be used.
Close#9225
This change fixes _timestamp's serialization method to write out
`doc_values` and `doc_values_format`, which could already be set,
but would not be written out.
closes#8893closes#8967
This commit adds a test that simulates disconnecting nodes and dropping requests during the various stages of recovery and solves all the issues that were raised by it. In short:
1) Ongoing recoveries will be scheduled for retry upon network disconnect. The default retry period is 5s (cross node connections are checked every 10s by default).
2) Sometimes the disconnect happens after the target engine has started (but the shard is still in recovery). For simplicity, I opted to restart the recovery from scratch (where little to no files will be copied again, because they were just synced).
3) To protect against dropped requests, a Recovery Monitor was added that fails a recovery if no progress has been made in the last 30m (by default), which is equivalent to the long timeouts we use in recovery requests.
4) When a shard fails on a node, we try to assign it to another node. If no such node is available, the shard will remain unassigned, causing the target node to clean any in-memory state for it (files on disk remain). At the moment the shard will remain unassigned until another cluster state change happens, which will re-assign it to the node in question, but if no such change happens the shard will remain stuck at unassigned. The commit adds an extra delayed reroute in such cases to make sure the shard will be reassigned.
5) Moved all recovery related settings to the RecoverySettings.
Closes#8720
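The Recovery Monitor from point 3 above can be pictured as a last-access-time check. The class below is a simplified sketch: the names, the injectable clock, and the polling style are assumptions of this example, and only the 30 minute default comes from the message.

```python
import time


class RecoveryMonitor:
    """Flags a recovery as stalled when no progress is recorded within the timeout."""

    def __init__(self, timeout_seconds: float = 30 * 60, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_access = clock()

    def touch(self) -> None:
        """Record progress, e.g. whenever a recovery request is handled."""
        self._last_access = self._clock()

    def is_stalled(self) -> bool:
        """Return True if no progress was recorded for longer than the timeout."""
        return self._clock() - self._last_access > self._timeout
```

Forgetting to call `touch()` on progress would make every long recovery look stalled, which is why the access time update matters as much as the timeout itself.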
You can now specify `format` in the request definition for most numeric metric aggregations. The exceptions are Percentile_Ranks, Cardinality and Value_Count as the response type of these can be different from the field type so the formatter won't work.
Closes #6812
I have a field with a `null` [default `_timestamp` value](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html#mapping-timestamp-field-default) and when I try to update the mapping I get a server error caused by a `NullPointerException`
```
[2015-01-08 17:28:56,040][DEBUG][action.admin.indices.mapping.put] [...] failed to put mappings on indices [[feed_170_v1, feed_204_v1, feed_229_v1, feed_232_v1, feed_239_v1, feed_248_v1, feed_268_v1, feed_256_v1, feed_272_v1, feed_159_v1, feed_255_v1, feed_164_v1, feed_259_v1, feed_266_v1, feed_188_v1, feed_240_v1, feed_233_v1, feed_13_v1, feed_184_v1, feed_261_v1, feed_267_v1, feed_271_v1, feed_257_v1, feed_172_v1, feed_238_v1, feed_254_v1, feed_223_v1, feed_274_v1, feed_203_v1, feed_269_v1, feed_262_v1, feed_205_v1, feed_168_v1, feed_219_v1, feed_253_v1, feed_251_v1, feed_173_v1, feed_252_v1, feed_210_v1, feed_216_v1, feed_218_v1, feed_118_v1, feed_273_v1, feed_227_v1, feed_166_v1, feed_213_v1, feed_226_v1]], type [history]
java.lang.NullPointerException
at org.elasticsearch.index.mapper.internal.TimestampFieldMapper.merge(TimestampFieldMapper.java:287)
at org.elasticsearch.index.mapper.object.ObjectMapper.merge(ObjectMapper.java:936)
at org.elasticsearch.index.mapper.DocumentMapper.merge(DocumentMapper.java:693)
at org.elasticsearch.cluster.metadata.MetaDataMappingService$4.execute(MetaDataMappingService.java:508)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:329)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
```
https://github.com/elasticsearch/elasticsearch/blob/v1.4.2/src/main/java/org/elasticsearch/index/mapper/internal/TimestampFieldMapper.java#L286
Looks like the existence of the default timestamp is not checked before use. The next line has the same issue: it uses the default timestamp without checking that it's not null.
To reproduce:
```
$ curl -XPUT localhost:9200/twitter2
$ curl -XPUT localhost:9200/twitter2/tweet/_mapping -d '{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "default" : null
        }
    }
}'
$ curl -XPUT localhost:9200/twitter2/tweet/_mapping -d '{
    "tweet" : {
        "_timestamp" : {
            "enabled" : true,
            "default" : null
        },
        "properties": {
            "user": {"type": "string"}
        }
    }
}'
```
Closes #9204.
(cherry picked from commit 62c6d63)
Before Elasticsearch 1.0, the type was allowed to be passed as the root
element when uploading a document. However, this was ambiguous if the
mappings also contained a field with the same name as the type. The
behavior was changed in 1.0 to not allow this, but a setting was added
for backwards compatibility. This change removes the setting for 2.0.
The header indicates how many shard copies (primary and replica shards) a write was supposed to go to, on how many
shard copies the write succeeded, and potentially captures shard failures if writing into a replica shard fails.
For async writes it also includes the number of shards on which a write is still pending.
Closes #7994
This commit removes most of the Engine abstractions and removes
Engine exposure via dependency injection. It also removes the Holder
abstraction and makes the engine itself start at construction time.
It removes the start method from the engine entirely, which means no engine
instance exists until it's started. There is also no way to stop the
engine and restart it; a restart needs an entirely new Engine.
The multi percolate shard responses are collected in an atomic array which uses the shard id as index, but the number of shards the multi percolate request was meant to go to was used as the size of this array instead of the total number of shards the index has. This caused the exception when routing was used.
Closes #6214
Once we delete the index on a node we close all resources
and subsequently need to delete all shard contents from disk. Yet
this happens today under a lock (the shard lock) that needs to be
acquired in order to execute any operation on the shard's data
path. We try to delete all the index meta-data once we have acquired
all the shard locks, but this operation can run into a timeout which causes
the index to remain on disk. Further, all shard data will be left on
disk if the timeout is reached.
This commit removes all the shard's data just before the shard lock
is released, as the last operation on a shard that belongs to a deleted
index.
additional element per segment.
This commit adds a verbose flag to the _segments api. Currently the
only additional information returned when set to true is the full
ram tree from lucene for each segment.
The `cluster.routing.allocation.balance.primary` setting has caused
a lot of confusion in the past while it has very little benefit from a
shard allocation point of view. Users tend to modify this value to
evenly distribute primaries across the nodes, which is dangerous since
a primary flag on its own can trigger relocations. The primary flag for a shard
should not have any impact on cluster performance unless the high level feature
suffering from primary hotspots is buggy. Yet, this setting was intended to be a
tie-breaker, which is not necessary anymore since the algorithm is deterministic.
This commit removes this setting entirely.
In some situations the shard balancing weight delta becomes negative. Yet,
a negative delta is always treated as `well balanced`, which is wrong. I wasn't
able to reproduce the issue in any way other than using the real world data
from issue #9023. This commit adds a fix for absolute deltas as well as a base
test class that allows building tests or simulations from the cat API output.
Closes #9023
A couple of changes that triggered a refactoring in Elasticsearch:
- LUCENE-6148: Accountable.getChildResources returns a collection instead of
a list.
- LUCENE-6121: CachingTokenFilter now propagates reset(), as a result
SimpleQueryParser.newPossiblyAnalyzedQuery has been fixed to not reset both
the underlying stream and the wrapper (otherwise lucene would barf because of
a double reset).
- LUCENE-6119: The auto-throttle issue changed a couple of method
names/parameters. It also made
`UpdateSettingsTests.testUpdateMergeMaxThreadCount` dead slow so I muted this
test until we clean up merge throttling to use LUCENE-6119.
Close #9145
If you merged two settings with the same key, one being an array and the other
a single value, the two were merged instead of the later one overriding the earlier one.
First config
    my.value: 1
Second config
    my.value: [ 2, 3 ]
If you execute
    settingsBuilder().put(settings1).put(settings2).build()
now only the values 2,3 will be in the final settings
Closes #8381
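The override behaviour can be illustrated with a small sketch that uses a plain map instead of the real settings classes. It assumes array settings are held in flattened form (`my.value.0`, `my.value.1`, ...); the class `FlatSettingsSketch` and its methods are hypothetical and are not the Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a settings builder; not the Elasticsearch API.
final class FlatSettingsSketch {
    private final Map<String, String> map = new LinkedHashMap<>();

    // Adds all entries of `other`. Any existing scalar or array form of an
    // incoming key is dropped first, so the later config overrides the earlier one.
    FlatSettingsSketch put(Map<String, String> other) {
        for (String key : other.keySet()) {
            String base = baseKey(key);
            map.keySet().removeIf(k -> baseKey(k).equals(base));
        }
        map.putAll(other);
        return this;
    }

    // Strips a trailing ".<digits>" array index: "my.value.0" -> "my.value".
    // Assumption of this sketch: no scalar setting name ends in a numeric segment.
    static String baseKey(String key) {
        int dot = key.lastIndexOf('.');
        if (dot > 0 && dot < key.length() - 1) {
            String tail = key.substring(dot + 1);
            if (tail.chars().allMatch(Character::isDigit)) {
                return key.substring(0, dot);
            }
        }
        return key;
    }

    Map<String, String> build() {
        return new LinkedHashMap<>(map);
    }
}
```

With `my.value: 1` put first and `my.value.0: 2`, `my.value.1: 3` put second, the built map contains only the two array entries.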
If a bulk index request fails due to a disconnect, unavailable shard etc, the request is
retried once before actually failing. However, even in case of failure the documents
might already be indexed. For autogenerated ids the request must not add the
documents again and therefore canHaveDuplicates must be set to true.
closes #8788
A recent situation occurred where a MultiPolygon coordinate array was accidentally defined as a single polygon with multiple holes. Since the intent was a MultiPolygon, the holes of the unintended Polygon fell outside the outer shell. This exposed a bug in the contains logic inside BasePolygonBuilder. An ArrayIndexOutOfBoundsException was being thrown instead of a more useful ElasticsearchParseException( "hole is not within polygon" ). This pull request fixes the bug and adds additional unit tests for verifying proper MultiPolygon type parsing.
closes #9071
If we reopen an index and the majority of the replicas were
not created, the reopen will fail since on master this runs with
the local gateway all the time.
ShapeBuilder expected coordinates for Envelope types in strict Top-Left, Bottom-Right order. Given that GeoJSON does not enforce coordinate order (as seen in #8672) clients could specify envelope bounds in any order and be compliant with the GeoJSON spec but not the ES ShapeBuilder logic. This change loosens the ShapeBuilder requirements on envelope coordinate order, reordering where necessary.
closes #2544, closes #9067, closes #9079, closes #9080
Cleans up the testReusePeerRecovery test as well
The actual fix is in TransportNodesListShardStoreMetaData.java, which
needs to use `nodeEnv.shardDataPaths` instead of `nodeEnv.shardPaths`.
Due to the difficulty in tracking this down, I've added a lot of
additional logging. This also fixes a logging issue in GatewayAllocator
This allows specifying the path an index will be at.
`index.data_path` is specified in the settings when creating an index,
and can not be dynamically changed.
An example request would look like:
POST /myindex
{
  "settings": {
    "number_of_shards": 2,
    "data_path": "/tmp/myindex"
  }
}
And would put data in /tmp/myindex/0/index/0 and /tmp/myindex/0/index/1
Since this can be used to write data to arbitrary locations on disk, it
requires enabling the `node.enable_custom_paths` setting in
elasticsearch.yml on all nodes.
Relates to #8976
Running a terms filter on a single term is equivalent to loading a postings
list into a bit set and then returning the bit set instead of reading the
postings list on the fly.
Close #9014
This feature adds an optional orientation parameter to the GeoJSON document and geo_shape mapping enabling users to explicitly define how they want Elasticsearch to interpret vertex ordering. The default uses the right-hand rule (counterclockwise for outer ring, clockwise for inner ring) complying with OGC Simple Feature Access standards. The parameter can be explicitly specified for an entire index using the geo_shape mapping by adding "orientation":{"left"|"right"|"cw"|"ccw"|"clockwise"|"counterclockwise"} and/or overridden on each insert by adding the same parameter to the GeoJSON document.
closes #8764
When I originally wrote the transform feature I didn't think that the
XContentType of the reencoded source mattered. It actually matters because
payloads for the completion suggester are stored and returned exactly
as encoded by this XContentType.
This revision changes the transform feature from always reencoding with smile
to always reencoding with the provided XContentType to support the completion
suggester.
Closes#8959
This commit adds support for the Ctrl-Close event on Windows using native system calls. This way, it is possible to catch the Ctrl-Close event sent by a 'taskkill /pid' command (or when the user closes the console window where elasticsearch.bat was started) and gracefully close the node. Before this commit, the node was simply killed on taskkill/window closing.
There was a race condition in the test in the case where the node's fault detection would manage to send an initial ping, followed by 2 attempts, before the target service was disconnected.
RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so it will wake up and clean the recovery. Depending on timing, this can leave an unneeded interrupted thread status, causing future IO commands to fail unnecessarily.
RecoverySource already had a handy utility called CancellableThreads. This extracts it to a top level class, and uses it in RecoveryTarget as well.
Closes #9000
Up to now, all filters could be cached using the `_cache` flag that could be
set to `true` or `false` and the default was set depending on the type of the
`filter`. For instance, `script` filters are not cached by default while
`terms` are. For some filters, the default is more complicated and eg. date
range filters are cached unless they use `now` in a non-rounded fashion.
This commit adds a 3rd option called `auto`, which becomes the default for
all filters. So for all filters a cache wrapper will be returned, and the
decision will be made at caching time, per-segment. Here is the default logic:
- if there is already a cache entry for this filter in the current segment,
then return the cache entry.
- else if the doc id set cannot iterate (eg. script filter) then do not cache.
- else if the doc id set is already cacheable and it has been used twice or
more in the last 1000 filters then cache it.
- else if the filter is costly (eg. multi-term) and has been used twice or more
in the last 1000 filters then cache it.
- else if the doc id set is not cacheable and it has been used 5 times or more
in the last 1000 filters, then load it into a cacheable set and cache it.
- else return the uncached set.
So for instance geo-distance filters and script filters are going to use this
new default and are not going to be cached because of their iterators.
Similarly, date range filters are going to use this default all the time, but
it is very unlikely that those that use `now` in a not rounded fashion will get
reused so in practice they won't be cached.
`terms`, `range`, ... filters produce cacheable doc id sets with good iterators
so they will be cached as soon as they have been used twice.
Filters that don't produce cacheable doc id sets such as the `term` filter will
need to be used 5 times before being cached. This ensures that we don't spend
CPU iterating over all documents matching such filters unless we have good
evidence of reuse.
One last interesting point about this change is that it also applies to compound
filters. So if you keep on repeating the same `bool` filter with the same
underlying clauses, it will be cached on its own while up to now it used to
never be cached by default.
`_cache: true` has been changed to only cache on large segments, in order to not
pollute the cache since small segments should not be the bottleneck anyway.
However `_cache: false` still has the same semantics.
Close #8449
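The order of checks in the `auto` policy described above can be sketched as a single decision function. This is only an illustration of the listed rules: the class, enum and parameter names are hypothetical, and `recentUses` stands for the number of times the filter appeared in the last 1000 filters.

```java
// Hypothetical sketch of the decision order; not the Elasticsearch implementation.
final class AutoCachePolicySketch {

    enum Decision { RETURN_CACHED, DO_NOT_CACHE, CACHE, LOAD_AND_CACHE, RETURN_UNCACHED }

    static Decision decide(boolean alreadyCached,   // cache entry exists for this segment
                           boolean canIterate,      // false for e.g. script filters
                           boolean setIsCacheable,  // doc id set is already cacheable
                           boolean filterIsCostly,  // e.g. multi-term filters
                           int recentUses) {        // uses within the last 1000 filters
        if (alreadyCached) {
            return Decision.RETURN_CACHED;
        }
        if (!canIterate) {
            return Decision.DO_NOT_CACHE;
        }
        if (setIsCacheable && recentUses >= 2) {
            return Decision.CACHE;
        }
        if (filterIsCostly && recentUses >= 2) {
            return Decision.CACHE;
        }
        if (!setIsCacheable && recentUses >= 5) {
            // The set must first be loaded into a cacheable structure.
            return Decision.LOAD_AND_CACHE;
        }
        return Decision.RETURN_UNCACHED;
    }
}
```

Under these rules a filter whose set is not cacheable and that is not costly stays uncached on its fourth use and is loaded and cached on its fifth, while a filter that cannot iterate is never cached regardless of reuse.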
Add a new ignore_idle_threads boolean option (default true) to
/_nodes/hot_threads, to filter out threads in known idle places like
waiting on a socket select or on pulling the next task from an empty
queue.
Closes #8985, closes #8908
This commit adds support for version and version_type to the Term Vectors API.
This could be useful in the following case whereby the user gets a document
and later wants to generate its TVs. With version, this would ensure that only
the TVs of that particular document are generated, and error out if the
document has been updated in between.
Closes #7480
This commit adds the logic necessary for supporting polygon vertex ordering per OGC standards. Exterior rings will be treated in ccw (right-handed rule) and interior rings will be treated in cw (left-handed rule). This feature change supports polygons that cross the dateline, and those that span the globe/map. The unit tests have been updated and corrected to test various situations. Greater test coverage will be provided in future commits.
Addresses #8672
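For reference, the winding order of a ring can be determined from the sign of its area via the shoelace formula. The sketch below is a generic planar illustration with hypothetical names; it is not the ShapeBuilder logic and it does not handle dateline crossing.

```java
// Hypothetical helper; planar coordinates only, no dateline handling.
final class RingOrientationSketch {

    // Signed area of a ring given as [x, y] pairs (shoelace formula).
    // Positive means counterclockwise, negative means clockwise.
    // Works whether or not the last point repeats the first, because a
    // zero-length closing edge contributes nothing to the sum.
    static double signedArea(double[][] ring) {
        double sum = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] a = ring[i];
            double[] b = ring[(i + 1) % ring.length];
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return sum / 2.0;
    }

    // Exterior rings are expected to be counterclockwise (right-hand rule).
    static boolean isCounterClockwise(double[][] ring) {
        return signedArea(ring) > 0;
    }
}
```

An exterior ring would be expected to return `true` from `isCounterClockwise`, and a hole `false`.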
This feature branch implements OGC compliance for Polygon/Multi-polygon. That is, vertex order for the exterior ring follows the right-hand rule (ccw) and all holes follow the left-hand rule (cw). While GeoJSON imposes no restrictions, a user that wants to specify a complex poly across the dateline must do so in compliance with the OGC spec, otherwise a polygon that spans the globe will be assumed.
Reference issue #8672
Fix orientation of outer and inner ring for polygon with holes. Updated unit tests. Bug exists in boundary condition on negative side of dateline.
This provides a fix to issue #7644. A new Stats object must be created, and
not a reference to the retrieved stats, before we can add stats to it.
Otherwise, we would keep on adding to the same object on subsequent calls to
IndicesStatsResponse#getPrimaries() or IndicesStatsResponse#getTotal().
Closes #7644 and #8950
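The aliasing problem can be reproduced in a few lines. The classes below are hypothetical stand-ins (not the real stats classes): the buggy variant accumulates into a retrieved object, so every call inflates the result, while the fixed variant starts from a fresh object.

```java
// Hypothetical stand-ins for the stats classes; illustrates the aliasing bug only.
final class StatsCopySketch {

    static final class Stats {
        long docs;
        Stats(long docs) { this.docs = docs; }
        void add(Stats other) { this.docs += other.docs; }
    }

    private final Stats[] shards;

    StatsCopySketch(Stats... shards) {
        this.shards = shards;
    }

    // Buggy: `total` is a reference to the first shard's stats, so adding to it
    // mutates shared state and repeated calls keep growing the number.
    Stats totalBuggy() {
        Stats total = shards[0];
        for (int i = 1; i < shards.length; i++) {
            total.add(shards[i]);
        }
        return total;
    }

    // Fixed: accumulate into a new object so the per-shard stats stay untouched.
    Stats totalFixed() {
        Stats total = new Stats(0);
        for (Stats s : shards) {
            total.add(s);
        }
        return total;
    }
}
```

With two shards holding 2 and 3 docs, `totalFixed()` returns 5 on every call, whereas `totalBuggy()` returns 5 and then 8.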
The "compressed" format was removed, so this caused warnings in the log
like:
```
[WARN ][index.fielddata ] [node_0] [test] failed to find format
[compressed] for field [test-num], will use default
```
Now that we do not automatically call .cleanUp() when clearing the field
data cache, we need to call it after the cache clear in
RandomExceptionCircuitBreakerTests
The setting `mapping.date.round_ceil` (and the undocumented setting
`index.mapping.date.parse_upper_inclusive`) affect how date ranges using
`lte` are parsed. In #8556 the semantics of date rounding were
solidified, eliminating the need to have different parsing functions
whether the date is inclusive or exclusive.
This change removes these legacy settings and improves the tests
for the date math parser (now at 100% coverage!). It also removes the
unnecessary function `DateMathParser.parseTimeZone` for which
the existing `DateTimeZone.forID` handles all use cases.
Any user previously using these settings can refer to the changed
semantics and change their query accordingly. This is a breaking change
because even dates without datemath previously used the different
parsing functions depending on context.
closes #8598, closes #8889
In cases of heavy contention, it's possible for more than 2 threads
to race to a circuit breaking exception.
Essentially this means that if we have 3 threads all trying to add 3 and
simultaneously cause a circuit breaking exception (due to retry), when
adjusting after circuit breaking we can "rewind" past what this test
expects the child breaker to be at.
This adds leeway into the check, where it's okay to be within
NUM_THREADS from the parentLimit, because each thread should only add 1
to the breaker at a time.
We have only had a single gateway since es 1.3. There is no need to keep all
these abstractions and nested packages. We can fold most of it into simpler
structures.
IndexEngine was an abstraction where we had index-level engines (instead
of shard-level) that could store meta information about the index. It
was never actually used by Elasticsearch, and only there for plugins.
This removes it, because it is a confusing abstraction and not needed,
no plugins should be implementing their own IndexEngines.
When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.
Closes #8804, closes #8933
When we close a node, all pending / active search requests need to be
cleared, otherwise a node will wait up to 30 sec for shutdown since there
could be open scroll requests. This behavior was introduced in 1.5, so
versions <= 1.4.x are not affected.
Closes #8940
Occasionally the join thread successfully connected to a just-closed node, which causes the subsequent join request to time out. Its default timeout of 60s throws the test off when it waits for a cluster to form.
Calling cache.cleanUp() is kind of like calling System.gc(), meaning
that we should never have (non-test) things that rely on this
functionality.
For the field data and filter cache, we already have a periodic process
that runs this .cleanUp(), so there is no need to block index
closing/clearing on it. Instead, we can clean the field data cache in
InternalTestCluster before we check the circuit breaker.
This can help tests that time out because cleaning the cache is taking
too long
We have lots of boilerplate code that is unnecessarily abstracting
services ie InternalIndexShard and IndexShard or InternalIndexService and
IndexService. It's enough to have concrete classes for these core classes.
Closes #8904
This change adds a 'http.publish_port' setting to the HTTP module to configure
the port which HTTP clients should use when communicating with the node. This
is useful when running on a bridged network interface or when running behind
a proxy or firewall.
Closes #8807, closes #8137
Upgrades lucene to latest, and supports the BEST_COMPRESSION parameter
now supported (with backwards compatibility, etc) in Lucene.
This option uses deflate, tuned for highly compressible data.
index.codec::
The default value compresses stored data with LZ4 compression, but
this can be set to best_compression for a higher compression ratio,
at the expense of slower stored fields performance.
IMO it's safest to implement as a named codec here, because ES already
has logic to handle this correctly, and because it's unrealistic to have
a plethora of options to Lucene's default codec... we are practically
limited in Lucene to what we can support with back compat, so I don't
think we should overengineer this and add additional unnecessary plumbing.
See also:
https://issues.apache.org/jira/browse/LUCENE-5914
https://issues.apache.org/jira/browse/LUCENE-6089
https://issues.apache.org/jira/browse/LUCENE-6090
https://issues.apache.org/jira/browse/LUCENE-6100
Closes #8863
This fix adds better error handling for parsing multipoint, linestring, and polygon GeoJSONs. Current logic throws a NPE when parsing a multipoint, linestring, or polygon that does not comply with the GeoJSON specification. That is, if a user provides a single coordinate instead of an array of coordinates, or array of linestrings, the ShapeParser throws a NPE wrapped in a SearchParseException instead of a more useful error message.
Closes #8432
After the refactoring in #8784 some settings didn't get passed to the
actual engine and there exists a race if the settings are updated while
the engine is started such that the actual starting engine doesn't see
the latest settings. This commit fixes the concurrency issue as well as
adds tests to ensure the settings are reflected.
After upgrading, a shard might start relocating again. If there are no
replicas, the cluster state of a node might not be up to date for
a few milliseconds and direct a search request to a node that does not
have the shard anymore. This results in the following test failure:
1> java.lang.AssertionError: Count is 99 but 101 was expected. Total shards: 13 Successful shards: 12 & 0 shard failures:
1> __randomizedtesting.SeedInfo.seed([1932F73B458703CA:6F4FAD3DAC55591C]:0)
1> [...org.junit.*]
1> org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount(ElasticsearchAssertions.java:184)
1> org.elasticsearch.bwcompat.BasicBackwardsCompatibilityTest.testIndexRollingUpgrade(BasicBackwardsCompatibilityTest.java:358)
Waiting for relocation to finish should fix this.
Modifications to LoggingListener pushed with #8820 caused the original logger levels not to be reset after modifications, as the new state was saved for restore instead of the previous one.
Added unit tests for LoggingListener as well.
Closes #8845
Restrict use of java.io.File to 5 methods (excluded), but otherwise ban.
This is a prerequisite to do any mocking here.
I don't try to do any heavy cleanup on these tests, I am not familiar with them.
So this is mostly a rote straightforward conversion.
Closes #8836
This is a start to exposing memory stats improvements from Lucene 5.0.
This adds the following categories of Lucene index pieces to index stats:
* Terms
* Stored fields
* Term Vectors
* Norms
* Doc values
This commit moves the engine's reference to the store out of the actual
implementation and into the holder, since the holder manages the actual lifecycle.
Engine-internal references, such as per-searcher or per-recovery ones, are kept inside
the actual implementation since they have a different lifecycle.
Once the current engine is started you can only close it once. Once closed, the engine cannot be started again. This commit adds a stop method which signals the engine to free its resources, but in a way that allows restarting.
This is done by introducing InternalEngineHolder, which is a wrapper around InternalEngine. This allows adding the stop() method without adding complexity to the engine implementation. InternalEngineHolder also serves as an entry point for listeners (incoming and outgoing) to other ES components, which removes the need to add/remove them if the engine is stopped.
Closes #8784
Today we try to fetch a shard Id for a given IndexReader / LeafReader
by walking its tree until the lucene internal SegmentReader and then
casting the directory into a StoreDirectory. This class is fully internal
to Elasticsearch and should not be exposed outside of the Store.
This commit makes StoreDirectory a private inner class and adds dedicated
ElasticsearchDirectoryReader / ElasticsearchLeafReader exposing a ShardId
getter to obtain information about the shard the index / segment belongs to.
These classes can be used to expose other segment specific information in
the future more easily.
When a scoring script returns NaN (not a number), the current message is confusing (IllegalArgumentException[docID must be >= 0 and < maxDoc=3 (got docID=2147483647)]). This commit adds the error message ScriptException[script score function returns a wrong score: NaN].
Closes #2426
Previously it was possible for the field data clearing in this test to
take too long, causing the test to time out.
This also switches to using `scaledRandomIntBetween` for the number of
fields.
The recovery diff can return files in the `different` category
since it's conservative if it can't tell if the files are the same.
Yet, the cleanup code only needs to ensure both ends of the recovery
are consistent. If we have a very old segments_N file no checksum is present,
but for the delete files they might be such that the segments file passes
the consistency check but the .del file doesn't since it's in fact the same,
but this check was missing in the last commit.
Original indices are optional in ShardDeleteByQueryRequest only for backwards compatibility, see #7406. We can remove this in master since 2.0 will require a full cluster restart.
Closes #8777
This commit factors out the PID file creation from bootstrap and adds
tests for error conditions etc. We also can't rely on DELETE_ON_CLOSE
since it might not even write the file depending on the OS and JVM implementation.
This impl uses a shutdown hook to best-effort remove the pid file if it was written.
Closes #8771
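A minimal sketch of the approach described above: write the PID to a file and register a shutdown hook that removes it on a best-effort basis. The class and method names are hypothetical and the pid is passed in by the caller; this is not the code from the commit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the shutdown-hook approach.
final class PidFileSketch {

    // Writes `pid` to `path`, creating parent directories if needed, and
    // registers a shutdown hook that deletes the file when the JVM exits.
    static Path write(Path path, long pid) throws IOException {
        Path parent = path.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        if (Files.isDirectory(path)) {
            throw new IllegalArgumentException(path + " is a directory");
        }
        Files.write(path, Long.toString(pid).getBytes(StandardCharsets.UTF_8));
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                // Best effort only: nothing useful can be done during shutdown.
            }
        }));
        return path;
    }
}
```

The hook only runs on an orderly JVM shutdown, so a hard kill can still leave a stale file behind, which is why the removal is described as best-effort.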
This commit adds a very lightweight action to the transport
service that allows fetching the cluster name and the discovery node
from a node. This is used by transport clients to test liveness of
a node without using the nodesinfo API, which can be blocking if management
threadpools are busy.
Closes #8763
The bwc layer added with #7105 is not needed in master as a full cluster restart will be required, thus from 2.0 on the only supported action names are compliant to the defined conventions and don't need to be converted to the old format
Closes #8758
REST tests are being shuffled before their execution. To guarantee their repeatability given the seed, their order needs to be always the same before the shuffling happens.
Closes #8745
The conversion to the Path API doesn't work if the path points
to a file inside a JAR, like a config. These paths must be read
while the ZIP filesystem is open, which can't be guaranteed across
the board. This commit reverts the relevant changes back to java.net.URL
and adds a util method to read UTF-8 encoded files from URLs correctly.
This commit cuts over all of core (not quite all tests) to java.nio.file.Path.
It also adds the File class to the core forbidden APIs to prevent its usage.
This commit also resolves #8254 since we are now consistently using the NIO Path
API. The changes in this commit allow for more information if IO operations fail
since the NIO API throws exceptions instead of returning booleans. The built-in
methods used in this commit are also more resilient to encoding errors like
unmappable characters and throw exceptions if those chars are present in a file.
Closes #8254, closes #8666
Today we don't check if the recovery target has all the
files that we expect there after the recovery. This commit
adds additional safety to ensure all files are present with the
correct checksums on recovery finalization.
Closes #8723
Today we wait 500ms before we retry a recovery if the target node is not ready.
This happens if the source starts the recovery before the target has
processed the clusterstate moving the target shard into the right state.
This can cause a 500ms delay each time it happens while the shard is ready
way earlier on the target node. This commit makes this delay configurable
to mainly speed up test processing and shard allocation in tests.
Inner hits allows to embed nested inner objects, children documents or the parent document that contributed to the matching of the returned search hit as inner hits, which would otherwise be hidden.
Closes #8153, closes #3022, closes #3152
Some QueryBuilders are missing or have a different naming than the other ones.
This patch is applied to branch 1.x and master (elasticsearch 1.5 and 2.0):
Added
-----
* `templateQuery(...)`
* `commonTermsQuery(...)`
* `queryStringQuery(...)`
* `simpleQueryStringQuery(...)`
Deprecated
----------
* `commonTerms(...)`
* `queryString(...)`
* `simpleQueryString(...)`