Commit Graph

6048 Commits

Author SHA1 Message Date
Alexander Reelsen dfc2e6381b CliTool: CheckFileCommand checks for file existence
As a CliTool command could potentially also delete files, the
CheckFileCommand needs to check if those files exist, before
trying to get permissions/owners/groups from that path.
2015-02-12 11:36:47 +01:00
Alexander Reelsen 9cd14a5c29 CliTool: Add command to warn on permission/owner change
When using the CLI tool infrastructure, a command can potentially write
a new file. In case it overwrites an existing one, you may want to ensure
that the permissions, the owner and the group are kept the same and do not
accidentally change when overwriting those files.

This PR introduces a command that allows you to execute this check per path.

It also adds a new testing dependency, namely jimfs, which allows you to create
in-memory filesystems with certain properties (like supporting or not posix permissions
on this filesystem), so that you can test those features, without executing
tests on a certain operating system.
2015-02-12 10:10:11 +01:00
Alexander Reelsen 30a9d97a71 FileSystemUtils: Only create backup copies if files differ
The FileSystemUtils class has a helper method to create files with
a .new suffix, in case the file, which should be created already
exists. If you install plugins and those have configuration files,
even without changes, you will end up with tons of .new files.

This commit checks the file size and sha-256 sum, and only if those
differ, a .new file is actually being created.
2015-02-12 10:08:14 +01:00
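The size and sha-256 comparison described in 30a9d97a71 can be sketched roughly as follows. This is a minimal illustration, not the actual FileSystemUtils code: the class `NewSuffixCheckSketch` and its method names are hypothetical, and it assumes the files are small enough to read into memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not the actual FileSystemUtils code.
final class NewSuffixCheckSketch {

    // True when both files have the same size and the same SHA-256 digest.
    static boolean sameContent(Path a, Path b) {
        try {
            if (Files.size(a) != Files.size(b)) {
                return false; // cheap size check first, so files of different size are never hashed
            }
            return MessageDigest.isEqual(sha256(a), sha256(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file into memory, which is acceptable for small config files.
    static byte[] sha256(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // A ".new" copy is only needed when the target exists and differs from the incoming file.
    static boolean needsNewSuffixCopy(Path incoming, Path existing) {
        return Files.exists(existing) && !sameContent(incoming, existing);
    }
}
```

Checking the size before hashing keeps the common "obviously different" case cheap; the digest is only computed when the sizes match.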
Igor Motov 9b75d3ef98 Test: wait for the cluster to recover in ClusterServiceTests before waiting for update state task results
On CI machines node recovery sometimes takes up to 2 seconds. When it happens an update cluster state task gets stuck behind the recovery and tests fail with 1 second timeout. This commit makes sure that we wait for recovery to complete before starting the clock.
2015-02-11 19:11:00 -05:00
Ryan Ernst f735baf306 Core: Remove ability to run optimize and upgrade async
This has been very trappy. Rather than continue to allow buggy behavior
of having upgrade/optimize requests sidestep the single shard per node
limits optimize is supposed to be subject to, this removes
the ability to run the upgrade/optimize async.

closes #9638
2015-02-11 11:30:27 -08:00
Martijn van Groningen 26b9d14443 Added 1.3.9-SNAPSHOT and 1.4.4-SNAPSHOT versions 2015-02-11 18:02:01 +01:00
Ryan Ernst 54c1813920 Tests: Add forgotten files for static bwc tests 2015-02-11 07:07:40 -08:00
Ryan Ernst 7328aa1c15 Tests: Add static bwc tests for new releases 1.3.8 and 1.4.3 2015-02-11 07:06:17 -08:00
Martijn van Groningen 173cfc14d6 Marked 1.4.3 as released 2015-02-11 15:58:39 +01:00
Simon Willnauer 764fda6420 [TEST] make sandbox settings explicit in Tests 2015-02-11 13:21:43 +01:00
Simon Willnauer 0b0cd1c46c [ENGINE] Fix deadlock problems when API flush and finish recovery happens concurrently
Unfortunately the lock order is important in the current flush code. We have to acquire the readLock first; otherwise,
if we are flushing at the end of the recovery while holding the writeLock, we can deadlock if:
 * Thread 1: flushes via API and gets the flushLock but blocks on the readLock since Thread 2 has the writeLock
 * Thread 2: flushes at the end of the recovery holding the writeLock and blocks on the flushLock owned by Thread 1

This commit acquires the read lock first which would be done further down anyway for the time of the flush.
As a side effect we can now safely flush on calling close() while holding the writeLock.
2015-02-11 13:17:44 +01:00
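The lock ordering from 0b0cd1c46c can be illustrated with a small sketch. It is not the engine code: `FlushLockOrderSketch` and its fields are hypothetical stand-ins, and it assumes a `ReentrantReadWriteLock`, which lets the thread holding the write lock also take the read lock.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the engine; only the lock order is of interest here.
final class FlushLockOrderSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final ReentrantLock flushLock = new ReentrantLock();
    int flushes = 0; // guarded by flushLock

    void flush() {
        rwl.readLock().lock();      // 1. always take the read lock first ...
        try {
            flushLock.lock();       // 2. ... and only then the flush lock
            try {
                flushes++;          // placeholder for the actual flush work
            } finally {
                flushLock.unlock();
            }
        } finally {
            rwl.readLock().unlock();
        }
    }

    void close() {
        rwl.writeLock().lock();
        try {
            // The write-lock holder may acquire the read lock as well,
            // so flushing from close() does not block on itself.
            flush();
        } finally {
            rwl.writeLock().unlock();
        }
    }
}
```

Because every path takes the read lock before the flush lock, no thread can hold the flush lock while waiting for a lock that a flush-lock waiter owns.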
Simon Willnauer 2f0d158692 [CORE] Consolidate index / shard deletion in IndicesService
Today the logic related to deleting an index is spread across several
classes which makes changes to this rather delicate part of the code-base
very difficult. This commit consolidates this logic into the IndicesService
and moves the handling of ack-ing the delete to the master entirely into
`IndicesClusterStateService`.
2015-02-11 09:05:20 +01:00
Simon Willnauer d3762d6427 [TEST] Make tests pass while flying 2015-02-11 09:05:18 +01:00
Igor Motov 00b5c6431c Test: testSortMinValueScript - use unmappedType to handle slow propagation of mapping 2015-02-10 19:59:51 -05:00
javanna 9c847db8af Percolate api: support encoded body as query string param consistently
The percolate api doesn't parse the encoded body provided as `source` query string parameter, when percolating an existing document. Fixed and added REST test that would have caught this since we randomly use GET + encoded `source` param instead of GET + request body in our java runner (the perl runner does the same too).

Closes #9628
2015-02-11 08:53:04 +11:00
Ryan Ernst b3474f6b25 Mappings: Remove ability to set path for _id and _routing on 2.0+ indexes
_id and _routing now no longer support the 'path' setting on indexes
created with 2.0.  Indexes created before 2.0 still support this
setting for backcompat.

closes #6730
2015-02-10 10:53:44 -08:00
Igor Motov 6544890e14 Internal: promptly cleanup updateTask timeout handler
Improve cleanup of updateTask timeout handlers. The timeout handlers should be removed as soon as a corresponding update task is processed. Otherwise, timeout handlers might keep old updateTasks and all objects that they are pointing to in memory for the duration of timeout (15 minutes by default).

Fixes #9621
2015-02-10 13:00:40 -05:00
Nicholas Knize 5b96595854 [GEO] Updating javadoc for XShapeCollection
XShapeCollection has an incorrect description left over from when the relate method was overridden. This one line commit corrects the description.
2015-02-10 07:23:00 -06:00
Simon Willnauer 401e6c6b06 [ENGINE] Factor out settings updates from Engine
The engine is already pretty complex, and it's still conflated with
code that doesn't necessarily belong there. Updating the settings from
the settings service can be done on the level above. This commit cleans up
the settings code in the engine and moves it to the IndexShard.
2015-02-10 12:59:12 +01:00
Nicholas Knize c9893ba0c2 [GEO] Correct bounding box logic for GeometryCollection type
"The OpenGIS Abstract Specification: An Object Model for Interoperable Geoprocessing" published by the OGC defines "The boundary of a geometric object is a set of geometric objects of the next lower dimension." The bounding box of a GeometryCollection is therefore the set of bounding rectangles derived from the geometric objects of the next lower dimension. This commit updates the computeBoundingBox and relate methods for the ShapeCollection base class to correctly determine the prefixTree detail level used in Lucene's FilterCellIterator.

closes #9360
2015-02-09 17:40:05 -06:00
Simon Willnauer de7461efd0 [ENGINE] Close Engine immediately if a tragic event strikes.
Until lately we couldn't close the engine in a tragic event due to
the lock order and all its complications. Now that the engine
is much more simplified in terms of having a single IndexWriter etc.
we don't necessarily need the write-lock on close anymore and can
easily just close and continue.
2015-02-09 23:21:53 +01:00
Lee Hinman 622d2c8e42 [CORE] Refactor InternalEngine into AbstractEngine and classes
InternalEngine contains a number of inner classes that it uses, however,
this makes the class overly large and hard to extend. In order to be
able to easily add other Engines (such as the ShadowEngine), these
helping methods have been extracted into an AbstractEngine class. The
classes that were previously in `InternalEngine` have been moved to
separate classes, which will allow for better unit testing as well.

None of the functionality of InternalEngine has been changed, this is
only refactoring.

Note that this is a change I originally made on my shadow-replica
branch, however it is easier to review piecemeal so I extracted it into
a separate PR.
2015-02-09 13:28:55 -07:00
Igor Motov dcc15a6460 Test: add wait for nodes to restorePersistentSettingsTest
Sometimes by the time update settings is called the second node is not in the cluster yet. As a result, the change of the minimum master nodes setting to 2 is ignored, causing this test to fail.
2015-02-09 12:51:48 -05:00
Christoph Büscher d2f852a274 Aggregations: Add 'offset' option to date_histogram, replacing 'pre_offset' and 'post_offset'
Add offset option to 'date_histogram' replacing and simplifying the previous 'pre_offset' and 'post_offset' options.
This change is part of a larger clean up task for `date_histogram` from issue #9062.
2015-02-09 14:03:28 +01:00
Simon Willnauer 93df178469 Remove unneeded bwc code 2015-02-09 11:45:56 +01:00
Alexander Reelsen 98a2482825 Testing: Add test rule to repeat tests on binding exceptions
Due to the possibility of ports being already used when choosing a
random port, it makes sense to simply repeat a unit test upon a bind
exception.

This commit adds a junit rule, which does exactly this and does not
require you to change the test code and add loops.

Closes #9010
2015-02-09 11:18:00 +01:00
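The idea behind 98a2482825 is to rerun a test body when binding to a random port fails. The commit implements this as a junit rule; the sketch below swaps in a plain retry helper so that it needs no test framework. `RepeatOnBindSketch` and `callWithRetries` are hypothetical names, not the rule added by the commit.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

// Plain-Java stand-in for the junit rule; names are hypothetical.
final class RepeatOnBindSketch {

    // Runs the body and repeats it only when it fails with a BindException.
    static <T> T callWithRetries(Callable<T> body, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        BindException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return body.call();
            } catch (BindException e) {
                last = e; // the port was already in use, try again
            }
        }
        throw last; // every attempt failed to bind
    }
}
```

Any other exception propagates immediately, so genuine test failures are not hidden by the retry loop.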
Simon Willnauer b3b1a11a64 Mapping update task back references already closed index shard
In the ShardRecoveryHandler we issue cluster update tasks to update the
mapping. The anonymous inner class back-references the ShardRecoveryHandler
which holds a potentially large IndexShard object (which references buffers & caches etc.).
If the queue of update tasks piles up and recoveries get canceled and/or shards are closed,
the ShardRecoveryHandler can't be GCed. This commit moves the update task into a static
inner class to allow the GC to do its job.
2015-02-09 09:21:55 +01:00
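The pattern from b3b1a11a64 looks roughly like this in isolation. The sketch uses hypothetical names (`RecoveryHandlerSketch`, `MappingUpdateTask`) and a byte array as a stand-in for the heavy shard state; it only shows that a static nested class captures what it is given and nothing else.

```java
// Hypothetical stand-in for ShardRecoveryHandler; not the actual recovery code.
final class RecoveryHandlerSketch {
    final byte[] largeState = new byte[1 << 20]; // stand-in for the IndexShard buffers and caches
    final String indexName;

    RecoveryHandlerSketch(String indexName) {
        this.indexName = indexName;
    }

    // Static nested class: it has no hidden reference to the enclosing handler,
    // so a queued task does not keep largeState reachable.
    static final class MappingUpdateTask implements Runnable {
        private final String indexName;
        boolean executed;

        MappingUpdateTask(String indexName) {
            this.indexName = indexName;
        }

        @Override
        public void run() {
            executed = true; // placeholder for the mapping update on indexName
        }
    }

    Runnable newUpdateTask() {
        return new MappingUpdateTask(indexName); // pass only the data the task needs
    }
}
```

An anonymous or non-static inner class that touches the handler's fields would instead carry a reference to the whole handler for as long as the task sits in the queue.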
Boaz Leskes 1167beed48 Test: testRelocationWithBusyClusterUpdateThread - listener should wait for replicas to be created 2015-02-07 10:54:21 +01:00
Boaz Leskes e684d7fde4 Logging: improve logging messages added in #9562 & #9562
Closes #9603
2015-02-06 22:26:16 +01:00
Robert Muir 66b5ed86f7 fix typo 2015-02-06 09:07:08 -05:00
Robert Muir 9c9b5c27d3 Upgrade to Lucene r1657571.
Closes #9587

Squashed commit of the following:

commit 23ac91dca4b949638ca1d3842fd6db2e00ee1d36
Author: Adrien Grand <jpountz@gmail.com>
Date:   Thu Feb 5 18:42:28 2015 +0100

    Do not compute scores if aggregations do not need it (like top_hits) or use a script (which might compute scores).

commit 51262fe2681c067337ca41ab88096ef80a2e8ebb
Author: Adrien Grand <jpountz@gmail.com>
Date:   Thu Feb 5 15:58:38 2015 +0100

    Fix more compile errors.

commit a074895d55b8b3c898d23f7f5334e564d5271a56
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Feb 5 09:31:22 2015 -0500

    fix a few more obvious ones

commit 399c41186cb3c9be70107f6c25b51fc4844f8fde
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Feb 5 09:28:32 2015 -0500

    fix some collectors and queries

commit 5f46c2f846c5020d5749233b71cbe66ae534ba51
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Feb 5 09:24:24 2015 -0500

    upgrade to lucene r1657571
2015-02-06 08:53:20 -05:00
Boaz Leskes 487ef80c35 Test: testRelocationWithBusyClusterUpdateThread - CountDownLatch.countDown should be await 2015-02-06 14:31:19 +01:00
Boaz Leskes 45ecb49a09 Test: testRelocationWithBusyClusterUpdateThread - use cluster state listener instead of assertBusy 2015-02-06 14:27:27 +01:00
Boaz Leskes 1b7920f202 Test: add document indexing back to testCancellationCleansTempFiles
It was lost during a merge conflict in 796aa5c3fe2424390a8edee604cd292b8afdf514
2015-02-06 12:54:30 +01:00
Boaz Leskes 23022227d4 Recovery: add a timeout to local mapping change check
After phase1 of recovery is completed, we check that all pending mapping changes have been sent to the master and processed by the other nodes. This is needed in order to make sure that the target node has the latest mapping (we just copied over the corresponding lucene files). To make sure we do not miss updates, we do so under a local cluster state update task. At the moment we don't have a timeout when waiting on the task to be completed. If the local node update thread is very busy, this may stall the recovery for too long. This commit adds a timeout (equal to `indices.recovery.internal_action_timeout`) and upgrades the task urgency to `IMMEDIATE`. If we fail to perform the check, we fail the recovery.

Closes #9575
2015-02-06 10:06:47 +01:00
Ryan Ernst c6968883a7 Mappings: Remove support for new indexes using path setting in
object/nested fields or index_name in any field

Backcompat is still here for indexes created before 2.0.

closes #6677
2015-02-05 12:44:43 -08:00
Boaz Leskes 9362ba200d Gateway: add logging around gateway shard allocation
This commit adds more logs around the gateway shard allocation. Any errors while reaching out to nodes to list the local shards are logged in `WARN`. Shard info loading time is logged under DEBUG. Also, we log a `WARN` message if an exception forces a full checksum check during reading the store metadata

Closes #9562
2015-02-05 18:06:33 +01:00
Boaz Leskes f7fe6b7461 Test: add awaitFix to testFullRollingRestart 2015-02-05 13:00:03 +01:00
Boaz Leskes 1b8c0056d3 Test: StaticIndexBackwardCompatibilityTest.unloadIndex should call assertAllFilesClosed
That method checks that files were released properly, but also clears a static map holding references to mock directories. Since we iterate on many indexes this created memory pressure.
2015-02-05 12:12:25 +01:00
Boaz Leskes 97ac2f5144 Test: add awaitFix to SearchWithRandomExceptionsTests
disabling this until further discussion. Recent failures probably relate to #9211 & #8720 (+ friends)
2015-02-05 11:41:02 +01:00
Masaru Hasegawa b4f7d26723 Fielddata: Change threshold value of fielddata.filter.frequency.max/min
Make it consider 1.0 as 100% instead of absolute count 1.

Closes: #9327
2015-02-05 13:27:42 +09:00
Reuben Sutton 2436552840 Raise an exception on an array of values being sent as the factor for a field_value_factor query
closes #7408
2015-02-04 14:17:09 -07:00
Simon Willnauer 4732ef3484 [ENGINE] Remove FlushType and make resources final in InternalEngine
This commit removes the FlushType entirely and replaces it in the most places with
a simple `Engine#flush()` call. Flushing without committing the translog is now
entirely private to the engine and is only called in one place.
2015-02-04 18:42:58 +01:00
Simon Willnauer 0c5599e1d1 [ENGINE] Remove full flush / FlushType.NEW_WRITER
The `full` option and `FlushType.NEW_WRITER` only exists to allow
realtime changes to two settings (`index.codec` and `index.concurrency`).
Those settings are very expert and don't really need to be updateable
in realtime.
2015-02-04 17:38:05 +01:00
Boaz Leskes 7beaaaef62 Discovery: publishing timeout to log at WARN and indicate pending nodes
When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represents a serious issue with one or more of the nodes. We should log it in WARN and indicate the nodes that failed to respond in a timely fashion.

Closes #9551
2015-02-04 16:39:01 +01:00
Adrien Grand 8b76cd76f9 Internal: Avoid unnecessary utf8 conversion when creating ScriptDocValues for a string field.
This regression was introduced in #6908: the conversion from RandomAccessOrds
to SortedBinaryDocValues goes through Strings while both impls actually work
on BytesRef, so the SortedBinaryDocValues instance could directly return the
BytesRefs returned by the RandomAccessOrds.

Close #9306
2015-02-04 09:53:34 +01:00
javanna 74c7b5a197 Internal: add AliasesRequest interface to mark requests that manage aliases
We currently have the IndicesRequest interface to mark indices related requests and be able to retrieve the indices they relate to in a generic way. This commit introduces a similar abstraction for requests that manage aliases, to be able to retrieve/replace the aliases they relate to.

Also, IndicesAliasesRequest becomes a CompositeIndicesRequest, as it allows performing multiple operations (e.g. add/remove multiple aliases). Each single operation (AliasActions) now implements the newly introduced AliasesRequest.

AliasesRequest is also implemented by GetAliasesRequest, which allows to retrieve aliases information.

Closes #9460
2015-02-04 07:59:33 +01:00
Boaz Leskes 896e8657ea Discovery: check index uuid when merging incoming cluster state into local
In big deployments the ClusterState can be large. To make sure we keep reusing objects that were promoted to the Old Gen, ZenDiscovery has an optimization where it tries to reuse existing IndexMetaData objects (containing among other things the mappings) from the current cluster state if they didn't change. The comparison currently uses the index name and the metadata version. This is however not enough and we should also check the index uuid. In extreme cases, where cluster state processing is slow and the index in question is deleted and recreated and these operations are batch processed together, we can use the wrong meta data if the version is also identical. This can happen if people create the index with all meta data predefined and no settings were changed.

Closes #9489
Closes #9541
2015-02-03 21:36:05 +01:00
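The reuse check described in 896e8657ea can be sketched as below. `IndexMetaSketch` is a hypothetical, stripped-down stand-in for IndexMetaData with only the three fields that matter for the comparison; it is not the ZenDiscovery code.

```java
// Hypothetical, stripped-down stand-in for IndexMetaData.
final class IndexMetaSketch {
    final String name;
    final String uuid;
    final long version;

    IndexMetaSketch(String name, String uuid, long version) {
        this.name = name;
        this.uuid = uuid;
        this.version = version;
    }

    // Reuse the existing (already tenured) object only when name, uuid and version all match.
    static IndexMetaSketch reuseIfUnchanged(IndexMetaSketch current, IndexMetaSketch incoming) {
        boolean unchanged = current != null
                && current.name.equals(incoming.name)
                && current.uuid.equals(incoming.uuid)   // the additional check
                && current.version == incoming.version;
        return unchanged ? current : incoming;
    }
}
```

Without the uuid comparison, an index that was deleted and recreated under the same name with the same version would be mistaken for the old one.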
Adrien Grand 6cdde31e64 Search: Reuse Lucene's MultiCollector.
We could reuse Lucene's MultiCollector instead of implementing our own.

Close #9549
2015-02-03 18:12:15 +01:00
Adrien Grand 13b64cc362 Aggs: Make the nested aggregation call sub aggregators with doc IDs in order.
Close #9547
2015-02-03 16:51:36 +01:00
javanna ebb7ecb00e [TEST] RestClient to use a non static pooling connection manager
When closing an instance of RestClient, the connection manager gets shut down, which makes it not usable anymore. If that is static, like it is now, no RestClient will work anymore from that moment on. Each instance of RestClient should have its own instance of connection manager.
2015-02-03 16:46:54 +01:00
Adrien Grand 8540a863aa Search: Avoid calling DocIdSets.toSafeBits.
This method is heavy as it builds a bitset out of a DocIdSet in order to be
able to provide random-access. Now that Lucene has removed out-of-order scoring
true random-access is very rarely needed and we could instead return a Bits
instance that wraps the iterator. Ideally, we would use the DISI API directly
but I have to admit that the Bits API is more friendly.

Close #9546
2015-02-03 16:16:19 +01:00
javanna e5b174ff77 [TEST] Move SimpleNettyTransportTests to expected exception
Replaced try catch with expected exception, since no additional check was done on the exception thrown.
2015-02-03 15:51:51 +01:00
javanna 338766fd4d [TEST] Remove needless ClusterScope annotation from NettyTransportMultiPortTests
NettyTransportMultiPortTests is not an integration test, it doesn't rely on the test cluster thus the ClusterScope annotation doesn't have any effect.
2015-02-03 15:51:44 +01:00
javanna 0e67dda15d [TEST] Make sure that match assertion throws error if run against an object
We had a REST test that relied on matching a json response against a regex. It worked but the match wasn't done against the actual json object, but its java map representation converted into a string by calling `toString`. Since all other client test runners don't work in this case, as they try to match a json object against a regex, we should do the same and prevent it from working.
2015-02-03 10:18:18 +01:00
javanna dfe67da013 [TEST] support stashed values within property names in our REST tests
Closes #9533
2015-02-03 10:17:50 +01:00
Boaz Leskes 4342237acf Test: reduce load in RecoveryWhileUnderLoadTests 2015-02-03 09:32:42 +01:00
Robert Muir 027730006b core: add 'checksum' option for index.shard.check_on_startup
The current "checkindex" on startup is very very expensive. This is
like running one of the old school hard drive diagnostic checkers and
usually not a good idea.

But we can do a CRC32 verification of files. We don't even need to
open an indexreader to do this, it's much more lightweight.

This option (as well as the existing true/false) are randomized in
tests to find problems.

Also fix bug where use of the current option would always leak
an indexwriter lock.

Closes #9183
2015-02-03 00:10:08 -05:00
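As a rough illustration of the kind of check 027730006b refers to, a CRC32 over raw bytes can be computed with the JDK alone. This is only a sketch of the checksum idea: `ChecksumCheckSketch` is a hypothetical name, and how Lucene actually stores and verifies its file checksums is not shown here.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of a CRC32 check over raw bytes; not the Lucene or Elasticsearch code.
final class ChecksumCheckSketch {

    // Computes the CRC32 of the given bytes without parsing them in any way.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True when the bytes still match the checksum recorded for them.
    static boolean verify(byte[] data, long expectedChecksum) {
        return crc32(data) == expectedChecksum;
    }
}
```

The check only needs to stream over the bytes once, which is why it is so much cheaper than a full structural check of the index.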
Ryan Ernst 6079d88d43 Mappings: Remove type prefix support from field names in queries
This is the first part of #8872.
2015-02-02 13:10:56 -08:00
Lee Hinman 0f405e9710 Merge branch 'pr/8795' 2015-02-02 11:49:45 -07:00
Michael McCandless e29cf903c8 Core: upgrade to Lucene snapshot r1656366
* IndexWriter deadlock and DV update concurrency fix
  * BytesRef reuse bug with SortedSetDVTermsEnum
  * Int overflow skip data corruption bug
  * Compound file API cleanups
  * IndexWriter doesn't accept per-doc Analyzer anymore

Closes #9524
2015-02-02 13:37:45 -05:00
Christoph Büscher 44193e7ba5 Aggregations: Add 'offset' option to histogram aggregation
Histogram aggregation supports an 'offset' option to move bucket boundaries.
In a histogram with buckets of size X these can be moved from 0, X, 2X, 3X,...
by an offset value of Y to Y, X+Y, 2X+Y, 3X+Y... by using the 'offset' option.
The previous 'pre_offset' and 'post_offset' options are removed in favour of
the simplified 'offset' option.

Closes #9417
Closes #9505
2015-02-02 18:23:01 +01:00
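The bucket shift described in 44193e7ba5 (boundaries moving from 0, X, 2X, ... to Y, X+Y, 2X+Y, ...) can be written as a one-line rounding rule. The sketch below assumes integer values and uses a hypothetical `bucketKey` helper; it illustrates the arithmetic and is not the aggregation's actual rounding code.

```java
// Hypothetical helper showing the arithmetic of an offset histogram.
final class HistogramOffsetSketch {

    // Returns the lower boundary of the bucket that contains value,
    // for buckets [offset + k * interval, offset + (k + 1) * interval).
    static long bucketKey(long value, long interval, long offset) {
        // floorDiv rounds towards negative infinity, so values below the offset
        // land in the bucket to their left rather than being rounded towards zero.
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }
}
```

With an interval of 10 and an offset of 3, the bucket boundaries are ..., -7, 3, 13, 23, ... so a value of 12 falls into the bucket starting at 3 and a value of 13 starts the next one.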
Alexander Reelsen a55476bf70 Tests: Ensure no use of potentially resolving internal ips 2015-02-02 09:45:42 +01:00
Boaz Leskes 79c8621a47 Test: add trace logging to testNodeFailuresAreProcessedOnce 2015-02-02 09:32:53 +01:00
Alexander Reelsen 59f8c0951a Netty Transport: Add profiles to transport infos
Until now, there was no way to expose information about configured
transport profiles. This commit adds the ability to expose that
information in the TransportInfo class.

The channel as well as the netty pipeline handler now also contain
the profile they were configured for, as this information cannot be
extracted elsewhere.

In addition, each profile now can set its own publish host and port,
which might be needed in case of port forwarding or using docker.

Closes #9134
2015-02-02 08:17:55 +01:00
Martijn van Groningen 3ce05b6919 inner hits: Fix bug that resolves parent docs properly as inner hit when inner hit is defined on has_parent query. 2015-02-01 22:29:21 +01:00
Martijn van Groningen d038f372d4 cleanup: Move catching of IOException higher up the stack to reduce the number of try-catch clauses. 2015-02-01 22:27:00 +01:00
Lee Hinman 25f944009c Remove unneeded null checks from IndicesClusterStateService 2015-02-01 12:13:57 -07:00
Simon Willnauer 42bb5deca2 Revert "[ENGINE] Fail engine if Lucene commit fails"
This reverts commit dda7242848.
2015-01-31 23:48:34 +01:00
Simon Willnauer dda7242848 [ENGINE] Fail engine if Lucene commit fails
This is similar to refresh, if we fail to commit the data we have to fail the
engine since in-ram data is likely discarded. Yet, it's still in translog and might
be recoverable when the node is restarted but we have to treat the engine as failed.
2015-01-31 16:45:38 +01:00
Lee Hinman 9557625ae7 Disallow method pointer expressions in Groovy scripting 2015-01-30 15:55:19 -07:00
Lee Hinman 9fe84062a1 Add `beforeIndexAddedToCluster` callback
This callback is executed only once, on the master node during an
index's creation. An exception thrown during this listener will cancel
the index creation.

This also adds checks in `IndicesClusterStateService` for the
indexService being null as well as if the `indicesService.createIndex`
throws an exception on data nodes after an index has already been
created.
2015-01-30 15:25:58 -07:00
Adrien Grand b2010f788d [TESTS] IndicesQueryCacheTests: Ensure that shards are searchable before starting to query them. 2015-01-30 23:22:27 +01:00
Boaz Leskes eabc3cde98 Recovery: update access time of ongoing recoveries
#8720 introduced a timeout mechanism for ongoing recoveries, based on a last access time variable. In the many iterations on that PR the update of the access time was lost. This adds it back, including a test that should have been there in the first place.

Closes #9506
2015-01-30 21:06:28 +01:00
Adrien Grand 00d54fabb2 Search: Remove query-cache serialization optimization.
The query-cache has an optimization to not deserialize the bytes at the shard
level. However this is a bit fragile since it assumes that serialized streams
can be concatenated (which is not the case with shared strings) and also does
not update the QueryResult object that is held by the SearchContext. So you
need to make sure to use the right one.

With this change, the query cache just deserializes bytes into the QueryResult
object from the context.

Close #9500
2015-01-30 20:02:18 +01:00
Simon Willnauer fb377d48bd Remove dead code 2015-01-30 13:52:26 +01:00
Simon Willnauer 380fcd1d02 Reset MergePolicyProvider settings only if the value actually changed
Due to some unreleased refactorings we lost the persistence of
previously set values in MergePolicyProvider. This commit adds this
back and adds a simple unittest.

Closes #8890
2015-01-30 13:24:08 +01:00
Ryan Ernst 1ebc95ee28 Tests: Add type-unrestricted version of field mapper getter to SearchContext.
This fixes an NPE when using TestSearchContext in SignificanceHeuristicTests.
2015-01-29 13:42:07 -08:00
Michael McCandless ecc8b702d3 also remove force option from logger.trace 2015-01-29 16:18:21 -05:00
Ryan Ernst 4e0e5e7328 Aggs: Remove limitation on field access within aggs to the types
provided in the search

Currently, doing a field lookup within a terms agg will restrict the
fields available to those within the types passed into the search
request.  However, when doing sub aggs within a children agg, the
fields available should not be restricted to those of the search.

This change makes the field lookup use the index level mapper service.
2015-01-29 10:49:38 -08:00
Simon Willnauer c0fa60eb26 Remove HandlesStreamInput/Output
The optimization we do in the HandlesStreamInput / Output
adds a lot of complexity with a rather unknown benefit. It tries
to compress commonly used strings and write ids instead. This
should rather be done on a lower level if at all necessary for
the small message we send over the network.
2015-01-29 17:43:32 +01:00
Simon Willnauer 1d77c3af82 Fix compilation 2015-01-29 17:41:53 +01:00
Simon Willnauer 03f1fcc85e [ENGINE] Remove dirty flag and force boolean for refresh
Today we have a dirty flag indicating that a refresh must
be executed. We also allow users to bypass this by setting
a force=true boolean on the refresh request / command. All
these flags are unneeded since the SearcherManager has all
the information to do the right thing if it's dirty or not.
2015-01-29 17:30:00 +01:00
Simon Willnauer b275e917b7 [CACHE] Use a smaller expected size when serializing query results
BytesStreamOutput allows to pass the expected size but by default uses
BigArrays.PAGE_SIZE_IN_BYTES which is 16k. A common cached result, i.e.
a date histogram with 3 buckets, is ~100 bytes so 16k might be very wasteful
since we don't shrink to the actual size once we are done serializing.
By passing 512 as the expected size we will resize the byte array in the stream
slowly until we hit the page size and don't waste too much memory for small query
results.
2015-01-29 17:27:08 +01:00
Britta Weber 0a07ce8916 core: disable auto gen id optimization
This pr removes the optimization for auto generated ids.
Previously, when ids were auto generated by elasticsearch then there was no
check to see if a document with same id already existed and instead the new
document was only appended. However, due to lucene improvements this
optimization does not add much value. In addition, under rare circumstances it might
cause duplicate documents:

When an indexing request is retried (due to connection lost, node closed etc.),
then a flag 'canHaveDuplicates' is set to true for the indexing request
that is sent a second time. This was to make sure that even
when an indexing request for a document with autogenerated id comes in
we do not have to update unless this flag is set and instead only append.

However, it might happen that for a retry or for the replication the
indexing request that has canHaveDuplicates set to true (the retried request) arrives
at the destination before the original request that has it set to false.
In this case both requests add a document and we end up with a duplicated document.
This commit adds a workaround: remove the optimization for auto
generated ids and always update the document.
The assumption is that this will not slow down indexing by more than 10 percent,
see: http://benchmarks.elasticsearch.org/

closes #8788
closes #9468
2015-01-29 16:26:04 +01:00
Simon Willnauer 15a766084d [CACHE] Use correct number of bytes in query cache accounting
Today we use the length of the BytesReference, which is misleading since
the reference is paged such that the length != ramBytesUsed. This can lead
to a way higher memory consumption than expected if query results are tiny
since each query result requires at least 16kb. Yet, we should rethink this
strategy for query results that are very small, i.e. less than 20% of the ramBytesUsed,
but this commit first tries to make the accounting correct.
2015-01-29 10:59:36 +01:00
Simon Willnauer 4917121de2 Remove Unused code and remove unnecessary abstraction
HashedBytesArray is not used anymore and Releasable only makes sense on
paged implementations, so the marker interface is unneeded.
2015-01-29 09:51:14 +01:00
Lee Hinman 86e52c30a1 Make `script.groovy.sandbox.method_blacklist_patch` truly append-only
Additionally, this setting can be specified in elasticsearch.yml if
desired, to pre-populate the list of methods to be added to the default
blacklist.

When making a change to this setting dynamically, the entire blacklist
is logged as well.
2015-01-28 17:09:27 -07:00
Ryan Ernst afcedb94ed Mappings: Remove `index_analyzer` setting to simplify analyzer logic
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer.  When setting
`search_analyzer`, `analyzer` must be set.

closes #9371
2015-01-28 13:43:15 -08:00
Lee Hinman cc461a837f Avoid NullPointerException if optional Groovy jar is removed 2015-01-28 13:49:50 -07:00
Lee Hinman c610524392 Make groovy sandbox method blacklist dynamically additive
Using the `script.groovy.sandbox.method_blacklist_patch` setting, the
blacklist can be dynamically *added* to by specifying a comma-separated
list of methods (for example, "toString,size" would add .toString and
.size to the blacklist).

When the `script.groovy.sandbox.method_blacklist_patch` setting is
changed, the script cache is cleared to force new scripts to be
recompiled. Additionally the on-disk cache is cleared so that scripts in
the `config/scripts` directory are re-compiled as well.

This also fixes an issue where script engines were injected more than
once, which can cause multiple instances of the script engine per node.
2015-01-28 12:26:09 -07:00
Zachary Tong a4eb1d5505 Aggregations: Add standard deviation bounds to extended_stats
Extended_stats now displays the upper and lower bounds on standard deviations (e.g. avg +/- std).
Default is to show 2 std above/below, but can be changed using the `sigma` parameter.
Accepts non-negative doubles

Closes #9356
2015-01-28 11:47:20 -05:00
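The bounds added in a4eb1d5505 are described as avg +/- sigma * std. A minimal sketch of that arithmetic follows; it assumes the population standard deviation and uses hypothetical names (`StdDeviationBoundsSketch`, `bounds`), so it should be read as an illustration rather than the extended_stats implementation.

```java
// Hypothetical helper illustrating avg +/- sigma * std; assumes population standard deviation.
final class StdDeviationBoundsSketch {

    // Returns {lower, upper} bounds for the given values.
    static double[] bounds(double[] values, double sigma) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (sigma < 0) {
            throw new IllegalArgumentException("sigma must be non-negative");
        }
        double sum = 0;
        double sumOfSquares = 0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        double avg = sum / values.length;
        // Population variance E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
        double variance = Math.max(sumOfSquares / values.length - avg * avg, 0);
        double std = Math.sqrt(variance);
        return new double[] { avg - sigma * std, avg + sigma * std };
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the average is 5 and the population standard deviation is 2, so a sigma of 2 gives bounds of 1 and 9.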
gmarz 3e4fc2659d Nodes Stats: Fix open file descriptors count on Windows
Closes #1563
2015-01-28 10:30:02 -05:00
Nicholas Knize 9622f78fe6 Revert "[GEO] Update GeoPolygonFilter to handle ambiguous polygons"
This reverts commit 06667c6aa8 which introduces an undesirable dependency on JTS.
2015-01-28 08:03:26 -06:00
Colin Goodheart-Smithe 29c24d75e7 Aggregations: Unify histogram implementations
This change makes InternalHistogram the only InternalAggregation used by the Histogram Aggregator. There is still a separate Bucket implementation and Factory implementation. All buckets are created through the factory passed into the InternalHistogram, and the correct factory implementation is serialised as part of the aggregation to make sure the correct bucket types are always generated.

This is needed by the Transformers (namely the derivative transformer) to allow it to generate buckets of the right type without having to know what the underlying bucket implementation is.
2015-01-28 10:45:28 +00:00
Boaz Leskes 1695f76f68 Test: testOldIndexes should disable merging
It verifies that some segments need to be upgraded, but if they are merged away, they are upgraded implicitly
2015-01-28 11:34:58 +01:00
Boaz Leskes 22a576d5ba Recovery: flush immediately after a remote recovery finishes (unless there are ongoing ones)
To properly replicate, we currently stop flushing during recovery so we can replay the translog once copying files is done. Once recovery is done, the translog will be flushed by a background thread that, by default, kicks in every 5s. In case of a recovery failure and a quick re-assignment of a new shard copy, we may fail to flush before starting a new recovery, causing it to deal with a potentially even longer translog. This commit makes sure we flush immediately when the ongoing recovery count goes to 0.

I also added a simple recovery benchmark.

Closes #9439
2015-01-28 09:14:23 +01:00
Igor Motov 13ef7d73b9 Snapshot/Restore: better handling of index deletion during snapshot
If an index is deleted during initial state of the snapshot operation, the entire snapshot can fail with NPE. This commit improves handling of this situation and allows snapshot to continue if partial snapshots are allowed.

Closes #9024
2015-01-27 21:06:29 -05:00
Boaz Leskes 3512860956 Test: always use replicas in testClusterInfoServiceInformationClearOnError
It assumes the local node always has a shard
2015-01-28 00:23:03 +01:00
Nicholas Knize 06667c6aa8 [GEO] Update GeoPolygonFilter to handle ambiguous polygons
PR #8672 addresses ambiguous polygons - those that either cross the dateline or span the map - by complying with the OGC standard right-hand rule. Since ```GeoPolygonFilter``` is self contained logic, the fix in #8672 did not address the issue for the ```GeoPolygonFilter```. This was identified in issue #5968

This fixes the ambiguous polygon issue in ```GeoPolygonFilter``` by moving the dateline crossing code from ```ShapeBuilder``` to ```GeoUtils``` and reusing the logic inside the ```pointInPolygon``` method.  Unit tests are added to ensure support for coordinates specified in either standard lat/lon or great-circle coordinate systems.

closes #5968
closes #9304
2015-01-27 15:45:05 -06:00