4485 Commits

Author SHA1 Message Date
Simon Willnauer fd1d02fd07 [TEST] Prevent usage of System Properties in the InternalTestCluster
All settings should be passed as settings and the environment should not
influence the test cluster settings. The settings we care about, i.e.
`es.node.mode` and `es.logger.level`, should be passed via settings.
This allows tests to override these settings if they for instance need
`network` transport to operate at all.

Closes #6663
2014-07-01 18:05:44 +02:00
Simon Willnauer c9b7bec3cc [INDEX] Ensure `index.version.created` is consistent
Today `index.version.created` depends on the version of the master
node in the cluster. This is potentially causing new features to be
expected on shards that didn't exist when the index was created.
There is no notion of `where was the shard allocated first` such that
`index.version.created` can't be reliably used as a feature flag.

With this change the `index.version.created` can be reliably used to
determine the smallest node version at the point in time when the index
was created. This means we can safely use certain features that would
for instance require reindexing and / or would not work unless the
entire index (all shards and segments) has been created with a certain
version or newer.

Closes #6660
2014-07-01 18:00:13 +02:00
Igor Motov f14edefc9d [TEST] Fix possible race condition in checksum name generator
When three threads are trying to write checksums at the same time, it's possible for all three threads to obtain the same checksum file name A. Then the first thread enters the synchronized section, creates the file with name A and exits. The second thread enters the synchronized section, checks that A exists, creates file A+1 and exits the critical section. Then it proceeds to clean up and deletes all checksum files including A. If this happens before the third thread enters the synchronized section, it's possible for the third thread to check for A and, since it no longer exists, create the checksum file A a second time, which triggers the "file _checksums-XXXXXXXXXXXXX was already written to" exception in MockDirectoryWrapper and fails recovery.
2014-07-01 09:51:42 -04:00
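The race described in f14edefc9d comes from picking the file name outside the critical section and then reusing a name that a concurrent cleanup has deleted. The following is only an illustrative Python sketch of one way to close such a window, not the actual change in the commit: the name is drawn from a never-reused counter while the lock is held. `ChecksumWriter`, `write` and `run_writers` are hypothetical names.

```python
import threading


class ChecksumWriter:
    """Hypothetical stand-in for a checksum writer shared by several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = 0        # only ever increases, so names are never reused
        self._live = set()       # checksum files currently "on disk"
        self.written = []        # every name ever written, in order

    def write(self):
        with self._lock:
            # Pick the name INSIDE the critical section.
            self._counter += 1
            name = f"_checksums-{self._counter}"
            if name in self.written:
                raise RuntimeError(f"file {name} was already written to")
            self.written.append(name)
            # Clean up older checksum files while still holding the lock.
            self._live = {name}
        return name


def run_writers(n_threads=3):
    """Run n_threads concurrent writers and return the shared writer."""
    writer = ChecksumWriter()
    threads = [threading.Thread(target=writer.write) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return writer
```

Because both name selection and cleanup happen under the same lock, no thread can observe a deleted name as free and write it again.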
Martijn van Groningen ec74a7e76f Core: Prevent non segment readers from entering the filter cache and the field data caches.
Percolator: Never cache filters and field data in percolator for the percolator query parsing part.

Closes #6553
2014-07-01 15:05:31 +02:00
Adrien Grand 2ed73bb4f7 [TEST] Improve reproducibility of mappings propagation delays related issues. 2014-07-01 13:31:54 +02:00
Martijn van Groningen 85bea22bc8 Core: The ignore_unavailable=true setting also ignores indices that are closed.
Closes #6471
Closes #6475
2014-07-01 13:09:24 +02:00
Shay Banon f0817c31d9 start mapping service earlier to be available for recovery 2014-07-01 11:39:26 +02:00
Adrien Grand 6a1e7b6ad0 [TEST] Fix ExistsMissingTests failures.
They were due to a combination of mappings propagation delays and the behavior
of MapperService.smartName(String) so mappings are now configured up-front.
2014-07-01 11:25:37 +02:00
Igor Motov 8a20bfcdd5 [TEST] Turn off double write check for restore 2014-06-30 23:12:29 -04:00
Igor Motov 2149a9403d Improve deletion of corrupted snapshots
Makes it possible to delete snapshots that are missing some of the metadata files. This can happen if snapshot creation failed because repository drive ran out of disk space.

Closes #6383
2014-06-30 21:03:46 -04:00
Igor Motov 1425e28639 Add ability to restore partial snapshots
Closes #5742
2014-06-30 20:18:02 -04:00
Shay Banon 46f1e30fa9 Recovery from local gateway should re-introduce new mappings
The delayed mapping intro tests exposed a bug where if a new mapping is introduced, yet not updated on the master, and a full restart occurs, replay of the transaction log will not cause the new mapping to be re-introduced.
closes #6659

add comment on the method
2014-07-01 01:53:44 +02:00
Shay Banon e8519084c9 [TEST] properly wait for mapping on master node
add helper method to do so, by not assuming that the mapping will exist right away by waiting for green or refreshing...
2014-06-30 23:11:23 +02:00
Shay Banon 5c5e13abce [TEST] properly wait for mappings when needed 2014-06-30 22:32:43 +02:00
Shay Banon 5273410be6 Update mapping on master in async manner
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keep on being introduced. We can move this process to another thread that will be responsible for refreshing the source and then sending the update mapping to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try to batch as many updates as possible; this is handy especially when multiple shards for the same index exist on the same node. An internal setting that can control the time to wait for batches is also added (defaults to 0).

Testing wise, a new support method, ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll, is added to allow waiting for the concrete manifestation of mappings on all relevant nodes. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see timing related failures later (all tests pass now), then those will need to be fixed to either awaitBusy on the master for the new mapping, or in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes #6648

allow to change the additional time window dynamically

better sorting on mappers when refreshing source
also, no need to call nodes info in test, we already have the node names

clean calls to mapping update to provide doc mapper and UUID always
also use the internal cluster support method to get the list of nodes an index is on

reverse the order to pick the latest change first

remove unused field

and fix constructor param

move to start/stop on mapping update action

randomize INDICES_MAPPING_ADDITIONAL_MAPPING_CHANGE_TIME
2014-06-30 22:08:39 +02:00
Lee Hinman 761ef5d9f1 Wrap groovy script exceptions in a serializable Exception object
Fixes #6598
2014-06-30 16:50:34 +02:00
Shay Banon c9ff9a6930 [TEST] Randomize netty worker and connection parameters
Try and push our system to a state where there is only a single worker, trying to expose potential deadlocks when we by mistake execute blocking operations on the worker thread
closes #6635
2014-06-30 14:57:36 +02:00
Boaz Leskes c907ce325e [Test] make recovery slow down in rerouteRecoveryTest aware of index size 2014-06-30 10:54:45 +02:00
Boaz Leskes a72c167be2 [Test] improved recovery slow down in rerouteRecoveryTest
only change recovery throttling to slow down recoveries. The recovery file chunk size updates are not picked up by ongoing recoveries. That causes the recovery to take too long even after the default settings are restored.

Also - change document creation to reuse field names in order to speed up the test.
2014-06-29 14:37:12 +02:00
Boaz Leskes bbc82e2821 [Test] add awaitFix to rerouteRecoveryTest 2014-06-29 09:55:03 +02:00
Boaz Leskes ca194594b3 Recovery API should also report ongoing relocation recoveries
We currently only report relocation related recoveries after they are done.

Closes #6585
2014-06-28 21:27:15 +02:00
Boaz Leskes 155620ed8e [Test] testRelocationWhileRefreshing should wait for the first shard to be started 2014-06-28 10:41:06 +02:00
Simon Willnauer 9ce66cb167 [TEST] Testcase for #6639 2014-06-28 09:12:25 +02:00
Simon Willnauer b2685f132a [TEST] Change es.node.mode default for tests to `local`
In order to speed up test execution we should run in local mode by
default. CI builds will still use network builds all the time.

Closes #6624
2014-06-27 11:57:34 +02:00
Simon Willnauer f0cfdc444f [STORE] Wrap RateLimiter rather than copy RateLimitedIndexOutput
We clone RateLimitedIndexOutput from lucene just to collect pausing
statistics. We can do this in a more straightforward way in a delegating
RateLimiter.

Closes #6625
2014-06-27 11:35:13 +02:00
Shay Banon 79af3228ad Thread pool rejection status code should be 429
Thread rejection should return too many requests status code, and not 503, which is used to also show that the cluster is not available
 relates to #6627, but only for rejections for now
closes #6629
2014-06-27 11:15:16 +02:00
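The distinction drawn in 79af3228ad (429 for thread pool rejections, 503 reserved for an unavailable cluster) can be illustrated with a small status mapping. This is a Python sketch only; the exception classes and `status_for` are hypothetical and do not mirror the Elasticsearch class hierarchy.

```python
from http import HTTPStatus


class RejectedExecution(Exception):
    """Hypothetical: a thread pool queue was full and refused the task."""


class ClusterUnavailable(Exception):
    """Hypothetical: the cluster cannot serve requests at all."""


def status_for(exc):
    """Map a failure to the HTTP status a client should see."""
    if isinstance(exc, RejectedExecution):
        # The node is healthy but overloaded: the client should back off and retry.
        return HTTPStatus.TOO_MANY_REQUESTS      # 429
    if isinstance(exc, ClusterUnavailable):
        return HTTPStatus.SERVICE_UNAVAILABLE    # 503
    return HTTPStatus.INTERNAL_SERVER_ERROR      # 500
```

Keeping the two codes separate lets a client tell "slow down" apart from "the cluster is not available".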
Shay Banon 4129bb6a4f Make sure we don't reuse arrays when sending an error back
We want to make sure recycling will not fail for any reason while trying to send a response back that is caused by a failure, for example, if we have circuit breaker on it (at one point), sending an error back will not be affected by it.
closes #6631
2014-06-27 11:12:35 +02:00
Shay Banon e559295228 [TEST] when the test fail, have the exception message as the reason
the test failed but couldn't repro (yet), at the very least, make sure we have the exception message as the reason, can help to track down the failure itself when it happens again
2014-06-27 09:16:51 +02:00
Simon Willnauer f7da6da73a [TEST] suppress sysout checks since CI runs with debug enabled 2014-06-26 19:10:20 +02:00
Robert Muir b55ad98d73 Upgrade to Lucene 4.9 (closes #6623) 2014-06-26 08:18:59 -04:00
Lee Hinman b43b56a6a8 Add a transformer to translate constant BigDecimal to double 2014-06-26 10:52:28 +02:00
Lee Hinman 50bb274efa Remove MVEL as a built-in scripting language 2014-06-26 10:33:28 +02:00
Boaz Leskes 2c2783875e Be more diligent about ThreadPools having names
Add a name parameter to what was the empty ThreadPool constructor. Assert if the ThreadPool's setting doesn't contain a name.
2014-06-26 10:01:22 +02:00
Clinton Gormley 30c80319c0 Match query with operator and, cutoff_frequency and stacked tokens
If the match query with cutoff_frequency encounters stacked tokens,
like synonyms in the same position, it returns a boolean query instead
of a common terms query.  However, if the original operator was set
to "and", it was ignoring that and resetting the operator to "or".

In fact, if operator is "and" then there is little benefit in using
a common terms query as a must query is already
executed efficiently.
2014-06-25 17:53:43 +02:00
Andrew Raines 534b07a3fb [TEST] Add assertion failure messages 2014-06-25 16:22:20 +02:00
Lee Hinman 5c6d28240f Switch to Groovy as the default scripting language
This is a breaking change to move from MVEL -> Groovy
2014-06-25 12:15:12 +02:00
Lee Hinman 47856ec4cd Add sandboxing for GString-based method invocation 2014-06-25 12:09:32 +02:00
Shay Banon 342563a864 [LOG] better log message 2014-06-25 11:01:20 +02:00
Alexander Reelsen fd9744968f Internal: Made base64 decode parsing to detect more errors
The base64 did not completely check, if there were other characters
after the equals `=` sign. This PR adds some small additional checks.

Closes #6334
2014-06-24 13:01:11 +02:00
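The kind of check described in fd9744968f (rejecting input that has other characters after the `=` padding) can be sketched as follows. This is a Python illustration built on the standard `base64` module, not the Java code from the commit; `strict_b64decode` is a hypothetical helper name.

```python
import base64


def strict_b64decode(text):
    """Decode base64, rejecting anything that follows the '=' padding."""
    first_pad = text.find("=")
    if first_pad != -1:
        padding = text[first_pad:]
        # Everything from the first '=' onwards must be '=' characters only.
        if padding.strip("=") != "":
            raise ValueError("unexpected characters after '=' padding")
        if len(padding) > 2:
            raise ValueError("too many '=' padding characters")
    if len(text) % 4 != 0:
        raise ValueError("base64 input length must be a multiple of 4")
    # validate=True makes the decoder reject characters outside the alphabet.
    return base64.b64decode(text, validate=True)
```

For example, `strict_b64decode("aGk=")` decodes normally, while `strict_b64decode("aGk=x")` raises `ValueError` instead of silently ignoring the trailing character.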
Martijn van Groningen e12025f749 [TEST] Improved logging for replica operation failures 2014-06-23 09:28:41 +02:00
Boaz Leskes 3d6d2e700a [Test] testGetFields_complexField should wait for a green cluster
Waiting for ongoing recoveries was not good enough as it can run before the master finishes processing the started events of primary shards, causing the recovery response to be erroneously empty
2014-06-21 20:15:13 +02:00
Shay Banon 0e83615496 [Test] Use no failures, shard might not have been initialized yet 2014-06-21 12:43:14 +02:00
Boaz Leskes 08ca51d7b6 [TEST] fix a NPE in verifyThreadNames which may happen if thread finishes during sampling 2014-06-21 10:31:45 +02:00
Shay Banon 0d66d3779e Fix optional default script loading
Groovy is optional as a dependency in the classpath, make sure we properly detect when it's not there at the right time to disable it
closes #6582
2014-06-21 00:27:15 +02:00
Martijn van Groningen 812972ab0e [TEST] Move the waiting for pending tasks to helper methods and let the percolator and update mapping test use these helper methods. 2014-06-20 23:44:33 +02:00
Martijn van Groningen 11251bca92 [TEST] Verify that all pending tasks are really executed on *all* nodes. 2014-06-20 23:12:52 +02:00
Martijn van Groningen 73e4a9b3f7 Fixed NPE in recovery api by serializing the recovery type in StartRecoveryRequest.
Closes #6190
2014-06-20 22:09:46 +02:00
javanna f16451a446 Refactored AckedClusterStateUpdateTask & co. to remove code repetitions in subclasses
Made AckedClusterStateUpdateTask an abstract class instead of an interface, which contains the common methods.
Also introduced the AckedRequest interface to mark both AcknowledgedRequest & ClusterStateUpdateRequest so that the different ways of updating the cluster state (with or without a MetaData*Service) can share the same code.
Removed ClusterStateUpdateListener as we can just use its base class ActionListener instead.

Closes #6559
2014-06-20 20:14:40 +02:00
Lee Hinman 2708e453ac Re-shade MVEL as a dependency 2014-06-20 11:28:50 +02:00
Lee Hinman c70f6d0171 Add Groovy as a scripting language, add sandboxing for Groovy
Sandboxes the groovy scripting language with multiple configurable
whitelists:

`script.groovy.sandbox.receiver_whitelist`: comma-separated list of string
classes for objects that may have methods invoked.
`script.groovy.sandbox.package_whitelist`: comma-separated list of
packages under which new objects may be constructed.
`script.groovy.sandbox.class_whitelist` comma-separated list of classes
that are allowed to be constructed.

As well as a method blacklist:

`script.groovy.sandbox.method_blacklist`: comma-separated list of
methods that are never allowed to be invoked, regardless of target
object.

The sandbox can be entirely disabled by setting:

`script.groovy.sandbox.enabled: false`
2014-06-20 10:20:16 +02:00
javanna 12fd6ce98c REST api: made it possible to copy the REST headers from REST requests to the corresponding TransportRequest(s)
Introduced the use of the FilterClient in all of the REST actions, which delegates all of the operations to the internal Client, but makes sure that the headers are properly copied if needed from REST requests to TransportRequest(s) when it comes to executing them.

Added new abstract handleRequest method to BaseRestHandler with additional Client argument and made private the client instance member (was protected before) to force the use of the client received as argument.

The list of headers to be copied over is by default empty but can be extended via plugins.

Closes #6513
2014-06-19 18:45:21 +02:00
javanna 8f8b2d7979 Client intermediate interface removal follow-up
After #6517 we ended up registering all of the actions (including admin ones) to the NodeClient.
Made sure that only the proper type of Action instances are registered to each client type.
Also fixed some compiler warnings: unused members, imports and non matching generic types.

Closes #6563
2014-06-19 17:55:25 +02:00
Adrien Grand 8ccfca3a2f Fielddata: Remove BytesValues.WithOrdinals.currentOrd and copyShared.
These methods don't exist in Lucene's sorted set doc values.

Relates to #6524
2014-06-19 12:06:40 +02:00
Adrien Grand 9e624942d8 Fielddata: Move `getTermsEnum` from `AtomicFieldData` to `BytesValues.WithOrdinals`.
Similarly to `SortedSetDocValues.termsEnum()`.

Relates to #6524
2014-06-19 12:01:30 +02:00
Adrien Grand 9b02b5061b Fielddata: Merge ordinals APIs into BytesValues.WithOrdinals.
Mid-term we should switch from `BytesValues` to Lucene's doc values APIs, in
particular the `SortedSetDocValues` class. While `BytesValues.WithOrdinals` and
SortedSetDocValues expose the same functionality, `BytesValues.WithOrdinals`
exposes its ordinals via a different `Ordinals.Docs` object while
`SortedSetDocValues` exposes them on the same object as the one that holds the
values. This commit merges ordinals into `BytesValues.WithOrdinals` in order to
make both classes even closer.

Global ordinals were a bit tricky to migrate so I just changed them to use
Lucene's OrdinalMap that will soon (LUCENE-5767, scheduled for 4.9) have the
same optimizations as our global ordinals.

Close #6524
2014-06-19 12:00:51 +02:00
Adrien Grand 703dbff83d Index field names of documents.
The `exists` and `missing` filters need to merge postings lists of all existing
terms, which can be very costly, especially on high-cardinality fields. This
commit indexes the field names of a document under `_field_names` and reuses it
to speed up the `exists` and `missing` filters.

This is only enabled for indices that are created on or after Elasticsearch
1.3.0.

Close #5659
2014-06-19 11:50:06 +02:00
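The idea behind 703dbff83d can be illustrated with a toy inverted index: if each document's field names are indexed as terms, `exists` becomes a single postings lookup instead of a merge over every term of the field. This is a simplified Python sketch under that assumption; the function names are hypothetical and it does not model Lucene postings.

```python
from collections import defaultdict


def build_field_names_index(docs):
    """Map each field name to the set of doc ids that have a value for it."""
    postings = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            if value is not None:
                postings[field].add(doc_id)
    return postings


def exists(postings, field):
    """Doc ids that have a value for the field: one lookup, no term merging."""
    return set(postings.get(field, set()))


def missing(postings, field, num_docs):
    """Doc ids that have no value for the field."""
    return set(range(num_docs)) - exists(postings, field)
```

The cost of `exists` here no longer depends on how many distinct values the field has, which is the high-cardinality case the commit message calls out.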
Adrien Grand e2da2114e7 Mappings: Allow _version to use `disk` as a doc values format.
VersionFieldMapper.defaultDocValuesFormat claims that the default is `disk`.
This is not used to choose the DV format in the index but for mappings
serialization in order to know when the _version doc values format is
different from the default format. This made it impossible to use the `disk`
doc values format since mappings would never retain that information at
serialization time.

Close #6523
2014-06-19 11:39:17 +02:00
Boaz Leskes 5b919d4e4f [TEST] Added (trace) logging to testGetFields_complexField 2014-06-19 11:10:13 +02:00
javanna 2024067465 Java API: BulkRequest#add(Iterable) to support UpdateRequests
Closes #6551
2014-06-19 10:43:05 +02:00
Alexander Reelsen 9569166f94 Mapping: Fix possibility of losing meta configuration on field mapping update
The TTL, size, timestamp and index meta properties could be lost on an
update of a single field mapping due to a wrong comparison in the
merge method (which was caused by a wrong initialization, which marked
an update as explicitly disabled instead of unset).

Closes #5053
2014-06-19 08:39:26 +02:00
Fitblip d18fb8bfbd REST API: Allow to configure JSONP/callback support
Added the http.jsonp.enable option to configure disabling of JSONP responses, as those
might pose a security risk, and can be disabled if unused.

This also fixes bugs in NettyHttpChannel
* JSONP responses were never setting application/javascript as the content-type
* The content-type and content-length headers were being overwritten even if they were set before

Closes #6164
2014-06-19 08:34:38 +02:00
Britta Weber 1cbeaf6c45 significant terms: fix json response
Commit fbd7c9aa5d introduced a regression that caused
the min_doc_count to be equal to the number of documents in the
background set. As a result no buckets were built when the
response for significant terms was created.
This only affected the final XContent response.

closes #6535
2014-06-18 18:51:34 +02:00
javanna a499254566 Fixed typo in TransportAction log line 2014-06-18 17:37:45 +02:00
Boaz Leskes 1114835de5 Also send Refresh and Flush actions to relocation targets
Currently we send refresh & flush actions based on all assigned ShardRoutings. During the final stage of relocation, we may fail to refresh/flush a shard if the coordinating node has not yet processed the cluster state update indicating that a relocation is completed *and* the relocation target node has already processed it (i.e., started the shard and has accepted new indexing requests).

Closes #6545
2014-06-18 16:12:42 +02:00
Martijn van Groningen 68046b64c2 [TEST] Only use the clients from the node. 2014-06-18 16:06:09 +02:00
Martijn van Groningen cb97b79396 [TEST] Disable automatic refresh to prevent unintended field data warming of the _parent field. 2014-06-18 16:06:09 +02:00
Colin Goodheart-Smithe 7423ce0560 Aggregations: Added percentile rank aggregation
Percentile Rank Aggregation is the reverse of the Percentiles aggregation. It determines the percentile rank (the proportion of values less than a given value) of the provided array of values.

Closes #6386
2014-06-18 12:02:08 +01:00
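The definition given in 7423ce0560 (the proportion of values less than a given value) can be written out directly. The Python sketch below computes exact ranks over an in-memory list; it is an illustration of the definition only, and the aggregation itself may well use an approximate algorithm. `percentile_ranks` is a hypothetical name.

```python
def percentile_ranks(data, values):
    """Return {value: percentage of data points strictly less than value}."""
    n = len(data)
    if n == 0:
        raise ValueError("cannot compute percentile ranks of an empty data set")
    return {v: 100.0 * sum(1 for x in data if x < v) / n for v in values}
```

For the data `[1, 2, 3, 4]`, two of the four points are below 3, so `percentile_ranks([1, 2, 3, 4], [3])` gives a rank of 50.0 for the value 3.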
Britta Weber fe89ea1ecf percolator: fix handling of nested documents
Nested documents were indexed as separate documents, but it was never checked
if the hits represent nested documents or not. Therefore, nested objects could
match non-nested queries and nested queries could also match non-nested documents.

Examples are in issue #6540 .

closes #6540
closes #6544
2014-06-18 12:24:47 +02:00
Martijn van Groningen cb9548f811 Changed the type of field docCounts to IntArray instead of LongArray, because a shard can't hold more than Integer.MAX_VALUE documents, so a LongArray just takes unnecessary space.
Closes #6529
2014-06-17 13:35:05 +02:00
Simon Willnauer adb5c19849 [CLIENT] Remove unnecessary intermediate interfaces
Client, ClusterAdminClient and IndicesAdminClient had corresponding
intermediate `internal` interfaces that are unnecessary and cause
a lot of casting. This commit removes the intermediate interfaces
and uses the super interfaces directly.

This commit also adds Releaseable to `Node` and `Client` in order to
be used with utilities like try / with.

Closes #4355
Closes #6517
2014-06-17 12:18:37 +02:00
Simon Willnauer e198c58a6b [TEST] Use test.bwc.version if compatibility version is not present 2014-06-17 12:16:10 +02:00
Adrien Grand a06fd46a72 [Benchmark] Fix TermsAggregationSearchBenchmark: The `ordinals` execution mode doesn't exist anymore. 2014-06-17 01:46:02 +02:00
Robert Muir 3892b6ce05 Use ordinals for comparison in GlobalOrdinalsStringTermsAggregator.buildAggregation. Closes #6518 2014-06-16 18:25:28 -04:00
Shay Banon 0427e49b5d [TEST] verify all threads created by node and client have the node name
closes #6516
2014-06-16 21:50:12 +02:00
Martijn van Groningen 612f4618e7 [TEST] wait for ongoing recoveries to finish. Flush fails on shards otherwise. 2014-06-16 17:01:38 +02:00
Simon Willnauer 61eac483ed [TEST] Fix test cluster naming
This commit renames `TestCluster` -> `InternalTestCluster` and
`ImmutableTestCluster` to `TestCluster` for consistency. This also
makes `ExternalTestCluster` and `InternalTestCluster` consistent
with respect to their execution environment.

Closes #6510
2014-06-16 15:14:54 +02:00
Lee Hinman 0f180bd5fd [TEST] Add test for accessing _score in scripts 2014-06-16 14:12:21 +02:00
Simon Willnauer 4dfa822e1b [TEST] Add basic Backwards Compatibility Tests
This commit adds basic infrastructure as well as primitive tests
to ensure version backwards compatibility between the current
development trunk and an arbitrary previous version. The compatibility
tests are simple unit tests derived from a base class that starts
and manages nodes from a provided elasticsearch release package.

The following command line executes all backwards compatibility tests
in isolation:

```
mvn test -Dtests.bwc=true -Dtests.bwc.version=1.2.1 -Dtests.class=org.elasticsearch.bwcompat.*
```

These tests run basic checks like rolling upgrades and
routing/searching/get etc. against the specified version. The version
must be present in the `./backwards` folder as
`./backwards/elasticsearch-x.y.z`
2014-06-16 12:40:43 +02:00
Simon Willnauer 93b56eb004 [TEST] Force flush even if not needed to ensure successful shards is greater than 0 2014-06-16 11:00:34 +02:00
Simon Willnauer 76fab9d42a [TEST] consistently omit norms in test otherwise scoring will be dependent on merges etc. 2014-06-16 10:54:12 +02:00
Shay Banon 4c579d6c8d Better default size for global index -> alias map
The alias -> (index -> alias) map, specifically the index -> alias one, typically just holds one entry, yet we eagerly initialize it to the number of indices. When there are many indices, each with many aliases, this is a very large overhead per alias...
closes #6504
2014-06-16 00:52:33 +02:00
Simon Willnauer 6d77a248fb [TEST] Stabilize test - wait for yellow to ensure all primaries are allocated 2014-06-14 21:52:15 +02:00
Adrien Grand 7bcabf9481 Fielddata: Don't expose hashes anymore.
Our field data currently exposes hashes of the bytes values. That takes roughly
4 bytes per unique value, which is definitely not negligible on high-cardinality
fields.

These hashes have been used for 3 different purposes:
 - term-based aggregations,
 - parent/child queries,
 - the percolator _id -> Query cache.

Both aggregations and parent/child queries have been moved to ordinals which
provide a greater speedup and lower memory usage. In the case of the percolator
it is used in conjunction with HashedBytesRef to not recompute the hash value
when resolving a query given its ID. However, removing this has no
impact on PercolatorStressBenchmark.

Close #6500
2014-06-13 23:05:02 +02:00
Adrien Grand 232394e3a8 Aggregations: Remove `ordinals` execution hint.
This was how terms aggregations managed to not be too slow initially by caching
reads into the terms dictionary using ordinals. However, this doesn't behave
nicely on high-cardinality fields since the reads into the terms dict are
random and this execution mode loads all unique terms into memory.

The `global_ordinals` execution mode (default since 1.2) is expected to be
better in all cases.

Close #6499
2014-06-13 23:02:20 +02:00
Adrien Grand fbd7c9aa5d Aggregations: Fix reducing of range aggregations.
Under some rare circumstances:
 - local transport,
 - the range aggregation has both a parent and a child aggregation,
 - the range aggregation got no documents on one shard or more and several
   documents on one shard or more.
the range aggregation could return incorrect counts and sub aggregations.

The root cause is that since the reduce happens in-place and since the range
aggregation uses the same instance for all sub-aggregation in case of an
empty bucket, sometimes non-empty buckets would have been reduced into this
shared instance.

In order to avoid similar bugs in the future, aggregations have been updated
to return a new instance when reducing instead of doing it in-place.

Close #6435
2014-06-13 23:01:43 +02:00
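The root cause described in fbd7c9aa5d (an in-place reduce writing into an instance shared by all empty buckets) is easy to reproduce in miniature. The Python sketch below is a toy model with hypothetical names (`Bucket`, `reduce_in_place`, `reduce`), not the aggregation classes themselves.

```python
class Bucket:
    """Toy aggregation bucket holding only a document count."""

    def __init__(self, doc_count=0):
        self.doc_count = doc_count

    def reduce_in_place(self, others):
        # Old style: mutate self. Dangerous if self is a shared instance.
        for other in others:
            self.doc_count += other.doc_count
        return self

    @staticmethod
    def reduce(buckets):
        # New style: leave every input untouched and return a fresh instance.
        return Bucket(sum(b.doc_count for b in buckets))


def demo():
    shared_empty = Bucket(0)                 # reused for every empty bucket
    shared_empty.reduce_in_place([Bucket(3)])
    polluted = shared_empty.doc_count        # the "empty" instance now holds counts

    fresh_empty = Bucket(0)
    merged = Bucket.reduce([fresh_empty, Bucket(3)])
    return polluted, fresh_empty.doc_count, merged.doc_count
```

In this toy model the in-place variant leaves the shared empty bucket with a non-zero count, while the copying variant keeps the inputs unchanged and still produces the merged total.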
Martijn van Groningen 52be3748ff [TEST] Fix assert 2014-06-13 18:03:25 +02:00
javanna b9ffb2b0a5 Java API: Make sure afterBulk is always called in BulkProcessor after beforeBulk
Moved BulkProcessor tests from BulkTests to newly added BulkProcessorTests class.
Strengthened BulkProcessorTests by adding randomizations to existing tests and new tests for concurrent requests and exceptions.
Also made sure that afterBulk is called only once per request if concurrentRequests==0.

Closes #5038
2014-06-13 17:40:06 +02:00
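The guarantee stated in b9ffb2b0a5 (every `beforeBulk` is followed by exactly one `afterBulk`, even on failure) can be sketched as below. This is a Python illustration with hypothetical names (`RecordingListener`, `execute_bulk`, `send`), not the BulkProcessor implementation.

```python
class RecordingListener:
    """Hypothetical listener that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def before_bulk(self, request):
        self.events.append(("before", request))

    def after_bulk(self, request, response=None, failure=None):
        outcome = "failure" if failure is not None else "success"
        self.events.append(("after", request, outcome))


def execute_bulk(listener, request, send):
    """Run one bulk request, pairing before_bulk with exactly one after_bulk."""
    listener.before_bulk(request)
    try:
        response = send(request)
    except Exception as exc:
        # Failure path: the listener still hears about the request.
        listener.after_bulk(request, failure=exc)
        return None
    listener.after_bulk(request, response=response)
    return response
```

The success callback sits outside the `try` block so that an exception raised by the listener itself cannot trigger a second `after_bulk` call for the same request.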
Boaz Leskes 44097b358d [Test] set search request size testGeohashCellFilter
The default of 10 is not good enough as previously thought.
2014-06-13 16:11:13 +02:00
Alexander Reelsen b252aacc67 Packaging: Remove java-6 directories from debian init script
Closes #6350
2014-06-13 13:48:22 +02:00
Adrien Grand 01327d7136 Facets: deprecation.
Users are encouraged to move to the new aggregation framework that was
introduced in Elasticsearch 1.0.

Close #6485
2014-06-13 13:13:44 +02:00
Martijn van Groningen 59ff05020f [TEST] Removed incorrect assertion (it is expected that the flush doesn't execute on all shard copies, because we don't wait for green status) 2014-06-13 12:25:43 +02:00
mikemccand 9620aa315e [TEST] Add FailureMarker to test listeners so -Dtests.failfast works 2014-06-13 06:04:33 -04:00
Martijn van Groningen 77e0429089 [TEST] Verify the flush response 2014-06-13 11:40:05 +02:00
Boaz Leskes 7fb16c783d Added caching support to geohash_filter
Caching is turned off by default.

Closes #6478
2014-06-12 22:19:34 +02:00
Shay Banon 2330421816 Wait till node is part of cluster state for join process
When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today, a join request can return, and the master, since it's under load, has not yet added the node to its cluster state, and the node that joined will start fault detection against the master, failing since it's not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.
closes #6480
2014-06-12 18:15:51 +02:00
Lee Hinman 3a3f81d59b Enable DiskThresholdDecider by default, change default limits to 85/90%
Fixes #6200
Fixes #6201
2014-06-12 16:35:29 +02:00
Alex Ksikes 35cba50fce More Like This Query: creates only one MLT query per field for all queried items.
Previously, one MLT query per field was created for each item. One issue with
this method is that the maximum number of selected terms was equal to the
number of items times 'max_query_terms'. Instead, users should have direct control
over the maximum number of selected terms allowed, regardless of the number of
queried items.

Another issue related to the previous method is that it could lead to the
selection of rather uninteresting terms, merely because they were found in a
particular queried item. Instead, this new procedure enforces the selection of
interesting terms across ALL items, not within each item. This could lead to
search results where the best matching items share commonalities amongst the
best characteristics of all the items.

Closes #6404
2014-06-12 14:19:33 +02:00
Simon Willnauer 5575ba1a12 [BUILD] Check for tabs and nocommits in the code on validate
This commit adds checks for nocommit and tabs in the source code.
The task is executed during the validate phase and can be disabled via
`-Dvalidate.skip`
2014-06-12 11:11:23 +02:00
Israel Tsadok 32f87f8617 Highlighting: make HighlightQuery class public 2014-06-11 17:23:43 +02:00
Clinton Gormley 673ef3db3f The StemmerTokenFilter had a number of issues:
* `english` returned the slow snowball English stemmer
* `porter2` returned the snowball Porter stemmer (v1)
* `portuguese` was used twice, preventing the second version from working

Changes:

* `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards)
* `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards)
* `light_english` now returns the `kstem` stemmer (`kstem` still works)
* `portuguese_rslp` returns the PortugueseStemmer
* `dutch_kp` is a synonym for `kp`

Tests and docs updated

Fixes #6345
Fixes #6213
Fixes #6330
2014-06-11 12:30:16 +02:00
Clinton Gormley c25de57d5d Tests: Fixed CompletionSuggester test which relied on a bug 2014-06-10 21:34:03 +02:00
Clinton Gormley bb15def36e Stats: Bugfixes and enhancements to indices stats API
Bugs:
* "groups" and "types" were being ignored
* "completion_fields" as wildcards were not being resolved to fieldnames

Enhancements:
* Made "groups" and "types" support wildcards
* Added missing tests

Closes #6390
2014-06-10 17:35:49 +02:00
Alexander Reelsen d3dc158458 TransportClient: Improve logging, fix minor issue
In order to return more information to the client, in case a TransportClient
can not connect to the cluster, this commit adds logging and also returns the
configured nodes in the NoNodeAvailableException

Also a minor bug has been fixed, which propagated exceptions wrong, so that an
invalid request was actually tried on every node, if a regular connection failure
on the first node had happened.

Closes #6376
2014-06-10 13:15:59 +02:00
Martijn van Groningen 38be1e0dde Aggregations: if maxOrd is 0 then use noop collector
Before, the OrdinalsCollector was used and this led to an ArrayIndexOutOfBoundsException

Closes #6413
2014-06-10 09:14:06 +02:00
Martijn van Groningen e15d2e2514 Fielddata: EmptyOrdinals#getMaxOrd() should return 0 instead of 1, since ordinals are zero based since #5871. 2014-06-10 09:13:27 +02:00
Martijn van Groningen 5e408f3d40 Change the top_hits to be a metric aggregation instead of a bucket aggregation (which can't have any sub aggs)
Closes #6395
Closes #6434
2014-06-10 09:09:50 +02:00
javanna ed5b49a5be [TEST] Added backwards compatibility check to control whether to enable client nodes or not within TestCluster
Our REST backwards compatibility tests need to be able to disable client nodes within the TestCluster when running older tests that assume client nodes are not around.
2014-06-07 15:39:56 +02:00
mikemccand bb8a666b6d make test less evil 2014-06-07 04:15:52 -04:00
Boaz Leskes a06b84d392 [Test] Enabled trace logging to testAutoGenerateIdNoDuplicates
also increased iterations some, to increase chance of identifying bad shards
2014-06-07 09:47:15 +02:00
Boaz Leskes 6c7d260770 fixing recovery debug logging param mismatch 2014-06-07 09:36:48 +02:00
Matthew L Daniel b0a85f6ca3 Guard against improper auto_expand_replica values
Previously if the user provided a non-conforming string, it would blow up with
`java.lang.StringIndexOutOfBoundsException: String index out of range: -1`
which is not a *helpful* error message.

Also updated the documentation to make the possible setting values more clear.
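
A minimal sketch of the kind of guard this describes, written in Python rather than the project's Java. The helper name `parse_auto_expand_replicas`, the error wording, and the treatment of `all` as "data nodes minus one" are assumptions for illustration, not the committed code.

```python
from typing import Tuple


def parse_auto_expand_replicas(value: str, data_nodes: int) -> Tuple[int, int]:
    """Parse a 'min-max' range such as '0-5' or '0-all' (hypothetical helper)."""
    low, dash, high = value.partition("-")
    if not dash:
        # A missing dash is reported explicitly instead of surfacing as an index error.
        raise ValueError(
            f"invalid auto_expand_replicas [{value}]: expected 'min-max' or 'min-all'"
        )
    try:
        minimum = int(low)
        # Assumption: 'all' expands to one replica per remaining data node.
        maximum = data_nodes - 1 if high == "all" else int(high)
    except ValueError:
        raise ValueError(
            f"invalid auto_expand_replicas [{value}]: bounds must be integers or 'all'"
        ) from None
    return minimum, maximum
```

A non-conforming string then fails with a message that names the offending value.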

Close #5752
2014-06-07 01:19:06 +02:00
Boaz Leskes b454f64c57 Bulk requests which try and fail to create multiple indices may never return
This is caused by an NPE in the error handling code. All is well if only 1 index creation fails (or none).

Closes #6436
2014-06-06 23:10:42 +02:00
markharwood 724129e6ce Aggregations optimisation for memory usage. Added changes to core Aggregator class to support a new mode of deferred collection.
A new "breadth_first" results collection mode allows upper branches of aggregation tree to be calculated and then pruned
to a smaller selection before advancing into executing collection on child branches.
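
As an illustration only, a request body using the new mode might look like the Python dictionary below. The parameter name `collect_mode` and the field name `actors` are not taken from this commit message and should be treated as assumptions.

```python
# Hypothetical search request body: the outer terms aggregation is computed and
# pruned to its top 10 buckets before the nested aggregation is collected.
request_body = {
    "aggs": {
        "actors": {
            "terms": {
                "field": "actors",  # illustrative field name
                "size": 10,
                "collect_mode": "breadth_first",  # assumed parameter name
            },
            "aggs": {
                "costars": {"terms": {"field": "actors", "size": 5}},
            },
        }
    }
}
```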

Closes #6128
2014-06-06 15:59:51 +01:00
javanna 11f7c31852 Put index template api: unified PUT/POST behaviour in relation to create parameter
The put index template api supports the create parameter (defaults to false), which tells whether the template can replace an existing one with the same name or not. Unified its behaviour between the PUT and POST methods; previously POST would force create to true.

Added create parameter to the rest spec (was missing before) and a REST test for create true scenario.
2014-06-06 15:45:05 +02:00
Simon Willnauer 797a9b07ef FileSystem: Use XNativeFSLockFactory instead of the buggy Lucene 4.8.1 version
There is a pretty nasty bug in the lock factory we use that can cause
nodes to use the same data dir, wiping each other's data. Luckily this is
unlikely to happen if the nodes are running in different JVMs, which they
do unless they are embedded.

See LUCENE-5738

Closes #6424
2014-06-06 11:51:47 +02:00
mikemccand 2a6468efbd make this new test a bit less stressful for nightly; catch FlushNotAllowedEngineException 2014-06-05 13:57:59 -04:00
mikemccand 59635f9397 Core: switch to the new ConcurrentHashMap implementation coming in Java 8
The new implementation has lower RAM overhead and better concurrency
in some cases.

Closes #6400
2014-06-05 13:39:23 -04:00
stephlag 6a82d59cb8 [DOCS] Added Javadocs to ESLogger and ESLoggerFactory 2014-06-05 19:15:22 +02:00
mikemccand 30d8467775 revert CHMV8 for now (it doesn't compile under Java8) 2014-06-05 12:13:06 -04:00
javanna 21772e0bf9 Scripts: exposed _uid, _id and _type fields as stored fields (_fields notation)
The _uid field wasn't available in a script even though it is always stored. Made it available, and also made available the _id and _type fields, which are derived from it.

Closes #6406
2014-06-05 17:16:55 +02:00
mikemccand 838142646f Core: switch to the new ConcurrentHashMap implementation coming in Java 8
The new implementation has lower RAM overhead and better concurrency
in some cases.

Closes #6400
2014-06-05 10:49:23 -04:00
mikemccand 2ad8a60532 add versioning test 2014-06-05 09:38:22 -04:00
stephlag b5c9d8c98b Add Javadoc 2014-06-04 17:18:25 +02:00
mikemccand 50e42265ef Indexing: clear versionMap on refresh (not flush) to reduce heap usage
The versionMap holds all versions (keyed by _uid) for recently indexed
documents.  Previously we only cleared it during flush, which can be
infrequent if the translog flush thresholds are high, and can cause
excessive heap usage especially for small documents.

Now we clear it during refresh which is usually more frequent
(e.g. once per second by default).

Closes #6379
2014-06-04 05:37:51 -04:00
Colin Goodheart-Smithe f78480a0bc Aggregations: Fixed failures when geo points are all either positive or negative 2014-06-04 09:16:29 +01:00
Simon Willnauer 288eb3d803 [TEST] remove trace logging 2014-06-04 10:10:38 +02:00
Boaz Leskes ef5d64c73b [Test] Extended IndexActionTests.testAutoGenerateIdNoDuplicates to check both with and without a specific type
The test also captures the first error but continues to run searches in order to gather more information before failing.
2014-06-03 21:55:10 +02:00
Simon Willnauer 963f627dca Add [1.2.1] Release 2014-06-03 17:25:57 +02:00
Colin Goodheart-Smithe b9f4d44b14 Aggregations: Adds GeoBounds Aggregation
The GeoBounds Aggregation is a new single bucket aggregation which outputs the coordinates of a bounding box containing all the points from all the documents passed to the aggregation as well as the doc count. The GeoBounds Aggregation also uses a wrap_longitude parameter which specifies whether the resulting bounding box is permitted to overlap the international date line. This option defaults to true.

This aggregation introduces the idea of MetricsAggregations which do not return double values and cannot be used for sorting. The existing MetricsAggregation has been renamed to NumericMetricsAggregation and is a subclass of MetricsAggregation. MetricsAggregations do not store doc counts and do not support child aggregations.

Closes #5634
2014-06-03 15:59:56 +01:00
Simon Willnauer 4b28bc396d Translog: Revert unlimited flush_threshold_ops for translog
This commit reverts the commit for issue #5900 introduced
in `1.2.0`. The unlimited translog size can cause memory pressure
on ES instances with low memory and high indexing load.

Closes #6377
2014-06-03 16:54:22 +02:00
Adrien Grand 7ab99de483 Routing: Restore shard routing.
Routing has been inadvertently changed in #5562 resulting in documents going to
different shards in 1.2. This is a terrible bug because an indexing request
would not necessarily go to the same shard anymore, potentially leading to
duplicates.

Close #6391
2014-06-03 16:37:54 +02:00
Kevin Wang 6a399d4c9a Remove support for field names in node_stats url
Field names ended up making the urls too long, fields are still supported as query string parameters though (same as indices stats)
2014-06-03 13:57:07 +02:00
stephlag 10cb136eb0 [DOCS] Fixed typo in IndexRequestBuilder Javadocs 2014-06-03 13:48:41 +02:00
Alex Ksikes 9797e343aa More Like This Query: values of a multi-value field are compared at the same level.
Previously, More Like This would create a new mlt query for each value of a
multi-value field. This could result in all the values of the field being
selected, which defeats the purpose of More Like This. Instead, the correct
behavior is to generate only one mlt query for all the values of the field.
This commit provides the correct behavior for More Like This DSL. The fix for
More Like This API will be coming in another commit.

Closes #6310
2014-06-03 13:43:51 +02:00
Adrien Grand df67b17646 BigArrays: Disable breaking.
The BigArrays limit is currently shared by the translog, netty, http and some
queries/aggregations. If any of these consumers starts taking a lot of memory,
then other ones might fail to allocate memory, which could have bad
consequences, eg. if ping requests can't be sent. The plan is to come up with
a better solution in 1.3.

Close #6332
2014-06-03 11:34:25 +02:00
javanna 90b1e6a461 [TEST] make sure that the -Dtests.rest.blacklist parameter works on windows too
Some reserved characters need to be replaced in the test section names, which gets parsed as a path although it isn't a filename
2014-06-03 09:23:37 +02:00
Britta Weber 125e0c16cd Object and Type parsing: Fix include_in_all in type
include_in_all can also be set on type level (root object).
This fixes a regression introduced  in #6093

closes #6304
2014-06-02 17:48:19 +02:00
Colin Goodheart-Smithe a23e4aefaa Geo: Issue with polygons near date line
If a polygon is constructed which overlaps the date line but has a hole which lies entirely to one side of the date line, JTS throws an error saying that the hole is not within the bounds of the polygon, because the code which splits the polygon either side of the date line does not add the hole to the correct component of the final set of polygons. The fix ensures this selection happens correctly.

Closes #6179
2014-06-02 15:03:32 +01:00
Martijn van Groningen f2641d29ae [TEST] Added sort duel between a single shard index and a multi shard index. 2014-06-02 14:16:55 +02:00
Martijn van Groningen 43b21719f5 [TEST] size should start from 1, top_hits aggregation doesn't support size <= 0 2014-06-02 13:21:13 +02:00
Simon Willnauer 3b31f25624 [TEST] Ensure cluster size reflected in the cluster state
We perform some management operations that require the cluster to be
consistent with respect to the number of nodes in the cluster state
/ visible to the master in order to rely on the ack mechanism. This
only applies to the test infrastructure when nodes are not explicitly
started / stopped as well as while tearing down the cluster and wiping
indices after the tests.
2014-06-02 11:57:32 +02:00
mikemccand 7552b69b1f Core: reuse Lucene's TermsEnum for faster _uid/version lookup during
Reusing Lucene's TermsEnum for _uid/version lookups gives a small
indexing (updates) speedup and brings us closer to not having
to spend RAM on bloom filters.

Closes #6212
2014-05-31 17:38:48 -04:00
Martijn van Groningen f51a09d8f7 Core: Protects against: 'from + size > scoreDocs.length' in case only single shard response 2014-05-31 20:30:11 +02:00
javanna e8995ecaa7 [TEST] speed up HighlightSearchTests a bit
Randomize rewrite methods instead of trying them all when highlighting multi term queries with postings highlighter
Rely on search type randomization and remove all the explicit setSearchType calls as they are not needed anymore
Remove explicit `.from`, `.size` and `.explain`, not needed and might slow tests down (especially explain)
2014-05-31 16:29:53 +02:00
Martijn van Groningen 01ca8491cf Core: apply 'from' if there is one shard result. 2014-05-31 13:35:11 +02:00
Martijn van Groningen b8366a3213 Aggregations: apply 'from' if there is one shard result. 2014-05-31 13:34:49 +02:00
Clinton Gormley 46a67b638d Parent/Child: Added min_children/max_children to has_child query/filter
Added support for min_children and max_children parameters to
the has_child query and filter. A parent document will only
be considered a match if the number of matching children
falls between the min/max bounds.
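
The bounds check can be sketched in Python as follows. The function name and the convention that `max_children=0` means "no upper bound" are assumptions made for this illustration, not details taken from the commit.

```python
from collections import Counter


def parents_within_bounds(matching_child_parents, min_children=0, max_children=0):
    """Return parent ids whose number of matching children lies within the bounds.

    matching_child_parents holds one parent id per matching child document.
    max_children=0 is treated as unbounded here (an assumption).
    """
    counts = Counter(matching_child_parents)
    return sorted(
        parent
        for parent, n in counts.items()
        if n >= min_children and (max_children == 0 or n <= max_children)
    )
```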

Closes #6019
2014-05-30 19:38:39 +02:00
mikemccand 48ccb06160 remove stale nocommit 2014-05-30 13:22:48 -04:00
Martijn van Groningen 760cee7c24 Aggregations: Take the 'from' into account when getting a fetched hit (InternalSearchHit). Hits before the 'from' are included in each shard result. 2014-05-30 16:23:28 +02:00
Shay Banon 9c98bb3554 Have a dedicated join timeout that is higher than ping.timeout for node join
Using ping.timeout, which defaults to 3s, as the timeout value for the join request a node makes to the master once it's discovered can be too small, specifically when there is a large cluster state involved (and by definition, all the buffers and such on the nio layer will be "cold"). Introduce a dedicated join.timeout setting, that by default is 10x the ping.timeout (so 30s by default).
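
The default derivation can be sketched in Python as below. The flat dictionary, the use of plain seconds instead of time-value strings, and the short keys `ping.timeout` and `join.timeout` (copied from the message above, not full setting names) are simplifying assumptions.

```python
def join_timeout_seconds(settings: dict) -> float:
    """Resolve the join timeout, defaulting to 10x the ping timeout."""
    ping_timeout = float(settings.get("ping.timeout", 3.0))  # 3s default
    # An explicit join.timeout wins; otherwise derive it from ping.timeout.
    return float(settings.get("join.timeout", 10 * ping_timeout))
```
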
closes #6342
2014-05-30 12:42:08 +02:00
Martijn van Groningen 0e2d33b4a4 [BUILD] Fix compile error 2014-05-30 12:24:11 +02:00
Martijn van Groningen aab38fb2e6 Aggregations: added pagination support to `top_hits` aggregation by adding `from` option.
Closes #6299
2014-05-30 11:45:31 +02:00
Martijn van Groningen 35755cd8a4 Aggregations: Fixed bug in top_hits aggregation to not fail with NPE when shard results are empty.
The top_hits aggregation returned an empty InternalTopHits instance with no fields set when there were no results, causing reduce and serialization errors down the road. This is fixed by setting all required fields when there are no results.

Closes #6346
2014-05-30 11:40:45 +02:00
Igor Motov 8c903f4787 [TESTS] Add get snapshot status test for partial snapshots 2014-05-29 19:07:04 -04:00
Boaz Leskes 93e0ce0c5b [Test] added search trace logging to IndexActionTests.testAutoGenerateIdNoDuplicates 2014-05-28 22:12:23 +02:00
Boaz Leskes dc34ccebfe [Tests] assert indexRandom's deletion of injection dummy docs finds them 2014-05-28 22:06:38 +02:00
Adrien Grand 4ff511000e [TESTS] There might be several live `BigArrays` instances at the same time. 2014-05-28 16:55:26 +02:00
Adrien Grand cc9a7bd454 Recycling: change the default type of the page recycler to CONCURRENT instead of SOFT_CONCURRENT.
This default type has been inherited from its ancestor, the (non-paged) recycler whose memory
usage was unbounded and required soft references to make sure it could release memory eventually.
On the contrary, the page cache recycler memory usage is bounded so we could remove soft
references in order to remove load on the garbage collector.

Note: the cache type is already randomized in integration tests.

Close #6320
2014-05-28 15:23:18 +02:00
Simon Willnauer a5866e226e Mustache: Ensure internal scope extractors are always operating on a Map
Mustache extracts the key/value pairs for parameter substitution from
objects and maps but it's decided on the first execution. We need to
make sure if the params are null we pass an empty map to ensure we
bind the map based extractor

Closes #6318
2014-05-28 13:29:21 +02:00
Mathias Fussenegger 82e9a4e80a Serialization: Add support for Byte to the XContentBuilder.
Close #6127
2014-05-28 12:19:44 +02:00
Adrien Grand be29138962 [BUILD] Remember to use AtomicReader.addCoreClosedListener when upgrading to Lucene 4.9. 2014-05-28 09:35:00 +02:00
mateusz_kaczynski e97a381db2 Highlighting: Plain highlighter to use analyzer defined on a document level when available.
At the moment the plain highlighter only uses an analyzer defined at the type
level. However, during the indexing stage it is possible to define analyzer on
per document level, for example mapping '_analyzer' to another field, containing
required name. This commit attempts to make sure that highlighting works
correctly in this scenario.

Closes #5497
2014-05-28 08:27:14 +02:00
Shay Banon 13f49237df [Test] make sure to close the file at the end of the test 2014-05-27 11:08:29 +02:00
Shay Banon cd94af2c9e [Test] make sure we test writeTo(Channel) in BytesReference
also introduce proper randomization of content in the bytes
2014-05-26 13:32:52 +02:00
Alex Brasetvik 15ff3df243 Fix MatchQueryParser not parsing fuzzy_transpositions 2014-05-23 22:02:21 +02:00
Martijn van Groningen 3f2f1f088d Set the sortValues on SearchHit post aggregation instead of during the reduce. 2014-05-23 19:05:30 +02:00
Lee Hinman 65ce5acfb4 Explicitly clean up fielddata cache when clearing entire cache 2014-05-23 16:29:26 +02:00
Robert Muir 2cbe9371d2 Improve error when mlockall fails (closes #6288) 2014-05-23 10:16:26 -04:00
Martijn van Groningen 5fafd2451a Added `top_hits` aggregation that keeps track of the most relevant document being aggregated per bucket.
Closes #6124
2014-05-23 16:01:18 +02:00
Adrien Grand 2d417cf5b6 [TESTS] Left-over from 14420d7c4e. 2014-05-23 10:10:00 +02:00
Adrien Grand 14420d7c4e [TESTS] Fix test to use index-level doc IDs instead of segment-level doc IDs. 2014-05-23 01:20:41 +02:00
Adrien Grand 0d3410a837 [TESTS] Fix test bug in SimpleValidateQueryTests. 2014-05-23 00:52:56 +02:00
Nik Everett 0ff0985e01 Limit guava caches to 31.9GB
Guava's caches have overflow issues around 32GB with our default segment
count of 16 and weight of 1 unit per byte.  We give them 100MB of headroom
so 31.9GB.

This limits the sizes of both the field data and filter caches, the two
large guava caches.
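
The arithmetic behind the limit can be checked with a short Python sketch. It assumes the overflow comes from each cache segment tracking its weight in a signed 32-bit integer, which this commit message does not state; the names are illustrative.

```python
GIB = 2 ** 30
MIB = 2 ** 20

SEGMENTS = 16                     # default segment count from the message
PER_SEGMENT_CAPACITY = 2 ** 31    # assumption: signed 32-bit weight, 1 unit per byte

overflow_point = SEGMENTS * PER_SEGMENT_CAPACITY  # 32 GiB in bytes
headroom = 100 * MIB
limit = overflow_point - headroom


def clamp_cache_size(requested_bytes: int) -> int:
    """Cap a requested cache size at the safe limit."""
    return min(requested_bytes, limit)
```

Under these assumptions the limit works out to roughly 31.9 GiB, matching the figure in the commit title.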

Closes #6268
2014-05-23 00:20:12 +02:00
Adrien Grand a836496e57 [TESTS] Randomly disable the filter cache.
Close #6280
2014-05-22 23:13:29 +02:00
Adrien Grand 6e49256fa8 Nested: Make sure queries/filters/aggs get a FixedBitSet when they expect one.
Close #6279
2014-05-22 23:13:13 +02:00
Adrien Grand b3274bd770 Aggregations: Fix ReverseNestedAggregator to compute the parent document correctly.
Close #6278
2014-05-22 23:13:13 +02:00
Martijn van Groningen cbdd11777f [TEST] Just start two nodes 2014-05-22 21:13:52 +02:00
Martijn van Groningen 41bcb3e0d3 [TEST] Don't stop master node. 2014-05-22 19:17:54 +02:00
Nik Everett 3573822b7e Highlight fields in request order
Because json objects are unordered this also adds an explicit order syntax
that looks like
    "highlight": {
        "fields": [
            {"title":{ /*params*/ }},
            {"text":{ /*params*/ }}
        ]
    }

This is not useful for any of the builtin highlighters but will be useful
in plugins.

Closes #4649
2014-05-22 16:44:14 +02:00
Alex Ksikes 2546c06131 More Like This Query: allow for both 'like_text' and 'docs/ids' to be specified.
Closes #6246
2014-05-22 13:50:17 +02:00
Martijn van Groningen a717af505a [TEST] Use _uid sort field as tie, so that hits with the same score are sorted in the same way in both search responses. 2014-05-22 12:10:03 +02:00
Colin Goodheart-Smithe cabd2340dd Aggregations: Fixed conversion of date field values when using multiple date formats
When multiple date formats are specified using the || syntax in the field mappings the date_histogram aggregation breaks.  This is because we are getting a parser rather than a printer from the date formatter for the object we use to convert the DateTime values back into Strings.  Simple fix to get the printer from the date format and test to back it up

Closes #6239
2014-05-22 10:21:50 +01:00
Martijn van Groningen e8e684c6c4 Add number of shards statistic to PercolateContext instead of throwing exception.
Certain features like significant_terms aggregation rely on this statistic for sizing heuristics.

Closes #6037
Closes #6123
2014-05-22 10:44:50 +02:00
Martijn van Groningen 16e5cdf8d0 Cut over to Lucene's TopDocs#merge for shard topdocs sorting.
Closes #6197
2014-05-22 10:40:56 +02:00
Martijn van Groningen 157d511061 [TEST] Use SuiteScopeTest annotation instead of ClusterScope(scope = ElasticsearchIntegrationTest.Scope.SUITE, numDataNodes = 1) 2014-05-21 22:08:59 +02:00
Alex Ksikes a29b4a800d More Like This Query: replaced 'exclude' with 'include' to avoid double negation when set.
Closes #6248
2014-05-21 18:45:03 +02:00
Britta Weber 8cca9b28df Percolator: Fix assertion in percolation with nested docs
Assertion was triggered for percolating documents with nested object
in mapping if the document did not actually contain a nested object.
Reason:
MultiDocumentPercolatorIndex checks if the number of documents is
actually >1. Instead we can just use the SingleDocumentPercolatorIndex
in this case.

closes #6263
2014-05-21 18:17:36 +02:00
Simon Willnauer 17d34d5c97 Fix FieldDataWeighter generics to accept RamUsage instead of AtomicFieldData
The `FieldDataWeighter` allowed a concrete subclass of the cache's
generic type to be used, which causes a ClassCastException and also prevents
the CircuitBreaker from being decremented appropriately.

This was tripped by settings randomization also part of this commit.

Closes #6260
2014-05-21 17:50:45 +02:00
Lee Hinman 03402c7ed8 [TEST] prevent dummy documents from being indexed in testSimpleQueryString() since scores are compared 2014-05-21 17:37:54 +02:00
Martijn van Groningen a6b0b80f3d [TEST] Added test for #6256 2014-05-21 16:17:03 +02:00
Adrien Grand 34f7bd1ca4 Fail queries that have two aggregations with the same name.
Close #6255
2014-05-21 15:11:23 +02:00
Simon Willnauer f29744cc2f XFilteredQuery default strategy prefers query first in the deleted docs case
Today we check if the DocIdSet we filter by is `fast` but the check fails
if the DocIdSet is wrapped in an `ApplyAcceptedDocsFilter` which is always
the case if the index has deleted documents. This commit unwraps
the original DocIdSet in the case of deleted documents.

Closes #6247
2014-05-21 13:04:41 +02:00
Adrien Grand fa3bd738ab Remove `DocIdSets.isFastIterator(DocIdSetIterator iterator)`.
This method was unused and its implementation wasn't correct since FixedBitSet
has its own iterator since Lucene 4.7.
2014-05-21 11:25:35 +02:00
Simon Willnauer a60dabdf0c [TEST] skip benchmark tests for now 2014-05-20 22:21:37 +02:00
Martijn van Groningen 9494bbd9b7 Verify that the current node is still master before the reroute is executed and if that isn't the case skip reroute
Invoke listener when reroute fails.

Closes #6244
2014-05-20 18:29:06 +02:00
Simon Willnauer 0e445d3aaf [TEST] Wait for all benchmarks to be started if more than one is used 2014-05-20 17:17:25 +02:00
Simon Willnauer 75efa47d5a [TEST] Allow to disable plugin loading from classpath 2014-05-20 16:31:32 +02:00
mikemccand 9c45fe8f9b Don't use AllTokenStream when no fields were boosted
AllTokenStream, used to index the _all field, adds some overhead, but
it's not necessary when no fields were boosted or when positions are
not indexed for the _all field.

Closes #6187 Closes #6219
2014-05-20 10:28:31 -04:00
Andrew Selden 476e28f4ce Benchmark abort accepts wildcard patterns
This adds support for sending a list of benchmark names and/or wildcard
patterns when submitting an abort request. Standard wildcards such as:
"*", "*xxx", and "xxx*" are supported. Multiple benchmark names and
wildcard patterns can be submitted together as a comma-separated list
from the REST API or as a string array from the Java API.
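
A minimal Python sketch of this kind of matching, using the standard `fnmatch` module. The function name is hypothetical, and `fnmatch` also understands `?` and `[...]`, which goes beyond the wildcards listed above.

```python
from fnmatch import fnmatchcase


def select_benchmarks(patterns: str, running):
    """Return the running benchmark names matched by a comma-separated pattern list."""
    wanted = [p.strip() for p in patterns.split(",") if p.strip()]
    return [name for name in running if any(fnmatchcase(name, p) for p in wanted)]
```

For example, `select_benchmarks("nightly*,*_smoke", names)` keeps names that start with `nightly` or end with `_smoke`.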

Closes #6185
2014-05-20 16:00:11 +02:00
Simon Willnauer e47ee6f683 [TEST] Disable dummy documents for QueryRescorerTests#testEquivalence 2014-05-20 15:52:00 +02:00
Igor Motov 91c7892305 Add ability to snapshot replicating primary shards
This change adds a new cluster state that waits for the replication of a shard to finish before starting the snapshotting process. Because this change adds a new snapshot state, pre-1.2.0 nodes will not be able to join a 1.2.0 cluster that is currently running a snapshot/restore operation.

Closes #5531
2014-05-20 08:57:21 -04:00
Boaz Leskes 05d131c39d Before deleting a local unused shard copy, verify we're connected to the node it's supposed to be on
This is yet another safety guard to make sure we don't delete data if the local copy is the only one (even if it's not part of the cluster state any more)

Closes #6191
2014-05-20 11:16:13 +02:00
Boaz Leskes 541acc7e9b Honor time delay when retrying recoveries
In some places we want to delay the start of a shard recovery because the source node is not ready to receive. At the moment the retry logic ignores the time delay parameter (`retryAfter`) causing a busy waiting like scenario. This is fixed in this commit.

Closes #6226
2014-05-20 11:03:14 +02:00
Simon Willnauer 223550bf3c [TEST] Opt out of dummy documents where scores are relevant. 2014-05-20 10:26:50 +02:00
Simon Willnauer ac28557228 [TEST] Provide overloaded indexRandom to opt out of dummy documents 2014-05-20 10:20:31 +02:00
Clinton Gormley 0741ce3684 CharArraySet doesn't know how to lookup the original string
in an ImmutableList.

Closes #6237
2014-05-19 21:27:04 +02:00
Simon Willnauer 7d76548a1a Added Version [1.3.0] 2014-05-19 20:55:23 +02:00
Simon Willnauer 85a0b76dbb Upgrade to Lucene 4.8.1
This commit upgrades to the latest Lucene 4.8.1 release including the
following bugfixes:

 * An IndexThrottle now kicks in when merges start falling behind
   limiting index threads to 1 until merges caught up. Closes #6066
 * RateLimiter now kicks in at the configured rate where previously
   the limiter was limiting at ~8MB/sec almost all the time. Closes #6018
2014-05-19 20:47:55 +02:00
Andrew Selden 3731362ca8 Do not throw exception on no available nodes when listing benchmarks.
Changed behavior to not throw an exception on a status request
when there are no available benchmark nodes.

Closes #6146
2014-05-19 11:42:15 -07:00
Tiago Alves Macambira 4dd2ba6d50 Register uppercase as an exposed ES token filter.
Just follow "lowercase" token filter example and register "uppercase" token filter as an exposed token filter. This will not, by itself, test whether ES is correctly handling "uppercase" TF; this is more of a "code as documentation" fix.
2014-05-19 13:49:33 -03:00
Simon Willnauer fc28fbfada Add dummy docs injection to indexRandom
This commit adds `dummy docs` to `ElasticsearchIntegrationTest#indexRandom`.
It indexes documents with an empty body into the indices specified by the docs
and deletes them after all docs have been indexed. This produces gaps in
the segments and enforces usage of accept docs on lower levels to ensure
the features work with deleted documents as well.
2014-05-19 17:23:14 +02:00
Simon Willnauer 579a79d1ac Check accepts docs before MatchDocIdSet#matchDoc(int)
We currently ask `MatchDocIdSet#matchDoc(int)` before consulting
the accept docs. This can also have a negative performance impact
since `matchDoc(int)` calls might be way more expensive than
acceptDocs calls.
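
The reordering amounts to running the cheap check first. A Python sketch of the idea follows; the names and the list-based `accept_docs` are illustrative assumptions, not the Lucene API.

```python
def matching_docs(max_doc, accept_docs, match_doc):
    """Yield doc ids that are accepted and match, consulting accept_docs first."""
    for doc in range(max_doc):
        # Cheap lookup first: skip deleted / filtered docs without calling match_doc.
        if accept_docs is not None and not accept_docs[doc]:
            continue
        # Potentially expensive per-document check runs only for accepted docs.
        if match_doc(doc):
            yield doc
```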

Closes #6234
2014-05-19 17:17:55 +02:00
Simon Willnauer 3e4c896944 [TEST] Drop obsolete test - the option is obsolete and won't be fixed 2014-05-19 15:06:04 +02:00
Simon Willnauer 72da764261 Don't report terms as live if all its docs are filtered out
FilterableTermsEnum allows to filter stats by supplying per segment
bits. Today if all docs are filtered out the term is still reported as
live but shouldn't.

Relates to #6211
2014-05-19 13:48:56 +02:00
Simon Willnauer c593234b7c [TEST] Ensure multi_match & match query equivalence in the single field case 2014-05-19 13:32:24 +02:00
Martijn van Groningen 39018c5d0b [TEST] Added await for yellow status,
because the shard the get request for 'test' index, 'type1' type and id 1 is getting executed on may not be in a started state
and also added more logging.
2014-05-19 11:56:26 +02:00
Simon Willnauer d9441747e8 [TEST] Beef up MoreLikeThisActionTests#testCompareMoreLikeThisDSLWithAPI 2014-05-18 23:02:08 +02:00
Simon Willnauer 91b74931a3 [TEST] Stabilize MoreLikeThisActionTests
The `testCompareMoreLikeThisDSLWithAPI` test compares results from query
and API which might query different shards. Those shards might use
different doc IDs internally to disambiguate. This commit re-sorts the
results and compares them after stable disambiguation.
2014-05-18 22:57:46 +02:00
mikemccand 4f7792e64b Tie-break suggestions from phrase suggester by term
If the score for two suggestions is the same, we now tie break by term; earlier terms (aaa) sort before later terms (zzz).
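
The ordering rule can be illustrated in Python with a compound sort key. Suggestions are modelled here as `(term, score)` tuples, which is an assumption for the sketch.

```python
def order_suggestions(suggestions):
    """Sort by score descending, then by term ascending to break ties."""
    return sorted(suggestions, key=lambda s: (-s[1], s[0]))
```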

Closes #5978
2014-05-18 16:45:37 -04:00
Simon Willnauer dab4596b13 Use default forceAnalyzeQueryString if no query builder is present
In the single field case no query builder is selected which causes NPE
when the query has only a numeric field.

Closes #6215
2014-05-18 10:20:31 +02:00
Boaz Leskes 1e5138889e Translog: remove unneeded Versions.readVersion & Versions.writeVersion
These calls were introduced in PR #6149 as a backward compatibility layer for the previous value of `Versions.MATCH_ANY`. This is not needed as the translog never contains these values. On top of that, the calls are not effective as the stream the translog uses is effectively not versioned (versioning is done on an item-by-item basis)
2014-05-18 09:45:00 +02:00
Boaz Leskes 682acfcacd DeleteRequest.version was not initialized to `Versions.MATCH_ANY` 2014-05-18 09:45:00 +02:00
Simon Willnauer c7db8843b3 [TEST] Stabilize BenchmarkIntegrationTest#testAbortBenchmark 2014-05-17 23:33:49 +02:00
Alex Ksikes db991dc3a4 More Like This Query: Added searching for multiple items.
The syntax to specify one or more items is the same as for the Multi GET API.
If only one document is specified, the results returned are the same as when
using the More Like This API.

Relates #4075 Closes #5857
2014-05-17 19:14:56 +02:00
Igor Motov a3581959d7 [TESTS] Ignore SnapshotMissingException in snapshotWithStuckNodeTest
The retry mechanism in the transport layer might cause the delete snapshot request to be executed twice if the cluster master is closed while the request is executed. First time delete snapshot request is getting successfully executed on the old master and then it is retried on the newly elected master. When the new master tries to delete the snapshot - the snapshot no longer exists (since it was successfully deleted by the old master) and SnapshotMissingException is returned.
2014-05-17 11:18:11 -04:00
Igor Motov c20713530d Switch to shared thread pool for all snapshot repositories
Closes #6181
2014-05-16 19:03:15 -04:00
Igor Motov 7f5befd95e Add Partial snapshot state
Currently even if some shards of the snapshot are not snapshotted successfully, the snapshot is still marked as "SUCCESS". Users may miss the fact that there are shard failures present in the snapshot and think that the snapshot was completed. This change adds a new snapshot state "PARTIAL" that provides a quick indication that the snapshot was only partially successful.

Closes #5792
2014-05-16 18:26:56 -04:00
Boaz Leskes 9f10547f4b Allow 0 as a valid external version
Until now all version types have officially required the version to be a positive long number. Despite this being documented, ES versions <=1.0 did not enforce it when using the `external` version type. As a result people have successfully indexed documents with 0 as a version. In 1.1 we introduced validation checks on incoming version values, causing indexing requests to fail if the version was set to 0. While this is strictly speaking OK, we effectively have a situation where data already indexed does not match the version invariant.

To be lenient and adhere to the spirit of our data backward compatibility policy, we have decided to allow 0 as a valid external version. This is somewhat complicated as 0 is also the internal value of `MATCH_ANY`, which indicates requests should succeed regardless of the current doc version. To keep things simple, this commit changes the internal value of `MATCH_ANY` to `-3` for all version types.

Since we're doing this in a minor release (and because versions are stored in the transaction log), the default `internal` version type still accepts 0 as a `MATCH_ANY` value. This is not a problem for other version types as `MATCH_ANY` doesn't make sense in that context.
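
A simplified Python sketch of the resulting rules. The function names and the exact conflict check are assumptions for illustration; only the `-3` sentinel, the legacy `0`, and the acceptance of `0` as an external version come from the text above.

```python
MATCH_ANY = -3         # new internal sentinel described in this commit
LEGACY_MATCH_ANY = 0   # still honoured by the `internal` version type


def internal_version_conflict(current: int, expected: int) -> bool:
    """Internal versioning: either sentinel matches any current version."""
    if expected in (MATCH_ANY, LEGACY_MATCH_ANY):
        return False
    return current != expected


def external_version_valid(version: int) -> bool:
    """External versioning: 0 is accepted again, negative values are not."""
    return version >= 0
```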

Closes #5662
2014-05-16 22:10:16 +02:00
Simon Willnauer bf22df7fd0 Remove SoftReferences from StreamInput/StreamOutput
We try to reuse character arrays and UTF8 writers with softreferences.
SoftReferences have negative impact on GC and should be avoided in
general. Yet in this case it can simply be replaced with a per-stream
Bytes/CharsRef that is thread local and has the same lifetime as the
stream.
2014-05-16 20:58:42 +02:00
Simon Willnauer 11a3201a09 Use EnumSet rather than static mutable arrays
ClusterBlockLevel uses arrays but should use EnumSets instead
2014-05-16 20:54:01 +02:00
Simon Willnauer d65e9e9bea Add some finals where appropriate 2014-05-16 20:54:01 +02:00
Simon Willnauer c561900512 Use UTF-8 as string encoding 2014-05-16 20:54:01 +02:00
David Pilato 0dbc83e7b0 [TEST] Do not filter gz files 2014-05-16 15:23:09 +02:00
Simon Willnauer d806b567e4 Remove dead code 2014-05-16 15:08:56 +02:00
Simon Willnauer eef505ed51 RecoveryID should not be a per JVM but per Node
Today the RecoveryID is taken from a static atomic long which
is essentially a per JVM ID. We run the tests within the same
JVM and that means we don't really simulate what happens in
production environments. Instead we should use a per node generated
ID.
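
The difference can be sketched in Python by keeping the counter on the node instance rather than at class (process) level. The `Node` class here is a hypothetical stand-in, not the project's type.

```python
import itertools


class Node:
    """Hypothetical node that owns its own recovery id sequence."""

    def __init__(self, name: str):
        self.name = name
        # Per-node counter: two nodes in one process no longer share ids.
        self._recovery_ids = itertools.count(1)

    def next_recovery_id(self) -> int:
        return next(self._recovery_ids)
```

Two nodes started in the same test JVM would each begin counting from 1, as separate production processes would.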
2014-05-16 14:59:32 +02:00
Simon Willnauer 9a9cc0b8e4 Add simple example to XContentParser how to obtain an instance of it 2014-05-16 14:55:22 +02:00
David Pilato bd871f96c2 Check that a plugin is Lucene compatible with the current running node using `lucene` property in `es-plugin.properties` file.
* If plugin does not provide `lucene` property, we consider that the plugin is compatible.
* If plugin provides `lucene` property, we try to load related Enum org.apache.lucene.util.Version. If this fails, it means that the node is too "old" comparing to the Lucene version the plugin was built for.
* We compare then two first digits of current node lucene version against two first digits of plugin Lucene version. If not equal, it means that the plugin is too "old" for the current node.

Plugin developers who want to enable this check only have to add a `lucene` property to the `es-plugin.properties` file. If you are using Maven to build your plugin, you can do it like this:

In `pom.xml`:

```xml
    <properties>
        <lucene.version>4.6.0</lucene.version>
    </properties>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>true</filtering>
            </resource>
        </resources>
    </build>
```

In `es-plugin.properties`, add:

```properties
lucene=${lucene.version}
```

BTW, if you don't already have it, you can add the plugin version as well:

```properties
version=${project.version}
```

You can disable that check using `plugins.check_lucene: false`.
2014-05-16 13:41:20 +02:00
Simon Willnauer 094908ac7f Randomize CMS settings in index template
This commit adds randomization for:
 * `index.merge.scheduler.max_thread_count`
 * `index.merge.scheduler.max_merge_count`

This commit also switches to
EsExecutors#boundedNumberOfProcessors(Settings) to compute
the default `max_thread_count`, for better reproducibility

Closes #6194
2014-05-15 23:16:45 +02:00
javanna 7548b2edb7 Unified MetaData#concreteIndices methods into a single method that accepts indices (or aliases) and indices options
Added new internal flag to IndicesOptions that tells whether aliases can be resolved to multiple indices or not.

Cut over to the new MetaData#concreteIndices(IndicesOptions, String...) for all the APIs previously using MetaData#concreteIndices(String[], IndicesOptions) and removed the old method. Deprecation is not needed as it doesn't break client code.

Introduced constants for flags in IndicesOptions for more readability

Renamed MetaData#concreteIndex to concreteSingleIndex, left method as a shortcut although it calls the common concreteIndices that accepts IndicesOptions and multipleIndices
2014-05-15 20:53:05 +02:00
Boaz Leskes 1f28cd0ba8 When sending shard start/failed message due to a cluster state change, use the master indicated in the new state rather than current
This commit also adds extra protection in other cases against a master node being de-elected and thus being null.

Closes #6189
2014-05-15 18:42:26 +02:00
Boaz Leskes 84593f0d7c Added meta data and routing version to cluster state's pretty print 2014-05-15 15:55:11 +02:00
Boaz Leskes dc07ece790 Added some debug logs to the recovery process 2014-05-15 15:37:30 +02:00
Simon Willnauer e47de1f809 [TEST] Randomize number of available processors
We configure the threadpools according to the number of processors which is
different on every machine. Yet, we had some test failures related to this
and #6174 that only happened reproducibly on a node with 1 available processor.
This commit does:
  * sometimes randomize the number of available processors
  * if we don't randomize we should set the actual number of available processors
    in the settings on the test node
  * always print out the num of processors when a test fails to make sure we can
    reproduce the thread pool settings with the reproduce info line
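
A rough sketch of the first two points, under stated assumptions: `TestProcessors` is a hypothetical helper, the `processors` settings key and the upper bound of 8 are assumptions, and "sometimes" is modelled as a coin flip on the test's seeded `Random`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical helper; setting key and bounds are assumptions for illustration.
final class TestProcessors {
    static final int MAX_RANDOM_PROCESSORS = 8;

    static Map<String, String> nodeSettings(Random random, int actualProcessors) {
        // Sometimes use a random count, otherwise the machine's actual count.
        int processors = random.nextBoolean()
                ? 1 + random.nextInt(MAX_RANDOM_PROCESSORS)
                : actualProcessors;
        // The value is always written to the settings so a failure can be reproduced.
        Map<String, String> settings = new HashMap<>();
        settings.put("processors", Integer.toString(processors));
        return settings;
    }
}
```

Because the count comes from the seeded `Random`, running with the same seed and the same actual count yields the same settings.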

Closes #6176
2014-05-15 12:24:53 +02:00
Simon Willnauer 53bfe44e19 Fix debug logging message for put template action 2014-05-15 11:13:30 +02:00
Andrew Selden fc0bed5236 Fix bug for BENCH thread pool size == 1
On small hardware, the BENCH thread pool can be set to size 1. This is
problematic as it means that while a benchmark is active, there are no
threads available to service administrative tasks such as listing and
aborting. This change fixes that by executing list and abort operations
on the GENERIC thread pool.
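
The problem and the fix can be illustrated with plain `java.util.concurrent` executors. This is a sketch, not the Elasticsearch thread pool API: `BenchPoolDemo` and the returned string are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the pools stand in for the BENCH and GENERIC thread pools.
final class BenchPoolDemo {
    static String listWhileBenchmarkRuns() {
        ExecutorService bench = Executors.newFixedThreadPool(1); // BENCH pool of size 1
        ExecutorService generic = Executors.newCachedThreadPool(); // GENERIC pool
        CountDownLatch abort = new CountDownLatch(1);
        try {
            // The benchmark occupies the only BENCH thread until it is aborted.
            Future<?> benchmark = bench.submit(() -> { abort.await(); return null; });
            // Submitted to BENCH, this would queue behind the benchmark and never run.
            // Submitted to GENERIC, it completes while the benchmark is still active.
            Future<String> listing = generic.submit(() -> "1 active benchmark");
            String result = listing.get(5, TimeUnit.SECONDS);
            abort.countDown(); // abort the benchmark
            benchmark.get(5, TimeUnit.SECONDS);
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            abort.countDown();
            bench.shutdownNow();
            generic.shutdownNow();
        }
    }
}
```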

Closes #6174
2014-05-14 10:40:39 -07:00
Simon Willnauer 2c1c5c163f [TEST] Ensure all benchmarks are aborted on failure and latches are counted down 2014-05-14 16:40:34 +02:00
Simon Willnauer fc2ab0909e [TEST] Remove busy waiting from BenchmarkIntegrationTest
I think Chuck Norris is required to fix this at this point until we have an API
that can for instance pause a Benchmark. We basically wait for a query to be executed
and that query syncs on a latch with the test in a script :)

This commit also adds some more testing for benchmarks that run into errors.
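
A minimal sketch of syncing on a latch instead of polling. `LatchHandshake` is a hypothetical name, and a plain thread stands in for the query whose script counts down the latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; a thread stands in for the scripted query.
final class LatchHandshake {
    static boolean awaitQueryStart() {
        CountDownLatch queryStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread query = new Thread(() -> {
            queryStarted.countDown(); // signal the test that the query is executing
            try {
                release.await(); // block until the test lets the query finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        query.start();
        try {
            // The test blocks here instead of looping and sleeping.
            boolean started = queryStarted.await(5, TimeUnit.SECONDS);
            release.countDown();
            query.join(5000);
            return started;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // always release the query thread
        }
    }
}
```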
2014-05-14 14:40:27 +02:00
David Pilato e0a95d9c19 Allow sorting on nested sub generated field
When you have a nested document and want to sort on its fields, it's perfectly doable on regular fields but not on "generated" sub fields.

Here is a SENSE recreation:

```
DELETE /tmp

PUT /tmp

PUT /tmp/doc/_mapping
{
  "properties": {
    "flat": {
      "type": "string",
      "index": "analyzed",
      "fields": {
        "sub": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    },
    "nested": {
      "type": "nested",
      "properties": {
        "foo": {
          "type": "string",
          "index": "analyzed",
          "fields": {
            "sub": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

PUT /tmp/doc/1
{
  "flat":"bar",
  "nested":{
    "foo":"bar"
  }
}
```

When sorting on `flat.sub` sub field, everything is fine:

```
GET /tmp/doc/_search
{
  "sort": [
    {
      "flat.sub": {
        "order": "desc"
      }
    }
  ]
}

```

When sorting on `nested.foo` field, everything is fine:

```
GET /tmp/doc/_search
{
  "sort": [
    {
      "nested.foo": {
        "order": "desc"
      }
    }
  ]
}

```

But when sorting on `nested.foo.sub` field, sorting is incorrect:

```
GET /tmp/doc/_search
{
  "sort": [
    {
      "nested.foo.sub": {
        "order": "desc"
      }
    }
  ]
}
```

Closes #6150.
2014-05-14 14:13:44 +02:00
Britta Weber 08e57890f8 use shard_min_doc_count also in TermsAggregation
This was discussed in issues #6041 and #5998.

closes #6143
2014-05-14 14:10:04 +02:00
Britta Weber d4a0eb818e refactor: make requiredSize, shardSize, minDocCount and shardMinDocCount a single parameter
Every class using these parameters has its own members where these four
are stored. This clutters the code. Because they are mostly needed together
it might make sense to group them.
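
One way such a grouping could look, as a sketch only: the class name, field types and accessors below are assumptions and are not taken from the commit.

```java
// Hypothetical holder for the four parameters; name and types are assumptions.
final class BucketCountThresholds {
    private final long minDocCount;
    private final long shardMinDocCount;
    private final int requiredSize;
    private final int shardSize;

    BucketCountThresholds(long minDocCount, long shardMinDocCount,
                          int requiredSize, int shardSize) {
        this.minDocCount = minDocCount;
        this.shardMinDocCount = shardMinDocCount;
        this.requiredSize = requiredSize;
        this.shardSize = shardSize;
    }

    long getMinDocCount() { return minDocCount; }
    long getShardMinDocCount() { return shardMinDocCount; }
    int getRequiredSize() { return requiredSize; }
    int getShardSize() { return shardSize; }
}
```

Classes that previously kept four separate members would then hold and pass around a single instance.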
2014-05-14 14:10:02 +02:00