33 Commits

Author SHA1 Message Date
Boaz Leskes
cbcc4ec616 Logging: add shard id to logging in InternalIndexService.removeShard 2014-10-13 09:33:46 +02:00
Boaz Leskes
3d3c2cd760 add verbose logging during index close 2014-10-09 11:37:52 +02:00
Michael McCandless
637c6d1606 Tests: always run Lucene's CheckIndex when shards are closed in tests and fail the test if corruption is detected
Today we only run 10% of the time, and the test doesn't fail when
corruption is detected.

I think it's better to always run and fail the test, so we can catch
any possible resiliency bugs in Lucene/Elasticsearch causing corruption.

For known tests that create corrupted indices, it's easy to set
MockFSDirectoryService.CHECK_INDEX_ON_CLOSE to false...

Closes #7730
2014-09-25 16:50:48 -04:00
Martijn van Groningen
94eed4ef56 Introduced FixedBitSetFilterCache that guarantees to produce a FixedBitSet and does not evict based on size or time.
Entries in this cache are cleaned up only when their segments are merged away.

Nested and parent/child rely on the fact that type filters produce a FixedBitSet, which the FixedBitSetFilterCache guarantees.
Also, if nested or parent/child is configured, the type filters are eagerly loaded by default via the FixedBitSetFilterCache.

Closes #7037
Closes #7031
2014-08-27 21:28:36 +02:00
Shay Banon
418ce50ec4 Query Cache: Support shard level query response caching
The query cache caches the (binary serialized) response of the shard level query phase execution, using the actual request as the key. The cache is fully coherent with the semantics of NRT: a refresh (one that actually ended up refreshing) causes previously cached entries on the relevant shard to be invalidated and eventually evicted.

This change adds query caching as an opt-in index level setting, `index.cache.query.enable`, which defaults to `false`. The setting can be changed dynamically on an index. The cache is only enabled for search requests with search_type count.

The indices query cache is a node level query cache. The `indices.cache.query.size` setting controls the size (in bytes) the cache may take, and defaults to `1%` of the heap. Note, this cache is already very effective at a small size. There is also the advanced option `indices.cache.query.expire`, which controls how long an entry may go unaccessed before it is evicted.

Note, the request takes the search "body" as is (bytes), and uses it as the key. This means the same JSON with a different key order constitutes a different cache entry.
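
A minimal sketch of why key order matters, assuming the cache key is simply the raw bytes of the request body as described above. The class and method names are hypothetical and are not Elasticsearch APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not Elasticsearch code: a cache key built from raw request bytes.
final class RawBodyKeyDemo {
    // Use the request body bytes as the key, with no JSON normalization.
    static byte[] cacheKey(String requestBody) {
        return requestBody.getBytes(StandardCharsets.UTF_8);
    }

    // Two requests map to the same cache entry only if their bytes are identical.
    static boolean sameEntry(String a, String b) {
        return Arrays.equals(cacheKey(a), cacheKey(b));
    }

    public static void main(String[] args) {
        String first = "{\"query\":{\"match_all\":{}},\"size\":0}";
        String second = "{\"size\":0,\"query\":{\"match_all\":{}}}";
        // Same bytes, same entry.
        System.out.println(sameEntry(first, first));
        // Semantically equal JSON, but different bytes, so a different entry.
        System.out.println(sameEntry(first, second));
    }
}
```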

This change includes basic stats (shard level, index/indices level, and node level) for the query cache, showing how much is used and eviction rates.

While this is a good first step, and the goal is to get it in, there are a few things that would be great additions to this work, but they can be done as additional pull requests:

- More stats, specifically cache hit and cache miss, per shard.
- Request level flag, defaults to "not set" (inheriting what the setting is).
- Allowing to change the cache size using the cluster update settings API
- Consider enabling the cache for the query phase also when hits are requested; note, this will only include the "top docs", not the actual hits.
- See if there is a performant manner to solve the "out of order" of keys in the JSON case.
- Maybe introduce a filter element, that is outside of the request, that is checked, and if it matches all docs in a shard, will not be used as part of the key. This will help with time based indices and moving windows for shards that fall "inside" the window to be more effective caching wise.
- Add more infra level support in the search context that allows any element to mark the search as non deterministic (on top of the support for "now"), and use it to not cache search responses.

closes #7161
2014-08-05 17:45:42 +02:00
Shay Banon
78e39882ee Allow to change concurrent merge scheduling setting dynamically
Allow to change the concurrent merge scheduler settings dynamically using the update settings API
closes #6098
2014-05-12 07:33:31 -07:00
Kevin Wang
ceed22fe00 Add suggest stats
closes #4032
2014-03-28 11:13:54 +01:00
Igor Motov
3ffd0a1dfa Remove deprecated gateways
Closes #5422
2014-03-26 18:10:51 -04:00
Martijn van Groningen
0e780b7e99 Migrated p/c queries from id cache to field data. Changed p/c queries to use paging data structures (BytesRefHash, BigFloatArray, BigIntArray) instead of hppc maps / sets.
Also removed the id cache.

Closes #4930
2014-02-26 19:46:05 +01:00
Simon Willnauer
a1efa1f7aa Remove ElasticsearchInterruptedException and handle interrupt state
correctly.

InterruptedExceptions should be handled by either rethrowing or
restoring the interrupt state (i.e. calling
`Thread.currentThread().interrupt()`). This is important since the
caller of the method or subsequent method calls might also be
interested in this exception. If we ignore the interrupt state the
caller might be left unaware of the exception and block again on
a subsequent method call.
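
A minimal sketch of the "restore the interrupt state" option, assuming a helper that waits on a latch. The class and method names are hypothetical and do not come from the commit itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not Elasticsearch code.
final class InterruptDemo {
    // Waits on the latch; if interrupted, restores the interrupt flag instead of swallowing it.
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Re-set the flag so callers and later blocking calls can still observe the interrupt.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulate an interrupt arriving before the wait starts.
        Thread.currentThread().interrupt();
        boolean done = awaitQuietly(new CountDownLatch(1), 1000);
        System.out.println("latch released: " + done);
        // Thread.interrupted() reads and clears the flag that the helper restored.
        System.out.println("interrupt still visible: " + Thread.interrupted());
    }
}
```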

Closes #4712
2014-01-13 22:17:11 +01:00
Simon Willnauer
10ec2e948a Fix ASL Header in source files to reflect s/ElasticSearch/Elasticsearch
This commit also removes the license to Shay Banon in favor of solely
Elasticsearch. Thanks Shay for this awesome product, you took it far!

Closes #4636
2014-01-07 11:22:01 +01:00
Simon Willnauer
fa16969360 Cleanup comments and class names s/ElasticSearch/Elasticsearch
 * Clean up s/ElasticSearch/Elasticsearch on docs/*
 * Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml
 * Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile

Closes #4634
2014-01-07 11:21:51 +01:00
Luca Cavanna
173a91bb46 Added new IndicesLifecycle.Listener method that allows to listen for any IndexShardState internal change.
Closes #4413
2013-12-16 15:00:15 +01:00
Igor Motov
510397aecd Initial implementation of Snapshot/Restore API
Closes #3826
2013-11-10 18:26:56 -05:00
Boaz Leskes
1d7e20b712 Add indexUUID to mapping-updated and mapping-refresh events and make sure they are applied to an index with same UUID.
This can go wrong if indices with the same name are repeatedly created and deleted.

UUIDs can not be null anymore. If UUID is not available `_na_` will be used as a value.

Also - some minor clean up in ShardStateAction where shard started events could be added twice to the to-be-applied list where the second instance will be ignored.

Closes #3783
2013-09-27 14:42:12 +02:00
Boaz Leskes
1644444a4f Introduced an index UUID which is added to the index's settings upon creation. Used that UUID to verify old and delayed shard started/failed events are not applied to newer indexes with the same name.
Also, exceptions while processing batched events do not stop the rest of the events from being processed.
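
A minimal sketch of the UUID check described above, assuming events carry the UUID of the index they were issued for. The class and method names are hypothetical and are not the actual Elasticsearch implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: drop events that target an older index of the same name.
final class IndexUuidDemo {
    // Index name -> UUID assigned when the current index of that name was created.
    private final Map<String, String> currentUuidByIndex = new HashMap<>();

    void createIndex(String name, String uuid) {
        currentUuidByIndex.put(name, uuid);
    }

    void deleteIndex(String name) {
        currentUuidByIndex.remove(name);
    }

    // An event applies only if the index exists and its UUID matches the one on the event.
    boolean shouldApply(String indexName, String eventUuid) {
        String current = currentUuidByIndex.get(indexName);
        return current != null && current.equals(eventUuid);
    }

    public static void main(String[] args) {
        IndexUuidDemo state = new IndexUuidDemo();
        state.createIndex("logs", "uuid-1");
        state.deleteIndex("logs");
        state.createIndex("logs", "uuid-2");
        // A delayed event from the first "logs" index is ignored; one for the new index applies.
        System.out.println(state.shouldApply("logs", "uuid-1"));
        System.out.println(state.shouldApply("logs", "uuid-2"));
    }
}
```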

Closes #3778
2013-09-25 19:26:05 +02:00
Simon Willnauer
fddb7420ae Add support for Lucene's MockDirectoryWrapper
MockDirectoryWrapper adds asserting logic to the low level directory
implementation that helps to track and catch resource leaks like
unclosed index inputs caused by dangling IndexReader or IndexSearcher
instances. It prevents double writes to files and allows low level
random exceptions to be thrown for testing index consistency etc.

Closes #3654
2013-09-11 22:47:34 +02:00
Boaz Leskes
a96ecea653 Multi term vector request
--------------------------

This feature allows retrieving [term vectors](https://github.com/elasticsearch/elasticsearch/issues/3114) for a list of documents. The JSON request has exactly the same [format](https://github.com/elasticsearch/elasticsearch/issues/3484) as the ```_termvectors``` endpoint.

To use it, call:

```
curl -XGET 'http://localhost:9200/index/type/_mtermvectors' -d '{
    "fields": [
        "field1",
        "field2",
        ...
    ],
    "ids": [
        "docId1",
        "docId2",
        ...
    ],
    "offsets": false|true,
    "payloads": false|true,
    "positions": false|true,
    "term_statistics": false|true,
    "field_statistics": false|true
}'

```

The return format is an array, each entry of which contains the term vector response for one document:

```
{
   "docs": [
      {
         "_index": "index",
         "_type": "type",
         "_id": "docId1",
         "_version": 1,
         "exists": true,
         "term_vectors": {
            ...
         }
      },
      {
         "_index": "index",
         "_type": "type",
         "_id": "docId2",
         "_version": 1,
         "exists": true,
         "term_vectors": {
            ...
         }
      }
   ]
}
```

Note that, like term vectors, the multi term vectors request will silently skip over documents that have no term vectors stored in the index and will simply return an empty response in this case.

Closes #3536
2013-08-26 09:25:21 +02:00
Simon Willnauer
7e1d8a6ca3 Raise default DeleteIndex Timeout
Currently the timeout for a delete index operation is set to 10 seconds.
Yet, if a full flush is running while we delete an index this can
easily exceed 10 seconds. The timeout is not dramatic, i.e. the index
will be deleted eventually, but the client request is not acked, which
can cause confusion. We should raise it to prevent unnecessary confusion,
especially in client tests where this can happen if the machine is pretty busy.

The new timeout is set to 60 seconds.

Closes #3498
2013-08-13 17:28:19 +02:00
Simon Willnauer
82d3693a91 Catch Throwable rather than Exception if latches are present. 2013-08-12 17:46:44 +02:00
Martijn van Groningen
c222ce28fc Redesigned the percolator engine to execute in a distributed manner.
With this design the percolate queries will be stored in a special `_percolator` type with its own mapping, either in the same index where the actual data is or in a different index (a dedicated percolation index, which might require different sharding behavior compared to the index that holds the actual data and is being searched). This approach allows percolate requests to scale to the number of primary shards an index has been configured with and effectively distributes the percolate execution.

This commit doesn't add new percolate features other than scaling. The response remains similar, with the exception that a header similar to the search api has been added to the percolate response.

Closes #3173
2013-07-18 16:52:42 +02:00
Shay Banon
09a6907cca optimize applyDeletes event
- reuse set
- don't copy over again the shard ids immutable set
2013-07-05 17:30:09 -07:00
Shay Banon
15d7ae5983 FieldData Stats: Add field data stats to indices stats API
closes #2870
2013-04-07 18:30:24 -07:00
Shay Banon
84670212a6 Filter / Id Cache Stats: Add to Indices Stats API, revise node stats API
closes #2862
2013-04-05 20:02:32 +02:00
Shay Banon
3e264f6b95 cleanup deletion of content in shards
we are very conservative on when we delete data, remove the actual options of deleting data that are not used
2013-03-04 20:41:19 -08:00
Shay Banon
cfd8bddde4 Remove JMX connector creation flags, and JMX attributes
closes #2728
2013-03-04 16:12:18 -08:00
Shay Banon
a39ca58de9 add field data service to index level services 2013-01-22 16:16:30 +01:00
Shay Banon
90371beedc Store Throttling (node level and/or index level) with options on merge or all, closes #2041.
Allow to configure store throttling (only applied on file system based storage), which allows to control the maximum bytes per sec written to the file system. It can be configured to only apply while merging, or on all output operations. The setting can be set on the node level (in which case the throttling is done across all shards allocated on the node), or on the index level, in which case it is only applied to that index.

The node level settings are indices.store.throttle.type to set the type, with values of none, merge and all (defaults to none). And, also, indices.store.throttle.max_bytes_per_sec (defaults to 0), which can be set to something like 1mb.

The index level setting is index.store.throttle.type for the type, with values of node, none, merge, and all. It defaults to node, which uses the "shared" throttling on the node level. There is also index.store.throttle.max_bytes_per_sec (defaults to 0).
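
A minimal sketch of how the two `type` settings could combine, assuming only the defaults stated above. The class and method are hypothetical; the sketch resolves which type applies to an index and does not model the shared rate limiter itself.

```java
// Hypothetical sketch, not Elasticsearch code: resolve which throttle type applies to an index.
final class ThrottleTypeDemo {
    // nodeType: value of indices.store.throttle.type, or null if unset (defaults to "none").
    // indexType: value of index.store.throttle.type, or null if unset (defaults to "node").
    static String effectiveType(String nodeType, String indexType) {
        String node = (nodeType == null) ? "none" : nodeType;
        String index = (indexType == null) ? "node" : indexType;
        // "node" at the index level means: defer to the node level setting.
        return "node".equals(index) ? node : index;
    }

    public static void main(String[] args) {
        // Index setting unset: falls back to the node level type.
        System.out.println(effectiveType("merge", null));
        // Explicit index level type wins over the node level type.
        System.out.println(effectiveType("merge", "all"));
        // Nothing configured anywhere: no throttling.
        System.out.println(effectiveType(null, null));
    }
}
```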
2012-06-21 21:21:49 +02:00
Shay Banon
783649adc7 not only noclass, throwable will do well here.... 2012-04-22 12:25:10 +03:00
Shay Banon
c08b968246 rename the cached thread pool to generic (from cached), since really, cached is meaningless, and it's actually a generic thread pool we use for different operations 2012-03-09 20:32:33 +02:00
Shay Banon
44a6040293 Jmx: Only register JMX beans when jmx.create_connector is set to true, or explicitly set by setting jmx.export to true, closes #1666. 2012-02-05 18:52:56 +02:00
Shay Banon
6a71eab51f finalize structure, tests pass 2011-12-06 02:43:17 +02:00
Shay Banon
a8fd2d48b8 first cleanup phase, move to single src 2011-12-06 00:59:23 +02:00