Common changes:
Lastname, Firstname -> Firstname Lastname
Name I -> Name
Title-style capitalization
Removed L-to-R characters ( <e200> )
Removed duplicates
Alphabetical order
Named wildcards were not always properly replaced with proper values by PathTrie.
Delete index (curl -XDELETE localhost:9200/*) worked anyway as the named wildcard is the last path element (and even if {index} didn't get replaced with '*', the empty string would have mapped to all indices anyway). When the named wildcard wasn't the last path element (e.g. curl -XPOST localhost:29200/*/_close), the variable didn't get replaced with the current '*' value, but with the empty string, which leads to an error as empty index is not allowed by open/close index.
Closes#4564
The new internal get index settings api is more efficient when it comes to sending the index settings from the master to the client via the
Also the get index settings support now all the indices options.
Closes#4620
The FVH was throwing away some boosts on queries stopping a number of
ways to boost phrase matches to the top of the list of fragments from
working.
The plain highlighter also doesn't work for this but that is because it
doesn't support the concept of the same term having a different score at
different positions.
Also update documentation claiming that FHV is nicer for weighing terms
found by query combinations.
Closes#4351
Important: This breaks backwards compatibility with 0.90
* Removed endpoints: /_cluster/nodes, /_cluster/nodes/nodeId1,nodeId2
* Disallow usage of parameters, but make required metrics part of URI
* Changed NodesInfoRequest to return everything by default
* Fixed NPE in NodesInfoResponse
Closes#4055
Instead of specifying what kind of data should be filtered, this commit
streamlines the API to actually specify, what kind of data should be displayed.
This makes its behaviour similar to the other requests, like NodeIndicesStats.
A small feature has been added as well: If you specify an index to select on, not
only the metadata, but also the routing tables are filtered by index in order
to prevent too big cluster states to be returned.
Also the CAT apis have been changed to only return the wanted data in order to keep
network traffic as small as needed.
Tests for the cluster state API filtering have been added as well.
Note: This change breaks backwards compatibility with 0.90!
Closes#4065
Added a long-based representation of GeoHashes to GeoHashUtils for fast evaluation in aggregations.
The new BucketUtils provides a common heuristic for determining the number of results to obtain from each shard in "top N" type requests.
* Clean up s/ElasticSearch/Elasticsearch on docs/*
* Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml
* Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile
Closes#4634
Fixes#4451
Date fields without date (HH:mm:ss, for example) are parsed as time on Jan 1, 1970 UTC. However, before this change partial dates without year (MMM dd HH:mm:ss, for example) were parsed as as days of they year 2000. This change makes all partial dates to be treated based on year 1970. This is breaking change - before this change "Dec 15, 10:00:00" in most cases was parsed (and indexed) as "2000-12-15T10:00:00Z". After this change, it will be consistently parsed and indexed as "1970-12-15T10:00:00Z"
Refactor cache recycling so that it only caches large arrays (pages) that can
later be used to build more complex data-structures such as hash tables.
- QueueRecycler now takes a limit like other non-trivial recyclers.
- New PageCacheRecycler (inspired of CacheRecycler) has the ability to cache
byte[], int[], long[], double[] or Object[] arrays using a fixed amount of
memory (either globally or per-thread depending on the Recycler impl, eg.
queue is global while thread_local is per-thread).
- Paged arrays in o.e.common.util can now optionally take a PageCacheRecycler
to reuse existing pages.
- All aggregators' data-structures now use PageCacheRecycler:
- for all arrays (counts, mins, maxes, ...)
- LongHash can now take a PageCacheRecycler
- there is a new BytesRefHash (inspired from Lucene but quite different,
still; for instance it cheats on BytesRef comparisons by using Unsafe)
that also takes a PageCacheRecycler
Close#4557
Adding a small value to the threshold prevents weight deltas that are
very very close to the threshold to not trigger relocations. These
deltas can be rounding errors that lead to unnecessary relocations. In
practice this might only happen under very rare circumstances.
In general it's a good idea for the shard allocator to be a bit
more conversavtive in terms of rebalancing since in general relocation
costs are pretty high.
Closes#4630
Norms can be eagerly loaded on a per-field basis by setting norms.loading to
`eager` instead of the default `lazy`:
```
"my_string_field" : {
"type": "string",
"norms": {
"loading": "eager"
}
}
```
In case this behavior should be applied to all fields, it is possible to change
the default value by setting `index.norms.loading` to `eager`.
Close#4079
First, this breaks backwards compatibility!
* Removed /_cluster/nodes/stats endpoint
* Excpect the stats types not as parameters, but as part of the URL
* Returning all indices stats by default, returning all nodes stats by default
* Supporting groups & types in nodes stats now as well
* Updated documentation & tests accordingly
* Allow level parameter for "shards" and "indices" (cluster does not make sense here)
Closes#4057
Note: This breaks backward compatibility
* Removed clear/all parameters, now all stats are returned by default
* Made the metrics part of the URL
* Removed a lot of handlers
* Added shards/indices/cluster level paremeter to change response serialization
* Returning translog statistics in IndicesStats
* Added TranslogStats class
* Added IndexShard.translogStats() method to get the stats from concrete implementation
* Updated documentation
Closes#4054
The reason to not start packages on installation is to allow to configure
them before starting up (setting heap, cluster.name etc)
Also the documentation was updated in order to show, which statements need
to be executed.
In addition, these statements are also printed out when the package is
installed, depending on whether chkconfig, system or update-rc.d is used.
Closes#3722
When testing plugin manager with real downloads, it could happen that the test run forever. Fortunately, test suite will be interrupted after 20 minutes, but it could be useful not to fail the whole test suite but only warn in that case.
By default, plugin manager still wait indefinitely but it can be modified using new `--timeout` option:
```sh
bin/plugin --install elasticsearch/kibana --timeout 30s
bin/plugin --install elasticsearch/kibana --timeout 1h
```
Closes#4603.
Closes#4600.
Today during restart scenarios it is possible that we recover from a node that
has already been upgraded to version N+1. The node that we relocate to is
on version N and might not be able to read the index format from the node
we relocate from. This causes `IndexFormatToNewException` during
recovery but only after recovery has finished which can cause large
load spikes during the upgrade period.
Closes#4588
The node we need to lookup for attribute colelction might not be part
of the `DiscoveryNodes` anymore due to node failure or shutdown. This
commit adds a check and removes the shard from the iteration.
Closes#4589
This adds the field data circuit breaker, which is used to estimate
the amount of memory required to load field data before loading it. It
then raises a CircuitBreakingException if the limit is exceeded.
It is configured with two parameters:
`indices.fielddata.cache.breaker.limit` - the maximum number of bytes
of field data to be loaded before circuit breaking. Defaults to
`indices.fielddata.cache.size` if set, unbounded otherwise.
`indices.fielddata.cache.breaker.overhead` - a contast for all field
data estimations to be multiplied with before aggregation. Defaults to
1.03.
Both settings can be configured dynamically using the cluster update
settings API.
Today we try to detect if we need to generate the mapping or not in
the all mapper. This is error prone since it misses conditions if not
explicitly added. We should rather similate the generation instead.
This commit also adds a random test to check if the settings
of the all field mapper are correctly applied.
Closes#4579Closes#4581
today if a specific feature is disabled for term vectors with something
like 'store_term_vector_positions = false' term vectors might be disabeled
alltogether even if 'store_term_vectors=true' in the mapping. This depends on the
order of the values in the mapping since the more specific one might override
the less specific on.
Closes#4582
Currently there are two get aliases apis that both have the same functionality, but have a different response structure. The reason for having 2 apis is historic.
The GET _alias api was added in 0.90.x and is more efficient since it only sends the needed alias data from the cluster state between the master node and the node that received the request. In the GET _aliases api the complete cluster state is send to the node that received the request and then the right information is filtered out and send back to the client.
The GET _aliases api should be removed in favour for the alias api
Closes to #4539
* `ignore_unavailable` - Controls whether to ignore if any specified indices are unavailable, this includes indices that don't exist or closed indices. Either `true` or `false` can be specified.
* `allow_no_indices` - Controls whether to fail if a wildcard indices expressions results into no concrete indices. Either `true` or `false` can be specified. For example if the wildcard expression `foo*` is specified and no indices are available that start with `foo` then depending on this setting the request will fail. This setting is also applicable when `_all`, `*` or no index has been specified.
* `expand_wildcards` - Controls to what kind of concrete indices wildcard indices expression expand to. If `open` is specified then the wildcard expression if expanded to only open indices and if `closed` is specified then the wildcard expression if expanded only to closed indices. Also both values (`open,closed`) can be specified to expand to all indices.
Closes to #4436
Results in less data being sent over the wire, as the Cat API does not
need to have the whole cluster state.
Also added matchers for hasKey() for immutable open map (I think we should
add more of those to have map style assertions).
Closes#4455
This change make single shard requests fail when no routing is specified and routing has been configured to be required in the mapping. Thi
Closes#4506
The following APIs now accept the query in a top level `query` field like:
* delete_by_query
* validate_query
* count
These APIs used to accept the query directly in the request body which was inconsistent with the search and explain APIs. For this reason t
Closes#4074
Test can be run with `-Dtests.assertion.disabled=org.elasticsearch`
to run the tests without assertions to make sure assertions
don't hide any assignements etc. that introduce bugs in production.
We are mocking out some functionality to add assertions etc. or
randomize store types. We should randomly run with our defaults to make
sure we don't hide any potential problems.
Currently we only test if readers are correctly released when exceptions
occur during reopen or flush. This commit adds a test that
randomly throws exceptions during the search execution ie. when Terms
are pulled or if a docs enum is created.
This commits add doc values support to geo point using the exact same approach
as for numeric data: geo points for a given document are stored uncompressed
and sequentially in a single binary doc values field.
Close#4207
Until now, RangeAggregator was a PER_BUCKET aggregator, expecting to be always
collected with owningBUcketOrdinal == 0. However, since the number of buckets
it creates is known in advance, it can be changed to a MULTI_BUCKETS aggregator
by just multiplying the bucket ordinal by the number of ranges.
This makes aggregations that have ranges as sub aggregations of PER_BUCKET
aggregators more efficient.
Close#4550
Although SORTED_SET doc values make things like terms aggregations very fast
thanks to the use of ordinals, ordinals are usually not that useful on numeric
data. We are more interested in the values themselves in order to be able to
compute sums, averages, etc. on these values. However, SORTED_SET is quite slow
at accessing values, so BINARY doc values are better suited at storing numeric
data.
floats and doubles are encoded without compression with little-endian byte order
(so that it may be optimizable through sun.misc.Unsafe in the future given that
most computers nowadays use the little-endian byte order) and byte, short, int,
and long are encoded using vLong encoding: they first encode the minimum value
using zig-zag encoding (so that negative values become positive) and then deltas
between successive values.
Close#3993
This prevents missing very short timeouts which fire before the calling thread had the chance to add the task to the queue and are therefore ignored. This is mostly of importance for testing where we explicitly want tasks to timeout and set it to a very low value.
- also, in geo distance, because it's based on range agg and that by default the order of the geo point per doc is unknown, always wrap it in a dedicated field data source which sorts the values if needed. But most of the times, a doc will be associated with a single point and therefore most of this wrapping is redundant and adds perf. cost for nothing.
- the idea here is for every request that "hits" a field data agg, we'll first iterate over the searchable segments and load their field data and compute the cross-segment info out of them. This info will be placed in the field context with which the value sources are created.
- we currently have some of this info on the IndexFieldData, but problem with getting it from there is that we may easily end up getting wrong info that originate in unsearchable segments.
The `geo_shape filter and query` option in geo_shape filter and query has been replaced with the `path` option, which allows these filter and query to fetch shapes from within objects as well.
Closes#4486
The fixes introduced in #4235 and #4411 do not take into account, that a
template JSON in the config/ directory includes a template name, as opposed
when calling the Put Template API.
This PR allows to put both formats (either specifying a template name or not)
into files. However you template name/id may not be one of the template
element names like "template", "settings", "order" or "mapping".
Closes#4511
When a search on a shard to a remove node fails, and then replica exists on the local node, then the execution of the search is done on the network thread. This is problematic since we need to execute it on the actual search thread pool, but can also explain #4519, where the get happens on the network thread and it waits to send the get request till the network thread we use is freed (deadlock...)
fixes#4526
note, re-enable the geo shape fetch test, this fix should solve it as well
Allow to have a new index level setting index.codec.bloom.load (default to true), that can control if the boom filters will be loaded or not. This is an updateable setting, that can be updated on a live index using the update settings API.
Note though, when this setting is updated, a fresh Lucene index will be reopened, causing associate caches to be dropped potentially.
closes#4525
Note, this change also disables the returning lucene ram usage stats, due to a bug in Lucene, relates to #4512
The shards in the set are mutated after they are added to the
set such that the hashcode doesn't fit anymore. For this reason
this used an identity hashset before but the downside of this is
that the iteration order is not deterministic. We can just use a list
since shard removal is a very rare action and the size of the list is
very small such that iteration is fast.
The memory used for the Lucene index (term dict, bloom filter, ...) can now be reported per segment using the segments API, and on the segments flag on node/indices stats
closes#4512
a previous change introduces an identity hashset that has non-deterministic
iteration order which kill the reproducibility of our unittests if they fail.
This patch adds back deterministic allocations.
Make sure to evict an existing node with the same transport address as a new node that joins. This can happen for example when there is a bug in a cluster state event handler, which causes the "old" node to not be evicted, or a load on the master node that will take time for the "old" node leaving to be processed.
closes#4503
Currently we trying to find a replica for a primary that is allocated by
running through all shards in the cluster while RoutingNodes already has
a datastructure keyed by shard ID for this. We should lookup this
directly rather than using linear probing. This improves shard allocation performance
by 5x.
This is an extreme case, exposed by a bug we had in our allocation in local gateway, causing a cluster state that doesn't include a node in the nodes list, but still has the shard in the routing table pointing at the non existent node. Then, when a node on the same box comes back, it will cause the local shard data to be deleted because it thinks its fully allocated on other nodes.
fixes#4502
The REST layer can now be tested through tests that are shared between all the elasticsearch official clients.
The tests are based on REST specification that can be found on the elasticsearch-rest-api-spec project and consist of YAML files that describe the operations to be executed and the obtained results that need to be tested.
REST tests can be executed through the ElasticsearchRestTests class, which relies on the rest-spec git submodule that contains the rest spec and tests pulled from the elasticsearch-rest-spec-api project. The rest-spec submodule gets automatically initialized and updated through maven (generate-test-resources phase).
The REST runner and the needed classes are distributed within the test artifact.
The following are the options supported by the REST tests runner:
- tests.rest[true|false|host:port]: determines whether the REST tests need to be run and if so whether to rely on an external cluster (providing host and port) or fire a test cluster (default)
- tests.rest.suite: comma separated paths of the test suites to be run (by default loaded from /rest-spec/test classpath). it is possible to run only a subset of the tests providing a sub-folder or even a single yaml file (the default /rest-spec/test prefix is optional when files are loaded from classpath) e.g. -Dtests.rest.suite=index,get,create/10_with_id
- tests.rest.spec: REST spec path (default /rest-spec/api from classpath)
- tests.iters: runs multiple iterations
- tests.seed: seed to base the random behaviours on
- tests.appendseed[true|false]: enables adding the seed to each test section's description (default false)
- tests.cluster_seed: seed used to create the test cluster (if enabled)
Closes#4469
We support three different settings in templates
* "settings" : { "index" : { "number_of_shards" : 12 } }
* "settings" : { "index.number_of_shards" : 12 }
* "settings" : { "number_of_shards" : 12 }
The latter one was not supported by the fix in #4235
This commit fixes this issue and uses randomized testing to test any of the three cases above when running integration tests.
Closes#4411
When we allocate unassigned shards we can terminate early for some
shards like if we already tried to allocate a replica we don't need
to try the same replica if the first one got rejected. We also
can check if certain nodes can't allocate any primaries or shrads
at all and take those nodes out of the picture for the current round
since it will not change in the current round.
This commit allows to trade precision for memory when storing geo points.
This new field data impl accepts a `precision` parameter that controls the
maximum expected error for storing coordinates. This option can be updated on
a live index with the PUT mapping API.
Default precision is 1cm, which requires 8 bytes per geo-point (50% memory
saving compared to using 2 doubles).
Close#4386
Instead of using the '-f' parameter to start elasticsearch in the
foreground, this is now the default modus.
In order to start elasticsearch in the background, the '-d' parameter
can be used.
Closes#4392
The `text` query was replaced by the `match` query and has been
deprecated for quite a while.
The `field` query should be replaced by a `query_string` query with
the `default_field` specified.
Fixes#4033
If the term suggester is used the results are merged depending on
the number of terms produced by the tokenizer / tokenfilter. If a
term suggester is executed across multiple indices that share the
same field but with different analysis chains we can't merge the
result anymore sicne tokens are our of order or have a different size.
This commit throws ESIllegalArgumentException if the number of entries
are not the same across all results.
Closes#3196
This commit changes field data configuration updates so that they are
immediately taken into account for loading new segments. The way it works
is that field data configuration is now cached separately from the field
data cache, meaning that it is now possible to clear the field data
configuration from IndexFieldDataService while the cache will stay around. On
the next time that Elasticsearch will reload field data configuration, it will
check if there is already a cache entry, and reuse it if it exists.
To disable field data loading, all that is required is to change the field
data format to "none" (supported by all field data types) using the update
mapping API. Elasticsearch will then refuse to load field data on any new
segment, but field data which has been loaded on the previous segments will
remain available. So you need to clear the field data cache in order to
reclaim memory (otherwise memory will be reclaimed slower, as segments get
merged).
Close#4430Close#4431
Currently we miss to reset the source shards status to ACTIVE if we cancel
a relocation. If the shard is RELOCATING we need to reset to state ACTIVE.
Closes#4457
Currently the RoutingNodes API allows modification of it's internal state outside of the class.
This commit improves the APIs of `RoutingNode` and `RoutingNode` to change internal state
only within the classes itself.
Closes#4458