When resolving settings values that turn out empty, the setting should be removed entirely. For example, when using `${env.ENV_VAR}` and `ENV_VAR` is not set, the setting should be removed.
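A minimal sketch of the behaviour in `elasticsearch.yml` (the `node.rack` setting and `RACK_ID` variable are illustrative, not part of this change):

```
# resolved from the environment at startup
node.rack: ${env.RACK_ID}
```

If `RACK_ID` is not set in the environment, the `node.rack` setting is now dropped entirely instead of being resolved to an empty value.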
This commit allows setting custom headers in HTTP responses (like
setting the WWW-Authenticate header for basic auth) by adding
RestRequest.addHeader() method.
Closes#2936, Closes#2540
To get the history right: This is based on PR #2723
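With this in place, a basic auth plugin could, for instance, attach a challenge header to an unauthorized response. A hypothetical exchange (status line and realm are illustrative):

```
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="elasticsearch"
```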
Update requests can now be put in the bulk API. All update request options are supported.
Example usage:
```
curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json
```
Contents of bulk.json, containing two update request items:
```
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : "counter += param1", "lang" : "js", "params" : {"param1" : 1}, "upsert" : {"counter" : 1}}
```
The `doc`, `upsert` and all script-related options are part of the payload. The `retry_on_conflict` option is part of the header.
Closes#2982
Before this change, the GetField#getValue() method returned a list of values for a multi-valued field if the field values were obtained from source, or if the field was stored and real-time get was used. If the field was stored but non-realtime get was used, GetField#getValue() returned only the first element and GetField#getValues() returned a list of elements. This change makes the behavior consistent: GetField#getValue() now always returns only the first value of the field and GetField#getValues() returns the entire list.
Typically, the main reason to enable allow_primary on a reroute allocation command is to force-create an empty new shard because a shard (and its replicas) was lost. This can't be done today because the shard expects to have a valid index where it is allocated; we need to clear its post-allocation flag to make sure it is allowed to create a fresh index.
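A sketch of such a reroute command (index, shard and node names are placeholders):

```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [ {
        "allocate" : {
            "index" : "index1",
            "shard" : 0,
            "node" : "node1",
            "allow_primary" : true
        }
    } ]
}'
```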
Lucene provides a set of statistics that depend on the codec / postings format
as well as on the index options used when the field is created / indexed.
If a certain stats value is not available, Lucene returns `-1` instead of the
correct value. We need to ensure that those values are encoded correctly when
we write them as vLongs as well as when we aggregate them.
Closes#3012
NGramTokenizer and NGramTokenFilter are broken with a version < 4.2.
We should still support these filters but should prevent the
StringIndexOutOfBoundsExceptions. Adding these filters to the
FragmentBuilderHelper will allow seamless highlighting on fields indexed
with those tokenizers or token filters.
Lucene's span queries are a different family than 'ordinary' queries
in Lucene. Spans only work with other spans, so smart query
wrapping doesn't work with span queries at all, i.e. we can't wrap them
in a filtered query.
Closes#2994
Today an analysis chain with broken tokenfilters or tokenizers like
WordDelimiterFilter might produce somewhat broken term vectors that cause
`StringIndexOutOfBoundsExceptions` if FastVectorHighlighter is used
since the positions / offsets contract is violated and offsets of highlight
tokens are not increasing but decreasing even if their positions are increasing.
Yet, if we detect such a situation we can re-sort the tokens, which might cause
somewhat odd highlights but doesn't fail hard with a StringIndexOOBException.
Closes#3006
All index metadata APIs have urgent priority when it comes to cluster state updates. We'd like to remove indices as soon as possible to avoid things like unnecessary shard relocations.
This Lucene release introduced a new API on DocIdSetIterator that requires each
implementation to return a `cost` upper bound as a function of the iterated documents.
This API allows for several optimizations during query execution, especially in
conjunction and disjunction queries with min_should_match set.
Closes#2990
Allow getting the source directly using a dedicated REST endpoint without any additional content around it; the endpoint is `{index}/{type}/{id}/_source`.
Note, HEAD now also supports the _source endpoint.
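Example usage (index, type and id are placeholders):

```
curl -XGET 'localhost:9200/index1/type1/1/_source'
curl -I 'localhost:9200/index1/type1/1/_source'
```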
closes#2993, closes#2995
BytesRefOrdValComparator starts at 1 and ends at maxOrd - 1, yet numOrd is defined
as maxOrd - 1, excluding the 0 ord.
This causes wrong sort ords when the bottom of the queue is compared to the next
segment and the greatest term in the new segment is in fact less than the current
queue bottom. If that is true, we treat the values as equal and never include the right
value into the queue.
Closes#2991
This adds support for Lucene span multi term queries. These Lucene queries
allow users to form complicated queries, such as wildcard or prefix
queries, embedded within span queries.
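A sketch of what such a query can look like in the query DSL (assuming the `span_multi` query name; the `user` field is a placeholder):

```json
{
    "query" : {
        "span_multi" : {
            "match" : {
                "prefix" : { "user" : "ki" }
            }
        }
    }
}
```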
Extended WeightFunction#weight to allow dedicated weight calculations per operation. In certain
circumstances it is more efficient / required to ignore certain factors in the weight
calculation to prevent, for instance, relocations that are solely triggered by tie-breakers.
In particular, the primary balance property should not be taken into account if the delta for
early termination is calculated, since otherwise a relocation could be triggered solely by the
fact that two nodes have different numbers of primaries allocated to them.
Closes#2984
A first implementation of adding matchers and helper methods to elasticsearch.
The following ones are supported:

```
assertHitCount(searchResponse, 2);
// helper methods to easily access the first hits
assertFirstHit(searchResponse, hasId("foo"));
assertSecondHit(searchResponse, hasType("foo"));
assertThirdHit(searchResponse, hasIndex("foo"));
// methods to access all other hits
assertSearchHit(searchResponse, 5, hasId("10"));
// same as above, but maybe more readable
assertSearchHit(searchResponse.getHits().getAt(5), hasIndex("foo"));
```
I changed GeoFilterTests to show how it works.
Furthermore I inlined assertHighlight() from HighlighterSearchTests.
The ElasticsearchAssertions class can now be used as a centralized assertion class,
giving every developer a single place to look at.
Custom settings are not always present in the `Settings` that are passed
to `NodeSettingsService.Listener#onRefreshSettings`, so using the defaults
would override custom settings that were set before.
Closes#2973
Fixed testX and testSingleNodeNoFlush by specifying the mapping on index creation instead of using dynamic mapping. Dynamic mapping is updated on the cluster level asynchronously, and if mapping changes are not applied to the cluster state before the node is closed, these changes are not available after the node restarts. While data added in the test is preserved, due to the absence of the mapping the test still fails. This is a known issue that we are not planning to fix at the moment.
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
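As a reminder, includes/excludes are defined on the `_source` mapper; a realtime get now applies the same filtering as a non-realtime get. Example mapping (type name and paths are placeholders):

```json
{
    "type1" : {
        "_source" : {
            "includes" : ["path1.*"],
            "excludes" : ["path2.*"]
        }
    }
}
```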
The filter method of XContentMapValues filtered out nested
arrays/lists completely, due to a bug in the filter method that threw
away all data inside such an array.
Closes#2944
This bug was a follow-up problem caused by the filtering of nested arrays
when source exclusion was configured.
Analyzing a numeric field returns UTF-16 representations of
Lucene's numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the Lucene index. Passing a numeric field
to the analysis action is most likely a bug.
Closes#2953, Closes#2952
We use the latest Lucene version as specified in o.e.common.lucene.Lucene,
which must be upgraded with each Lucene release.
This commit adds an assert that fails once the actual Lucene version
that is used is higher than the current release's version.
Lucene ships with a version constant that is mainly used to provide consistent behaviour across Lucene release versions. Lucene's analysis capabilities are commonly applied at index and search time, such that the search-time behaviour should be identical to the index-time behaviour in most cases. Currently Elasticsearch always uses the latest version from Lucene, which can break backwards compatibility with the index for users that rely on behaviour that changed in newer Lucene versions.
Indices should always use the version they were created with unless a different version is explicitly configured.
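For example, assuming the `version` parameter on analyzers, an analyzer can be pinned to a specific Lucene version explicitly (analyzer name is a placeholder):

```json
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "standard",
                    "version" : "4.1"
                }
            }
        }
    }
}
```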
closes#2945
NGramTokenizer and NGramTokenFilter don't produce broken positions anymore, which prevents certain highlighter bugs that fail with
StringIndexOutOfBoundsExceptions as in #2931.
This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight entire terms as produced by the tokenizer instead of the individual
sub-grams. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.
Closes#2931
- rename `open` to `open_contexts`; we might have other open stats in the future related to search (Lucene index searchers?)
- add a test to verify it works
The only difference to the Lucene version is that `discreteMultiValueHighlighting` defaults to `true`. Yet
we set this anyway in the HighlightingPhase, such that the classes are obsolete.
Also fixed a bug that produced negative string offsets if a single highlight phrase or term was greater than the fragCharSize.
The fixed BaseFragListBuilder was added as XSimpleFragListBuilder, which triggers an assert once Elasticsearch
upgrades to Lucene 4.3.
In order to handle exceptions correctly when classes are not found, one
needs to handle ClassNotFoundException as well as NoClassDefFoundError
to be sure every possible case is caught. We did not yet cater
for the latter in ImmutableSettings.
This fix simply executes the same logic for both exceptions instead of
bubbling up NoClassDefFoundError.
When specifying minimum_should_match in a multi_match query it was being applied
to the outer bool query instead of to each of the inner field-specific bool queries.
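With this fix, a query like the following (field names are placeholders) applies `minimum_should_match` to each field-specific query:

```json
{
    "query" : {
        "multi_match" : {
            "query" : "quick brown fox",
            "fields" : ["title", "body"],
            "minimum_should_match" : 2
        }
    }
}
```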
Closes#2918
If cluster settings are updated, the REST API returns the accepted values. For
example, updating the `cluster.routing.allocation.disable_allocation` value via
cluster settings:

```
curl -XPUT http://localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}'
```

will respond with:

```
{
    "persistent" : {},
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : "true"
    }
}
```
Closes#2907
FieldData is an in-memory representation of the term dictionary in an uninverted form. Under certain circumstances this FieldData representation can grow very large on high-cardinality fields like tokenized full-text. Depending on the use-case, filtering the terms that are held in the FieldData representation can heavily improve execution performance and application stability.
FieldData Filters can be applied on a per-segment basis. During FieldData loading the terms enumeration is passed through a filter predicate that either accepts or rejects a term.
## Frequency Filter
The frequency filter acts as a high / low pass filter based on the document frequencies of a certain term within the segment that is loaded into field data. It allows rejecting terms that have a very high or very low frequency, based on absolute frequencies or percentages relative to the number of documents in the segment, or, more precisely, the number of documents that have at least one value in the field that is loaded for the current segment.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.frequency.min=0.001;filter.frequency.max=0.1",
                "index" : "analyzed"
            }
        }
    }
}
```
### Parameters
* `filter.frequency.min` - the minimum document frequency (inclusive) required for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.max` - the maximum document frequency (inclusive) allowed for a term to be loaded into memory. Either a percentage if < `1.0` or an absolute value. `0` if omitted.
* `filter.frequency.min_segment_size` - the minimum number of documents in a segment in order for the filter to be applied. Segments with fewer documents are not filtered.
## Regular Expression Filter
The regular expression filter applies a regular expression to each term during loading and only loads terms into memory that match the given regular expression.
Here is an example mapping:
```json
{
    "tweet" : {
        "properties" : {
            "locale" : {
                "type" : "string",
                "fielddata" : "format=paged_bytes;filter.regex=^en_.*",
                "index" : "analyzed"
            }
        }
    }
}
```
Closes#2874
Note: This has been disabled by default and is therefore not included in a
standard build. The main reason for this is that you need to have an RPM
binary and the RPM development packages installed, which is not the case
on many systems.
The package contains an init.d script as well as systemd configurations.
You can build your own RPM package simply by running `mvn rpm:rpm`.
* In the case where only should clauses were specified with specific types of filters, the first clause determined which documents matched (see the sketch below).
* In some cases the 'at least one should clause must match' behaviour was broken.
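For example, a `bool` filter of this shape (field names and values are placeholders) was affected; previously only documents matching the first should clause were returned, now a document matching any should clause matches:

```json
{
    "filter" : {
        "bool" : {
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ]
        }
    }
}
```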
Have an explicit warmer threadpool that is dedicated to executing warmers. Currently, warming uses the search threadpool, which does not work well since the number of concurrent searches should be separate from the number of concurrent warmers allowed; also, the characteristics of the search pool (for example, a bounded queue_size) might not fit well with how warmers should be executed (they should not be "rejected"). An example configuration follows.
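A sketch of tuning the new pool in `elasticsearch.yml` (the exact setting names follow the usual `threadpool.*` convention and are an assumption here):

```
# dedicated pool for executing index warmers
threadpool.warmer.type: scaling
threadpool.warmer.size: 5
```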
closes#2815
The `geo_shape` precision could only be set via `tree_levels` so far. A new option `precision` now allows setting the levels of the underlying tree structure by a distance such as `50m`. The default precision is `50m`.
## Example
```json
curl -XPUT 'http://127.0.0.1:9200/myindex/' -d '{
    "mappings" : {
        "type1" : {
            "dynamic" : "false",
            "properties" : {
                "location" : {
                    "type" : "geo_shape",
                    "geohash" : "true",
                    "store" : "yes",
                    "precision" : "50m"
                }
            }
        }
    }
}'
```
## Changes
- GeoUtils defines the [WGS84](http://en.wikipedia.org/wiki/WGS84) reference ellipsoid of the earth
- DistanceUnits refer to a more precise definition of the earth's circumference
- DistanceUnits for inch, yard and meter have been defined
- Set default levels in GeoShapeFieldMapper to 50m precision
Closes#2803
This happens, for example, because we list assigned shards that might not have been allocated on the relevant node yet; there is no need to list those as actual failures in some APIs.