When an index is recovered from disk it's metadata is imported first and the master reaches out to the nodes looking for shards of that index. Sometimes those requests reach other nodes before the cluster state is processed by them. At the moment, that situation disables the checking of the store, which requires the meta data (indices with custom path need to know where the data is). When corruption hits this means we may assign a shard to node with corrupted store, which will be caught later on but causes confusion. Instead we can try loading the meta data from disk in those cases.
Relates to #17630
When it comes to query parsing, either a field is tokenized and it would go
through analysis with its search_analyzer. Or it is not tokenized and the
raw string should be passed to termQuery(). Since numeric fields are not
tokenized and also declare a search analyzer, values would currently go through
analysis twice...
We have a couple places in the code base that assume that search is always done
on the inverted index. However with the new points API in Lucene 6, this is not
true anymore. This commit makes MappedFieldType.indexedValueForSearch protected
and fixes call sites to keep working for field types that use the inverted
index and either work differently ar throw an exception otherwise. For instance,
it will still be possible to run cross_fields multi match queries on numeric
fields, but the score contributions will not be blended as well as before, and
significant terms aggregations on long terms will not be possible anymore since
points do not record document frequencies.
This commit removes `MappedFieldType.value` and simplifies
`MappedFieldType.valueforSearch`. `valueforSearch` was used to post-process
values that come for stored fields (eg. to convert a long back to a string
representation of a date in the case of a date field) and also values that
are extracted from the source but only in the case of GET calls: it would
not be called when performing source filtering on search requests.
`valueforSearch` is now only called for stored fields, since values that are
extracted from the source should already be formatted as expected.
In both cases, what elasticsearch is really interested in is whether the field
is an analyzed string field. So it can just check `tokenized()` instead.
* upgrades numerics to new Point format
* updates geo api changes
* adds GeoPointDistanceRangeQuery as XGeoPointDistanceRangeQuery
* cuts over to ES GeoHashUtils
TL;DR This commit should not have any impact on terms aggs, it will just make
supporting ipv6 easier.
Currently only the numeric terms aggs propagate the DocValueFormat instance since
we use numerics to represent also dates or ip addresses. Since string terms aggs
are only used for text/keyword/string fields, they do not use the format and just
call toUt8String(). However when we support ipv6, ip addresses as well will be
encoded in sorted doc values (just like strings) so we will need to use the
DocValueFormat to format the keys.
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.
This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
In #17133 we introduce request size limit handling and need a custom
channel implementation. In order to ensure we delegate all methods
it is better to have this channel implement an interface instead of
an abstract base class (so changes on the interface turn into
compile errors).
Relates #17133
Sometimes we get a test failure caused by search contexts left open.
The tests include a stack trace of the call that opened the context
but nothing else about the context. This adds more information about
the context that has been left open like what query it was running,
what shard it targeted, and whether or not it was a scroll.
Relates to #17582
When you implement an ingest factory which implements `Closeable`:
```java
public static final class Factory extends AbstractProcessorFactory<MyProcessor> implements Closeable {
@Override
public void close() throws IOException {
logger.debug("closing my processor factory");
}
}
```
The `close()` method is never called which could lead to some leak threads when we close a node.
The `ProcessorsRegistry#close()` method exists though and seems to do the right job:
```java
@Override
public void close() throws IOException {
List<Closeable> closeables = new ArrayList<>();
for (Processor.Factory factory : processorFactories.values()) {
if (factory instanceof Closeable) {
closeables.add((Closeable) factory);
}
}
IOUtils.close(closeables);
}
```
But apparently this method is never called in `Node#stop()`.
Closes#17625.
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
* Create one AllField field per field eligible for _all.
* Add a positionIncrementGap (with a size of 100, not configurable) between
each entry in order to distinguish fields when doing phrase query on _all.
This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard
to reason about API responses, such as to _nodes.
With this change in place, it will never print the hostname as part of the default format, which has the
added benefit that it can be used consistently for URIs, which was not the case when the hostname might
appear at the front with "hostname/ip:port".
CORS headers and methods config parameters must be read as arrays. This
commit fixes the issue. It affects http.cors.allow-methods and
http.cors.allow-headers.
Fixes#17483
Aggregations need to perform instanceof calls on MappedFieldType instances in
order to know how they should be parsed or formatted. Instead, we should let
the field types provide a formatter/parser that can can be used.
This change makes the root (/) rest api delegate to a transport action to get the
data for the response. This aligns this rest api with all of the other apis, which
delegate to one or more actions.
In doing this, unit tests were added to provide coverage of the RestMainAction
and the associated classes.
Because sigma is also used at reduce time, it should be passed to empty aggs.
Otherwise it causes bugs when an empty aggregation is used to perform reduction
is it would assume a sigma of zero.
Closes#17362