Following the discussion in #11744, move boost and query _name to base class AbstractQueryBuilder with their getters and setters. Unify their serialization code and equals/hashcode handling in the base class too. This guarantess that every query supports both _name and boost and nothing needs to be done around those in subclasses besides properly parsing the fields in the parsers and printing them out as part of the doXContent method in the builders. More specifically, these are the performed changes:
- Introduced printBoostAndQueryName utility method in AbstractQueryBuilder that subclasses can use to print out _name and boost in their doXContent method.
- readFrom and writeTo are now final methods that take care of _name and boost serialization. Subclasses have to implement doReadFrom and doWriteTo instead.
- toQuery is a final method too that takes care of properly applying _name and boost to the lucene query. Subclasses have to implement doToQuery instead. The query returned will have boost and queryName applied automatically.
- Removed BoostableQueryBuilder interface, given that every query is boostable after this change. This won't have any negative effect on filters, as the boost simply gets ignored in that case.
- Extended equals and hashcode to handle queryName and boost automatically as well.
- Update the query test infra so that queryName and boost are tested automatically, and whenever they are forgotten in parser or doXContent tests fail, so this makes things a lot less error-prone
- Introduced DEFAULT_BOOST constant to make sure we don't repeat 1.0f all the time for default boost values.
SpanQueryBuilder is again a marker interface only. The convenient toQuery that allowed us to override the return type to SpanQuery cannot be supported anymore due to a clash with the toQuery implementation from AbstractQueryBuilder. We have to go back to castin lucene Query to SpanQuery when dealing with span queries unfortunately.
Note that this change touches not only the already refactored queries but also the untouched ones, by making sure that we parse _name and boost whenever we need to and that we print them out as part of QueryBuilder#doXContent. This will result in printing out the default boost all the time rather than skipping it in non refactored queries, something that we would have changed anyway as part of the query refactoring.
The following are the queries that support boost now while previously they didn't (parser now parses it and builder prints it out): and, exists, fquery, geo_bounding_box, geo_distance, geo_distance_range, geo_hash_cell, geo_polygon, indices, limit, missing, not, or, script, type.
The following are the queries that support _name now while previously they didn't (parser now parses it and builder prints it out): boosting, constant_score, function_score, limit, match_all, type.
Range query parser supports now _name at the same level as boost too (_name is still supported on the outer object though for bw comp).
There are two exceptions that despite have getters and setters for queryName and boost don't really support boost and queryName: query filter and span multi term query. The reason for this is that they only support a single inner object which is another query that they wrap, no other elements.
Relates to #11744Closes#10776Closes#11974
Today we loose the RestStatus code for non-serializable exceptions.
This can be tricky if they are supposed to signal certain situations
like authentication errors etc. This commit adds support for carrying on
the exceptions in the NotSerializableExceptoinWrapper
The filters aggregation now has an option to add an 'other' bucket which will, when turned on, contain all documents which do not match any of the defined filters. There is also an option to change the name of the 'other' bucket from the default of '_other_'
Closes#11289
This change means that when the skip gap policy is used, the bucket script aggregation will skip executing the script on a bucket if any of the required bucket_paths are missing for the bucket. No aggregation will be added to the bucket, and the aggregation will move to the next bucket.
This commit changes the postrm script so that it prints error messages instead of failing & exiting when the deletion of a directory failed while removing a RPM/DEB package.
Closes#11373
This allows a lot of null checks to be removed where we were always falling back to the ValueFormat.RAW anyway. Now the format is set to ValueFormat.RAW when no alternative is suitable.
Closes#10594
Field stats index constraints allows to omit all field stats for indices that don't match with the constraint. An index
constraint can exclude indices' field stats based on the `min_value` and `max_value` statistic. This option is only
useful if the `level` option is set to `indices`.
For example index constraints can be useful to find out the min and max value of a particular property of your data in
a time based scenario. The following request only returns field stats for the `answer_count` property for indices
holding questions created in the year 2014:
curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
"fields" : ["answer_count"] <1>
"index_constraints" : { <2>
"creation_date" : { <3>
"min_value" : { <4>
"gte" : "2014-01-01T00:00:00.000Z",
},
"max_value" : {
"lt" : "2015-01-01T00:00:00.000Z"
}
}
}
}'
Closes#11187
"Root" is a very confusing term for meta field mappers. This change
renames "RootMapper" to "MetadataFieldMapper" and simplifies
how metadata mappers are setup.
It also requires that metadata mappers are now a FieldMapper
(MetadataFieldMapper extends from AbstractFieldMapper). The only
use of a root mapper that wasn't a field mapper was the theoretical
"external" root mapper (just a test mapper). But it doesn't make
sense to not have an actual field, and this falls inline with
the hopefully eventual collapsing of AbstractFieldMapper/FieldMapper/Mapper.
- shard listing actions underpinning shard allocation do not have access to that new node yet (causing errors during shard allocation see #11923
- the very first cluster state published to a node already has shard assignments to it. This surfaced other issues we are working to fix separately
This commit changes the reroute to be done post processing the initial join cluster state to side step these issues while we work on a longer term solution.
Closes#11960
We had several problems with Java Serializatin in the past. At some point
in the Java 1.7.x series JDKs where not compatible anymore when java
serialization (ObjectStream) was used to exchange objects. In elasticsearch
we used this to serialize exceptions across the wire which caused several problems
with incompatible JDKs. While causing lot of trouble this essentially prevented
users from moving forward and upgrade their JVMs. To prevent these kind of issues
this commit removes the dependency on java serialization entirely and bans the
usage of ObjectOutputStream and ObjectInputStream entirely.
Yet, we can't fully serialize all exception anymore such that this commit
is best effort and adds hand written serialization to all elasticsearch exceptions
as well to a selected set of JDK and Lucene exceptions. (see StreamOutput#writeThrowable /
StreamInput.readThrowable). Stacktraces should be preserved for all exceptions while
several names might be replaced with ElasticsearchException if there is no mapping for
the given exception.
Added validation for all inner queries that any already refactored query may hold. Added also tests around this. At the end of the refactoring validate will be called by SearchRequest#validate during TransportSearchAction execution, which will call validate against the top level query builder that will need to go and validate its data plus all of its inner queries, and so on.
Closes#11889
In order to support older RPM based distributions like CentOS5,
we should have one RPM available, which is not signed.
This commit creates an unsigned RPM first, then moves it over to
target/releases during the build, then builds a signed RPM.
The unsigned one is uploaded via S3, where as the signed one is
used for the repositories.
In addition, you can now build an RPM without having to specify
any gpg credentials due to offloading this into a maven profile
that is only activated when specifying `rpm.sign` property.
Closes#11587
instead of maintaining a thread local cache in the PercolatorQueriesRegistry.
Before PercolatorQueriesRegistry had its own cache, because all the queries had to forcefully opt out of caching. Nowadays in master small segments are never cached by the query cache, so the reason for the dedicated cache is no longer valid.
Today we keep track of how often filters are used at the index level in order
to decide whether they should be cached or not. This is an issue if you have
several shards of the same index on the same node as it will multiply statistics
by the number of shards that you have for this index on the node, which defeats
the purpose of waiting for a filter to be reused before caching them.
If the translog UUID is corrupted we should not convert it
to UTF-8 since it might be invalid. Instead we should compare
the UTF-8 byte representation directly.
This more consistent with the other logging it makes and since it can be used in many operations the output can be more verbose (without adding too much info as to who timed out exactly - which we can fix separately). If need be the caller of the observer can log a higher level message.
Closes#11722
In order to be more consistent with what they do, the query cache has been
renamed to request cache and the filter cache has been renamed to query
cache.
A known issue is that package/logger names do no longer match settings names,
please speak up if you think this is an issue.
Here are the settings for which I kept backward compatibility. Note that they
are a bit different from what was discussed on #11569 but putting `cache` before
the name of what is cached has the benefit of making these settings consistent
with the fielddata cache whose size is configured by
`indices.fielddata.cache.size`:
* index.cache.query.enable -> index.requests.cache.enable
* indices.cache.query.size -> indices.requests.cache.size
* indices.cache.filter.size -> indices.queries.cache.size
Close#11569
Today, we disable CORS by default, but if a user simply enables CORS their instance of
elasticsearch will allow cross origin requests from anywhere, as the default value for allowed
origins is `*`.
This changes the default to be `null` so that no origins are allowed and the user must explicitly
specify the origins they wish to allow requests from. The documentation also mentions that there
is a security risk in using `*` as the value.
Closes#11169
the codebase based on the conventions that we decided to follow. Also including
some cosmetic fixes (making members final where possible, avoiding this
references outside of setters/getters).
In addition to that this PR changes:
* prevent NPEs in doXContent when rendering out nested queries that are null. We now render out empty object ({ }) which then gets parser to null to be consistent with queries than come only through the rest layer
* prevent adding nested null queries to collections of clauses like in BoolQueryBuilder
* add validate() method to all builders (even when empty)
Closes#11834
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.
Relates #10971
Today we are very lenient in parsing the translog files. This is
actually not necessary since we have a clear run once upgrade path.
All files are converted into the new file name pattern such that we
only need to look at old file patterns in the context of the upgrade.
This commit makes parsing really strict with the exceptoin of the upgrade path.
This adds a new pipeline aggregation, the cumulative sum aggregation. This is a parent aggregation which must be specified as a sub-aggregation to a histogram or date_histogram aggregation. It will add a new aggregation to each bucket containing the sum of a specified metrics over this and all previous buckets.
Today we mark a translog as upgraded by adding a marker to the engine commit.
Yet, this commit was only added if there was no translog present before ie. only
if we have a fresh engine which is missing the entire point. Yet, this commit
adds a backwards index tests that ensures we can open old indices more than once
ie. mark the index as upgraded.
Closes#11858
In Lucene 5x the exception thrown when highlighter encounters a huge term
is a BytesRefHash.MaxBytesLengthExceededException but in Lucene 4x it is
wrapped in a RuntimeException. Therefore, it seems saver to unwrap this.
As we now have an enum Operator that comes with many useful helper methods switching to use
that instead of the enums defined separately. Also switches to using the new enum's helper
methods where applicable removing duplicate parsing logic.
This breaks backwards compatibility. Documenting the break in
migrate_query_refactoring.asciidoc
Relates to #10217
There was only a single actual "use" of close, for a threadlocal
in VersionFieldMapper. However, that threadlocal is completely
unnecessary, so this change removes the threadlocal and
close() altogether.
This commit consolidates several abstractions on the shard level in
ordinary classes not managed by the shard level guice injector.
Several classes have been collapsed into IndexShard and IndexShardGatewayService
was cleaned up to be more lightweight and self-contained. It has also been moved into
the index.shard package and it's operation is renamed from recovery from "gateway" to recovery
from "store" or "shard_store".
Closes#11847
This commit folds ShardRouting, ImmutableShardRouting and MutableShardRouting
into ShardRouting. All mutators are package private anyway today so it's just
unnecessary abstraction.
ShardRoutings are now frozen once they are added to the IndexRoutingTable
to prevent modifications outside of the allocation code.
This commit makes the get and search APIs always return `_parent`, `_routing`,
`_timestamp` and `_ttl` in addition to `_id` and `_type`. This way, consumers
always have all required information in order to reindex a document.
Currently the SnapshotsService is concerned with both maintaining the global snapshot lifecycle on the master node as well as responsible for keeping track of individual shards on the data nodes. This refactoring separates two areas of concerns by moving all shard-level operations into a separate SnapshotShardsService.
Closes#11756
Refactors CommonTermsQuery analogous to TermQueryBuilder. Still left to do are
the tests to compare between builder and actual Lucene query.
Relates to #10217
This PR is against the query-refactoring branch.
This commit makes SimpleQueryStringBuilder streamable, add hashCode and equals. Adds a dedicated builder/parser unit test, fixes formatting, adds JavaDoc where needed, adjust the handling of default values according to https://github.com/elastic/dev/blob/master/design/queries/general-guidelines.md
Switched to using toLanguageTag/forLanguageTag when parsing Locales. Using LocaleUtils from either Elasticsearch or Apache commons resulted in Locales not passing the roundtrip test. For more info see https://issues.apache.org/jira/browse/LUCENE-4021
Relates to #10217
Currently the filter cache is configured to have a maximum size in bytes of 10%
of the JVM memory, and a maximum number of cached filters (across all segments
of all shard on the same node) of 100000. I would like to change the latter to
a more reasonable value of 1000.
Given that we track the most 256 most recently used filters per index and only
cache those that have been seen 5 times or more, a single index cannot have more
than 50 hot filters, so a maximum number of cached filters of 1000 per node
should be more than necessary.
Today, we have scheduled reroute that kicks every 10 seconds and checks if a
reroute is needed. We use it when adding nodes, since we don't reroute right
away once its added, and give it a time window to add additional nodes.
We do have recover after nodes setting and such in order to wait for enough
nodes to be added, and also, it really depends at what part of the 10s window
you end up, sometimes, it might not be effective at all. In general, its historic
from the times before we had recover after nodes and such.
This change removes the 10s scheduling, simplifies RoutingService, and adds
explicit reroute when a node is added to the system. It also adds unit tests
to RoutingService.
closes#11776
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#11823
Since elasticsearch doesn't shade artifacts anymore (see #11522), the dependencies list for RPM/DEB must be updated. Now we package all maven libs by default except the generated -shaded/-tests/-test-cours JARs and slf4j-api (marked as optionnal).
We currently are very lax about allowing data types to conflict for the
same field name, across document types. This change makes the underlying
map in MapperService a 1-1 map of field name to field type, and throws
exception when new types are not compatible.
To still allow changing a type, with parameters that are allowed to be
changed, but for a field that exists in multiple types, a new parameter
to index creation and put mapping API is added: update_all_types.
This defaults to false, and the exception messages suggest using
this parameter when trying to modify a setting that is allowed to be
modified but is being limited by this restriction.
There are also a couple changes which try to base fields from new types
for dynamic mappings, and root mappers, on existing settings. For
dynamic mappings this is important if the dynamic defaults have been
changed. For root mappings, this is mostly just for backcompat when
pre 2.0 root mappers could have their field type changed.
fixes#8871
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
we currently don't expose this.
This adds the following to the OS section of `_nodes`:
```
"os": {
"name": "Mac OS X",
...
}
```
and the following to the OS section of `_cluster/stats`:
```
"os": {
...
"names": [
{
"name": "Mac OS X",
"count": 1
}
],
...
},
```
Closes#11807
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0. Users who need to
adjust these settings can use a date field.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings. In
this case this also includes FQueryFilterParser, since both queries are
closely related.
Relates to #10217Closes#11729
- Fixes tests, and removes a few special snowflake, fragile tests.
- Removes concrete implementation of predict() and moves it into
each model so that the logic is clearer. Because there is some
shared checks/assertions, those remain in predict() and the main
prediction happens in doPredict()
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#11717
The commit about adding cluster health response features also removed
accidentally some functionality, that resulted in wrong instanceof checks
in InternalClusterService and thus in test failures because the cluster
state task that was added via an anonymous was missing the cast.
This commit readds the abstract class with slight renaming.
Commit id was: 88f8d58c8b
If we mark the shard as being in POST_RECOVERY before the percolator
is fully set up we might expose it to the user as fully searchable before
all queries are loaded. This can lead to wrong results especially in tests
when a shard is concurrently marked as STARTED.
This commit also removes unneded abstractions on IndexShard where readoperations
should be allowed when the purose is a write.
It is handy to have a base interface, not just an abstract class, for all of our query builders. This gives us more flexibility especialy with complex class hierarchies. For instance SpanTermQueryBuilder extends BaseTermQueryBuilder, but also needs to be marked as a SpanQueryBuilder. The latter is a marker interface that should extend QueryBuilder which is not possible unless QueryBuilder actually is an interface.
Also remove queryId method as it created confusion, getName is good enough for the purpose, and override the return type of toQuery method for SpanQueryBuilders to SpanQuery.
Closes#11796
In order to get a quick overview using by simply checking the cluster state
and its corresponding cat API, the following two attributes have been added
to the cluster health response:
* task max waiting time, the time value of the first task of the
queue and how long it has been waiting
* active shards percent: The percentage of the number of shards that are in
initializing state
This makes the cluster health API handy to check, when a fully restarted
cluster is back up and running.
Closes#10805
The change makes rest-spec-api a project in the same way as we build dev-tools. it packages the tests and api in a bundle using the maven-remote-resources-plugin and uses the same plugin in the plugins and core pom to unpack the rest-api-spec into the target directory and references the rest tests there in the test resources.
The main stimulus for this change is that for those using Eclipse the current build does not work. After running `mvn eclipse:eclipse` the Eclipse IDE errors because the rest-api-spec is outside of the project scope, meaning that every time the command is run (required whenever any dependencies change), the class path of all the projects has to be manually fixed.
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation
Fixes#11478
Tests relying on sleeps and latch timeouts are prone to weird timing issues
and hard to read / understand error messages. This commit moves towards a more
deterministic error model and replaces empty fails with real exceptions.
Some repository verification exceptions are currently only returned to the users but not logged on the nodes where the exceptions occurred, which makes troubleshooting difficult.
Closes#11760
For #8871, we need to be able to check field types are compatible,
without comparing FieldMappers. This change moves the simulation
checks (which generate merge conflicts) for any properties of
MappedFieldTypes into a new method, validateCompatible.
This also simplifies the merge code which merges settings
between the old and new fieldtypes. Previously, each subclass
of FieldMapper would have to set its own fieldtype settings.
However, now that we have .clone(), which perfectly copies
all properties (with subclasses accounted for), we can now
do a simple clone when merging.
Finally, this fixes a subtle bug in merging, in which if
merging has conflicts, and we were not simulating, we would
still update the field type, even though it was not compatible!
NOTE: there is one test failure I am trying to track down with
timestamp merging. Otherwise, all tests pass.
Currently, we delete the shard _state file on engine failure.
This behaviour does not persist the engine failure reason for later inspection.
This commit marks the shard store as corrupted instead of deleting
the _state file to ensure the store index can not be opened after and
the engine failure is persisted.
CommonTermsQueryParser does not check for disable_coords, only for
disable_coord. Yet the builder only outputs disable_coords, leading to
disabling the coordination factor to be ignored in the Java API.
Closes#11730Closes#11780
When using compression over the network, you might sometimes see warnings that
the stream was not fully read. This is because DeflaterOutputStream adds an
end-of-stream marker. When deserializing, we need to poll for one byte using
InputStream.read() to make sure to decode this EOS marker.
For the record, it does not strike all the time today because we perform
buffering when decompressing to avoid performing too many JNI calls, but it
is easy to make this warning happen all the time by decreasing the size of
the buffer we use.
Close#11748
Need to reset the registered setting in order to make sure the nex round will capture the right delay interval
also randomize setting and name the setting properly
closes#11759
Added several classes to support expressions being used for numerical
calculations in aggregations. Expressions will still not compile
when used with mapping and update script contexts.
Closes#11596Closes#11689
Allow to set delayed allocation timeout on unassigned shards when a node leaves the cluster. This allows to wait for the node to come back for a specific period in order to try and assign the shards back to it to reduce shards movements and unnecessary relocations.
The setting is an index level setting under `index.unassigned.node_left.delayed_timeout` and defaults to 0 (== no delayed allocation). We might want to change the default, but lets do it in a different change to come up with the best value for it. The setting can be updated dynamically.
When shards are delayed, a log message with "info" level will notify how many shards are being delayed.
An implementation note, we really only need to care about delaying allocation on unassigned replica shards. If the primary shard is unassigned, anyhow we are going to wait for a copy of it, so really the only case to delay allocation is for replicas.
close#11712
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#11703
`com.google.common.collect.Iterators#emptyIterator()` is marked as deprecated and will be removed in May 2016. We should use JDK7 `Collections#emptyIterator()`
By setting human parameter to true, it's now possible to see human readable versions of Elasticsearch that created and updated the index as well as the date when the index was created.
Closes#11484
This changes the parameter name `ignore_like` to the more user friendly name
`unlike`. This later feature generates a query from the terms in `A` but not
from the terms in `B`. This translates to a result set which is like `A` but
unlike `B`. We could have further negatively boosted any documents that have
some `B`, but these documents already do not receive any contribution from
having `B`, and would therefore negatively compete with documents having `A`.
Closes#11117
this is really just a workaround for plugins to run their own
REST tests instead of the core ones. It opts out of the rest test
loading from the core jar file and tries to load from the classpath instead.
Eventually we need to fix this infrastrucutre to move away from parameterized
tests such that subclasses can override behavior.
Closes#11721
Added a licenses/ directory to core which contains a sha1 file for each JAR
dependency, and one or more LICENSE files and one NOTICE file for each
project.
Also adds dev-tools/src/main/resources/license-check/check_license_and_sha.pl
which checks that the licenses/ dir is up to date during a mvn verify,
and which can be used to update the sha1 files when upgrading dependencies.
Closes#2794Closes#10684Closes#11705
This change adds a simplistic heuristic to try to balance new shard
allocations across multiple data paths on one node so that e.g. if
there are two path.data and both have roughly the same free space, if
10 shards are suddenly allocated, we will put 5 on one path and 5 on
the other (vs 10 on a single path today).
Closes#11185Closes#11122
Unassigned meta includes additional information as to why a shard is unassigned, this is especially handy when a shard moves to unassigned due to node leaving or shard failure.
The additional data is provided as part of the cluster state, and as part of `_cat/shards` API.
The additional meta includes the timestamp that the shard has moved to unassigned, allowing us in the future to build functionality such as delay allocation due to node leaving until a copy of the shard is found.
closes#11653
Since the /var/run/elasticsearch directory is cleaned when the operating system starts, the init.d script must ensure that the PID_DIR is correctly created.
Closes#11594
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#11628
As part of the refactoring of queries this PR splits the parsers parse() method
into a parsing and a query building part, adding serialization, hashCode() and
equals() to the builder.
Relates to #10217Closes#11621
The cache we are using for fielddata reclaims memory lazily/asynchronously. While this
helps with throughput this is an issue when a clear operation is issued manually since
memory is not reclaimed immediately. Since our clear methods already perform in linear
time anyway, this commit changes the fielddata cache to reclaim memory synchronously
when a clear command is issued manually. However, it remains lazy in case cache entries
are invalidated because segments or readers are closed, which is important since such
events happen all the time.
Close#11695
Some of our Java api builders had wrong logic when it comes to serializing the query in json format, resulting in missing fields like _name. Also, regexp parser was ignoring the _name field.
Closes#11694
`_timestamp` uses NumericDocValues instead of SortedNumericDocValues like other
numeric fields since it is guaranteed to be single-valued. However, we don't
need a different fielddata impl for it since DocValues.getSortedNumeric already
falls back to NUMERIC doc values if SORTED_NUMERIC are not available.
[maven] clean pom.xml
In Maven parent project, in dependency management, we should only declare which versions of 3rd party jars we want to use but not force any scope.
It makes then more obvious in modules what is exactly the scope of any dependency.
For example, one could imagine importing `jimfs` as a `compile` dependency in another module/plugin with:
```xml
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
</dependency>
```
But it won't work as expected as the default maven `scope` should be `compile` but here it's `test` as defined in the parent project.
So, if you want to use this lib for tests, you should simply define:
```xml
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
<scope>test</scope>
</dependency>
```
We also remove `maven-s3-wagon` from gce plugin as it's not used.
This was previously a container for an ObjectMapper, along with the
DocumentMapper that ObjectMapper came from. However, there was
only one use of needing the associated DocumentMapper, and that
wasn't actually used.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#11629
In Maven parent project, in dependency management, we should only declare which versions of 3rd party jars we want to use but not force any scope.
It makes then more obvious in modules what is exactly the scope of any dependency.
For example, one could imagine importing `jimfs` as a `compile` dependency in another module/plugin with:
```xml
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
</dependency>
```
But it won't work as expected as the default maven `scope` should be `compile` but here it's `test` as defined in the parent project.
So, if you want to use this lib for tests, you should simply define:
```xml
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
<scope>test</scope>
</dependency>
```
We also remove `maven-s3-wagon` from gce plugin as it's not used.
Now that doc values are the default for fielddata, specialized in-memory
formats are becoming an esoteric option. This commit removes such formats:
- `fst` on string fields,
- `compressed` on geo points.
I also removed documentation and tests that the fielddata cache is shared if
you change the format, since this is only true for in-memory fielddata formats
(given that for doc values, the caching is done directly in Lucene).
The script APIs have been deprecated long ago we can now remove them.
This commit still keeps the parsing code since it might be used in a
query that is still stuck in transaction log. This issue should be discussed
elsewhere.
Closes#11619
In order to restrict a single set of field type settings for a given
field name across an index, we need the ability to compare field types.
This change adds equals and hashcode, as well as tests for every field
type.
Make sure there is a single place where shard routing move to unassigned, so we can add additional metadata when it does, also, simplify shard routing implementations a bit
closes#11634
The AsyncShardFetch retrieves shard information from the different nodes in order to detirment the best location for unassigned shards. The class uses TransportNodesListGatewayStartedShards and TransportNodesListShardStoreMetaData in order to fetch this information. These actions, inherit from TransportNodesAction and are activated using a list of node ids. Those node ids are extracted from the cluster state that is used to assign shards.
If we perform a reroute and adding new news in the same cluster state update task, it is possible that the AsyncShardFetch administration is based on
a different cluster state then the one used by TransportNodesAction to resolve nodes. This can cause a problem since TransportNodesAction filters away unkown nodes, causing the administration in AsyncShardFetch to get confused.
This commit fixes this allowing to override node resolving in TransportNodesAction and uses the exact node ids transfered by AsyncShardFetch
Information about in-progress snapshot and restore processes is not really metadata and should be represented as a part of the cluster state similar to discovery nodes, routing table, and cluster blocks. Since in-progress snapshot and restore information is no longer part of metadata, this refactoring also enables us to handle cluster blocks in more consistent manner and allow creation of snapshots of a read-only cluster.
Closes#8102
This commit cleans up all the MergeScheduler infrastructure
and simplifies / removes all unneeded abstractions. The MergeScheduler
itself is now private to the Engine and all abstractions like Providers
that had support for multiple merge schedulers etc. are removed.
Closes#11602
This is helpful to track down the origin of pending_tasks that aren't
expected. In tests we catch this with an assert, but in production
asserts may not be enabled so we should at least add the class name.
When specifying a time zone, RangeQueryBuilder currently throws an
exeption when the field does not use DateFieldMapper, but if there
are no mappers at all for that field it doesn't complain. To be
consistent we also throw expection now.
Refactors ExistsQueryBuilder and Parser, splitting the parse() method into a parsing
and a query building part. Also moving newFilter() utility method from parser to query builder.
Changes in the BaseQueryTestCase include introduction of randomized version to test disabled
FieldNamesFieldMappers and also getting rid of the need for createEmptyBuilder() method by
using registered prototype constants.
Relates to #10217Closes#11427
and a query building part, adding NamedWriteable implementation for serialization and hashCode(),
equals() for testing.
This change also adds tests using fixed set of leaf queries (Terms, Ids, MatchAll) as nested Queries in test query setup.
Also QueryParseContext is adapted to return QueryBuilder instances for inner filter parses instead of
previously returning Query instances, keeping old methods in place but deprecating them.
Relates to #10217Closes#11427
Today we provide the ability to plug in MergePolicy and
we provide the once lucene ships with. We do not recommend to change
the default and even only a small number of expert users would ever touch
this. This commit removes the ancient log byte size and log doc count
merge policy providers, simplifies the MergePolicy wiring and makes the
tiered MP the one and only default. All notions of a merge policy has been
removed from the docs and should be deprecated in the previous version.
Closes#11588
There used to be a null check for _field_names mapper not existing. This
was recently removed. However, there is a corner case when the mapper
may be missing: when no types or docs exist at all in the index.
This change adds back a null check and just returns no docs.
The current ExceptionsHelper.unwrapCause(exception) requires the incoming exception to support ElasticsearchWrapperException , which TranslogRecoveryPerformer.BatchOperationException doesn't implement. I opted for a more generic solution
We had to make CompressedXContent.equals decompress data to fix some
correctness issues which had the downside of making equals() slow. Now we store
a crc32 alongside compressed data which should help avoid decompress data in
most cases.
Close#11247
Now that mapping updates are sync and done before indexing we don't really need the waiting component. Also, removed many places were they were used as safe guard against delayed mapping updates, which are now not needed.
As part of the query refactoring, we want to be able to serialize queries by having them extend Writeable, rather than serializing their json. When reading them though, we need to be able to identify which query we have to create, based on its name.
For this purpose we introduce a new abstraction called NamedWriteable, which is supported by StreamOutput and StreamInput through writeNamedWriteable and readNamedWriteable methods. A new NamedWriteableRegistry is introduced also where named writeable prototypes need to be registered so that we are able to retrieve the proper instance of query given its name and then de-serialize it calling readFrom against it.
Closes#11553
While we had initially planned to keep rivers around in 2.0 to ease migration,
keeping support for rivers is challenging as it conflicts with other important
changes that we want to bring to 2.0 like synchronous dynamic mappings updates.
Nothing impossible to fix, but it would increase the complexity of how we
deal with dynamic mappings updates and manage rivers, while handling dynamic
mappings updates correctly is important for resiliency and rivers are on the go.
So removing rivers in 2.0 may well be a better trade-off.
These methods are now all in MappedFieldType. This removes the remaining
callers of the methods on FieldMapper, and cuts down the FieldMapper
API to no longer include them.
The MapperService is the "index wide view" of mappings. Methods on it
are used at query time to lookup how to query a field. This
change reduces the exposed api so that any information returned
is limited to that api exposed by MappedFieldType. In the future,
MappedFieldType will be guaranteed to be the same across all
document types for a given field.
Note CompletionFieldType needed some more settings moved to it. Other
than that, this change is almost purely cosmetic.
Split the parse method into a parsing and a query building part, adding serialization
and hashCode(), equals() for better testing. Add basic unit test for Builder and Parser.
Closes#11551
The shard can potentially not be deleted if the obsever that checks for the shard
STARTED is not registered because the registering is delayed by the disruption.
If the sum of delays is more than 10s then the wait for shard deletion will time out.
The ResourceWatcher used settings prefixed `watcher.`, which
potentially could clash with the watcher plugin.
In order to prevent confusion, the settings have been renamed to
`resource.reload` prefixes.
This also uses the deprecation logging infrastructure introduced
in #11033 to log deprecated settings and their alternative at
startup.
Closes#11175
In case a developer gets a list of ids from another data source,
it does not make a lot of sense, to convert it to an array first,
and then internally in IdsQueryBuilder elasticsearch creates a
list out of this.
Closes#5089
In order for exists queries to use the null value for
a field, null value needs to be part of the field type (should
differ between document types). This change moves null value
into the field type, as well as simplifies the null value
methods available to remove supportsNullValue().
#11363 introduced a retry logic for the case where we have to wait on a mapping update during the translog replay phase of recovery. The retry throws or recovery stats off as it may count ops twice.
There are different ways to register custom query parsers through plugins, a couple of them work per index via index settings, which is probably even too flexible. There also three different ways to add a global custom query parser through either IndicesQueriesModule or IndicesQueriesRegistry. This commit consolidates the registration of custom query parsers via IndicesQueriesModule#addQuery(Class<? extends QueryParser>). The complexity of supporting parsers per index is not needed hence it got removed. Also the other ways of registering global custom parsers are dropped in favour of the one mentioned above.
Closes#11481
The search template and template query did not run the template through the script engine before searching for an inner template. This meant that parsing for the inner template failed because the template was not always valid JSON (if it contained mustache code) when it was parsed to find the inner template. This has been fixed and Tests added to check for the failing behaviour.
Tests are from https://github.com/elastic/elasticsearch/pull/8393
In #10918, we introduced the prompt placeholders. These were had a different format
than our existing placeholders. This changes the prompt placeholders to follow the
format of the existing placeholders.
Relates to #11455
This commit adds an additioal jar that is shaded and keeps all the
artifacts that are used by default on the server-side unshaded. Users
that need a shaded jar can now use the `shaded` classifyer to pull
the shaded minimized jar in instead. Including the shaded jar in a
downstream project looks like this:
```XML
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<classifier>shaded</classifier>
</dependency>
```
After asynchronously fetching shard information the gateway allocator issues a reroute via a cluster state update task. #11421 introduced an optimization trying to avoid submitting unneeded reroutes when results for many shards come in together. This is done by having a rerouting flag, indicating a pending reroute is coming and thus any new incoming shard info doesn't need to issue a reroute. This flag wasn't reset upon an error in the reroute update task. Most notably - if a master node had to step during to a min_master_node violation, it could reject an ongoing reroute. Lacking to reset the flag causing it to skip any future reroute, when the node became master again.
Closes#11519
Since we are creating write.lock earlier now, blob store shouldn't attempt deleting this file during clean up at the end of the restore process. The file is locked and the blog store doesn't succeed, but it generates a lot of useless warnings "failed to delete file [write.lock] during snapshot cleanup".
Closes#11517