This commit makes FilteredQuery a forbidden API and also removes some more usage
of the Filter API. There are some remaining code using filters for parent/child
queries but I'm not touching this as they are already being refactored in #6511.
This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
- custom cache keys (`_cache_key`) are unsupported
- decisions are made internally and can't be overridden by users ('_cache`)
- not only filters can be cached but also all queries that do not need scores
- parent/child queries can now be cached, however cached entries are only
valid for the current top-level reader so in practice it will likely only
be used on read-only indices
- the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
- better stats: we already had ram usage and evictions, but now also hit count,
miss count, lookup count, number of cached doc id sets and current number of
doc id sets in the cache
- dynamically changing the filter cache size is not supported anymore
Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).
Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
Minor issue with specifying the correct version when starting the package release script.
Another issue fixed to make sure that the S3 bucket parameters act the same.
In order to automatically sign and and upload our debian and RPM
packages, this commit incorporates signing into the build process
and adds the necessary steps to the release process. In order to do this
the pom.xml has been adapted and the RPM and jdeb maven plugins have been
updated, so the packages are signed on build. However the repositories
need to signed as well.
Syncing the repos requires downloading the current repo, adding
the new packages and syncing it back.
The following environment variables are now required as part of the build
* GPG_KEY_ID - the key ID of the key used for signing
* GPG_PASSPHRASE - your GPG passphrase
* S3_BUCKET_SYNC_TO: S3 bucket to sync new repo into
The following environment variables are optional
* S3_BUCKET_SYNC_FROM: S3 bucket to get existing packages from
* GPG_KEYRING - home of gnupg, defaults to ~/.gnupg
The following command line tools are needed
* createrepo (creates RPM repositories)
* expect (used by the maven rpm plugin)
* apt-ftparchive (creates DEB repositories)
* gpg (signs packages and repo files)
* s3cmd (syncing between the different S3 buckets)
The current approach would also work for users who want to run their
own repositories, all they need to change are a couple of environment
variables.
Minor implementation detail: Right now the branch name is used as version
for the repositories (like 1.4/1.5/1.6) - if we ever change our branch naming
scheme, the script needs to be fixed.
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
- replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
wrapped in a QueryWrapperFilter
- replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
QueryWrapperFilter
- removes DocIdSets.isBroken: the new two-phase iteration API will now help
execute slow filters efficiently
- replaces FilterCachingPolicy with QueryCachingPolicy
Close#8960
This option defaults to false, because it is also important to upgrade
the "merely old" segments since many Lucene improvements happen within
minor releases.
But you can pass true to do the minimal work necessary to upgrade to
the next major Elasticsearch release.
The HTTP GET upgrade request now also breaks out how many bytes of
ancient segments need upgrading.
Closes#10213Closes#10540
Conflicts:
dev-tools/create_bwc_index.py
rest-api-spec/api/indices.upgrade.json
src/main/java/org/elasticsearch/action/admin/indices/optimize/OptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/ShardOptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/TransportOptimizeAction.java
src/main/java/org/elasticsearch/index/engine/InternalEngine.java
src/test/java/org/elasticsearch/bwcompat/StaticIndexBackwardCompatibilityTest.java
src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
src/test/java/org/elasticsearch/rest/action/admin/indices/upgrade/UpgradeReallyOldIndexTest.java
We have two completely different code paths for mappings updates, depending on
whether they come from the API or are guessed based on the parsed documents.
This commit makes dynamic mappings updates execute like updates from the API.
The only change in behaviour is that a document that fails parsing can not
modify mappings anymore (useful to prevent issues such as #9851). Other than
that, this change should be fairly transparent to users but working this way
opens doors to other changes such as validating dynamic mappings updates on the
master node (#8688).
The way it works internally is that Mapper.parse now returns a Mapper instead
of being void. The returned Mapper represents a mapping update that has been
performed in order to parse the document. Mappings updates are propagated
recursively back to the root mapper, and once parsing is finished, we check
that the mappings update can be applied, and either fail the parsing if the
update cannot be merged (eg. because of a concurrent mapping update from the
API) or merge the update into the mappings.
However not all mappings updates can be applied recursively, `copy_to` for
instance can add mappings at totally different places in the tree. Because of
it I added ParseContext.rootMapperUpdates which `copy_to` fills when the
field to copy data to does not exist in the mappings yet. These mappings
updates are merged from the ones generated by regular parsing.
One particular mapping update was the `auto_boost` setting on the `all` root
mapper. Being tricky to work on, I removed it in favour of search-time checks
that payloads have been indexed.
One interesting side-effect of the change is that concurrency on ObjectMapper
is greatly simplified since we do not have to care anymore about having
concurrent dynamic mappings and API updates.
Allowing tests writing to the working directory can mask problems.
For example, multiple tests running in the same jvm, and using the
same relative path, may cause issues if the first test to run
leaves data in the directory, and the second test does not remember
to cleanup the path before using it.
This change adds security manager rules to disallow tests writing
to the working directory. Instead, tests create a temp dir with
the existing test framework.
closes#10605
The static old index tests currently take a long time to run because
each index version essentially recreates the cluster, and spins up
new nodes. This PR instead loads each old version into the existing
cluster as a dangling index. It also removes the intermediate
"StaticIndexBackwardCompatibilityTest" which was an extra layer
with no purpose, and moves a shared version of a commonly found
function to get an http client.
The test now takes between 40 and 60 seconds for me. I also ran it
"under stress" by running all ES tests in one shell, while
simultaneously running 10 iterations of the old index tests. Each
iteration took on average about 90 seconds, which is much better
than the 20+ minutes we see in master on jenkins.
closes#10247
1.1.0 is affected by #5817 which prevents merges from keeping up with the
indexing rate. As a consequence it generates lots of segments and makes bw
compat tests slow. So I added a special case for this version to index fewer
documents.
This pull request makes boolean handled like dates and ipv4 addresses: things
are stored as as numerics under the hood and aggregations add some special
formatting logic in order to return true/false in addition to 1/0.
For example, here is an output of a terms aggregation on a boolean field:
```
"aggregations": {
"top_f": {
"doc_count_error_upper_bound": 0,
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 2
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 1
}
]
}
}
```
Sorted numeric doc values are used under the hood.
Close#4678Close#7851
Allow to on/off scripting based on their source (where they get loaded from), the operation that executes them and their language.
The settings cover the following combinations:
- mode: on, off, sandbox
- source: indexed, dynamic, file
- engine: groovy, expressions, mustache, etc
- operation: update, search, aggs, mapping
The following settings are supported for every engine:
script.engine.groovy.indexed.update: sandbox/on/off
script.engine.groovy.indexed.search: sandbox/on/off
script.engine.groovy.indexed.aggs: sandbox/on/off
script.engine.groovy.indexed.mapping: sandbox/on/off
script.engine.groovy.dynamic.update: sandbox/on/off
script.engine.groovy.dynamic.search: sandbox/on/off
script.engine.groovy.dynamic.aggs: sandbox/on/off
script.engine.groovy.dynamic.mapping: sandbox/on/off
script.engine.groovy.file.update: sandbox/on/off
script.engine.groovy.file.search: sandbox/on/off
script.engine.groovy.file.aggs: sandbox/on/off
script.engine.groovy.file.mapping: sandbox/on/off
For ease of use, the following more generic settings are supported too:
script.indexed: sandbox/on/off
script.dynamic: sandbox/on/off
script.file: sandbox/on/off
script.update: sandbox/on/off
script.search: sandbox/on/off
script.aggs: sandbox/on/off
script.mapping: sandbox/on/off
These will be used to calculate the more specific settings, using the stricter setting of each combination. Operation based settings have precedence over conflicting source based ones.
Note that the `mustache` engine is affected by generic settings applied to any language, while native scripts aren't as they are static by definition.
Also, the previous `script.disable_dynamic` setting can now be deprecated.
Closes#6418Closes#10116Closes#10274
Lines in the code that should be removed before a release can be annotated with
//NORELEASE . This can be useful when debugging test failures. For example,
one might want to add additional logging that would be too verbose for production
and therfore should be removed before releasing.
closes#10141
This commit modifies the Kernel32Library to use direct mapping instead of a proxy class when doing native calls on Windows platforms. It also adds the "createSecurityManager" permission to the tests.policy file, and adds unit tests that should have failed when the Java security manager is enabled.
Closes#9802
Otherwise the fs repository metadata that points to non-existing location is stored in the old index cluster state, which causes warnings during OldIndexBackwardsCompatibilityTests.