This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.
The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.
As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.
In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.
See #19388
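The strict versus non-strict decision described above can be sketched roughly as follows. This is only an illustration: `ContentTypeResolver`, `MediaKind` and `resolve` are hypothetical names, not the actual Elasticsearch classes, the list of recognized media types is an assumption, and the plain text exception is left out.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch; class and method names are illustrative only.
final class ContentTypeResolver {
    enum MediaKind { JSON, YAML, SMILE, CBOR }

    // Maps a Content-Type header value to a known kind, ignoring parameters such as charset.
    static Optional<MediaKind> fromHeader(String header) {
        if (header == null) {
            return Optional.empty();
        }
        String mediaType = header.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mediaType) {
            case "application/json": return Optional.of(MediaKind.JSON);
            case "application/yaml": return Optional.of(MediaKind.YAML);
            case "application/smile": return Optional.of(MediaKind.SMILE);
            case "application/cbor": return Optional.of(MediaKind.CBOR);
            default: return Optional.empty();
        }
    }

    // Strict mode rejects a missing or unsupported header before dispatch.
    // In non-strict mode an empty result means "fall back to auto-detection".
    static Optional<MediaKind> resolve(String header, boolean strict) {
        Optional<MediaKind> kind = fromHeader(header);
        if (strict && !kind.isPresent()) {
            throw new IllegalArgumentException(
                "Content-Type header [" + header + "] is missing or not supported");
        }
        return kind;
    }
}
```

In the non-strict case the caller would be the place to emit the deprecation warning before auto-detecting.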
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
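A minimal sketch of the strict conversion described above. `StrictBooleans` is a hypothetical name used for illustration, and the exception type and message are assumptions rather than what the PR actually throws.

```java
// Hypothetical helper; not the actual Elasticsearch class.
final class StrictBooleans {
    // Accepts exactly "true" and "false"; everything else, including null, is an error.
    static boolean parse(String value) {
        if ("true".equals(value)) {
            return true;
        }
        if ("false".equals(value)) {
            return false;
        }
        throw new IllegalArgumentException(
            "Failed to parse value [" + value + "], only [true] or [false] are allowed");
    }
}
```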
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`, which isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
In some cases, it might happen that the `_all` field gets a field type that is
not totally configured, and in particular lacks analyzers. This is due to the
fact that `AllFieldMapper.TypeParser.getDefault` uses `Defaults.FIELD_TYPE` as
a default field type, which does not have any analyzers configured since it
does not know about the default analyzers.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.
Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.
When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
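To make the three relations concrete, here is a small sketch over inclusive integer ranges. `IntRange` and its methods are hypothetical and unrelated to the mapper code; the direction of WITHIN and CONTAINS (indexed range relative to query range) is an assumption about the intended semantics.

```java
// Hypothetical illustration of the relation semantics on inclusive int ranges.
final class IntRange {
    enum Relation { WITHIN, CONTAINS, INTERSECTS }

    final int min;
    final int max;

    IntRange(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must be <= max");
        }
        this.min = min;
        this.max = max;
    }

    // Does this (indexed) range match the query range under the given relation?
    boolean matches(IntRange query, Relation relation) {
        switch (relation) {
            case WITHIN:     // indexed range lies entirely inside the query range
                return query.min <= min && max <= query.max;
            case CONTAINS:   // indexed range entirely covers the query range
                return min <= query.min && query.max <= max;
            case INTERSECTS: // the two ranges share at least one value
                return min <= query.max && query.min <= max;
            default:
                throw new AssertionError(relation);
        }
    }

    // DISJOINT expressed as the negation of INTERSECTS.
    boolean disjoint(IntRange query) {
        return !matches(query, Relation.INTERSECTS);
    }
}
```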
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproducibility in tests. This change
removes the remnants of that multiplexing support.
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following endpoints have been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec have been updated to reflect the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields parameter relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes #20155
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.
Closes #19079
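A rough sketch of how such a `Location` value could be built from the index, type and id. `LocationHeader` is a hypothetical helper, not the code in this change; it relies on the multi-argument `java.net.URI` constructor to quote illegal characters in the path, and it does not escape a `/` inside an id.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; illustrative only.
final class LocationHeader {
    // Builds an absolute path of the form /{index}/{type}/{id}.
    static String forDocument(String index, String type, String id) {
        String path = "/" + index + "/" + type + "/" + id;
        try {
            // scheme, host and fragment are null, so only the (quoted) path is emitted
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build location for [" + path + "]", e);
        }
    }
}
```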
This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
This change activates the doc_values on the _size field for indices created after 5.0.0-alpha4.
It also adds a note in the breaking changes that explains the situation and how to get around it.
Closes #18334
Rename `fields` to `stored_fields` and add `docvalue_fields`
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields`, which is deprecated. `docvalue_fields` will try to load the value from doc values and fall back to the fielddata cache if doc values are not enabled on that field.
Closes #18943
This change adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overridden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
This removes dead/duplicate code and makes the `_index` field not configurable.
(Configuration used to just be ignored, now we would throw an exception if any
is provided.)
Now that the current uses of magical camelCase support have been
deprecated, we can remove these in master (sans remaining issues like
BulkRequest). This change removes camel case support from ParseField,
query types, analysis, and settings lookup.
see #8988
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to be slower for exact queries, but faster for range queries. It also
requires less storage.
Notes about how the change works:
- Numeric mappers have been split into a legacy version that is essentially
the current mapper, and a new version that uses points, eg.
LegacyDateFieldMapper and DateFieldMapper.
- Since new and old fields have the same names, the decision about which one
to use is made based on the index creation version.
- If you try to force using a legacy field on a new index or a field that uses
points on an old index, you will get an exception.
- IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
and sortable).
- The internal MappedFieldType that is stored by the new mappers does not have
any of the points-related properties set. Instead, it keeps setting the index
options when parsing the `index` property of mappings and does
`if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
when parsing documents.
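The fixed-length, sortable 16-byte representation mentioned for IP addresses can be sketched as below. This is an assumption-laden illustration and not Lucene's code: `IpEncoding` is a hypothetical class, and it assumes IPv4 addresses are stored as IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`) so that both families share one unsigned byte order.

```java
import java.net.InetAddress;

// Hypothetical sketch of a fixed-length, sortable IP encoding.
final class IpEncoding {
    static byte[] encode(InetAddress address) {
        return encode(address.getAddress());
    }

    // Accepts the raw 4-byte (IPv4) or 16-byte (IPv6) form and always returns 16 bytes.
    static byte[] encode(byte[] raw) {
        if (raw.length == 16) {
            return raw.clone();
        }
        if (raw.length != 4) {
            throw new IllegalArgumentException("expected 4 or 16 bytes, got " + raw.length);
        }
        byte[] mapped = new byte[16];   // bytes 0..9 stay zero
        mapped[10] = (byte) 0xff;       // IPv4-mapped prefix ::ffff:
        mapped[11] = (byte) 0xff;
        System.arraycopy(raw, 0, mapped, 12, 4);
        return mapped;
    }

    // Unsigned lexicographic comparison, the order a sorted byte structure would use.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```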
Known issues that we won't fix:
- You can't use numeric fields in significant terms aggregations anymore since
this requires document frequencies, which points do not record.
- Term queries on numeric fields will now return constant scores instead of
giving better scores to the rare values.
Known issues that we could work around (in follow-up PRs, this one is too large
already):
- Range queries on `ip` addresses only work if both the lower and upper bounds
are inclusive (exclusive bounds are not exposed in Lucene). We could either
decide to implement it, or drop range support entirely and tell users to
query subnets using the CIDR notation instead.
- Since IP addresses now use a different representation for doc values,
aggregations will fail when running a terms aggregation on an ip field on a
list of indices that contains both pre-5.0 and 5.0 indices.
- The ip range aggregation does not work on the new ip field. We need to either
implement range aggs for SORTED_SET doc values or drop support for ip ranges
and tell users to use filters instead. #17700
Closes #16751
Closes #17007
Closes #11513
When it comes to query parsing, either a field is tokenized and it would go
through analysis with its search_analyzer. Or it is not tokenized and the
raw string should be passed to termQuery(). Since numeric fields are not
tokenized and also declare a search analyzer, values would currently go through
analysis twice...
Closes #16964
Squashed commit of the following:
commit a23f9d2d29220991aa498214530753d7a5a148c6
Merge: eec9c4e 0b0a251
Author: Robert Muir <rmuir@apache.org>
Date: Mon Mar 7 04:12:02 2016 -0500
Merge branch 'master' into lucene6
commit eec9c4e5cd11e9c3e0b426f04894bb2a6dae4f21
Merge: bc67205 675d940
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 13:45:00 2016 -0500
Merge branch 'master' into lucene6
commit bc67205bdfe1526eae277ab7856fc050ecbdb7b2
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 09:56:31 2016 -0500
fix test bug
commit a60723b007ff12d97b1810cef473bd7b553a0327
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:35:35 2016 +0100
Fix SimpleValidateQueryIT to put braces around boosted terms
commit ae3a49d7ba7ced448d2a5262e5d8ec98671a9090
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:27:25 2016 +0100
fix multimatchquery
commit ae23fdb88a8f6d3fb7ba60fd1aaf3fd72d899aa5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:20:49 2016 +0100
Rewrite DecayFunctionScoreIT to be independent of the similarity used
This test relied a lot on the term scoring and compared scores
that are dependent on the similarity. This commit changes the base query
to be a predictable constant score query.
commit 366c2d518c35d31251033f1b6f6a93f6e2ae327d
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 14:06:14 2016 +0100
Fix scoring in tests due to changes to idf calculation.
Lucene 6 uses a different default similarity as well as a different
way to calculate IDF. In contrast to older version lucene 6 uses docCount per field
to calculate the IDF not the # of docs in the index to overcome the sparse field
cases.
commit dac99fd64ac2fa71b8d8d106fe68825e574c49f8
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:21:57 2016 -0500
don't hardcoded expected termquery score
commit 6e9f340ba49ab10eed512df86d52a121aa775b0f
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:04:45 2016 -0500
suppress deprecation warning until migrated to points
commit 3ac8908424b3fdad44a90a4f7bdb3eff7efd077d
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:21:43 2016 -0500
Remove invalid test: all commits have IDs, and its illegal to do this.
commit c12976288124ad1a26467e7e848fb810548e7eab
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:06:14 2016 -0500
don't test with unsupported back compat
commit 18bbfe76128570bc70883bf91ff4c44c82d27817
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:02:18 2016 -0500
remove now invalid lucene 4 backcompat test
commit 7e730e572886f0ef2d3faba712e4256216ff01ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:58:52 2016 -0500
remove now invalid lucene 4 backwards test
commit 244d2ab6868ba5ac9e0bcde3c2833743751a25ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:47:23 2016 -0500
use 6.0 codec
commit 5f64d4a431a6fdaa1234adca23f154c2a1de8284
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:43:08 2016 -0500
compile, javadocs, forbidden-apis, etc
commit 1f273cd62a7fe9ca8f8944acbbfc5cbdd3d81ccb
Merge: cd33921 29e3443
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 10:45:29 2016 +0100
Merge branch 'master' into lucene6
commit cd33921ac742ef9fb351012eff35f3c7dbda7264
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:58:37 2016 -0500
fix hunspell dictionary loading
commit c7fdbd837b01f7defe9cb1c24e2ec65604b0dc96
Merge: 4d4190f d8948ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:41:53 2016 -0500
Merge branch 'master' into lucene6
commit 4d4190fd82601aaafac6b8254ccb3edf218faa34
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:39:14 2016 -0500
remove nocommit
commit 77ca69e288b1a41aa9595c921ed166c272a00ea8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:38:24 2016 -0500
clean up numericutils vs legacynumericutils
commit a466d696fbaad04b647ffbc0857a9439b583d0bf
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:32:43 2016 -0500
upgrade spatial4j
commit 5412c747a8cfe638bacedbc8233163cb75cc3dc5
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:19:28 2016 -0500
move to 6.0.0-snapshot-8eada27
commit b32bfe924626b87e540692375ece09e7c2edb189
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:30:09 2016 +0100
Fix some test compile errors.
commit 6ccde35e9840b03c68d1a2cd47c7923a06edf64a
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:25:51 2016 +0100
Current Lucene version is 6.0.0.
commit f62e1015d931b4cc04c778298a8fa1ba65e97ad9
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:20:48 2016 +0100
Fix compile errors in NGramTokenFilterFactory.
commit 6837c6eabf96075f743649da9b9b52dd39611c58
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:50:59 2016 +0100
Fix the edge ngram tokenizer/filter.
commit ccd7f070de5efcdfbeb34b9555c65c4990bf1ba6
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:42:44 2016 +0100
The missing value is now accessible through a getter.
commit bd3b77f9b28e5b05daa3d49683a9922a6baf2963
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:41:51 2016 +0100
Remove IndexCacheableQuery.
commit 05f3091c347aeae80eeb16349ac51d2b53cf86f7
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:39:43 2016 +0100
Fix compilation of function_score queries.
commit 81cda79a2431ac78f56b0cc5a5765387f662d801
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:35:02 2016 +0100
Fix compile errors in BlendedTermQuery.
commit 70994ce8dd1eca0b995870974a38e20f26f96a7b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 23:33:03 2016 -0500
add bug ID
commit 29d4f1a71f36f646b5a6060bed3db019564a279d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 21:02:32 2016 -0500
easy .store changes
commit 5e1a1e6fd665fa455e88d3a8987362fad5f44bb1
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:47:24 2016 -0500
cleanups mostly around boosting
commit 333a669ec6c305ada5645d13ed1da0e19ec1d053
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:27:56 2016 -0500
more simple fixes
commit bd5cd98a1e089c866b6b4a5e159400b110140ce6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:49:38 2016 -0500
more easy fixes and removal of ancient cruft
commit a68f419ee47da5f9c9ce5b372f01d707e902474c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:35:02 2016 -0500
cutover numerics
commit 4ca5dc1fa47dd5892db00899032133318fff3116
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:34:18 2016 -0500
fix some constants
commit 88710a17817086e477c6c021ec346d0534b7fb88
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:14:25 2016 -0500
Add spatial-extras jar as a core dependency
commit c8cd6726583e5ce3f546ed355d4eca037164a30d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:03:33 2016 -0500
update to lucene 6 jars
This commit removes the ability to use string fields on indices created on or
after 5.0. Dynamic mappings now generate text fields by default for strings
but there are plans to also add a sub keyword field (in a future PR).
Most of the changes in this commit are just about replacing string with
keyword or text. Some tests have been removed because they only covered
corner cases of string mappings, like setting ignore-above on a text field or
enabling term vectors on a keyword field, which are now impossible.
The plan is to remove strings entirely in 6.0.
IndexShard currently holds an arbitrarily used `getQueryShardContext` that comes
out of a ThreadLocal. Its usage is undefined and arbitrary since there is also
such a method with different semantics on `IndexService`. This commit removes the ThreadLocal on
IndexShard as well as on the context itself. Its types are now a member and the QueryShardContext
lifecycle is managed by SearchContext, which passes the types on from the SearchRequest.
Parsing is currently very lenient, which has the bad side-effect that if you
have a typo and pass eg. `store: fasle` this will actually be interpreted as
`store: true`. Since mappings can't be changed after the fact, it is quite bad
if it happens on an index that already contains data.
Note that this does not cover all settings that accept a boolean, but since the
PR was quite hard to build and already covers some main settings like `store`
or `doc_values` this would already be a good incremental improvement.
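To illustrate the failure mode, here is a sketch of a lenient rule of the kind described, where anything that is not an explicit "false" token counts as true. `LenientBooleans` is a hypothetical name and the exact token set is an assumption made for the example.

```java
// Hypothetical illustration of lenient boolean parsing and its pitfall.
final class LenientBooleans {
    // Only a handful of tokens mean false; every other string, typos included, means true.
    static boolean parseLenient(String value) {
        return !("false".equals(value)
            || "0".equals(value)
            || "off".equals(value)
            || "no".equals(value));
    }
}
```

With such a rule `parseLenient("fasle")` silently yields `true`, which is the `store: fasle` case above.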
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.
This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.
closes #15459
This would be useful in order to only perform some validations in the case of
a mapping update and in cases when a mapping is restored eg. after a restart,
such as discussed in #15989.
This replaces the current `applyDefault` parameter which can be derived from
the mapping merge reason: the default mapping should be applied only in case of
a mapping update, if the mapping does not exist yet and if this is not the
default mapping.
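The derivation of `applyDefault` from the merge reason can be sketched as a single predicate. The names below (`MergeRules`, `MergeReason`, `DEFAULT_MAPPING`) are illustrative assumptions and the real signature may differ.

```java
// Hypothetical sketch of deriving applyDefault from the merge reason.
final class MergeRules {
    enum MergeReason { MAPPING_UPDATE, MAPPING_RECOVERY }

    static final String DEFAULT_MAPPING = "_default_";

    // The default mapping is applied only for a mapping update, on a type that
    // does not exist yet, and never to the default mapping itself.
    static boolean applyDefault(MergeReason reason, String type, boolean typeAlreadyExists) {
        return reason == MergeReason.MAPPING_UPDATE
            && !typeAlreadyExists
            && !DEFAULT_MAPPING.equals(type);
    }
}
```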
* Added percolator field mapper that extracts the query terms and indexes these terms with the percolator query.
* At percolate time these extracted terms are used to find the percolator queries that are likely to match and therefore need to be evaluated. This can significantly cut down the time it takes to percolate. Before, all percolator queries were evaluated to check whether they matched the document being percolated.
* Changes made to percolator queries are no longer immediately visible, a refresh needs to happen before the changes are visible.
* By default the percolate api only returns up to 10 matches instead of returning all matching percolator queries.
* Made percolate more modular, so that it is easier to add unit tests.
* Added unit tests for the percolator.
Closes #12664
Closes #13646
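The idea of using extracted terms to narrow down the queries to evaluate can be sketched with a plain in-memory map. This is a simplified illustration, not the percolator implementation: `TermPrefilter` is hypothetical, and treating queries without extractable terms as always-candidates is an assumption.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of term-based candidate selection.
final class TermPrefilter {
    private final Map<String, Set<String>> queriesByTerm = new HashMap<>();
    private final Set<String> alwaysCandidates = new TreeSet<>();

    // Registers a query under each of its extracted terms.
    void register(String queryId, Collection<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            // Nothing to pre-filter on, so this query must always be evaluated.
            alwaysCandidates.add(queryId);
            return;
        }
        for (String term : queryTerms) {
            queriesByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(queryId);
        }
    }

    // Returns the queries sharing at least one term with the document.
    // Candidates still need to be evaluated against the document afterwards.
    Set<String> candidates(Collection<String> documentTerms) {
        Set<String> result = new TreeSet<>(alwaysCandidates);
        for (String term : documentTerms) {
            Set<String> ids = queriesByTerm.get(term);
            if (ids != null) {
                result.addAll(ids);
            }
        }
        return result;
    }
}
```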
This removes the backward compatibility layer with pre-2.0 indices, notably
the extraction of _id, _routing or _timestamp from the source document when a
path is defined.
This changes a couple of things:
Mappings are truly immutable. Before, each field mapper stored a
MappedFieldTypeReference that was shared across fields that have the same name
across types. This means that a mapping update could have the side-effect of
changing the field type in other types when updateAllTypes is true. This works
differently now: after a mapping update, a new copy of the mappings is created
in such a way that fields across different types have the same MappedFieldType.
See the new Mapper.updateFieldType API which replaces MappedFieldTypeReference.
DocumentMapper is now immutable and MapperService.merge has been refactored in
such a way that if an exception is thrown while eg. lookup structures are being
updated, then the whole mapping update will be aborted. As a consequence,
FieldTypeLookup's checkCompatibility has been folded into copyAndAddAll.
Synchronization was simplified: given that mappings are truly immutable, we
don't need the read/write lock so that no documents can be parsed while a
mapping update is being processed. Document parsing is not performed under a
lock anymore, and mapping merging uses a simple synchronized block.
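The synchronization scheme described above (lock-free reads of an immutable snapshot, merges under a simple synchronized block, and nothing published if the merge fails) can be sketched as follows. `MappingHolder` is a hypothetical stand-in in which a field type is just a string; it is not the MapperService code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write holder for an immutable mapping snapshot.
final class MappingHolder {
    // Readers see either the old or the new snapshot, never a partial update.
    private volatile Map<String, String> fieldTypes = Collections.emptyMap();

    // Document parsing path: no lock needed.
    String fieldType(String field) {
        return fieldTypes.get(field);
    }

    // Mapping update path: builds a new copy and publishes it only if validation passes.
    synchronized void merge(Map<String, String> update) {
        Map<String, String> copy = new HashMap<>(fieldTypes);
        for (Map.Entry<String, String> entry : update.entrySet()) {
            String existing = copy.get(entry.getKey());
            if (existing != null && !existing.equals(entry.getValue())) {
                // Throwing here discards the copy, so the whole update is aborted.
                throw new IllegalArgumentException(
                    "conflicting type for field [" + entry.getKey() + "]");
            }
            copy.put(entry.getKey(), entry.getValue());
        }
        fieldTypes = Collections.unmodifiableMap(copy);
    }
}
```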
DocumentMapperParser has both parse and parseCompressed methods. Except that the
parse methods are ONLY used from the unit tests. This commit removes the parse
method and moves all tests to parseCompressed so that they test more
realistically how mappings are managed.
Then I renamed parseCompressed to parse given that this is the only alternative
anyway.
When creating a metadata mapper for a new type, we reuse an existing
configuration from an existing type (if any) in order to avoid introducing
conflicts. However this field type that is provided is considered as both an
initial configuration and the default configuration. So at serialization time,
we might only serialize the difference between the current configuration and
this default configuration, which might be different to what is actually
considered the default configuration.
This does not cause bugs today because metadata mappers usually override the
toXContent method and compare the current field type with Defaults.FIELD_TYPE
instead of defaultFieldType() but I would still like to do this change to
avoid future bugs.