With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.
This allows us to remove the MapperService.getLuceneFieldType() shim method.
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.
Fixes#58188
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
* Comprehensively test supported/unsupported field type:agg combinations (#52493)
This adds a test to AggregatorTestCase that allows us to programmatically
verify that an aggregator supports or does not support a particular
field type. It fetches the list of registered field type parsers,
creates a MappedFieldType from the parser and then attempts to run
a basic agg against the field.
A supplied list of supported VSTypes are then compared against the
output (success or exception) and suceeds or fails the test accordingly.
Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com>
* Skip fields that are not aggregatable
* Use newIndexSearcher() to avoid incompatible readers (#52723)
Lucene's `newSearcher()` can generate readers like ParallelCompositeReader
which we can't use. We need to instead use our helper `newIndexSearcher`
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.
As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
This commit moves the aggregation and mapping code from joda time to
java time. This includes field mappers, root object mappers, aggregations with date
histograms, query builders and a lot of changes within tests.
The cut-over to java time is a requirement so that we can support nanoseconds
properly in a future field mapper.
Relates #27330
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
The different `DocValueFormat` implementors throw `UnsupportedOperationException` for methods that they don't support. That is perfectly fine, and quite common as not all implementors support all of the possible formats. This makes it hard though to trace back which implementors support which formats as they all implement the same methods.
This commit introduces default methods in the `DocValueFormat` interface so that all methods throw `UnsupportedOperationException` by default. This way implementors can override only the methods that they specifically support.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
The ICU plugin provides the building blocks of an analysis chain, but doesn't actually have a prebuilt analyzer. It would be a better for users if there was a simple analyzer that they could use out of the box, and also something we can point to from the CJK Analyzer docs as a superior alternative.
Relates to #34285
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
This change cleans up "unused variable" warnings. There are several cases were we
most likely want to suppress the warnings (especially in the client documentation test
where the snippets contain many unused variables). In a lot of cases the unused
variables can just be deleted though.
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
* master:
Painless: Simplify Naming in Lookup Package (#32177)
Handle missing values in painless (#32207)
add support for write index resolution when creating/updating documents (#31520)
ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
Remove indication of future multi-homing support (#32187)
Rest test - allow for snapshots to take 0 milliseconds
Make x-pack-core generate a pom file
Rest HL client: Add put watch action (#32026)
Build: Remove pom generation for plugin zip files (#32180)
Fix comments causing errors with Java 11
Fix rollup on date fields that don't support epoch_millis (#31890)
Detect and prevent configuration that triggers a Gradle bug (#31912)
[test] port linux package packaging tests (#31943)
Revert "Introduce a Hashing Processor (#31087)" (#32178)
Remove empty @return from JavaDoc
Adjust SSLDriver behavior for JDK11 changes (#32145)
[test] use randomized runner in packaging tests (#32109)
Add support for field aliases. (#32172)
Painless: Fix caching bug and clean up addPainlessClass. (#32142)
Call setReferences() on custom referring tokenfilters in _analyze (#32157)
Fix BwC Tests looking for UUID Pre 6.4 (#32158)
Improve docs for search preferences (#32159)
use before instead of onOrBefore
Add more contexts to painless execute api (#30511)
Add EC2 credential test for repository-s3 (#31918)
A replica can be promoted and started in one cluster state update (#32042)
Fix Java 11 javadoc compile problem
Fix CP for namingConventions when gradle home has spaces (#31914)
Fix `range` queries on `_type` field for singe type indices (#31756)
[DOCS] Update TLS on Docker for 6.3 (#32114)
ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
Remove versionType from translog (#31945)
Switch distribution to new style Requests (#30595)
Build: Skip jar tests if jar disabled
Painless: Add PainlessClassBuilder (#32141)
Build: Make additional test deps of check (#32015)
Disable C2 from using AVX-512 on JDK 10 (#32138)
Build: Move shadow customizations into common code (#32014)
Painless: Fix Bug with Duplicate PainlessClasses (#32110)
Remove empty @param from Javadoc
Re-disable packaging tests on suse boxes
Docs: Fix missing example script quote (#32010)
[ML] Wait for aliases in multi-node tests (#32086)
[ML] Move analyzer dependencies out of categorization config (#32123)
Ensure to release translog snapshot in primary-replica resync (#32045)
Handle TokenizerFactory TODOs (#32063)
Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
Updates the build to gradle 4.9 (#32087)
Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
Check that client methods match API defined in the REST spec (#31825)
Enable testing in FIPS140 JVM (#31666)
Fix put mappings java API documentation (#31955)
Add exclusion option to `keep_types` token filter (#32012)
[Test] Modify assert statement for ssl handshake (#32072)
* master:
Avoid randomization bug in FeatureAwareTests
Adjust BWC version on client features
Add TRACE, CONNECT, and PATCH http methods (#31035)
Adjust BWC version on client features
[DOCS] Make geoshape docs less memory hungry (#31014)
Fix handling of percent-encoded spaces in Windows batch files (#31034)
[Docs] Fix a typo in Create Index naming limitation (#30891)
Introduce client feature tracking (#31020)
Ensure that index_prefixes settings cannot be changed (#30967)
REST high-level client: add delete ingest pipeline API (#30865)
[ML][TEST] Fix bucket count assertion in all tests in ModelPlotsIT (#31026)
Allow rollup job creation only if cluster is x-pack ready (#30963)
Fix interoperability with < 6.3 transport clients (#30971)
Add an option to split keyword field on whitespace at query time (#30691)
[Tests] Fix alias names in PutIndexTemplateRequestTests (#30960)
REST high-level client: add get ingest pipeline API (#30847)
Cross Cluster Search: preserve remote status code (#30976)
High-level client: list tasks failure to not lose nodeId (#31001)
[DOCS] Fixes links (#31011)
Watcher: Give test a little more time
Reuse expiration date of trial licenses (#30950)
Remove unused query methods from MappedFieldType. (#30987)
Transport client: Don't validate node in handshake (#30737)
[DOCS] Clarify not all PKCS12 usable as truststores (#30750)
HLRest: Allow caller to set per request options (#30490)
Remove version read/write logic in Verify Response (#30879)
[DOCS] Update readme for testing x-pack code snippets (#30696)
Ensure intended key is selected in SamlAuthenticatorTests (#30993)
Core: Remove RequestBuilder from Action (#30966)
* es/master: (24 commits)
Add missing_bucket option in the composite agg (#29465)
Rename index_prefix to index_prefixes (#30932)
Rename methods in PersistentTasksService (#30837)
[DOCS] Fix watcher file location
Update the version checks around range bucket keys, now that the change was backported.
Use dedicated ML APIs in tests (#30941)
[DOCS] Remove reference to platinum Docker image (#30916)
Minor clean-up in InternalRange. (#30886)
stable filemode for zip distributions (#30854)
[DOCS] Adds missing TLS settings for auditing (#30822)
[test] packaging: use shell when running commands (#30852)
Fix location of AbstractHttpServerTransport (#30888)
[test] packaging test logging for suse distros
Moved keyword tokenizer to analysis-common module (#30642)
Upgrade to Lucene-7.4-snapshot-1cbadda4d3 (#30928)
Limit the scope of BouncyCastle dependency (#30358)
[DOCS] Reset edit links (#30909)
Fix IndexTemplateMetaData parsing from xContent (#30917)
Remove log traces in AzureStorageServiceImpl and fix test (#30924)
Deprecate accepting malformed requests in stored script API (#28939)
...
Upgrade to lucene-7.4.0-snapshot-1ed95c097b
This version contains:
* An Analyzer for Korean
* An IntervalQuery and IntervalsSource that retrieve minimum intervals of positional queries.
* A new API to retrieve matches (offsets and positions) of a query for a single document.
* Support for soft deletes in the index writer.
* A fixed shingle filter that handles index time synonyms.
* Support for emoji sequence in ICUTokenizer (with an upgrade to icu 61.1)
This *mostly* silences `javadoc`'s warning about defaulting to
generating html4 files by enabling generating html5 file for the
projects for which that works. It didn't work in a half dozen projects,
about half of which I've fixed in this PR, entirely by replacing
`<tt>thing</tt>` with `{@code thing}`.
There are a few remaining projects that contain javadoc with invalid
html5. I'll fix those projects in a followup.