Commit Graph

60 Commits

Author SHA1 Message Date
Luca Cavanna f4fb4d3bf5
Add support for filtering mappings fields (#27603)
Add support for filtering fields returned as part of mappings in get index, get mappings, get field mappings and field capabilities API.

Plugins can plug in their own function, which receives the index as argument, and return a predicate which controls whether each field is included or not in the returned output.
2017-12-05 20:31:29 +01:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Simon Willnauer 82fa531ab4 Remove `_index` fielddata hack if cluster alias is present (#26082)
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
2017-08-08 09:24:24 +02:00
Simon Willnauer 634ce90dc0 Respect cluster alias in `_index` aggs and queries (#25885)
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.

Closes #25606
2017-07-26 09:16:52 +02:00
Ryan Ernst 2a65bed243 Tests: Change rest test extension from .yaml to .yml (#24659)
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
2017-05-16 17:24:35 -07:00
Ryan Ernst 212f24aa27 Tests: Clean up rest test file handling (#21392)
This change simplifies how the rest test runner finds test files and
removes all leniency.  Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.

closes #20240
2017-04-18 15:07:08 -07:00
Colin Goodheart-Smithe 0114f0061c Removes version 2.x constants from Version (#24011)
* Removes version 2.x constants from Version

Closes #21887

* Addresses review comments
2017-04-11 08:31:22 +01:00
AdityaJNair 63757efe9c Remove DocumentMapper#parse(String index, String type, String id, BytesReference source) (#23706)
Removed `parse(String index, String type, String id, BytesReference source)` in DocumentMapper.java and replaced all of its use in Test files with `parse(SourceToParse source)`.

`parse(String index, String type, String id, BytesReference source)` was only used in test files and never in the main code so it was removed. All of the test files that used it was then modified to use `parse(SourceToParse source)` method that existing in DocumentMapper.java
2017-03-23 11:01:09 -04:00
Nik Everett f5f2149ff2 Remove much ceremony from parsing client yaml test suites (#22311)
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.

I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
2016-12-22 11:00:34 -05:00
Nik Everett a04dcfb95b Introduce XContentParser#namedObject (#22003)
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.

Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.

The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.

The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
2016-12-20 11:05:24 -05:00
Boaz Leskes fe01c0f83b fix TemplateQueryBuilderTests & Murmur3FieldMapperTests 2016-12-01 14:21:57 +01:00
Adrien Grand 6231009a8f Remove 2.x backward compatibility of mappings. (#21670)
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.

I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
2016-11-30 13:34:46 +01:00
Nicholas Knize af1ab68b64 Add RangeFieldMapper for numeric and date range types
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.

Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.

When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
2016-11-29 10:10:14 -06:00
Ryan Ernst 7a2c984bcc Test: Remove multi process support from rest test runner (#21391)
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
2016-11-07 15:07:34 -08:00
Adrien Grand aa6cd93e0f Require arguments for QueryShardContext creation. (#21196)
The `IndexService#newQueryShardContext()` method creates a QueryShardContext on
shard `0`, with a `null` reader and that uses `System.currentTimeMillis()` to
resolve `now`. This may hide bugs, since the shard id is sometimes used for
query parsing (it is used to salt random score generation in `function_score`),
passing a `null` reader disables query rewriting and for some use-cases, it is
simply not ok to rely on the current timestamp (eg. percolation). So this pull
request removes this method and instead requires that all call sites provide
these parameters explicitly.
2016-11-02 09:48:49 +01:00
Simon Willnauer fe1803c957 Remove AnalysisService and reduce it to a simple name to analyzer mapping (#20627)
Today we hold on to all possible tokenizers, tokenfilters etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitive. We fixed this in #19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map like class. This
ensures we don't create a gazillion long living objects that are entirely useless since
they are never used in most of the indices. Also those objects might consume a considerable
amount of memory since they might load stopwords or synonyms etc.

Closes #19828
2016-09-23 08:53:50 +02:00
Jun Ohtani 450f47d5b5 Validate blank field name
add validation and validate only 5.0+
Add tests before 5.0

Closes #19251
2016-08-26 20:10:33 +09:00
Adrien Grand 3ed0da5a58 GET operations should not extract fields from `_source`. #20158
This makes GET operations more consistent with `_search` operations which expect
`(stored_)fields` to work on stored fields and source filtering to work on the
`_source` field. This is now possible thanks to the fact that GET operations
do not read from the translog anymore (#20102) and also allows to get rid of
`FieldMapper#isGenerated`.

The `_termvectors` API (and thus more_like_this too) was relying on the fact
that GET operations would extract fields from either stored fields or the source
so the logic to do this that used to exist in `ShardGetService` has been moved
to `TermVectorsService`. It would be nice that term vectors do not rely on this,
but this does not seem to be a low hanging fruit.
2016-08-26 10:35:23 +02:00
Adrien Grand 0d6ac57acf Collapse o.e.index.mapper packages. #19921
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
2016-08-10 17:51:11 +02:00
Nik Everett 9270e8b22b Rename client yaml test infrastructure
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
2016-07-26 13:53:44 -04:00
Nik Everett a95d4f4ee7 Add Location header and improve REST testing
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.

Closes #19079

This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
2016-07-25 17:02:40 -04:00
Ryan Ernst e817b5daa3 Plugins: Remove guice from Mapper plugins
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
2016-06-21 22:50:39 -07:00
Ryan Ernst a4503c2aed Plugins: Remove name() and description() from api
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
2016-06-15 17:12:22 -07:00
Adrien Grand 864ed04059 Lessen leniency of the query dsl. #18276
This change does the following:
 - Queries that are currently unsupported such as prefix queries on numeric
   fields or term queries on geo fields now throw an error rather than returning
   a query that does not match anything.
 - Fuzzy queries on numeric, date and ip fields are now unsupported: they used
   to create range queries, we now expect users to use range queries directly.
   Fuzzy, regexp and prefix queries are now only supported on text/keyword
   fields (including `_all`).
 - The `_uid` and `_id` fields do not support prefix or range queries anymore as
   it would prevent us to store them more efficiently in the future, eg. by
   using a binary encoding.

Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
2016-05-16 17:37:00 +02:00
Alexander Reelsen f71eb0b888 Version: Set version to 5.0.0-alpha2 2016-04-26 09:30:26 +02:00
Adrien Grand d84c643f58 Use the new points API to index numeric fields. #17746
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.

Notes about how the change works:
 - Numeric mappers have been split into a legacy version that is essentially
   the current mapper, and a new version that uses points, eg.
   LegacyDateFieldMapper and DateFieldMapper.
 - Since new and old fields have the same names, the decision about which one
   to use is made based on the index creation version.
 - If you try to force using a legacy field on a new index or a field that uses
   points on an old index, you will get an exception.
 - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
   in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
   and sortable).
 - The internal MappedFieldType that is stored by the new mappers does not have
   any of the points-related properties set. Instead, it keeps setting the index
   options when parsing the `index` property of mappings and does
   `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
   when parsing documents.

Known issues that won't fix:
 - You can't use numeric fields in significant terms aggregations anymore since
   this requires document frequencies, which points do not record.
 - Term queries on numeric fields will now return constant scores instead of
   giving better scores to the rare values.

Known issues that we could work around (in follow-up PRs, this one is too large
already):
 - Range queries on `ip` addresses only work if both the lower and upper bounds
   are inclusive (exclusive bounds are not exposed in Lucene). We could either
   decide to implement it, or drop range support entirely and tell users to
   query subnets using the CIDR notation instead.
 - Since IP addresses now use a different representation for doc values,
   aggregations will fail when running a terms aggregation on an ip field on a
   list of indices that contains both pre-5.0 and 5.0 indices.
 - The ip range aggregation does not work on the new ip field. We need to either
   implement range aggs for SORTED_SET doc values or drop support for ip ranges
   and tell users to use filters instead. #17700

Closes #16751
Closes #17007
Closes #11513
2016-04-14 17:56:23 +02:00
Adrien Grand 3bf6f4076c Do not set analyzers on numeric fields.
When it comes to query parsing, either a field is tokenized and it would go
through analysis with its search_analyzer. Or it is not tokenized and the
raw string should be passed to termQuery(). Since numeric fields are not
tokenized and also declare a search analyzer, values would currently go through
analysis twice...
2016-04-12 17:47:29 +02:00
Areek Zillur c3078f4d65 adapt tests to use index uuid as folder name 2016-03-14 23:24:24 -04:00
Simon Willnauer fdfb0e56f6 Remove bw compat from size mapper 2016-03-07 12:48:02 +01:00
Simon Willnauer f96900013c Remove bw compat from murmur3 mapper 2016-03-07 12:44:53 +01:00
Adrien Grand eef19be072 Deprecate string in favor of text/keyword. #16877
This commit removes the ability to use string fields on indices created on or
after 5.0. Dynamic mappings now generate text fields by default for strings
but there are plans to also add a sub keyword field (in a future PR).

Most of the changes in this commit are just about replacing string with
keyword or text. Some tests have been removed because they existed because of
corner cases of string mappings like setting ignore-above on a text field or
enabling term vectors on a keyword field which are now impossible.

The plan is to remove strings entirely in 6.0.
2016-03-03 10:20:56 +01:00
Simon Willnauer 96bcb47fc9 Detach QueryShardContext from IndexShard and remove obsolete threadlocals
IndexShard currently holds an arbitraritly used `getQueryShardContext` that comes
out of a ThreadLocal. It's usage is undefined and arbitraty since there is also
such a method with different semantics on `IndexService` This commit removes the threadLocal on
IndexShard as well as on the context itself. It's types are now a member and the QueryShardContext
lifecycle is managed byt SearchContext which passes the types on from the SearchRequest.
2016-02-08 15:05:03 +01:00
Daniel Mitterdorfer e9bb3d31a3 Convert "path.*" and "pidfile" to new settings infra 2016-01-22 15:14:13 +01:00
Simon Willnauer fbfa9f4925 Merge branch 'master' into new_index_settings 2016-01-19 10:13:48 +01:00
Ryan Ernst ef4f0a8699 Test: Make rest test framework accept http directly for the test cluster
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.

This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.

closes #15459
2016-01-18 16:44:14 -08:00
Simon Willnauer a8eedd0457 fix all kinds of crazy test failures 2016-01-18 09:23:34 +01:00
Simon Willnauer ceef6d7a42 fix Murmur3FieldMapperTests 2016-01-18 09:23:34 +01:00
Nik Everett 0786c506dc Remove a few more Xlint skips 2016-01-06 23:28:13 -05:00
Martijn van Groningen 2d6adf6428 Percolator refactoring:
* Added percolator field mapper that extracts the query terms and indexes these terms with the percolator query.
* At percolate time these extracted terms are used to query percolator queries that are like to be evaluated. This can significantly cut down the time it takes to percolate. Whereas before all percolator queries were evaluated if they matches with the document being percolated.
* Changes made to percolator queries are no longer immediately visible, a refresh needs to happen before the changes are visible.
* By default the percolate api only returns upto 10 matches instead of returning all matching percolator queries.
* Made percolate more modular, so that it is easier to add unit tests.
* Added unit tests for the percolator.

Closes #12664
Closes #13646
2016-01-06 16:08:10 +01:00
Adrien Grand ac393b7a31 Make mappings tests more realistic.
DocumentMapperParser has both parse and parseCompressed methods. Except that the
parse methods are ONLY used from the unit tests. This commit removes the parse
method and moves all tests to parseCompressed so that they test more
realistically how mappings are managed.

Then I renamed parseCompressed to parse given that this is the only alternative
anyway.
2015-12-21 10:44:00 +01:00
Ryan Ernst 4ea19995cf Remove wildcard imports 2015-12-18 12:43:47 -08:00
Adrien Grand 50eeafa75c Make mappings immutable.
Today mappings are mutable because of two APIs:
 - Mapper.merge, which expects changes to be performed in-place
 - IncludeInAll, which allows to change whether values should be put in the
   `_all` field in place.

This commit changes both APIs to return a modified copy instead of modifying in
place so that mappings can be immutable. For now, only the type-level object is
immutable, but in the future we can imagine making them immutable at the
index-level so that mapping updates could be completely atomic at the index
level.

Close #9365
2015-12-15 10:20:28 +01:00
David Pilato d23d8a891f Remove "empty" licenses dir
Follow up #15168

We don't need to have "fake" licenses dir anymore.
2015-12-02 10:22:52 +01:00
Simon Willnauer 65b661b1f4 [TEST] Fix MapperUpgrade tests to use a dedicated master to ensure dangeling index import works predictably
When importing dangling indices on a single node that is data and master eligable the async dangling index
call can still be in-flight when the cluster is checked for green / yellow. Adding a dedicated master node
and a data only node that does the importing fixes this issus just like we do in OldIndexBackwardsCompatibilityIT
2015-11-27 10:32:21 +01:00
Adrien Grand e8520bf519 Tests: For single data path for *FieldMapperUpgradeTests. 2015-11-25 11:46:19 +01:00
Adrien Grand aad84395c9 Add a test that upgrades succeed even if a mapping contains fields that come from a plugin. 2015-11-24 19:14:19 +01:00
Adrien Grand 5f33fbdb75 Register field mappers at the node level.
This moves the registration of field mappers from the index level to the node
level and also ensures that mappers coming from plugins are treated no
differently from core mappers.
2015-11-24 08:59:37 +01:00
Simon Willnauer 94bed42213 Simplify plugin API and fix IndexService internal allocation 2015-11-05 13:16:35 +01:00
Simon Willnauer 487af301ea Remove guice from the index level
This commit removes guice from the index level and adds a simple extension point
to add class instances with an index-lifecycle scope.
2015-11-05 11:18:11 +01:00
Ryan Ernst 542522531a Build: Remove maven pom files and supporting ant files
This change removes the leftover pom files. A couple files were left for
reference, namely in qa tests that have not yet been migrated (vagrant
and multinode). The deb and rpm assemblies also still exist for
reference when finishing their setup in gradle.

See #13930
2015-10-29 23:53:49 -07:00