Commit Graph

98 Commits

Author SHA1 Message Date
Nik Everett a04dcfb95b Introduce XContentParser#namedObject (#22003)
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.

Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.

The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.

The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
2016-12-20 11:05:24 -05:00
Yannick Welsch 63af03a104 Atomic mapping updates across types (#22220)
This commit makes mapping updates atomic when multiple types in an index are updated. Mappings for an index are now applied in a single atomic operation, which also allows to optimize some of the cross-type updates and checks.
2016-12-19 14:39:50 +01:00
Daniel Mitterdorfer 7e5058037b Enable strict duplicate checks for JSON content
With this commit we enable the Jackson feature 'STRICT_DUPLICATE_DETECTION'
by default. This ensures that JSON keys are always unique. While this has
a performance impact, benchmarking has indicated that the typical drop in
indexing throughput is around 1 - 2%.

As a last resort, we allow users to still disable strict duplicate checks
by setting `-Des.json.strict_duplicate_detection=false` which is
intentionally undocumented.

Closes #19614
2016-12-14 09:35:53 +01:00
Nik Everett 49bdd29f91 Consolidate more parser creation into ESTestCase
This will make it easier to add the forthcoming required argument,
`NamedXContentRegistry`.
2016-12-13 20:28:41 -05:00
Luca Cavanna 6d987a9b69 Remove support for empty queries (#22092)
Our query DSL supports empty queries (`{}`), which have a different meaning depending on the query that holds it, either ignored, match_all or match_none. We deprecated the support for empty queries in 5.0, where we log a deprecation warning wherever they are used.

The way we supported it once we moved query parsing to the coordinating node was having an Optional<QueryBuilder> return type in all of our parse methods (called fromXContent). See #17624. The central place for this was QueryParseContext#parseInnerQueryBuilder. We can now remove all the optional return types and simply throw an exception whenever an empty query is found.
2016-12-12 12:37:12 +01:00
Nik Everett 3adefb7b4a Begin centralizing XContentParser creation into RestRequest (#22041)
To get #22003 in cleanly we need to centralize as much `XContentParser` creation as possible into `RestRequest`. That'll mean we have to plumb the `NamedXContentRegistry` into fewer places.

This removes `RestAction.hasBody`, `RestAction.guessBodyContentType`, and `RestActions.getRestContent`, moving callers over to `RestRequest.hasContentOrSourceParam`, `RestRequest.contentOrSourceParam`, and `RestRequest.contentOrSourceParamParser` and `RestRequest.withContentOrSourceParamParserOrNull`. The idea is to use `withContentOrSourceParamParserOrNull` if you need to handle requests without any sort of body content and to use `contentOrSourceParamParser` otherwise.

I believe the vast majority of this PR to be purely mechanical but I know I've made the following behavioral change (I'll add more if I think of more):
* If you make a request to an endpoint that requires a request body and has cut over to the new APIs instead of getting `Failed to derive xcontent` you'll get `Body required`.
* Template parsing is now non-strict by default. This is important because we need to be able to deprecate things without requests failing.
2016-12-09 20:23:02 -05:00
Lee Hinman ef64d230e7 Merge remote-tracking branch 'dakrone/index-seq-id-and-primary-term' 2016-12-08 19:47:21 -07:00
Lee Hinman ee22a477df Add internal _primary_term doc values field, fix _seq_no indexing
This adds the `_primary_term` field internally to the mappings. This field is
populated with the current shard's primary term.

It is intended to be used for collision resolution when two document copies have
the same sequence id, therefore, doc_values for the field are stored but the
filed itself is not indexed.

This also fixes the `_seq_no` field so that doc_values are retrievable (they
were previously stored but irretrievable) and changes the `stats` implementation
to more efficiently use the points API to retrieve the min/max instead of
iterating on each doc_value value. Additionally, even though we intend to be
able to search on the field, it was previously not searchable. This commit makes
it searchable.

There is no user-visible `_primary_term` field. Instead, the fields are
updated by calling:

```java
index.parsedDoc().updateSeqID(seqNum, primaryTerm);
```

This includes example methods in `Versions` and `Engine` for retrieving the
sequence id values from the index (see `Engine.getSequenceID`) that are only
used in unit tests. These will be extended/replaced by actual implementations
once we make use of sequence numbers as a conflict resolution measure.

Relates to #10708
Supercedes #21480

P.S. As a side effect of this commit, `SlowCompositeReaderWrapper` cannot be
used for documents that contain `_seq_no` because it is a Point value and SCRW
cannot wrap documents with points, so the tests have been updated to loop
through the `LeafReaderContext`s now instead.
2016-12-08 19:47:03 -07:00
Christoph Büscher 7454a9647b Add fromXContent to HighlightField
This adds a fromXContent method and unit test to the HighlightField class so we
can parse it as part of a serch response. This is part of the preparation for
parsing search responses on the client side.
2016-12-07 16:32:44 +01:00
Luca Cavanna 5b8bdba12e Remove subrequests method from CompositeIndicesRequest (#21873) 2016-11-30 15:03:58 +01:00
Adrien Grand 6231009a8f Remove 2.x backward compatibility of mappings. (#21670)
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.

I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
2016-11-30 13:34:46 +01:00
Nicholas Knize af1ab68b64 Add RangeFieldMapper for numeric and date range types
Lucene 6.2 added index and query support for numeric ranges. This commit adds a new RangeFieldMapper for indexing numeric (int, long, float, double) and date ranges and creating appropriate range and term queries. The design is similar to NumericFieldMapper in that it uses a RangeType enumerator for implementing the logic specific to each type. The following range types are supported by this field mapper: int_range, float_range, long_range, double_range, date_range.

Lucene does not provide a DocValue field specific to RangeField types so the RangeFieldMapper implements a CustomRangeDocValuesField for handling doc value support.

When executing a Range query over a Range field, the RangeQueryBuilder has been enhanced to accept a new relation parameter for defining the type of query as one of: WITHIN, CONTAINS, INTERSECTS. This provides support for finding all ranges that are related to a specific range in a desired way. As with other spatial queries, DISJOINT can be achieved as a MUST_NOT of an INTERSECTS query.
2016-11-29 10:10:14 -06:00
Ryan Ernst 6940b2b8c7 Remove groovy scripting language (#21607)
* Scripting: Remove groovy scripting language

Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
2016-11-22 19:24:12 -08:00
Boaz Leskes 2c0338fa87 Merge remote-tracking branch 'upstream/master' into feature/seq_no 2016-11-15 17:09:08 +00:00
Adrien Grand df4482fdc8 Do not cache the QueryShardContext in PercolatorFieldMapper: it is cheap to create. 2016-11-15 15:45:18 +01:00
Adrien Grand 54809065a6 Make PercolatorFieldMapper get a QueryShardContext lazily. 2016-11-15 12:02:40 +01:00
Boaz Leskes c9f49039d3 Merge remote-tracking branch 'upstream/master' into feature/seq_no 2016-11-15 10:14:47 +00:00
Ryan Ernst d14c470b89 Remove generics from ActionRequest
closes #21368
2016-11-14 15:32:01 -08:00
Jason Tedor d3417fb022 Merge branch 'master' into feature/seq_no
* master: (516 commits)
  Avoid angering Log4j in TransportNodesActionTests
  Add trace logging when aquiring and releasing operation locks for replication requests
  Fix handler name on message not fully read
  Remove accidental import.
  Improve log message in TransportNodesAction
  Clean up of Script.
  Update Joda Time to version 2.9.5 (#21468)
  Remove unused ClusterService dependency from SearchPhaseController (#21421)
  Remove max_local_storage_nodes from elasticsearch.yml (#21467)
  Wait for all reindex subtasks before rethrottling
  Correcting a typo-Maan to Man-in README.textile (#21466)
  Fix InternalSearchHit#hasSource to return the proper boolean value (#21441)
  Replace all index date-math examples with the URI encoded form
  Fix typos (#21456)
  Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
  Add null check in InternalSearchHit#sourceRef to prevent NPE (#21431)
  Add VirtualBox version check (#21370)
  Export ES_JVM_OPTIONS for SysV init
  Skip reindex rethrottle tests with workers
  Make forbidden APIs be quieter about classpath warnings (#21443)
  ...
2016-11-10 23:40:33 -05:00
Jack Conradson aeb97ff412 Clean up of Script.
Closes #21321
2016-11-10 09:59:13 -08:00
Ryan Ernst 7a2c984bcc Test: Remove multi process support from rest test runner (#21391)
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
2016-11-07 15:07:34 -08:00
Adrien Grand aa6cd93e0f Require arguments for QueryShardContext creation. (#21196)
The `IndexService#newQueryShardContext()` method creates a QueryShardContext on
shard `0`, with a `null` reader and that uses `System.currentTimeMillis()` to
resolve `now`. This may hide bugs, since the shard id is sometimes used for
query parsing (it is used to salt random score generation in `function_score`),
passing a `null` reader disables query rewriting and for some use-cases, it is
simply not ok to rely on the current timestamp (eg. percolation). So this pull
request removes this method and instead requires that all call sites provide
these parameters explicitly.
2016-11-02 09:48:49 +01:00
Jack Conradson 512a77a633 Refactor ScriptType to be a top-level class. 2016-10-26 10:21:22 -07:00
Jim Ferenczi c80a563a71 Replace org.elasticsearch.common.lucene.search.MatchNoDocsQuery with its Lucene version (org.apache.lucene.search.MatchNoDocsQuery) (#20832)
* Replace org.elasticsearch.common.lucene.search.MatchNoDocsQuery with its Lucene version (org.apache.lucene.search.MatchNoDocsQuery)

This change removes the ES version of the match no docs query and replaces it with the Lucene version.

relates #18030

* Add missing change
2016-10-10 17:45:19 +02:00
Simon Willnauer 9c9afe3f01 Remove SearchContext#current and all it's threadlocals (#20778)
Today SearchContext expose the current context as a thread local which makes any kind of sane interface design very very hard. This PR removes the thread local entirely and instead passes the relevant context anywhere needed. This simplifies state management dramatically and will allow for a much leaner SearchContext interface down the road.
2016-10-06 19:51:54 +02:00
Simon Willnauer ce21b607bb move test to a single node test 2016-10-05 21:55:50 +02:00
Simon Willnauer 838c28eeb4 add percolate with script query test 2016-10-05 20:43:46 +02:00
Simon Willnauer 57afbadf33 PercolateQuery is never cacheable 2016-10-05 16:38:47 +02:00
Colin Goodheart-Smithe 7bffe95025 Fix percolator queries to not be cacheable 2016-10-05 15:03:29 +01:00
Jason Tedor 51d53791fe Remove lenient URL parameter parsing
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).

This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.

Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.

Relates #20722
2016-10-04 12:45:29 -04:00
Jason Tedor 25fd9e26c4 Merge branch 'master' into feature/seq_no
* master: (1199 commits)
  [DOCS] Remove non-valid link to mapping migration document
  Revert "Default `include_in_all` for numeric-like types to false"
  test: add a test with ipv6 address
  docs: clearify that both ip4 and ip6 addresses are supported
  Include complex settings in settings requests
  Add production warning for pre-release builds
  Clean up confusing error message on unhandled endpoint
  [TEST] Increase logging level in testDelayShards()
  change health from string to enum (#20661)
  Provide error message when plugin id is missing
  Document that sliced scroll works for reindex
  Make reindex-from-remote ignore unknown fields
  Remove NoopGatewayAllocator in favor of a more realistic mock (#20637)
  Remove Marvel character reference from guide
  Fix documentation for setting Java I/O temp dir
  Update client benchmarks to log4j2
  Changes the API of GatewayAllocator#applyStartedShards and (#20642)
  Removes FailedRerouteAllocation and StartedRerouteAllocation
  IndexRoutingTable.initializeEmpty shouldn't override supplied primary RecoverySource (#20638)
  Smoke tester: Adjust to latest changes (#20611)
  ...
2016-09-29 00:22:31 +02:00
Simon Willnauer fe1803c957 Remove AnalysisService and reduce it to a simple name to analyzer mapping (#20627)
Today we hold on to all possible tokenizers, tokenfilters etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitive. We fixed this in #19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map like class. This
ensures we don't create a gazillion long living objects that are entirely useless since
they are never used in most of the indices. Also those objects might consume a considerable
amount of memory since they might load stopwords or synonyms etc.

Closes #19828
2016-09-23 08:53:50 +02:00
Jim Ferenczi 1764ec56b3 Fixed naming inconsistency for fields/stored_fields in the APIs (#20166)
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.

The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain

The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain

The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.

Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.

Fixes #20155
2016-09-13 20:54:41 +02:00
Lee Hinman 94625d74e4 No longer allow cluster name in data path
In 5.x we allowed this with a deprecation warning. This removes the code
added for that deprecation, requiring the cluster name to not be in the
data path.

Resolves #20391
2016-09-12 15:47:01 -06:00
javanna 90ab460fcc move parsing of search ext sections to the coordinating node 2016-09-09 19:10:42 +02:00
Martijn van Groningen 245882cde3 * Removed `script.default_lang` setting and made `painless` the hardcoded default script language.
** The default script language is now maintained in `Script` class.
* Added `script.legacy.default_lang` setting that controls the default language for scripts that are stored inside documents (for example percolator queries).  This defaults to groovy.
** Added `QueryParseContext#getDefaultScriptLanguage()` that manages the default scripting language. Returns always `painless`, unless loading query/search request in legacy mode then the returns what is configured in `script.legacy.default_lang` setting.
** In the aggregation parsing code added `ParserContext` that also holds the default scripting language like `QueryParseContext`. Most parser don't have access to `QueryParseContext`. This is for scripts in aggregations.
* The `lang` script field is always serialized (toXContent).

Closes #20122
2016-09-06 18:44:48 +02:00
Jason Tedor e166459bbe Merge branch 'master' into log4j2
* master:
  Increase visibility of deprecation logger
  Skip transport client plugin installed on JDK 9
  Explicitly disable Netty key set replacement
  percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
  Make it possible for Ingest Processors to access AnalysisRegistry
  Allow RestClient to send array-based headers
  Silence rest util tests until the bogusness can be simplified
  Remove unknown HttpContext-based test as it fails unpredictably on different JVMs
  Tests: Improve rest suite names and generated test names for docs tests
  Add support for a RestClient base path
2016-08-31 10:59:27 -04:00
Martijn van Groningen 3fcb95b814 percolator: Fail indexing percolator queries containing either a has_child or has_parent query.
Closes #2960
2016-08-31 07:46:17 +02:00
Jason Tedor 7da0cdec42 Introduce Log4j 2
This commit introduces Log4j 2 to the stack.
2016-08-30 13:31:24 -04:00
Jun Ohtani 450f47d5b5 Validate blank field name
add validation and validate only 5.0+
Add tests before 5.0

Closes #19251
2016-08-26 20:10:33 +09:00
Ryan Ernst 743d9fd008 Merge branch 'master' into search_parser 2016-08-16 11:28:59 -07:00
Ryan Ernst 7fde410586 Internal: Consolidate search parser registries
Parsing a search request is currently split up among a number of
classes, using multiple public static methods, which take multiple
regstries of elements that may appear in the search request like query
parsers and aggregations. This change begins consolidating all this code
by collapsing the registries normally used for parsing search requests
into a single SearchRequestParsers class. It is also made available to
plugin services to enable templating of search requests.  Eventually all
of the actual parsing logic should move to the class, and the registries
should be hidden, but for now they are at least co-located to reduce the
number of objects that must be passed around.
2016-08-16 01:59:24 -07:00
Nik Everett 1452ab4b9f Squash the rest of o.e.rest.action
Squashes all the subpackages of `org.elasticsearch.rest.action` down to
the following:
* `o.e.rest.action.admin` - Administrative actions
* `o.e.rest.action.cat` - Actions that make tables for `grep`ing
* `o.e.rest.action.document` - Actions that act on documents
* `o.e.rest.action.ingest` - Actions that act on ingest pipelines
* `o.e.rest.action.search` - Actions that search

I'm tempted to merge `search` into `document` but the `document`
package feels fairly complete as is and `Suggest` isn't actually always
about documents either....

I'm also tempted to merge `ingest` into `admin.cluster` because the
latter contains the actions for dealing with stored scripts.

I've moved the `o.e.rest.action.support` into `o.e.rest.action`.

I've also added `package-info.java`s to all packges in `o.e.rest`. I
figure if the package is too small to deserve a `package-info.java` file
then it is too small to deserve to be a package....

Also fixes checkstyle in all moved classes.
2016-08-15 21:06:32 -04:00
Nik Everett cf6e1a4362 Move all FetchSubPhases to `o.e.search.fetch.subphase`
As the most complicated `FetchSubPhase` highlighting gets its own package
(`o.e.seach.fetch.subphase.highlight`. No other `FetchSubPhase`s get their
own package. Instead they all reside together in `o.e.search.fetch.subphase`.

Add package descriptions to `o.e.search.fetch` and subpackages.
2016-08-12 18:21:15 -04:00
Adrien Grand 0d6ac57acf Collapse o.e.index.mapper packages. #19921
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
2016-08-10 17:51:11 +02:00
javanna 2c44278ce8 [TEST] use ParseField instead of plain strings in query tests 2016-08-10 12:21:25 +02:00
javanna 0a98b5e56e [TEST] make AbstractQueryTestCase#testUnknownObjectException more accurate
testUnknownObjectException used to generate malformed json objects in some cases, due to the existence of arrays as it was not closing the injected object correctly. That is why the test was catching JsonParseException among the exception that are expected to be thrown. That is fixed by tracking where the new object is placed and placing its end object marker to the right level rather than always at the end.

Also introduced a mechanism to explicitly declare objects that won't cause any exception when they get additional objects injected, so that there is no need to override the method anymore as that caused copy pasting of the whole test method. This also makes sure that changes are reflected in tests, as those inner objects are not skipped but we actually check that what is declared is true (no exceptions get thrown when an additional object is added within them.
2016-08-10 11:48:51 +02:00
javanna 2437226802 [TEST] restore tests repeatability in AbstractQueryTestCase
Some random operations were conditionally performed in the before test, which made tests not repeatable. For instance take the seed chain to repeat a specific iteration and try to reproduce it, this conditional code would get executed in both cases when trying to isolate the failure, but not among the different iterations (as only the first method/iteration executes it), hence the failure will not reproduce.

Moved the random operations to beforeClass and left the non random part in the before method, which is needed as it depends on some method that can be overridden by subclasses.
2016-08-05 22:38:31 +02:00
Nik Everett 9270e8b22b Rename client yaml test infrastructure
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
2016-07-26 13:53:44 -04:00
Nik Everett a95d4f4ee7 Add Location header and improve REST testing
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

https://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.

Closes #19079

This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
2016-07-25 17:02:40 -04:00