Commit Graph

3016 Commits

Author SHA1 Message Date
Adrien Grand d84c643f58 Use the new points API to index numeric fields. #17746
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.

Notes about how the change works:
 - Numeric mappers have been split into a legacy version that is essentially
   the current mapper, and a new version that uses points, eg.
   LegacyDateFieldMapper and DateFieldMapper.
 - Since new and old fields have the same names, the decision about which one
   to use is made based on the index creation version.
 - If you try to force using a legacy field on a new index or a field that uses
   points on an old index, you will get an exception.
 - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
   in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
   and sortable).
 - The internal MappedFieldType that is stored by the new mappers does not have
   any of the points-related properties set. Instead, it keeps setting the index
   options when parsing the `index` property of mappings and does
   `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
   when parsing documents.

Known issues that won't fix:
 - You can't use numeric fields in significant terms aggregations anymore since
   this requires document frequencies, which points do not record.
 - Term queries on numeric fields will now return constant scores instead of
   giving better scores to the rare values.

Known issues that we could work around (in follow-up PRs, this one is too large
already):
 - Range queries on `ip` addresses only work if both the lower and upper bounds
   are inclusive (exclusive bounds are not exposed in Lucene). We could either
   decide to implement it, or drop range support entirely and tell users to
   query subnets using the CIDR notation instead.
 - Since IP addresses now use a different representation for doc values,
   aggregations will fail when running a terms aggregation on an ip field on a
   list of indices that contains both pre-5.0 and 5.0 indices.
 - The ip range aggregation does not work on the new ip field. We need to either
   implement range aggs for SORTED_SET doc values or drop support for ip ranges
   and tell users to use filters instead. #17700

Closes #16751
Closes #17007
Closes #11513
2016-04-14 17:56:23 +02:00
Christoph Büscher e15e7f7e6e Remove parser argument from methods where we already pass in a parse context
When we pass down both parser and QueryParseContext to a method, we cannot
make sure that the parser contained in the context and the parser that is
parsed as an argument have the same state. This removes the parser argument
from methods where we currently have both the parser and the parse context
as arguments and instead retrieves the parse from the context inside the
method.
2016-04-14 16:18:05 +02:00
Martijn van Groningen 2928fd6ef3 Cleanup query builder for inner hits construction.
* Inner hits can now only be provided and prepared via setter in the nested, has_child and has_parent query.
* Also made `score_mode` a required constructor parameter.
* Moved has_child's min_child/max_children validation from doToQuery(...) to a setter.
2016-04-14 14:43:21 +02:00
Nik Everett cca3154c43 Rename isSourceEmpty to hasSource
And add a test case for {} to reindex.
2016-04-13 08:19:58 -04:00
Nik Everett c2e745bf3b reindex: Guard against user disabling fields 2016-04-13 08:19:58 -04:00
Nik Everett 0f9804b0e2 reindex: gracefully handle when _source is disabled
Closes #17666
2016-04-13 08:19:58 -04:00
Adrien Grand 013acf9179 Remove MappedFieldType.value. #17557
This commit removes `MappedFieldType.value` and simplifies
`MappedFieldType.valueforSearch`. `valueforSearch` was used to post-process
values that come for stored fields (eg. to convert a long back to a string
representation of a date in the case of a date field) and also values that
are extracted from the source but only in the case of GET calls: it would
not be called when performing source filtering on search requests.

`valueforSearch` is now only called for stored fields, since values that are
extracted from the source should already be formatted as expected.
2016-04-12 09:12:56 +02:00
Adrien Grand a14db8e17e Remove MappedFieldType.useTermQueryWithQueryString() and isNumeric(). #17599
In both cases, what elasticsearch is really interested in is whether the field
is an analyzed string field. So it can just check `tokenized()` instead.
2016-04-12 08:45:28 +02:00
Adrien Grand 496c7fbd84 Upgrade Lucene 6 Release
* upgrades numerics to new Point format
* updates geo api changes
  * adds GeoPointDistanceRangeQuery as XGeoPointDistanceRangeQuery
  * cuts over to ES GeoHashUtils
2016-04-11 16:50:04 -05:00
Adrien Grand 0eb1a816c8 Allow the query cache to be disabled. #16268
This replaces the internal `index.queries.cache.type` setting with
a new `index.queries.cache.enabled` setting, which is documented.

Closes #15802
2016-04-11 18:06:16 +02:00
Adrien Grand 42526ac28e Remove Settings.settingsBuilder.
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
2016-04-08 18:10:02 +02:00
Adrien Grand bef38a4d12 Fix test bug. 2016-04-07 09:49:27 +02:00
Adrien Grand e1bfe23c22 ExtendedStatsAggregator should also pass sigma to emtpy aggs. #17388
Because sigma is also used at reduce time, it should be passed to empty aggs.
Otherwise it causes bugs when an empty aggregation is used to perform reduction
is it would assume a sigma of zero.

Closes #17362
2016-04-07 09:34:11 +02:00
Nik Everett 16c12afabe Rework ScoreFunctionBuilder registration to remove PROTOTYPEs
This removes PROTOTYPEs from ScoreFunctionsBuilders. To do so we rework
registration so it doesn't need PROTOTYPEs and lines up with the recent
changes to query registration.
2016-04-06 13:04:11 -04:00
Nik Everett 2b6866d26b Fix references to the removed parsers
Mostly stuff is just in the builder now.
2016-04-06 11:15:22 -04:00
Adrien Grand 4c4bbb3e45 Replace FieldStatsProvider with a method on MappedFieldType. #17334
FieldStatsProvider had to perform instanceof calls to properly handle dates or
ip addresses. By moving the logic to MappedFieldType, each field type can check
whether all values are within bounds its way.

Note that this commit only keeps rewriting support for dates, which are the only
field for which the rewriting mechanism is likely to help (because of time-based
indices).
2016-04-01 10:28:58 +02:00
Nik Everett 14d37baa4b [reindex] Don't get rejected
BulkByScrollTaskTest#testDelayAndRethrottle was getting rejected exceptions
every once in a while. This was reproducible ~20% of the time for me. I
added a CyclicBarrier to prevent the test from shutting down the thread pool
before the threads get finished.
2016-03-31 14:50:14 -04:00
Nik Everett 0c762fca35 Fix test mistake 2016-03-31 12:27:35 -04:00
Nik Everett 7f794e7b77 Test for invalid scroll_size 2016-03-31 12:21:32 -04:00
Nik Everett 30a1862339 Remove PROTOTYPE from BulkItemResponse.Failure
Closes #17086
2016-03-31 09:10:36 -04:00
javanna 32b6e529f4 Merge branch 'master' into enhancement/discovery_node_one_getter 2016-03-31 10:49:26 +02:00
Jack Conradson a37e53c50f Painless clean up including fixing _score issues and improving type
error messages.

Closes #17428
2016-03-30 16:40:17 -07:00
Nik Everett 78ab6c5b7f [reindex] Dynamic throttle!
This allows the user to update the reindex throttle on the fly, with changes
that speed up the throttling being applied immediately and changes that
slow down the throttling being applied during the next batch. This means
that if a user throttles reindex in such a way that it tries to sleep for
16 years and then realizes that they've done something wrong then they
can change the throttle and reindex will wake up again. We don't apply
slow downs immediately so we never get in danger of losing the scan context.

Also, if reindex is canceled while it is sleeping (how it honor throttling)
then it'll immediately wake up and cancel itself.
2016-03-30 16:40:42 -04:00
javanna b9f9b2e3ee Merge branch 'master' into enhancement/discovery_node_one_getter 2016-03-30 17:22:40 +02:00
Nik Everett 1c16d63a9a Merge pull request #17394 from camilojd/refactor/replace-getrandom
Refactor: replace all ocurrences of ESTestCase.getRandom() with LuceneTestCase.random()
2016-03-30 08:58:21 -04:00
javanna a8bbdff3bc Remove DiscoveryNode#name in favour of existing DiscoveryNode#getName 2016-03-30 14:47:36 +02:00
Adrien Grand 068c788ec8 Disable fielddata on text fields by defaults. #17386
`text` fields will have fielddata disabled by default. Fielddata can still be
enabled on an existing index by setting `fielddata=true` in the mappings.
2016-03-30 14:35:32 +02:00
Camilo Diaz Repka 7be11a36cd Refactor: replace all ocurrences of ESTestCase.getRandom() for random().
Remove getRandom().
2016-03-29 23:18:05 -04:00
Nik Everett df08854c60 Remove PROTOTYPEs from suggesters
Also stops using guice for suggesters at all and lots of checkstyle.
2016-03-29 17:55:01 -04:00
javanna 061f09d9a4 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 20:19:33 +02:00
Tal Levy 16e888fac3 Merge pull request #17260 from talevy/fix-regex-exceptions
Handle regex parsing errors in Gsub and Grok Processors
2016-03-29 08:12:26 -07:00
Clinton Gormley 3087d2b882 Fixed bad YAML in reindex REST test: 50_routing.yaml 2016-03-29 15:03:09 +02:00
Clinton Gormley 52daed0732 Update-by-query rest tests: fixed bad yaml and deleted a client-dependent test 2016-03-29 14:58:29 +02:00
Colin Goodheart-Smithe ff3fd99074 Prevents exception being raised when ordering by an aggregation which wasn't collected
If a terms aggregation was ordered by a metric nested in a single bucket aggregator which did not collect any documents (e.g. a filters aggregation which did not match in that term bucket) an ArrayOutOfBoundsException would be thrown when the ordering code tried to retrieve the value for the metric. This fix fixes all numeric metric aggregators so they return their default value when a bucket ordinal is requested which was not collected.

Closes #17225
2016-03-29 13:28:03 +01:00
javanna 8fc9dbbb99 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 14:27:04 +02:00
Clinton Gormley 5f24581de3 The reindex body is now required, which changes the exception thrown by the REST test 2016-03-29 14:09:59 +02:00
Clinton Gormley b87beeb05f Rename update-by-query REST tests to update_by_query 2016-03-29 13:13:49 +02:00
Clinton Gormley 97606850e8 Renamed update-by-query REST spec to update_by_query 2016-03-29 11:45:20 +02:00
javanna de5cbda8e7 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-29 10:48:47 +02:00
Nik Everett 0e6141e675 Replace is_true: took with took >= 0
This prevents tests from failing on machines that can finish the request
less than half a millisecond.
2016-03-28 13:03:48 -04:00
javanna 27d4994aff Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-24 18:10:11 +01:00
Nik Everett 93ab4cfc99 Stop using PROTOTYPE in NamedWriteableRegistry
readFrom is confusing because it requires an instance of the type that it
is reading but it doesn't modify it. But we also have (deprecated) methods
named readFrom that *do* modify the instance. The "right" way to implement
the non-modifying readFrom is to delegate to a constructor that takes a
StreamInput so that the read object can be immutable. Now that we have
`@FunctionalInterface`s it is fairly easy to register things by referring
directly to the constructor.

This change modifying NamedWriteableRegistry so that it does that. It keeps
supporting `registerPrototype` which registers objects to be read by
readFrom but deprecates it and delegates it to a new `register` method
that allows passing a simple functional interface. It also cuts Task.Status
subclasses over to using that method.

The start of #17085
2016-03-24 11:26:44 -04:00
Nik Everett 48aaebf23d [reindex] Wait for headers
The test was checking that we'd set the headers properly but in some cases
the request had yet to come in because it was running on another thread.
Now we wait for the headers to show up before failing the test.

Closes #17299
2016-03-24 09:55:49 -04:00
Jim Ferenczi da42f199bd Enforce isolated mode for all plugins
This commit removes the isolated option, each plugin have its own classloader.
2016-03-24 09:17:33 +01:00
Nik Everett aaa4d57fff [reindex] Don't attempt to refresh on noop
If the user asks for a refresh but their reindex or update-by-query
operation touched no indexes we should just skip the resfresh call
entirely. Without this commit we refresh *all* indexes which is totally
wrong.

Closes #17296
2016-03-23 18:12:40 -04:00
Areek Zillur ed49ec437f remove suggest transport action 2016-03-23 16:37:56 -04:00
javanna 030453d320 Merge branch 'master' into enhancement/remove_node_client_setting 2016-03-23 11:25:34 +01:00
Adrien Grand e50eeeaffb Refactor fielddata mappings. #17148
The fielddata settings in mappings have been refatored so that:
 - text and string have a `fielddata` (boolean) setting that tells whether it
   is ok to load in-memory fielddata. It is true by default for now but the
   plan is to make it default to false for text fields.
 - text and string have a `fielddata_frequency_filter` which contains the same
   thing as `fielddata.filter.frequency` used to (but validated at parsing time
   instead of being unchecked settings)
 - regex fielddata filtering is not supported anymore and will be dropped from
   mappings automatically on upgrade.
 - text, string and _parent fields have an `eager_global_ordinals` (boolean)
   setting that tells whether to load global ordinals eagerly on refresh.
 - in-memory fielddata is not supported on keyword fields anymore at all.
 - the `fielddata` setting is not supported on other fields that text and string
   and will be dropped when upgrading if specified.
2016-03-23 09:48:13 +01:00
Tal Levy 534caa8927 Handle regex parsing errors in Gsub and Grok Processors
Currently, both Gsub and Grok parse regex strings during
Pipeline creation. Thrown parsing exceptions were leaking out, this
commit wraps those exceptions in ElasticsearchParseExceptions.
2016-03-22 15:06:29 -07:00
Nik Everett da96b6e41d [reindex] Add thottling support
The throttle is applied when starting the next scroll request so that its
timeout can include the throttle time.
2016-03-22 12:34:14 -04:00