Commit Graph

7622 Commits

Author SHA1 Message Date
Jim Ferenczi 63bdd01eb7 Expose WordDelimiterGraphTokenFilter (#23327)
This change exposes the new Lucene graph based word delimiter token filter in the analysis filters.
Unlike `word_delimiter`, this token filter, named `word_delimiter_graph`, correctly handles multi-term expansion at query time.

Closes #23104
2017-02-24 00:53:38 +01:00
Shai Erera eeac6d27f2 Add BreakIteratorBoundaryScanner support for FVH (#23248)
This commit adds a boundary_scanner property to the search highlight
request so the user can specify different boundary scanners:

* `chars` (default, current behavior)
* `word` Use a WordBreakIterator
* `sentence` Use a SentenceBreakIterator

This commit also adds "boundary_scanner_locale" to define which locale
should be used when scanning the text.
2017-02-23 23:32:22 +01:00
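
To illustrate the `word` and `sentence` options described in eeac6d27f2, here is a minimal sketch using the JDK's `java.text.BreakIterator` with an explicit locale. The class name `BoundaryScannerSketch` and the helper `segments` are hypothetical and only show how the two iterator types cut the same text; this is not the highlighter's own code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Elasticsearch or Lucene.
final class BoundaryScannerSketch {

    // Splits the text at every boundary reported by the given iterator.
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "First sentence. Second sentence.";
        // The locale plays the role that "boundary_scanner_locale" plays in the request.
        Locale locale = Locale.ENGLISH;
        System.out.println(segments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println(segments(BreakIterator.getWordInstance(locale), text));
    }
}
```

The sentence iterator yields far fewer, larger segments than the word iterator, which is why the choice of scanner changes where a highlighted fragment may begin and end.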
Ali Beyad 25a9a7ee3a Prioritize listing index-N blobs over index.latest in reading snapshots (#23333)
There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitly.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).

Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.

For non-read-only repositories, this could produce race conditions with
the file system.  In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation.  On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.

This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value.  If that fails because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read.  This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.
2017-02-23 15:44:12 -05:00
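
The lookup order described in 25a9a7ee3a can be sketched as follows. This is an illustration under stated assumptions, not the repository code: the class `LatestIndexBlobSketch`, its methods, the use of `UnsupportedOperationException` to model a repository that cannot list blobs, and the `-1` result for "no index blob yet" are all hypothetical.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of "list index-N first, fall back to index.latest".
final class LatestIndexBlobSketch {
    static final String PREFIX = "index-";

    // Returns the highest N among blob names of the form index-N, if any.
    static OptionalLong latestFromListing(List<String> blobNames) {
        return blobNames.stream()
            .filter(name -> name.startsWith(PREFIX))
            .map(name -> name.substring(PREFIX.length()))
            .filter(suffix -> !suffix.isEmpty() && suffix.chars().allMatch(Character::isDigit))
            .mapToLong(Long::parseLong)
            .max();
    }

    // Prefers the listing; reads index.latest only when listing is unsupported.
    static long latestGeneration(Supplier<List<String>> listBlobs, LongSupplier readIndexLatest) {
        try {
            // -1 is an assumed marker for "repository has no index-N blob yet".
            return latestFromListing(listBlobs.get()).orElse(-1L);
        } catch (UnsupportedOperationException e) {
            // Assumed signal for a read-only repository that cannot list blobs.
            return readIndexLatest.getAsLong();
        }
    }
}
```

Because the index-N blobs are never re-written, the listing path never touches a blob that another thread may be deleting, which is the race the commit message describes.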
sabi0 09b3c7f270 Do not create String instances in 'Strings' methods accepting StringBuilder (#22907) 2017-02-23 10:57:34 -08:00
Christoph Büscher 8b1b152e91 Remove abstract InternalMetricsAggregation class (#23326)
This class doesn't seem to do much other than to group together
certain types of aggregations.
2017-02-23 18:03:40 +01:00
Simon Willnauer 2f3f9b9961 Remove unnecessary result sorting in SearchPhaseController (#23321)
In order to use lucene's utilities to merge top docs the results
need to be passed in a dense array where the index corresponds to the shard index in
the result list. Yet, we were sorting results before merging them just to order them
in the incoming order again for the above mentioned reason. This change removes the
obsolete sort and prevents unnecessary materializing of results.
2017-02-23 13:48:54 +01:00
Simon Willnauer 771fd1f4ea Fix SamplerAggregatorTests to have stable and predictable docIds
Closes #23315
2017-02-23 08:08:38 +01:00
Ryan Ernst 18f57c05cf Script: Fix value of `ctx._now` to be current epoch time in milliseconds (#23175)
In update scripts, `ctx._now` uses the same milliseconds value used by the
rest of the system to calculate deltas. However, that time is not
actually epoch milliseconds, as it is derived from `System.nanoTime()`.
This change reworks the estimated time thread in ThreadPool, on which this
time is based, to make available both the relative time and the
absolute milliseconds (epoch), which may be used with a calendar system. It
also renames the EstimatedTimeThread to a more apt CachedTimeThread.

closes #23169
2017-02-22 15:11:02 -08:00
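
The distinction drawn in 18f57c05cf between relative and absolute time can be sketched with the two JDK clocks. The class `CachedTimeSketch` and its method names are hypothetical and are not the ThreadPool code; the point is only that a value derived from `System.nanoTime()` is suitable for deltas, while `System.currentTimeMillis()` gives epoch milliseconds.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical cache of both clock readings, refreshed by some periodic task.
final class CachedTimeSketch {
    private volatile long relativeMillis;
    private volatile long absoluteMillis;

    CachedTimeSketch() {
        refresh();
    }

    // Re-reads both clocks; a background thread would call this periodically.
    void refresh() {
        // Monotonic, arbitrary origin: good for elapsed time, meaningless as a date.
        relativeMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
        // Wall clock: milliseconds since 1970-01-01T00:00:00Z.
        absoluteMillis = System.currentTimeMillis();
    }

    long relativeTimeInMillis() {
        return relativeMillis;
    }

    long absoluteTimeInMillis() {
        return absoluteMillis;
    }
}
```

A script that interprets the value as a date needs the absolute reading; code that measures how long something took should subtract two relative readings.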
Lee Hinman 77d641216a Handle long overflow when adding paths' totals
In #23093, we fixed the issue where a filesystem can be so large that it
overflows and returns a negative number. However, there is another issue: when
a path is added as a sub-path to another `FsInfo.Path` object, adding the
totals can still overflow.

This adds the same safety to return `Long.MAX_VALUE` instead of the negative
number, as well as a test exercising the logic.
2017-02-22 13:04:34 -07:00
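
The safety check described in 77d641216a amounts to a saturating addition. A minimal sketch, assuming both operands are non-negative byte counts; the class and method names are hypothetical and not taken from `FsInfo`:

```java
// Hypothetical helper illustrating saturating addition of two totals.
final class SaturatingAddSketch {

    // Adds two non-negative totals, returning Long.MAX_VALUE instead of a
    // wrapped-around negative number when the sum overflows.
    static long addSaturating(long current, long other) {
        long sum = current + other;
        // For non-negative inputs, a negative sum can only mean overflow.
        return sum < 0 ? Long.MAX_VALUE : sum;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(1L, 2L));
        System.out.println(addSaturating(Long.MAX_VALUE, 1L) == Long.MAX_VALUE);
    }
}
```

An alternative with the same effect is to call `Math.addExact` and catch the `ArithmeticException`, which also covers inputs that are not known to be non-negative.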
Yannick Welsch 0f88f21535 Don't set local node on cluster state used for node join validation (#23311)
When a node wants to join a cluster, it sends a join request to the master. The master then sends a join validation request to the node. This checks that the node can deserialize the current cluster state that exists on the master and that it can thus handle all the indices that are currently in the cluster (see #21830).

The current code can trip an assertion as it does not take the cluster state as is but sets itself as the local node on the cluster state. This can result in an inconsistent DiscoveryNodes object as the local node is not yet part of the cluster state and a node with the same id but a different address can still exist in the cluster state. Also, another node with the same address but a different id can exist in the cluster state if multiple nodes are run on the same machine and ports have been swapped after node crashes/restarts.
2017-02-22 20:27:27 +01:00
Lee Hinman 6f1ed8a3d1 [TEST] Add additional logging to IndicesStoreIntegrationIT.testIndexCleanup 2017-02-22 10:11:05 -07:00
Luca Cavanna 495b24655b Update indices settings api to support CBOR and SMILE format (#23309)
Also expand testing on the different ways to provide index settings and remove dead code around ability to provide settings as query string parameters

Closes #23242
2017-02-22 17:51:10 +01:00
javanna f2acf466aa Convert script/template objects to json format
Elasticsearch accepts multiple content-type formats, hence scripts can be stored/provided in json, yaml, cbor or smile. Yet the format that should be used internally is json. This is a problem mainly around search templates, as they only support json out of the four content-types, so instead of maintaining the content-type of the request we should rather convert the scripts/templates to json.

Binary formats were not previously supported. If you stored a template in yaml format, you'd get back an error "No encoder found for MIME type [application/yaml]" when trying to execute it. With this commit the request content-type is independent from the template, which always gets converted to json internally. That is transparent to users and doesn't affect the content type of the response obtained when executing the template.
2017-02-22 16:20:53 +01:00
Simon Willnauer 5c1924ad19 Remove BWC layer for number of reduce phases (#23303)
Both PRs below have been backported to 5.4 such that we can enable
BWC tests of this feature as well as remove version dependent serialization
for search request / responses.

Relates to #23288
Relates to #23253
2017-02-22 15:03:09 +01:00
mms-programming d31e41547a Handle BlobPath's trailing separator case (#23091) 2017-02-22 09:04:55 +01:00
Areek Zillur 148be11f26 Make document write requests immutable (#23038)
* Make document write requests immutable

Previously, write requests were mutated at the
transport level to update request version, version type
and sequence no before replication.
Now that all write requests go through the shard bulk
transport action, we can use the primary response stored
in item level bulk requests to pass the updated version,
sequence no. to replicas.

* incorporate feedback

* minor cleanup

* Add bwc test to ensure correct index version propagates to replica

* Fix bwc for propagating write operation versions

* Add assertion on replica request version type

* fix tests using internal version type for replica op

* Fix assertions to assert version type in replica and recovery

* add bwc tests for version checks in concurrent indexing

* incorporate feedback
2017-02-21 17:41:22 -05:00
Simon Willnauer ca38e88148 Remove assertion that relies on all shards being successful
The assertion that if there are buffered aggs at least one incremental
reduce phase should have happened doesn't hold if there are shard failures.
This commit removes this assertion.

Relates to #23288
2017-02-21 22:41:49 +01:00
Nik Everett 7475175957 Adds unit test for sampler aggregation (#23243)
* Adds unit test for sampler aggregation

Relates to #22278
2017-02-21 12:51:47 -05:00
Jim Ferenczi 0ff6356b7e Revert "Never reduce the same agg twice"
This change reverts 5e4ba4a60e
Incremental reduction of aggs should also work with a single aggregation now that InternalTopHits.equals is fixed.
2017-02-21 18:48:28 +01:00
Simon Willnauer ce625ebdcc Expose `batched_reduce_size` via `_search` (#23288)
In #23253 we added the ability to incrementally reduce search results.
This change exposes the parameter to control the batch size and therefore
the memory consumption of a large search request.
2017-02-21 18:36:59 +01:00
Jim Ferenczi 1ba9770037 Fix comparison of double in InternalTopHits
InternalTopHits uses "==" to compare hit scores and fails when score is NaN.
This commit changes the comparison to always use Double.compare.

Relates #23253
2017-02-21 18:18:44 +01:00
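
The NaN problem fixed in 1ba9770037 is easy to reproduce in isolation. A minimal sketch with hypothetical class and method names (this is not the `InternalTopHits` code):

```java
// Hypothetical illustration of "==" versus Double.compare for scores.
final class NaNCompareSketch {

    // Primitive equality: NaN is never equal to anything, including itself.
    static boolean sameScoreByOperator(double a, double b) {
        return a == b;
    }

    // Double.compare treats NaN as equal to NaN, which is what an
    // equals() implementation needs in order to be reflexive.
    static boolean sameScoreByCompare(double a, double b) {
        return Double.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameScoreByOperator(Double.NaN, Double.NaN)); // false
        System.out.println(sameScoreByCompare(Double.NaN, Double.NaN));  // true
    }
}
```

One side effect worth knowing: `Double.compare` also distinguishes `0.0` from `-0.0`, whereas `==` treats them as equal.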
Simon Willnauer 5e4ba4a60e Never reduce the same agg twice
Some randomization caused reduction of the same agg multiple times
which causes issues on some aggregations.

Relates to #23253
2017-02-21 17:55:44 +01:00
Simon Willnauer 489f38918d Fix incremental reduce randomization in base test cases
We can and should randomly reduce down to a single result before
we pass the aggs to the final reduce. This commit changes the logic
to do that and ensures we don't trip the assertions the previous implementation tripped.

Relates to #23253
2017-02-21 17:13:46 +01:00
Nik Everett 74c33823ab Comment 2017-02-21 10:43:29 -05:00
Nik Everett 0dee1f85e6 Remove closeAgg 2017-02-21 10:31:42 -05:00
Tanguy Leroux 3a0fc526bb UpdateRequest implements ToXContent (#23289)
This commit changes UpdateRequest so that it implements the ToXContentObject interface.
2017-02-21 15:20:15 +01:00
Jim Ferenczi cc865cbc96 Add unit tests for stats and extended stats aggregations (#23287)
Add tests for InternalStats, InternalExtendedStats and StatsAggregator/ExtendedStatsAggregator

Relates #22278
2017-02-21 15:14:54 +01:00
Simon Willnauer f933f80902 First step towards incremental reduction of query responses (#23253)
Today all query results are buffered until we have received responses from
all shards. This can hold on to a significant amount of memory if the number of
shards is large. This commit adds a first step towards incrementally reducing
aggregation results once a configurable (per search request) number of responses
has been received. Once enough query results have been received and buffered, all
aggregation responses received so far are reduced and released to be GCed.
2017-02-21 13:02:48 +01:00
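
The buffering scheme described in f933f80902 can be sketched generically. This is a toy model under stated assumptions, not the SearchPhaseController code: the class `IncrementalReducerSketch`, its `batchSize` parameter and the use of a plain `BinaryOperator` as the reduce function are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical consumer that reduces buffered shard results in batches.
final class IncrementalReducerSketch<T> {
    private final int batchSize;
    private final BinaryOperator<T> reducer;
    private final List<T> buffer = new ArrayList<>();
    private int numReducePhases = 0;

    IncrementalReducerSketch(int batchSize, BinaryOperator<T> reducer) {
        if (batchSize < 2) {
            throw new IllegalArgumentException("batchSize must be >= 2");
        }
        this.batchSize = batchSize;
        this.reducer = reducer;
    }

    // Buffers one shard result; once the buffer is full, collapses it into a
    // single partial result so the individual results can be garbage collected.
    void consume(T shardResult) {
        buffer.add(shardResult);
        if (buffer.size() >= batchSize) {
            T partial = buffer.stream().reduce(reducer).get();
            buffer.clear();
            buffer.add(partial);
            numReducePhases++;
        }
    }

    // Final reduce over whatever is still buffered.
    T finish() {
        if (buffer.isEmpty()) {
            throw new IllegalStateException("no results consumed");
        }
        numReducePhases++;
        return buffer.stream().reduce(reducer).get();
    }

    int numReducePhases() {
        return numReducePhases;
    }
}
```

With a batch size of 3 and ten shard results, at most three entries are ever held at once, at the cost of running the reduce function several times instead of once.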
Tanguy Leroux 39ed76c58b Add parsing method to bulk response (#23234)
This commit adds the `fromXContent()` parsing method to BulkResponse.
2017-02-21 10:49:40 +01:00
Tanguy Leroux c88eb00b83 Add javadoc for DocWriteResponse.Builders (#23267) 2017-02-21 10:19:01 +01:00
Martin Scholz 24bf18b610 Upgrade HDRHistogram to 2.1.9 (#23254) 2017-02-21 08:50:26 +01:00
Martin Scholz 3e292d5245 Migrate TermsQuery to TermInSetQuery (#23229) 2017-02-21 08:49:43 +01:00
Jim Ferenczi 1ff5b318be Fix for IpRangeAggregatorTests#testRanges
Handle null from/to ranges.

Closes #23272
2017-02-20 21:16:14 +01:00
Jason Tedor 4c2bd5feab Introduce sequence-number-aware translog
Today, the relationship between Lucene and the translog is rather
simple: every document not in Lucene is guaranteed to be in the
translog. We need a stronger guarantee from the translog though, namely
that it can replay all operations after a certain sequence number. For
this to be possible, the translog has to be made sequence-number aware. As
a first step, we introduce the min and max sequence numbers into the
translog so that each generation knows the possible range of operations
contained in the generation. This will enable future work to keep around
all generations containing operations after a certain sequence number
(e.g., the global checkpoint).

Relates #22822
2017-02-20 15:05:24 -05:00
Jason Tedor 15f5810774 Mark IP range aggregator test as awaits fix
This test reliably fails with the seed 4AC319F8A6B0329B.
2017-02-20 14:42:16 -05:00
Christoph Büscher ea7deace5d Adding fromXContent to Suggest and Suggestion class (#23226)
A follow up to #23202, this adds parsing from xContent and tests to the four Suggestion implementations
and the top level suggest element to be used later when parsing the entire SearchResponse.
2017-02-20 15:45:10 +01:00
Christoph Büscher ea9d51114c Tests: Add unit test for InternalChildren (#23261)
Relates to #22278
2017-02-20 14:02:56 +01:00
Jim Ferenczi 76d6b872dd Add unit tests for GeoBoundsAggregator/InternalGeoBounds (#23259)
* Add unit tests for GeoBoundsAggregator/InternalGeoBounds

Relates #22278
2017-02-20 12:04:30 +01:00
Jim Ferenczi 69b1463f7c Add unit tests for BinaryRangeAggregator/InternalBinaryRange (#23255)
* Add unit tests for BinaryRangeAggregator/InternalBinaryRange

Relates #22278
2017-02-20 11:55:48 +01:00
Tanguy Leroux 872412f645 [Tests] Cleans up DocWriteResponse parsing tests (#23233)
This commit cleans up some parsing tests added from the High Level Rest Client: IndexResponseTests, DeleteResponseTests, UpdateResponseTests, BulkItemResponseTests.

These tests are now more uniform with the other test-from-to-XContent tests we have: they now shuffle the XContent fields before parsing, the asserting method for parsed objects does not use a Map<String, Object> anymore, and buggy equals/hashCode methods in ShardInfo and ShardInfo.Failure have been removed.
2017-02-20 09:45:33 +01:00
Nik Everett d9c37ce195 Adds unit test for sampler aggregation
Relates to #22278
2017-02-17 16:16:04 -05:00
Nik Everett d1de9574ea Checkstyle: Fix link lengths in sampler aggregation 2017-02-17 15:03:57 -05:00
Jay Modi b234644035 Enforce Content-Type requirement on the rest layer and remove deprecated methods (#23146)
This commit enforces the requirement of Content-Type for the REST layer and removes the deprecated methods in transport
requests and their usages.

While doing this, it turns out that there are many places where *Entity classes are used from the apache http client
libraries and many of these usages did not specify the content type. The methods that do not specify a content type
explicitly have been added to forbidden apis to prevent more of these from entering our code base.

Relates #19388
2017-02-17 14:45:41 -05:00
Adrien Grand 3bd1d46fc7 Add unit tests for terms aggregation objects. (#23149)
Relates #22278
2017-02-17 18:01:40 +01:00
javanna 578853f264 Remove stale comment about setting routing before parent
Order does not matter anymore since we merged #15371
2017-02-17 17:10:53 +01:00
Yuhao Bi 576e698613 Minor fix of _cat output (#23211) (#23213)
One line was missing a trailing "\n"
2017-02-17 10:46:20 +01:00
Jason Tedor 00a8b8799f Fix control group pattern
The file /proc/self/cgroup lists the control groups to which the process
belongs. This file is a colon separated list of three fields:
 1. a hierarchy ID number
 2. a comma-separated list of hierarchies
 3. the pathname of the control group in the hierarchy

The regex pattern for this contains a bug for the second field. It
allows one or two entries in the comma-separated list, but not
more. This commit fixes the pattern to allow one or more entries in the
comma-separated list.

Relates #23219
2017-02-16 15:31:18 -05:00
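
The bug described in 00a8b8799f can be reproduced with two illustrative patterns. Both regular expressions below are assumptions written for this sketch and are not copied from the Elasticsearch source; `BUGGY` accepts one or two entries in the second field, `FIXED` accepts one or more. The class name and helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical patterns for lines of /proc/self/cgroup: "ID:list:/path".
final class CgroupPatternSketch {

    // Optional group "(?:,[^:,]+)?" permits at most one extra entry.
    static final Pattern BUGGY = Pattern.compile("^\\d+:([^:,]+(?:,[^:,]+)?):(/.*)$");

    // Repeating group "(?:,[^:,]+)*" permits any number of extra entries.
    static final Pattern FIXED = Pattern.compile("^\\d+:([^:,]+(?:,[^:,]+)*):(/.*)$");

    // Returns the comma-separated second field, or null if the line does not match.
    static String controllers(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "9:cpu,cpuacct,cpuset:/docker/abc";
        System.out.println(controllers(BUGGY, line)); // null: three entries are rejected
        System.out.println(controllers(FIXED, line)); // cpu,cpuacct,cpuset
    }
}
```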
Christoph Büscher 268d15ec4c Adding fromXContent to Suggestion.Entry and subclasses (#23202)
This adds parsing from xContent to Suggestion.Entry and its subclasses for Terms-, Phrase-
and CompletionSuggestion.Entry.
2017-02-16 17:59:55 +01:00
markharwood 1cd1ff6010 Test fix - faulty assumptions about when exceptions are thrown in relation to number of failing shards. (#23205)
Search exceptions are thrown only when all shards report failure. Fix changes assertion logic to reflect this.

Closes #23203
2017-02-16 13:48:17 +00:00
Jason Tedor 0a5917d182 Fix get HEAD requests
Get HEAD requests incorrectly return a content-length header of 0. This
commit addresses this by removing the special handling for get HEAD
requests, and just relying on the general mechanism that exists for
handling HEAD requests in the REST layer.

Relates #23186
2017-02-15 13:07:29 -05:00