Commit Graph

46589 Commits

Author SHA1 Message Date
Armin Braun 2176d09c37
Provide an Option to Use Path-Style-Access with S3 Repo (#41966) (#44046)
* Provide an Option to Use Path-Style-Access with S3 Repo

* As discussed, added the option to use path style access back again and
deprecated it.
* Defaulted to `false`
* Added warning to docs

* Closes #41816
2019-07-08 08:10:01 +02:00
Ioannis Kakavas 9beb51fc44 Revert "Mute testEnableDisableBehaviour (#42929)"
This reverts commit 6ee578c6eb.
2019-07-08 08:52:21 +03:00
Armin Braun f6efc55556
Fix SnapshotResiliencyTest (#44015) (#44041)
* Closes #43989
2019-07-07 19:59:16 +02:00
Armin Braun 990ac4ca83
Some Cleanup in BlobStoreRepository (#43323) (#44043)
* Some Cleanup in BlobStoreRepository

* Extracted from #42833:
  * Dry up index and shard path handling
  * Shorten XContent handling
2019-07-07 19:50:46 +02:00
Nhat Nguyen 9089820d8f Enable indexing optimization using sequence numbers on replicas (#43616)
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
2019-07-05 22:12:08 -04:00
Dimitris Athanasiou d3ddedf9fc
[7.x][ML] Add missing doc links to df-analytics rest spec and HLRC javadocs (#44025) (#44033) 2019-07-06 02:03:29 +03:00
Mayya Sharipova 37e1ad7062 Forbid empty doc values on vector functions (#43944)
Currently when a document misses a vector value, vector function
returns 0 as a score for this document. We think this is incorrect
behaviour.
With this change, an error will be thrown if vector functions are
used with docs that are missing vector doc values.
Also VectorScriptDocValues is modified to allow size() function,
which can be used to check if a document has a value for the
vector field.
2019-07-05 18:09:06 -04:00
Dimitris Athanasiou a1a62fded3
[7.x][ML] Stop df-analytics action request should filter tasks (#44016) (#44023)
As a `BaseTasksRequest`, `StopDataFrameAnalyticsAction.Request` should
implement a `match` method that makes sure only df-analytics tasks
are applied.
2019-07-05 23:10:45 +03:00
Yannick Welsch 504a43d43a Move ConnectionManager to async APIs (#42636)
This commit converts the ConnectionManager's openConnection and connectToNode methods to
async-style. This will allow us to not block threads anymore when opening connections. This PR also
adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove
some hacks in the test infrastructure that had to account for the previous synchronous nature of the
connection APIs.
2019-07-05 20:40:22 +02:00
Nhat Nguyen 8bfe18477e Clarify consequence of translog async setting (#44020)
Relates #43915
2019-07-05 13:56:42 -04:00
lcawl a831d4707c [DOCS] Temporarily disables data frame API testing 2019-07-05 10:56:09 -07:00
Yannick Welsch 88783927d1 Weaken assertion in PublicationTransportHandler (#44014)
These assertions do not hold true when a master fails during publication and quickly becomes
master again, publishing a new cluster state in a higher term which races against the previous
cluster state publication to self (which does not matter anyway).

Relates #43994

Closes #44012
2019-07-05 18:27:42 +02:00
István Zoltán Szabó 5aeb736801 Merge branch '7.x' of github.com:elastic/elasticsearch into 7.x 2019-07-05 14:26:47 +02:00
István Zoltán Szabó 7242267f5d [DOCS] Adds data frame analytics APIs to the ML APIs (#43875)
This PR adds the reference documentation pages of the data frame analytics APIs (PUT, START, STOP, GET, GET stats, DELETE, Evaluate) to the ML APIs pool.
2019-07-05 14:25:54 +02:00
Akshesh Doshi 01b982fd31 Draw attention to transport layer in remote cluster docs (#43883)
Closes #43858
2019-07-05 13:44:36 +02:00
Yannick Welsch 1220ff5b6d Publish to self through transport (#43994)
This commit ensures that cluster state publications to self also go through the transport layer. This
allows voting-only nodes to intercept the publication to self.

Fixes an issue discovered by a test failure where a voting-only node, which was the only
bootstrapped node, would not step down as master after state transfer because publishing to self
would succeed.

Closes #43631
2019-07-05 13:00:52 +02:00
Yannick Welsch 5cdf3ff3fa Revert "[TEST] Mute RemoteClusterServiceTests.testCollectNodes"
This reverts commit d8a2970fa4.
2019-07-05 11:02:42 +02:00
Yannick Welsch d090fa514f Use unique ports per test worker (#43983)
* Use unique ports per test worker

* Add test for system property

* check presence of tests.gradle

* Revert "check presence of tests.gradle"

This reverts commit 2fee7512a28f95c94c5bf7a3312e808f918a9510.
2019-07-05 11:02:28 +02:00
David Turner 06df0c0a4c Improve RetentionLease(Bgrd)SyncAction#toString() (#43987)
Today `RetentionLeaseSyncAction.Request` and
`RetentionLeaseBackgroundSyncAction.Request` both describe themselves as
`Request{...}` in the value returned from their respective `toString()`
methods. This commit adds the name of the owning class to both so we have
something a bit easier to search for and so we can distinguish foreground from
background syncs in logs and test failures and so on.
2019-07-05 09:58:35 +01:00
David Turner 435a83f3fd Add more logging to testOnlyBlocksOnConnectionsToNewNodes (#43979)
Some more output from this occasionally-failing test tracked in #40170.
2019-07-05 09:54:48 +01:00
István Zoltán Szabó 4c3e71b61a [DOCS] Adds description to the preview data frame transform API (#43745) 2019-07-05 09:53:24 +02:00
Dimitris Athanasiou 30b20920b9
[7.x][ML] Report correct count for df-analytics get-stats API (#43969) (#43981)
The count should match the number of all df-analytics that
matched the id in the request. However, we set the count
to the number of df-analytics returned which was bound to the
`size` parameter. This commit fixes this by setting the count
to the count of the `get` response.
2019-07-05 10:28:57 +03:00
Jim Ferenczi cdf55cb5c5 Refactor index engines to manage readers instead of searchers (#43860)
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
2019-07-04 22:49:43 +02:00
Hendrik Muhs 4128b9b4f7 audit message missing for autostop
call onStop when auto stopping (#43984)

fixes #43977
2019-07-04 21:40:42 +02:00
lcawl 688bf1b388 [DOCS] Fixes broken link 2019-07-04 09:13:56 -07:00
Lisa Cawley a030e3e513 [DOCS] Reformat CCR APIs to use new API format (#43952) 2019-07-04 08:29:54 -07:00
Christoph Büscher aeb3c1fd1b Prevent types deprecation warning for indices.exists requests (#43963)
Currently we log a deprecation warning to the types removal in
RestGetIndicesAction even if the REST method is HEAD, which is used by the
indices.exists API. Since the body is empty in this case we should not need to
show the deprecation warning.

Closes #43905
2019-07-04 17:20:43 +02:00
Tanguy Leroux b037aeaa6e
Fix IndexShardIT.testIndexCanChangeCustomDataPath() (#43978)
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
 on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it has been introduced in #43752 
which changed ReadOnlyEngine so that it opens the translog in its 
constructor in order to load the translog stats. This opening writes a 
new checkpoint file, but because 7.x/7.3 does not wait for shards to be 
started after being closed, the test immediately starts to copy shard files
 to a new directory and possibly does not copy all the required translog files.

By waiting for the shards to be started after being closed, we ensure 
that the shards (and engines) have been correctly initialized and that
 the translog checkpoint file is not currently being written.

closes #43964
2019-07-04 17:06:37 +02:00
Benjamin Trent 36f7259737 [ML] Fix datafeed checks when a concrete remote index is present (#43923)
A bug was introduced in 6.6.0 when we added support for
rollup indices. Rollup caps does NOT support looking at
remote indices, consequently, since we always look up rollup
caps, the datafeed fails with an error if its config
includes a concrete remote index.  (When all remote indices
in a datafeed config are wildcards the problem did not
occur.)

The rollups feature does not support remote indices, so if
there is any remote index in a datafeed config (wildcarded
or not), we can skip the rollup cap checks.  This PR
implements that change.
2019-07-04 13:31:45 +01:00
Dimitris Athanasiou 2a70df424d
[TEST][ML] Fix assertion after starting df-analytics job (#43957) (#43967)
In MachineLearningIT.testStopDataFrameAnalytics we call start and
then assert the state is `started`. However, if things go fast enough,
the state could have already changed to `reindexing` or `analyzing`.
The test has been failing occasionally due to the state being
`reindexing`. We fix this by simply asserting the state is either
of `started`, `reindexing` or `analyzing`.

Closes #43924
2019-07-04 15:17:36 +03:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Alpar Torok 1b6109517a Mute failing test
Tracking in #43960
2019-07-04 12:13:02 +03:00
Jim Ferenczi 2cc0a56fe6 Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941)
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inversing the logic to use prefixes on `phrase_prefix` queries only.

Closes #43308
2019-07-04 09:39:39 +02:00
Henning Andersen cacc3f7ff8 Async IO Processor release before notify (#43682)
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
2019-07-04 06:33:38 +02:00
Igor Motov c593085104 Geo: Refactors libs/geo parser to provide serialization logic as well (#43717)
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.

Relates to #43554
2019-07-03 19:31:44 -04:00
Benjamin Trent 7063a40411
[7.x] [ML][Data Frame] Adding bwc tests for pivot transform (#43506) (#43929)
* [ML][Data Frame] Adding bwc tests for pivot transform (#43506)

* [ML][Data Frame] Adding bwc tests for pivot transform

* adding continuous transforms

* adding continuous dataframes to bwc

* adding continuous data frame tests

* Adding rolling upgrade tests for continuous df

* Fixing test

* Adjusting indices used in BWC, and handling NPE for seq_no_stats

* updating and muting specific bwc test

* Adjusting bwc tests for backport
2019-07-03 16:39:38 -05:00
Przemyslaw Gomulka 553f783e73
Fix DieWithDignity test when waiting on jps backport(#43861) (#43871)
the test often hangs on executing jps command
we don't need to wait for this command to finish.

closes #43413
2019-07-03 20:39:48 +02:00
Adrien Grand 680edbe3f1
Bump current version to 7.4. (#43927) 2019-07-03 20:32:04 +02:00
Lisa Cawley 50e96f9f0e
[DOCS] Updates documentation version (#43937) 2019-07-03 11:09:34 -07:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Alan Woodward 49d69bf987 Actually close IndexAnalyzers contents (#43914)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
2019-07-03 16:06:58 +01:00
Alpar Torok 3250cc53f0 Mute failing test
Tracked in #43924
2019-07-03 17:43:40 +03:00
Zachary Tong f8fd4321f8 Link rare_terms docs from index page (#43882)
Docs for rare_terms were added in #35718, but neglected to
link it from the bucket index page
2019-07-03 09:32:01 -04:00
David Turner 9cecc31cdc Shortcut simple patterns ending in `*` (#43904)
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
2019-07-03 14:15:27 +01:00
Armin Braun 3317169c4f
Fix GCS Blob Repository 3rd Party Tests (#43030) (#43913)
* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029
2019-07-03 15:09:28 +02:00
James Rodewig e2a9a787fc [DOCS] Rewrite dis max query (#43586) 2019-07-03 08:56:18 -04:00
paulward24 cff027499a Ensure to access RecoveryState#fileDetails under lock
Closes #43840
2019-07-03 07:39:58 -04:00
Armin Braun 7059224668
Optimize Snapshot Finalization (#42723) (#43908)
* Optimize Snapshot Finalization

* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
2019-07-03 13:26:35 +02:00
Christoph Büscher 662f517f4e Add _reload_search_analyzers endpoint to HLRC (#43733)
This change adds the new endpoint that allows reloading of search analyzers to
the high-level java rest client.

Relates to #43313
2019-07-03 12:05:59 +02:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00