Commit Graph

46861 Commits

Author SHA1 Message Date
Benjamin Trent 36f7259737 [ML] Fix datafeed checks when a concrete remote index is present (#43923)
A bug was introduced in 6.6.0 when we added support for
rollup indices. Rollup caps does NOT support looking at
remote indices, consequently, since we always look up rollup
caps, the datafeed fails with an error if its config
includes a concrete remote index.  (When all remote indices
in a datafeed config are wildcards the problem did not
occur.)

The rollups feature does not support remote indices, so if
there is any remote index in a datafeed config (wildcarded
or not), we can skip the rollup cap checks.  This PR
implements that change.
2019-07-04 13:31:45 +01:00
Dimitris Athanasiou 2a70df424d
[TEST][ML] Fix assertion after starting df-analytics job (#43957) (#43967)
In MachineLearningIT.testStopDataFrameAnalytics we call start and
then assert the state is `started`. However, if things go fast enough,
the state could have already changed to `reindexing` or `analyzing`.
The test has been failing occasionally due to the state being
`reindexing`. We fix this by simply asserting the state is either
of `started`, `reindexing` or `analyzing`.

Closes #43924
2019-07-04 15:17:36 +03:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Alpar Torok 1b6109517a Mute failing test
Tracking in #43960
2019-07-04 12:13:02 +03:00
Jim Ferenczi 2cc0a56fe6 Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941)
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inversing the logic to use prefixes on `phrase_prefix` queries only.

Closes #43308
2019-07-04 09:39:39 +02:00
Henning Andersen cacc3f7ff8 Async IO Processor release before notify (#43682)
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
2019-07-04 06:33:38 +02:00
Igor Motov c593085104 Geo: Refactors libs/geo parser to provide serialization logic as well (#43717)
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.

Relates to #43554
2019-07-03 19:31:44 -04:00
Benjamin Trent 7063a40411
[7.x] [ML][Data Frame] Adding bwc tests for pivot transform (#43506) (#43929)
* [ML][Data Frame] Adding bwc tests for pivot transform (#43506)

* [ML][Data Frame] Adding bwc tests for pivot transform

* adding continuous transforms

* adding continuous dataframes to bwc

* adding continuous data frame tests

* Adding rolling upgrade tests for continuous df

* Fixing test

* Adjusting indices used in BWC, and handling NPE for seq_no_stats

* updating and muting specific bwc test

* Adjusting bwc tests for backport
2019-07-03 16:39:38 -05:00
Przemyslaw Gomulka 553f783e73
Fix DieWithDignity test when waiting on jps backport(#43861) (#43871)
the test often hangs on executing jps command
we don't need to wait for this command to finish.

closes #43413
2019-07-03 20:39:48 +02:00
Adrien Grand 680edbe3f1
Bump current version to 7.4. (#43927) 2019-07-03 20:32:04 +02:00
Lisa Cawley 50e96f9f0e
[DOCS] Updates documentation version (#43937) 2019-07-03 11:09:34 -07:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Alan Woodward 49d69bf987 Actually close IndexAnalyzers contents (#43914)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
2019-07-03 16:06:58 +01:00
Alpar Torok 3250cc53f0 Mute failing test
Tracked in #43924
2019-07-03 17:43:40 +03:00
Zachary Tong f8fd4321f8 Link rare_terms docs from index page (#43882)
Docs for rare_terms were added in #35718, but neglected to
link it from the bucket index page
2019-07-03 09:32:01 -04:00
David Turner 9cecc31cdc Shortcut simple patterns ending in `*` (#43904)
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
2019-07-03 14:15:27 +01:00
Armin Braun 3317169c4f
Fix GCS Blob Repository 3rd Party Tests (#43030) (#43913)
* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029
2019-07-03 15:09:28 +02:00
James Rodewig e2a9a787fc [DOCS] Rewrite dis max query (#43586) 2019-07-03 08:56:18 -04:00
paulward24 cff027499a Ensure to access RecoveryState#fileDetails under lock
Closes #43840
2019-07-03 07:39:58 -04:00
Armin Braun 7059224668
Optimize Snapshot Finalization (#42723) (#43908)
* Optimize Snapshot Finalization

* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
2019-07-03 13:26:35 +02:00
Christoph Büscher 662f517f4e Add _reload_search_analyzers endpoint to HLRC (#43733)
This change adds the new endpoint that allows reloading of search analyzers to
the high-level java rest client.

Relates to #43313
2019-07-03 12:05:59 +02:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00
Jay Modi 1e0f67fb38 Deprecate transport profile security type setting (#43237)
This commit deprecates the `transport.profiles.*.xpack.security.type`
setting. This setting is used to configure a profile that would only
allow client actions. With the upcoming removal of the transport client
the setting should also be deprecated so that it may be removed in
a future version.
2019-07-03 19:31:55 +10:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Alexander Reelsen 9077c4402f Watcher: Allow to execute actions for each element in array (#41997)
This adds the ability to execute an action for each element that occurs
in an array, for example you could sent a dedicated slack action for
each search hit returned from a search.

There is also a limit for the number of actions executed, which is
hardcoded to 100 right now, to prevent having watches run forever.

The watch history logs each action result and the total number of actions
the were executed.

Relates #34546
2019-07-03 11:28:50 +02:00
Tim Vernum 2a8f30eb9a
Support builtin privileges in get privileges API (#43901)
Adds a new "/_security/privilege/_builtin" endpoint so that builtin
index and cluster privileges can be retrieved via the Rest API

Backport of: #42134
2019-07-03 19:08:28 +10:00
Tim Vernum deacc2038e
Always attach system user to internal actions (#43902)
All valid licenses permit security, and the only license state where
we don't support security is when there is a missing license.
However, for safety we should attach the system (or xpack/security)
user to internally originated actions even if the license is missing
(or, more strictly, doesn't support security).

This allows all nodes to communicate and send internal actions (shard
state, handshake/pings, etc) even if a license is transitioning
between a broken state and a valid state.

Relates: #42215
Backport of: #43468
2019-07-03 19:07:16 +10:00
Henning Andersen cd2972239c AsyncIOProcessor preserve thread context (#43729)
AsyncIOProcessor now preserves thread context, ensuring that deprecation
warnings are not duplicated to other concurrent operations on the same
shard.
2019-07-03 10:22:20 +02:00
Tim Vernum 31b19bd022
Use separate BitSet cache in Doc Level Security (#43899)
Document level security was depending on the shared
"BitsetFilterCache" which (by design) never expires its entries.

However, when using DLS queries - particularly templated ones - the
number (and memory usage) of generated bitsets can be significant.

This change introduces a new cache specifically for BitSets used in
DLS queries, that has memory usage constraints and access time expiry.

The whole cache is automatically cleared if the role cache is cleared.
Individual bitsets are cleared when the corresponding lucene index
reader is closed.

The cache defaults to 50MB, and entries expire if unused for 7 days.

Backport of: #43669
2019-07-03 18:04:06 +10:00
Jim Ferenczi 05c0cff1b6 Fix index_prefix sub field name on nested text fields (#43862)
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.

Closes #43741
2019-07-03 09:50:52 +02:00
Armin Braun 826f38cd70
Enable Parallel Deletes in Azure Repository (#42783) (#43886)
* Parallel deletes via private thread pool
2019-07-03 09:28:39 +02:00
Tanguy Leroux 365dfe88ca Refresh translog stats after translog trimming in NoOpEngine (#43825)
This commit changes NoOpEngine so that it refreshes its translog 
stats once translog is trimmed.

Relates #43156
2019-07-03 08:49:14 +02:00
Tim Vernum 461aa39daf
Switch WriteActionsTests.testBulk to use hamcrest (#43897)
If an item in the bulk request fails, that could be for a variety of
reasons - it may be that the underlying behaviour of security has
changed, or it may just be a transient failure during testing.

Simply asserting a `true`/`false` value produces failure messages that
are difficult to diagnose and debug. Using hamcert (`assertThat`) will
make it easier to understand the causes of failures in this test.

Backport of: #43725
2019-07-03 16:29:28 +10:00
Tim Vernum 14884c871f
Document API-Key APIs require manage_api_key priv (#43869)
Add the "Authorization" section to the API key API docs.
These APIs require The new manage_api_key cluster privilege.

Relates: #43865
Backport of: #43811
2019-07-03 13:51:44 +10:00
Jake Landis 6e9ccda2c5
ilm test - allow more time for policy completion (#43844) 2019-07-02 22:05:18 -05:00
Jake Landis 0a79f4ca70
Extend timeout for TimeSeriesLifecycleActionsIT> testFullPolicy (#43891) 2019-07-02 22:05:04 -05:00
Jake Landis 2dc056b0a0
Read the default pipeline for bulk upsert through an alias (#41963) (#42802)
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
2019-07-02 20:44:33 -05:00
Deb Adair a4e518b640 [DOCS] Revise GS intro and remove redundant conceptual content. Closes #43846. 2019-07-02 18:28:13 -07:00
Mayya Sharipova 756c42f99f
Add dims parameter to dense_vector mapping (#43444) (#43895)
Typically, dense vectors of both documents and queries must have the same
number of dimensions. Different number of dimensions among documents
or query vector indicate an error. This PR enforces that all vectors
for the same field have the same number of dimensions. It also enforces
that query vectors have the same number of dimensions.
2019-07-02 21:14:16 -04:00
Benjamin Trent fb825a6470
[7.x] [ML][Data Frame] add node attr to GET _stats (#43842) (#43894)
* [ML][Data Frame] add node attr to GET _stats (#43842)

* [ML][Data Frame] add node attr to GET _stats

* addressing testing issues with node.attributes

* adjusting for backport
2019-07-02 19:35:37 -05:00
Benjamin Trent 2c97e26ce8
[ML][Data Frame] fix progress measurement for continuous transforms (#43838) (#43887)
* [ML][Data Frame] fix progress measurement for continuous transforms

* Update DataFrameIndexer.java
2019-07-02 19:35:09 -05:00
Jack Conradson 8755448a18 Add Datetime Now to Painless Documentation (#43852)
This change explains why Painless doesn't natively support datetime now, and 
gives examples of how to create a version of now through user-defined 
parameters.
2019-07-02 15:43:34 -07:00
Jake Landis eb73bed40d
7x watcher backport testfixes (#43848)
* fix org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests (#41777)

This commit un-mutes org.elasticsearch.xpack.watcher.test.integration.RejectedExecutionTests
which was failing intermittently due to a logic bug. It is not possible to use the real
Watcher scheduler (which is needed for this test) and reliabliby count the .triggered-watches
since current count of documents in the .triggered-watches index is based on the timing of the
scheduler and the ability to delete based on the Watcher and Write thread pools.

This commit simply removes the .triggered-watch check and relies soley on the .watcher-history
index as an indication that operations that can occur when the Watcher threadpool is rejecting.

closes #41734

* fix unlikely bug that can prevent Watcher from restarting (#42030)

The bug fixed here is unlikely to happen. It requires ES to be started with
ILM disabled, Watcher enabled, and Watcher explicitly stopped and restarted.
Due to template validation Watcher does not fully start and can result in a
partially started state. This is an unlikely scenerio outside of the testing
framework.

Note - this bug was introduced while the test that would have caught it was
muted. The test remains muted since the underlying cuase of the random failures
has not been identified. When this test is un-muted it will now work.
2019-07-02 12:16:06 -05:00
David Roberts 8e44f5d845 [ML-Data Frame] Add data frame transform cluster privileges to HLRC (#43879)
Adds the monitor_data_frame_transforms and
manage_data_frame_transforms cluster privileges to
the high level rest client.

The ALL_ARRAY variable is only used in randomized
tests at the within the Elasticsearch code, so it's
not a major problem that these cluster privileges
weren't added from the start.  But since ALL_ARRAY
is public HLRC users may be using it to find out
which cluster privileges exist, so it's best that
it contains them all.
2019-07-02 17:52:15 +01:00
Christoph Büscher 31cf96e7bf Return reloaded analyzers in _reload_search_ananlyzer response (#43813)
Currently the repsonse of the "_reload_search_analyzer" endpoint contains the
index names and nodeIds of indices were analyzers reloading was triggered. This
change add the names of the search-time analyzers that were reloaded.

Closes #43804
2019-07-02 18:51:15 +02:00
Yannick Welsch cc7c5ab2c0 Clarify voting-only master node docs (#43857)
Clarifies the roles of a dedicated voting-only master-eligible node.

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Co-Authored-By: David Turner <david.turner@elastic.co>
2019-07-02 18:49:40 +02:00
Nhat Nguyen 697cd494bf Remove sort by primary term when reading soft-deletes (#43845)
With Lucene rollback (#33473), we should never have more than one
primary term for each sequence number. Therefore we don't have to sort
by the primary term when reading soft-deletes.
2019-07-02 10:54:32 -04:00
Dimitris Athanasiou 1ea53979b5
[7.x][ML] Get df-analytics action should require monitor privilege (#43831) (#43866) 2019-07-02 16:00:54 +03:00
Tim Vernum 8d099dad38
Add "manage_api_key" cluster privilege (#43865)
This adds a new cluster privilege for manage_api_key. Users with this
privilege are able to create new API keys (as a child of their own
user identity) and may also get and invalidate any/all API keys
(including those owned by other users).

Backport of: #43728
2019-07-02 21:57:42 +10:00
Benjamin Trent b95ee7ebb2
[7.x] [ML][Data Frame] using transform creation version for node assignment (#43764) (#43843)
* [ML][Data Frame] using transform creation version for node assignment (#43764)

* [ML][Data Frame] using transform creation version for node assignment

* removing unused imports

* Addressing PR comment

* adjusing for backport
2019-07-02 06:52:34 -05:00