Commit Graph

46631 Commits

Author SHA1 Message Date
James Baiera c357f81aa7 Add soft limit for max concurrent policy executions (#43117)
Adds a global soft limit on the number of concurrently executing enrich policies.
Since an enrich policy is run on the generic thread pool, this is meant to limit
policy runs separately from the generic thread pool capacity.
2019-07-23 16:03:14 -04:00
James Baiera fc20264b99 Add Enrich index background task to cleanup old indices (#43746)
This PR adds a background maintenance task that is scheduled on the master node only.
The deletion of an index is based on if it is not linked to a policy or if the enrich alias is not
currently pointing at it. Synchronization has been added to make sure that no policy
executions are running at the time of cleanup, and if any executions do occur, the marking
process delays cleanup until next run.
2019-07-22 14:41:22 -04:00
James Baiera 7ad9beb087 Set auto expand replicas on enrich index after force merge is done. (#43600) 2019-07-12 11:56:56 -04:00
Michael Basnight b4b2ad3593 Ensure enrich policy is immutable (#43604)
This commit ensures the policy cannot be overwritten. An error is thrown
if the policy exists. All tests have been updated accordingly.
2019-07-11 13:23:12 -05:00
Michael Basnight d2c3f4bae9 Validate read priv of enrich source indices (#43595)
This commit adds permissions validation on the indices provided in the
enrich policy. These indices should be validated at store time so as not
to have cryptic error messages in the event the user does not have
permissions to access said indices.
2019-07-10 13:09:10 -05:00
Martijn van Groningen adc06ffd89
take builtin role into account in docs tests 2019-07-05 08:06:18 +02:00
Martijn van Groningen 9528c59fb3
added a basic test that enriching data works 2019-07-04 17:42:45 +02:00
Martijn van Groningen 1dd3d14f09
take into account `manage_enrich` builtin role 2019-07-04 16:51:48 +02:00
Martijn van Groningen ac119b07e7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-04 15:50:11 +02:00
Benjamin Trent 36f7259737 [ML] Fix datafeed checks when a concrete remote index is present (#43923)
A bug was introduced in 6.6.0 when we added support for
rollup indices. Rollup caps does NOT support looking at
remote indices, consequently, since we always look up rollup
caps, the datafeed fails with an error if its config
includes a concrete remote index.  (When all remote indices
in a datafeed config are wildcards the problem did not
occur.)

The rollups feature does not support remote indices, so if
there is any remote index in a datafeed config (wildcarded
or not), we can skip the rollup cap checks.  This PR
implements that change.
2019-07-04 13:31:45 +01:00
Dimitris Athanasiou 2a70df424d
[TEST][ML] Fix assertion after starting df-analytics job (#43957) (#43967)
In MachineLearningIT.testStopDataFrameAnalytics we call start and
then assert the state is `started`. However, if things go fast enough,
the state could have already changed to `reindexing` or `analyzing`.
The test has been failing occasionally due to the state being
`reindexing`. We fix this by simply asserting the state is either
of `started`, `reindexing` or `analyzing`.

Closes #43924
2019-07-04 15:17:36 +03:00
Martijn van Groningen 7ba6e1752a
required changes after merge 2019-07-04 13:17:22 +02:00
Martijn van Groningen 653f1436a0
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-04 13:05:10 +02:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Alpar Torok 1b6109517a Mute failing test
Tracking in #43960
2019-07-04 12:13:02 +03:00
Jim Ferenczi 2cc0a56fe6 Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941)
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inversing the logic to use prefixes on `phrase_prefix` queries only.

Closes #43308
2019-07-04 09:39:39 +02:00
Henning Andersen cacc3f7ff8 Async IO Processor release before notify (#43682)
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
2019-07-04 06:33:38 +02:00
Igor Motov c593085104 Geo: Refactors libs/geo parser to provide serialization logic as well (#43717)
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.

Relates to #43554
2019-07-03 19:31:44 -04:00
Benjamin Trent 7063a40411
[7.x] [ML][Data Frame] Adding bwc tests for pivot transform (#43506) (#43929)
* [ML][Data Frame] Adding bwc tests for pivot transform (#43506)

* [ML][Data Frame] Adding bwc tests for pivot transform

* adding continuous transforms

* adding continuous dataframes to bwc

* adding continuous data frame tests

* Adding rolling upgrade tests for continuous df

* Fixing test

* Adjusting indices used in BWC, and handling NPE for seq_no_stats

* updating and muting specific bwc test

* Adjusting bwc tests for backport
2019-07-03 16:39:38 -05:00
Przemyslaw Gomulka 553f783e73
Fix DieWithDignity test when waiting on jps backport(#43861) (#43871)
the test often hangs on executing jps command
we don't need to wait for this command to finish.

closes #43413
2019-07-03 20:39:48 +02:00
Adrien Grand 680edbe3f1
Bump current version to 7.4. (#43927) 2019-07-03 20:32:04 +02:00
Lisa Cawley 50e96f9f0e
[DOCS] Updates documentation version (#43937) 2019-07-03 11:09:34 -07:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Alan Woodward 49d69bf987 Actually close IndexAnalyzers contents (#43914)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
2019-07-03 16:06:58 +01:00
Alpar Torok 3250cc53f0 Mute failing test
Tracked in #43924
2019-07-03 17:43:40 +03:00
Martijn van Groningen 397150fa1e
Add enrich coordinator proxy action (#43801)
Introduced proxy api the handle the search request load that originates
from enrich processor. The enrich processor can execute many search
requests that execute asynchronously in parallel and that can easily overwhelm
the search thread pool on nodes. In order to protect this the Coordinator
queues the search requests and only executes a fixed number of search requests
in parallel.

Besides this; the Coordinator tries to include as much as possible search requests
(up to a defined maximum) inside a multi search request in order to reduce the number
of remote api calls to be made from the node that performs ingestion.
2019-07-03 15:50:40 +02:00
Zachary Tong f8fd4321f8 Link rare_terms docs from index page (#43882)
Docs for rare_terms were added in #35718, but neglected to
link it from the bucket index page
2019-07-03 09:32:01 -04:00
David Turner 9cecc31cdc Shortcut simple patterns ending in `*` (#43904)
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
2019-07-03 14:15:27 +01:00
Armin Braun 3317169c4f
Fix GCS Blob Repository 3rd Party Tests (#43030) (#43913)
* We have to strip the trailing slash from child names here like we do for AWS
* closes #43029
2019-07-03 15:09:28 +02:00
James Rodewig e2a9a787fc [DOCS] Rewrite dis max query (#43586) 2019-07-03 08:56:18 -04:00
paulward24 cff027499a Ensure to access RecoveryState#fileDetails under lock
Closes #43840
2019-07-03 07:39:58 -04:00
Armin Braun 7059224668
Optimize Snapshot Finalization (#42723) (#43908)
* Optimize Snapshot Finalization

* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
2019-07-03 13:26:35 +02:00
Christoph Büscher 662f517f4e Add _reload_search_analyzers endpoint to HLRC (#43733)
This change adds the new endpoint that allows reloading of search analyzers to
the high-level java rest client.

Relates to #43313
2019-07-03 12:05:59 +02:00
Dimitris Athanasiou 96b0b27f18
[7.x][ML] Set df-analytics task state to failed when appropriate (#43880) (#43906)
This introduces a `failed` state to which the data frame analytics
persistent task is set to when something unexpected fails. It could
be the process crashing, the results processor hitting some error,
etc. The failure message is then captured and set on the task state.
From there, it becomes available via the _stats API as `failure_reason`.

The df-analytics stop API now has a `force` boolean parameter. This allows
the user to call it for a failed task in order to reset it to `stopped` after
we have ensured the failure has been communicated to the user.

This commit also adds the analytics version in the persistent task
params as this allows us to prevent tasks to run on unsuitable nodes in
the future.
2019-07-03 12:41:56 +03:00
Jay Modi 1e0f67fb38 Deprecate transport profile security type setting (#43237)
This commit deprecates the `transport.profiles.*.xpack.security.type`
setting. This setting is used to configure a profile that would only
allow client actions. With the upcoming removal of the transport client
the setting should also be deprecated so that it may be removed in
a future version.
2019-07-03 19:31:55 +10:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Alexander Reelsen 9077c4402f Watcher: Allow to execute actions for each element in array (#41997)
This adds the ability to execute an action for each element that occurs
in an array, for example you could sent a dedicated slack action for
each search hit returned from a search.

There is also a limit for the number of actions executed, which is
hardcoded to 100 right now, to prevent having watches run forever.

The watch history logs each action result and the total number of actions
the were executed.

Relates #34546
2019-07-03 11:28:50 +02:00
Tim Vernum 2a8f30eb9a
Support builtin privileges in get privileges API (#43901)
Adds a new "/_security/privilege/_builtin" endpoint so that builtin
index and cluster privileges can be retrieved via the Rest API

Backport of: #42134
2019-07-03 19:08:28 +10:00
Tim Vernum deacc2038e
Always attach system user to internal actions (#43902)
All valid licenses permit security, and the only license state where
we don't support security is when there is a missing license.
However, for safety we should attach the system (or xpack/security)
user to internally originated actions even if the license is missing
(or, more strictly, doesn't support security).

This allows all nodes to communicate and send internal actions (shard
state, handshake/pings, etc) even if a license is transitioning
between a broken state and a valid state.

Relates: #42215
Backport of: #43468
2019-07-03 19:07:16 +10:00
Henning Andersen cd2972239c AsyncIOProcessor preserve thread context (#43729)
AsyncIOProcessor now preserves thread context, ensuring that deprecation
warnings are not duplicated to other concurrent operations on the same
shard.
2019-07-03 10:22:20 +02:00
Tim Vernum 31b19bd022
Use separate BitSet cache in Doc Level Security (#43899)
Document level security was depending on the shared
"BitsetFilterCache" which (by design) never expires its entries.

However, when using DLS queries - particularly templated ones - the
number (and memory usage) of generated bitsets can be significant.

This change introduces a new cache specifically for BitSets used in
DLS queries, that has memory usage constraints and access time expiry.

The whole cache is automatically cleared if the role cache is cleared.
Individual bitsets are cleared when the corresponding lucene index
reader is closed.

The cache defaults to 50MB, and entries expire if unused for 7 days.

Backport of: #43669
2019-07-03 18:04:06 +10:00
Jim Ferenczi 05c0cff1b6 Fix index_prefix sub field name on nested text fields (#43862)
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.

Closes #43741
2019-07-03 09:50:52 +02:00
Armin Braun 826f38cd70
Enable Parallel Deletes in Azure Repository (#42783) (#43886)
* Parallel deletes via private thread pool
2019-07-03 09:28:39 +02:00
Tanguy Leroux 365dfe88ca Refresh translog stats after translog trimming in NoOpEngine (#43825)
This commit changes NoOpEngine so that it refreshes its translog 
stats once translog is trimmed.

Relates #43156
2019-07-03 08:49:14 +02:00
Tim Vernum 461aa39daf
Switch WriteActionsTests.testBulk to use hamcrest (#43897)
If an item in the bulk request fails, that could be for a variety of
reasons - it may be that the underlying behaviour of security has
changed, or it may just be a transient failure during testing.

Simply asserting a `true`/`false` value produces failure messages that
are difficult to diagnose and debug. Using hamcert (`assertThat`) will
make it easier to understand the causes of failures in this test.

Backport of: #43725
2019-07-03 16:29:28 +10:00
Tim Vernum 14884c871f
Document API-Key APIs require manage_api_key priv (#43869)
Add the "Authorization" section to the API key API docs.
These APIs require The new manage_api_key cluster privilege.

Relates: #43865
Backport of: #43811
2019-07-03 13:51:44 +10:00
Jake Landis 6e9ccda2c5
ilm test - allow more time for policy completion (#43844) 2019-07-02 22:05:18 -05:00
Jake Landis 0a79f4ca70
Extend timeout for TimeSeriesLifecycleActionsIT> testFullPolicy (#43891) 2019-07-02 22:05:04 -05:00
Jake Landis 2dc056b0a0
Read the default pipeline for bulk upsert through an alias (#41963) (#42802)
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
2019-07-02 20:44:33 -05:00
Deb Adair a4e518b640 [DOCS] Revise GS intro and remove redundant conceptual content. Closes #43846. 2019-07-02 18:28:13 -07:00