Commit Graph

40684 Commits

Author SHA1 Message Date
David Roberts 2608012422
Add temporary directory cleanup workarounds (#32615)
On some Linux distributions tmpfiles.d cleans files and
directories under /tmp if they haven't been accessed for
10 days.

This can cause problems for ML as ML is currently the only
component that uses the temp directory more than a few
seconds after startup. If you didn't open an ML job for
10 days and then tried to open one then the temp directory
would have been deleted.

This commit prevents the problem occurring in the case of
Elasticsearch being managed by systemd, as systemd private
temp directories are not subject to periodic cleanup (by
default).

Additionally there are now some docs to warn people about
the risk and suggest a manual mitigation for .tar.gz users.
2018-08-07 16:59:56 +01:00
Benjamin Trent 6d50d8b5a9
Adding job process pojos to protocol pkg (#32657)
* Adding job process pojos to protocol pkg

* Removing unused `RESULTS_FIELD`

* Addressing PR comments, removing unnecessary methods
2018-08-07 10:51:52 -05:00
Lee Hinman b3e15851a2 [TEST] Comment out account breaker assertion while diagnosing
Relates to #30290
2018-08-07 09:36:37 -06:00
Yannick Welsch 45066b5e89
Verify primary mode usage with assertions (#32667)
Primary terms were introduced as part of the sequence-number effort (#10708) and added in ES
5.0. Subsequent work introduced the replication tracker which lets the primary own its replication
group (#25692) to coordinate recovery and replication. The replication tracker explicitly exposes
whether it is operating in primary mode or replica mode, independent of the ShardRouting object
that's associated with a shard. During a primary relocation, for example, the primary mode is
transferred between the primary relocation source and the primary relocation target. After
transferring this so-called primary context, the old primary becomes a replication target and the
new primary the replication source, reflected in the replication tracker on both nodes. With the
most recent PR in this area (#32442), we finally have a clean transition between a shard that's
operating as a primary and issuing sequence numbers and a shard that's serving as a replication
target. The transition from one state to the other is enforced through the operation-permit system,
where we block permit acquisition during such changes and perform the transition under this
operation block, ensuring that there are no operations in progress while the transition is being
performed. This finally allows us to turn the best-effort checks that were put in place to prevent
shards from being used in the wrong way (i.e. primary as replica, or replica as primary) into hard
assertions, making it easier to catch any bugs in this area.
2018-08-07 15:02:37 +02:00
Paul Sanwald 3ce984d746 mute test while I work on #32215 2018-08-07 08:56:00 -04:00
Andrey Ershov 6449d9bc14
Include translog path in error message when translog is corrupted (#32251)
Currently, when TranslogCorruptedException is thrown most of the times it does not contain information about the translog location on the file system. There is the translog recovery tool that accepts the translog path as an argument and users are constantly puzzled where to get the path.
This pull request adds "source" information to every TranslogCorruptedException thrown. The source could be local file, remote translog source (used for recovery), assertion (translog entry is constructed to perform some assertion) or translog constructed inside the test.
Closes #24929
2018-08-07 13:03:43 +02:00
Albert Zaharovits 1f50950099 Add @AwaitsFix for #32673 2018-08-07 13:22:12 +03:00
Parth Verma 6fe6247dc8 Ignore script fields when size is 0 (#31917)
This change adds a check so that when parsing the search source, script fields are 
ignored when the requested search result size is 0. This helps with e.g. clients like 
Kibana that sends a list of script fields that they may need for convenience, but they
don't require any hits. Before this change, user sometimes ran into confusing behaviour, 
e.g. the script compilation limit to breaking although no hits were requested.

Closes #31824
2018-08-07 10:56:44 +02:00
simonzheng ab81078949 [Docs] Correct a small typo (#32655) 2018-08-07 10:34:55 +02:00
Armin Braun f57cb10d2c
Tests: Fix Typo Causing Flaky Settings Test (#32665)
* We were comparing the wrong timeout value in the `randomValueOtherThan` call here, leading to no mutation happening for a certain seed
* closes #32639
2018-08-07 10:30:45 +02:00
Tanguy Leroux 1122314b3b
[Rollup] Remove builders from GroupConfig (#32614) 2018-08-07 09:39:24 +02:00
Jason Tedor 3fb0923182
Fix content type detection with leading whitespace (#32632)
Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
2018-08-06 18:07:46 -04:00
Jack Conradson b46e13629f
Docs: Allow snippets to have line continuation (#32649)
Currently, snippets in lists cannot be rendered correctly as a console command because the console command requires a line continuation '+'. This allows snippets to have a line continuation between the snippet and the // CONSOLE.
2018-08-06 14:43:53 -07:00
Armin Braun 4dda5a990b
INGEST: Fix ThreadWatchDog Throwing on Shutdown (#32578)
* INGEST: Fix ThreadWatchDog Throwing on Shutdown

* #32539 is caused by the fact that ThreadWatchDog.Default could throw on shutdown if the ThreadPool is interrupted while `interruptLongRunningExecutions` is in progress. This is a result of the watchdog not having a lifecycle of its own (normally it terminates when the threadpool terminates).
  * We can't easily use `org.elasticsearch.common.util.concurrent.EsRejectedExecutionException#isExecutorShutdown` to catch this state the same way other components do since thatwould require adding the core lib to Grok as a dependency
  * Since we have no knowledge of the lifecycle in this compontent since we're only passed the scheduler `BiFunction` I fixed this by only scheduling the watchdog when there's actually registered threads in it.
    * I think using the patter of locking via two `Atomic*` values should not be much of a performance concern here under load since either the integer will likely be > 0 in this case (because we have multiple Grok in parallel) or the running state will be true because there likely was at least one thread registered when the watchdog ran and so the enqueing of the watchdog task during `register` will happen very rarely here (in the worst case scenario of only a single Grok thread it will happen less frequently than once every `ingest.grok.watchdog.interval`). The atomic update on the count should not be relevant relative to the cost of adding a new node to the CHM either.
* Fixes #32539
  * Also fixes the watchdog to run if it doens't have to in general.
2018-08-06 22:46:26 +02:00
Benjamin Trent b2a0f38a0c
Adding xpack.core.ml.datafeed to protocol.xpack.ml.datafeed (#32625)
* Adding org.elasticsearch.xpack.core.ml.datafeed to org.elasticsearch.protocol.xpack.ml.datafeed

* removing unused ParseField and import

* Addressing PR feed back and fixing tests

* Simplifying Datafeed(Config|Update) ctor parser
2018-08-06 15:33:18 -05:00
DeDe Morton e01e4393a8
[Docs] Light edit to info about docker images (#32376) 2018-08-06 12:00:07 -07:00
Nhat Nguyen 919888eba7 TEST: Enable debug log testValidateFollowingIndexSettings 2018-08-06 14:55:56 -04:00
Yannick Welsch 014b2772db [TEST] Fix testReplicaTermIncrementWithConcurrentPrimaryPromotion
The assertion in the test was not broad enough. If the timing is very unlucky, the
shard is already promoted to primary before the indexOnReplica even gets to execute.

Closes #32645
2018-08-06 18:38:01 +02:00
Nhat Nguyen c394eb9ae9 CCR: Expose the operation primary term
Relates #32442
2018-08-06 10:55:37 -04:00
Nhat Nguyen 5881322b3f Merge branch 'master' into ccr
* master:
  Cross-cluster search: preserve cluster alias in shard failures (#32608)
  Handle AlreadyClosedException when bumping primary term
  [TEST] Allow to run in FIPS JVM (#32607)
  [Test] Add ckb to the list of unsupported languages (#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (#32068)
  Painless: Use LocalMethod Map For Lookup at Runtime (#32599)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613)
  [Rollup] Update wire version check after backport
  Suppress Wildfly test in FIPS JVMs (#32543)
  [Rollup] Improve ID scheme for rollup documents (#32558)
  ingest: doc: move Dot Expander Processor doc to correct position (#31743)
  [ML] Add some ML config classes to protocol library (#32502)
  [TEST]Split transport verification mode none tests (#32488)
  Core: Move helper date formatters over to java time (#32504)
  [Rollup] Remove builders from DateHistogramGroupConfig (#32555)
  [TEST} unmutes SearchAsyncActionTests and adds debugging info
  [ML] Add Detector config classes to protocol library (#32495)
  [Rollup] Remove builders from MetricConfig (#32536)
  Tests: Add rolling upgrade tests for watcher (#32428)
  Fix race between replica reset and primary promotion (#32442)
2018-08-06 10:27:18 -04:00
Igor Motov e641fccfe3
Rest HL client: Add get license action (#32438)
Rest HL client: Add get license action

Continues to use String instead of a more complex License class to
hold the license text similarly to put license.

Relates #29827
2018-08-06 07:15:40 -07:00
Yogesh Gaikwad 615aa85f4e
[Kerberos] Use canonical host name (#32588)
The Apache Http components support for Spnego scheme
uses canonical name by default.
Also when resolving host name, on centos by default
there are other aliases so adding them to the
DelegationPermission.

Closes#32498
2018-08-06 23:51:43 +10:00
Armin Braun 0a67cb4133
LOGGING: Upgrade to Log4J 2.11.1 (#32616)
* LOGGING: Upgrade to Log4J 2.11.1
* Upgrade to `2.11.1` to fix memory leaks in slow logger when logging large requests
   * This was caused by a bug in Log4J https://issues.apache.org/jira/browse/LOG4J2-2269 and is fixed in `2.11.1` via https://git-wip-us.apache.org/repos/asf?p=logging-log4j2.git;h=9496c0c
* Fixes #32537
* Fixes #27300
2018-08-06 14:56:21 +02:00
Jason Tedor 3b739b9fd5
Avoid NPE on shard changes action (#32630)
If a leader index is deleted while there is an active follower, the
follower will send shard changes requests bound for the leader
index. Today this will result in a null pointer exception because there
will not be an index routing table for the index. A null pointer
exception looks like a bug to a user so this commit addresses this by
throwing an index not found exception instead.
2018-08-06 08:01:47 -04:00
Luca Cavanna 826399f9fc
Cross-cluster search: preserve cluster alias in shard failures (#32608)
When some remote clusters return shard failures as part of a cross-cluster search request, the cluster alias currently gets lost. As a result, if the shard failures are all caused by the same error, and against indices belonging to different clusters, but with the same index name, only one failure gets returned as part of the search response, meaning that failures are grouped by index name, ignoring the cluster alias.

With this commit we make sure that `ShardSearchFailure` returns the cluster alias as part of the index name. Also, we set the fully qualfied index name when creating a `QueryShardException`. That way shard failures are grouped by cluster:index. Such fixes should cover at least most of the cases where either 1) the shard target is set but we don't have the index in the cause (we were previously reading it only from the cause that did not have the cluster alias) 2) the shard target is missing but if the cause is a `QueryShardException` the cluster alias does not get lost.

We also prevent NPE in case the failure cause is not set and test such scenario.
2018-08-06 11:48:50 +02:00
Yannick Welsch 3cf08326ab Handle AlreadyClosedException when bumping primary term
If the shard is already closed while bumping the primary term, this can result in an
AlreadyClosedException to be thrown. As we use asyncBlockOperations, the exception
will be thrown on a thread from the generic thread pool and end up in the uncaught
exception handler, failing our tests.

Relates to #32442
2018-08-06 08:34:38 +02:00
Ioannis Kakavas 66edba2012 [TEST] Allow to run in FIPS JVM (#32607)
* Change SecurityNioHttpServerTransportTests to use PEM key and
certificate files instead of a JKS keystore so that this tests
can also run in a FIPS 140 JVM
* Do not attempt to run cases with ssl.verification_mode NONE in
SessionFactoryTests so that the tests can run in a FIPS 140 JVM
2018-08-06 07:42:26 +03:00
Ioannis Kakavas ceb1ae4d7b [Test] Add ckb to the list of unsupported languages (#32611) 2018-08-06 10:00:45 +10:00
Jason Tedor 1a39f1d6c5
Fix CCR stats assertions
This commit addresses a race that can happen in the basic CCR stats REST
tests. Namely, peek reads can fire before the REST test client fires the
stats request. This means that we have to weaken our assertions about
the expected stats response.
2018-08-05 08:53:08 -04:00
Armin Braun 6fa7016bbf
SCRIPTING: Move Aggregation Scripts to their own context (#32068)
* SCRIPTING: Move Aggregation Scripts to their own context
2018-08-04 10:37:07 +02:00
Jack Conradson 6ca24e13af
Painless: Use LocalMethod Map For Lookup at Runtime (#32599)
This modifies Def to use a Map<String, LocalMethod> to look up user-defined methods at runtime 
instead of writing constant methodhandles to do the reverse lookup. This creates a consistency 
between how LocalMethods are looked up at compile-time and run-time. This consistency will allow 
this code to be more maintainable moving forward. This will also allow FunctionReference to be 
cleaned up in a follow up PR.
2018-08-03 15:22:30 -07:00
Lee Hinman 1e4751ec47 [TEST] Enhance failure message when bulk updates have failures 2018-08-03 15:27:10 -06:00
David Roberts b99aa81fe4
[ML] Add ML result classes to protocol library (#32587)
This commit adds the ML results classes to the X-Pack protocol
library used by the high level REST client.

(Other commits will add the config classes and stats classes.)

These classes:

- Are publically immutable
- Are privately mutable - this is perhaps not as nice as the
  config classes, but to do otherwise would require adding
  builders and the corresponding server-side classes that the
  old transport client used don't have builders
- Have little/no validation of field values beyond null checks
- Are convertible to and from X-Content, but NOT wire transportable
- Have lenient parsers to maximize compatibility across versions
- Have the same class names and getter names as the corresponding
  classes in X-Pack core to ease migration for transport client
  users
- Don't reproduce all the methods that do calculations or
  transformations that the the corresponding classes in X-Pack core
  have
2018-08-03 20:48:38 +01:00
Igor Motov ada80d7fc8
Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613)
The testPutLicense test tries to put a license generated using
snapshot keys into release cluster. This commit suppresses the
test during the release builds.

Closes #32580
2018-08-03 11:59:51 -07:00
Zachary Tong 992ec4be5d [Rollup] Update wire version check after backport
Bumping down the version to 6.4 since the backport is complete.  Also
adds some missing version checks to the bwc tests to make sure it
only runs on the correct versions
2018-08-03 14:09:01 -04:00
David Turner e3cc33756e
Suppress Wildfly test in FIPS JVMs (#32543)
WildflyIT fails on FIPS-enabled JVMs. This change mutes this test suite on such
JVMs. Relates #32534.
2018-08-03 17:57:30 +01:00
Zachary Tong fc9fb64ad5
[Rollup] Improve ID scheme for rollup documents (#32558)
Previously, we were using a simple CRC32 for the IDs of rollup documents.
This is a very poor choice however, since 32bit IDs leads to collisions
between documents very quickly.

This commit moves Rollups over to a 128bit ID.  The ID is a concatenation
of all the keys in the document (similar to the rolling CRC before),
hashed with 128bit Murmur3, then base64 encoded.  Finally, the job
ID and a delimiter (`$`) are prepended to the ID.

This gurantees that there are 128bits per-job.  128bits should
essentially remove all chances of collisions, and the prepended
job ID means that _if_ there is a collision, it stays "within"
the job.

BWC notes:

We can only upgrade the ID scheme after we know there has been a good
checkpoint during indexing.  We don't rely on a STARTED/STOPPED
status since we can't guarantee that resulted from a real checkpoint,
or other state.  So we only upgrade the ID after we have reached
a checkpoint state during an active index run, and only after the
checkpoint has been confirmed.

Once a job has been upgraded and checkpointed, the version increments
and the new ID is used in the future.  All new jobs use the
new ID from the start
2018-08-03 11:13:25 -04:00
Jake Landis 3d4c84f7ca
ingest: doc: move Dot Expander Processor doc to correct position (#31743)
No changes to the content.
2018-08-03 07:21:05 -07:00
Jason Tedor 32c2759bb9
Remove extra blank line in CcrStatsAction.java
This commit removes an extra blank line that was accidentally committed
to CcrStatsAction.java.
2018-08-03 09:55:04 -04:00
Jason Tedor d640c9ddf9
Introduce CCR stats endpoint (#32350)
This commit introduces the CCR stats endpoint which provides shard-level
stats on the status of CCR follower tasks.
2018-08-03 09:09:45 -04:00
David Roberts bc274b2ff2
[ML] Add some ML config classes to protocol library (#32502)
This commit adds four ML config classes to the X-Pack protocol
library used by the high level REST client.

(Other commits will add the remaining config classes, plus results
and stats classes.)

These classes:

- Are immutable
- Have little/no validation of field values beyond null checks
- Are convertible to and from X-Content, but NOT wire transportable
- Have lenient parsers to maximize compatibility across versions
- Have the same class names, member names and getter/setter names
  as the corresponding classes in X-Pack core to ease migration
  for transport client users
- Don't reproduce all the methods that do calculations or
  transformations that the the corresponding classes in X-Pack core
  have
2018-08-03 13:21:08 +01:00
Ioannis Kakavas 1ee6393117
[TEST]Split transport verification mode none tests (#32488)
This commit splits SecurityNetty4TransportTests in two methods
one handling verification mode certificate and full and one
handling verification mode none. This is done so that the second
method can be muted in a FIPS 140 JVM where verification mode none
cannot be used.
2018-08-03 14:44:40 +03:00
Alexander Reelsen 018e77cac6
Core: Move helper date formatters over to java time (#32504)
Some classes use internal date formatters, which now can be moved over
to java time using the DateFormatters class.

The same applies for a few test cases.
2018-08-03 13:21:14 +02:00
Tanguy Leroux 21f660d801
[Rollup] Remove builders from DateHistogramGroupConfig (#32555)
Same motivation as #32507 but for the DateHistogramGroupConfig
configuration object. This pull request also changes the format of the
time zone from a Joda's DateTimeZone to a simple String.

It should help to port the API to the high level rest client and allows
clients to not be forced to use the Joda Time library. Serialization is
impacted but does not need a backward compatibility layer as
DateTimeZone are serialized as String anyway. XContent also expects
a String for timezone, so I found it easier to move everything to String.

Related to #29827
2018-08-03 13:11:00 +02:00
Colin Goodheart-Smithe d05f39de8b
[TEST} unmutes SearchAsyncActionTests and adds debugging info
This unmutes the testFanOutAndCollect()` method and add a check to make
sure we aren't accidentally running something twice causing a search
phase to still be running after we have counted down the latch

Relates to #29242
2018-08-03 11:52:46 +01:00
David Roberts eb17128b9c
[ML] Add Detector config classes to protocol library (#32495)
This commit adds the Detector class and its dependencies to the
X-Pack protocol library used by the high level REST client.

(Future commits will add the remaining config classes, plus results
and stats classes.)

These classes:

- Are immutable, with builders, but the builders do no validation
  beyond null checks
- Are convertible to and from X-Content, but NOT wire transportable
- Have lenient parsers to maximize compatibility across versions
- Have the same class names, member names and getter/setter names
  as the corresponding classes in X-Pack core to ease migration
  for transport client users
- Don't reproduce all the methods that do calculations or
  transformations that the the corresponding classes in X-Pack core
  have
2018-08-03 10:39:29 +01:00
Tanguy Leroux 937dcfd716
[Rollup] Remove builders from MetricConfig (#32536)
Related to #29827
2018-08-03 10:01:20 +02:00
Alexander Reelsen f809d6fff4
Tests: Add rolling upgrade tests for watcher (#32428)
These tests ensure, that the basic watch APIs are tested in the rolling
upgrade tests. After initially adding a watch, the tests try to get,
execute, deactivate and activate a watch. Watcher stats are tested as
well, and an own java based test has been added for restarting, as that
requires waiting for a state change. Watcher history is also checked.

Closes #31216
2018-08-03 09:41:29 +02:00
Yannick Welsch 0d60e8a029
Fix race between replica reset and primary promotion (#32442)
We've recently seen a number of test failures that tripped an assertion in IndexShard (see issues
linked below), leading to the discovery of a race between resetting a replica when it learns about a
higher term and when the same replica is promoted to primary. This commit fixes the race by
distinguishing between a cluster state primary term (called pendingPrimaryTerm) and a shard-level
operation term. The former is set during the cluster state update or when a replica learns about a
new primary. The latter is only incremented under the operation block, which can happen in a
delayed fashion. It also solves the issue where a replica that's still adjusting to the new term
receives a cluster state update that promotes it to primary, which can happen in the situation of
multiple nodes being shut down in short succession. In that case, the cluster state update thread
would call `asyncBlockOperations` in `updateShardState`, which in turn would throw an exception
as blocking permits is not allowed while an ongoing block is in place, subsequently failing the shard.
This commit therefore extends the IndexShardOperationPermits to allow it to queue multiple blocks
(which will all take precedence over operations acquiring permits). Finally, it also moves the primary
activation of the replication tracker under the operation block, so that the actual transition to
primary only happens under the operation block.

Relates to #32431, #32304 and #32118
2018-08-03 09:33:08 +02:00
Nhat Nguyen 6eeb628d6d Merge branch 'master' into ccr
* master:
  HLRC: Move commercial clients from XPackClient (#32596)
  Add cluster UUID to Cluster Stats API response (#32206)
  Security: move User to protocol project (#32367)
  [TEST] Test for shard failures, add debug to testProfileMatchesRegular
  Minor fix for javadoc (applicable for java 11). (#32573)
  Painless: Move Some Lookup Logic to PainlessLookup (#32565)
  TEST: Avoid merges in testSeqNoAndCheckpoints
  [Rollup] Remove builders from HistoGroupConfig (#32533)
  Mutes failing SQL string function tests due to #32589
  fixed elements in array of produced terms (#32519)
  INGEST: Enable default pipelines (#32286)
  Remove cluster state initial customs (#32501)
  Mutes LicensingDocumentationIT due to #32580
  [ML] Remove multiple_bucket_spans (#32496)
  [ML] Rename JobProvider to JobResultsProvider (#32551)
  Correct minor typo in explain.asciidoc for HLRC
  Build: Add elastic maven to repos used by BuildPlugin (#32549)
  Clarify the error message when a pipeline agg is used in the 'order' parameter. (#32522)
  Revert "[test] turn on host io cache for opensuse (#32053)"
  Enable packaging tests on suse boxes
  [ML] Improve error when no available field exists for rule scope (#32550)
  [ML] Improve error for functions with limited rule condition support (#32548)
  Painless: Clean Up PainlessField (#32525)
  Add @AwaitsFix for #32554
  Remove broken @link in Javadoc
  Scripting: Conditionally use java time api in scripting (#31441)
  [ML] Fix thread leak when waiting for job flush (#32196) (#32541)
  Add AwaitsFix to failing test - see #32546
  Core: Minor size reduction for AbstractComponent (#32509)
  SQL: Added support for string manipulating functions with more than one parameter (#32356)
  [DOCS] Reloadable Secure Settings (#31713)
  Watcher: Reenable HttpSecretsIntegrationTests#testWebhookAction test (#32456)
  [Rollup] Remove builders from TermsGroupConfig (#32507)
  Use hostname instead of IP with SPNEGO test (#32514)
  Switch x-pack rolling restart to new style Requests (#32339)
  NETWORKING: Fix Netty Leaks by upgrading to 4.1.28 (#32511)
  [DOCS] Small fixes in rule configuration page (#32516)
  Painless: Clean up PainlessMethod (#32476)
  Build: Remove shadowing from benchmarks (#32475)
  Docs: Add all JDKs to CONTRIBUTING.md
  Add licensing enforcement for FIPS mode (#32437)
  SQL: Add test for handling of partial results (#32474)
  Mute testFilterCacheStats
  [ML][DOCS] Fix typo applied_to => applies_to
  Scripting: Fix painless compiler loader to know about context classes (#32385)
2018-08-02 23:14:37 -04:00