Commit Graph

3553 Commits

Author SHA1 Message Date
Ryan Ernst 742213d710 Improve error message when index settings are not a map (#45588)
This commit adds an explicit error message when a create index request
contains a settings key that is not a json object. Prior to this change
the user would be given a ClassCastException with no explanation of what
went wrong.

closes #45126
2019-08-16 11:39:26 -07:00
Zachary Tong 50c65d05ba Move bucket reduction from Bucket to the InternalAgg (#45566)
The current idiom is to have the InternalAggregator find all the
buckets sharing the same key, put them in a list, get the first bucket
and ask that bucket to reduce all the buckets (including itself).

This a somewhat confusing workflow, and feels like the aggregator should
be reducing the buckets (since the aggregator owns the buckets), rather
than asking one bucket to do all the reductions.

This commit basically moves the `Bucket.reduce()` method to the
InternalAgg and renames it `reduceBucket()`.  It also moves the
`createBucket()` (or equivalent) method from the bucket to the
InternalAgg as well.
2019-08-16 13:59:00 -04:00
Andrey Ershov dbc90653dc transport.publish_address should contain CNAME (#45626)
This commit adds CNAME reporting for transport.publish_address same way
it's done for http.publish_address.

Relates #32806
Relates #39970

(cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)
2019-08-16 17:42:00 +02:00
Armin Braun d6a9edea16
Lower Limit for Maximum Message Size in TcpTransport (#44496) (#45635)
* Since we're buffering network reads to the heap and then deserializing them it makes no sense to buffer a message that is 90% of the heap size since we couldn't deserialize it anyway
* I think `30%` is a more reasonable guess here given that we can reasonably assume that the deserialized message will be larger than the serialized message itself and processing it will take additional heap as well
2019-08-16 12:27:54 +02:00
Armin Braun a48242c371
Cleanup Redundant TransportLogger Instantiation (#43265) (#45629)
* This class' methods are all effectively `static` => make them `static` and stop instantiating it needlessly
2019-08-15 21:16:56 +02:00
Zachary Tong cd441f6906 Catch AllocatedTask registration failures (#45300)
When a persistent task attempts to register an allocated task locally,
this creates the Task object and starts tracking it locally.  If there
is a failure while initializing the task, this is handled by a catch
and subsequent error handling (canceling, unregistering, etc).

But if the task fails to be created because an exception is thrown
in the tasks ctor, this is uncaught and fails the cluster update
thread.  The ramification is that a persistent task remains in the
cluster state, but is unable to create the allocated task, and the
exception prevents other tasks "after" the poisoned task from starting
too.

Because the allocated task is never created, the cancellation tools
are not able to remove the persistent task and it is stuck as a
zombie in the CS.

This commit adds exception handling around the task creation,
and attempts to notify the master if there is a failure (so the
persistent task can be removed).  Even if this notification fails,
the exception handling means the rest of the uninitialized tasks
can proceed as normal.
2019-08-15 15:14:19 -04:00
Armin Braun de58353722
Lower Painless Static Memory Footprint (#45487) (#45619)
* Painless generates a ton of duplicate strings and empty `Hashmap` instances wrapped as unmodifiable
* This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node)
   * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization
2019-08-15 19:41:45 +02:00
Alpar Torok 03a1645bc6 Use dynamic port ranges for ExternalTestCluster (#45601)
Moves methods added in #44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` ( teh default ) and running into
races.
2019-08-15 16:40:12 +03:00
Armin Braun 1beea3588b
Make BlobStoreRepository Validation Read master.dat (#45546) (#45578)
* Fixing this for two reasons:
   1. Why not verify that the seed we wrote is actually there when we can
   2. The AWS S3 SDK started to log a bunch of WARN messages about not fully reading the stream now that we started to abuse the read blob as an `exists` check after removing that method from the blob container
2019-08-15 07:07:52 +02:00
Nick Knize 647a8308c3
[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363)
* Introduce Spatial Plugin (#44389)

Introduce a skeleton Spatial plugin that holds new licensed features coming to 
Geo/Spatial land!

* [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923)

Refactor DeprecatedParameters specific to legacy geo_shape out of
AbstractGeometryFieldMapper.TypeParser#parse.

* [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980)

Add a new ShapeFieldMapper to the xpack spatial module for
indexing arbitrary cartesian geometries using a new field type called shape.
The indexing approach leverages lucene's new XYShape field type which is
backed by BKD in the same manner as LatLonShape but without the WGS84
latitude longitude restrictions. The new field mapper builds on and
extends the refactoring effort in AbstractGeometryFieldMapper and accepts
shapes in either GeoJSON or WKT format (both of which support non geospatial
geometries).

Tests are provided in the ShapeFieldMapperTest class in the same manner
as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests.
Documentation for how to use the new field type and what parameters are
accepted is included. The QueryBuilder for searching indexed shapes is
provided in a separate commit.

* [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108)

Add a new ShapeQueryBuilder to the xpack spatial module for
querying arbitrary Cartesian geometries indexed using the new shape field
type.

The query builder extends AbstractGeometryQueryBuilder and leverages the
ShapeQueryProcessor added in the previous field mapper commit.

Tests are provided in ShapeQueryTests in the same manner as
GeoShapeQueryTests and docs are updated to explain how the query works.
2019-08-14 16:35:10 -05:00
Armin Braun e0d84e7178
Clean up Callback Chains and Duplicate in SnapshotResiliencyTests (#45398) (#45563)
* It's in the title, follow up to #45233
* Flatten more listeners into `StepListener`
* Remove duplication from repo and index bootstrap and asserting that the steps execute successfully
2019-08-14 21:53:07 +02:00
Armin Braun 5f6bc6fc2d
Prevent Leaking Search Tasks on Exceptions in FetchSearchPhase and DfsQueryPhase (#45500) (#45540)
* If `counter.onResult` throws an exception we might leak a transport task because the failure is not handled as a phase failure (instead it bubbles up in the transport service eventually hitting the `onFailure` callback again and couting down the `counter` twice).

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
2019-08-14 14:49:38 +02:00
Armin Braun 00e4fba2fb
Simplify and Optimize RestController Slightly (#45419) (#45485)
* Simplify the path iterator to generate less garbage
* `dispatchRequest` always terminates, adjust code accordingly
2019-08-13 10:43:30 +02:00
Julie Tibshirani dc1856ca53 Make sure to validate the type before attempting to merge a new mapping. (#45157)
Currently, when adding a new mapping, we attempt to parse + merge it before
checking whether its top-level document type matches the existing type. So when
a user attempts to introduce a new mapping type, we may give a confusing error
message around merging instead of complaining that it's not possible to add
more than one type ("Rejecting mapping update to [my-index] as the final
mapping would have more than 1 type...").

This PR moves the type validation to the start of
`MetaDataMappingService#applyRequest` so that we make sure the type matches
before performing any mapper merging.

We already partially addressed this issue in #29316, but the tests there
focused on `MapperService` and did not catch this problem with end-to-end
mapping updates.

Addresses #43012.
2019-08-12 14:28:03 -07:00
Zachary Tong 4d97d2c50f Revert "Only execute one final reduction in InternalAutoDateHistogram (#45359)"
This reverts commit c0ea8a867e.
2019-08-12 17:17:17 -04:00
Julie Tibshirani 8c4394d5d7 Fix a bug where mappings are dropped from rollover requests. (#45411)
We accidentally introduced this bug when adding a typeless version of the
rollover request. The bug is not present if include_type_name is set to true.
2019-08-12 12:46:27 -07:00
Michael Basnight a521e4c86f Retrieve processors instead of checking existence (#45354)
The previous hasProcessors method would validate if a processor was
present within a pipeline, but would not return the contents of the
processors. This does not allow a consumer to inspect the processor for
specific metadata. The method now returns the list of processors based
on the class of the processor passed in.
2019-08-12 13:48:17 -05:00
Zachary Tong 472f6ef41a Mute InternalAutoDateHistogramTests#testReduceRandom() 2019-08-12 14:45:08 -04:00
Zachary Tong c0ea8a867e Only execute one final reduction in InternalAutoDateHistogram (#45359)
Because auto-date-histo can perform multiple reductions while
merging buckets, we need to ensure that the intermediate reductions
are done with a `finalReduce` set to false to prevent Pipeline aggs
from generating their output.

Once all the buckets have been merged and the output is stable,
a mostly-noop reduction can be performed which will allow pipelines
to generate their output.
2019-08-12 14:07:38 -04:00
Albert Zaharovits 2cb172f079
CreateIndex and PutIndexTemplate with typeless mapping (#45120)
This commit makes sure that mapping parameters to `CreateIndex` and
`PutIndexTemplate` are keyed by the type name. 

`IndexCreationTask` expects mappings to be keyed by the type name.
It asserts this for template mappings but not for the mappings in the request.
The `CreateIndexRequest` and `RestCreateIndexAction` mostly make it sure
that the mapping is keyed by a type name, but not always.
When building the create-index request outside of the REST handler, there are
a few methods to set the mapping for the request. Some of them add the type
name some of them do not.
For example, `CreateIndexRequest#mapping(String type, Map<String, ?> source)`
adds the type name, but
`CreateIndexRequest#mapping(String type, XContentBuilder source)` does not.
This PR asserts the type name in the request mapping inside `IndexCreationTask`
and makes all `CreateIndexRequest#mapping` methods add the type name.
2019-08-12 08:05:07 +03:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Nhat Nguyen cf9a73b5ac Call afterWriteOperation after trim translog in peer recovery (#45182)
testShouldFlushAfterPeerRecovery was added #28350 to make sure the
flushing loop triggered by afterWriteOperation eventually terminates.
This test relies on the fact that we call afterWriteOperation after
making changes in translog. In #44756, we roll a new generation in
RecoveryTarget#finalizeRecovery but do not call afterWriteOperation.

Relates #28350
Relates #45073
2019-08-10 22:59:02 -04:00
Nhat Nguyen 25c6102101 Trim local translog in peer recovery (#44756)
Today, if an operation-based peer recovery occurs, we won't trim
translog but leave it as is. Some unacknowledged operations existing in
translog of that replica might suddenly reappear when it gets promoted.
With this change, we ensure trimming translog above the starting
sequence number of phase 2. This change can allow us to read translog
forward.
2019-08-10 22:59:02 -04:00
Armin Braun 1cd464d675
Isolate Request in Call-Chain for REST Request Handling (#45130) (#45417)
* Follow up to #44949
* Stop using a special code path for multi-line JSON and instead handle its detection like that of other XContent types when creating the request
* Only leave a single path that holds a reference to the full REST request
   * In the next step we can move the copying of request content to happen before the actual request handling and make it conditional on the handler in question to stop copying bulk requests as suggested in #44564
2019-08-10 10:21:01 +02:00
Armin Braun d1ed9bdbfd
Use StepListener to Simplify SnapshotResiliencyTests (#45233) (#45386)
* Reduces complicated callback relations in `testSuccessfulSnapshotAndRestore` to flat steps of sequential actions
* Will refactor the other tests in this suit as a follow up
   * This format certainly makes it easier to create more complicated tests that involve multiple subsequent snapshots as it would allow adding loops
2019-08-09 18:19:48 +02:00
Yannick Welsch 9e6d874a41
Show BWC version in ClusterFormationFailureHelper (#45352)
When having a cluster state from 6.x, display the metadata version as the cluster state version.
Avoids confusion where a cluster state from 6.x is displayed as version 0 even if has some actual
content.
2019-08-09 16:23:38 +02:00
Yannick Welsch 5ddeb488a6 Allow _update on write alias (#45318)
Using the document update API on aliases with a write index does not work.

Follow-up to #31520
2019-08-09 11:44:24 +02:00
Tal Levy 2a99eaa7c2 Revert "removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)"
This reverts commit 7b0a8040de.
2019-08-08 17:40:03 -07:00
Armin Braun 12ed6dc999
Only retain reasonable history for peer recoveries (#45208) (#45355)
Today if a shard is not fully allocated we maintain a retention lease for a
lost peer for up to 12 hours, retaining all operations that occur in that time
period so that we can recover this replica using an operations-based recovery
if it returns. However it is not always reasonable to perform an
operations-based recovery on such a replica: if the replica is a very long way
behind the rest of the replication group then it can be much quicker to perform
a file-based recovery instead.

This commit introduces a notion of "reasonable" recoveries. If an
operations-based recovery would involve copying only a small number of
operations, but the index is large, then an operations-based recovery is
reasonable; on the other hand if there are many operations to copy across and
the index itself is relatively small then it makes more sense to perform a
file-based recovery. We measure the size of the index by computing its number
of documents (including deleted documents) in all segments belonging to the
current safe commit, and compare this to the number of operations a lease is
retaining below the local checkpoint of the safe commit. We consider an
operations-based recovery to be reasonable iff it would involve replaying at
most 10% of the documents in the index.

The mechanism for this feature is to expire peer-recovery retention leases
early if they are retaining so much history that an operations-based recovery
using that lease would be unreasonable.

Relates #41536
2019-08-09 01:56:32 +02:00
Tal Levy 7b0a8040de
removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)
CellIdSource is a helper ValuesSource that encodes GeoPoint
into a long-encoded representation of the grid bucket the point
is associated with. This complicates thing as usage evolves to
support shapes that are associated with more than one bucket ordinal.
2019-08-08 16:33:16 -07:00
Armin Braun b19de55095
Add missing wait to testAutomaticReleaseOfIndexBlock (#45342) (#45351)
Today the test waits for one of the shards to be blocked, but this does not
mean that the block has been applied on all nodes, so a subsequent indexing
operation may still go through.

Fixes #45338
2019-08-08 22:39:22 +02:00
Henning Andersen d139896b66
Reindex share retry between hit sources (#44203) (#45348)
The client and remote hit sources had each their own retry mechanism,
which would do the same. Supporting resiliency we would have to expand
on the retry mechanisms and as a preparation for that, the retry
mechanism is now shared such that each sub class is only responsible for
sending requests and converting responses/failures to common format.

Part of #42612
2019-08-08 22:01:29 +02:00
Christoph Büscher a552b33276 Fix occasional SuggestSearchIT failure (#45330)
Refreshes happening during indexing can result differen segment counts and
slightly skewed term statistics, which in turn has the potential to change
suggestion output slightly. In order to prevent this, disable refresh for the
affected tests.

Closes #43261
2019-08-08 21:06:32 +02:00
Dimitris Athanasiou e53bb050db Mute testAutomaticReleaseOfIndexBlock
Relates #45338
2019-08-08 17:56:41 +03:00
Andrey Ershov 07c656fba9 Mute testCustomDataPaths on Windows
See #45333

(cherry picked from commit 671e1ad1068aee4b593ad0c8ab13ff60b4f125b8)
2019-08-08 16:26:56 +02:00
Zachary Tong 86d6597890 Use newIndexSearcher() instead of newSearcher() (#45248)
`newSearcher()` from lucene can randomly choose index readers which
are not compatible with our tests, like ParallelCompositeReader.
The `newIndexSearcher()` method on AggregatorTestCase is a wrapper
similar to newSearcher but compatible with our tests
2019-08-08 09:34:38 -04:00
Martijn van Groningen e066133016
Change the ingest simulate api to not include dropped documents (#44161)
If documents are dropped by the `drop` processor then
these documents are returned as a `null` value in the response.

=== Example

Create pipeline:

```
PUT _ingest/pipeline/droppipeline
{
    "processors": [
        {
            "set": {
                "field": "bla",
                "value": "val"
            }
        },
        {
            "drop": {}
        }
    ]
}
```

Simulate request:

POST _ingest/pipeline/droppipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "message": "text"
            }
        }
    ]
}

Response:

```
{
    "docs": [
        null
    ]
}
```

Response if verbose is enabled:

```
{
    "docs": [
        {
            "processor_results": [
                {
                    "doc": {
                        "_index": "_index",
                        "_type": "_doc",
                        "_id": "_id",
                        "_source": {
                            "message": "text",
                            "bla": "val"
                        },
                        "_ingest": {
                            "timestamp": "2019-07-10T11:07:10.758315Z"
                        }
                    }
                },
                null
            ]
        }
    ]
}
```

Closes #36150

* Abort pipeline simulation in verbose mode when document has been dropped
by drop processor
2019-08-08 13:04:33 +02:00
Martijn van Groningen fb959d188c
Backport: Add description to force-merge tasks (#41365) (#45191)
* Add description to force-merge tasks (#41365)

This is static information that is part of the force merge request.

Relates to #15975
2019-08-08 08:15:09 +02:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Bukhtawar cd304c4def Auto-release flood-stage write block (#42559)
If a node exceeds the flood-stage disk watermark then we add a block to all of
its indices to prevent further writes as a last-ditch attempt to prevent the
node completely exhausting its disk space. However today this block remains in
place until manually removed, and this block is a source of confusion for users
who current have ample disk space and did not even realise they nearly ran out
at some point in the past.

This commit changes our behaviour to automatically remove this block when a
node drops below the high watermark again. The expectation is that the high
watermark is some distance below the flood-stage watermark and therefore the
disk space problem is truly resolved.

Fixes #39334
2019-08-07 11:03:53 +01:00
Tanguy Leroux a869342910 Restore DefaultShardOperationFailedException's reason after deserialization (#45203)
The reason field of DefaultShardOperationFailedException is lost during serialization. 
This is sad because this field is checked for nullity during xcontent generation and it 
means that the cause won't be included in the generated xcontent and won't be 
printed in two REST API responses (Close Index API and Indices Shard Stores API).

This commit simply restores the reason from the cause during deserialization.
2019-08-07 10:37:15 +02:00
Jason Tedor bd59ee6c72
Fix clock used in update requests (#45262)
We accidentally switched to using the relative time provider here. This
commit fixes this by switching to the appropriate absolute clock.
2019-08-06 21:15:21 -04:00
David Turner f5d1381e01 Remove always-true param from IndicesService#stats (#45231)
Parameter `includePrevious` is always true, so this commit inlines it.
2019-08-06 17:22:11 +01:00
David Turner 355713b9ca
Improve slow logging in MasterService (#45241)
Adds a tighter threshold for logging a warning about slowness in the
`MasterService` instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates #45007
Backport of #45086
2019-08-06 17:01:49 +01:00
Tanguy Leroux 772ce1f599
Add deprecation warning for Force Merge API (#44903)
This commit adds a deprecation warning in 7.x for the Force Merge API 
when both only_expunge_deletes and max_num_segments are set in a request.

Relates #44761
2019-08-06 16:04:24 +02:00
Jason Tedor 5b1b146099
Normalize environment paths (#45179)
This commit applies a normalization process to environment paths, both
in how they are stored internally, also their settings values. This
normalization is done via two means:
 - we make the paths absolute
 - we remove redundant name elements from the path (what Java calls
   "normalization")

This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
2019-08-06 06:04:30 -04:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Igor Motov b5f88120b5 Geo: add Geometry-based query builders to QueryBuilders (#45058)
Add Geometry-based method for creation of query builders in
QueryBuilder

Relates to #44715
2019-08-05 13:34:48 -04:00
Zachary Tong 3df1c76f9b Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179)
This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to workaround this with `filter` aggs, but the workaround
is hacky and probably less performant.

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didnt work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric
2019-08-05 12:18:40 -04:00
Zachary Tong e5079ac288
[7.x backport] Add more flexibility to MovingFunction window alignment (#45159)
Introduce shift field to MovingFunction aggregation.

By default, shift = 0. Behavior, in this case, is the same as before.
Increasing shift by 1 moves starting window position by 1 to the right.

    To simply include current bucket to the window, use shift = 1
    For center alignment (n/2 values before and after the current bucket), use shift = window / 2
    For right alignment (n values after the current bucket), use shift = window.
2019-08-05 11:56:52 -04:00
Nhat Nguyen 56083ba1ff Remove assertion after locally recover replica (#45181)
If the disk becomes broken after we have locally recovered shard up to
the global checkpoint, then the assertion won't hold.
2019-08-05 10:48:02 -04:00
David Turner 13a167051f
Remove fileBasedRecovery flag (#45146)
Today `RecoveryTarget#prepareForTranslogOperations` takes a boolean flag
indicating whether the recovery is file-based or not. This was used in 6.x to
bootstrap some commit data that were missing in indices created in 5.x:

b506955f8d/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java (L298-L300)

This flag no longer has any effect, so this commit removes it.

Backport of #45131 to 7.x.
2019-08-05 08:17:40 +01:00
Armin Braun 41815ed614
Optimize StreamInput#readString (#44930) (#45180)
* Resolve TODO in `readString` by moving to reading chunks of `byte[]` instead of going byte by byte
* Motivated by `readString` showing up as a significant user of CPU time on the IO thread in Rally PMC benchmark
* Benchmarking this:
  * Could not reproduce a slowdown in the potential worst case (one or two non-ascii chars) since in this case the cost of creating the string itself exceeds the read times anyway
  * Speedup for 50%+ for reading 200 char ascii strings from `ByteBuf` or pages bytes backed streams
  * Longer strings obviously get bigger speedups
  * More ascii chars -> more speedup
2019-08-05 07:22:42 +02:00
Jason Tedor d78ecd9c09
Use the full hash in build info (#45163)
This commit switches to using the full hash to build into the JAR
manifest, which is used in node startup and the REST main action to
display the build hash.
2019-08-03 11:27:53 -04:00
Tim Brooks 984ba82251
Move nio channel initialization to event loop (#45155)
Currently in the transport-nio work we connect and bind channels on the
a thread before the channel is registered with a selector. Additionally,
it is at this point that we set all the socket options. This commit
moves these operations onto the event-loop after the channel has been
registered with a selector. It attempts to set the socket options for a
non-server channel at registration time. If that fails, it will attempt
to set the options after the channel is connected. This should fix
#41071.
2019-08-02 17:31:31 -04:00
Zachary Tong ffbe047c32 Revert "Add more flexibility to MovingFunction window alignment (#44360)"
This reverts commit 1a58a487f0.
2019-08-02 15:16:04 -04:00
Nikita Glashenko 1a58a487f0 Add more flexibility to MovingFunction window alignment (#44360)
Introduce shift field to MovingFunction aggregation.

By default, shift = 0. Behavior, in this case, is the same as before.
Increasing shift by 1 moves starting window position by 1 to the right.

    To simply include current bucket to the window, use shift = 1
    For center alignment (n/2 values before and after the current bucket), use shift = window / 2
    For right alignment (n values after the current bucket), use shift = window.
2019-08-02 15:10:21 -04:00
David Turner 9ff320d967
Use index for peer recovery instead of translog (#45137)
Today we recover a replica by copying operations from the primary's translog.
However we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.

Reverts #38904 and #42211
Relates #41536
Backport of #45136 to 7.x.
2019-08-02 15:00:43 +01:00
Armin Braun 9450505d5b
Stop Passing Around REST Request in Multiple Spots (#44949) (#45109)
* Stop Passing Around REST Request in Multiple Spots

* Motivated by #44564
  * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap.
  * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.
2019-08-02 07:31:38 +02:00
Jim Ferenczi 3f94e2ea43 Sparse role queries can throw an NPE (#45053)
Sparse role queries are executed differently than other queries in order
to account for the fact that most of the documents are filtered from search.
However this special execution does not set the scorer for the query so any
collector that needs to access the score of a document fails with an NPE.
This change fixed this bug by setting the scorer before collecting any hits
when intersecting the main query and the sparse role.
2019-08-01 20:21:53 +02:00
William Brafford 5f50da947a
Fix bug in the Settings#processSetting method (#45095)
The Settings#processSetting method is intended to take a setting map and add a
setting to it, adjusting the keys as it goes in case of "conflicts" where the
new setting implies an object where there is currently a string, or vice
versa. processSetting was failing in two cases: adding a setting two levels
under a string, and adding a setting two levels under a string and four levels
under a map. This commit fixes the bug and adds test coverage for the
previously faulty edge cases.

* fix issue #43791 about settings
* add unit test in testProcessSetting()
2019-08-01 13:27:08 -04:00
Yannick Welsch 917510d3e4 Always use primary term of operation in InternalEngine (#45083)
We keep adding the current primary term to operations for which we do not assign a sequence
number. This does not make sense anymore as all operations which we care about have
sequence numbers now. The goal of this commit is to clean things up in InternalEngine and
reduce the complexity.
2019-08-01 17:30:00 +02:00
Armin Braun 48dc53f8d2
Make PathTrieIterator a Little more Memory Efficient (#44951) (#45070)
* There's no need to have the trie iterator hold another reference to the request object (which could be huge, see #44564)
* Also removed unused boolean field from trie node
2019-08-01 17:26:08 +02:00
Nhat Nguyen 3a487379c3 Tighten no pending scheduled refresh check (#45025)
Previously, we use ThreadPoolStats to ensure that the scheduledRefresh
triggered by the internal refresh setting update is executed before we
index a new document. With that change (#40387), this test did not fail for 
the last 3 months. However, using ThreadPoolStats is not entirely watertight
as both "active" and "queue" count can be 0 in a very small interval
when ThreadPoolExecutor pulls a task from the queue but before marking
the corresponding worker as active (i.e., lock it).

Closes #39565
2019-08-01 09:06:22 -04:00
David Turner c088bafbbc Wait for events in waitForRelocation (#45074)
Adds a `waitForEvents(Priority.LANGUID)` to the cluster health request in
`ESIntegTestCase#waitForRelocation()` to deal with the case that this health
request returns successfully despite the fact that there is a pending reroute task which
will relocate another shard.

Relates #44433
Fixes #45003
2019-08-01 13:47:39 +01:00
David Turner 532ade7816 More logging for slow cluster state application (#45007)
Today the lag detector may remove nodes from the cluster if they fail to apply
a cluster state within a reasonable timeframe, but it is rather unclear from
the default logging that this has occurred and there is very little extra
information beyond the fact that the removed node was lagging. Moreover the
only forewarning that the lag detector might be invoked is a message indicating
that cluster state publication took unreasonably long, which does not contain
enough information to investigate the problem further.

This commit adds a good deal more detail to make the issues of slow nodes more
prominent:

- after 10 seconds (by default) we log an INFO message indicating that a
  publication is still waiting for responses from some nodes, including the
  identities of the problematic nodes.

- when the publication times out after 30 seconds (by default) we log a WARN
  message identifying the nodes that are still pending.

- the lag detector logs a more detailed warning when a fatally-lagging node is
  detected.

- if applying a cluster state takes too long then the cluster applier service
  logs a breakdown of all the tasks it ran as part of that process.
2019-08-01 13:20:46 +01:00
Hendrik Muhs b3be8f75f0 Fix version logic after 7.3 release (BWC) (#45077)
removes unreleased version 7.2.2 after release of 7.3.0 as it breaks the version verifier, add documentation that explains the logic
2019-08-01 12:43:23 +02:00
Christoph Büscher a669efd2a4
Remove left-over AwaitsFix in RateClusterStateIT (#45043)
Issues are closed and fixes in #42580 and #42430 seem to be merged to 7.x at
least.
2019-08-01 12:03:29 +02:00
Tim Brooks aff66e3ac5
Add Cors integration tests (#44361)
This commit adds integration tests to ensure that the basic cors
functionality works for the netty and nio transports.
2019-07-31 14:24:23 -06:00
Armin Braun 8d63bd1d1e
Cleanup Various Action- Listener and Runnable Usages (#42273) (#45052)
* Dry up code for creating simple `ActionRunnable` a little
* Shorten some other code around `ActionListener` usage, in particular
when wrapping it in a `TransportResponseListener`
2019-07-31 18:55:31 +02:00
Armin Braun ee663dc9ac
Reenable Parallel Restore Test on Windows (#45037) (#45050)
* As a result of #44096 this test shouldn't fail anymore on `master` and `7.4`+ so we should reenable it there
  * For older versions we won't backport that change so the tests should stay disabled there
* Closes #44671
2019-07-31 18:35:34 +02:00
Christoph Büscher 35291ae175
Remove muted AckIT and AckClusterUpdateSettingsIT (#45044)
Reading up on #33673 it looks like parts of these tests have been reworked and
there is no intention to fix the remains on 7.x, so I think we can remove the
entire test.
2019-07-31 17:17:21 +02:00
Luca Cavanna 8cc3c0dd93 Remove task null check in TransportAction (#45014)
The task that TaskManager#register returns cannot be null. The method
enforces that it is not null after calling request#createTask. It is
then needless to check for null in the listener later. Also, added the
call to the delegate listener in a finally block, just to make sure.
2019-07-31 17:16:41 +02:00
Christoph Büscher e85b53a955
Remove left-over AwaitsFix in DedicatedClusterSnapshotRestoreIT (#45042)
The issue mentioned (#38845) seems to have been closed with #38891 so the test
can be re-activated.
2019-07-31 17:15:41 +02:00
Armin Braun c7d7230524
Stop Recreating Wrapped Handlers in RestController (#44964) (#45040)
* We shouldn't be recreating wrapped REST handlers over and over for every request. We only use this hook in x-pack and the wrapper there does not have any per request state.
  This is inefficient and could lead to some very unexpected memory behavior
   => I made the logic create the wrapper on handler registration and adjusted the x-pack wrapper implementation to correctly forward the circuit breaker and content stream flags
2019-07-31 17:11:34 +02:00
Zachary Tong c25f3dd5d0
Introduce 7.3.1 version (#45046) 2019-07-31 10:53:55 -04:00
Andrey Ershov c27ac3d24c Unmute testClusterJoinDespiteOfPublishingIssues and testElectMasterWithLatestVersion (#38555)
See my comments for #37539 and #37685

(cherry picked from commit 038d4ab2940340eca942e32b54044f183b7804d9)
2019-07-31 14:55:02 +02:00
David Roberts 5e3010a606 Use system context for looking up connected nodes (#43991)
When finding nodes in a connected cluster for cross cluster
search the requests to get cluster state on the connected
cluster should be made in the system context because
logically they are equivalent to checking a single detail
in the local cluster state and should not require that the
user who made the request that is using this method in its
implementation is authorized to view the entire cluster
state.

Fixes #43974
2019-07-31 09:09:56 +01:00
Igor Motov 1a1bb4707d Geo: move indexShape to AbstractGeometryFieldMapper.Indexer (#44979)
Move indexShape functionality into AbstractGeometryFieldMapper to make
it more unit testable.

Relates to #43644
2019-07-30 14:50:23 -04:00
Mayya Sharipova a154b73d99 Assure index ops are successful for SimpleNestedIT (#44815)
relates to #44486
2019-07-30 14:24:28 -04:00
Nhat Nguyen 979d0a71c7 Remove leniency during replay translog in peer recovery (#44989)
This change removes leniency in InternalEngine during replaying translog
in peer recovery.
2019-07-30 13:25:15 -04:00
Jake Landis 41a99c9e4a introduce 7.2.2 as a version (#44371)
* introduce 7.2.2 as a version
2019-07-30 18:52:34 +02:00
Jake Landis 03fea1c503 introduce 6.8.3 as a version (#44708) 2019-07-30 18:48:41 +02:00
David Kyle 78aa6143a6 Mute FilteringAllocationIT testTransientSettingsStillApplied
Relates to https://github.com/elastic/elasticsearch/issues/45003
2019-07-30 14:10:50 +01:00
Yannick Welsch c1b569ed4b Revert "Mute Zen1IT#testMixedClusterDisruption"
This reverts commit cf78ca58e3.
2019-07-30 13:10:14 +02:00
David Turner 55f1dd8da6 Close nodes properly in Coordinator tests (#44967)
Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses
`onNode()` so has no effect if the node is not in the current list of nodes.
It also discards the `Runnable` it creates without having run it, so has no
effect anyway.

This commit makes these tests much stricter about properly closing the nodes
started during `Coordinator` tests, by tracking the persisted states that are
opened, and adds an assertion to catch the trappy requirement that the closing
node still belongs to the cluster.
2019-07-30 11:47:36 +01:00
David Kyle cf78ca58e3 Mute Zen1IT#testMixedClusterDisruption 2019-07-30 11:33:39 +01:00
Jim Ferenczi 43bd8f2ba0 Fix aggregators early termination with breadth-first mode (#44963)
This commit fixes a bug when a deferred aggregator tries to early terminate the collection. In such case the CollectionTerminatedException is not caught and
the search fails on the shard. This change makes sure that we catch the exception in order to continue the deferred collection on the next leaf.

Fixes #44909
2019-07-30 11:26:40 +02:00
Andrey Ershov 5a0bd696fc
Snapshot tool S3 cleanup 7.x backport (#44575)
Backport of #44551
2019-07-30 11:02:08 +02:00
Nhat Nguyen 4813728783 Remove leniency in reset engine from translog (#44711)
Replaying operations from the local translog must never fail as those
operations were processed successfully on the primary before and the
mapping is up to update already. This change removes leniency during
resetting engine from translog in IndexShard and InternalEngine.
2019-07-29 16:31:45 -04:00
Jack Conradson 1a21682ed0 Fix JodaCompatibleZonedDateTime casts in Painless (#44874)
This is a temporary fix during the Joda to Java datetime transition. This will 
implicitly cast a JodaCompatibleZonedDateTime to a ZonedDateTime for 
both def and static types. This is necessary to insulate users from needing 
to know about JodaCompatibleZonedDateTime explicitly.
2019-07-29 12:05:26 -07:00
Igor Motov b6cef227a5 Geo: fix geo query decomposition (#44924)
The recent refactoring introduced an issue where queries where not
going through the decomposition processing.

Fixes #44891
2019-07-29 11:48:24 -04:00
Luca Cavanna a3cc32da64 TaskListener#onFailure to accept Exception instead of Throwable (#44946)
TaskListener accepts today Throwable in its onFailure method. Though
looking at where it is called (TransportAction), it can never be
notified of a Throwable.

This commit changes the signature of TaskListener#onFailure so that it
accepts an `Exception` rather than a `Throwable` as second argument.
2019-07-29 16:47:19 +02:00
Michał Perlak 245c9b7914 Optimize Min and Max BKD optimizations (#44315)
MinAggregator - skip BKD optimization when no result found after 1024 lookups.
MaxAggregator - skip unnecessary conversions.
2019-07-29 10:04:39 -04:00
Yannick Welsch 24873dd3e3 Do not block transport thread on startup (#44939)
We currently block the transport thread on startup, which has caused test failures. I think this is
some kind of deadlock situation. I don't think we should even block a transport thread, and
there's also no need to do so. We can just reject requests as long we're not fully set up. Note
that the HTTP layer is only started much later (after we've completed full start up of the
transport layer), so that one should be completely unaffected by this.

Closes #41745
2019-07-29 11:35:17 +02:00
Armin Braun f5efafd4d6
Cleanup Deadcode o.e.indices (#44931) (#44938)
* none of this is used anywhere
2019-07-29 10:38:35 +02:00
Igor Motov cfc8d17bb4 Geo: refactor geo mapper and query builder (#44884)
Refactors out the indexing and query generation logic out of the
mapper and query builder into a separate unit-testable classes.
2019-07-26 16:48:31 -04:00
Yannick Welsch 1561ab5420 Guard open connection call in RemoteClusterConnection (#44921)
Fixes an issue where a call to openConnection was not properly guarded, allowing an exception
to bubble up to the uncaught exception handler, causing test failures.

Closes #44912
2019-07-26 22:27:45 +02:00
Tanguy Leroux e1b626b947 Ensure index is green in SimpleClusterStateIT.testIndicesOptions() (#44893)
SimpleClusterStateIT testIndicesOptions failed in #44817 because it tries to close 
an index at the beginning of the test. With random index settings, it is possible that 
the index has a high number of shards (10) and replicas (1), which means that on 
CI this index can take time to be fully allocated.

The close index request can fail in the case where replicas are still recovering operations. 
Thiscommit adds a simple ensureGreen() at the beginning of the test to be sure that all 
replicas are started before trying to close the index.

closes #44817
2019-07-26 17:07:53 +02:00
Armin Braun 1340ff19bc
Fix Test Failure in ScalingThreadPoolTests (#44898) (#44901)
* Due to #44894 some constellations log a deprecation warning here now
* Fixed by checking for that
2019-07-26 17:05:50 +02:00