Commit Graph

4969 Commits

Author SHA1 Message Date
Nhat Nguyen 62763b177d Implement toString for BulkByScrollTask (#59042)
We should implement "toString" of BulkByScrollTask.StatusOrException 
to have a meaningful log message when a reindex task completes.
2020-07-05 22:06:56 -04:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
2020-07-04 14:02:16 +02:00
Dan Hermann 7c43cbca82
[7.x] Ignore matching data streams if include_data_streams is false (#59028) 2020-07-03 14:51:32 -05:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Armin Braun d22dd437f1
Fix Two Common Zero Len Array Instantiations (#58944) (#58993)
Two spots I found in which we commonly instatiate a non-trivial number of zero length arrays.
2020-07-03 09:18:14 +02:00
Nhat Nguyen 65645217bc Handle IOException while checking translog corruption
We can hit an IOException while reading a translog header after corrupting it.

Relates #58866
2020-07-02 22:38:05 -04:00
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Mark Vieira 8fca312a3a
Mute WriteMemoryLimitsIT.testWriteBytesAreIncremented 2020-07-02 16:58:23 -07:00
Tim Brooks 9d1bf383d0
Add test assertions to ensure write bytes released (#58970)
This is a follow-up to #57573. This commit ensures that the bytes marked
in WriteMemoryLimits are released by any test using an internal test
cluster.
2020-07-02 17:38:23 -06:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consuming by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Jim Ferenczi a4e08acdd1 Fix exists query on unmapped field in query_string (#58804)
Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped.
This change also introduced a bug in the `query_string` query, using an unmapped field
like `_exists_:foo` throws an exception if the field is unmapped. This commit avoids the
exception if the query is built outside of an `ExistsQueryBuilder`.

Closes #58737
2020-07-02 21:52:03 +02:00
Nhat Nguyen be804b765d
Avoid flipping translog header version (#58866)
An old translog header does not have a checksum. If we flip the header 
version of an empty translog to the older version, then we won't detect
that corruption, and translog will be considered clean as before.

Closes #58671
2020-07-02 14:34:19 -04:00
Tal Levy d516959774
Re-enable support for array-valued geo_shape fields. (#58786) (#58943)
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
2020-07-02 11:21:55 -07:00
Ryan Ernst d825d4352c
Eagerly compile condition script at processor creation (#58882)
Ingest script processors were changed to eagerly compile their scripts
when the ingest pipeline is saved, but conditional scripts were missed.
This commit adds eager compilation to ingest conditional scripts, which
will help surface errors before runtime, as well as adds tests for each
case we might encounter between inline and stored script compilation
failures.

closes #58864
2020-07-02 11:10:20 -07:00
Lee Hinman e32623ef52
[7.x] Add test for component templates updated after cluster restart (#58883) (#58914)
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
2020-07-02 08:23:34 -06:00
Armin Braun 62152852dc
Cleanup Duplication in Snapshot ITs (#58818) (#58915)
Just a few obvious static cleanups of duplication to push back against the ever increasing complexity of these tests.
2020-07-02 16:00:01 +02:00
Alan Woodward 0cd1dc3143 Percolator keyword fields should not store norms (#58899)
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
2020-07-02 13:59:28 +01:00
Nik Everett 5e49ee800e
Drop rewriting in date_histogram (backport of #57836) (#58875)
The `date_histogram` aggregation had an optimization where it'd rewrite
`time_zones` who's offset from UTC is fixed across the entire index.
This rewrite is no longer needed after #56371 because we can tell that a
time zone is fixed lower down in the aggregation. So this removes it.
2020-07-01 17:19:12 -04:00
Dan Hermann 98a62a6b2d
Make DataStream instances explicitly immutable (#58688) (#58839) 2020-07-01 11:14:01 -05:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Andrei Dan f7dc09340b
Prohibit custom _routing for index requests targetting a data stream (#58749) (#58831)
This prohibits the use of a custom _routing when the index/bulk requests are targetting a data stream.
Using a custom _routing when targetting a backing index is still permitted.

Relates to #53100

(cherry picked from commit ece6b7a318a8bd3a010499189f31fc5e3a012d4f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-01 14:54:18 +01:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka 2c275913b9
[7.x] Week based parsing for ingest date processor (#58597) (#58802)
Date processor was incorrectly parsing week based dates because when a
weekbased year was provided ingest module was thinking year was not
on a date and was trying to applying the logic for dd/MM type of
dates.
Date Processor is also allowing users to specify locale parameter. It
should be taken into account when parsing dates - currently only used
for formatting. If someone specifies 'en-us' locale, then calendar data
rules for that locale should be used.
The exception is iso8601 format. If someone is using that format,
then locale should not override calendar data rules.
closes #58479
2020-07-01 15:15:56 +02:00
David Turner 822b7421ce Forbid read-only-allow-delete block in blocks API (#58727)
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
2020-07-01 13:18:26 +01:00
Martijn van Groningen a0df96befb
Add data stream support to put mapping and update index settings APIs. (#58758)
Backport of #58231 to 7.x branch.

Change update index setting and put mapping api
to execute on all backing indices if data stream is targeted.

Relates #53100
2020-07-01 13:32:21 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
Dan Hermann 1c2a726731
Data stream support for search shards API (#58486) (#58765) 2020-06-30 17:59:51 -05:00
Nik Everett 40850a780d
Fail variable_width_histogram that collects from many (#58619) (#58780)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 18:26:45 -04:00
Dan Hermann cae49b0fd7
[7.x] Add data stream support to open index API (#58767) 2020-06-30 14:30:32 -05:00
Dan Hermann a84ff81743
Data stream support for get field mappings API (#58488) (#58766) 2020-06-30 13:45:04 -05:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
Boice Huang 8c93f4e154 Sort document by internal doc id in FetchPhase to better use LRU cache (#57273)
This change sorts the docIdsToLoad once instead of in each sub-phase.
2020-06-30 17:06:09 +02:00
Julie Tibshirani ab65a57d70
Merge mappings for composable index templates (#58709)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.

Relates to #53101.
2020-06-30 08:01:37 -07:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Armin Braun b52a764143
Fix NPE in SnapshotService CS Application (#58680) (#58735)
In the unlikely corner case of deleting a relocation (hence `WAITING`) primary shard's
index during a partial snapshot, we would throw an NPE when checking if there's any external
changes to process.
2020-06-30 15:20:49 +02:00
Yannick Welsch b885cbff1a
Add index block api (#58716)
Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have
been completed after adding the write block.

This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
2020-06-30 14:06:52 +02:00
Patrick Jiang(白泽) be20aacec3 Add `matchBoolPrefix` method to QueryBuilders (#58637) 2020-06-29 16:30:40 +02:00
Armin Braun 95d85f29f8
Fix Snapshots Capturing Incomplete Datastreams (#58630) (#58656)
Only snapshot datastreams that are recorded in `SnapshotInfo` and clean those
that aren't from the snapshotted metadata.
Do not restore all datastreams by default when restoring global metadata, use the same
mechanics used for indices here.

Closes #58544
2020-06-29 12:51:40 +02:00
Armin Braun 4f2f257b12
Fix DataStream Handling on Restore of Global Metadata (#58631) (#58649)
When restoring a global metadata snapshot we were overwriting the correctly
adjusted data streams in the metadata when looping over all custom values.

Closes #58496
2020-06-29 10:58:41 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Ryan Ernst 08e75abd4e
Always add Java-9 style file permissions (#46050) (#58628)
Java 9 removed pathname canonicalization, which means that we need to
add permissions for the path and also the real path when adding file
permissions. Since master requires a minimum runtime of JDK 11, we no
longer need conditional logic here to apply this pathname
canonicalization with our bares hands. This commit removes that
conditional pathname canonicalization.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-06-26 18:19:07 -07:00
Nik Everett 67e9d39932
Remove useless aggregation helper (#58571) (#58578)
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 15:58:44 -04:00
Tanguy Leroux 775fb5d4cf
Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.

This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
 SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 18:26:20 +02:00
Armin Braun 090211f768
Fix Incorrect Snapshot Shar Status for DONE Shards in Running Snapshots (#58390) (#58593)
Minor bugs/inconsistencies:

If a shard hasn't changed at all we were reporting `0` for total size and total file count
while it was ongoing.

If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.
2020-06-26 16:11:30 +02:00
Howard eaa60b7c54 [Docs] Fix return tuple element order (#58463) 2020-06-26 12:24:54 +02:00
Nik Everett 5f52bc4c9f
Fix two scripted_metric bugs (backport of #58547) (#58565)
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
2020-06-25 16:16:10 -04:00
Armin Braun 468e559ff7
Fix Memory Leak From Master Failover During Snapshot (#58511) (#58560)
If we failed over while the data nodes were doing their work
we would never resolve the listener and leak it.
This change fails all listeners if master fails over.
2020-06-25 20:43:08 +02:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00