Commit Graph

4911 Commits

Author SHA1 Message Date
Julie Tibshirani 1f2e05c947
Simplify mapping validation for resizing indices. (#58514)
When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.

This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
2020-06-24 14:07:19 -07:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
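As a rough illustration of the "custom value with a default" accessor described above, the following Python sketch models the idea. The class name `ClusterState`, the method name `custom`, and the `EMPTY_SNAPSHOTS` default are assumptions made for this example, not the actual Elasticsearch API.

```python
from typing import Any, Dict, Optional


class ClusterState:
    """Toy stand-in for a cluster state that holds named custom sections."""

    def __init__(self, customs: Optional[Dict[str, Any]] = None) -> None:
        self._customs = dict(customs or {})

    def custom(self, key: str, default: Any = None) -> Any:
        # Return the stored custom value, or the caller-supplied default
        # when the key is absent, so callers do not need a null check.
        return self._customs.get(key, default)


EMPTY_SNAPSHOTS: tuple = ()  # hypothetical "empty" default value

state = ClusterState()
in_progress = state.custom("snapshots", EMPTY_SNAPSHOTS)
```

With a default supplied at the call site, code that iterates over `in_progress` works the same whether or not the custom section was ever set.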
2020-06-24 17:24:44 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
markharwood d5ac3bb87f
Field capabilities - make `keyword` a family of field types (#58315) (#58483)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
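A minimal Python sketch of the family-type idea, assuming hypothetical method names `type_name` and `family_type_name` (the real Java method names are not given in this message):

```python
class MappedFieldType:
    """Simplified model of a field type; not the real Java class."""

    def type_name(self) -> str:
        raise NotImplementedError

    def family_type_name(self) -> str:
        # By default a field type is its own family.
        return self.type_name()


class KeywordFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "keyword"


class WildcardFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "wildcard"

    def family_type_name(self) -> str:
        # Reported as `keyword` in field capabilities.
        return "keyword"


class LongFieldType(MappedFieldType):
    def type_name(self) -> str:
        return "long"
```

Field capabilities would then group fields by `family_type_name()` rather than `type_name()`, so `wildcard` and `keyword` fields across indices do not show up as a type conflict.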

Relates to #53175
2020-06-24 12:32:14 +01:00
Jim Ferenczi ec8d5ec79c Fix handling of terminate_after when size is 0 (#58212)
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not track the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.
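The intended behaviour can be modelled with a small Python sketch. This is a simplified stand-in for the collector logic, not the actual implementation; the function name and parameters are assumptions for illustration.

```python
def collect(doc_ids, terminate_after, track_total_hits_up_to):
    """Return (collected, counted_hits, terminated_early).

    terminate_after == 0 means "no hard limit" in this sketch.
    """
    collected = 0
    counted = 0
    for _ in doc_ids:
        # The hard limit is checked on every document, even after the
        # hit-tracking threshold below has already been reached.
        if terminate_after and collected >= terminate_after:
            return collected, counted, True
        collected += 1
        # Reaching the tracking threshold only stops the counting;
        # it no longer ends the collection.
        if counted < track_total_hits_up_to:
            counted += 1
    return collected, counted, False
```

For example, with 50 matching documents, `terminate_after=20` and a tracking threshold of 10, the sketch stops after 20 documents and reports early termination, rather than stopping silently at 10.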

Closes #57624
2020-06-24 13:16:11 +02:00
David Turner 796cb9e9ca Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message (#58410)
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:

    TOO_MANY_REQUESTS/12/index read-only / allow delete (api)

Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:

    FORBIDDEN/12/index read-only / allow delete (api)

Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
2020-06-24 10:22:11 +01:00
Alan Woodward d251a482e9 Move MappedFieldType.similarity() to TextSearchInfo (#58439)
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.

It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
2020-06-24 10:00:32 +01:00
Ryan Ernst 89c03e593c
Create utility for custom config setup in packaging tests (#58352)
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
2020-06-23 15:12:22 -07:00
Dan Hermann b40c27698f
Fix incorrect stats warning when swap is disabled 2020-06-23 14:34:27 -05:00
James Rodewig affc3954e6
[DOCS] Fix typo in RoutingNode comment (#58079) (#58454)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-06-23 13:07:08 -04:00
Christoph Büscher 642b05a511
Fix test failure in RangeQueryBuilderTests.testToQuery (#58449)
Very rarely this test can fail if we draw a random TimeZone id that we cannot
parse with the legacy joda DateMathParser and get an IllegalArgumentException.
In addition to a "SystemV/*" time zone we also need an index "versionCreated"
before V_7_0_0 and no "format" setting in the query builder. Given how unlikely
this combination is, we should simply disallow those time zone ids when
generating the random query builder for RangeQueryBuilderTests.

Closes #58431
2020-06-23 17:44:18 +02:00
Mark Tozzi 52806a8f89
Small VS config cleanup (#58294) (#58442) 2020-06-23 10:53:06 -04:00
Alan Woodward 8ebd341710
Add text search information to MappedFieldType (#58230) (#58432)
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.

This allows us to remove the MapperService.getLuceneFieldType() shim method.
2020-06-23 14:37:26 +01:00
Nik Everett 519f41950a
Save memory when significant_text is not on top (#58145) (#58364)
This merges the aggregator for `significant_text` into
`significant_terms`, applying the optimization built in #55873 to save
memory when the aggregation is not on top. The `significant_text`
aggregation is pretty memory intensive all on its own and this doesn't
particularly help with that, but it'll help with the memory usage of any
sub-aggregations.
2020-06-23 09:19:05 -04:00
Dan Hermann 41e8f584c1
[7.x] Minimum node version check before creating data stream (#58424) 2020-06-23 07:45:27 -05:00
Armin Braun 943efb78fd
Save Shard ID Serializations in Bulk Requests (#56209) (#58414)
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests and stops serializing index names when they are not needed because they match what is already in the shard id.

Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
   * + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
   * + 30 bytes for the `Index` uuid string
   * + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
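The estimate above can be checked with a few lines of Python. The fixed sizes are taken directly from the list; counting the index name as one byte per UTF-8 byte is a simplifying assumption of this sketch, not a measurement of real heap usage.

```python
# Fixed per-item savings, in bytes, as listed in the commit message.
SHARD_ID_OBJECT = 8      # `ShardId` object (itself + one field)
SHARD_ID_INT = 4         # the `int` in the `ShardId`
INDEX_OBJECT = 16        # `Index` object (two fields + instance + padding)
INDEX_UUID_STRING = 30   # `Index` uuid string

FIXED_SAVINGS = SHARD_ID_OBJECT + SHARD_ID_INT + INDEX_OBJECT + INDEX_UUID_STRING


def saved_bytes_per_item(index_name: str) -> int:
    # Assumption: approximate the index name string by its UTF-8 length.
    return FIXED_SAVINGS + len(index_name.encode("utf-8"))
```

The fixed part sums to 58 bytes, so any index name of two or more bytes brings the total to the "60+" figure quoted above.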
2020-06-23 12:35:52 +02:00
David Turner 256b660f0a
Remove anonymous PublicationContext implementation (#58412)
Today the `PublicationContext` interface has a single anonymous
implementation, and `PublicationTransportHandler` has various methods
that take the variables that this anonymous class captures. This commit
refactors this into a proper class with proper fields and moves the
relevant methods onto this class.

Backport of #58405 to 7.x.
2020-06-23 11:13:23 +01:00
Alan Woodward 519d1278e2
Make FieldTypeLookup immutable (#58162) (#58411)
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
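A minimal Python sketch of the rebuild-on-update pattern, assuming simplified stand-ins for `FieldTypeLookup` and its owner (the `MapperService` class and `merge` method here are illustrative, not the real Java signatures):

```python
from types import MappingProxyType


class FieldTypeLookup:
    """Immutable map from field name to field type (sketch)."""

    def __init__(self, field_types):
        # Copy the input, then expose it only through a read-only view.
        self._by_name = MappingProxyType(dict(field_types))

    def get(self, name):
        return self._by_name.get(name)

    def as_dict(self):
        return dict(self._by_name)


class MapperService:
    """Hypothetical owner that swaps in a new lookup on each update."""

    def __init__(self):
        self.lookup = FieldTypeLookup({})

    def merge(self, new_fields):
        combined = {**self.lookup.as_dict(), **new_fields}
        # Build a complete new lookup, then replace the reference in one step.
        self.lookup = FieldTypeLookup(combined)
```

Readers that grabbed the old `lookup` keep seeing a consistent snapshot, and no reader can observe a half-applied update.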
2020-06-23 10:51:32 +01:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source directly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into a mapping.
If the timestamp field path consisted of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
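The last point can be sketched in Python as follows. The function name is hypothetical, and the mapping is modelled as a plain nested dict using the usual `properties` layout; the real change operates on the Java mapping source.

```python
def insert_timestamp_mapping(mapping: dict, path: str, field_mapping: dict) -> dict:
    """Insert `field_mapping` at the dotted `path`, creating missing parents.

    The mapping is modified in place and also returned.
    """
    parts = path.split(".")
    properties = mapping.setdefault("properties", {})
    for parent in parts[:-1]:
        # Create the parent object field if the mapping does not define it,
        # instead of assuming it is already there.
        obj = properties.setdefault(parent, {"type": "object"})
        properties = obj.setdefault("properties", {})
    properties[parts[-1]] = field_mapping
    return mapping
```

For instance, inserting `event.@timestamp` into an empty mapping creates the `event` object field first and then places the timestamp field inside it.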
2020-06-22 17:46:38 +02:00
Przemko Robakowski a44dad9fbb
[7.x] Add support for snapshot and restore to data streams (#57675) (#58371)
* Add support for snapshot and restore to data streams (#57675)

This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.

Closes #57127

Relates to #53100

* version fix

* compilation fix

* compilation fix

* remove unused changes

* compilation fix

* test fix
2020-06-19 22:41:51 +02:00
William Brafford b3c99f06d6
Mute flaky test (#58356) 2020-06-18 15:30:11 -04:00
Andrei Dan 30e777856f
[7.x] Validate alias operations don't target data streams (#58327) (#58337)
This adds validation to make sure alias operations (add, remove, remove index)
don't target data streams or the backing indices.
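A rough Python sketch of such a validation step, assuming data streams are given as a mapping from stream name to backing index names (the function name and error type are illustrative only):

```python
def validate_alias_targets(target_indices, data_streams):
    """Reject alias operations that touch a data stream or its backing indices.

    data_streams: dict of data stream name -> list of backing index names.
    """
    protected = set(data_streams)
    for backing_indices in data_streams.values():
        protected.update(backing_indices)
    offending = sorted(set(target_indices) & protected)
    if offending:
        raise ValueError(
            "alias operations may not target data streams or backing indices: "
            + ", ".join(offending)
        )
```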

(cherry picked from commit 816448990e464a02f3960f12f6f6644a8cce36a4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-18 20:23:07 +01:00
Stuart Tettemer 20abba8433
Scripting: Deprecate general cache settings (#55753) (#58283)
Backport: ef543b0
2020-06-18 11:54:23 -06:00
Alan Woodward 4b8cf2af6a
Add serialization test for FieldMappers when include_defaults=true (#58235) (#58328)
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.

Fixes #58188
2020-06-18 15:46:04 +01:00
Alan Woodward ca2d12d039 Remove Settings parameter from FieldMapper base class (#58237)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
2020-06-18 12:53:54 +01:00
Rory Hunter 4da767bb3e Fix version 2020-06-18 12:29:47 +01:00
Rory Hunter a71f0cabdc Version bump for 7.8.0 release 2020-06-18 11:04:56 +01:00
Christoph Büscher ba0b046909 Fix test compilation issue 2020-06-18 11:36:11 +02:00
Christoph Büscher 31d8e03954 Prevent BigInteger serialization errors in term queries (#57987)
When a numeric value in e.g. a `term` query doesn't fit into a long, it
currently gets parsed to a BigInteger object, that the various term query
builders store untouched. This leads to serialization errors when these queries
are sent across the wire. Instead we can convert to a string representation
early on, since that is what we store e.g. when indexing big integers into
`keyword` fields anyway.
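The early conversion can be illustrated with a short Python sketch. Python integers are arbitrary precision, so the 64-bit bounds are checked explicitly here; the function name is an assumption for this example.

```python
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def normalize_term_value(value):
    """Convert integers that do not fit a signed 64-bit long to strings."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; leave it untouched.
        return value
    if isinstance(value, int) and not (LONG_MIN <= value <= LONG_MAX):
        # Store the string form early so the value serializes cleanly.
        return str(value)
    return value
```

Values within the long range pass through unchanged, so only the out-of-range case takes the string path.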

Closes #57917
2020-06-18 11:17:12 +02:00
Jim Ferenczi 82db0b575c
Allow index filtering in field capabilities API (#57276) (#58299)
This change allows the use of an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:

````
GET metrics-*/_field_caps?fields=*
{
  "index_filter": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gt": "2019"
            }
          }
        }
      ]
    }
  }
}
````

The filtering is done on a best-effort basis: it uses the can match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.

Closes #56195
2020-06-18 10:23:26 +02:00
Yannick Welsch ffeff4090e Add new flag to check whether alias exists on remove (#58100)
This allows doing true CAS operations on aliases, making sure that an alias is actually properly
moved from a given source index onto a given target index. This is useful to ensure that an
alias is actually moved from a given index to another one, and not just added to another index.
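To make the compare-and-swap behaviour concrete, here is a small in-memory Python model. The flag name `must_exist`, the function name, and the error type are assumptions for this sketch; the message above does not name the flag.

```python
class AliasMissingError(Exception):
    pass


def move_alias(aliases, alias, source, target, must_exist=True):
    """Move `alias` from `source` to `target`.

    aliases: dict of index name -> set of alias names.
    Returns a new dict; the input is left untouched, so a failed
    move changes nothing.
    """
    updated = {index: set(names) for index, names in aliases.items()}
    if alias not in updated.get(source, set()):
        if must_exist:
            # The "compare" step failed: the alias is not where we expected.
            raise AliasMissingError(f"alias [{alias}] is missing on [{source}]")
    else:
        updated[source].discard(alias)
    updated.setdefault(target, set()).add(alias)
    return updated
```

Without the flag, a remove of a missing alias is a silent no-op and the alias is simply added to the target; with it, the whole operation is rejected.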
2020-06-18 10:15:26 +02:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api across the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Mark Vieira ef8899b130
Mute SpanMultiTermQueryBuilderTests.testToQueryInnerTermQuery 2020-06-17 16:27:18 -07:00
Julie Tibshirani b1161cba35 Rename SearchContext#smartNameFieldType. (#58203)
The concept of a 'smart name' doesn't make sense now that there are no mapping
types.
2020-06-17 10:38:32 -07:00
Tim Brooks 2074412d79
Retry failed replication due to transient errors (#56230)
Currently a failed replication action will fail an entire replica. This
includes when replication fails due to potentially short lived transient
issues such as network disruptions or circuit breaking errors.

This commit implements retries using the retryable action.
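A generic retry loop of this kind might look like the Python sketch below. It is a simplified, synchronous model with exponential backoff and an overall timeout; the names and default values are assumptions and do not reflect the actual retryable action class.

```python
import time


class TransientError(Exception):
    """Stand-in for short-lived failures such as a network disruption."""


def run_with_retries(action, initial_delay=0.05, timeout=1.0, sleep=time.sleep):
    """Call `action` until it succeeds or the retry budget is exhausted."""
    delay = initial_delay
    waited = 0.0
    while True:
        try:
            return action()
        except TransientError:
            # Give up once the next wait would exceed the overall timeout.
            if waited + delay > timeout:
                raise
            sleep(delay)
            waited += delay
            delay *= 2  # exponential backoff between attempts
```

Only `TransientError` is retried; any other exception propagates immediately, which mirrors the idea that permanent failures should still fail the replica.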
2020-06-17 10:17:30 -06:00
Luca Cavanna 5ddea03de7 Remove needless termsQuery implementation from StringFieldType (#57609)
The base class `TermBasedFieldType` already implements exactly the same `termsQuery` method, hence there is no need to override it.
2020-06-17 18:04:49 +02:00
GeChenxin a96f526de1 Add index name to refresh mapping task (#57598) 2020-06-17 10:49:36 -04:00
Armin Braun 41af7f5455
Fix Typo in Snapshot Abort Test (#58238) (#58247)
Forgot the brackets here in #58214 so in the rare case where the
first update seen by the listener doesn't match it will still remove
itself and never be invoked again -> timeout.
2020-06-17 14:53:39 +02:00
Nik Everett ab2c6d9696
Save memory when auto_date_histogram is not on top (backport of #57304) (#58190)
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it when the `auto_date_histogram` used to use
`asMultiBucketAggregator` which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how we delay sub-aggs
we'd end up rewriting the delay list zillions of times if there are many
buckets.

The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

All of that does buy us some speed improvements when the aggregator is
a child of a multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

Relates to #56487
2020-06-17 08:48:41 -04:00
Jason Tedor b78b3edeea
Upgrade to JNA 5.5.0 (#58183)
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
2020-06-17 07:35:08 -04:00
Ignacio Vera b6585f2b51
Add new extensions for Lucene86 points codec to FsDirectoryFactory (#58226) (#58233) 2020-06-17 12:55:33 +02:00
Armin Braun 85be78b624
Fix Snapshot Abort Not Waiting for Data Nodes (#58214) (#58228)
This was a really subtle bug that we introduced a long time ago.
If a shard snapshot is in aborted state but hasn't started snapshotting on a node
we can only send the failed notification for it if the shard was actually supposed
to execute on the local node.
Without this fix, if shard snapshots were spread out across at least two data nodes
(so that each data node does not have all the primaries) the abort would actually
never wait on the data nodes. This isn't a big deal with uuid shard generations
but could lead to potential corruption on S3 when using numeric shard generations
(albeit very unlikely now that we have the 3 minute wait there).
Another negative side-effect of this bug was that master would receive a lot more
shard status update messages for aborted shards since each data node not assigned
a primary would send one message for that primary.
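The core of the fix, restricting failure notifications to shards assigned to the local node, can be sketched in Python as follows. The data layout and function name are assumptions for illustration only.

```python
def shards_to_fail_locally(aborted_shards, local_node_id):
    """Pick the aborted shard snapshots this node should report as failed.

    aborted_shards: dict of shard name -> id of the node it was assigned to.
    """
    # Only report shards that were supposed to run on this node; shards
    # assigned elsewhere are left for their own data node to report.
    return [
        shard
        for shard, node_id in aborted_shards.items()
        if node_id == local_node_id
    ]
```

With this filter, a node that holds none of the aborted primaries sends no status updates at all.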
2020-06-17 11:39:50 +02:00
Armin Braun c2b416ee31
Fix DanglingIndicesIT Failures from MasterNotDiscoveredException (#58215) (#58221)
The dangling indices action is not a proper master node action so it does not
retry when executed while the cluster hasn't fully formed yet.
Since we use node restarts when setting up the dangling indices state we need
to manually ensure a fully formed cluster before moving on with the tests to avoid
failures.
2020-06-17 10:34:08 +02:00
Stuart Tettemer 01795d1925
Revert "Scripting: Deprecate general cache settings (#55753)" (#58201)
This reverts commit 88e8b34fc2.
2020-06-16 14:58:18 -06:00
Rory Hunter 03369e0980
Implement dangling indices API (#58176)
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.

Co-authored-by: David Turner <david.turner@elastic.co>
2020-06-16 21:50:38 +01:00
Stuart Tettemer 88e8b34fc2
Scripting: Deprecate general cache settings (#55753)
Backport: ef543b0
2020-06-16 13:06:59 -06:00
Alan Woodward c6acc7c976 Correctly deal with aliases when retrieving lucene FieldType 2020-06-16 18:06:37 +01:00
Alan Woodward 12a3f6dfca
MappedFieldType should not extend FieldType (#58160)
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to #56814
2020-06-16 16:56:43 +01:00
Dan Hermann 911d46370e
Prohibit clone, shrink, and split on a data stream's write index 2020-06-16 10:53:20 -05:00
Lee Hinman 03ce0f8a4d
[7.x] Normalized prefix for rollover API (#57271) (69e1c066) (#58171)
* Normalized prefix for rollover API (#57271)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Lee Hinman <lee@writequit.org>

It fixes issue #53388
by normalizing the prefix in the index creation request itself

* Fix compilation for backport

Co-authored-by: Gaurav Chandani <chngau@amazon.com>
2020-06-16 09:22:10 -06:00