Commit Graph

5704 Commits

Author SHA1 Message Date
Przemko Robakowski fdd2a9c235
Fix whitespace as a separator in CSV processor (#67045) (#67050)
This change fixes problem when using space or tab as a separator in CSV processor - we check if current character is separator before we check if it is whitespace.

This also improves tests to always check all combinations of separators and quotes.

Closes #67013
2021-01-05 23:05:10 +01:00
Jack Conradson 8e294d53bc Fix static inner class resolution in Painless (#67027)
When removing the "lexer hack" to remove type context from the lexer, static inner class resolution 
wasn't properly accounted for. This change adds code to handle static inner class resolution.
2021-01-05 11:12:11 -08:00
Julie Tibshirani 24c0f01543 Ensure all query builder tests consider older versions. (#66401)
This PR removes outdated overrides in some tests that prevent them from testing
older index versions. Also removes an old comment + logic from
AggregatorFactoriesTests.
2020-12-16 11:57:01 -08:00
Alan Woodward fb84b6710d
Restore use of default search and search_quote analyzers (#65491) (#65562)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 18:34:59 +00:00
Alan Woodward dc6b05934f
Correctly serialize search-as-you-type analyzer (#65359)
The search-as-you-type mapper was using an incorrect default analyzer when
creating its merge builder, which meant that a configured analyzer would not
be included in serialization.  This in turn resulted in the master node
getting an incorrect configuration, and search-as-you-type fields always
using a standard analyzer.

This has already been fixed via subsequent refactorings in 7.11 and master,
so this fix is for 7.10 only.

Resolves #65319
2020-11-23 14:02:05 +00:00
Jack Conradson 89ec7db26b Fix casting bug in compound assignment for String (#65329)
This change fixes a bug where when doing compound assignment involving String concatenation, the 
right-hand side will fail to cast to String appropriately and throw a ClassCastException.
2020-11-20 12:09:27 -08:00
Julie Tibshirani 3974c3b066 Move the shared fetch cache to highlighting. (#65105)
The cache is only used by highlighters, so it can be scoped to only the
highlighting context.
2020-11-16 18:54:32 -08:00
Jack Conradson 0beffcd405 Revert null-safe behavior to error at runtime instead of compiletime (#65099)
This reverts a change where null-safe was enhanced to cause a compile-time error instead of a run-
time error when the target value was a primitive type. The reason for the reversion is consistency 
across def/non-def types and versions. I've added a follow up issue to fix this behavior in general 
(#65098).
2020-11-16 10:45:38 -08:00
Alan Woodward caf143f4a5
Unused boost parameter should not throw mapping exception (#64999) (#65014)
We were correctly dealing with boosts that had an effect, but mappers
that had a silently accepted but ignored boost parameter were throwing
an error instead of continuing to ignore the boost but emitting a
warning.

Fixes #64982
2020-11-12 19:28:32 +00:00
Armin Braun 51e9d6f227
Revert Serializing Outbound Transport Messages on IO Threads (#64632) (#64654)
Serializing outbound transport message on the IO loop was introduced in https://github.com/elastic/elasticsearch/pull/56961. Unfortunately it turns out that this is incompatible with assumptions made by CCR code here: f22ddf822e/x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/repositories/GetCcrRestoreFileChunkAction.java (L60-L61) and that are not easy to work around on short notice.

Raising reverting this move (as a temporary solution, it's still a valuable change long-term) as a blocker therefore as this seriously affects the stability of the initial phase of the CCR following by causing corrupted bytes to be send to the follower.
2020-11-05 16:29:12 +01:00
Mayya Sharipova 8c25130a80
Disable using unsigned_long in scripts (#64552) (#64557)
Backport for #64523
Relates to #64361
2020-11-03 16:38:50 -05:00
Ignacio Vera 4851bc7bae
Upgrade to Lucene-8.7.0 (#64532) (#64537) 2020-11-03 16:57:04 +01:00
Nik Everett 0c47d49784
Make sure non-collecting aggs include sub-aggs (backport of #64214) (#64247)
Now that we're consistently using `cat_match` to filter which shards we
run on we can get this confusing case:
1. You have a search with, say, a range and a sub-agg.
2. That search has a query that `can_match` can recognize will match no
   docs. On *any* shard.
3. So we dutifully run it on a single shard so it can produce the
   "empty" aggs.
4. The shard we pick happens to not have the target of the range mapped.
5. This kicks in the special range aggregator that doesn't collect any
   documents.
6. Before this commit, that range aggregator *also* never produced any
   sub-aggs.

So, without this change, it was quite possible for a search that
happened to match no documents to "throw away" the sub-aggs of a range
and a few other aggs.

We've had this problem for a long, long time but it is more confusing
now because `can_match` is really kicking in and causing us to see cases
where it looks like you are targeting a lot of shards but you really are
only targeting a couple. It used to be that to get the "no sub-aggs"
behavior you had to explicitly target only shards that didn't map the
target field of the `range` agg. And, like, in that case it isn't too
bad because you targeted a sort of degenerate shard. But now that
`can_match` is doing its thing you can end up with the confusing steps
above. It took me several hours to track down what what happening I know
how the individual pieces of all of this works. It took four hours to
figure out how they fit together in this case....

Anyway! This replaces all the aggregator implementations that throw out
the sub-aggregators with ones that keep them. I think this'll be less
confusing in the future.

Closes #64142
2020-10-28 08:38:05 -04:00
Jay Modi 0fab675e95
Correct system index names in Kibana module (#64011)
This commit updates the list of system index names to be complete and
correct for Kibana and APM. The pattern `.kibana*` is too inclusive for
system indices and actually includes the
`.kibana-event-log-${version}-${int}` pattern for the Kibana event log,
which should only be hidden and not a system index. Additionally, the
`.apm-custom-link` index was not included in the list of system
indices. Finally, the reporting pattern has been updated to match that
of the permissions given to the kibana_system role.

Backport of #63950
2020-10-21 12:11:10 -06:00
Jay Modi f0e6684a95
Add APM configuration index to Kibana system indices (#64007)
* Add APM index to Kibana system indices, making it
accessible through the _kibana endpoint and giving it the
same access privileges as the other Kibana system indices.
* Parameterize kibana system index tests by index name

Backport of #63756

Co-authored-by: William Brafford <williamrandolphbrafford@gmail.com>
2020-10-21 10:55:23 -06:00
Ignacio Vera d0f5066310
Upgrade to lucene-8.7.0-snapshot-72d8528c3a6 (#63912) (#63928) (#63933) 2020-10-20 15:08:06 +02:00
Nik Everett 5583db5a73
Fix broken parent and child aggregator (backport #63811) (#63892)
In #57892 I broke *some* sub-aggregations inside of the `parent` and
`child` aggregator, specifically any sub-aggregations that do work in
the `postCollect` phase. This fixes it by delaying the post collect
phase of aggs under `parent` and `child` until `beforeBuildingBuckets`
because, well, we haven't done *any* collection until after that phase.
2020-10-19 13:05:22 -04:00
Ioannis Kakavas 364511395d
[7.10] Move RestRequestFilter to core (#63507)
Move RestRequestFilter to core so that Rest requests outside xpack can use 
it to filter fields and expand its usage.

Backport of #63507
2020-10-16 13:57:52 +03:00
Julie Tibshirani 9e52513c7b
Add support for missing value fetchers. (#63585)
This PR implements value fetching for the following field types:
* `text` phrase and prefix subfields
* `search_as_you_type`, plus its subfields
* `token_count`, which is implemented by fetching doc values

Supporting these types helps ensure that retrieving all fields through
`"fields": ["*"]` doesn't fail because of unsupported value fetchers.
2020-10-12 17:34:21 -07:00
Julie Tibshirani ae2fc4118d Add factory methods for common value fetchers. (#63438)
This PR adds factory methods for the most common implementations:
* `SourceValueFetcher.identity` to pass through the source value untouched.
* `SourceValueFetcher.toString` to simply convert the source value to a string.
2020-10-08 12:14:53 -07:00
Dan Hermann f822bb732c
Fix failure in AppendProcessorTests.testAppendingToListWithDuplicatesDisallowed (#62842) (#63499) 2020-10-08 13:15:29 -05:00
Dan Hermann 0024cf33e3
Fix failure in AppendProcessorTests.testAppendingUniqueValueToScalar (#62453) (#63493) 2020-10-08 10:39:57 -05:00
Dan Hermann 85cd8e4868
Fix failure in AppendProcessorTests.testAppendingToListWithDuplicatesDisallowed (#62385) (#63488) 2020-10-08 09:32:55 -05:00
Dan Hermann 85886e71c2
Handle error conditions when simulating ingest pipelines with verbosity enabled (#63327) (#63484) 2020-10-08 09:21:05 -05:00
Mayya Sharipova e022b78198
Upgrade to lucene-8.7.0-snapshot-5c4168d (#63466)
This disables sort optim on _doc, which may still be unstable.
Backport for #63444
2020-10-08 08:20:43 -04:00
Mayya Sharipova e236ea43e9 Upgrade to lucene-8.7.0-snapshot-e914862 (#63401)
Backport for: #63395
2020-10-07 09:45:14 -04:00
Stuart Tettemer 8a61b95a0f
Scripting: JSON parsing and writing in watcher (#63278) (#63377)
Co-authored-by: Honza Král
Co-authored-by: Jack Conradson
Backport of: f43e52d
2020-10-06 23:39:40 -05:00
Mayya Sharipova f2ba62b894
Upgrade to lucene- 8.7.0-snapshot-66c49a35402 (#63372)
This includes fixing a bug in doc iteration during sort optimization

Backport for #63349
2020-10-06 22:38:58 -04:00
Julie Tibshirani f17ca18dfa
Make array value parsing flag more robust. (#63371)
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.

This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.

Follow-up to #62974.
2020-10-06 17:49:25 -07:00
Jack Conradson 2b838d1ea6 fix expression type for null safe operator (#63367)
An invalid void expression type from a null safe operator caused ClassFormatError for the script Map 
x= ['0': 0]; x?.0 > 1. This change sets and propagates the correct expression type for the null safe 
operator to be written out.
2020-10-06 16:02:56 -07:00
Gordon Brown 5c8b0662df
Deprecate REST access to System Indices (#63274) (Original #60945)
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.

Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail a few specific APIs which will continue to allow access to system indices by default:

- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`

Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
2020-10-06 13:41:40 -06:00
Dan Hermann 7a59ae8fa2
[7.x] Allow_duplicates option for append processor (#61916) (#63257) 2020-10-06 09:03:47 -05:00
Luca Cavanna ca68298e89
Remove MapperService argument from IndexFieldData.Builder#build (#63197) (#63311)
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way.
2020-10-06 15:04:23 +02:00
Christoph Büscher 82096d3971
Enable SourceLookup to leverage sequential stored fields reader (#63035) (#63316)
In #62509 we already plugged faster sequential access for stored fields in the fetch phase.
This PR now adds using the potentially better field reader also in SourceLookup.
Rally exeriments are showing that this speeds up e.g. when runtime fields that are using
"_source" are added e.g. via "docvalue_fields" or are used in queries or aggs.

Closes #62621
2020-10-06 14:34:39 +02:00
Nhat Nguyen 1a6837883a Upgrade to Lucene-8.7.0-snapshot-77396dbf339 (#63222)
Includes LUCENE-9554, which exposes the pendingNumDocs from IndexWriter.
2020-10-05 14:39:30 -04:00
Stuart Tettemer 791a9d5102
Scripting: enable regular expressions by default (#63029) (#63272)
* Setting `script.painless.regex.enabled` has a new option,
  `use-factor`, the default.  This defaults to using regular
  expressions but limiting the complexity of the regular
  expressions.

  In addition to `use-factor`, the setting can be `true`, as
  before, which enables regular expressions without limiting them.

  `false` totally disables regular expressions, which was the
  old default.

* New setting `script.painless.regex.limit-factor`.  This limits
  regular expression complexity by limiting the number characters
  a regular expression can consider based on input length.

  The default is `6`, so a regular expression can consider
  `6` * input length number of characters.  With input
  `foobarbaz` (length `9`), for example, the regular expression
  can consider `54` (`6 * 9`) characters.

  This reduces the impact of exponential backtracking in Java's
  regular expression engine.

* add `@inject_constant` annotation to whitelist.

  This annotation signals that a compiler settings will
  be injected at the beginning of a whitelisted method.

  The format is `argnum=settingname`:
  `1=foo_setting 2=bar_setting`.

  Argument numbers must start at one and must be sequential.

* Augment
  `Pattern.split(CharSequence)`
  `Pattern.split(CharSequence, int)`,
  `Pattern.splitAsStream(CharSequence)`
  `Pattern.matcher(CharSequence)`
  to take the value of `script.painless.regex.limit-factor` as a
  an injected parameter, limiting as explained above when this
  setting is in use.

Fixes: #49873
Backport of: 93f29a4
2020-10-05 13:17:47 -05:00
Jack Conradson d134b4f70b Make location final in IRNode (#63078)
This change makes Location a final member of IRNode as opposed to possibly changing it. This 
ensures that all ir nodes have a Location for error information upon creation that cannot be updated 
so each node can be tracked as where it came from originally.
2020-10-05 10:16:31 -07:00
Armin Braun cf75abb021
Optimize XContentParserUtils.ensureExpectedToken (#62691) (#63253)
We only ever use this with `XContentParser` no need to make it inline
worse by forcing the lambda and hence dynamic callsite here.
=> Extraced the exception formatting code path that is likely very cold
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
2020-10-05 19:08:32 +02:00
Alan Woodward 01950bc80f
Move FieldMapper#valueFetcher to MappedFieldType (#62974) (#63220)
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
2020-10-04 14:54:59 +01:00
William Brafford 6899ce6309
System index auto-creation should not be disabled by user settings (#62984) (#63147)
* Add System Indices check to AutoCreateIndex

By default, Elasticsearch auto-creates indices when a document is
submitted to a non-existent index. There is a setting that allows users
to disable this behavior. However, this setting should not apply to
system indices, so that Elasticsearch modules and plugins are able to
use auto-create behavior whether or not it is exposed to users.

This commit constructs the AutoCreateIndex object with a reference to
the SystemIndices object so that we bypass the check for the user-facing
autocreate setting when it's a system index that is being autocreated.

We also modify the logic in TransportBulkAction to make sure that if a
system index is included in a bulk request, we don't skip the
autocreation step.
2020-10-01 16:26:07 -04:00
Tim Brooks 7f6d1981a1
Transfer network bytes to smaller buffer (#62673)
Currently we read in 64KB blocks from the network. When TLS is not
enabled, these bytes are normally passed all the way to the application
layer (some exceptions: compression). For the HTTP layer this means that
these bytes can live throughout the entire lifecycle of an indexing
request.

The problem is that if the reads from the socket are small, this means
that 64KB buffers can be consumed by 1KB or smaller reads. If the socket
buffer or TCP buffer sizes are small, the leads to massive memory
waste. It has been identified as a major source of OOMs on coordinating
nodes as Elasticsearch easily exhausts the heap for these network bytes.

This commit resolves the problem by placing a handler after the TLS
handler to copy these bytes to a more appropriate buffer size as
necessary. This comes after TLS, because TLS is a framing layer which
often resolves this problem for us (the 64KB buffer will be decoded
into a more appropriate buffer size). However, this extra handler will
solve it for the non-TLS pipelines.
2020-10-01 10:39:24 -06:00
Jake Landis 0795f4b898
[7.x] Add network from MaxMind Geo ASN database (#61676) (#62898)
This adds the network property from the MaxMind Geo ASN database. 
This enables analysis of IP data based on the subnets that MaxMind have 
previously identified for ASN networks.

closes #60942

Co-authored-by: Peter Ansell <p_ansell@yahoo.com>
2020-10-01 11:01:44 -05:00
Dan Hermann fbf552d24c
Add country_name to the default properties of geoip ingest processor (#62915) (#63124) 2020-10-01 08:47:51 -05:00
Mayya Sharipova abfae16517 Unmute test DeleteByQueryConcurrentTests
Unmute DeleteByQueryConcurrentTests
testConcurrentDeleteByQueriesOnDifferentDocs test.

LUCENE-9449 introduced a bug in sorting on _doc,
which resulted in failure of this test. As Lucene bug
has been fixed, this reenables the test.

Closes #62609
2020-09-30 09:06:42 -04:00
Alan Woodward 2f5a813589
Convert all FieldMappers in mapper-extras to parametrized form (#62938) (#63034)
This converts RankFeatureFieldMapper, RankFeaturesFieldMapper,
SearchAsYouTypeFieldMapper and TokenCountFieldMapper to
parametrized forms. It also adds a TextParams utility class to core
containing functions that help declare text parameters - mainly shared
between SearchAsYouTypeFieldMapper and KeywordFieldMapper at
the moment, but it will come in handy when we convert TextFieldMapper
and friends.

Relates to #62988
2020-09-29 20:50:34 +01:00
Alan Woodward de08ba58bf Convert percolator, murmur3 and histogram mappers to parametrized form (#63004)
Relates to #62988
2020-09-29 14:42:26 +01:00
Mayya Sharipova 4c8c3c8df6
Upgrade lucene to lucene-8.7.0-snapshot-3b59906 (#62978)
Backport for #62970
2020-09-28 16:52:31 -04:00
Tim Brooks 59dd889c10
Split up large HTTP responses in outbound pipeline (#62666)
Currently Netty will batch compression an entire HTTP response
regardless of its content size. It allocates a byte array at least of
the same size as the uncompressed content. This causes issues with our
attempts to remove humungous G1GC allocations. This commit resolves the
issue by split responses into 128KB chunks.

This has the side-effect of making large outbound HTTP responses that
are compressed be send as chunked transfer-encoding.
2020-09-24 16:35:52 -06:00
Tim Brooks 43a4882951
Move CorsHandler to server (#62007)
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
2020-09-24 16:32:59 -06:00
Mayya Sharipova 54064a1eec
Unsigned long 64bits(#62892)
Introduce 64-bit unsigned long field type

This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
  to double and can be imprecise for large values.

Backport for #60050
Closes #32434
2020-09-24 16:51:47 -04:00