This change fixes a problem when using space or tab as a separator in the CSV processor: we now check whether the current character is the separator before we check whether it is whitespace.
This also improves tests to always check all combinations of separators and quotes.
Closes #67013
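The ordering described above can be illustrated with a minimal sketch. The names `CsvSplitSketch` and `split` are hypothetical and this is not the processor's actual code; quote handling is left out entirely. It only shows why the separator test has to run before the whitespace test when the separator is itself a space or a tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the CSV processor's real implementation.
final class CsvSplitSketch {
    static List<String> split(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == separator) {
                // Checked first, so a tab or space separator is never
                // swallowed by the whitespace branch below.
                fields.add(current.toString());
                current.setLength(0);
            } else if (Character.isWhitespace(c) && current.length() == 0) {
                // Assumption for this sketch: leading whitespace in a field is skipped.
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

If the two checks were swapped, a tab separator at the start of a field would be treated as leading whitespace and the field boundary would be lost.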
When removing the "lexer hack" to remove type context from the lexer, static inner class resolution
wasn't properly accounted for. This change adds code to handle static inner class resolution.
This PR removes outdated overrides in some tests that prevent them from testing
older index versions. Also removes an old comment + logic from
AggregatorFactoriesTests.
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.
Fixes #65434
The search-as-you-type mapper was using an incorrect default analyzer when
creating its merge builder, which meant that a configured analyzer would not
be included in serialization. This in turn resulted in the master node
getting an incorrect configuration, and search-as-you-type fields always
using a standard analyzer.
This has already been fixed via subsequent refactorings in 7.11 and master,
so this fix is for 7.10 only.
Resolves #65319
This change fixes a bug where, in a compound assignment involving String concatenation, the
right-hand side failed to cast to String appropriately and threw a ClassCastException.
This reverts a change where null-safe was enhanced to cause a compile-time error instead of a run-
time error when the target value was a primitive type. The reason for the reversion is consistency
across def/non-def types and versions. I've added a follow-up issue to fix this behavior in general
(#65098).
We were correctly dealing with boosts that had an effect, but mappers
that had a silently accepted but ignored boost parameter were throwing
an error instead of continuing to ignore the boost but emitting a
warning.
Fixes #64982
Now that we're consistently using `can_match` to filter which shards we
run on we can get this confusing case:
1. You have a search with, say, a range and a sub-agg.
2. That search has a query that `can_match` can recognize will match no
docs. On *any* shard.
3. So we dutifully run it on a single shard so it can produce the
"empty" aggs.
4. The shard we pick happens to not have the target of the range mapped.
5. This kicks in the special range aggregator that doesn't collect any
documents.
6. Before this commit, that range aggregator *also* never produced any
sub-aggs.
So, without this change, it was quite possible for a search that
happened to match no documents to "throw away" the sub-aggs of a range
and a few other aggs.
We've had this problem for a long, long time but it is more confusing
now because `can_match` is really kicking in and causing us to see cases
where it looks like you are targeting a lot of shards but you really are
only targeting a couple. It used to be that to get the "no sub-aggs"
behavior you had to explicitly target only shards that didn't map the
target field of the `range` agg. And, like, in that case it isn't too
bad because you targeted a sort of degenerate shard. But now that
`can_match` is doing its thing you can end up with the confusing steps
above. It took me several hours to track down what was happening, even
though I know how the individual pieces of all of this work. It took four
hours to figure out how they fit together in this case....
Anyway! This replaces all the aggregator implementations that throw out
the sub-aggregators with ones that keep them. I think this'll be less
confusing in the future.
Closes #64142
This commit updates the list of system index names to be complete and
correct for Kibana and APM. The pattern `.kibana*` is too inclusive for
system indices and actually includes the
`.kibana-event-log-${version}-${int}` pattern for the Kibana event log,
which should only be hidden and not a system index. Additionally, the
`.apm-custom-link` index was not included in the list of system
indices. Finally, the reporting pattern has been updated to match that
of the permissions given to the kibana_system role.
Backport of #63950
* Add APM index to Kibana system indices, making it
accessible through the _kibana endpoint and giving it the
same access privileges as the other Kibana system indices.
* Parameterize kibana system index tests by index name
Backport of #63756
Co-authored-by: William Brafford <williamrandolphbrafford@gmail.com>
In #57892 I broke *some* sub-aggregations inside of the `parent` and
`child` aggregator, specifically any sub-aggregations that do work in
the `postCollect` phase. This fixes it by delaying the post collect
phase of aggs under `parent` and `child` until `beforeBuildingBuckets`
because, well, we haven't done *any* collection until after that phase.
This PR implements value fetching for the following field types:
* `text` phrase and prefix subfields
* `search_as_you_type`, plus its subfields
* `token_count`, which is implemented by fetching doc values
Supporting these types helps ensure that retrieving all fields through
`"fields": ["*"]` doesn't fail because of unsupported value fetchers.
This PR adds factory methods for the most common implementations:
* `SourceValueFetcher.identity` to pass through the source value untouched.
* `SourceValueFetcher.toString` to simply convert the source value to a string.
When constructing a value fetcher, the 'parsesArrayValue' flag must match
`FieldMapper#parsesArrayValue`. However there is nothing in code or tests to
help enforce this.
This PR reworks the value fetcher constructors so that `parsesArrayValue` is
'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must
explicitly set it to true and ensure the behavior is covered by tests.
Follow-up to #62974.
An invalid void expression type from a null safe operator caused a ClassFormatError for the script
`Map x = ['0': 0]; x?.0 > 1`. This change sets and propagates the correct expression type for the
null safe operator to be written out.
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.
Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail, and a few specific APIs which will continue to allow access to system indices by default:
- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`
Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way.
In #62509 we already plugged faster sequential access for stored fields in the fetch phase.
This PR now also uses the potentially better field reader in SourceLookup.
Rally experiments show that this speeds things up, e.g. when runtime fields that use
"_source" are added via "docvalue_fields" or are used in queries or aggs.
Closes #62621
* Setting `script.painless.regex.enabled` has a new option,
`use-factor`, the default. This defaults to using regular
expressions but limiting the complexity of the regular
expressions.
In addition to `use-factor`, the setting can be `true`, as
before, which enables regular expressions without limiting them.
`false` totally disables regular expressions, which was the
old default.
* New setting `script.painless.regex.limit-factor`. This limits
regular expression complexity by limiting the number of characters
a regular expression can consider based on input length.
The default is `6`, so a regular expression can consider
`6` * input length number of characters. With input
`foobarbaz` (length `9`), for example, the regular expression
can consider `54` (`6 * 9`) characters.
This reduces the impact of exponential backtracking in Java's
regular expression engine.
* add `@inject_constant` annotation to whitelist.
This annotation signals that a compiler setting will
be injected at the beginning of a whitelisted method.
The format is `argnum=settingname`:
`1=foo_setting 2=bar_setting`.
Argument numbers must start at one and must be sequential.
* Augment
`Pattern.split(CharSequence)`,
`Pattern.split(CharSequence, int)`,
`Pattern.splitAsStream(CharSequence)`, and
`Pattern.matcher(CharSequence)`
to take the value of `script.painless.regex.limit-factor` as
an injected parameter, limiting as explained above when this
setting is in use.
Fixes: #49873
Backport of: 93f29a4
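The limit-factor arithmetic above (`6 * 9 = 54` for `foobarbaz`) can be sketched as a character-counting wrapper. This is an illustration under assumptions, not the code in this change: `RegexLimitSketch`, `CountingCharSequence`, and `limitedFind` are hypothetical names, and the sketch assumes that counting `charAt` calls is an acceptable proxy for "characters considered".

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a limit-factor; not the Painless implementation.
final class RegexLimitSketch {
    // Budget of characters a match may consider: factor * input length.
    static int maxCharsConsidered(int limitFactor, int inputLength) {
        return limitFactor * inputLength;
    }

    // Wraps the input and fails once the regex engine has read too many characters.
    static final class CountingCharSequence implements CharSequence {
        private final CharSequence wrapped;
        private final int limit;
        private int considered = 0;

        CountingCharSequence(CharSequence wrapped, int limitFactor) {
            this.wrapped = wrapped;
            this.limit = maxCharsConsidered(limitFactor, wrapped.length());
        }

        @Override
        public char charAt(int index) {
            if (++considered > limit) {
                throw new IllegalStateException(
                    "regex considered more than " + limit + " characters");
            }
            return wrapped.charAt(index);
        }

        @Override
        public int length() {
            return wrapped.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return wrapped.subSequence(start, end);
        }

        @Override
        public String toString() {
            return wrapped.toString();
        }
    }

    // Runs find() against the counting wrapper instead of the raw input.
    static boolean limitedFind(Pattern pattern, CharSequence input, int limitFactor) {
        return pattern.matcher(new CountingCharSequence(input, limitFactor)).find();
    }
}
```

A pattern that backtracks heavily reads the same characters many times, so it exhausts the budget and fails fast instead of running for an unbounded time.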
This change makes Location a final member of IRNode rather than one that can change. This
ensures that every IR node has a Location for error information from creation onward that cannot be
updated, so each node can be traced back to where it originally came from.
We only ever use this with `XContentParser`, so there is no need to make inlining
worse by forcing the lambda, and hence a dynamic callsite, here.
=> Extracted the exception formatting code path, which is likely very cold,
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
* Add System Indices check to AutoCreateIndex
By default, Elasticsearch auto-creates indices when a document is
submitted to a non-existent index. There is a setting that allows users
to disable this behavior. However, this setting should not apply to
system indices, so that Elasticsearch modules and plugins are able to
use auto-create behavior whether or not it is exposed to users.
This commit constructs the AutoCreateIndex object with a reference to
the SystemIndices object so that we bypass the check for the user-facing
autocreate setting when it's a system index that is being autocreated.
We also modify the logic in TransportBulkAction to make sure that if a
system index is included in a bulk request, we don't skip the
autocreation step.
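The bypass described above can be sketched roughly as follows. `AutoCreateSketch` and its fields are hypothetical stand-ins, not the real `AutoCreateIndex` or `SystemIndices` classes, and the sketch assumes system indices are matched by exact name, which is a simplification.

```java
import java.util.Set;

// Hypothetical illustration; not the real AutoCreateIndex implementation.
final class AutoCreateSketch {
    private final boolean userAutoCreateEnabled; // stands in for the user-facing setting
    private final Set<String> systemIndexNames;  // stands in for the SystemIndices reference

    AutoCreateSketch(boolean userAutoCreateEnabled, Set<String> systemIndexNames) {
        this.userAutoCreateEnabled = userAutoCreateEnabled;
        this.systemIndexNames = systemIndexNames;
    }

    boolean shouldAutoCreate(String index) {
        if (systemIndexNames.contains(index)) {
            // System indices skip the user-facing setting entirely.
            return true;
        }
        return userAutoCreateEnabled;
    }
}
```

With the user setting disabled, a document sent to a system index still triggers auto-creation, while a document sent to an ordinary missing index does not.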
Currently we read in 64KB blocks from the network. When TLS is not
enabled, these bytes are normally passed all the way to the application
layer (some exceptions: compression). For the HTTP layer this means that
these bytes can live throughout the entire lifecycle of an indexing
request.
The problem is that if the reads from the socket are small, this means
that 64KB buffers can be consumed by 1KB or smaller reads. If the socket
buffer or TCP buffer sizes are small, this leads to massive memory
waste. It has been identified as a major source of OOMs on coordinating
nodes as Elasticsearch easily exhausts the heap for these network bytes.
This commit resolves the problem by placing a handler after the TLS
handler to copy these bytes to a more appropriate buffer size as
necessary. This comes after TLS, because TLS is a framing layer which
often resolves this problem for us (the 64KB buffer will be decoded
into a more appropriate buffer size). However, this extra handler will
solve it for the non-TLS pipelines.
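The idea of copying under-filled read buffers can be sketched with plain `java.nio` buffers. This is not the Netty handler added by this commit: `ReadBufferSketch`, `rightSize`, and the `copyThreshold` parameter are hypothetical, and the threshold value is an assumption chosen only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration using java.nio; the real change is a Netty pipeline handler.
final class ReadBufferSketch {
    static final int READ_BUFFER_SIZE = 64 * 1024;

    // Expects a flipped buffer, so remaining() is the number of bytes actually read.
    static ByteBuffer rightSize(ByteBuffer readBuffer, int copyThreshold) {
        int bytesRead = readBuffer.remaining();
        if (bytesRead >= copyThreshold) {
            // The large buffer is reasonably full; pass it along as is.
            return readBuffer;
        }
        // Small read: copy into an exactly sized buffer so the 64KB buffer
        // can be released or reused instead of living with the request.
        ByteBuffer copy = ByteBuffer.allocate(bytesRead);
        copy.put(readBuffer);
        copy.flip();
        return copy;
    }
}
```

A 1KB read then retains roughly 1KB for the lifetime of the request rather than the full 64KB block.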
This adds the network property from the MaxMind Geo ASN database.
This enables analysis of IP data based on the subnets that MaxMind have
previously identified for ASN networks.
closes #60942
Co-authored-by: Peter Ansell <p_ansell@yahoo.com>
Unmute DeleteByQueryConcurrentTests
testConcurrentDeleteByQueriesOnDifferentDocs test.
LUCENE-9449 introduced a bug in sorting on _doc,
which resulted in the failure of this test. As the Lucene bug
has been fixed, this re-enables the test.
Closes #62609
This converts RankFeatureFieldMapper, RankFeaturesFieldMapper,
SearchAsYouTypeFieldMapper and TokenCountFieldMapper to
parametrized forms. It also adds a TextParams utility class to core
containing functions that help declare text parameters - mainly shared
between SearchAsYouTypeFieldMapper and KeywordFieldMapper at
the moment, but it will come in handy when we convert TextFieldMapper
and friends.
Relates to #62988
Currently Netty will batch compress an entire HTTP response
regardless of its content size. It allocates a byte array at least of
the same size as the uncompressed content. This causes issues with our
attempts to remove humongous G1GC allocations. This commit resolves the
issue by splitting responses into 128KB chunks.
This has the side-effect of making large outbound HTTP responses that
are compressed be sent using chunked transfer-encoding.
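The splitting step can be sketched on a plain byte array. `ChunkSketch` and `split` are hypothetical names, and the sketch covers only the slicing, not the Netty compression or the transfer-encoding handling in this commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of fixed-size chunking; compression is not shown.
final class ChunkSketch {
    static final int CHUNK_SIZE = 128 * 1024;

    static List<byte[]> split(byte[] content, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int end = Math.min(content.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(content, offset, end));
        }
        return chunks;
    }
}
```

Each chunk can then be compressed on its own, so no single allocation has to be as large as the whole uncompressed response.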
Currently we duplicate our specialized CORS logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
bug where we are incorrectly closing HTTP channels on early CORS
responses.
Introduce 64-bit unsigned long field type
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
Backport for #60050
Closes #32434
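The value range and the imprecision noted above can be illustrated with the JDK's unsigned helpers on `long`. `UnsignedLongSketch` is a hypothetical class for illustration and says nothing about how the field type itself stores or converts values.

```java
import java.math.BigInteger;

// Hypothetical illustration using JDK helpers; not the field mapper's code.
final class UnsignedLongSketch {
    // Parses a decimal string in [0, 18446744073709551615] into a 64-bit pattern.
    static long parse(String value) {
        return Long.parseUnsignedLong(value);
    }

    // Compares two values by their unsigned interpretation.
    static int compare(long a, long b) {
        return Long.compareUnsigned(a, b);
    }

    // Converts the unsigned value to a double; large values lose precision.
    static double toDouble(long unsigned) {
        return new BigInteger(Long.toUnsignedString(unsigned)).doubleValue();
    }
}
```

Exact comparison still distinguishes `18446744073709551615` from `18446744073709551614`, but both convert to the same double, which is why aggregations based on double conversion can be imprecise for large values.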