During the bulk action a hierachy of tasks is getting created: bulk->bulk[s] (coord node) -> bulk[s] (primary shard node) -> bulk[s][p] and bulk[s][r]. Due to a bug the first bulk[s] task didn't have bulk's task id is set as a parent id. This commit fixes this bug.
Intellij has a model of "everything is a source dir unless you say
otherwise". This change fixes the intellij configuration to not think
the gradle or eclipse build dirs are source dirs.
Handling of the current path when parsing a document is very sensitive.
This fixes a subtle bug in array parsing, where the path that was added
by parsing an array would not be cleared. It also adds a hard state
check at the end of parsing to ensure we ended with a clean path.
The doc parser uses a context object to store the state of parsing,
namely the existing mappings, new mappings, and the parsed document.
Currently this uses a threadlocal which is "reset" for each doc parsed.
However, the thread local doesn't actually save anything, since
resetting is constructing new objects. This change removes the thread
local, which also simplifies the mapper service as it now does not need
to be closeable.
This commit really reverts the inadvertent removal of allowing duplicate
calls to Node#start to be a no-op (but was mistakenly restored to
Node#stop in ddfa3a661510f25c2ce431dfd6fb86ac11eb8888).
The checkstyle configuration files were being accessed as resources within the project and
being converted from a URL to a File by gradle. This works when the build tools project is being
referenced as a local project. However, when using the published jar the URL points to a resource
in the jar file, that URL cannot be converted to a File object and causes the build to fail.
This change copies the files into a `checkstyle` directory in the project build folder and always uses
File objects pointing to the copied files.
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.
Notes about how the change works:
- Numeric mappers have been split into a legacy version that is essentially
the current mapper, and a new version that uses points, eg.
LegacyDateFieldMapper and DateFieldMapper.
- Since new and old fields have the same names, the decision about which one
to use is made based on the index creation version.
- If you try to force using a legacy field on a new index or a field that uses
points on an old index, you will get an exception.
- IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
and sortable).
- The internal MappedFieldType that is stored by the new mappers does not have
any of the points-related properties set. Instead, it keeps setting the index
options when parsing the `index` property of mappings and does
`if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
when parsing documents.
Known issues that won't fix:
- You can't use numeric fields in significant terms aggregations anymore since
this requires document frequencies, which points do not record.
- Term queries on numeric fields will now return constant scores instead of
giving better scores to the rare values.
Known issues that we could work around (in follow-up PRs, this one is too large
already):
- Range queries on `ip` addresses only work if both the lower and upper bounds
are inclusive (exclusive bounds are not exposed in Lucene). We could either
decide to implement it, or drop range support entirely and tell users to
query subnets using the CIDR notation instead.
- Since IP addresses now use a different representation for doc values,
aggregations will fail when running a terms aggregation on an ip field on a
list of indices that contains both pre-5.0 and 5.0 indices.
- The ip range aggregation does not work on the new ip field. We need to either
implement range aggs for SORTED_SET doc values or drop support for ip ranges
and tell users to use filters instead. #17700Closes#16751Closes#17007Closes#11513
This commit contains the following improvements/fixes:
1. Renaming method names and variables to better reflect the purpose
of the method and the semantics of the variable.
2. For deleting indexes, replace the closed parameter passed to the
delete index/store methods with obtaining the index's state from the
IndexSettings that is already passed in.
3. Added tests to the IndexWithShadowReplicaIT suite, some of which
show issues in the shadow replica delete process that are captured in
Github issue 17695.
Closes#17638
When we pass both XContentParser and QueryParseContext to a method this can be trappy because
we cannot make sure that the parser contained in the context and the parser passed as an argument
are the same.
This removes the parser argument from methods where we currently have both the parser and the parse
context as arguments and instead retrieves the parse from the context inside the method.
The change adds a new option to the geo_* queries: ignore_unmapped. If this option is set to false, the toQuery method on the QueryBuilder will throw an exception if the field specified in the query is unmapped. If the option is set to true, the toQuery method on the QueryBuilder will return a MatchNoDocsQuery. The default value is false so the queries work how they do today (throwing an exception on unmapped field)
It's use tempted the creation of PROTOTYPEs. The only classes that
legitimately implement a readFrom method are those extending from
Diffable - such behavior is part of cluster state management and
out of scope for the PROTOTYPE cleanup.
When we pass down both parser and QueryParseContext to a method, we cannot
make sure that the parser contained in the context and the parser that is
parsed as an argument have the same state. This removes the parser argument
from methods where we currently have both the parser and the parse context
as arguments and instead retrieves the parse from the context inside the
method.
Since OpenJDK virtual machines have G1 GC but do not have a java.vm.name
that contains HotSpot, this test fails on OpenJDK. Instead, the
java.vm.name condition should be expanded to include OpenJDK virtual
machines.
* Inner hits can now only be provided and prepared via setter in the nested, has_child and has_parent query.
* Also made `score_mode` a required constructor parameter.
* Moved has_child's min_child/max_children validation from doToQuery(...) to a setter.
This commit adds a simple test that JvmInfo is correctly able to
determine whether or not G1 GC is running. As the JvmInfo G1 GC logic is
only applies to HotSpot, the test is constructed to do the same. The
test determines whether or not G1 GC is running by inspecting the test
JVM argument line.
The change adds a new option to the `nested`, `has_parent`, `has_children` and `parent_id` queries: `ignore_unmapped`. If this option is set to false, the `toQuery` method on the QueryBuilder will throw an exception if the type/path specified in the query is unmapped. If the option is set to true, the `toQuery` method on the QueryBuilder will return a MatchNoDocsQuery. The default value is `false`so the queries work how they do today (throwing an exception on unmapped paths/types)
With the previous breaker limit spurious failures on transport level
can happen in the test. With the new limit we ensure that the breaker
breaks due to a too large HTTP request instead.
Previously the sigma variable in the `extended_stats_bucket` pipeline aggregation was not being serialised in `ExtendedStatsBucketPipelineAggregator`. This PR fixes that.
It also corrects the initial value of sumOfSquares to be 0.
Closes#17701