We currently compute initial sizings based on the cardinality of our fields.
This can be highly exagerated for sub aggregations, for example if there is a
parent terms aggregation that is executed over a field that has a very long
tail: most buckets will only collect a couple of documents.
Close#5994
We switched to Lucene's SloppyMath way of computing an approximate value of
the eath diameter given a latitude in order to compute distances, yet the
bounding box optimization of the geo distance filter still assumed a constant
earth diameter, equal to the average.
Close#6008
This relates to #6040, the fix is twofold, first, not handling missing context specifically in the search code, but behave the same as we do in non scroll search, where if all the shards failed, raise an exception. The second is to apply this logic in both scroll cases.
Update `geo-shape-type.asciidoc` to include all `GeoShapeType`s supported by the `org.elasticsearch.common.geo.builders.ShapeBuilder`.
Changes include:
1. A tabular mapping of GeoJSON types to Elasticsearch types
2. Listing all types, with brief examples, for all support Elasticsearch types
3. Putting non-standard types to the bottom (really just moving Envelope to the bottom)
4. Linking to all GeoJSON types.
5. Adding whitespace around tightly nested arrays (particularly `multipolygon`) for readability
Way back when, when ES started, there was an idea for a dump infrastructure, but it ended up supporting its serviceability aspects through APIs, remove the unused code
Removed MetaData#concreteIndices variations that didn't require an IndicesOptions argument. Every caller should specify how indices should be resolved to concrete indices based on the indices options argument.
Closes#6059
RootMapper.validate was only used by the routing field mapper, which makes
buggy assumptions about how fields are indexed. For example, it assumes that
the index representation of a field is the same as its external representation.
Close#5844
A bad/non-existing scroll ID used to return a 200, however a 404 might be more useful.
Also, this PR returns the right Exception (SearchContextMissingException) in the Java API.
Additionally: Added StatusToXContent interface and RestStatusToXContentListener listener, so
the appropriate RestStatus can be returned
Closes#5729
Similar to search removal, the operation threading options are not really ued, and the default should always be used. This also considerably simplifies the code.
A side affect is that we can now remove the ShardIterator#firstOrNull method, which can cause for sneaky bugs to occur.
closes#6044
The analyze API used the standard analyzer from lucene and therefore removed
stopwords instead of using the elasticsearch default analyzer.
Closes#5974
The possibility of filtering for index templates in the cluster state API
had been introduced before there was a dedicated index templates API. This
commit removes this support from the cluster state API, as it was not really
clean, requiring you to specify the metadata and the index templates.
Closes#4954
Search operation threading is an option that is not really used, and current non default implementations are flawed. Handling it also creates quite the complexity in the search handling codebase...
This is a breaking change, but one that is actually a good one, since I haven't seen/heard anybody use it, and if its used, its problematic...
closes#6042
Change #5561 introduced a potential bug in that iterations that are performed
on a thread are might not be visible to other threads due to the removal of the
`volatile` keyword.
Close#6039
When a thread pool rejects the execution on the local node, the search might not return.
This happens due to the fact that we move to the next shard only *within* the execution on the thread pool in the start method. If it fails to submit the task to the thread pool, it will go through the fail shard logic, but without "counting" the current shard itself. When this happens, the relevant shard will then execute more times than intended, causing the total opes counter to skew, and for example, if on another shard the search is successful, the total ops will be incremented *beyond* the expectedTotalOps, causing the check on == as the exit condition to never happen.
The fix here makes sure that the shard iterator properly progresses even in the case of rejections, and also includes improvement to when cleaning a context is sent in case of failures (which were exposed by the test).
Though the change fixes the problem, we should work on simplifying the code path considerably, the first suggestion as a followup is to remove the support for operation threading (also in broadcast), and move the local optimization execution to SearchService, this will simplify the code in different search action considerably, and will allow to remove the problematic #firstOrNull method on the shard iterator.
The second suggestion is to move the optimization of local execution to the TransportService, so all actions will not have to explicitly do the mentioned optimization.
fixes#4887
Changed getFiniteStrings to use an iterative implementation instead of
recursive, so we don't use a Java stack-frame per character for each
suggestion at build & query time.
The defaults we have today in our data intensive memory structures don't properly add up to properly protected from potential OOM.
The circuit breaker, today at 80%, aims at protecting from extensive field data loading. The default threshold today is too permissive and can still cause OOMs.
The filter cache today is at 20%, and its too high when adding it to other limits we have, reduce it to 10%, which is still a big enough portion of the heap, yet provides improved safety measure.
closes#5990
The regex tests are formatted with blocks for readability. Previously,
they were formatted using folded style blocks (e.g. using `>`). Folded
blocks convert newlines into spaces. This is problematic for our regex,
since comments can only be terminated with a newline.
Effectively, anything after a comment will be commented out, making many
of the regex "silently pass".
This commit replaces them with scalar-style blocks (e.g. using `|`), which
treats newlines as significant, and thus correctly terminates comments
inside the regex.
Also fixes a regex test (`cat.thread_pool/10_basic.yaml`) that started
to fail after the block was fixed. The test was missing a `\s+` before
the closing newline.
If a source node disconnect during recover, the target node will respond by canceling the recovery. Typically the master will respond by removing the disconnected node from the cluster state, promoting another shard to become primary. This is sent it to all nodes and the target node will start recovering from the new primary. However, if the drop of a node caused the node count to go bellow min_master_node, the master will step down and will not promote shard immediately. When a new master is elected we may publish a new cluster state (who's point is to notify of a new master) which is not yet updated. This caused the node to start a recovery to a non existent node. Before we aborted the recovery without cleaning up the shard, causing subsequent correct cluster states to be ignored. We should not start the recovery process but wait for another cluster state to come in.
Closes#6024
A boost terms factor of 1.0 is not the same as no boosting of terms.
The desired behavior is to deactivate boosting by default. If the user
specifies any value other than 0, then boosting is activated.
Closes#6021
Updating to this version allows to configure a special JNA directory,
in case the /tmp directory is mounted with the noexec option, as JNA
extracts some data and tries to execute parts of it.
Also updated documentation to clarify mlockall and memory settings as well
as pointing to the new jna.tmpdir system property.
Closes#5493
In addition, add a special warning if the misplaced function is a "boost_factor"
function to avoid confusion of "boost" and "boost_function".
closes#5995