Multi fields currently parse any field type passed in. However, they
were only intended to support copying simple values from the outer
field. This change adds validation to ensure object and nested
fields are not used within multi fields.
closes#10745
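For illustration, a mapping along these lines (field names are hypothetical) is now rejected, because an object field cannot be copied into a multi field:

    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "extra": { "type": "object" }
        }
      }
    }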
Now that delete by query is out, we don't need this infrastructure code. Delete by query will be implemented as a plugin, with scan/scroll + bulk delete, so it will not need this infra anyhow.
The current implementation is dangerous: it unexpectedly refreshes,
which can quickly cause an unhealthy index (segment explosion). It
can also delete different documents on primary vs replicas, causing
inconsistent replicas.
For 2.0 we will replace this with an optional plugin that does a
scan/scroll search and then issues bulk delete requests.
Closes#10859
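As a rough sketch of the intended replacement (index, type and ids are made up): first collect the matching documents with a scan/scroll search, then remove them with bulk delete requests:

    POST /my_index/_search?search_type=scan&scroll=1m
    {
      "query": { "term": { "user": "kimchy" } },
      "size": 100
    }

    POST /_bulk
    { "delete": { "_index": "my_index", "_type": "my_type", "_id": "1" } }
    { "delete": { "_index": "my_index", "_type": "my_type", "_id": "2" } }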
During pinging we open light, temporary connections to the unicast hosts. After the pinging is done we close those. At the moment we do so before returning the results of the pings to the caller. On the other hand, in our transport logic we acquire a lock specific to the node id while opening a connection. When disconnecting from a node, we have to acquire the same lock in order to guarantee that the connection opening has finished. This can cause big delays in environments where opening a connection is very slow, as the connection closing has to wait *after* the pinging was done. This can be problematic as it causes master election to use stale data.
Closes#10849
The original goal of this cache was to avoid parsing the same query several
times in case several shards are held on the same node. While this might
sound like a good idea, this would only help when parsing the query takes
non-negligible time compared to actually running the query, which should not
be the case.
If we don't have an ElasticsearchException as the wrapper of the
actual cause, we don't render a root cause today. This commit adds
support for 3rd party exceptions as root causes.
Closes#10836
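As a hedged example (exception type and message are made up), a failure whose cause is a plain Java exception would now render a root cause rather than omitting it:

    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "timeout must be >= 0"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "timeout must be >= 0"
      },
      "status": 400
    }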
We deployed our own code to check if directories are closed etc. and
if searchers are still open. Yet, since we don't have a global cluster
anymore we can just use Lucene's internal mechanism to do that. This commit
removes all special handling and uses LuceneTestCase.closeAfterSuite to
fail if certain resources are not closed.
Closes#10853
field_value_factor now takes a default that is used if the document doesn't
have a value for that field. It looks like:
"field_value_factor": {
"field": "popularity",
"missing": 1
}
Closes#10841
This commit adds support for running with only one node and sets the
maximum number of nodes to 3 by default. If run with tests.nightly=true,
at most 6 nodes are used. This gave a 20% speed improvement compared to
the previous minimum of 3 nodes.
Groovy sandboxing was disabled by default from 1.4.3 onwards, since we found out that it could be worked around, so it makes little sense to keep it and maintain it.
Closes#10156
Closes#10480
Best effort to print out the search source depending on how it was set on the SearchRequestBuilder; don't call `internalBuilder()` as that causes the content of the request to be wiped.
Closes#5576
This pull request replaces the current self-made implementation of JSON encoding special chars with re-using the Jackson JsonStringEncoder. Turns out the previous implementation also missed a few special chars, so the tests had to be adjusted accordingly (looked at RFC 4627 for reference).
Note: There's another JSON String encoder on our classpath (org.apache.commons.lang3.StringEscapeUtils) that essentially does the same thing but adds quoting to more characters than the Jackson Encoder above.
Relates to #5473
The _field_names field was fixed in 1.5.1 (#10268) to correctly be
disabled for indexes before 1.3.0. However, only the exists filter
was updated to check this enabled flag on 1.x/1.5. The missing
filter on those branches still checks the field type to see if it
is indexed, which causes the filter to always try and use
the _field_names field for those old indexes.
This change adds a test to the old index tests for missing filter.
closes#10842
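For reference, the filter in question looks along these lines (field name is hypothetical); on pre-1.3.0 indexes it must not consult the _field_names field:

    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "missing": { "field": "user" }
      }
    }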
1. Initialize SM after things like mlockall. Their tests currently
don't run with securitymanager enabled, and it's simpler to just
run mlockall etc. first.
2. Remove redundant test permissions (junit4.childvm.cwd/temp). This
is already added as java.io.tmpdir.
3. Improve tests to load the generated policy with various
settings and assert things about the permissions on configured
directories.
4. Refactor logic to make it easier to fine-grain the permissions later.
For example, we currently allow write access to conf/. In the future
I think we can improve testing so we are able to make improvements here.
Since the circuit breaking service doesn't actually change for
BigArrays, we can eagerly create a new instance only once and use that
for all further invocations of `withCircuitBreaking`.
This changes the default ranking behaviour of single-term queries on numeric fields to use the usual Lucene TermQuery scoring logic rather than a constant-scoring wrapper.
Closes#10628
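In other words, a single-term query on a numeric field like the sketch below (field and value are made up) now scores with the regular TermQuery logic instead of a constant score:

    "query": {
      "term": { "age": 35 }
    }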
We have some duplication in TransportIndexAction/TransportShardBulkAction due
to the fact that we have totally different branches for INDEX and CREATE
operations. This commit tries to share the logic better between these two cases.
Adds support for calculating and sending diffs of the most frequently changing elements - cluster state, meta data and routing table - instead of the full cluster state.
Closes#6295
Whenever a transport client executes a request, it uses a built-in
RetryListener which tries to execute the request on another node.
However, if a connection error occurs, the onFailure() callback of
the listener is triggered on the netty I/O thread, which might then
be tied up by whatever failure handling follows.
This commit offloads the onFailure handling to the generic thread pool.
The translog might be reused across engines, which is currently a problem
in the design, such that we have to allow calls to `close` more than once.
This moves the closed check for snapshots onto the actual file in order to exit the loop.
Relates to #10807
If the translog is closed while a snapshot operation is in progress
we must fail the snapshot operation otherwise we end up in an endless
loop.
Closes#10807
Internal: Change BigArrays to not extend AbstractComponent
In order to avoid the getLogger(getClass()) calls in the
AbstractComponent constructor.
Seems like BigArrays used to be a Singleton but it actually
no longer is one. Every time a SearchContext is created a
new BigArrays instance is created via the
withCircuitBreaking call.
closes#10798
Due to timing issues, mappings that are required to index a document might not
be available on the replica at indexing time. In that case the replica starts
listening to cluster state changes and re-parses the document until no dynamic
mappings updates are generated.
MergeContext currently exists to store conflicts, and to provide
a mechanism to add dynamic fields. MergeResults store the same
conflicts. This change merges the two classes together, as well
as removes the MergeFlags construct.
This is in preparation for simplifying the callback structures
to dynamically add fields, which will require storing the mapping
updates in the results, instead of having a sneaky callback to
the DocumentMapper instance. It also just makes more sense that
the "results" of a merge are conflicts that occurred, along with
updates that may have occurred. For MergeFlags, any future needs
for parameterizing the merge (which seems unlikely) can just be
added directly to the MergeResults, as simulate is with this change.
This refactoring and cleanup addresses the fact that each request handler ends up
implementing too many methods; these can instead be provided when the request handler itself
is registered, including a prototype-like class that can be used to instantiate
new request instances for streaming.
closes#10730
Today we have a lot of bloat in the IndexStore and related classes. The IndexStore interface
is unneeded as we always subclass AbstractIndexStore, and it hides circular dependencies
that are problematic when added. Guice proxies them if you have an interface, which is bad in
general. This commit removes most of the bloat classes and unifies all the classes we have
into a single one, since they are all just structural and don't encode any functionality.
Refactor TransportShardReplicationOperationAction state management into clearly separated Primary and Replication phases. The Primary phase is responsible for routing the request to the node holding the primary, validating it and performing the operation on the primary. The Replication phase is responsible for sending the request to the replicas and managing their responses.
This also adds unit test infrastructure for this class, and some basic tests. We can extend later as we continue developing.
Closes#10749
This commit adds support for structural errors / failures / exceptions
on the elasticsearch REST layer. Exceptions are rendered with at least
a `type` and a `reason` corresponding to the exception name and the message.
Some exceptions like the ones associated with an index or a shard will have
additional information about the index the exception was triggered on or the
shard respectively.
Each rendered response will also contain a list of root causes, which is a list
of distinct shard level errors returned for the request. Root causes are the lowest
level elasticsearch exception found per shard response and are intended to be displayed
to the user to indicate the source of the exception.
Shard level responses are by default grouped by their type and reason to reduce the amount
of duplicates returned. Yet, the same exception returned from different indices will not be
grouped.
Closes#3303
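As an illustrative sketch (index name and exception are made up), a failure on a specific index might render along these lines:

    {
      "error": {
        "root_cause": [
          {
            "type": "index_not_found_exception",
            "reason": "no such index",
            "index": "my_index"
          }
        ],
        "type": "index_not_found_exception",
        "reason": "no such index",
        "index": "my_index"
      },
      "status": 404
    }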
Disabling doc values or trying to index hash values are not
correct uses of the murmur3 field type, and just cause
problems. This disallows changing doc values or index options
for 2.0+.
closes#10465
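For context, the intended use of murmur3 is as a multi field that hashes the parent field's values, along these lines (field names are hypothetical):

    "properties": {
      "my_field": {
        "type": "string",
        "fields": {
          "hash": { "type": "murmur3" }
        }
      }
    }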
This commit splits the current ClusterBlockLevel.METADATA into two distinct ClusterBlockLevel.METADATA_READ and ClusterBlockLevel.METADATA_WRITE blocks. It makes it possible to distinguish between
an operation that modifies the index or cluster metadata and an operation that does not change any metadata.
Before this commit, many operations were blocked when the cluster was read-only: Cluster Stats, Get Mappings, Get Snapshot, Get Index Settings, etc. Now those operations are allowed even when
the cluster or the index is read-only.
Related to #8102, #2833
Closes#3703
Closes#5855
Closes#10521
Closes#10522
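For example, with the cluster put in read-only mode as sketched below, a metadata read such as Get Mappings now succeeds while metadata writes stay blocked:

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.blocks.read_only": true
      }
    }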
While dynamic mappings updates are using the same code path as updates from the
API when applied on a data node since #10593, they were still using a different
code path on the master node. This commit makes dynamic updates processed the
same way as updates from the API, which also seems to handle
acknowledgements better (I could not reproduce the ConcurrentDynamicTemplateTests
failure anymore). It also adds more checks, like for instance that indexing on
replicas should not trigger dynamic mapping updates since they should have been
handled on the primary before.
Closes#10720
The field stats api returns field level statistics such as the lowest and highest values and the number of documents that have at least one value for a field.
An api like this can be useful to explore a data set you don't know much about. For example you can figure out what the lowest and highest response times are, so that you can create a histogram or range aggregation with sane settings.
This api doesn't run a search to figure these statistics out, but rather uses the Lucene index to look them up (using the Terms class in Lucene). So finding out these stats for fields is cheap and quick.
The min/max values are based on the type of the field: for a numeric field min/max are numbers, for a date field they are dates, and for other fields they are term based.
Closes#10523
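A minimal sketch of a request (field name is made up), with an abbreviated response whose exact shape is illustrative:

    GET /_field_stats?fields=response_time

    {
      "indices": {
        "_all": {
          "fields": {
            "response_time": {
              "doc_count": 1000,
              "min_value": 10,
              "max_value": 5000
            }
          }
        }
      }
    }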
- Randomizes the metric type between min/max/avg. Should have identical behavior, but good to test
- Fixes improper handling of gaps due to a bug in the production of the "expected" dataset. Due to this fix,
randomization of gap policy was re-enabled
- Bunch of renaming to be more descriptive and less verbose
This commit adds the ability for moving average models to output a "prediction" based on the current
moving average model. For simple, linear and single, this prediction simply converges on the
moving average's mean at the last point, leading to a straight line. For double, this will
predict in the direction of the linear trend (either globally or locally, depending on beta).
Also adds some more tests.
Closes#10545
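A hedged sketch of requesting predictions from a moving average aggregation (bucket path, model and count are illustrative):

    "aggs": {
      "the_movavg": {
        "moving_avg": {
          "buckets_path": "the_sum",
          "model": "double_exp",
          "predict": 5
        }
      }
    }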
Added an initial script construct to unify the parameters typically
passed to methods in the ScriptService. This changes the way several public
methods are called in the ScriptService, along with all the callers,
since they must now wrap the parameters in a script object. In the
future, parsing parameters can also be moved into this construct along with
ToXContent.
closes#10649
When using a filesystem that may have lag between an index being created
on the primary and on the replica, creation of the ShadowEngine can
fail because there are no segments in the directory.
In these situations, we retry during engine creation to wait until an
index is present in the directory. The wait delay is
configurable, defaulting to waiting 5 seconds for an index to
become available.
Resolves#10637
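For context, the ShadowEngine is used for indices created with shadow replicas enabled, roughly as sketched below (index name is made up); the retry applies while the replica's engine starts up:

    PUT /my_index
    {
      "settings": {
        "index.shadow_replicas": true
      }
    }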