When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.
The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.
Closes#46678
We should only snapshot the index we're going to
restore in the next step. Otherwise, we will
potentially not get the correct response or
fail restoring outright due to internal indices
getting created concurrently when running against
the x-pack distribution.
Closes#46844
Follow up from #42346. Since the `methods` array is in order of
preference when calling the index API with an `{id}` we prefer to use
the `PUT` http method.
The fact that this test randomly uses a relatively large number
of nodes and hence Netty worker threads created a problem with
running out of direct memory on CI.
Tests run with 512M heap (and hence 512M direct memory) by default.
On a CI worker with 16 cores, this means Netty will by default set
up 32 transport workers. If we get unlucky and a lot of them
actually do work (and thus instantiate a `CopyBytesSocketChannel`
which costs 1M per thread for the thread-local IO buffer) we
would run out of memory.
This specific failure was only seen with `NativeRealmIntegTests` so I
only added the constraint on the Netty worker count here.
We can add it to other tests (or `SecurityIntegTestCase`) if need be
but for now it doesn't seem necessary so I opted for least impact.
Closes#46803
In #46250 we will have to move to two different delete
paths for BwC. Both paths will share the initial
reads from the repository though so I extracted/separated the
delete logic away from the initial reads to significantly simplify
the diff against #46250 here.
Also, I added some JavaDoc from #46250 here as well which makes the
code a little easier to follow even ignoring #46250 I think.
* [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs.
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
This commit disables caching of BWC snapshot distributions in the "trunk" (aka master) branch.
Since the previous major release branches move quickly we rarely get cache hits for these
tasks, and the artifacts themselves are very large. This means the overhead here is high and
savings basically zero. We conditionally disable task output caching in this scenario in CI to
avoid excessive build cache overhead as well as causing too much turn in the cache itself which
would lead to lots of cache entry evictions.
This commit reuses the same state processor that is used for autodetect
to parse state output from data frame analytics jobs. We then index the
state document into the state index.
Backport of #46804
* [ML][Transforms] remove `force` flag from _start (#46414)
* [ML][Transforms] remove `force` flag from _start
* fixing expected error message
* adjusting bwc version
Prior to this commit terminate_after was sent as request body parameter
(via SearchSourceBuilder), which is not possible in the count api.
Closes#46446
It is possible for a running analytics job that its config is removed
from the '.ml-config' index (perhaps the user deleted the entire index,
etc.). In that case the task remains without a matching config. I have
raised #46781 to discuss how to deal with this issue.
This commit focuses on `MlMemoryTracker` and changes it so that when
we get the configs for the running tasks we leniently ignore missing ones.
This at least means memory tracking will keep working for other jobs
if one or more are missing.
In addition, this commit makes the cleanup code for native analytics
tests more robust by explicitly stopping all jobs and force-stopping
if an error occurs. This helps so that a single failing test does
not cause other tests fail due to pending tasks.
Backport of #46789
Suppress the reasonable-history check in this test to guarantee we're always getting ops based recovery even after a background sync.
Closes#45953
Co-Authored-By: David Turner <david.turner@elastic.co>
This adds an assert to make sure we're not leaking
index-N blobs on the shard level to the repo consistency checks.
It is ok to have a single redundant index-N blob in a failure scenario
but additional index-N should always be cleaned up before adding more.
Since the `resolveAllDependencies` task resolves all the congfigurations
it can find, this was not caught by our testing, but it's required to be
configuraed specifically.
We should probably cut-over to the new configurations at some point to
avoid problems like this.
Closeselastic/infra#14580
This commit adds support for Put Block API to the internal HTTP server
used in Azure repository integration tests. This allows to test the
behavior of the Azure SDK client when the Azure Storage service
returns errors when uploading Blob in multiple blocks or when
downloading a blob using ranged downloads.
This commit adds support for resumable uploads to the internal HTTP
server used in GoogleCloudStorageBlobStoreRepositoryTests. This
way we can also test the behavior of the Google's client when the
service returns server errors in response to resumable upload requests.
The BlobStore implementation for GCS has the choice between 2
methods to upload a blob: resumable and multipart. In the current
implementation, the client executes a resumable upload if the blob
size is larger than LARGE_BLOB_THRESHOLD_BYTE_SIZE,
otherwise it executes a multipart upload. This commit makes this
logic overridable in tests, allowing to randomize the decision of
using one method or the other.
The commit add support for single request resumable uploads
and chunked resumable uploads (the blob is uploaded into multiple
2Mb chunks; each chunk being a resumable upload). For this last
case, this PR also adds a test testSnapshotWithLargeSegmentFiles
which makes it more probable that a chunked resumable upload is
executed.
In the case that an ingest processor factory relies on other configuration
in the cluster state in order to construct a processor instance then
it is currently undetermined if a processor facotry can be notified about
a change if multiple cluster state updates are bundled together and
if a processor implement `ClusterStateApplier` interface.
(IngestService implements this interface too)
The idea with ingest cluster state listener is that it is guaranteed to
update the processor factory first before the ingest service creates
a pipeline with their respective processor instances.
Currently this concept is used in the enrich branch:
https://github.com/elastic/elasticsearch/blob/enrich/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichProcessorFactory.java#L21
In this case it a processor factory is interested in enrich indices' _meta
mapping fields.
This is the third PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.
Changes to the server module are merged separately from the pr that will
merge enrich into master, so that these changes can be reviewed in isolation.
With the next minor release of Elasticsearch we will drop support for
JDK 12 and bump to JDK 13. While we want to use AdoptOpenJDK as the
bundled JDK, we are waiting for a release there. This commit moves to
OpenJDK 13 for now, and we will move to AdoptOpenJDK 13 as soon as its
available. Since macOS Catalina is delayed until October, we have some
time to update this.
* Give kibana user reserved role privileges on .apm-* to create APM agent configuration index.
* fixed test to include checking all .apm-* permissions
* changed pattern from ".apm-*" to the more specific ".apm-agent-configuration"
This commit teaches the build how to bundle AdoptOpenJDK with our
artifacts, and switches to AdoptOpenJDK as the bundled JDK. We keep the
functionality to also bundle Oracle OpenJDK distributions.
Previously, cross-cluster replication (CCR) documentation was located in
the Stack Overview:
https://www.elastic.co/guide/en/elastic-stack-overview/master/xpack-ccr.html
This adds CCR documentation to the Elasticsearch Reference Guide with a
level offset for headings.
The level offset and CCR Stack Overview docs will be removed in later
commits.
A resumable upload session can fail on with a 410 error and should
be retried in that case. I added retrying twice using resetting of
the given `InputStream` as the retry mechanism since the same
approach is used by the AWS S3 SDK already as well and relied upon
by the S3 repository implementation.
Related GCS documentation:
https://cloud.google.com/storage/docs/json_api/v1/status-codes#410_Gone
Reenable this test since it was fixed by #45689 in production
code (specifically, the fact that we write the `snap-` blobs
without overwrite checks now).
Only required adding the assumed blocking on index file writes
to test code to properly work again.
* Closes#25281
There were some issues with the Azure implementation requiring
permissions to list all containers ue to a container exists
check. This was caught in CI this time, but going forward we
should ensure that CI is executed using a token that does not
allow listing containers.
Relates #43288
When encountering only indices with empty mapping, the IndexResolver
throws an exception as it expects to find at least one entry.
This commit fixes this case so that an empty mapping is returned.
Fix#46757
(cherry picked from commit 5f4f5807acb93b5fab36718c092c328977a396b6)
Today if the connection to S3 times out or drops after starting to download an
object then the SDK does not attempt to recover or resume the download, causing
the restore of the whole shard to fail and retry. This commit allows
Elasticsearch to detect such a mid-stream failure and to resume the download
from where it failed.
* Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581
* Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons
* Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization
* Fixes#41581
Handle queries with implicit GROUP BY where the aggregation is not in
the projection/SELECT but inside the filter/HAVING such as:
SELECT 1 FROM x HAVING COUNT(*) > 0
The engine now properly identifies the case and handles it accordingly.
Fix#37051
(cherry picked from commit fa53ca05d8219c27079b50b4a5b7aeb220c7cde2)