* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
We today support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself.
It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process.
We add an option which reads this limit value from the document itself
by adding a setting named `indexed_chars_field`.
Which allows running:
```
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information. Used to parse pdf and office files",
"processors" : [
{
"attachment" : {
"field" : "data",
"indexed_chars_field" : "size"
}
}
]
}
```
Then index either:
```
PUT index/doc/1?pipeline=attachment
{
"data": "BASE64"
}
```
Which will use the default value (or the one defined by `indexed_chars`)
Or
```
PUT index/doc/2?pipeline=attachment
{
"data": "BASE64",
"size": 1000
}
```
Closes#28942
As we have factored Elasticsearch into smaller libraries, we have ended
up in a situation that some of the dependencies of Elasticsearch are not
available to code that depends on these smaller libraries but not server
Elasticsearch. This is a good thing, this was one of the goals of
separating Elasticsearch into smaller libraries, to shed some of the
dependencies from other components of the system. However, this now
means that simple utility methods from Lucene that we rely on are no
longer available everywhere. This commit copies IOUtils (with some small
formatting changes for our codebase) into the fold so that other
components of the system can rely on these methods where they no longer
depend on Lucene.
With this commit we skip all GeoIpProcessorFactoryTests on Windows.
These tests use a MappedByteBuffer which will keep its file mappings
until it is garbage-collected. As a consequence, the corresponding
file appears to be still in use, Windows cannot delete it and the test
will fail in teardown.
Closes#29001
Windows has some strong limitations on command line arguments,
specially when it's too long. In the googlecloudstoragefixture anttask
the classpath argument is very long and the command fails. This commit
removes the classpath as an argument and uses the CLASSPATH
environment variable instead.
With this commit we reduce heap usage of the ingest-geoip plugin by
memory-mapping the database files. Previously, we have stored these
files gzip-compressed but this has resulted that data are loaded on the
heap.
Closes#28782
This commit adds a GoogleCloudStorageFixture that uses the
logic of a GoogleCloudStorageTestServer (added in #28576)
to emulate a remote Google Cloud Storage service.
By adding this fixture and a more complete integration test, we
should be able to catch more bugs when upgrading the client library.
The fixture is started by the googleCloudStorageFixture task
and a custom Service Account file is created and added to the
Elasticsearch keystore for each test.
This is related to #27260. The transport-nio plugin needs socket
permissions to operate as a transport. This commit gives it these
permissions in the policy file.
This commit is related to #27260. Currently there is a weird
relationship between channel contexts and nio channels. The selectors
use the context for read and writing. But the selector operates directly
on the nio channel for registering, closing, and connecting.
This commit works on improving this relationship. The selector operates
directly on the context which wraps the low level java.nio.channels. The
NioChannel class is simply an API that is used to interact with the
channel (sending messages from outside the selector event loop,
scheduling a close, adding listeners, etc). The context is only used
internally by the channel to implement these apis and by the selector to
perform these operations.
Similarly to what has been done for s3 and azure, this commit removes
the repository settings `application_name` and `connect/read_timeout`
in favor of client settings. It introduce a GoogleCloudStorageClientSettings
class (similar to S3ClientSettings) and a bunch of unit tests for that,
it aligns the documentation to be more coherent with the S3 one, it
documents the connect/read timeouts that were not documented at all and
also adds a new client setting that allows to define a custom endpoint.
This is related to #28662. It wraps the azure repository inputstream in
an inputstream that ensures `read` calls have socket permissions. This
is because the azure inputstream internally makes service calls.
This pull request extracts in a dedicated class the request/response
logic that "emulates" a Google Cloud Storage service in our
repository-gcs tests.
The idea behind this is to make the logic more reusable. The class
MockHttpTransport has been renamed to MockStorage which now
only takes care of instantiating a Storage client and does the low-level
request/response plumbing needed by this client.
The "Google Cloud Storage" logic has been extracted from
MockHttpTransport and put in a new GoogleCloudStorageTestServer
that is now independent from the google client testing framework.
GceDiscoverTests can be simplified in a similar manner than #27945. It
now uses a mocked GceInstancesService that exposes internal test cluster
nodes as if they were real GCE nodes. It should also make the test more
robust by not using a HTTP server anymore.
closes#24313
The TikaImpl#parse method comment sounds like this method is only used
in the same package for testing, but AttachmentProcessor uses it outside
of testing, so we should remove this comment.
Tika parsers need accessDeclaredMembers because ZipFile needs
accessDeclaredMembers on JDK 10. This commit guards adding this
permission to parsers so that the permission is only granted on JDK
10. Additionally, we add an assertion that forces us to check if the
permission is still needed in JDK 11.
Relates #28603
Tests on jdk10 were failing because of a change in its ZipFile implementation
that now needs `accessDeclaredMembers` permissions. This change adds
the missing permission to the plugins security policy and TikaImpl.
Closes#28568
* Move to non-deprecated XContentHelper.createParser(...)
This moves away from one of the now-deprecated XContentHelper.createParser
methods in favor of specifying the deprecation logger at parser creation time.
Relates to #28449
Note that this doesn't move all the `createParser` calls because some of them
use the already-deprecated method that doesn't specify the XContentType.
* Remove the deprecated (and now non-needed) createParser method
This pull request replaces the jvm-example plugin (from the jvm/site plugins era) by two new plugins: a custom-settings that shows how to register and use custom settings (including secured settings) in a plugin, and rest-handler plugin that shows how to register a rest handler.
The two plugins now reside in the plugins/examples project. They can serve as sample plugins for users, a special attention has been put on documentation. The packaging tests have been adapted to use the custom-settings plugin.
This commit is related to #27260. Currently have a channel context that
implements reading and writing logic for socket channels. Additionally,
we have exception contexts to handle exceptions. And accepting contexts
to handle accepted channels. This PR introduces a ChannelContext that
handles close and exception handling for all channel types.
Additionally, it has implementers that provide specific functionality
for socket channels (read and writing). And specific functionality for
server channels (accepting).
This commit adds a gradle plugin to ease development of meta plugins.
Applying the plugin will generated the meta plugin properties based on
the es_meta_plugin configuration object, which includes name and
description. The plugins to include within the meta plugin are
configured through the `plugins` list. An integ test task is also
automatically added.
This commit is related to #27260. Right now we have separate read and
write contexts for implementing specific protocol logic. However, some
protocols require a closer relationship between read and write
operations than is allowed by our current model. An example is HTTP
which might require a write if some problem with request parsing was
encountered.
Additionally, some protocols require close messages to be sent when a
channel is shutdown. This is also problematic in our current model,
where we assume that channels should simply be queued for close and
forgotten.
This commit transitions to a single ChannelContext which implements
all read, write, and close logic for protocols. It is the job of the
context to tell the selector when to close the channel. A channel can
still be manually queued for close with a selector. This is how server
channels are closed for now. And this route allows timeout mechanisms on
normal channel closes to be implemented.
This one is interesting. The third party audit task runs inside the
Gradle JVM. This means that if Gradle is started on JDK 8, the third
party audit tasks will fail as a result of the changes to support
building Elasticsearch with the JDK 9 compiler. This commit reverts the
third party audit changes to support running this task when Gradle is
started with JDK 8.
Relates #28256
This commit modifies the build to require JDK 9 for
compilation. Henceforth, we will compile with a JDK 9 compiler targeting
JDK 8 as the class file format. Optionally, RUNTIME_JAVA_HOME can be set
as the runtime JDK used for running tests. To enable this change, we
separate the meaning of the compiler Java home versus the runtime Java
home. If the runtime Java home is not set (via RUNTIME_JAVA_HOME) then
we fallback to using JAVA_HOME as the runtime Java home. This enables:
- developers only have to set one Java home (JAVA_HOME)
- developers can set an optional Java home (RUNTIME_JAVA_HOME) to test
on the minimum supported runtime
- we can test compiling with JDK 9 running on JDK 8 and compiling with
JDK 9 running on JDK 9 in CI
This commit adds a PainlessExtension which may be plugged in via SPI to
add additional classes, methods and members to the painless whitelist on
a per context basis. An example plugin adding and using a whitelist is
also added.
This commit changes the phonetic filter factory to use a DaitchMokotoffSoundexFilter
instead of a PhoneticFilter with a daitch_mokotoff encoder when daitch_mokotoff is selected.
The latter does not hanlde branching when computing the soundex and fails to encode multiple
variations when possible.
Closes#28211
The method `initiateChannel` on `TcpTransport` is explicit in that
channels can be connect asynchronously. All production implementations
do connect asynchronously. Only the blocking `MockTcpTransport`
connects in a synchronous manner. This avoids testing some of the
blocking code in `TcpTransport` that waits on connections to complete.
Additionally, it requires a more extensive method signature than
required for other transports.
This commit modifies the `MockTcpTransport` to make these connections
asynchronously on a different thread. Additionally, it simplifies that
`initiateChannel` method signature.
* This change makes sure that we don't detect a file path containing a ':' as
a maven coordinate (e.g.: `file:C:\path\to\zip`)
* restore test muted on master
This commit adds the ability to package multiple plugins in a single zip.
The zip file for a meta plugin must contains the following structure:
|____elasticsearch/
| |____ <plugin1> <-- The plugin files for plugin1 (the content of the elastisearch directory)
| |____ <plugin2> <-- The plugin files for plugin2
| |____ meta-plugin-descriptor.properties <-- example contents below
The meta plugin properties descriptor is mandatory and must contain the following properties:
description: simple summary of the meta plugin.
name: the meta plugin name
The installation process installs each plugin in a sub-folder inside the meta plugin directory.
The example above would create the following structure in the plugins directory:
|_____ plugins
| |____ <name_of_the_meta_plugin>
| | |____ meta-plugin-descriptor.properties
| | |____ <plugin1>
| | |____ <plugin2>
If the sub plugins contain a config or a bin directory, they are copied in a sub folder inside the meta plugin config/bin directory.
|_____ config
| |____ <name_of_the_meta_plugin>
| | |____ <plugin1>
| | |____ <plugin2>
|_____ bin
| |____ <name_of_the_meta_plugin>
| | |____ <plugin1>
| | |____ <plugin2>
The sub-plugins are loaded at startup like normal plugins with the same restrictions; they have a separate class loader and a sub-plugin
cannot have the same name than another plugin (or a sub-plugin inside another meta plugin).
It is also not possible to remove a sub-plugin inside a meta plugin, only full removal of the meta plugin is allowed.
Closes#27316
This commit is related to #27260. It moves the TcpChannelFactory into
NioTransport so that consumers do not have to be passed around.
Additionally it deletes an unused read handler.
This is related to #27260. This commit moves the NioTransport from
:test:framework to a new nio-transport plugin. Additionally, supporting
tcp decoding classes are moved to this plugin. Generic byte reading and
writing contexts are moved to the nio library.
Additionally, this commit adds a basic MockNioTransport to
:test:framework that is a TcpTransport implementation for testing that
is driven by nio.
This commit adds the infrastructure to plugin building and loading to
allow one plugin to extend another. That is, one plugin may extend
another by the "parent" plugin allowing itself to be extended through
java SPI. When all plugins extending a plugin are finished loading, the
"parent" plugin has a callback (through the ExtensiblePlugin interface)
allowing it to reload SPI.
This commit also adds an example plugin which uses as-yet implemented
extensibility (adding to the painless whitelist).
This commit changes some Azure tests so that they do not rely on
MockZenPing and TestZenDiscovery anymore, but instead use a mocked
AzureComputeService that exposes internal test cluster nodes as if
they were real Azure nodes.
Related to #27859Closes#27917, #11533
TestZenDiscovery is used to allow discovery based on in memory structures. This isn't a relevant for the cloud providers tests (but isn't a problem at the moment either)
* Fixes ByteSizeValue to serialise correctly
This fix makes a few fixes to ByteSizeValue to make it possible to perform round-trip serialisation:
* Changes wire serialisation to use Zlong methods instead of VLong methods. This is needed because the value `-1` is accepted but previously if `-1` is supplied it cannot be serialised using the wire protocol.
* Limits the supplied size to be no more than Long.MAX_VALUE when converted to bytes. Previously values greater than Long.MAX_VALUE bytes were accepted but would be silently interpreted as Long.MAX_VALUE bytes rather than erroring so the user had no idea the value was not being used the way they had intended. I consider this a bug and so fine to include this bug fix in a minor version but I am open to other points of view.
* Adds a `getStringRep()` method that can be used when serialising the value to JSON. This will print the bytes value if the size is positive, `”0”` if the size is `0` and `”-1”` if the size is `-1`.
* Adds logic to detect fractional values when parsing from a String and emits a deprecation warning in this case.
* Modifies hashCode and equals methods to work with long values rather than doubles so they don’t run into precision problems when dealing with large values. Previous to this change the equals method would not detect small differences in the values (e.g. 1-1000 bytes ranges) if the actual values where very large (e.g. PBs). This was due to the values being in the order of 10^18 but doubles only maintaining a precision of ~10^15.
Closes#27568
* Fix bytes settings default value to not use fractional values
* Fixes test
* Addresses review comments
* Modifies parsing to preserve unit
This should be bwc since in the case that the input is fractional it reverts back to the old method of parsing it to the bytes value.
* Addresses more review comments
* Fixes tests
* Temporarily changes version check to 7.0.0
This will be changed to 6.2 when the fix has been backported