The index prefix field is normally indexed as docs-only, given that it cannot
be used in phrases. However, in the case that the parent field has been indexed
with offsets, or has term-vector offsets, we should also store this in the index
prefix field for highlighting.
Note that this commit does not implement highlighting on prefix fields, but
rather ensures that future work can implement this without a backwards-break
in index data.
Closes#28994
The cd command on Windows has an oddity regarding changing
directories. If the drive of the current directory is a different drive
than than of the directory that was passed to the cd command, cd acts in
query mode and does not change the current directory. Instead, a flag is
needed to put the cd command into set mode so that the directory
actually changes. This causes a problem when starting Elasticsearch from
a directory different than the one where it is installed and this commit
fixes the issue.
The method `PersistentTasksClusterService.finishTask()` has been
modified since it was added and does not use any `removeOncompletion`
flag anymore. Its behavior is now similar to `removeTask()` and can be
replaced by this one. When a non existing task is removed, the cluster
state update task will fail and its `source` will still indicate
`finish persistent task`/`remove persistent task`.
Additionally:
* Included the existing update by query java api docs in java-api docs.
(for some reason it was never included, it needed some tweaking and
then it was good to go)
* moved delete-by-query / update-by-query code samples to java file so
that we can verify that these samples at least compile.
Closes#24203
Today we allow any other method of starting Elastisearch to override
jvm.options via ES_JAVA_OPTS. Yet, for some settings in the Windows
service, we do not allow this. This commit removes this in favor of
being consistent with other packaging choices.
This commit enhances the error messages reported when JAVA_HOME and
RUNTIME_JAVA_HOME are not correctly set to point towards the minimum
compiler and minimum runtime JDKs that are expected by the builds. The
previous error message would say:
Java 1.9 or above is required to build Elasticsearch
which is confusing if the user does have a JDK 9 installation and is
even the version that they have on their path yet they have JAVA_HOME
pointing to another JDK installation. The error message reported after
this change is:
the environment variable JAVA_HOME must be set to a JDK installation directory for Java 1.9 but is [/usr/java/jdk-8] corresponding to [1.8]
* CLI Command: MultiCommand must close subcommands to release resources properly
- Changes are done to override the close method and call close on subcommands using IOUtils#close
- Unit Test
Closes#28953
Currently ESIndexLevelReplicationTestCase executes write operations
without acquiring index shard permit. This may prevent the primary term
on replica from being updated or cause a race between resync and
indexing on primary. This commit ensures that write operations are
always executed under shard permit like the production code.
Changes made in #28972 seems to have changed some assumptions about how
SMILE and CBOR write byte[] values and how this is tested. This changes
the generation of the randomized DocumentField values back to BytesArray
while expecting the JSON and YAML deserialisation to produce Base64
encoded strings and SMILE and CBOR to parse back BytesArray instances.
Closes#29080
I also had to make the test more lenient. This is due to the fact that
Lucene's RamUsageTester was changed in order not to reflect `java.*`
classes and the way that it estimates ram usage of maps is by assuming
it has similar memory usage to an `Object[]` array that stores all keys
and values. The implementation in `LiveVersionMap` tries to be slightly
more realistic by taking the load factor and linked lists into account,
so it usually gives a higher estimate which happens to be closer to
reality.
Closes#22548
Provide more actionable error message when installing an offline plugin
in the plugins directory, and the `plugins` directory for the node
contains plugin distribution.
Closes#27401
When parsing GetResponse it was possible that the equality check failed because
items in the map were in a different order (in the `.equals` implementation).
The Java API documentation for index administration currenty is wrong because
the PutMappingRequestBuilder#setSource(Object... source) an
CreateIndexRequestBuilder#addMapping(String type, Object... source) methods
delegate to methods that check that the input arguments are valid key/value
pairs. This changes the docs so the java api code examples are included from
documentation integration tests so we detect compile and runtime issues earlier.
Closes#28131
By the time the master branch is released the deprecated url
parameters in the `/_cache/clear` API will have been deprecated
for a couple of minor releases. Since master will be the next
major release we are fine with removing these parameters.
Currently we have a fairly complicated logic in the engine constructor logic to deal with all the
various ways we want to mutate the lucene index and translog we're opening.
We can:
1) Create an empty index
2) Use the lucene but create a new translog
3) Use both
4) Force a new history uuid in all cases.
This leads complicated code flows which makes it harder and harder to make sure we cover all the
corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens
things as they are and all needed modifications are done by static methods directly on the
directory, one at a time.
* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
When we copied IOUtils into the Elasticsearch codebase from Lucene, we
brought with it its handling of throwables which are out of whack with
how we handle throwables in our codebase. This commit modifies our copy
of IOUtils to be consistent with how we handle throwables today: do not
catch them. We take advantage of this cleanup to simplify IOUtils.
We today support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself.
It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process.
We add an option which reads this limit value from the document itself
by adding a setting named `indexed_chars_field`.
Which allows running:
```
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information. Used to parse pdf and office files",
"processors" : [
{
"attachment" : {
"field" : "data",
"indexed_chars_field" : "size"
}
}
]
}
```
Then index either:
```
PUT index/doc/1?pipeline=attachment
{
"data": "BASE64"
}
```
Which will use the default value (or the one defined by `indexed_chars`)
Or
```
PUT index/doc/2?pipeline=attachment
{
"data": "BASE64",
"size": 1000
}
```
Closes#28942
If the default java.io.tmpdir is used then the startup script creates
it, but if a custom java.io.tmpdir is used then the user must ensure it
exists before running Elasticsearch. If they forget then it can cause
errors that are hard to understand, so this change adds an explicit
check early in the bootstrap and reports a clear error if java.io.tmpdir
is not an accessible directory.
Eclipse Oxygen doesn't seem to be able to infer the correct type
arguments for Arrays::asList in the given test context. Adding cast to
make this more explicit.
The REST status 503 means "I can not handle the request that you sent
me." However today we respond to a main request with a 503 when there
are certain cluster blocks despite still responding with an actual main
response. This is broken, we should respond with a 200 status. This
commit removes this silliness.
When converting the source for an indexing request to JSON, the
conversion can throw an I/O exception which we swallow and proceed with
logging to the slow log. The cause of the I/O exception is lost. This
commit changes this behavior and chooses to drop the entry from the slow
logs and instead lets an exception percolate up to the indexing
operation listener loop. Here, the exception will be caught and logged
at the warn level.
Today we can end up in a situation where the cluster state contains
unknown or invalid settings. This can happen easily during a rolling
upgrade. For example, consider two nodes that are on a version that
considers the setting foo.bar to be known and valid. Assume one of these
nodes is restarted on a higher version that considers foo.bar to now be
either unknown or invalid, and then the second node is restarted
too. Now, both nodes will be on a version that consider foo.bar to be
unknown or invalid yet this setting will still be contained in the
cluster state. This means that if a cluster settings update is applied
and we validate the settings update with the existing settings then
validation will fail. In such a state, the offending setting can not
even be removed. This commit helps out with this situation by archiving
any settings that are unknown or invalid at the time that a settings
update is applied. This allows the setting update to go through, and the
archived settings can be removed at a later time.
This commit adds a JVM flag to ensure that the JVM fatal error logs land
in the default log directory. Users that wish to use an alternative
location should change the path configured here.
These can be seen at the debug level via cluster state update logging
but really they should be more visible like index creation and
deletion. This commit adds info-level logging for template puts and
deletes.
This interning is completely unnecessary because we look up the marker
by the prefix (value, not identity) anyway. This means that regardless
of the identity of the prefix, we end up with the same marker. That is
all that we really care about here.