This adds the security `_authenticate` API to the HLREST client.
It is unlike some of the other APIs because the request does not
have a body.
The commit also creates the `User` entity. It is important
to note that the `User` entity does not have the `enabled`
flag. The `enabled` flag is part of the response, alongside
the `User` entity.
Moreover this adds the `SecurityIT` test class
(extending `ESRestHighLevelClientTestCase`).
Relates #29827
This PR renames the CRUD APIS for ILM
GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy
PUT _ilm/<policy> -> _ilm/policy/<policy>
DELETE _ilm/<policy> -> _ilm/policy/<policy>
closes#34929.
The `random_score` function produces values between 0 (inclusive) and 1
(exclusive) and documented it with fancy methematical range notation. It
is so fancy I thought it was a typo. This changes the documentation to
use words.
Relates to #35084
This changes the RollupSearch endpoint to proactively resolve index
patterns. If the index pattern(s) match more than one rollup index,
an exception is throw as before. But if the pattern only matches one
rollup index, execution is allowed to continue (unlike before where
it would assume all patterns were for raw data).
This also allows the search endpoint to resolve aliases that point to
a rollup index.
Also tweaks the documentation to make this clear.
Closes#34828
* Remove a tip about ignore_above that only makes sense with multiple types.
* Remove a line from the percolator documentation that refers to multiple types.
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation
Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids
Bulk Request in High level rest client should be consistent with what is
possible in Rest API, therefore should support global parameters. Global
parameters are passed in URL in Rest API.
Some parameters are mandatory - index, type - and would fail validation
if not provided before before the bulk is executed.
Optional parameters - routing, pipeline.
The usage of these should be consistent across sync/async execution,
bulk processor and BulkRequestBuilder
closes#26026
When combine_script and reduce_script were made into required
parameters for Scripted Metric aggregations in #33452, the docs were
not updated to reflect that. This marks those parameters as required
in the documentation.
Deprecates `_source_include` and `_source_exclude` url parameters
in favor of `_source_inclues` and `_source_excludes` because those
are consistent with the rest of Elasticsearch's APIs.
Relates to #22792
This further applies the pattern set in #34125 to reduce copy-and-paste
in the multi-document CRUD portion of the High Level REST Client docs.
It also adds line wraps to snippets that are too wide to fit into the box
when rendered in the docs, following up on the work started in #34163.
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
The file structure finder endpoint can find the NDJSON
(newline-delimited JSON) file format, but called it
`json`. This change renames the `format` for this file
structure to `ndjson`, which is more precise and will
hopefully avoid confusion.
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
Adds support for the Clear Roles Cache API to the High Level Rest
Client. As part of this a helper class, NodesResponseHeader, has been
added that enables parsing the nodes header from responses that are
node requests.
Relates to #29827
This commit is our first introduction to cross-cluster replication
docs. In this commit, we introduce the cross-cluster replication API
docs. We also add skelton docs for additional content that will be added
in a series of follow-up commits.
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x . In 7.0 the deprecated format is gone and a
follow-up PR is in order.
This change adds a section about the global search setting
`indices.query.bool.max_clause_count` that limits the number of boolean clauses
allowed in a Lucene BooleanQuery.
Closes#19858
These were disabled because ingest attachement was not playing well with
JDK 11. We have addressed that by upgrading the Tika dependency, yet
forgot to reenable these. This commit enables these tests again.
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
Extend querying support on multiple indices from being strictly
identical to being just compatible.
Use FieldCapabilities API (extended through #33803) for mapping merging.
Close#31837#31611
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.
The `IN` expression can be used in SELECT, WHERE and HAVING clauses.
Closes: #32955
`CONVERT` works exactly like cast with slightly different syntax:
`CONVERT(<value>, <data_type)` as opposed to `CAST(<value> AS <data_type>)`
Moreover it support format of the MS-SQL data types `SQL_<type>`,
e.g.: `SQL_INTEGER`
Closes: #34513
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
This commit switches to using a trial license in the docs tests that run
on the default distribution. This is needed so that docs tests can be
executed against non-basic features.
With remote clusters taking on a larger role, we have make the
infrastructure more generic than being tied to cross-cluster search
(CCS). We want to refer to the remote clusters configuration in the
cross-cluster replication (CCR) docs. Yet, these docs are still tied to
CCS. This commit extracts the remote clusters docs from CCS (with some
wording changes to make them more general) so that we can refer to them
in the CCR docs.
When a envelope that crosses the dateline is specified as a part of
geo_shape query is parsed it shouldn't have its left and right points
flipped.
Fixes#34418
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.
Fix#32079
The `term` and `phrase` suggesters have different options to filter candidates
based on their frequencies. The `popular` mode for instance filters candidate
terms that occur in less docs than the original term. However when we compute this threshold
we use the total term frequency of a term instead of the document frequency. This is not inline
with the actual filtering which is always based on the document frequency. This change fixes
this discrepancy and clarifies the meaning of the different frequencies in use in the suggesters.
It also ensures that the threshold doesn't overflow the maximum allowed value (Integer.MAX_VALUE).
Closes#34282
Add example for selectively clearing just the request, query or fielddata cache
and for selectively clearing the cache for specific fields.
Closes#34287
* Adding new xpack.ml.max_lazy_ml_nodes setting to docs
* Fixing docs, making it clearer what the setting does
* Adding note about external process need
We'd disabled them because we didn't have a way to clean up after each
test. I implemented #34342 which adds the clean ups so now we can
re-enable the tests.
In the `setup` sections we have to use `raw` requests instead of
`x-pack` requests because we don't have the json config for x-pack.
Closes#33319
This commit moves the definition of domainSplit into java and exposes it
as a painless whitelist extension. The method also no longer needs
params, and version which ignores params is added and deprecated.
Introduces client-specific request and response classes that do not
depend on the server
The `type` parameter is named `licenseType` in the response class to be
more descriptive. The parts that make up the acknowledged-required
response are given slightly different names than their server-response
types to be consistent with the naming in the put license API
Tests do not cover all cases because the integ test cluster starts up
with a trial license - this will be addressed in a future commit
Tweak the upgrade instructions for moving from pre-6.3-with-x-pack to
post-6.3-default distribution. Specifically, you have to remove the
x-pack plugin before upgrading because 6.4 doesn't understand how to
remove it.
Relates to #34307
* Replace deprecated field `code` with `source` for stored scripts (#25127)
* Replace examples using the deprecated endpoint `{index}/{type}/_search`
with `{index}/_search` (#29468)
* Use a system property to avoid deprecation warnings after the Update
Scripts have been moved to their own context (#32096)
This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so
it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts
in 6x in order to ensure that users are aware when they'll upgrade to ES 7.
Relates #33309
We added support for role mapper expression DSL in #33745,
that allows us to build the role mapper expression used in the
role mapping (as rules for determining user roles based on what
the boolean expression resolves to).
This change now adds support for create/update role mapping
API to the high-level rest client.
The ingest pipeline that is produced is very simple. It
contains a grok processor if the format is semi-structured
text, a date processor if the format contains a timestamp,
and a remove processor if required to remove the interim
timestamp field parsed out of semi-structured text.
Eventually the UI should offer the option to customize the
pipeline with additional processors to perform other data
preparation steps before ingesting data to an index.
* New OCTET_LENGTH function
* Changed the way the FunctionRegistry stores functions, considering the alphabetic ordering by name
* Added documentation for the RANDOM function
* HLRC: ML Add preview datafeed api
* Changing deprecation handling for parser
* Removing some duplication in docs, will address other APIs in another PR
* HLRC: ML Cleanup docs
* updating get datafeed stats docs
This further applies the pattern set in #34125 to reduce copy-and-paste
in the single document CRUD portion of the High Level REST Client docs.
It also adds line wraps to snippets that are too wide to fit into the box
when rendered in the docs, following up on the work started in #34163.
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecesarily recomputing automatons.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
The `status` part of the tasks API reflects the internal status of a
running task. In general, we do not make backwards breaking changes to
the `status` but because it is internal we reserve the right to do so. I
suspect we will very rarely excercise that right but it is important
that we have it so we're not boxed into any particular implementation
for a request.
In some sense this is policy making by documentation change. In another
it is clarification of the way we've always thought of this field.
I also reflect the documentation change into the Javadoc in a few
places. There I acknowledge Kibana's "special relationship" with
Elasticsearch. Kibana parses `_reindex`'s `status` field and, because
we're friends with those folks, we should talk to them before we make
backwards breaking changes to it. We *want* to be friends with everyone
but there is only so much time in the day and we don't *want* to make
backwards breaking fields to `status` at all anyway. So we hope that
breaking changes documentation should be enough for other folks.
Relates to #34245.
* HLRC: ML Add preview datafeed api
* Changing deprecation handling for parser
* Removing some duplication in docs, will address other APIs in another PR
We generate tests from our documentation, including assertions about the
responses returned by a particular API. But sometimes we *can't* assert
that the response is correct because of some defficiency in our tooling.
Previously we marked the response `// NOTCONSOLE` to skip it, but this
is kind of odd because `// NOTCONSOLE` is really to mark snippets that
are json but aren't requests or responses. This introduces a new
construct to skip response assertions:
```
// TESTRESPONSE[skip:reason we skipped this]
```
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
* Make text message not required in constructor for slack
* Remove unnecessary comments in test file
* Throw exception when reduce or combine is not provided; update tests
* Update integration tests for scripted metrics to always include reduce and combine
* Remove some old changes from previous branches
* Rearrange script presence checks to be earlier in build
* Change null check order in script builder for aggregated metrics; correct test scripts in IT
* Add breaking change details to PR
This change adds throttling to the update-by-query and delete-by-query cases
similar to throttling for reindex. This mostly means additional methods on the
client class itself, since the request hits the same RestHandler, just with
slightly different endpoints, and also the return values are similar.
As user-defined cluster metadata is accessible to anyone with access to
get the cluster settings, stored in the logs, and likely to be tracked
by monitoring solutions, it is useful to clarify in the documentation
that it should not be used to store secret information.
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
Previously, parsing an arithmetic expression with `*` and no spaces,
e.g.: `2*i` threw a parsing exception as the grammar rule for
tableIdentifier was clashing with the rule for arithmetic operator `*`.
This issue comes already in the lexer and the left part of the
expression (in our example `2*`) was recognised as a
TABLE_IDENTIFIER token.
The solution adopted is to allow the `*` wildcard in the table name
only if it's surrounded with double quotes, e.g.: `"my*index"`
Closes: #33957
We use wrap code in `// tag` and `//end` to include it in our docs. Our
current docs style wraps code snippets in a box that is only wide enough
for 76 characters and adds a horizontal scroll bar for wider snippets
which makes the snippet much harder to read. This adds a checkstyle check
that looks for java code that is included in the docs and is wider than
that 76 characters so all snippets fit into the box. It solves many of
the failures that this catches but suppresses many more. I will clean
those up in a follow up change.
This change fixes a potential deadlock problem in the unit
test introduced in #34117.
It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0
Relates to #31389
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Previously the timestamp_formats field in the response
from the find_file_structure endpoint contained Joda
timestamp formats. This change makes that clear by
renaming the field to joda_timestamp_formats, and also
adds a java_timestamp_formats field containing the
equivalent Java time format strings.
With this commit we remove a leftover in the docs about the `format`
field being updatable. This is not true since we removed support for
updates in #25285.
Closes#33986
Relates #25285
Relates #34006
* Changed the format of the String functions documentation page.
* Adopted the same format for Math functions, but completely changed the examples.
* Added missing documentation for Math functions.
Previously numeric values in the field_stats created by the
find_file_structure endpoint were always output with a
decimal point. This looked unfriendly and unnatural for
fields that clearly store integer values. This change
converts integer values to type Integer before output in
the file structure field stats.
* Added TRUNCATE function, modified ROUND to accept two parameters instead of one. Made the second parameter optional for both functions.
* Added documentation for both functions.
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.
This commit adds two methods to the TokenFilterFactory interface.
* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
`CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
list `Analyzer`. By default it returns `true`.
Fixes#33609
The documentation currently tells users to use `doc['event_date'].value.getMillis` to access
milliseconds in a date. It turns out the way it works is `doc['event_date'].value.millis`. This
change corrects this and gives a hint at how other date related methods work.
It is not obvious that a filesystem-level backup may capture an inconsistent
set of files that may fail on restore, or (worse) succeed having silently
discarded some data. This change spells the out, and reorganises the first page
or so of the snapshot/restore docs to make this warning fit more nicely.
The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a but misleading. The phrase type behaves like the best_fields type and does not combine the scores of each fields.
In #33241 we moved the file-based discovery functionality to core
Elasticsearch, but preserved the `discovery-file` plugin, and support for the
existing location of the `unicast_hosts.txt` file, for BWC reasons. This commit
completes the removal of this plugin.
New plugin for annotated_text field type.
Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text.
The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens.
The annotation token values are injected unchanged alongside the regular text tokens to provide a
form of additional indexed overlay useful in positional searches and highlighting.
Annotated_text fields do not support fielddata as we want to phase this out.
Also includes a new "annotated" highlighter type that retains annotations and merges in search
hits as additional annotation markup.
Closes#29467
* Implement xpack.monitoring.elasticsearch.collection.enabled setting
* Fixing line lengths
* Updating constructor calls in test
* Removing unused import
* Fixing line lengths in test classes
* Make monitoringService.isElasticsearchCollectionEnabled() return true for tests
* Remove wrong expectation
* Adding unit tests for new flag to be false
* Fixing line wrapping/indentation for better readability
* Adding docs
* Fixing logic in ClusterStatsCollector::shouldCollect
* Rebasing with master and resolving conflicts
* Simplifying implementation by gating scheduling
* Doc fixes / improvements
* Making methods package private
* Fixing wording
* Fixing method access