* SQL: Add option to provide the delimiter for the CSV format (#59907)
* Add option to provide the delimiter to the CSV fmt
This adds the option to provide the desired character as the separator
for the CSV format (the default remains comma).
A set of characters are excluded though - like CR, LF, `"` - to avoid
slipping onto the CSV-dialects slope. The tab is also forbidden, the
user needs to choose the "tsv" format explicitely.
Update the doc to make it clear that the textual CSV, TSV and TXT
formats pass the cursor back to the user through the Cursor HTTP header.
(cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc)
* Java8 fixes
- replace Set#of();
- URLDecoder#decode() requires a string (vs a charset) as 2nd arg.
Changes:
* Adds the `number_of_routing_shards` index setting to index modules docs.
* Updates the split API docs to mention that `number_of_routing_shards`
is a static setting.
Today there are a few places in the transport layer docs where we talk
about communication between nodes _within a cluster_. We also use the
transport layer for remote cluster connections, and these statements
also apply there, but this is not clear from today's docs. This commit
generalises these statements to make it clear that they apply to remote
cluster connections too.
It also adds a link from the docs on configuring TCP retries to the
(deeply-buried) docs on preserving long-lived connections.
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.
Addresses #49028 and #55363.
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.
This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
* Adds table with icons for simplicity.
* Updating table for clarity.
* Changing table formatting and incorporating more feedback.
* Changing table alignment.
Keepalive options are not well-documented (only in transport section, although also available at http and network level).
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.
This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
Moves the search sort docs from the deprecated 'Request Body Search'
page to a new subpage of 'Run a search'.
No substantive changes were made to the content.
This PR contains the deprecation notice that `create`, `create_doc`, `index` and
`write` ingest privileges do not permit mapping updates in version 8. It also
updates the docs description of said privileges.
This should've been part of #58784
This improves modularity and also fixes some issues when `docvalues_fields` is
used within `inner_hits` or the `top_hits` agg:
* We previously didn't resolve wildcards in field names.
* We also forgot to enforce the limit `index.max_docvalue_fields_search`.
This page previously documented `xpack.sql.enabled`.
However, in 7.8 and above, `xpack.sql.enabled` is always enabled and
the setting has no effect. There is no reason to maintain this page.
* Adding new page for restore snapshot API.
* Improving test cases, lots of edits, and streamlining content.
* Incorporating review suggestions and feedback.
* Specify `index alias` vs `alias`
* Change parameter order
* Provide clarity around regular expression
* Add link to SLM parameters
* Split sentences in example
* Adding link to master node page.
Adds a new `my-index-00001` REST test for docs snippets.
This test can serve as a lightweight replacement for
our existing `twitter` REST tests.
The new dataset is:
* Based on Apache logs, which is better aligned with Elastic use cases
* Compliant with ECS
* Similar to the existing `twitter` data set, containing the same field data types
* Lightweight, which should keep existing test runtimes roughly the same
Also updates the search API reference docs to use the new test.
This commit allows customizing the word delimiter token filters to skip processing
tokens tagged as keyword through the `ignore_keywords` flag Lucene's
WordDelimiterGraphFilter already exposes.
Fix for #59491
Before it was missing from the list. This PR also renames the 'geo data types'
section to 'spatial data types' and consolidates the geo and cartesian types
into that section.
The clock resolution for this API is our default 200ms. It is unlikely but
possible that a shard snapshot starts and ends on separate clock ticks and that breaks the test.
Just allowing any value here seems fine to me (seems we can't match for integer specifically).
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.
This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
Corrects the `requests_per_second` query parameter used in the reindex,
delete by query, and update by query API docs.
The parameter defaults to `-1` (no throttle). `0` is not an allowed value.
This cleans up a few rough edged in the `variable_width_histogram`,
mostly found by @wwang500:
1. Setting its tuning parameters in an unexpected order could cause the
request to fail.
2. We checked that the maximum number of buckets was both less than
50000 and MAX_BUCKETS. This drops the 50000.
3. Fixes a divide by 0 that can occur of the `shard_size` is 1.
4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows
a signed int.
5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less
than `buckets` we will very consistently return fewer buckets than
requested. For the most part we expect folks to leave it at the
default. If they change it, we expect it to be much bigger than
`buckets`.
6. Allocate a smaller `mergeMap` in when initially bucketing requests
that don't use the entire `shard_size * 3 / 4`. Its just a waste.
7. Default `shard_size` to `10 * buckets` rather than `100`. It *looks*
like that was our intention the whole time. And it feels like it'd
keep the algorithm humming along more smoothly.
8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like
we've documented it rather than `5000`. Like the point above, this
feels like the right thing to do to keep the algorithm happy.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Updating snapshot/restore pages to align with API changes (#59730)
* Updating snapshot/restore pages to align with API changes.
* Fixing texts in delete snapshot page.
* Removing duplicate code sample and making editorial changes.
* Change "deleted" to "delete"
* Incorporating review feedback and making minor editorial changes.
* Remove titleabbrev
* Add paragraph break
* Remove titleabbrev from restore page
* Remove titleabbrev from create page
* Change "Create" to lowercase
* Change API names to lowercase
* Remove extraneous delimiters
* Change "Delete" to lowercase
* Single-sourcing warning and clarifying warning text.
* Fixing tests and removing erroneous example.
Introduce a fix to tests by snapshotting a single index+shard in the snapshot that
we get the status for and verifying consistency instead of equality
for total file counts.
Co-authored-by: Armin Braun <me@obrown.io>
Moves the highlighting docs from the deprecated 'Request Body Search'
chapter to the new subpage of the 'Run a search chapter' section.
No substantive changes were made to the content.
* Adding new `require_alias` option to indexing requests (#58917)
This commit adds the `require_alias` flag to requests that create new documents.
This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.
When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.
This is useful when an alias is required instead of a concrete index.
closes https://github.com/elastic/elasticsearch/issues/55267
* Add doc runtime class path
* Use getAllHttpSocketURI.get(0) instead of getAllHttpSocketURI to get a single
test cluster URL rather than a list
Backport: 3057e0f
Implement DATE_PARSE(<date_str>, <pattern_str>) function
which allows to parse a date string according to the specified
pattern into a date object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54962
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com>
(cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)
* Adding get snapshot status API docs.
* Adding more fields and a link to the new page.
* Adding missing spaces in TESTRESPONSES
* Adding more parameters and making some edits.
* Marking snapshot as optional
* Marking repository as optional
* Add data type for stats
* Add data type for shard_stats
* Incorporating review feedback.
* Lots of review feedback incorporated.
* Fixing tests to unbreak CI builds.
* Changing indices to index.
* [ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.
`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.
Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).
This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.