Commit Graph

183 Commits

Author SHA1 Message Date
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Jason Tedor bdfcc326d7
Enable avoiding mmap bootstrap check (#32421)
The maximum map count boostrap check can be a hindrance to users that do
not own the underlying platform on which they are executing
Elasticsearch. This is because addressing it requires tuning the kernel
and a platform provider might now allow this, especially on shared
infrastructure. However, this bootstrap check is not needed if mmapfs is
not in use. Today we do not have a way for the user to communicate that
they are not going to use mmapfs. This commit therefore adds a setting
that enables the user to disallow mmapfs. When mmapfs is disallowed, the
maximum map count bootstrap check is not enforced. Additionally, we
fallback to a different default index store and prevent the explicit use
of mmapfs for an index.
2018-08-21 11:02:25 -04:00
Nik Everett 22459576d7
Logging: Make node name consistent in logger (#31588)
First, some background: we have 15 different methods to get a logger in
Elasticsearch but they can be broken down into three broad categories
based on what information is provided when building the logger.

Just a class like:
```
private static final Logger logger = ESLoggerFactory.getLogger(ActionModule.class);
```
or:
```
protected final Logger logger = Loggers.getLogger(getClass());
```

The class and settings:
```
this.logger = Loggers.getLogger(getClass(), settings);
```

Or more information like:
```
Loggers.getLogger("index.store.deletes", settings, shardId)
```

The goal of the "class and settings" variant is to attach the node name
to the logger. Because we don't always have the settings available, we
often use the "just a class" variant and get loggers without node names
attached. There isn't any real consistency here. Some loggers get the
node name because it is convenient and some do not.

This change makes the node name available to all loggers all the time.
Almost. There are some caveats are testing that I'll get to. But in
*production* code the node name is node available to all loggers. This
means we can stop using the "class and settings" variants to fetch
loggers which was the real goal here, but a pleasant side effect is that
the ndoe name is now consitent on every log line and optional by editing
the logging pattern. This is all powered by setting the node name
statically on a logging formatter very early in initialization.

Now to tests: tests can't set the node name statically because
subclasses of `ESIntegTestCase` run many nodes in the same jvm, even in
the same class loader. Also, lots of tests don't run with a real node so
they don't *have* a node name at all. To support multiple nodes in the
same JVM tests suss out the node name from the thread name which works
surprisingly well and easy to test in a nice way. For those threads
that are not part of an `ESIntegTestCase` node we stick whatever useful
information we can get form the thread name in the place of the node
name. This allows us to keep the logger format consistent.
2018-07-31 10:54:24 -04:00
Jason Tedor 588db621ac
Remove reference to non-existent store type (#32418)
We removed the default_fs store type yet the docs still contain a
reference to them. This commit addresses that by removing this
reference, and changing a reference to this section of the docs to
instead refer to mmapfs.
2018-07-27 11:24:03 -04:00
Adrien Grand f5073813ef
Docs: Clarify constraints on scripted similarities. (#31076)
Scripted similarities provide a lot of flexibility but they still need to obey
some rules to not confuse Lucene.
2018-06-05 08:51:00 +02:00
srini-raman 0592b685b9 [Docs] Improve section detailing translog usage (#30573) 2018-05-15 10:43:57 +02:00
Sue Gallagher dd666599f7
[DOCS] Added 'on a single shard' to description of max_thread_count. Closes 28518 (#29686) 2018-04-27 09:29:27 -07:00
Adrien Grand 569d0c0e89
Improve similarity integration. (#29187)
This improves the way similarities are plugged in in order to:
 - reject the classic similarity on 7.x indices and emit a deprecation
   warning otherwise
 - reject unkwown parameters on 7.x indices and emit a deprecation
   warning otherwise

Even though this breaks the plugin API, I'd like to backport to 7.x so
that users can get deprecation warnings when they are doing something
that will become unsupported in the future.

Closes #23208
Closes #29035
2018-04-03 16:45:25 +02:00
Adrien Grand 1d6ed824c7
Improve similarity docs. (#29089)
This adds links to the relevant Lucene javadocs and warnings regarding
similarities that might return 0 as a score.

Close #29015
2018-03-21 10:41:10 +01:00
Jason Tedor bddf9df8b4
Add search slowlog level to docs (#29040)
This commit adds an indication how to set the search slowlog level to
the docs.
2018-03-13 18:27:14 -04:00
David Turner 0a4a4c8a0e
Minor improvements to translog docs (#28237)
The use of the phrase "translog" vs "transaction log" was inconsistent, and
it was apparently unclear that the translog was stored on every shard copy.
2018-01-19 10:17:22 +00:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Christoph Büscher 0d11b9fe34
[Docs] Unify spelling of Elasticsearch (#27567)
Removes occurences of "elasticsearch" or "ElasticSearch" in favour of
"Elasticsearch" where appropriate.
2017-11-29 09:44:25 +01:00
Alexander Reelsen 66b5a43d0e
Logging: Unify log rotation for index/search slow log (#27298)
The existing log rotation configuration allowed the index
and search slow log to grow unbounded. This commit removes the
date based rotation and adds the same size based rotation, that
the depreciation log already has.
2017-11-15 10:01:32 +01:00
agent5566 93a47cf860 Fix a typo in the similarity docs (#26970) 2017-10-12 09:29:25 -07:00
Jim Ferenczi 74473c1c3d Early termination with index sorting should not set terminated_early in the response (#26597)
Early termination with index sorting always return the best top N in the response but set the flag `terminated_early`
in the response. This can be confusing because we use the same flag for `terminate_after` which on the contrary returns partial results.
This change removes the flag when results are not partial (early termination due to index sorting) and keeps it only when `terminate_after` is used.

Closes #26408
2017-09-26 11:37:11 +02:00
Tanguy Leroux db54c4dc7c [Docs] Convert more doc snippets (#26404)
This commit converts some remaining doc snippets so that they are now
testable.
2017-08-30 09:30:36 +02:00
Adrien Grand f0cba4fce5 Add a scripted similarity. (#25831)
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific usec-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize with the low-level
Lucene API.
2017-08-08 08:55:12 +02:00
Adrien Grand 1b34f691e5 Remove reference to 32-bit systems. (#25971)
They are not supported anymore as of #25435.
2017-07-31 09:55:09 +02:00
Clinton Gormley ff4a2519f2 Update experimental labels in the docs (#25727)
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Boaz Leskes d963882053 Enable a long translog retention policy by default (#25294)
#25147  added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours.

In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations.

Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number.

Relates to #10708
2017-06-22 17:08:14 +02:00
Deb Adair dbe2de0891 [DOCS] Fixed callout reference error. 2017-06-08 16:47:13 -07:00
Jim Ferenczi 36a5cf8f35 Automatically early terminate search query based on index sorting (#24864)
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information and 
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.

Relates #6720
2017-06-08 12:10:46 +02:00
Adrien Grand bbdf50f6bd Docs: More search speed advices. (#24802) 2017-06-01 17:23:22 +02:00
Jim Ferenczi f05af0a382 Enable index-time sorting (#24055)
This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk.
It is not allowed to use a `nested` fields inside an index that defines an index sorting since `nested` fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.

Relates #6720
2017-04-19 14:36:11 +02:00
Adrien Grand 4632661bc7 Upgrade to a Lucene 7 snapshot (#24089)
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.

Some notes about the change:
 - it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
 - it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
 - two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
2017-04-18 15:17:21 +02:00
Igor Motov 93b5e55660 Restores the original default format of search slow log
In 5.0, the search slow log switched to the multi-line format with no option to get back to the origin single-line format that was used prior to 5.0 by default. This commit removes the reformat option from the search slow log and returns the search slow log back to the single-line format.

Closes #21711
2016-12-09 12:38:28 -05:00
Loek van Gool 1a23739211 Update store.asciidoc (#21353)
* Update store.asciidoc

* Update store.asciidoc

* Update store.asciidoc
2016-11-05 14:58:16 +01:00
Adriel Dean-Hall b72a708c0d Add docs with up to date instructions on updating default similarity (#21242)
* Add docs with up to date instructions on updating default similarity

The default similarity can no longer be set in the configuration file
(you will get an error on startup). Update the docs with the method
that works.

* Add instructions for changing similarity on index creation
2016-11-01 16:14:20 -04:00
Jason Tedor 96aa5e33ce Fix slowlog docs
This commit fixes two issues with the slow log docs:
 - clarifies that these settings are per index
 - updates index slow log configuration for Log4j 2

Relates #20976
2016-10-17 10:50:32 -04:00
Pascal Borreli fcb01deb34 Fixed typos (#20843) 2016-10-10 14:51:47 -06:00
Jason Tedor 750033dc4b Update docs for Log4j 2
This commit updates the logging docs for Elasticsearch to reflect the
migration to Log4j 2.
2016-08-31 15:51:52 -04:00
Lee Hinman 0ade5a207d Add documentation for the 'elasticsearch-translog' tool
This adds documentation to the translog page for the CLI truncation
tool.
2016-08-02 16:26:28 -06:00
Sakthipriyan Vairamani 8d5a5e500a file is -> file name (#18994) 2016-06-21 13:20:56 +02:00
Jim Ferenczi 423291b6bc Change default similarity to BM25
The default similarity was set to `classic` which refers to TFIDF and has not been moved after the upgrade to Lucene 6.

Though moving to BM25 could have some downside for queries that relies on coordination factor (match_query, multi_match_query) ?

relates #18944
2016-06-21 11:29:36 +02:00
Adrien Grand 93415d4506 Expose MMapDirectory.preLoad(). #18880
The MMapDirectory has a switch that allows the content of files to be loaded
into the filesystem cache upon opening. This commit exposes it with the new
`index.store.pre_load` setting.
2016-06-20 13:42:56 +02:00
eratio08 26aacfff72 default values for BM25 Similarity (#18778)
assuming elasticsearch uses the lucene default values
2016-06-13 18:57:44 +02:00
trangvh c0da8e4060 Fix some typos (#18746)
* Update java-doc of SearchResponse.getProfileResults()

* Fix a trivial typo in Reference document
2016-06-07 16:41:39 +02:00
Nik Everett 72eb621bce Docs: Replace [source,json] with [source,js]
The syntax highlighter only supports [source,js].

Also adds a check to the rest test generator that runs during
the build that'll fail the build if it sees `[source,json]`.
2016-05-24 11:17:27 -04:00
eratio08 7e00a1c1a3 Added Type name for DFI (#18480) 2016-05-20 11:02:06 +02:00
Jason Tedor c257e2c51f Remove settings and system properties entanglement
Today when parsing settings during bootstrap, we add a system property
for every Elasticsearch setting. Additionally, settings can be set via
system properties. This commit simplifies this situation.
 - settings are no longer propogated to system properties
 - system properties can not be used to set settings
 - the "es." prefix on settings is no longer required (nor permitted)
 - test logging has a dedicated system property (tests.logger.level)

Relates #18198
2016-05-19 14:08:08 -04:00
Clinton Gormley 3f594089c2 Renamed all AUTOSENSE snippets to CONSOLE (#18210) 2016-05-09 15:42:23 +02:00
Nik Everett 4b1c116461 Generate and run tests from the docs
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.

By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.

Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.

See docs/README.asciidoc for lots more.

Closes #12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
2016-05-05 13:58:03 -04:00
Adrien Grand 51a53c55cb Update store documentation after #17616. 2016-05-04 08:53:11 +02:00
Simon Willnauer 2514681f66 updateing filtering.asciidoc to also use 'node.attr' namespace 2016-03-30 14:11:59 +02:00
Adrien Grand b42f66c8ac Document 5.0 mapping changes. 2016-03-22 16:22:58 +01:00
Jason Tedor 8a05c2a2be Bootstrap does not set system properties
Today, certain bootstrap properties are set and read via system
properties. This action-at-distance way of managing these properties is
rather confusing, and completely unnecessary. But another problem exists
with setting these as system properties. Namely, these system properties
are interpreted as Elasticsearch settings, not all of which are
registered. This leads to Elasticsearch failing to startup if any of
these special properties are set. Instead, these properties should be
kept as local as possible, and passed around as method parameters where
needed. This eliminates the action-at-distance way of handling these
properties, and eliminates the need to register these non-setting
properties. This commit does exactly that.

Additionally, today we use the "-D" command line flag to set the
properties, but this is confusing because "-D" is a special flag to the
JVM for setting system properties. This creates confusion because some
"-D" properties should be passed via arguments to the JVM (so via
ES_JAVA_OPTS), and some should be passed as arguments to
Elasticsearch. This commit changes the "-D" flag for Elasticsearch
settings to "-E".
2016-03-13 20:09:15 -04:00
Clinton Gormley 6d7e8814d6 Redocument the `index.merge.scheduler.max_thread_count` setting
Closes #16961
2016-03-05 16:28:43 +01:00
Boaz Leskes 4a7980f96c Merge pull request #16766 from rstruber/patch-1
fix grammar in Total Shards Per Node docs
2016-02-22 08:42:42 -08:00
Dongjoon Hyun 21ea552070 Fix typos in docs. 2016-02-09 02:07:32 -08:00