OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	4fa8f6f61f	Doc values integration. This commit allows for using Lucene doc values as a backend for field data, moving the cost of building field data from the refresh operation to indexing. In addition, Lucene doc values can be stored on disk (partially, or even entirely), so that memory management is done at the operating system level (file-system cache) instead of the JVM, avoiding long pauses during major collections due to large heaps. So far doc values are supported on numeric types and non-analyzed strings (index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values which is the only type to support multi-valued fields. Since the field data API set is a bit wider than the doc values API set, some operations are not supported: - field data filtering: this will fail if doc values are enabled, - field data cache clearing, even for memory-based doc values formats, - getting the memory usage for a specific field, - knowing whether a field is actually multi-valued. This commit also allows for configuring doc-values formats on a per-field basis similarly to postings formats. In particular the doc values format of the _version field can be configured through its own field mapper (it used to be handled in UidFieldMapper previously). Closes #3806	2013-10-09 16:34:30 +02:00
Adrien Grand	97958ed02a	Improved warm-up of new segments. * Merged segments are now warmed-up at the end of the merge operation instead of _refresh, so that _refresh doesn't pay the price for the warm-up of merged segments, which is often higher than flushed segments because of their size. * Even when no _warmer is registered, some basic warm-up of the segments is performed: norms, doc values (_version). This should help a bit people who forget to register warmers. * Eager loading support for the parent id cache and field data: when one can't predict what terms will be present in the index, it is tempting to use a match_all query in a warmer, but in that case, query execution might not be much faster than field data loading so having a warmer that only loads field data without running a query can be useful. Closes #3819	2013-10-08 23:06:55 +02:00
Lee Hinman	ba40aa374e	Uniquify anchor links to fix asciidoc/docbook generation	2013-09-30 15:32:00 -06:00
Lee Hinman	0442b737be	Add more anchor links to documentation Related to #3679	2013-09-30 13:13:16 -06:00
Clinton Gormley	422eed7985	[Docs] Added an added[0.90.4] flag to the disk based allocator	2013-09-16 15:57:07 +02:00
Lee Hinman	7d52d58747	Add AllocationDecider that takes free disk space into account This commit adds two main pieces, the first is a ClusterInfoService that provides a service running on the master nodes that fetches the total/free bytes for each data node in the cluster as well as the sizes of all shards in the cluster. This information is gathered by default every 30 seconds, and can be changed dynamically by setting the `cluster.info.update.interval` setting. This ClusterInfoService can hopefully be used in the future to weight nodes for allocation based on their disk usage, if desired. The second main piece is the DiskThresholdDecider, which can disallow a shard from being allocated to a node, or from remaining on the node depending on configuration parameters. There are three main configuration parameters for the DiskThresholdDecider: `cluster.routing.allocation.disk.threshold_enabled` controls whether the decider is enabled. It defaults to false (disabled). Note that the decider is also disabled for clusters with only a single data node. `cluster.routing.allocation.disk.watermark.low` controls the low watermark for disk usage. It defaults to 0.70, meaning ES will not allocate new shards to nodes once they have more than 70% disk used. It can also be set to an absolute byte value (like 500mb) to prevent ES from allocating shards if less than the configured amount of space is available. `cluster.routing.allocation.disk.watermark.high` controls the high watermark. It defaults to 0.85, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 85%. It can also be set to an absolute byte value (similar to the low watermark) to relocate shards once less than the configured amount of space is available on the node. Closes #3480	2013-09-09 09:49:30 -06:00
Clinton Gormley	8257aba166	[DOCS] Fixed fielddata regex syntax	2013-09-04 23:20:56 +02:00
Clinton Gormley	393c28bee4	[DOCS] Removed outdated new/deprecated version notices	2013-09-03 21:28:31 +02:00
Clinton Gormley	822043347e	Migrated documentation into the main repo	2013-08-29 01:24:34 +02:00

9 Commits