Commit Graph

159 Commits

Author SHA1 Message Date
Adrien Grand 4fa8f6f61f Doc values integration.
This commit allows for using Lucene doc values as a backend for field data,
moving the cost of building field data from the refresh operation to indexing.
In addition, Lucene doc values can be stored on disk (partially, or even
entirely), so that memory management is done at the operating system level
(file-system cache) instead of the JVM, avoiding long pauses during major
collections due to large heaps.

So far doc values are supported on numeric types and non-analyzed strings
(index:no or index:not_analyzed). Under the hood, it uses SORTED_SET doc values
which is the only type to support multi-valued fields. Since the field data API
set is a bit wider than the doc values API set, some operations are not
supported:
 - field data filtering: this will fail if doc values are enabled,
 - field data cache clearing, even for memory-based doc values formats,
 - getting the memory usage for a specific field,
 - knowing whether a field is actually multi-valued.

This commit also allows for configuring doc-values formats on a per-field basis
similarly to postings formats. In particular the doc values format of the
_version field can be configured through its own field mapper (it used to be
handled in UidFieldMapper previously).

Closes #3806
2013-10-09 16:34:30 +02:00
Adrien Grand 97958ed02a Improved warm-up of new segments.
* Merged segments are now warmed-up at the end of the merge operation instead
  of _refresh, so that _refresh doesn't pay the price for the warm-up of merged
  segments, which is often higher than flushed segments because of their size.
* Even when no _warmer is registered, some basic warm-up of the segments is
  performed: norms, doc values (_version). This should help a bit people who
  forget to register warmers.
* Eager loading support for the parent id cache and field data: when one
  can't predict what terms will be present in the index, it is tempting to use
  a match_all query in a warmer, but in that case, query execution might not be
  much faster than field data loading so having a warmer that only loads field
  data without running a query can be useful.

Closes #3819
2013-10-08 23:06:55 +02:00
Lee Hinman ba40aa374e Uniquify anchor links to fix asciidoc/docbook generation 2013-09-30 15:32:00 -06:00
Lee Hinman 0442b737be Add more anchor links to documentation
Related to #3679
2013-09-30 13:13:16 -06:00
Clinton Gormley 422eed7985 [Docs] Added an added[0.90.4] flag to the disk based allocator 2013-09-16 15:57:07 +02:00
Lee Hinman 7d52d58747 Add AllocationDecider that takes free disk space into account
This commit adds two main pieces, the first is a ClusterInfoService
that provides a service running on the master nodes that fetches the
total/free bytes for each data node in the cluster as well as the
sizes of all shards in the cluster. This information is gathered by
default every 30 seconds, and can be changed dynamically by setting
the `cluster.info.update.interval` setting. This ClusterInfoService
can hopefully be used in the future to weight nodes for allocation
based on their disk usage, if desired.

The second main piece is the DiskThresholdDecider, which can disallow
a shard from being allocated to a node, or from remaining on the node
depending on configuration parameters. There are three main
configuration parameters for the DiskThresholdDecider:

`cluster.routing.allocation.disk.threshold_enabled` controls whether
the decider is enabled. It defaults to false (disabled). Note that the
decider is also disabled for clusters with only a single data node.

`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.

`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.

Closes #3480
2013-09-09 09:49:30 -06:00
Clinton Gormley 8257aba166 [DOCS] Fixed fielddata regex syntax 2013-09-04 23:20:56 +02:00
Clinton Gormley 393c28bee4 [DOCS] Removed outdated new/deprecated version notices 2013-09-03 21:28:31 +02:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00