Merge pull request #13199 from mikemccand/remove_merge_docs

Move expert segment merge settings documentation off site into javadocs.
Michael McCandless 2015-08-31 09:52:19 -04:00
commit a49217949f
6 changed files with 114 additions and 124 deletions

View File

@ -28,6 +28,92 @@ import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.index.settings.IndexSettingsService;
/**
* A shard in elasticsearch is a Lucene index, and a Lucene index is broken
* down into segments. Segments are internal storage elements in the index
* where the index data is stored, and are immutable up to delete markers.
* Segments are periodically merged into larger segments to keep the
* index size at bay and expunge deletes.
*
* <p>
* Merges select segments of approximately equal size, subject to an allowed
* number of segments per tier. The merge policy is able to merge
* non-adjacent segments, and separates how many segments are merged at once from how many
* segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
*
* <p>
* All merge policy settings are <b>dynamic</b> and can be updated on a live index.
* The merge policy has the following settings:
*
* <ul>
* <li><code>index.merge.policy.expunge_deletes_allowed</code>:
*
* When expungeDeletes is called, we only merge away a segment if its delete
* percentage is over this threshold. Default is <code>10</code>.
*
* <li><code>index.merge.policy.floor_segment</code>:
*
* Segments smaller than this are "rounded up" to this size, i.e. treated as
* equal (floor) size for merge selection. This is to prevent frequent
* flushing of tiny segments, thus preventing a long tail in the index. Default
* is <code>2mb</code>.
*
* <li><code>index.merge.policy.max_merge_at_once</code>:
*
* Maximum number of segments to be merged at a time during "normal" merging.
* Default is <code>10</code>.
*
* <li><code>index.merge.policy.max_merge_at_once_explicit</code>:
*
* Maximum number of segments to be merged at a time, during optimize or
* expungeDeletes. Default is <code>30</code>.
*
* <li><code>index.merge.policy.max_merged_segment</code>:
*
* Maximum sized segment to produce during normal merging (not explicit
* optimize). This setting is approximate: the estimate of the merged segment
* size is made by summing sizes of to-be-merged segments (compensating for
* percent deleted docs). Default is <code>5gb</code>.
*
* <li><code>index.merge.policy.segments_per_tier</code>:
*
* Sets the allowed number of segments per tier. Smaller values mean more
* merging but fewer segments. Default is <code>10</code>. Note that this value must be
* greater than or equal to <code>max_merge_at_once</code>; otherwise you will force
* too many merges to occur.
*
* <li><code>index.merge.policy.reclaim_deletes_weight</code>:
*
* Controls how aggressively merges that reclaim more deletions are favored.
* Higher values favor selecting merges that reclaim deletions. A value of
* <code>0.0</code> means deletions don't impact merge selection. Defaults to <code>2.0</code>.
* </ul>
*
* <p>
* For normal merging, the policy first computes a "budget" of how many
* segments are allowed to be in the index. If the index is over-budget,
* then the policy sorts segments by decreasing size (proportionally considering percent
* deletes), and then finds the least-cost merge. Merge cost is measured by
* a combination of the "skew" of the merge (size of largest seg divided by
* smallest seg), total merge size and pct deletes reclaimed, so that
* merges with lower skew, smaller size and those reclaiming more deletes,
* are favored.
*
* <p>
* If a merge will produce a segment that's larger than
* <code>max_merged_segment</code> then the policy will merge fewer segments (down to
* 1 at once, if that one has deletions) to keep the segment size under
* budget.
*
* <p>
* Note that for large shards holding many gigabytes of data, the default
* <code>max_merged_segment</code> (<code>5gb</code>) can leave many segments
* in an index, which makes searches slower. Use the indices segments API to
* see the segments that an index has, and consider either increasing
* <code>max_merged_segment</code> or issuing an optimize call for the index
* (ideally during a low-traffic period).
*/
public final class MergePolicyConfig implements IndexSettingsService.Listener{
private final TieredMergePolicy mergePolicy = new TieredMergePolicy();
private final ESLogger logger;
@ -187,4 +273,4 @@ public final class MergePolicyConfig implements IndexSettingsService.Listener{
return Double.toString(ratio);
}
}
}
}

View File

@ -24,7 +24,30 @@ import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.util.concurrent.EsExecutors;
/**
*
* The merge scheduler (<code>ConcurrentMergeScheduler</code>) controls the execution of
* merge operations once they are needed (according to the merge policy). Merges
* run in separate threads, and when the maximum number of threads is reached,
* further merges will wait until a merge thread becomes available.
*
* <p>The merge scheduler supports the following <b>dynamic</b> settings:
*
* <ul>
* <li> <code>index.merge.scheduler.max_thread_count</code>:
*
* The maximum number of threads that may be merging at once. Defaults to
* <code>Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))</code>
* which works well for a good solid-state-disk (SSD). If your index is on
* spinning platter drives instead, decrease this to 1.
*
* <li><code>index.merge.scheduler.auto_throttle</code>:
*
* If this is true (the default), then the merge scheduler will rate-limit IO
* (writes) for merges to an adaptive value depending on how many merges are
* requested over time. An application with a low indexing rate that
* suddenly requires a large merge will see that merge aggressively
* throttled, while an application doing heavy indexing will see the throttle
* move higher to allow merges to keep up with ongoing indexing.
* </ul>
*/
public final class MergeSchedulerConfig {

View File

@ -152,10 +152,6 @@ Other index settings are available in index modules:
Enable or disable dynamic mapping for an index.
<<index-modules-merge,Merging>>::
Control over how shards are merged by the background merge process.
<<index-modules-similarity,Similarities>>::
Configure custom similarity settings to customize how search results are
@ -181,8 +177,6 @@ include::index-modules/allocation.asciidoc[]
include::index-modules/mapper.asciidoc[]
include::index-modules/merge.asciidoc[]
include::index-modules/similarity.asciidoc[]
include::index-modules/slowlog.asciidoc[]

View File

@ -1,113 +0,0 @@
[[index-modules-merge]]
== Merge
experimental[All of the settings exposed in the `merge` module are expert only and may be removed in the future]
A shard in elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are periodically merged into larger segments to keep the
index size at bay and expunge deletes.
Merges segments of approximately equal size, subject to an allowed
number of segments per tier. The merge policy is able to merge
non-adjacent segments, and separates how many segments are merged at once from how many
segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
[float]
[[merge-settings]]
=== Merge policy settings
All merge policy settings are _dynamic_ and can be updated on a live index.
The merge policy has the following settings:
`index.merge.policy.expunge_deletes_allowed`::
When expungeDeletes is called, we only merge away a segment if its delete
percentage is over this threshold. Default is `10`.
`index.merge.policy.floor_segment`::
Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This is to prevent frequent
flushing of tiny segments, thus preventing a long tail in the index. Default
is `2mb`.
`index.merge.policy.max_merge_at_once`::
Maximum number of segments to be merged at a time during "normal" merging.
Default is `10`.
`index.merge.policy.max_merge_at_once_explicit`::
Maximum number of segments to be merged at a time, during optimize or
expungeDeletes. Default is `30`.
`index.merge.policy.max_merged_segment`::
Maximum sized segment to produce during normal merging (not explicit
optimize). This setting is approximate: the estimate of the merged segment
size is made by summing sizes of to-be-merged segments (compensating for
percent deleted docs). Default is `5gb`.
`index.merge.policy.segments_per_tier`::
Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note that this value must be
greater than or equal to `max_merge_at_once`; otherwise you will force
too many merges to occur.
`index.merge.policy.reclaim_deletes_weight`::
Controls how aggressively merges that reclaim more deletions are favored.
Higher values favor selecting merges that reclaim deletions. A value of
`0.0` means deletions don't impact merge selection. Defaults to `2.0`.
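The `max_merged_segment` entry above says the merged size is estimated by summing the sizes of the to-be-merged segments while compensating for deleted documents. A minimal sketch of that arithmetic follows, assuming each segment's live size is its byte size scaled by the fraction of documents that are not deleted. The class and method names are hypothetical and are not part of Elasticsearch or Lucene.

```java
class MergedSizeEstimate {
    // Hypothetical helper: sums segment sizes, discounting each one by its
    // fraction of deleted documents (0.0 = no deletes, 1.0 = all deleted).
    static long estimateMergedBytes(long[] sizesBytes, double[] deletedFractions) {
        if (sizesBytes.length != deletedFractions.length) {
            throw new IllegalArgumentException("one deleted fraction is required per segment");
        }
        double total = 0.0;
        for (int i = 0; i < sizesBytes.length; i++) {
            total += sizesBytes[i] * (1.0 - deletedFractions[i]);
        }
        return (long) total;
    }

    // True when the estimate is larger than the configured maximum,
    // i.e. the candidate merge would need to include fewer segments.
    static boolean exceedsMax(long estimatedBytes, long maxMergedSegmentBytes) {
        return estimatedBytes > maxMergedSegmentBytes;
    }

    public static void main(String[] args) {
        long fiveGb = 5L * 1024 * 1024 * 1024; // documented default of max_merged_segment
        long[] sizes = {3L * 1024 * 1024 * 1024, 3L * 1024 * 1024 * 1024};
        double[] deleted = {0.5, 0.0};
        long estimate = estimateMergedBytes(sizes, deleted);
        System.out.println("estimated bytes: " + estimate
                + ", over limit: " + exceedsMax(estimate, fiveGb));
    }
}
```

Because the estimate only discounts by the deleted fraction, the actual merged segment may end up somewhat larger or smaller, which is why the setting is described as approximate.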
For normal merging, the policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (proportionally considering percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the "skew" of the merge (size of largest seg divided by
smallest seg), total merge size and pct deletes reclaimed, so that
merges with lower skew, smaller size and those reclaiming more deletes,
are favored.
If a merge will produce a segment that's larger than
`max_merged_segment` then the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.
Note that for large shards holding many gigabytes of data, the default
`max_merged_segment` (`5gb`) can leave many segments in an index, which
makes searches slower. Use the indices segments API to see the segments
that an index has, and consider either increasing `max_merged_segment`
or issuing an optimize call for the index (ideally during a low-traffic
period).
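The "skew" term in the merge cost can be illustrated with a short sketch. It follows the prose definition given here (largest segment size divided by smallest) and applies the `floor_segment` rounding first; it is not Lucene's actual scoring code, and the class and method names are hypothetical.

```java
class MergeSkewSketch {
    // Segments smaller than the floor are treated as floor-sized.
    static long floored(long sizeBytes, long floorBytes) {
        return Math.max(sizeBytes, floorBytes);
    }

    // Skew of a candidate merge: largest floored size over smallest floored
    // size. A value of 1.0 means all segments are treated as equally sized.
    static double skew(long[] segmentSizes, long floorBytes) {
        if (segmentSizes.length == 0 || floorBytes <= 0) {
            throw new IllegalArgumentException("need at least one segment and a positive floor");
        }
        long largest = Long.MIN_VALUE;
        long smallest = Long.MAX_VALUE;
        for (long size : segmentSizes) {
            long f = floored(size, floorBytes);
            largest = Math.max(largest, f);
            smallest = Math.min(smallest, f);
        }
        return (double) largest / smallest;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long floor = 2 * mb; // documented default of floor_segment
        System.out.println(skew(new long[] {1 * mb, 2 * mb, 8 * mb}, floor));
        System.out.println(skew(new long[] {100, 200, 300}, floor));
    }
}
```

In the second call every segment is below the floor, so all of them count as equal in size and the skew is as low as it can be. This is how the floor keeps tiny flushed segments from looking like badly unbalanced merges.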
[float]
[[merge-scheduling]]
=== Merge scheduling
The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy). Merges
run in separate threads, and when the maximum number of threads is reached,
further merges will wait until a merge thread becomes available.
The merge scheduler supports the following _dynamic_ settings:
`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`
which works well for a good solid-state-disk (SSD). If your index is on
spinning platter drives instead, decrease this to 1.
`index.merge.scheduler.auto_throttle`::
If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
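To make the `max_thread_count` default concrete, the sketch below evaluates the documented formula for a few processor counts. The formula is taken verbatim from the setting description; the class and method names are hypothetical, and the processor count is passed in as a parameter instead of being read from `Runtime` so the result is reproducible.

```java
class MergeThreadDefault {
    // Documented default: at least 1 thread, at most 4, otherwise half the
    // available processors (integer division).
    static int defaultMaxThreadCount(int availableProcessors) {
        return Math.max(1, Math.min(4, availableProcessors / 2));
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cpus + " processors -> "
                    + defaultMaxThreadCount(cpus) + " merge thread(s)");
        }
        // The value for the current machine:
        System.out.println("this machine -> "
                + defaultMaxThreadCount(Runtime.getRuntime().availableProcessors()));
    }
}
```

The default only reaches 4 on machines with 8 or more processors, and it never drops below 1 even on a single-core host.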

View File

@ -115,7 +115,7 @@ process all data -- it just needs to be long enough to process the previous
batch of results. Each `scroll` request (with the `scroll` parameter) sets a
new expiry time.
Normally, the <<index-modules-merge,background merge process>> optimizes the
Normally, the background merge process optimizes the
index by merging together smaller segments to create new bigger segments, at
which time the smaller segments are deleted. This process continues during
scrolling, but an open search context prevents the old segments from being

View File

@ -136,8 +136,8 @@ payloads or weights. This form does still work inside of multi fields.
NOTE: The suggest data structure might not reflect deletes on
documents immediately. You may need to do an <<indices-optimize>> for that.
You can call optimize with the `only_expunge_deletes=true` to only cater for deletes
or alternatively call a <<index-modules-merge>> operation.
You can call optimize with the `only_expunge_deletes=true` to only target
deletions for merging.
[[querying]]
==== Querying