Merge pull request #13199 from mikemccand/remove_merge_docs
Move expert segment merge settings documentation off site into javadocs.
commit a49217949f
@@ -28,6 +28,92 @@ import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.index.settings.IndexSettingsService;

/**
 * A shard in elasticsearch is a Lucene index, and a Lucene index is broken
 * down into segments. Segments are internal storage elements in the index
 * where the index data is stored, and are immutable up to delete markers.
 * Segments are, periodically, merged into larger segments to keep the
 * index size at bay and expunge deletes.
 *
 * <p>
 * Merges select segments of approximately equal size, subject to an allowed
 * number of segments per tier. The merge policy is able to merge
 * non-adjacent segments, and separates how many segments are merged at once from how many
 * segments are allowed per tier. It also does not over-merge (i.e., cascade merges).
 *
 * <p>
 * All merge policy settings are <b>dynamic</b> and can be updated on a live index.
 * The merge policy has the following settings:
 *
 * <ul>
 * <li><code>index.merge.policy.expunge_deletes_allowed</code>:
 *
 * When expungeDeletes is called, we only merge away a segment if its delete
 * percentage is over this threshold. Default is <code>10</code>.
 *
 * <li><code>index.merge.policy.floor_segment</code>:
 *
 * Segments smaller than this are "rounded up" to this size, i.e. treated as
 * equal (floor) size for merge selection. This prevents frequent flushing of
 * tiny segments from creating a long tail of small segments in the index.
 * Default is <code>2mb</code>.
 *
 * <li><code>index.merge.policy.max_merge_at_once</code>:
 *
 * Maximum number of segments to be merged at a time during "normal" merging.
 * Default is <code>10</code>.
 *
 * <li><code>index.merge.policy.max_merge_at_once_explicit</code>:
 *
 * Maximum number of segments to be merged at a time, during optimize or
 * expungeDeletes. Default is <code>30</code>.
 *
 * <li><code>index.merge.policy.max_merged_segment</code>:
 *
 * Maximum size of a segment produced during normal merging (not explicit
 * optimize). This setting is approximate: the size of the merged segment is
 * estimated by summing the sizes of the segments to be merged (compensating
 * for the percentage of deleted docs). Default is <code>5gb</code>.
 *
 * <li><code>index.merge.policy.segments_per_tier</code>:
 *
 * Sets the allowed number of segments per tier. Smaller values mean more
 * merging but fewer segments. Default is <code>10</code>. Note that this value
 * must be greater than or equal to <code>max_merge_at_once</code>, otherwise
 * too many merges will be forced to occur.
 *
 * <li><code>index.merge.policy.reclaim_deletes_weight</code>:
 *
 * Controls how aggressively merges that reclaim more deletions are favored.
 * Higher values favor selecting merges that reclaim deletions. A value of
 * <code>0.0</code> means deletions don't impact merge selection. Defaults to <code>2.0</code>.
 * </ul>
 *
 * <p>
 * For normal merging, the policy first computes a "budget" of how many
 * segments are allowed to be in the index. If the index is over budget,
 * the policy sorts segments by decreasing size (with sizes pro-rated by the
 * percentage of deleted docs) and then finds the least-cost merge. Merge
 * cost is measured by a combination of the "skew" of the merge (size of the
 * largest segment divided by the smallest), the total merge size, and the
 * percentage of deletes reclaimed, so that merges with lower skew, smaller
 * size, and more reclaimed deletes are favored.
 *
 * <p>
 * If a merge will produce a segment that's larger than
 * <code>max_merged_segment</code> then the policy will merge fewer segments (down to
 * 1 at once, if that one has deletions) to keep the segment size under
 * budget.
 *
 * <p>
 * Note that for large shards holding many gigabytes of data, the default
 * <code>max_merged_segment</code> (<code>5gb</code>) can leave an index with
 * many segments, which makes searches slower. Use the indices segments API to
 * see the segments that an index has, and then either increase
 * <code>max_merged_segment</code> or issue an optimize call for the index
 * (ideally during a period of low traffic).
 */

public final class MergePolicyConfig implements IndexSettingsService.Listener{
    private final TieredMergePolicy mergePolicy = new TieredMergePolicy();
    private final ESLogger logger;
|
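The "budget" step described in the Javadoc above can be sketched as follows. This is a simplified, hypothetical helper (`MergeBudget` is not an Elasticsearch or Lucene class) modeled loosely on the tier computation in Lucene's `TieredMergePolicy`, whose exact behavior varies between versions. Assumptions: the first tier's segment size is the larger of the smallest segment and `floor_segment`, each tier may hold `segments_per_tier` segments, and each following tier's segments are larger by a factor of `min(max_merge_at_once, segments_per_tier)`.

```java
/**
 * Hypothetical sketch of the merge policy "budget": how many segments an index may
 * hold before it is considered over budget. Simplified; not Elasticsearch or Lucene code.
 */
final class MergeBudget {
    private MergeBudget() {}

    static double allowedSegmentCount(long totalBytes, long smallestSegmentBytes, long floorSegmentBytes,
                                      double segmentsPerTier, int maxMergeAtOnce) {
        if (segmentsPerTier < 2 || maxMergeAtOnce < 2) {
            throw new IllegalArgumentException("segmentsPerTier and maxMergeAtOnce must be >= 2");
        }
        // Segments smaller than the floor are "rounded up", so the first tier starts at the floor.
        long tierSegmentBytes = Math.max(1L, Math.max(smallestSegmentBytes, floorSegmentBytes));
        // Assumption: each tier's segments are this many times larger than the previous tier's.
        double mergeFactor = Math.min(maxMergeAtOnce, segmentsPerTier);
        double bytesLeft = totalBytes;
        double allowed = 0;
        while (true) {
            double segmentsAtThisTier = bytesLeft / tierSegmentBytes;
            if (segmentsAtThisTier < segmentsPerTier) {
                // The remaining bytes only fill a partial tier.
                return allowed + Math.ceil(segmentsAtThisTier);
            }
            allowed += segmentsPerTier;
            bytesLeft -= segmentsPerTier * tierSegmentBytes;
            tierSegmentBytes = (long) (tierSegmentBytes * mergeFactor);
        }
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        // 100 MB index, smallest segment 1 MB, floor_segment 2mb,
        // segments_per_tier and max_merge_at_once both at their default of 10.
        System.out.println(allowedSegmentCount(100 * mb, mb, 2 * mb, 10, 10));
    }
}
```

For the example in `main`, ten 2 MB segments fill the first tier (20 MB); the remaining 80 MB corresponds to four segments at the next tier's 20 MB size, giving a budget of 14 segments. An index holding more segments than this would be over budget and eligible for merging.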
|
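The fallback described above (merging fewer segments when the result would exceed `max_merged_segment`) can be illustrated with a greedy pass over segments sorted by decreasing size. This is a hypothetical simplification: `MergeCandidate` and `select` are made-up names, and real segment sizes would also be pro-rated for deleted documents. Segments that would push the merged size over the limit are skipped, so a candidate may contain fewer than `max_merge_at_once` segments, possibly only one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build a merge candidate that stays under max_merged_segment. */
final class MergeCandidate {
    private MergeCandidate() {}

    /**
     * @param sizesDesc      segment sizes, sorted from largest to smallest
     * @param start          index of the first segment to consider
     * @param maxMergeAtOnce index.merge.policy.max_merge_at_once
     * @param maxMergedBytes index.merge.policy.max_merged_segment (same unit as the sizes)
     */
    static List<Long> select(List<Long> sizesDesc, int start, int maxMergeAtOnce, long maxMergedBytes) {
        List<Long> candidate = new ArrayList<>();
        long mergedBytes = 0;
        for (int i = start; i < sizesDesc.size() && candidate.size() < maxMergeAtOnce; i++) {
            long size = sizesDesc.get(i);
            if (mergedBytes + size > maxMergedBytes) {
                // Adding this segment would exceed the limit: skip it and try smaller ones.
                continue;
            }
            candidate.add(size);
            mergedBytes += size;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Sizes in MB, largest first; a limit of 5000 MB approximates the 5gb default.
        List<Long> sizes = List.of(3000L, 2000L, 1500L, 500L, 400L);
        System.out.println(select(sizes, 0, 10, 5000L));
    }
}
```

With the sizes in `main`, the 3000 MB and 2000 MB segments exactly reach the limit, and every smaller segment is skipped because adding it would exceed 5000 MB, so the candidate contains only two segments.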
|
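The cost model can be approximated by a scoring function in which lower scores are better. The formula is an assumption, not Lucene's code: it uses the Javadoc's definition of skew (largest segment divided by smallest, after applying the floor), a mild penalty on total merge size (the `0.05` exponent is a hypothetical choice), and a live-data ratio raised to `reclaim_deletes_weight`, loosely following the shape of `TieredMergePolicy` scoring. The `Segment` record is a hypothetical type and requires Java 16+.

```java
import java.util.List;

/** Hypothetical merge cost score (lower is better); an illustration, not Lucene's formula. */
final class MergeScore {
    /** On-disk size of a segment and the fraction of its documents that are deleted. */
    record Segment(long sizeBytes, double deletedFraction) {}

    private MergeScore() {}

    static double score(List<Segment> merge, long floorSegmentBytes, double reclaimDeletesWeight) {
        if (merge.isEmpty() || floorSegmentBytes <= 0) {
            throw new IllegalArgumentException("need at least one segment and a positive floor");
        }
        long largest = 0;
        long smallest = Long.MAX_VALUE;
        double bytesBefore = 0;
        double bytesAfter = 0;
        for (Segment s : merge) {
            // Tiny segments are "rounded up" to the floor before sizes are compared.
            long floored = Math.max(s.sizeBytes(), floorSegmentBytes);
            largest = Math.max(largest, floored);
            smallest = Math.min(smallest, floored);
            bytesBefore += s.sizeBytes();
            // Deleted documents are dropped by the merge and do not reach the new segment.
            bytesAfter += s.sizeBytes() * (1.0 - s.deletedFraction());
        }
        double skew = (double) largest / smallest;        // 1.0 for a perfectly balanced merge
        double sizeFactor = Math.pow(bytesBefore, 0.05);  // mild preference for smaller merges
        double liveRatio = bytesBefore == 0 ? 1.0 : bytesAfter / bytesBefore; // < 1.0 when deletes are reclaimed
        // With reclaimDeletesWeight == 0.0 this factor is always 1.0, so deletions have no effect.
        return skew * sizeFactor * Math.pow(liveRatio, reclaimDeletesWeight);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        double balanced = score(List.of(new Segment(100 * mb, 0.0), new Segment(100 * mb, 0.0)), 2 * mb, 2.0);
        double skewed = score(List.of(new Segment(190 * mb, 0.0), new Segment(10 * mb, 0.0)), 2 * mb, 2.0);
        System.out.println("balanced=" + balanced + " skewed=" + skewed);
    }
}
```

Because every factor is multiplicative, a balanced merge beats a skewed one of the same total size, and setting `reclaimDeletesWeight` to `0.0` makes two merges that differ only in their deletions score identically, matching the documented meaning of `0.0`.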
@@ -24,7 +24,30 @@ import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.util.concurrent.EsExecutors;

/**
 * The merge scheduler (<code>ConcurrentMergeScheduler</code>) controls the execution of
 * merge operations once they are needed (according to the merge policy). Merges
 * run in separate threads, and when the maximum number of threads is reached,
 * further merges will wait until a merge thread becomes available.
 *
 * <p>The merge scheduler supports the following <b>dynamic</b> settings:
 *
 * <ul>
 * <li><code>index.merge.scheduler.max_thread_count</code>:
 *
 * The maximum number of threads that may be merging at once. Defaults to
 * <code>Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))</code>
 * which works well for a good solid-state disk (SSD). If your index is on
 * spinning platter drives instead, decrease this to 1.
 *
 * <li><code>index.merge.scheduler.auto_throttle</code>:
 *
 * If this is true (the default), then the merge scheduler will rate-limit IO
 * (writes) for merges to an adaptive value depending on how many merges are
 * requested over time. An application with a low indexing rate that
 * suddenly requires a large merge will see that merge aggressively
 * throttled, while an application doing heavy indexing will see the throttle
 * move higher to allow merges to keep up with ongoing indexing.
 * </ul>
 */
public final class MergeSchedulerConfig {

|
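The documented default for `max_thread_count` is easy to evaluate for a few processor counts. The helper below is hypothetical (not part of Elasticsearch); it simply wraps the expression from the Javadoc so it can be called with an explicit processor count: 1 to 3 processors give 1 thread, 6 give 3, and 8 or more are capped at 4.

```java
/** Evaluates the documented default for index.merge.scheduler.max_thread_count. */
final class MergeThreads {
    private MergeThreads() {}

    static int defaultMaxThreadCount(int availableProcessors) {
        // Half the processors, but at least 1 and at most 4.
        return Math.max(1, Math.min(4, availableProcessors / 2));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("default max_thread_count here: " + defaultMaxThreadCount(cpus));
        // On spinning disks the documentation recommends setting the value to 1 instead.
    }
}
```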
|
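To make the `auto_throttle` behavior more concrete, here is a toy model in which the merge IO limit rises when merges start to back up and decays when they keep up. Everything in it is hypothetical: the class name, the 20 MB/s starting rate, the 20% and 10% step sizes, and the bounds are illustrative values, and Lucene's `ConcurrentMergeScheduler` uses its own heuristics.

```java
/** Toy model of an adaptive merge IO throttle; every constant here is hypothetical. */
final class AdaptiveThrottle {
    static final double MIN_MB_PER_SEC = 5.0;
    static final double MAX_MB_PER_SEC = 10_240.0;

    private double targetMBPerSec = 20.0;

    /** Called whenever a new merge is requested. */
    void onMergeRequested(boolean earlierMergesStillPending) {
        if (earlierMergesStillPending) {
            // Merges are falling behind (heavy indexing): let them write faster.
            targetMBPerSec *= 1.20;
        } else {
            // Merges are keeping up (light indexing): tighten the limit so a sudden
            // large merge is throttled instead of saturating the disk.
            targetMBPerSec /= 1.10;
        }
        targetMBPerSec = Math.max(MIN_MB_PER_SEC, Math.min(MAX_MB_PER_SEC, targetMBPerSec));
    }

    double targetMBPerSec() {
        return targetMBPerSec;
    }

    public static void main(String[] args) {
        AdaptiveThrottle throttle = new AdaptiveThrottle();
        for (int i = 0; i < 5; i++) {
            throttle.onMergeRequested(true);
        }
        System.out.printf("after a burst of backlogged merges: %.1f MB/s%n", throttle.targetMBPerSec());
    }
}
```

The multiplicative up/down steps give the two behaviors described above: a quiet index drifts toward the lower bound, so an unexpected large merge runs slowly, while sustained backlog pushes the limit up until merges keep pace.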
|
@@ -152,10 +152,6 @@ Other index settings are available in index modules:

Enable or disable dynamic mapping for an index.

<<index-modules-merge,Merging>>::

Control over how shards are merged by the background merge process.

<<index-modules-similarity,Similarities>>::

Configure custom similarity settings to customize how search results are
|
@@ -181,8 +177,6 @@ include::index-modules/allocation.asciidoc[]

include::index-modules/mapper.asciidoc[]

include::index-modules/merge.asciidoc[]

include::index-modules/similarity.asciidoc[]

include::index-modules/slowlog.asciidoc[]
|
|
|
@@ -1,113 +0,0 @@
[[index-modules-merge]]
== Merge

experimental[All of the settings exposed in the `merge` module are expert only and may be removed in the future]

A shard in elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are, periodically, merged into larger segments to keep the
index size at bay and expunge deletes.

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. The merge policy is able to merge
non-adjacent segments, and separates how many segments are merged at once from how many
segments are allowed per tier. It also does not over-merge (i.e., cascade merges).

[float]
[[merge-settings]]
=== Merge policy settings

All merge policy settings are _dynamic_ and can be updated on a live index.
The merge policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::

When expungeDeletes is called, we only merge away a segment if its delete
percentage is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::

Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This is to prevent frequent
flushing of tiny segments, thus preventing a long tail in the index. Default
is `2mb`.

`index.merge.policy.max_merge_at_once`::

Maximum number of segments to be merged at a time during "normal" merging.
Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::

Maximum number of segments to be merged at a time, during optimize or
expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::

Maximum sized segment to produce during normal merging (not explicit
optimize). This setting is approximate: the estimate of the merged segment
size is made by summing sizes of to-be-merged segments (compensating for
percent deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::

Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note, this value needs to be
>= than the `max_merge_at_once` otherwise you'll force too many merges to
occur.

`index.merge.policy.reclaim_deletes_weight`::

Controls how aggressively merges that reclaim more deletions are favored.
Higher values favor selecting merges that reclaim deletions. A value of
`0.0` means deletions don't impact merge selection. Defaults to `2.0`.

For normal merging, the policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (proportionally considering percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the "skew" of the merge (size of largest seg divided by
smallest seg), total merge size and pct deletes reclaimed, so that
merges with lower skew, smaller size and those reclaiming more deletes,
are favored.

If a merge will produce a segment that's larger than
`max_merged_segment` then the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.

Note, this can mean that for large shards that holds many gigabytes of
data, the default of `max_merged_segment` (`5gb`) can cause for many
segments to be in an index, and causing searches to be slower. Use the
indices segments API to see the segments that an index has, and
possibly either increase the `max_merged_segment` or issue an optimize
call for the index (try and aim to issue it on a low traffic time).

[float]
[[merge-scheduling]]
=== Merge scheduling

The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy). Merges
run in separate threads, and when the maximum number of threads is reached,
further merges will wait until a merge thread becomes available.

The merge scheduler supports the following _dynamic_ settings:

`index.merge.scheduler.max_thread_count`::

The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`
which works well for a good solid-state-disk (SSD). If your index is on
spinning platter drives instead, decrease this to 1.

`index.merge.scheduler.auto_throttle`::

If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
unluckily suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
|
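The `expunge_deletes_allowed` threshold documented above boils down to a filter on each segment's delete percentage. The sketch below uses a hypothetical `SegmentStats` record (not a real Elasticsearch or Lucene type; records require Java 16+), hypothetical segment names, and treats "over this threshold" as strictly greater than.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the expunge-deletes eligibility check. */
final class ExpungeDeletes {
    /** Per-segment document counts; not an Elasticsearch or Lucene type. */
    record SegmentStats(String name, int maxDoc, int deletedDocs) {
        double deletePercent() {
            return maxDoc == 0 ? 0.0 : 100.0 * deletedDocs / maxDoc;
        }
    }

    private ExpungeDeletes() {}

    /** Names of segments whose delete percentage is over the threshold (default 10). */
    static List<String> segmentsToMerge(List<SegmentStats> segments, double expungeDeletesAllowedPercent) {
        return segments.stream()
                .filter(s -> s.deletePercent() > expungeDeletesAllowedPercent)
                .map(SegmentStats::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
                new SegmentStats("_0", 1000, 50),   // 5% deleted
                new SegmentStats("_1", 1000, 100),  // exactly 10%: not over the threshold
                new SegmentStats("_2", 1000, 250)); // 25% deleted
        System.out.println(segmentsToMerge(segments, 10.0));
    }
}
```

With the default threshold of `10`, only segment `_2` qualifies; a segment at exactly 10% deletes is left alone.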
|
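Both copies of the documentation state that the merge policy settings are dynamic. As a sketch, assuming a node reachable at `http://localhost:9200` and an index named `my-index` (both hypothetical), a live update of `max_merged_segment` through the update index settings API (`PUT /{index}/_settings`) could be built with the JDK's `java.net.http` client (Java 11+). The code only constructs the request; whether a given Elasticsearch version still accepts this expert setting should be checked against that version's documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) an update-settings request for a live index. */
final class UpdateMergeSettings {
    private UpdateMergeSettings() {}

    static HttpRequest maxMergedSegment(String baseUrl, String index, String size) {
        // JSON body for PUT /{index}/_settings; the setting name comes from the documentation above.
        String body = "{\"index.merge.policy.max_merged_segment\": \"" + size + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = maxMergedSegment("http://localhost:9200", "my-index", "10gb");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, the request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.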
@@ -115,7 +115,7 @@ process all data -- it just needs to be long enough to process the previous
batch of results. Each `scroll` request (with the `scroll` parameter) sets a
new expiry time.

Normally, the <<index-modules-merge,background merge process>> optimizes the
Normally, the background merge process optimizes the
index by merging together smaller segments to create new bigger segments, at
which time the smaller segments are deleted. This process continues during
scrolling, but an open search context prevents the old segments from being
|
|
|
@@ -136,8 +136,8 @@ payloads or weights. This form does still work inside of multi fields.

NOTE: The suggest data structure might not reflect deletes on
documents immediately. You may need to do an <<indices-optimize>> for that.
You can call optimize with the `only_expunge_deletes=true` to only cater for deletes
or alternatively call a <<index-modules-merge>> operation.
You can call optimize with `only_expunge_deletes=true` to target only
deletions for merging.

[[querying]]
==== Querying
|
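The indices segments API and the optimize call with `only_expunge_deletes=true` mentioned in this commit can be expressed as plain HTTP requests. This sketch assumes the `_segments` and `_optimize` endpoints of the Elasticsearch release this commit targets (later releases replaced `_optimize` with `_forcemerge`), plus a hypothetical host and index name; it only builds the requests and does not send them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) segment inspection and optimize requests. */
final class OptimizeRequests {
    private OptimizeRequests() {}

    /** GET /{index}/_segments: lists the segments of each shard. */
    static HttpRequest segments(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_segments"))
                .GET()
                .build();
    }

    /** POST /{index}/_optimize?only_expunge_deletes=true: merge only to reclaim deletions. */
    static HttpRequest expungeDeletes(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_optimize?only_expunge_deletes=true"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        System.out.println(segments("http://localhost:9200", "my-index").uri());
        System.out.println(expungeDeletes("http://localhost:9200", "my-index").uri());
    }
}
```

Checking `_segments` first shows whether an index really has many small segments or many deletions before an optimize is issued, which is the workflow the removed documentation and the new Javadoc both recommend.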
|