[[index-modules-merge]]
== Merge

A shard in elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are periodically merged into larger segments to keep the
index size at bay and expunge deletes.

The more segments a Lucene index has, the slower searches become and the
more memory is used. Segment merging reduces the number of segments,
but merges can be expensive to perform, especially in low IO environments.
Merges can be throttled using <<store-throttling,store level throttling>>.

[float]
[[policy]]
=== Policy

The index merge policy module allows one to control which segments of a
shard index are to be merged. There are several types of policies with
the default set to `tiered`.

[float]
[[tiered]]
==== tiered

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the `log_byte_size` merge
policy, except this merge policy is able to merge non-adjacent segments,
and separates how many segments are merged at once from how many
segments are allowed per tier. This merge policy also does not
over-merge (i.e., cascade merges).

This policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::

    When expungeDeletes is called, we only merge away a segment if its delete
    percentage is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::

    Segments smaller than this are "rounded up" to this size, i.e. treated as
    equal (floor) size for merge selection. This prevents frequent flushing
    of tiny segments from leaving a long tail in the index. Default
    is `2mb`.

`index.merge.policy.max_merge_at_once`::

    Maximum number of segments to be merged at a time during "normal" merging.
    Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::

    Maximum number of segments to be merged at a time, during optimize or
    expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::

    Maximum sized segment to produce during normal merging (not explicit
    optimize). This setting is approximate: the estimate of the merged segment
    size is made by summing sizes of to-be-merged segments (compensating for
    percent deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::

    Sets the allowed number of segments per tier. Smaller values mean more
    merging but fewer segments. Default is `10`. Note that this value needs
    to be >= `max_merge_at_once`, otherwise you'll force too many merges to
    occur.

`index.reclaim_deletes_weight`::

    Controls how aggressively merges that reclaim more deletions are favored.
    Higher values favor selecting merges that reclaim deletions. A value of
    `0.0` means deletions don't impact merge selection. Defaults to `2.0`.

`index.compound_format`::

    Should the index be stored in compound format or not. Defaults to `false`.
    See <<index-compound-format,`index.compound_format`>> in
    <<index-modules-settings>>.

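As a rough illustration of how these settings relate to each other, the Python sketch below collects the defaults listed above and checks the `segments_per_tier` constraint. The `TIERED_DEFAULTS` dictionary and the `check_tiered_settings` helper are hypothetical names used only for this example; they mirror the documented setting names and defaults and are not part of elasticsearch.

```python
# Defaults copied from the settings list above. The dict and the helper
# below are illustrative only; they are not an elasticsearch API.
TIERED_DEFAULTS = {
    "index.merge.policy.expunge_deletes_allowed": 10,
    "index.merge.policy.floor_segment": "2mb",
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.policy.max_merge_at_once_explicit": 30,
    "index.merge.policy.max_merged_segment": "5gb",
    "index.merge.policy.segments_per_tier": 10,
    "index.reclaim_deletes_weight": 2.0,
    "index.compound_format": False,
}


def check_tiered_settings(overrides):
    """Apply overrides to the defaults and return (settings, warnings)."""
    settings = dict(TIERED_DEFAULTS)
    settings.update(overrides)
    warnings = []
    per_tier = settings["index.merge.policy.segments_per_tier"]
    at_once = settings["index.merge.policy.max_merge_at_once"]
    # The documentation asks for segments_per_tier >= max_merge_at_once.
    if per_tier < at_once:
        warnings.append(
            "segments_per_tier (%s) is below max_merge_at_once (%s)"
            % (per_tier, at_once)
        )
    return settings, warnings
```

Calling `check_tiered_settings({"index.merge.policy.segments_per_tier": 5})` would return one warning, since the default `max_merge_at_once` is `10`.
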
For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (proportionally
considering percent deletes), and then finds the least-cost merge. Merge
cost is measured by a combination of the "skew" of the merge (size of the
largest segment divided by the smallest segment), the total merge size and
the percentage of deletes reclaimed, so that merges with lower skew,
smaller size and those reclaiming more deletes are favored.

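To make the cost description more concrete, here is a toy scoring function in Python. It follows the wording above (skew as the largest segment divided by the smallest), but the way the three factors are combined, including the `0.05` exponent, is an assumption made for illustration and should not be read as Lucene's actual formula. The name `merge_score` is hypothetical.

```python
def merge_score(sizes, deleted_fractions, reclaim_deletes_weight=2.0):
    """Toy merge cost for one candidate merge; lower is better.

    sizes: positive segment sizes in bytes.
    deleted_fractions: share of deleted docs per segment (0.0 to 1.0).
    """
    # Bytes expected to survive once deleted docs are dropped.
    live_sizes = [s * (1.0 - d) for s, d in zip(sizes, deleted_fractions)]
    total_before = float(sum(sizes))
    total_after = sum(live_sizes)

    # Skew as described in the text: largest divided by smallest segment.
    skew = max(sizes) / float(min(sizes))

    # Fraction of bytes that survive; smaller means more deletes reclaimed.
    non_deleted_ratio = total_after / total_before

    # Assumed combination: skew dominates, total size weighs in gently,
    # and reclaimed deletes lower the score according to the weight.
    return (
        skew
        * (total_after ** 0.05)
        * (non_deleted_ratio ** reclaim_deletes_weight)
    )
```

With this sketch, a merge of equally sized segments scores lower than a merge of one large and several tiny segments, and a merge that reclaims many deletes scores lower than the same merge without deletes.
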
If a merge will produce a segment that's larger than
`max_merged_segment` then the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.

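The size estimate and the "merge fewer segments" rule can be sketched as follows. This is a simplified Python model, not the policy's real selection logic: dropping the largest segment first is an assumption, and `estimate_merged_size` and `fit_under_budget` are hypothetical helper names.

```python
def estimate_merged_size(segments):
    """Sum the live bytes of (size_bytes, deleted_fraction) pairs."""
    return sum(size * (1.0 - deleted) for size, deleted in segments)


def fit_under_budget(segments, max_merged_segment):
    """Shrink a candidate merge until its estimated size fits the budget.

    Returns the segments to merge, or [] if no worthwhile merge remains.
    """
    candidate = sorted(segments, key=lambda seg: seg[0])  # smallest first
    # Assumption: give up the largest remaining segment on each step.
    while candidate and estimate_merged_size(candidate) > max_merged_segment:
        candidate.pop()
    # A single segment is only rewritten if it has deletions to reclaim.
    if len(candidate) == 1 and candidate[0][1] == 0.0:
        return []
    return candidate
```

For example, with a budget of `5` size units, the segments `(4, 0.0)`, `(3, 0.0)` and `(2, 0.0)` shrink to the two smaller ones, whose estimated merged size is exactly `5`.
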
Note that for large shards holding many gigabytes of data, the default
`max_merged_segment` (`5gb`) can leave many segments in an index, which
makes searches slower. Use the indices segments API to see the segments
that an index has, and consider either increasing `max_merged_segment` or
issuing an optimize call for the index (try to issue it at a low traffic
time).

[float]
[[log-byte-size]]
==== log_byte_size

A merge policy that merges segments into levels of exponentially
increasing *byte size*, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_size |A size setting type which sets the
minimum size for the lowest level segments. Any segments below this size
are considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1.6mb`.

|index.merge.policy.max_merge_size |A size setting type which sets the
largest segment (measured by total byte size of the segment's files)
that may be merged with other segments. Defaults to unbounded.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

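The level idea can be illustrated with a small Python model. It is a deliberately simplified sketch: the exact level boundaries used by Lucene differ, and `level_of` and `levels_to_merge` are hypothetical names. Every segment smaller than `min_merge_size * merge_factor` is placed on level 0, each further level is `merge_factor` times larger, and a level is merged once it holds `merge_factor` segments.

```python
DEFAULT_MIN_MERGE_SIZE = int(1.6 * 1024 * 1024)  # the documented 1.6mb default


def level_of(size, merge_factor, min_merge_size):
    """Return the level of a segment; level 0 holds the smallest segments."""
    level = 0
    upper = min_merge_size * merge_factor
    # Integer arithmetic avoids rounding surprises from floating point logs.
    while size >= upper:
        level += 1
        upper *= merge_factor
    return level


def levels_to_merge(sizes, merge_factor=10,
                    min_merge_size=DEFAULT_MIN_MERGE_SIZE):
    """Group segment sizes by level and keep the levels that are full."""
    by_level = {}
    for size in sizes:
        level = level_of(size, merge_factor, min_merge_size)
        by_level.setdefault(level, []).append(size)
    return {
        level: segments
        for level, segments in by_level.items()
        if len(segments) >= merge_factor
    }
```

In this model, ten tiny segments on level 0 are selected for merging together, regardless of how much they differ in size, while a single 50 MB segment on level 1 is left alone.
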
[float]
[[log-doc]]
==== log_doc

A merge policy that tries to merge segments into levels of exponentially
increasing *document count*, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_docs |Sets the minimum size for the lowest
level segments. Any segments below this size are considered to be on the
same level (even if they vary drastically in size) and will be merged
whenever there are `merge_factor` of them. This effectively truncates the
"long tail" of small segments that would otherwise be created into a
single level. If you set this too large, it could greatly increase the
merging cost during indexing (if you flush many small segments).
Defaults to `1000`.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[scheduling]]
=== Scheduling

The merge scheduler controls the execution of merge operations once they
are needed (according to the merge policy). The following types are
supported, with the default being the `SerialMergeScheduler`.

Note, the default is the serial merge scheduler since there is a merge
thread pool that explicitly schedules merges, and it makes sure that
merges are serial within a shard, yet concurrent across multiple shards.

[float]
==== ConcurrentMergeScheduler

A merge scheduler that runs merges using a separate thread. When the maximum
number of threads is reached, further merges will wait until a merge thread
becomes available.

The scheduler supports the following settings:

`index.merge.scheduler.max_thread_count`::

    The maximum number of threads to perform the merge operation. Defaults to
    `Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.

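The default above is a Java expression. As a quick way to see what it evaluates to on a given machine, the same arithmetic can be written in Python. The function name `default_max_thread_count` is made up for this example, and `os.cpu_count()` is used as a stand-in for `Runtime.getRuntime().availableProcessors()`.

```python
import os


def default_max_thread_count(available_processors=None):
    """Mirror max(1, min(3, availableProcessors / 2)) with integer division."""
    if available_processors is None:
        # Stand-in for Runtime.getRuntime().availableProcessors().
        available_processors = os.cpu_count() or 1
    return max(1, min(3, available_processors // 2))
```

Under this formula a machine with 1 or 2 processors gets 1 merge thread, 4 processors get 2, and anything with 6 or more processors is capped at 3.
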
[float]
==== SerialMergeScheduler

A merge scheduler that simply does each merge sequentially using the
calling thread (blocking the operations that triggered the merge or the
index operation).