Using files that must be specified on each node is an anti-pattern
from the API based goal of ES. This change removes the ability
to specify the default mapping with a file on each node.
closes#10620
In order to safely complete recoveries / relocations we have to keep all operation done since the recovery start at available for replay. At the moment we do so by preventing the engine from flushing and thus making sure that the operations are kept in the translog. A side effect of this is that the translog keeps on growing until the recovery is done. This is not a problem as we do need these operations but if the another recovery starts concurrently it may have an unneededly long translog to replay. Also, if we shutdown the engine for some reason at this point (like when a node is restarted) we have to recover a long translog when we come back.
To void this, the translog is changed to be based on multiple files instead of a single one. This allows recoveries to keep hold to the files they need while allowing the engine to flush and do a lucene commit (which will create a new translog files bellow the hood).
Change highlights:
- Refactor Translog file management to allow for multiple files.
- Translog maintains a list of referenced files, both by outstanding recoveries and files containing operations not yet committed to Lucene.
- A new Translog.View concept is introduced, allowing recoveries to get a reference to all currently uncommitted translog files plus all future translog files created until the view is closed. They can use this view to iterate over operations.
- Recovery phase3 is removed. That phase was replaying operations while preventing new writes to the engine. This is unneeded as standard indexing also send all operations from the start of the recovery to the recovering shard. Replay all ops in the view acquired in recovery start is enough to guarantee no operation is lost.
- IndexShard now creates the translog together with the engine. The translog is closed by the engine on close. ShadowIndexShards do not open the translog.
- Moved the ownership of translog fsyncing to the translog it self, changing the responsible setting to `index.translog.sync_interval` (was `index.gateway.local.sync`)
Closes#10624
* Removed the docs for `index.compound_format` and `index.compound_on_flush` - these are expert settings which should probably be removed (see https://github.com/elastic/elasticsearch/issues/10778)
* Removed the docs for `index.index_concurrency` - another expert setting
* Labelled the segments verbose output as experimental
* Marked the `compression`, `precision_threshold` and `rehash` options as experimental in the cardinality and percentile aggs
* Improved the experimental text on `significant_terms`, `execution_hint` in the terms agg, and `terminate_after` param on count and search
* Removed the experimental flag on the `geobounds` agg
* Marked the settings in the `merge` and `store` modules as experimental, rather than the modules themselves
Closes#10782
This commit brings the benefits of the `count` search type to search requests
that have a `size` of 0:
- a single round-trip to shards (no fetch phase)
- ability to use the query cache
Since `count` now provides no benefits over `query_then_fetch`, it has been
deprecated.
Close#7630
We now have a very useful annotation to mark features or parameters as
experimental. Let's use it! This commit replaces some custom text warnings with
this annotation and adds this annotation to some existing features/parameters:
- inner_hits (unreleased yet)
- terminate_after (released in 1.4)
- per-bucket doc count errors in the terms agg (released in 1.4)
I also tagged with this annotation settings which should either be not needed
(like the ability to evict entries from the filter cache based on time) or that
are too deep into the way that Elasticsearch works like the Directory
implementation or merge settings.
Close#9563
This particular note was about fielddata filtering but could cause confusion
that fields that have doc values enabled cannot be used for filtering (as in
a `filtered_query`).
When using the DiskThresholdDecider, it's possible that shards could
already be marked as relocating to the node being evaluated. This commit
adds a new setting `cluster.routing.allocation.disk.include_relocations`
which adds the size of the shards currently being relocated to this node
to the node's used disk space.
This new option defaults to `true`, however it's possible to
over-estimate the usage for a node if the relocation is already
partially complete, for instance:
A node with a 10gb shard that's 45% of the way through a relocation
would add 10gb + (.45 * 10) = 14.5gb to the node's disk usage before
examining the watermarks to see if a new shard can be allocated.
Fixes#7753
Relates to #6168
This documentation was dangerous because it felt like it was possible to gain
substantial performance by just switching the codec of the index.
However, non-default codecs are dangerous to use since they are not supported
in terms of backward compatibility, and most improvements that they bring have
been folded into the default codec anyway (for example, the default codec
"pulses" postings lists that contain a single document).
Adds a breaker for request BigArrays, which are used for parent/child
queries as well as some aggregations. Certain operations like Netty HTTP
responses and transport responses increment the breaker, but will not
trip.
This also changes the output of the nodes' stats endpoint to show the
parent breaker as well as the fielddata and request breakers.
There are a number of new settings for breakers now:
`indices.breaker.total.limit`: starting limit for all memory-use breaker,
defaults to 70%
`indices.breaker.fielddata.limit`: starting limit for fielddata breaker,
defaults to 60%
`indices.breaker.fielddata.overhead`: overhead for fielddata breaker
estimations, defaults to 1.03
(the fielddata breaker settings also use the backwards-compatible
setting `indices.fielddata.breaker.limit` and
`indices.fielddata.breaker.overhead`)
`indices.breaker.request.limit`: starting limit for request breaker,
defaults to 40%
`indices.breaker.request.overhead`: request breaker estimation overhead,
defaults to 1.0
The breaker service infrastructure is now generic and opens the path to
adding additional circuit breakers in the future.
Fixes#6129
Conflicts:
src/main/java/org/elasticsearch/index/fielddata/IndexFieldData.java
src/main/java/org/elasticsearch/index/fielddata/IndexFieldDataService.java
src/main/java/org/elasticsearch/index/fielddata/RamAccountingTermsEnum.java
src/main/java/org/elasticsearch/index/fielddata/ordinals/GlobalOrdinalsBuilder.java
src/main/java/org/elasticsearch/index/fielddata/ordinals/InternalGlobalOrdinalsBuilder.java
src/main/java/org/elasticsearch/index/fielddata/plain/AbstractIndexOrdinalsFieldData.java
src/main/java/org/elasticsearch/index/fielddata/plain/DisabledIndexFieldData.java
src/main/java/org/elasticsearch/index/fielddata/plain/IndexIndexFieldData.java
src/main/java/org/elasticsearch/index/fielddata/plain/NonEstimatingEstimator.java
src/main/java/org/elasticsearch/index/fielddata/plain/PackedArrayIndexFieldData.java
src/main/java/org/elasticsearch/index/fielddata/plain/ParentChildIndexFieldData.java
src/main/java/org/elasticsearch/index/fielddata/plain/SortedSetDVOrdinalsIndexFieldData.java
src/main/java/org/elasticsearch/node/internal/InternalNode.java
src/test/java/org/elasticsearch/index/aliases/IndexAliasesServiceTests.java
src/test/java/org/elasticsearch/index/codec/CodecTests.java
src/test/java/org/elasticsearch/index/fielddata/AbstractFieldDataTests.java
src/test/java/org/elasticsearch/index/fielddata/IndexFieldDataServiceTests.java
src/test/java/org/elasticsearch/index/mapper/MapperTestUtils.java
src/test/java/org/elasticsearch/index/query/IndexQueryParserFilterCachingTests.java
src/test/java/org/elasticsearch/index/query/SimpleIndexQueryParserTests.java
src/test/java/org/elasticsearch/index/query/guice/IndexQueryParserModuleTests.java
src/test/java/org/elasticsearch/index/search/FieldDataTermsFilterTests.java
src/test/java/org/elasticsearch/index/search/child/ChildrenConstantScoreQueryTests.java
src/test/java/org/elasticsearch/index/similarity/SimilarityTests.java
This change just changes the default for index.codec.bloom.load to
false: with recent performance improvements to ID lookup, such as
#6298, bloom filters don't give much of a performance gain anymore,
and they can consume non-trivial RAM when there are many tiny
documents.
For now, we still index the bloom filters, so if a given app wants
them back, it can just update the index.codec.bloom.load to true.
Closes#6959
`mmapfs` is really good for random access but can have sideeffects if
memory maps are large depending on the operating system etc. A hybrid
solution where only selected files are actually memory mapped but others
mostly consumed sequentially brings the best of both worlds and
minimizes the memory map impact.
This commit mmaps only the `dvd` and `tim` file for fast random access
on docvalues and term dictionaries.
Closes#6636