Our idea is to apply it at the "filtered/constant" level, and not on compound filters, so we won't apply it multiple times. The solution is a bit conservative for now; we can further optimize it in the future, for example by not wrapping it when no caching is done within the filter chain
- remove the DocSet abstraction, and use Bits where we can by getting it from DocIdSet
- better handling of acceptDocs, though still need to properly apply them when caching is involved
This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This is a very expert feature and in almost all cases Elasticsearch's defaults will suit your needs.
## Configuring a postings format per field
There are several postings formats configured by default which can be used in your mapping:
* `direct` - A codec that wraps the default postings format during write time, but loads the terms and posting lists directly into memory as raw arrays during read time. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
* `memory` - A codec that loads and stores terms and posting lists in memory using an FST. Acts like a cached posting list.
* `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers a fail fast for seeks to terms that don't exist.
* `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format.
* `default` - The default postings format. The default if none is specified.
On all fields it is possible to configure a `postings_format` attribute. Example mapping:
```
{
    "person" : {
        "properties" : {
            "second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
        }
    }
}
```
## Configuring a custom postings format
It is possible to instantiate custom postings formats. This can be specified via the index settings.
```
{
    "codec" : {
        "postings_format" : {
            "my_format" : {
                "type" : "pulsing40",
                "freq_cut_off" : "5"
            }
        }
    }
}
```
In the above example `freq_cut_off` is set to 5 (defaults to 1). This tells the pulsing postings format to inline the posting lists of terms with a document frequency lower than or equal to 5 in the term dictionary.
Closes #2411
No need to test for boost, we already have specific boost tests. In general, we should get rid of this test and use more specialized tests if we are missing some.
This commit enables setting boost for numeric fields. However, there is still no way to take advantage of boosted numeric fields during searching, because all queries against numeric fields are translated into range queries wrapped in ConstantScore. Boost for numeric fields is broken on master as well: https://gist.github.com/7ecedea4f6a5219efb89
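For reference, setting a boost on a numeric field in the mapping might look like the sketch below (the `person`/`age` field names and the boost value are illustrative, not taken from the change itself):
```
{
    "person" : {
        "properties" : {
            "age" : {"type" : "integer", "boost" : 2.0}
        }
    }
}
```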
The issue was that under these circumstances the delete by query operation would run forever.
Also fixed: during shard recovery, when delete by query is replayed, nested docs are also deleted. Closes #2302
The NPE occurred when, for an arbitrary segment, no parent documents exist for a has_parent filter/query and no child documents exist for a has_child filter/query.
Closes #2297
Introduce a new class, TransportRequest, which includes headers. This class can be used when sending requests over the transport layer, and ActionRequest also extends it now.
This is the first phase of the refactoring of the transport layer and action layer to allow for simpler implementations of those, as well as simpler "filtering" capabilities in the future.
The types exists API checks whether one or more types exist in one or more indices.
## Example usage
curl -XHEAD 'localhost:9200/twitter/tweet'
## Options
* `index` - One or more indices. Specified either as a query string parameter or in the URI path.
* `type` - One or more types. Specified either as a query string parameter or in the URI path.
* `ignore_missing` - Determines what type of indices to exclude from a request. The option can have the following values: `none` or `missing`.
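Since a HEAD request returns no body, the result is read from the HTTP status code. Assuming the usual exists-API convention of `200` when the type exists and `404` when it does not (an assumption here, not stated above), the check can be scripted like this:
curl -s -o /dev/null -w "%{http_code}\n" -I 'localhost:9200/twitter/tweet'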
Closes #2273
If has_parent, has_child or top_children are executed incorrectly, then a better exception is thrown. This gives a better error description when one of these queries or filters is being used in the count API.
Closes #2261
When setting cluster.routing.allocation.disable_allocation, it causes primary shards of new indices not to be allocated. By default, newly created indices should, at the very least, be allowed to allocate primary shards so they become operational. A new setting, cluster.routing.allocation.disable_new_allocation, allows "new" allocations to be disabled as well.
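As a sketch, assuming the cluster update settings API is used to toggle both settings dynamically (the `transient` scope here is illustrative):
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : true,
        "cluster.routing.allocation.disable_new_allocation" : true
    }
}'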
Closes #2258.
Here is a short example of a simple reroute API call:
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands" : [
        {"move" : {"index" : "test", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
        {"allocate" : {"index" : "test", "shard" : 1, "node" : "node3"}}
    ]
}'
An important aspect to remember is the fact that once an allocation occurs, the cluster will aim at rebalancing its state back to an even state. For example, if the allocation includes moving a shard from `node1` to `node2`, in an "even" state, then another shard will be moved from `node2` to `node1` to even things out.
The cluster can be set to disable allocations, which means that only the explicit allocations will be performed. Obviously, only once all commands have been applied will the cluster aim to rebalance its state.
Another option is to run the commands in "dry_run" (as a URI flag, or in the request body). This will cause the commands to be applied to the current cluster state, and return the resulting cluster state after the commands (and rebalancing) have been applied.
The commands supported are (a combined example follows the list):
* `move`: Move a started shard from one node to another node. Accepts `index` and `shard` for index name and shard number, `from_node` for the node to move the shard "from", and `to_node` for the node to move the shard to.
* `cancel`: Cancel allocation of a shard (or recovery). Accepts `index` and `shard` for index name and shard number, and `node` for the node to cancel the shard allocation on.
* `allocate`: Allocate an unassigned shard to a node. Accepts `index` and `shard` for index name and shard number, and `node` to allocate the shard to. It also accepts an `allow_primary` flag to explicitly specify that it is allowed to allocate a primary shard (which might result in data loss).
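For illustration, a sketch combining the `cancel` command, the `allow_primary` flag, and the `dry_run` URI flag (the index, shard and node names are made up):
curl -XPOST 'localhost:9200/_cluster/reroute?dry_run=true' -d '{
    "commands" : [
        {"cancel" : {"index" : "test", "shard" : 0, "node" : "node1"}},
        {"allocate" : {"index" : "test", "shard" : 0, "node" : "node2", "allow_primary" : true}}
    ]
}'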
Closes #2256
The `has_parent` filter accepts a query and a parent type. The query is executed in the parent document space, which is specified by the parent type. This filter returns child documents whose associated parents have matched. For the rest, the `has_parent` filter has the same options and works in the same manner as the `has_child` filter.
This is an experimental filter.
## Filter example
```
{
    "has_parent" : {
        "parent_type" : "blog",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
```
The `parent_type` field name can also be abbreviated to `type`.
## Memory considerations
With the current implementation, all _id values are loaded into memory (heap) in order to support fast lookups, so make sure there is enough memory for it.
This issue originates from issue #792
Add support for internal custom allocation commands, including allocation, move, and cancel (shard).
Also, fix #2242, which causes the cluster state to be in an inconsistent state when a shard that is the source of a relocation is failed.
* Fixed an issue where a dynamic update to the minimum_master_nodes setting would not take immediate effect (see the example after this list)
* Added LocalNodeMasterListener support to the ClusterService. Enables listening to when the local node becomes/stops being the master
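For example, a dynamic update along these lines (the value and the `transient` scope are illustrative) should now take effect immediately:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "discovery.zen.minimum_master_nodes" : 3
    }
}'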
By introducing the Text abstraction, we can keep (long) text fields in their UTF8 bytes format, with no need to convert them to a string when serializing them back to JSON, for example.
The first place we can apply this is highlighted text, which can be long. This does break backward compatibility for people using the Java API, where HighlightField now has Text as its content, and not String.
First phase of improving buffer management and reducing buffer copies even further. Introduce a BytesReference abstraction, allowing us to more easily slice and "read/write references" from streams. This is the foundation for later using it to create smarter buffers on top of composite netty channels for example (which http now produces), as well as reducing buffer copies when sending transport/rest responses.
Allow the use of "doc" as the update source when a script is not
specified. New fields are added, existing fields are overwritten, and
maps are merged recursively.
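A minimal sketch of such a partial document update, assuming the `_update` endpoint (the index, type, id and fields here are made up):
curl -XPOST 'localhost:9200/twitter/tweet/1/_update' -d '{
    "doc" : {
        "counter" : 1,
        "tags" : {"color" : "blue"}
    }
}'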
When a node restarts, it might be canceling one recovery of a shard id only to get another one in the next cycle. We should detect this case and handle it properly.
This is a fix to the annoying message seen by users: "suspect illegal state: trying to move shard from primary mode to replica mode".
Two main changes:
* Improve cluster resiliency to disconnected sub-clusters. If a node pings a master and that node is no longer registered with the master, improve the rejoin process of that node to the cluster. Also, if a master receives a message from another master, pick one master to force to rejoin the cluster (based on cluster state versioning).
* On quick rolling restarts, without waiting for shard allocation, the shard allocation logic can mess up its counts, causing strange logic in allocating shards, or validation failures on routing table allocation.
Compressing the stored fields file (the .fdt file) directly allows for better compression of the index size, specifically when indexing (and storing) small documents. The compression will be considerably more effective compared to compressing each doc on its own (when setting compress on the _source mapper). The downside is that more data needs to be uncompressed when loading documents.
The setting to control it is `index.store.compress.stored_fields` set to `true` (it defaults to `false`), and it can be enabled dynamically using the update settings API. This allows enabling compression at a later stage (i.e. on old time based indices), and then optimizing the index to make sure it gets compressed.
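A sketch of enabling this on an existing index and then optimizing it so existing segments get rewritten compressed (the index name and the single-segment optimize are illustrative):
curl -XPUT 'localhost:9200/logs-2012-11/_settings' -d '{
    "index.store.compress.stored_fields" : true
}'
curl -XPOST 'localhost:9200/logs-2012-11/_optimize?max_num_segments=1'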