Commit Graph

4017 Commits

Author SHA1 Message Date
Mayya Sharipova c6b73239ae
Limit the number of tokens produced by _analyze (#27529)
Add an index level setting `index.analyze.max_token_count` to control
the number of generated tokens in the  _analyze endpoint.
Defaults to 10000.

Throw an error if the number of generated tokens exceeds this limit.

Closes #27038
2017-11-30 11:54:39 -05:00
olcbean d25c9671de Deprecate `jarowinkler` in favor of `jaro_winkler` (#27526)
Jaro and Winkler are two people, so we should use the same naming convention as for Damerau–Levenshtein.
2017-11-30 12:49:34 +00:00
Philipp Krenn 64ca0fe9bb Update docs regarding SHA-512 checksums
This commit updates the docs for the new SHA-512 checksums that are
supported for official plugins.

Relates #27524
2017-11-29 21:29:06 -05:00
Jason Tedor 6655689b15 Move DNS cache docs to system configuration docs
When these docs were moved they should have been moved to the system
configuration docs. This commit does that, and also fixes a missing
heading that broke the docs build.
2017-11-29 19:57:26 -05:00
Jason Tedor ff3c19ed13
Move DNS cache settings to important configuration
This commit moves the DNS cache settings for the JVM to the important
settings section of the docs.

Relates #27592
2017-11-29 18:02:26 -05:00
Martijn van Groningen dbf17152d1
docs: use `doc_value_fields` fields as alternative for nested inner hits _source fetching
instead of stored fields as doc values are more likely to be enabled by default
2017-11-29 17:31:39 +01:00
Clinton Gormley 65e602c2be Update index-modules.asciidoc
Docs: Clarified `blocks.write` vs `blocks.read_only`
2017-11-29 13:05:12 +01:00
Christoph Büscher 0d11b9fe34
[Docs] Unify spelling of Elasticsearch (#27567)
Removes occurences of "elasticsearch" or "ElasticSearch" in favour of
"Elasticsearch" where appropriate.
2017-11-29 09:44:25 +01:00
Kanako Nakai 23f85fe6d4 Fix max number of threads bootstrap docs
Previously the bootstrap check for max number of threads was increased
from 2048 to 4096 yet the docs were never adjusted for this change. This
commit addresses this so the docs are in-line with the limit enforced in
the bootstrap check.

Relates #27511
2017-11-28 22:19:04 -05:00
Martijn van Groningen cb1204774b
Include the _index, _type and _id to nested search hits in the top_hits and inner_hits response.
Also include _type and _id for parent/child hits inside inner hits.

In the case of top_hits aggregation the nested search hits are
directly returned and are not grouped by a root or parent document, so
it is important to include the _id and _index attributes in order to know
to what documents these nested search hits belong to.

Closes #27053
2017-11-28 14:05:29 +01:00
David Turner a165d1df40
Minor improvements to docs for numeric types (#27553)
* Caps
* Fix awkward wording that took multiple passes to parse
* Floating point _number_
* Something more descriptive about the `scaled_float` scaling factor.
2017-11-28 11:36:07 +00:00
Jason Tedor d8c28044da
Forbid granting the all permission in production
Running with the all permission java.security.AllPermission granted is
equivalent to disabling the security manager. This commit adds a
bootstrap check that forbids running with this permission granted.

Relates #27548
2017-11-27 16:05:27 -05:00
Simon Willnauer f23ed6188d
Skip shard refreshes if shard is `search idle` (#27500)
Today we refresh automatically in the background by default very second.
This default behavior has a significant impact on indexing performance
if the refreshes are not needed.
This change introduces a notion of a shard being `search idle` which a
shard transitions to after (default) `30s` without any access to an
external searcher. Once a shard is search idle all scheduled refreshes
will be skipped unless there are any refresh listeners registered.
If a search happens on a `serach idle` shard the search request _park_
on a refresh listener and will be executed once the next scheduled refresh
occurs. This will also turn the shard into the `non-idle` state immediately.

This behavior is only applied if there is no explicit refresh interval set.
2017-11-27 18:16:10 +01:00
lcawley af971b3081 [DOCS] Fixed broken link in breaking changes 2017-11-24 09:16:14 -08:00
kel 4885acb048 Replace `delimited_payload_filter` by `delimited_payload` (#26625)
The `delimited_payload_filter` is renamed to `delimited_payload`, the old name is 
deprecated and should be replaced by `delimited_payload`.

Closes #21978
2017-11-24 13:03:19 +01:00
Nhat Nguyen 46b508d6c9
Add wait_for_no_initializing_shards to cluster health API (#27489)
This adds a new option to the cluster health request allowing to wait
until there is no initializing shards.

Closes #25623
2017-11-23 15:09:58 -05:00
Clinton Gormley d1b1d711df Update composite-aggregation.asciidoc
Fixed asciidoc typo
2017-11-23 15:05:14 +01:00
olcbean fd564b10db Deprecate `levenstein` in favor of `levenshtein` (#27409)
Support both spellings thoughout 6.x, reporting the incorrect one as deprecated.
2017-11-23 12:53:47 +00:00
Simon Willnauer fadbe0de08
Automatically prepare indices for splitting (#27451)
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.

This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.

NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.

This is a 7.0 only change.
2017-11-23 09:48:54 +01:00
Mayya Sharipova 57e4d10007
Limit the number of nested documents (#27405)
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.

Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.

Closes #26962
2017-11-22 10:16:28 -05:00
Takumasa Ochi eed8d1aee5 [DOC] Fix mathematical representation on interval (range) (#27450) 2017-11-21 17:06:26 +00:00
Luca Cavanna 29450de7b5
Cross Cluster Search: make remote clusters optional (#27182)
Today Cross Cluster Search requires at least one node in each remote cluster to be up once the cross cluster search is run. Otherwise the whole search request fails despite some of the data (either local and/or remote) is available. This happens when performing the _search/shards calls to find out which remote shards the query has to be executed on. This scenario is different from shard failures that may happen later on when the query is actually executed, in case e.g. remote shards are missing, which is not going to fail the whole request but rather yield partial results, and the _shards section in the response will indicate that.

This commit introduces a boolean setting per cluster called search.remote.$cluster_alias.skip_if_disconnected, set to false by default, which allows to skip certain clusters if they are down when trying to reach them through a cross cluster search requests. By default all clusters are mandatory.

Scroll requests support such setting too when they are first initiated (first search request with scroll parameter), but subsequent scroll rounds (_search/scroll endpoint) will fail if some of the remote clusters went down meanwhile.

The search API response contains now a new _clusters section, similar to the _shards section, that gets returned whenever one or more clusters were disconnected and got skipped:

"_clusters" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 1
}
Such section won't be part of the response if no clusters have been skipped.

The per cluster skip_unavailable setting value has also been added to the output of the remote/info API.
2017-11-21 11:41:47 +01:00
Ulrich Reffle dd0bb580b0 [Docs] Fix broken bulleted lists (#27470) 2017-11-21 11:10:35 +01:00
Jim Ferenczi d1093bd2fa #26800: Fix docs rendering 2017-11-20 08:41:02 +01:00
Michael Basnight 2949c53174
Remove config prompting for secrets and text (#27216)
This commit removes the ability to use ${prompt.secret} and
${prompt.text} as valid config settings. Secure settings has obsoleted
the need for this, and it cleans up some of the code in Bootstrap.
2017-11-19 22:33:17 -06:00
K. Daniel Newton 365dda8748 Correct usage of "an" to "a" in getting started docs
This commit corrects a word usage error in the getting started
docs. Since pronunciation is what determines when to use either "a" or
"an" and the word "ubiquitous" is pronounced /yo͞oˈbikwədəs/, it should
be preceded by "a."

Relates #27420
2017-11-18 07:36:43 -05:00
Mayya Sharipova 858b2c7cb8
Standardize underscore requirements in parameters (#27414)
Stardardize underscore requirements in parameters across different type of
requests:
_index, _type, _source, _id keep their underscores
params like version and retry_on_conflict will be without underscores
Throw an error if older versions of parameters are used

BulkRequest, MultiGetRequest, TermVectorcRequest, MoreLikeThisQuery
were changed

Closes #26886
2017-11-17 15:31:52 -05:00
Simon Willnauer a5df2ef538 peanut butter hamburgers 2017-11-17 20:51:39 +01:00
Jim Ferenczi 53462f6499
Make fields optional in multi_match query and rely on index.query.default_field by default (#27380)
* Make fields optional in multi_match query and rely on index.query.default_field by default

This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turns defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kind of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.
2017-11-17 10:25:21 +01:00
Jim Ferenczi 623367d793
Add composite aggregator (#26800)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
  * A `terms` source, values are extracted from a field or a script.
  * A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation in composite buckets.
A composite buckets is composed of one value per source and is built for each document as the combinations of values in the provided sources.
For instance the following aggregation:

````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg":
      "terms": {
        "field": "field2"
      }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:

````
"composite_agg": {
  "composite": {
    "sources": [
      {
	"field1": {
          "terms": {
              "field": "field1"
            }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
        }
      },
    }
  }
````

The response of the aggregation looks like this:

````
"aggregations": {
  "composite_agg": {
    "buckets": [
      {
        "key": {
          "field1": "alabama",
          "field2": "almanach"
        },
        "doc_count": 100
      },
      {
        "key": {
          "field1": "alabama",
          "field2": "calendar"
        },
        "doc_count": 1
      },
      {
        "key": {
          "field1": "arizona",
          "field2": "calendar"
        },
        "doc_count": 1
      }
    ]
  }
}
````

By default this aggregation returns 10 buckets sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values, the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sorts after `arizona, calendar`:

````
"composite_agg": {
  "composite": {
    "after": {"field1": "alabama", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
	"field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
	"field2": {
          "terms": {
            "field": "field2"
          }
	}
      }
    }
  }
````

This aggregation is optimized for indices that set an index sorting that match the composite source definition.
For instance the aggregation above could run faster on indices that defines an index sorting like this:

````
"settings": {
  "index.sort.field": ["field1", "field2"]
}
````

In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued field but disables early termination for these fields even if index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
2017-11-16 15:13:36 +01:00
Jim Ferenczi bf72858ce8
[Docs] Restore section about multi-level parent/child relation in parent-join (#27392)
This section was removed to hide this ability to new users.
This change restores the section and adds a warning regarding the expected performance.

Closes #27336
2017-11-16 11:29:16 +01:00
Russ Cam c42899b27e
Docs/windows installer (#27369)
* Add additional command line parameters along with important note for INSTALLDIR when upgrading
* Update windows installer images
2017-11-15 21:35:54 +11:00
Alexander Reelsen 66b5a43d0e
Logging: Unify log rotation for index/search slow log (#27298)
The existing log rotation configuration allowed the index
and search slow log to grow unbounded. This commit removes the
date based rotation and adds the same size based rotation, that
the depreciation log already has.
2017-11-15 10:01:32 +01:00
Tal Levy 5c34533761
add json-processor support for non-map json types (#27335)
The Json Processor originally only supported parsing field values into Maps even
though the JSON spec specifies that strings, null-values, numbers, booleans, and arrays
are also valid JSON types. This commit enables parsing these values now.

response to #25972.
2017-11-13 10:28:19 -08:00
Alexander Reelsen 08037eebff
Tests: Improve size regex in documentation test (#26879)
The regex has been changed to not only be able to deal with something
like `260b`, but also support `1.1kb`.
2017-11-13 10:21:53 +01:00
lcawley 3ed558d718 [DOCS] Fixed link to docker content 2017-11-10 12:10:28 -08:00
Lisa Cawley 9f43d7329b
[DOCS] Move X-Pack-specific Docker content (#27333) 2017-11-10 09:38:32 -08:00
Jim Ferenczi 29331f1127
Fail queries with scroll that explicitely set request_cache (#27342)
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution so using the cache
can lead to duplicate result for the next scroll query.

This change fails the entire request if the request_cache option is explictely set
on a query that creates a scroll context (`scroll=1m`) and make sure internally that we never
use the cache for these queries when the option is not explicitely used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
2017-11-10 16:02:06 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clauses that was rarest in order
(based on term length) to attempt less candidate queries to be selected
in the first place. However this still method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Nicholas Knize 06ff92d237 Add ignore_malformed to geo_shape fields
This commit adds ignore_malformed support to geo_shape field types to skip malformed geoJson fields.

closes #23747
2017-11-09 17:59:05 -06:00
Dimitris Athanasiou 66bef26495
Aggregations: bucket_sort pipeline aggregation (#27152)
This commit adds a parent pipeline aggregation that allows
sorting the buckets of a parent multi-bucket aggregation.

The aggregation also offers [from] and [size] parameters
in order to truncate the result as desired.

Closes #14928
2017-11-09 17:59:57 +00:00
Tal Levy d22fd4ea58
Introduce templating support to timezone/locale in DateProcessor (#27089)
Sometimes systems like Beats would want to extract the date's timezone and/or locale
from a value in a field of the document. This PR adds support for mustache templating
to extract these values.

Closes #24024.
2017-11-09 09:45:32 -08:00
Mayya Sharipova abbe853f1e
Add limits for ngram and shingle settings (#27211) (#27318)
Relates to #25887
2017-11-08 10:12:57 -05:00
Mayya Sharipova 148376c2c5
Add limits for ngram and shingle settings (#27211)
* Add limits for ngram and shingle settings (#27211)

Create index-level settings:
max_ngram_diff - maximum allowed difference between max_gram and min_gram in
NGramTokenFilter/NGramTokenizer. Default is 1.
max_shingle_diff - maximum allowed difference between max_shingle_size and
 min_shingle_size in ShingleTokenFilter.  Default is 3.

Throw an IllegalArgumentException when
trying to create NGramTokenFilter, NGramTokenizer, ShingleTokenFilter
where difference between max_size and min_size exceeds the settings value.

Closes #25887
2017-11-07 08:14:55 -05:00
Zachary Tong 6e9e07d6f8
Fix profiling naming issues (#27133)
Some code-paths use anonymous classes (such as NonCollectingAggregator
in terms agg), which messes up the display name of the profiler.  If
we encounter an anonymous class, we need to grab the super's name.

Another naming issue was that ProfileAggs were not delegating to the
wrapped agg's name for toString(), leading to ugly display.

This PR also fixes up the profile documentation.  Some of the examples were
executing against empty indices, which shows different profile results
than a populated index (and made for confusing examples).

Finally, I switched the agg display names from the fully qualified name
to the simple name, so that it's similar to how the query profiles work.

Closes #26405
2017-11-06 16:37:33 -05:00
Shubham Aggarwal 5a925cd40c Fixed references to Multi Index Syntax (#27283) 2017-11-06 19:15:36 +01:00
Boris Tyukin 8e9b30417c Update to support bulk updates by query (#27172)
Getting started doc stated that bulk updates by query are not supported but they are now
2017-11-06 17:32:20 +01:00
Boaz Leskes a8ff4960f3 add split index reference in indices.asciidoc
Relates to #26931
2017-11-06 12:55:41 +01:00
Simon Willnauer bd7efa908a Add ability to split shards (#26931)
This change adds a new `_split` API that allows to split indices into a new
index with a power of two more shards that the source index.  This API works
alongside the `_shrink` API but doesn't require any shard relocation before
indices can be split.

The split operation is conceptually an inverse `_shrink` operation since we
initialize the index with a _syntetic_ number of routing shards that are used
for the consistent hashing at index time. Compared to indices created with
earlier versions this might produce slightly different shard distributions but
has no impact on the per-index backwards compatibility.  For now, the user is
required to prepare an index to be splittable by setting the
`index.number_of_routing_shards` at index creation time.  The setting allows the
user to prepare the index to be splittable in factors of
`index.number_of_routing_shards` ie. if the index is created with
`index.number_of_routing_shards: 16` and `index.number_of_shards: 2` it can be
split into `4, 8, 16` shards. This is an intermediate step until we can make
this the default. This also allows us to safely backport this change to 6.x.

The `_split` operation is implemented internally as a DeleteByQuery on the
lucene level that is executed while the primary shards execute their initial
recovery. Subsequent merges that are triggered due to this operation will not be
executed immediately. All merges will be deferred unti the shards are started
and will then be throttled accordingly.

This change is intended for the 6.1 feature release but will not support pre-6.1
indices to be split unless these indices have been shrunk before. In that case
these indices can be split backwards into their original number of shards.
2017-11-06 11:37:55 +01:00
Pablo Musa 7b03d68f9f [Docs] Fix minor paragraph indentation error for multiple Indices params (#25535) 2017-11-06 10:20:20 +01:00